]po[ Full-Text File Search


Provides full-text indexing for file names and file contents in the ]po[ file storage. The package uses a number of external filters to periodically scan the ]po[ file storage for new files, builds a full-text index, and lets users retrieve the files through the normal search interface.

Required Software

intranet-search-pg-files requires the following software to extract indexable strings from different file formats:

  • CatDoc: /usr/local/bin/catdoc
  • HTMLtoText: /usr/bin/html2text
  • wvText: /usr/bin/wvText (optional)
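
Before enabling the indexer, it can help to confirm that the required converters are installed at the paths listed above. The following is a minimal sketch in Python and is not part of the package; the names `CONVERTERS`, `missing_converters` and the `is_executable` hook are hypothetical and exist only for illustration.

```python
import os

# Converter paths as listed above; the boolean marks whether the tool is required.
# wvText is documented as optional.
CONVERTERS = {
    "catdoc": ("/usr/local/bin/catdoc", True),
    "html2text": ("/usr/bin/html2text", True),
    "wvText": ("/usr/bin/wvText", False),
}


def missing_converters(converters=CONVERTERS, is_executable=None):
    """Return the names of required converters that are not executable files."""
    if is_executable is None:
        # Default check: the path exists as a file and is executable.
        def is_executable(path):
            return os.path.isfile(path) and os.access(path, os.X_OK)
    return [
        name
        for name, (path, required) in converters.items()
        if required and not is_executable(path)
    ]


if __name__ == "__main__":
    missing = missing_converters()
    if missing:
        print("Missing required converters:", ", ".join(missing))
    else:
        print("All required converters found.")
```

If your converters live elsewhere, adjust the paths in the dictionary accordingly.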

Basic Operation

The package periodically (default: every 5 minutes) checks a limited number of objects (default: 100) for new files. Please see the Parameters section below for the settings that control the indexing behaviour.

This scheduled behaviour is necessary to balance the desire for fast indexing against the considerable load that full-text indexing places on your database.
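
The effect of the per-run limit can be sketched as follows. This is only an illustration in Python of a bounded batch run, assuming a simple queue of pending items; the function `run_indexer_once` and its arguments are hypothetical and do not mirror the package's TCL implementation.

```python
from collections import deque


def run_indexer_once(pending, index_one, max_items=100):
    """Index at most max_items entries from the pending queue.

    Returns the number of entries handled in this run. Anything left in
    the queue waits for the next scheduled run.
    """
    handled = 0
    while pending and handled < max_items:
        index_one(pending.popleft())
        handled += 1
    return handled


if __name__ == "__main__":
    backlog = deque(f"file-{i}" for i in range(250))
    indexed = []
    runs = 0
    # Each loop iteration stands for one scheduled run (default: every 5 minutes).
    while backlog:
        run_indexer_once(backlog, indexed.append)
        runs += 1
    print(f"Backlog cleared after {runs} runs")
```

With a backlog of 250 items and a limit of 100, three scheduled runs are needed, which is why newly uploaded files may not appear in the search results immediately.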

Supported File Types

  • txt, text, perl, php, sql:
    These files are considered to consist fully of indexable text.
  • doc:
    We use CatDoc to extract strings from files in Microsoft Word format.
  • htm, html, xml, asp:
    We use HTMLtoText to extract the indexable text from these files.
  • The following extensions are explicitly ignored:
    • Image and audio files: gif, jpg, pgp, bmp, png, wav, mp3, ico
    • File types without reasonable converter: xls, rtf (may be added later)
    • Other files: log, bz2, zip, tar, tgz, rar, gz, js, mso, exe 

 To add a new file type, see ~/packages/intranet-search-pg-files-procs.tcl and search for "intranet_search_pg_files_fti_content". Very basic TCL skills are sufficient to add a new converter once you have the converter running at the shell level.
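
The mapping from file extension to filter described above can be summarised in a short sketch. The actual logic lives in the TCL procedure intranet_search_pg_files_fti_content; the Python below only illustrates the documented mapping. The function name `choose_filter`, the case-insensitive matching, and the "unknown" result for unlisted extensions are assumptions, not documented behaviour.

```python
import os

# Extension groups as listed in "Supported File Types".
PLAIN_TEXT = {"txt", "text", "perl", "php", "sql"}
CATDOC = {"doc"}
HTML2TEXT = {"htm", "html", "xml", "asp"}
IGNORED = {
    "gif", "jpg", "pgp", "bmp", "png", "wav", "mp3", "ico",        # image and audio files
    "xls", "rtf",                                                   # no reasonable converter
    "log", "bz2", "zip", "tar", "tgz", "rar", "gz", "js", "mso", "exe",  # other files
}


def choose_filter(filename):
    """Return which filter would handle a file, based on its extension."""
    # Assumption: extensions are compared case-insensitively.
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    if ext in PLAIN_TEXT:
        return "plain"
    if ext in CATDOC:
        return "catdoc"
    if ext in HTML2TEXT:
        return "html2text"
    if ext in IGNORED:
        return "ignore"
    # Assumption: extensions not listed on this page are reported as unknown.
    return "unknown"


if __name__ == "__main__":
    for name in ("notes.txt", "offer.DOC", "index.html", "logo.png", "plan.pdf"):
        print(name, "->", choose_filter(name))
```

Adding a converter for a new file type amounts to adding one more extension group and one more branch, which matches the small amount of TCL needed in the real procedure.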

Administration & Control

To control indexing, open the page http://<your_server>/intranet-search-pg-files/. On this page you can see the files found by the indexer and re-index individual business objects.

Please see the error log at ~/log/error.log for detailed messages.

Parameters

  • IndexerMaxFiles - 100
    Limits the indexer to at most MaxFiles files per run. You can estimate this value by dividing the number of files in your intranet (example: 30,000) by the time in seconds allowed for checking all files (example: 24*60*60 for 1 day) and multiplying the result by SearchIndexerInterval (example: 300). Make sure that the indexer can actually process MaxFiles files within SearchIndexerInterval seconds, otherwise the system may become overloaded.
  • SearchIndexerInterval - 300
    Run the search indexer every X seconds.
  • IndexFileContentsP - 1
    Should we index the _contents_ of a file, in addition to its filename?
    Disable this parameter if you are running a translation business, because in that case the file contents generally relate to your customers' business rather than your own. Set the parameter to 1 if you are interested in the contents of your files.
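
The rule of thumb for IndexerMaxFiles can be worked through with the example figures given above. This is a small Python sketch; the function name `indexer_max_files` is hypothetical, and rounding up to the next whole file is a choice made here, not something the package prescribes.

```python
import math


def indexer_max_files(total_files, full_scan_seconds, indexer_interval_seconds):
    """Estimate IndexerMaxFiles.

    total_files / full_scan_seconds is the number of files that must be
    checked per second; multiplying by the indexer interval gives the
    number of files to check per run.
    """
    return math.ceil(total_files * indexer_interval_seconds / full_scan_seconds)


if __name__ == "__main__":
    # 30,000 files, all to be checked within 1 day, indexer running every 300 seconds.
    print(indexer_max_files(30000, 24 * 60 * 60, 300))
```

For the example figures the exact result is about 104.2 files per run, which is in line with the default of 100.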


References

Related Software

  • PostgreSQL  - we use the TSearch2 engine from PostgreSQL for full text indexing

Package Documentation

TCL Libraries

tcl/intranet-search-pg-files-procs.tcl       File Search Library 

TCL Procedures

im_package_intranet_pg_files_id       Returns the package id of the intranet-search-pg-files module 
intranet_search_pg_files_fti_content       Extract and normalize the file contents on a best-effort basis using various filters 
intranet_search_pg_files_index_all       Index the entire server 
intranet_search_pg_files_index_object       Index the files of a single object such as a project, company or user. 
intranet_search_pg_files_search_indexer       Index the entire server. 

SQL Files

sql/postgresql/intranet-search-pg-files-create.sql        
sql/postgresql/intranet-search-pg-files-drop.sql        
sql/postgresql/upgrade/upgrade-3.4.0.1.0-3.4.0.2.0.sql        

Content Files

www/
      index.adp
      index.tcl Show files that are not indexed by the FTS
      reindex-biz-object.tcl Show files that are not indexed by the FTS
 

 


This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.0 Generic License.