Doodle – Desktop Search – Libextractor – References

10 June, 2005 at 12:18

Doodle is a tool to quickly search the documents on a computer. Doodle builds an index using meta-data contained in the documents and allows fast searches on the resulting database. Doodle uses libextractor to support obtaining meta-data from various file-formats. The database used by doodle is a suffix tree, resulting in fast lookups. Doodle supports approximate searches.
Features that Doodle does not have at the moment include:

  • A web interface
  • Ordering of search results
  • Spidering (indexing the Internet or websites)
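
To make the suffix-based lookup more concrete, here is a minimal Python sketch. It is not doodle's actual implementation: a sorted list of suffixes stands in for the suffix tree, and the names `build_suffix_index` and `search`, as well as the sample file names and keywords, are made up for illustration. The idea it shows is the same one a suffix tree exploits: every keyword that contains the query has a suffix that starts with the query, so a substring search becomes a prefix lookup.

```python
from bisect import bisect_left


def build_suffix_index(keywords_by_file):
    """Return a sorted list of (suffix, keyword, filename) entries.

    `keywords_by_file` maps a file name to the meta-data keywords
    extracted from it (hypothetical input, not doodle's real format).
    """
    entries = []
    for filename, keywords in keywords_by_file.items():
        for keyword in keywords:
            lowered = keyword.lower()
            # Store every suffix of the keyword, as a suffix tree would.
            for start in range(len(lowered)):
                entries.append((lowered[start:], keyword, filename))
    entries.sort()
    return entries


def search(index, query):
    """Return the files that have a keyword containing `query`."""
    query = query.lower()
    # Suffixes that start with the query sit next to each other in the
    # sorted list, so one binary search finds the first of them.
    position = bisect_left(index, (query,))
    matches = set()
    while position < len(index) and index[position][0].startswith(query):
        matches.add(index[position][2])
        position += 1
    return sorted(matches)


index = build_suffix_index({
    "a.mp3": ["Beethoven"],
    "b.pdf": ["Thesis", "oven manual"],
})
print(search(index, "oven"))  # ['a.mp3', 'b.pdf']
```

A real suffix tree shares common prefixes between suffixes and so avoids storing each suffix separately, which this sketch does not attempt.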

First, the doodle database needs to be created. The simplest way to create the database is to run doodle with the -b option on the directories that are to be indexed.

You can achieve a (limited) form of full-text search with doodle. For that, the dictionary-based plaintext extractors from libextractor are used. In order to use them, you need to pass the option -B LANG to doodle, where LANG is a two-letter language code that selects the dictionary. The languages available at the moment are en, es, fr, it and no. Words and sentences found in the dictionary for the selected language will then be added to the index. While libextractor attempts to avoid full-text extraction for certain known binary formats, it may still find words in non-text files. Running with this option will dramatically increase the size of the index and the time it takes to build it.

Note that changing the options used to build a database will not (!) cause doodle to re-index files that were previously processed with other options. The only way to force doodle to re-index files with different options is to either touch the files (change their modification timestamp) or delete the old database and start from scratch.
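
The timestamp rule can be sketched in a few lines of Python. This is only a model of the behaviour described above, not doodle's code: the `indexed_at` dictionary is hypothetical bookkeeping, and `needs_reindex` and `touch` are illustrative names. It shows why touching a file is enough to get it picked up again on the next build.

```python
import os
import time


def needs_reindex(path, indexed_at):
    """Return True if `path` is new or changed since it was last indexed.

    `indexed_at` maps a path to the modification time recorded when the
    file was last indexed (an assumption made for this sketch).
    """
    previous = indexed_at.get(path)
    return previous is None or os.stat(path).st_mtime > previous


def touch(path, when=None):
    """Set the modification timestamp of `path`, like the touch command."""
    when = time.time() if when is None else when
    # os.utime takes (access time, modification time) in seconds.
    os.utime(path, (when, when))
```

Under this model, a file whose timestamp has not moved is skipped no matter which options the new build uses, which matches the caveat above.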

Entry filed under: Computers/ICT, Glasgow-Travails, Projects, Research, WebXP.
