Welcome to the IBM OmniFind Yahoo! Edition Forum
Author Topic: API to add new file types/filters and display filters  (Read 2814 times)
wadester
Newbie
*
Posts: 12


« on: December 15, 2006, 09:53:14 AM »

A simple API to add new file types would be very nice.  For example, if I wanted to add a legacy or proprietary binary file, or a multimedia file like MP3 (ID3 tags), a simple filter could be written to extract the text and provide it to the search engine (much like what Glimpse has provided for a very long time).
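
As a rough illustration of how small such a filter could be, here is a sketch in Python that pulls the text out of an ID3v1 tag (the fixed 128-byte block at the end of an MP3 file). The function name and the "field: value" output format are assumptions made for this example, not part of any OmniFind API, and ID3v2 tags are not handled.

```python
def extract_id3v1_text(data: bytes) -> str:
    """Return searchable text from an ID3v1 tag, or '' if the file has none.

    Hypothetical filter: takes the raw bytes of an MP3 file and returns
    plain text that an indexer could consume.
    """
    # An ID3v1 tag is the last 128 bytes of the file and starts with "TAG".
    if len(data) < 128 or data[-128:-125] != b"TAG":
        return ""
    tag = data[-128:]
    # Fixed layout: 3-byte marker, 30-byte title, 30-byte artist,
    # 30-byte album, 4-byte year, 30-byte comment, 1-byte genre.
    fields = {
        "title": tag[3:33],
        "artist": tag[33:63],
        "album": tag[63:93],
        "year": tag[93:97],
    }
    lines = []
    for name, raw in fields.items():
        # Fields are padded with NUL bytes (or spaces); drop the padding.
        text = raw.split(b"\x00", 1)[0].decode("latin-1").strip()
        if text:
            lines.append(f"{name}: {text}")
    return "\n".join(lines)
```

A wrapper script would read the file, call the function, and print the result for the indexer to pick up.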

Another example might be a code file (e.g., abc.c) that you wish to run through a filter prior to indexing (e.g., to keep only the comments), with a custom display filter for the results (e.g., to add colors to code, comments, strings, etc.).
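
To make the "only get comments" idea concrete, here is a minimal sketch of an indexing filter for C source, written in Python. The function name is hypothetical, and the scanner is deliberately simple: it skips string and character literals so that comment markers inside them are ignored, but it does not handle preprocessor tricks or line continuations.

```python
from typing import List


def extract_c_comments(source: str) -> List[str]:
    """Return the text of every // and /* */ comment in C source code."""
    comments = []
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if ch in "\"'":
            # Skip a string or character literal, honoring backslash escapes.
            quote = ch
            i += 1
            while i < n and source[i] != quote:
                i += 2 if source[i] == "\\" else 1
            i += 1
        elif source.startswith("//", i):
            # Line comment: runs to the end of the line (or of the file).
            end = source.find("\n", i)
            if end == -1:
                end = n
            comments.append(source[i + 2:end].strip())
            i = end
        elif source.startswith("/*", i):
            # Block comment: runs to the closing marker (or end of file).
            end = source.find("*/", i + 2)
            if end == -1:
                end = n
            comments.append(source[i + 2:end].strip())
            i = end + 2
        else:
            i += 1
    return comments
```

The display side (coloring code, comments and strings) would need a separate filter; this sketch only covers what gets handed to the indexer.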

1)  A contrib extension repository could be created and provided to users as-is
2)  A directory of extensions could be added
3)  An option on the manage system screen could be used to enable/disable extensions and maybe even change their priority

Outputs of the "file" command or equivalent could be used to trigger specific file filters:
    C/C++ -> parsec.pl
    java -> parse_java.pl
    mp3 -> extract_id3
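
The mapping above could be driven by a small dispatch table. The sketch below is in Python and assumes the filter names from the list; the keywords matched against the "file" output are guesses at typical descriptions (the exact wording varies between versions of "file"), and the table order stands in for the priority setting suggested in item 3.

```python
import subprocess
from typing import Optional

# Hypothetical table: (keyword expected in the `file` description, filter).
# Entries are checked in order, so earlier entries have higher priority.
FILTERS = [
    ("c++ source", "parsec.pl"),
    ("c source", "parsec.pl"),
    ("java source", "parse_java.pl"),
    ("id3", "extract_id3"),
    ("mpeg adts", "extract_id3"),
]


def describe(path: str) -> str:
    """Return the description printed by `file -b` for the given path."""
    result = subprocess.run(
        ["file", "-b", path], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def choose_filter(description: str) -> Optional[str]:
    """Pick the first filter whose keyword occurs in the description."""
    lowered = description.lower()
    for keyword, filter_name in FILTERS:
        if keyword in lowered:
            return filter_name
    return None  # no extension registered: fall back to default handling
```

For a file on disk, `choose_filter(describe(path))` would give the name of the extension to run, or `None` if nothing is registered for that type.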

This would open up a huge class of applications....
Jim Kovacs
Jr. Member
**
Posts: 77


« Reply #1 on: December 15, 2006, 12:54:42 PM »

A simple API to add new file types would be very nice. 
<snip>
This would open up a huge class of applications....

Rather than start a new thread:

a) Does OYE index metadata of files?

b) In particular, the EXIF (Exchangeable Image File Format) metadata of JPEG files?

If OYE can access that metadata, I would use it to help me find photos in the photo albums on my home PC (currently not running OYE). The Minolta DiMage Xt camera names the pictures PICT0001.jpg, etc. I organize the photos by creating folders ("Christmas 2006") and dumping the related pics in them. But let's say that I want to find all the pictures I took in July of 2005. Well, the pics may be spread out over a dozen or so folders (and I don't remember their names, of course). Yeah, I could use a shell command to list only *.jpg files with a given date, _but_ the OS date may not equal the date stored in the EXIF metadata! So what to do? Well, if the EXIF metadata were included in the index _and_ a query string like "*.jpg AND exif_date EQUAL 2005-07" were allowed, OYE would find exactly what I'm looking for. Voila!
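
To spell out what that query would have to do, here is a small Python sketch. It is not OYE functionality: the `exif_date` field, the record layout and the sample paths are all made up for illustration. The only thing taken from EXIF itself is the timestamp format, "YYYY:MM:DD HH:MM:SS", which is why a dedicated date comparison is needed rather than a plain text match on "2005-07".

```python
from datetime import datetime
from typing import Dict, List


def taken_in_month(exif_datetime: str, year: int, month: int) -> bool:
    """True if an EXIF timestamp falls in the given year and month."""
    try:
        taken = datetime.strptime(exif_datetime, "%Y:%m:%d %H:%M:%S")
    except ValueError:
        return False  # missing or malformed timestamp never matches
    return (taken.year, taken.month) == (year, month)


def search_jpegs(records: List[Dict[str, str]], year: int, month: int) -> List[str]:
    """Emulate '*.jpg AND exif_date EQUAL <year>-<month>' over indexed records."""
    return [
        r["url"]
        for r in records
        if r["url"].lower().endswith(".jpg")
        and taken_in_month(r.get("exif_date", ""), year, month)
    ]


# Hypothetical index entries; the folder names no longer matter.
index = [
    {"url": "Christmas 2006/PICT0001.jpg", "exif_date": "2006:12:25 09:14:02"},
    {"url": "Vacation/PICT0042.jpg", "exif_date": "2005:07:04 15:30:00"},
    {"url": "Misc/PICT0100.jpg", "exif_date": "2005:07:19 08:05:41"},
    {"url": "Misc/notes.txt"},
]
july_2005 = search_jpegs(index, 2005, 7)
```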

Update: Wait! I want more! "onMouseOver" with a pop-up thumbnail of the photo!
« Last Edit: December 15, 2006, 03:54:54 PM by Jim Kovacs »
andreas
Newbie
*
Posts: 34


« Reply #2 on: December 15, 2006, 01:45:56 PM »

This involves several questions:
  • Does the crawler include images? No, image types such as JPG, GIF and PNG are excluded from the crawl.
  • Do we extract metadata from binary files and make it searchable as fields? No, the only searchable fields are title, doctype, url and language.
  • Is search on dates supported? No, there is no specific date handling; dates are treated like all other text.
These are interesting feature requests, but they are not supported today.
Sean
Administrator
Sr. Member
*****
Posts: 384


Product Manager / Hockey Goalie


« Reply #3 on: December 18, 2006, 05:05:10 AM »

Jim,

We recommend you don't run IBM OmniFind Yahoo! Edition on your home PC to find your photos since it's not optimized for desktop search. If you have a server at home and want to index all the machines on your home network... then you'd be closer to the scenario we are building and optimizing the product for.

All that being said, there are some good requests about filters and display extensions. We've also considered onMouseOver preview of all search results, not just photos, as this feature appears to be all the rage in web search these days.

Thanks,
Sean
wadester
Newbie
*
Posts: 12


« Reply #4 on: December 18, 2006, 08:52:56 AM »

For a more detailed search of data on a corporate intranet, other metadata could be quite useful.  How many corporate sites have pictures on their servers (real estate companies, developers, etc.)?  How many corporate sites have audio and video files (for marketing, training, corporate announcements)?  What about saved e-mails on the server (searching Enron's files)?