Lucene Revolution Keynote – Marc Kellenstein

This week I’m back in SF and this time I’m attending the Lucene Revolution conference.  The conference kicked off with Marc Kellenstein emphatically saying, “It is easier to search than to browse.”  Ain’t that the truth.

Over the next few days I'll blog my notes from the sessions I attend at the conference.  I hope they provide some insight for others and reminders for me!

Keynote Notes

Google was the first to base spell checking on the terms in the indexed documents rather than just on a big dictionary.
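To make the idea concrete, here is a minimal sketch of an index-based spell suggester: the vocabulary is built from the documents themselves, and a misspelled query term is mapped to the closest indexed term by edit distance. The helper names (`build_vocabulary`, `suggest`), the `max_distance=2` cutoff and the tie-break by term frequency are assumptions for illustration, not how Google or Lucene actually implement it.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def build_vocabulary(docs):
    """Hypothetical helper: count lowercase whitespace tokens across the indexed docs."""
    return Counter(token for doc in docs for token in doc.lower().split())


def suggest(term, vocab, max_distance=2):
    """Return the closest indexed term, preferring more frequent terms on ties.

    Returns None when no indexed term is within max_distance edits.
    """
    candidates = [(levenshtein(term, t), -count, t) for t, count in vocab.items()]
    candidates = [c for c in candidates if c[0] <= max_distance]
    return min(candidates)[2] if candidates else None


docs = ["Lucene is a search library",
        "Solr serves Lucene over HTTP",
        "Lucene Revolution conference"]
vocab = build_vocabulary(docs)
print(suggest("lucine", vocab))  # lucene
print(suggest("solar", vocab))   # solr
```

Because suggestions come only from terms that actually occur in the index, a suggestion is guaranteed to match something, which a generic dictionary cannot promise.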

Recall is the percentage of relevant docs that are returned (if 50 relevant docs are available and only 25 are returned, recall is 50%).

Precision is the percentage of returned docs that are relevant (if 100 docs are returned and 25 of them are relevant, precision is 25%).

100% recall is easy (just return every document), but we are really striving for 100% precision too, which is a lot harder to do.
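The two definitions, and why perfect recall on its own is cheap, can be checked with a small Python sketch. Documents are represented by hypothetical integer IDs, and returning 0.0 for an empty denominator is a convention chosen here, not part of the definitions.

```python
def recall(relevant: set, returned: set) -> float:
    """Fraction of the relevant docs that were returned."""
    if not relevant:
        return 0.0  # convention for this sketch: undefined -> 0.0
    return len(relevant & returned) / len(relevant)


def precision(relevant: set, returned: set) -> float:
    """Fraction of the returned docs that are relevant."""
    if not returned:
        return 0.0  # convention for this sketch: undefined -> 0.0
    return len(relevant & returned) / len(returned)


# Recall example from the notes: 50 relevant docs exist, 25 of them are returned.
print(recall(relevant=set(range(50)), returned=set(range(25))))      # 0.5

# Precision example from the notes: 100 docs returned, 25 of them relevant.
print(precision(relevant=set(range(25)), returned=set(range(100))))  # 0.25

# Returning an entire (hypothetical) 10,000-doc corpus gives perfect recall
# but very poor precision: 25 relevant out of 10,000 returned.
corpus = set(range(10_000))
relevant = set(range(25))
print(recall(relevant, corpus))     # 1.0
print(precision(relevant, corpus))
```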

Getting good recall

  • Use spell checking and synonyms to match users' vocabulary
  • NLP
  • Normalize data
  • Collect, index and search all data
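Two of the recall techniques in the list, normalization and synonyms, can be sketched in a few lines of Python. The `SYNONYMS` table and the `normalize`, `expand_query` and `matches` helpers are hypothetical illustrations; this is not Lucene's or Solr's analysis chain, just the same idea in miniature: fold case and accents so surface variants match, then OR in synonyms so a query in the user's words also finds docs written in other words.

```python
import unicodedata

# Hypothetical synonym table; in practice it would reflect your users' vocabulary.
SYNONYMS = {
    "laptop": {"notebook"},
    "notebook": {"laptop"},
    "tv": {"television"},
    "television": {"tv"},
}


def normalize(token: str) -> str:
    """Case-fold and strip accents so 'Café' and 'cafe' become the same term."""
    decomposed = unicodedata.normalize("NFKD", token.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def expand_query(query: str) -> set:
    """Normalize each query term and add its synonyms (OR semantics)."""
    terms = set()
    for raw in query.split():
        term = normalize(raw)
        terms.add(term)
        terms |= SYNONYMS.get(term, set())
    return terms


def matches(doc: str, query: str) -> bool:
    """A doc matches if any normalized doc term is in the expanded query."""
    doc_terms = {normalize(t) for t in doc.split()}
    return bool(doc_terms & expand_query(query))


print(matches("Refurbished Notebook sale", "laptop"))  # True, via synonym
print(matches("Café menu", "cafe"))                   # True, via normalization
```

Both steps only ever add matches, which is why they raise recall; they can also pull in loosely related docs, which is the precision cost discussed next.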

Getting good precision

  • Queries are too short; have users rank terms and use machine learning
  • Implicit relevance feedback is available, but it doubles search execution; hardly anyone uses it, although it should be considered
  • Neither Watson nor Google Translate uses NLP; both rely on statistical analysis of huge data sets instead
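The notes don't say exactly which feedback scheme was meant. One automatic variant that plainly costs a second search is pseudo-relevance feedback (a related but different technique from click-based implicit feedback): run the query, take frequent terms from the top results, add them to the query, and search again. The sketch below assumes that scheme, with a toy term-count scorer and hypothetical parameters `top_n` and `extra_terms`; it is not Lucene's scoring.

```python
from collections import Counter


def search(docs, query_terms, k=3):
    """Toy ranking: score each doc by how often it contains the query terms."""
    scored = []
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        score = sum(tokens.count(t) for t in query_terms)
        if score > 0:
            scored.append((score, doc_id))
    scored.sort(key=lambda s: (-s[0], s[1]))  # best score first, then by ID
    return [doc_id for _, doc_id in scored[:k]]


def feedback_search(docs, query, top_n=2, extra_terms=2, k=3):
    """Pseudo-relevance feedback: two searches instead of one."""
    terms = query.lower().split()
    first_pass = search(docs, terms, k=top_n)  # search #1
    counts = Counter(
        t
        for doc_id in first_pass
        for t in docs[doc_id].lower().split()
        if t not in terms
    )
    expansion = [t for t, _ in counts.most_common(extra_terms)]
    return search(docs, terms + expansion, k=k)  # search #2 with expanded query


docs = {
    "d1": "solr faceting tutorial",
    "d2": "solr faceting and lucene",
    "d3": "lucene faceting guide",
    "d4": "cooking pasta",
}
print(search(docs, ["solr"]))           # plain search misses d3
print(feedback_search(docs, "solr"))    # expansion with 'faceting' finds d3
```

The doubled cost is visible directly: `feedback_search` calls `search` twice for every user query.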

Some history

  • Lucene was created by Doug Cutting and released by Apache in 2001; it gained wide acceptance by 2005
  • Solr was built in 2005 by Yonik Seeley for CNET, released by Apache in 2006, and provides Lucene capabilities over HTTP, with faceting
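"Lucene over HTTP with faceting" means a search plus per-field value counts is a single GET request. The sketch below only builds the request URL; `q`, `rows`, `wt`, `facet` and `facet.field` are standard Solr request parameters, while the host, the `products` core name and the `category`/`manufacturer` fields are hypothetical. Actually sending the request would need a running Solr instance (for example via `urllib.request.urlopen`), which this sketch does not do.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical Solr endpoint; 8983 is Solr's default port, "products" is made up.
SOLR_SELECT = "http://localhost:8983/solr/products/select"


def facet_query_url(query, facet_fields, rows=10):
    """Build a Solr select URL that asks for results plus facet counts."""
    params = [("q", query), ("rows", rows), ("wt", "json"), ("facet", "true")]
    # facet.field may be repeated, once per field to facet on.
    params += [("facet.field", field) for field in facet_fields]
    return f"{SOLR_SELECT}?{urlencode(params)}"


url = facet_query_url("ipod", ["category", "manufacturer"])
print(url)
print(parse_qs(urlparse(url).query)["facet.field"])
```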

Strengths:

  • Best segmented index (like Google)
  • Open Source
  • Great Community
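A "segmented index" means that new documents are written into small, immutable index segments that are searched together and periodically merged, rather than rewriting one big index on every update; Lucene works this way. The toy Python sketch below shows only that shape. The class names, the `buffer_size` flush trigger and the merge-everything policy are assumptions for illustration, and deletes, scoring and on-disk formats are left out entirely.

```python
from collections import defaultdict


class Segment:
    """An immutable mini inverted index: term -> frozenset of doc IDs."""

    def __init__(self, docs):
        postings = defaultdict(set)
        for doc_id, text in docs.items():
            for term in text.lower().split():
                postings[term].add(doc_id)
        self.postings = {t: frozenset(ids) for t, ids in postings.items()}
        self.docs = dict(docs)  # kept so segments can be merged later


class SegmentedIndex:
    """Hypothetical toy: buffer docs, flush them into segments, merge on demand."""

    def __init__(self, buffer_size=2):
        self.segments = []
        self.buffer = {}
        self.buffer_size = buffer_size

    def add(self, doc_id, text):
        self.buffer[doc_id] = text
        if len(self.buffer) >= self.buffer_size:
            self.flush()

    def flush(self):
        """Write buffered docs as a new segment; existing segments are untouched."""
        if self.buffer:
            self.segments.append(Segment(self.buffer))
            self.buffer = {}

    def search(self, term):
        """Union the hits from every flushed segment (buffered docs are not visible)."""
        hits = set()
        for seg in self.segments:
            hits |= seg.postings.get(term.lower(), frozenset())
        return hits

    def merge(self):
        """Replace all segments with one larger segment."""
        all_docs = {}
        for seg in self.segments:
            all_docs.update(seg.docs)
        self.segments = [Segment(all_docs)] if all_docs else []


idx = SegmentedIndex(buffer_size=2)
idx.add("d1", "lucene segments")
idx.add("d2", "solr http")       # buffer full: first segment written
idx.add("d3", "lucene merge")    # still buffered
idx.flush()                      # second segment written
print(len(idx.segments), sorted(idx.search("lucene")))
idx.merge()
print(len(idx.segments), sorted(idx.search("lucene")))
```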

The basic premise: use Lucene/Solr because it is the best and it's free.  It continues to innovate and has strong community support.
