Lucene Search Engine

  • Open-source search engine library used in Sitecore CMS to index and search website content
  • Builds its own indexes by scanning Sitecore items
  • Used by applications that require full-text indexing and searching
  • Well suited to local, single-site searching
  • Can be used to extend the indexing and search capabilities of a Sitecore site
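The core structure behind full-text indexing is an inverted index, which maps each term to the items that contain it. The Python sketch below is a toy illustration, not Lucene code. The item IDs and texts are hypothetical, tokenization is plain whitespace splitting, and the AND-style search is a simplification chosen for the sketch rather than Lucene's query logic.

```python
from collections import defaultdict

# Hypothetical "items": ID -> text. Real Sitecore items have many fields;
# here each item is reduced to one text string for illustration.
ITEMS = {
    "home": "Welcome to our home page",
    "about": "About our company and our team",
    "contact": "Contact our support team",
}


def build_index(items: dict[str, str]) -> dict[str, set[str]]:
    """Map each lower-cased term to the set of item IDs containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for item_id, text in items.items():
        for term in text.lower().split():
            index[term].add(item_id)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return IDs of items that contain every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    idx = build_index(ITEMS)
    print(sorted(search(idx, "our team")))
```

Looking up a term is a single dictionary access, which is why searching an inverted index stays fast as the number of items grows.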

Lucene Search Components

  • Tokenizers split text into chunks (tokens) and output them as a token stream
    • Different analyzers use different tokenizers
    • KeywordAnalyzer does not split the text at all; it treats the entire field value as a single token
    • StandardAnalyzer and most other analyzers use spaces and punctuation as split points
  • Stemmers reduce words to their base form (for example, "searching" to "search")
  • Stop-word filters remove common words from the token stream
    • Words such as "a", "an", "the", "is", etc.
    • Removing them reduces noise in the search results
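The components above can be chained into an analysis pipeline. This Python sketch is a toy illustration, not Lucene's implementation. `keyword_tokenize` mimics KeywordAnalyzer's single-token behavior, `simple_tokenize` splits on spaces and punctuation, the stop-word list is a small hypothetical sample, and `naive_stem` is a crude suffix stripper standing in for a real stemming algorithm such as Porter.

```python
import re

# Small hypothetical stop-word list; real analyzers ship larger lists.
STOP_WORDS = {"a", "an", "the", "is", "and", "of", "to"}


def keyword_tokenize(text: str) -> list[str]:
    """Keyword-style tokenizer: the whole field value is one token."""
    return [text]


def simple_tokenize(text: str) -> list[str]:
    """Split on runs of non-alphanumeric characters (spaces, punctuation)."""
    return [t for t in re.split(r"[^0-9A-Za-z]+", text) if t]


def remove_stop_words(tokens: list[str]) -> list[str]:
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def naive_stem(token: str) -> str:
    """Crude suffix stripping; NOT the Porter algorithm real stemmers use."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text: str) -> list[str]:
    """Tokenize, lower-case, remove stop words, then stem each token."""
    tokens = [t.lower() for t in simple_tokenize(text)]
    return [naive_stem(t) for t in remove_stop_words(tokens)]


if __name__ == "__main__":
    print(keyword_tokenize("New York City"))
    print(analyze("The engine is indexing pages and searching sites."))
```

Because the same analysis is normally applied to both indexed content and queries, a search for "searching" and an item containing "search" end up with the same token.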

Lucene Search Configuration

  • Sitecore 5.x Lucene configuration is stored under /sitecore/indexes in web.config
  • Sitecore 6.x Lucene configuration is stored under /sitecore/search in web.config
    • Analyzer section defines the Lucene analyzer used to analyze content for indexing
    • Categories section defines categories (groupings) for search results
    • Configuration section stores index definitions in /sitecore/search/configuration/indexes
    • Location section specifies the location of the physical index files
    • Include section specifies the item types (templates) that should be included in or excluded from the index
    • Tags section defines tags that can be attached to indexed content and used during a search
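One way to picture the Include section is as a filter applied to each item before it is indexed. In this Python sketch the item paths and template names are hypothetical. The precedence rule (an excluded template always loses; an empty include set admits every template) is an assumption for illustration, not documented Sitecore crawler behavior.

```python
from dataclasses import dataclass


@dataclass
class Item:
    path: str
    template: str  # template name; real configuration may reference template IDs


def should_index(item: Item, include: set[str], exclude: set[str]) -> bool:
    """Assumed rule: excluded templates always lose; empty include set means 'all'."""
    if item.template in exclude:
        return False
    return not include or item.template in include


if __name__ == "__main__":
    # Hypothetical content tree
    items = [
        Item("/sitecore/content/Home", "Sample Item"),
        Item("/sitecore/content/Home/News", "News Article"),
        Item("/sitecore/content/Home/Settings", "Folder"),
    ]
    include = {"Sample Item", "News Article"}
    exclude = {"Folder"}
    print([i.path for i in items if should_index(i, include, exclude)])
```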

Sitecore Lucene Search

  • A Sitecore shared-source module that is easy to implement
  • Provides a simple way to add Lucene search indexing to a Sitecore site
  • Requires little or no developer knowledge and minimal configuration
  • The Lucene.Search module supports the newer Sitecore.Search API

Guidelines for Sitecore Lucene Search

  • Indexing of the web database is not set up by default
  • Avoid adding unnecessary data to the indexes, because it consumes additional disk space
  • Updating indexes adds load on the CPU

Example: the system index definition under /sitecore/indexes in web.config (Sitecore 5.x style)

<configuration>
  <sitecore>
    <indexes>
      <index id="system" singleInstance="true" type="Sitecore.Data.Indexing.Index, Sitecore.Kernel">
        <param desc="name">$(id)</param>
        <fields hint="raw:AddField">
          <field target="created">__created</field>
          <field target="updated">__updated</field>
          <field target="author">__updated by</field>
          <field target="published">__published</field>
          <field target="name">@name</field>
          <field storage="unstored">@name</field>
          <field target="template" storage="keyword">@tid</field>
          <field target="id" storage="unstored">@id</field>
          <type storage="unstored">memo</type>
          <type storage="unstored">text</type>
          <type storage="unstored" stripTags="true">html</type>
          <type storage="unstored" stripTags="true">rich text</type>
        </fields>
      </index>
    </indexes>
  </sitecore>
</configuration>
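To see what the `<fields>` list in the system index definition above declares, this Python sketch reads a trimmed copy of it with the standard `xml.etree.ElementTree` module and lists each entry's source, target and storage attributes. Reading the `@`-prefixed sources as item properties (name, template ID, ID) and `<type>` entries as field types is an inference from the names, not something stated in the configuration.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Trimmed copy of the system index definition (only a few entries kept).
CONFIG = """
<index id="system" singleInstance="true" type="Sitecore.Data.Indexing.Index, Sitecore.Kernel">
  <param desc="name">$(id)</param>
  <fields hint="raw:AddField">
    <field target="created">__created</field>
    <field target="name">@name</field>
    <field storage="unstored">@name</field>
    <field target="template" storage="keyword">@tid</field>
    <type storage="unstored" stripTags="true">rich text</type>
  </fields>
</index>
"""

# (element tag, source, target attribute, storage attribute)
Row = tuple[str, str, Optional[str], Optional[str]]


def field_mappings(xml_text: str) -> list[Row]:
    """Return one row per child element of <fields>; missing attributes are None."""
    root = ET.fromstring(xml_text.strip())
    fields = root.find("fields")
    if fields is None:
        return []
    return [
        (el.tag, (el.text or "").strip(), el.get("target"), el.get("storage"))
        for el in fields
    ]


if __name__ == "__main__":
    for tag, source, target, storage in field_mappings(CONFIG):
        print(f"{tag:5} {source:10} target={target} storage={storage}")
```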
