Lucene 3.0: A really concise, functional index/search example

Why another Lucene tutorial? One problem I have with many of the Lucene tutorials out there is that they initially introduce the API using examples cluttered with lots of extraneous code that obscures the quite simple, basic API itself.

For example, in “Lucene in Action” the initial indexer example devotes quite a bit of code to traversing a directory tree and determining if files are indexable. As an aging Unix geek I prefer to let find, ls, and xargs take care of that chore and just pipe a filename to a concise module, especially when the objective of the module is just to illustrate a particular point.

Oh, and over the years the Lucene developers can’t seem to resist the temptation to twiddle with things like important constant names even though the constant’s purpose hasn’t changed, breaking code and tutorials.

So here’s a simple, up-to-date example, compiled with lucene-core-3.0.2.jar:


/*
IndexFile: A simple indexer example for Lucene 3.0
Author: John Reece

To create an index or add a single text file to it:
	java IndexFile <index_dir> <text_file>

If <index_dir> doesn't exist it will be created; otherwise it will
be updated.
*/
import java.io.*;
import java.util.*;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.Version;

public class IndexFile {
    public static void main(String[] args) throws Exception {
        Directory dir = FSDirectory.open(new File(args[0]));
        // The Version.LUCENE_XX constant is a required constructor
        // argument in Lucene 3.
        Analyzer analysis = new StandardAnalyzer(Version.LUCENE_30);
        // This IndexWriter constructor opens the index for appending
        // if the index directory already exists; otherwise it creates
        // a new index directory.
        IndexWriter idx = new IndexWriter(dir, analysis,
                                          IndexWriter.MaxFieldLength.UNLIMITED);
        File f = new File(args[1]);
        Document doc = new Document();
        // Fields you want to display in toto in search results
        // need to be stored with Field.Store.YES. Indexing them with
        // NOT_ANALYZED keeps each value as a single, untokenized
        // term. The NOT_ANALYZED constant has replaced UN_TOKENIZED
        // from previous versions.
        doc.add(new Field("name", f.getName(), Field.Store.YES,
                          Field.Index.NOT_ANALYZED));
        doc.add(new Field("path", f.getCanonicalPath(), Field.Store.YES,
                          Field.Index.NOT_ANALYZED));
        // A Field built from a Reader is tokenized and indexed,
        // but its text is not stored in the index.
        doc.add(new Field("contents", new FileReader(f)));
        idx.addDocument(doc);
        idx.close();
    } // main
} // IndexFile
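
IndexFile handles one file per invocation, which is fine with xargs but does start a JVM for every file. If you would rather pipe a whole list of names into a single run, the stdin side might look roughly like the sketch below. StdinPaths and readableFiles are names I made up for illustration, and nothing here touches Lucene; the loop in main only marks where the Document-building code from IndexFile would go.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: reads file names, one per line.
class StdinPaths {
    // Return every input line that names a readable regular file.
    static List<File> readableFiles(BufferedReader in) throws IOException {
        List<File> files = new ArrayList<File>();
        String line;
        while ((line = in.readLine()) != null) {
            if (line.isEmpty()) {
                continue; // ignore blank lines
            }
            File f = new File(line);
            if (f.isFile() && f.canRead()) {
                files.add(f);
            } else {
                System.err.println("skipping " + line);
            }
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in =
            new BufferedReader(new InputStreamReader(System.in));
        for (File f : readableFiles(in)) {
            // In a real indexer, this is where you would build a
            // Document for f and hand it to idx.addDocument(doc).
            System.out.println(f.getCanonicalPath());
        }
    }
}
```

Something like find docs -name '*.txt' | java StdinPaths would then print the canonical path of each file it would index (docs is just a placeholder directory).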

Now for the corresponding search utility:



/*
Search: A simple, functional search example for Lucene 3.0
Author: John Reece
Usage: java Search <index_dir> <query>
*/
import java.io.*;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.Version;

public class Search {
	public static void main (String [] args) throws Exception {
		Directory dir = FSDirectory.open(new File(args[0]));
		IndexSearcher srch = new IndexSearcher(dir);
		// The Version constant is a new, required argument as of 3.0
		Analyzer std = new StandardAnalyzer(Version.LUCENE_30);
		QueryParser parser = new QueryParser(Version.LUCENE_30,
				"contents",
				std);
		Query query = parser.parse(args[1]);
		TopDocs hits = srch.search(query,3);
		System.out.printf("Found %d hits for <%s>.\n", hits.totalHits, args[1]);
		for (ScoreDoc scoreDoc : hits.scoreDocs) {
			Document doc = srch.doc(scoreDoc.doc);
			System.out.printf("%5.3f %s\n", scoreDoc.score, doc.get("path"));
		}
		srch.close();
	}
} // Search
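
Note that Search prints doc.get("path") and never the contents. That follows from how the indexer built its fields: name and path were stored, while the contents field was built from a Reader, which is indexed for searching but not stored. The toy model below is only meant to make that split concrete. ToyIndex is entirely hypothetical and is not how Lucene is implemented; as a simplification it stores the path without indexing it, and its "analyzer" is a crude lowercase split on non-word characters.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy illustration only; Lucene's real data structures are very different.
class ToyIndex {
    // term -> ids of documents containing it (what "indexed" gives you)
    private final Map<String, Set<Integer>> postings =
        new HashMap<String, Set<Integer>>();
    // doc id -> stored field values (what "stored" gives you)
    private final List<Map<String, String>> stored =
        new ArrayList<Map<String, String>>();

    int add(String path, String contents) {
        int id = stored.size();
        Map<String, String> fields = new HashMap<String, String>();
        fields.put("path", path); // stored verbatim for display
        stored.add(fields);
        // contents is split into terms and indexed, but never stored
        for (String term : contents.toLowerCase().split("\\W+")) {
            if (term.isEmpty()) {
                continue;
            }
            Set<Integer> ids = postings.get(term);
            if (ids == null) {
                ids = new TreeSet<Integer>();
                postings.put(term, ids);
            }
            ids.add(id);
        }
        return id;
    }

    // Ids of documents whose contents contained the term.
    Set<Integer> search(String term) {
        Set<Integer> ids = postings.get(term.toLowerCase());
        return ids == null ? Collections.<Integer>emptySet() : ids;
    }

    // Stored value of a field, or null if the field was not stored.
    String get(int id, String field) {
        return stored.get(id).get(field);
    }

    public static void main(String[] args) {
        ToyIndex idx = new ToyIndex();
        idx.add("/tmp/a.txt", "Lucene is a search library");
        idx.add("/tmp/b.txt", "find and xargs do the traversal");
        for (int id : idx.search("lucene")) {
            // prints: /tmp/a.txt contents=null
            System.out.println(idx.get(id, "path")
                + " contents=" + idx.get(id, "contents"));
        }
    }
}
```

If you want to show the matching text in your results, the usual options are to store the contents as a String-valued field with Field.Store.YES, or to re-read the file from the stored path.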