Java API access to Lucene queries

Thomas Hallgren
I'm working on a plug-in for Nexus that uses an artifact type that needs to store some additional data in the Lucene index. I've been doing this with a custom IndexCreator and the 'attributes' map of the ArtifactInfo. This works fine for getting things into the index, but I've found it inconvenient for queries.

My main concern is that there seems to be no way to sort until all ArtifactInfo instances have been retrieved. In my case this results in very large memory consumption, since each ArtifactInfo is fairly big and each instance also preallocates both a HashMap and an ArrayList. My use case must support a large number of concurrent queries over a very large number of artifacts, each producing a limited but sorted result.

Typically, when working with a Lucene index, the code passes a sort option and retrieves only a memory-conservative array of ScoreDoc elements, limited to some maximum number. This is exactly what I want. I found the DefaultIndexerManager.shared() method, which seems to give me more or less that. It hands me a locked IndexingContext that I can use to safely execute a Lucene query:

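// The cast to the implementation class is needed because shared() is not on the IndexerManager interface.
((DefaultIndexerManager) indexerManager).shared(repository, new DefaultIndexerManager.Runnable() {
    @Override
    public void run(IndexingContext ctx) throws IOException {
        // ctx is locked for the duration of this callback
        final IndexSearcher indexSearcher = ctx.acquireIndexSearcher();
        try {
            // Sorted search, limited to at most 'max' hits
            TopFieldDocs docs = indexSearcher.search(query, max, sort);
            ...
        }
        finally {
            // Always hand the searcher back to the context
            ctx.releaseIndexSearcher(indexSearcher);
        }
    }
});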
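
To illustrate why a sorted search limited to a maximum number of hits matters for memory, here is a minimal sketch in plain Java. It is not Lucene or Nexus code: BoundedTopN and topN are hypothetical names, and the bounded heap is only an analogy for the general technique of never holding more than 'max' candidates at once, as opposed to materializing every result and sorting afterwards.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper for illustration only; not part of Lucene or Nexus.
final class BoundedTopN {
    private BoundedTopN() {}

    /**
     * Returns the first 'max' elements of 'items' according to 'order',
     * keeping at most 'max' elements in memory at any time.
     */
    static <T> List<T> topN(Iterable<T> items, int max, Comparator<T> order) {
        if (max <= 0) {
            return new ArrayList<>();
        }
        // Reversed ordering puts the worst retained element at the head,
        // so it can be evicted cheaply when a better candidate arrives.
        PriorityQueue<T> heap = new PriorityQueue<>(max, order.reversed());
        for (T item : items) {
            if (heap.size() < max) {
                heap.add(item);
            } else if (order.compare(item, heap.peek()) < 0) {
                heap.poll();
                heap.add(item);
            }
        }
        // Only the retained elements are sorted, never the full input.
        List<T> result = new ArrayList<>(heap);
        result.sort(order);
        return result;
    }
}
```

With this approach the memory cost depends on 'max' rather than on the number of matching artifacts, which is the property I'm after for many concurrent queries.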

The only drawback I see with this solution is that, by using this IndexerManager implementation explicitly, I'm hacking into the Nexus code domain in a bad way. I would feel much happier if the shared() method and the associated Runnable interface were actually part of the IndexerManager interface. This leads me to two questions:

1. Why is this mechanism hidden in the implementation?
2. If I create a pull request to make it public API, is there a chance that it would be accepted?

Thanks,
Thomas Hallgren