Apache Lucene query exact match

5/6/2023

LUCENE-3233: fast SynonymFilter using an FST, including an optimization to the FST representation that allows array arcs even when some arcs have large outputs. This resulted in a good speedup for MemoryCodec, which also speeds up primary-key lookups.

Switched to MemoryCodec for the primary-key 'id' field so that lookups (either for the PKLookup test or for deletions during reopen in the NRT test) are fast, with no IO. Also switched to NRTCachingDirectory for the NRT test, so that small new segments are written only in RAM. See this post for details.

Added TermQuery, sorted by the date/time and title fields.

Added TermQuery, grouped by fields with 100, 10K, and 1M unique values.

Added Term (bgroup) and Term (bgroup, 1pass), which use the BlockGroupingCollector to group into 1M unique groups.

Increased the number of indexing threads from 6 to 20 and reduced the IndexWriter RAM buffer from 512 MB to 350 MB.

Changed how I build the index used for searching, to use only one thread. This produces exactly the same index structure (same segments, same docs per segment) from night to night, avoiding the added noise from change B.
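
The SynonymFilter entry is about matching multi-word synonym rules against a token stream by walking an automaton one token at a time. The following is a minimal sketch of that matching loop only, under two assumptions: a plain hash-map trie stands in for Lucene's FST (a real FST also shares common suffixes and can store a node's arcs as a fixed-size array for fast lookup, which is what the array-arc optimization concerns), and matched phrases are simply replaced rather than stacked at the same position as the original tokens. `SynonymTrieSketch`, `add`, and `rewrite` are hypothetical names, not Lucene's API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: greedy longest-match, multi-token synonym rewriting.
// Uses a plain hash-map trie keyed by whole tokens; Lucene's SynonymFilter uses an FST.
public class SynonymTrieSketch {

    // One node per token prefix; output is non-null when a rule ends at this node.
    static final class Node {
        final Map<String, Node> children = new HashMap<>();
        List<String> output;
    }

    private final Node root = new Node();

    // Registers a rule such as "new york" -> [ny]; tokens are separated by single spaces.
    public void add(String phrase, List<String> synonyms) {
        Node node = root;
        for (String token : phrase.split(" ")) {
            node = node.children.computeIfAbsent(token, k -> new Node());
        }
        node.output = synonyms;
    }

    // Replaces the longest rule that matches at each position; other tokens pass through.
    public List<String> rewrite(List<String> tokens) {
        List<String> result = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            Node node = root;
            List<String> bestOutput = null;
            int bestEnd = i;
            // Follow one trie arc per input token, remembering the longest complete rule.
            for (int j = i; j < tokens.size(); j++) {
                node = node.children.get(tokens.get(j));
                if (node == null) {
                    break;
                }
                if (node.output != null) {
                    bestOutput = node.output;
                    bestEnd = j + 1;
                }
            }
            if (bestOutput != null) {
                result.addAll(bestOutput);
                i = bestEnd;
            } else {
                result.add(tokens.get(i));
                i++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        SynonymTrieSketch syn = new SynonymTrieSketch();
        syn.add("new york", List.of("ny"));
        syn.add("new york city", List.of("nyc"));
        System.out.println(syn.rewrite(List.of("visit", "new", "york", "city", "now")));
        // prints [visit, nyc, now]
    }
}
```

Preferring the longest match means "new york city" wins over the shorter "new york" rule when both apply at the same position.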
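
Keeping the whole `id` terms dictionary in RAM is what makes the primary-key lookups in the MemoryCodec entry free of IO. The sketch below illustrates only that idea with the standard library, assuming a sorted array of unique keys with a parallel array of doc ids, searched with `Arrays.binarySearch`. MemoryCodec itself holds terms and postings in an in-memory FST, and `InMemoryPkIndex` and `lookup` are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical sketch: an in-RAM primary-key index mapping unique ids to doc ids.
public class InMemoryPkIndex {
    private final String[] keys;   // sorted unique primary keys
    private final int[] docIds;    // docIds[i] is the document holding keys[i]

    // ids[d] is the primary key of document d; keys are assumed to be unique.
    public InMemoryPkIndex(String[] ids) {
        Integer[] order = new Integer[ids.length];
        for (int i = 0; i < ids.length; i++) {
            order[i] = i;
        }
        // Sort document numbers by their key so the keys can be binary-searched.
        Arrays.sort(order, (a, b) -> ids[a].compareTo(ids[b]));
        keys = new String[ids.length];
        docIds = new int[ids.length];
        for (int i = 0; i < order.length; i++) {
            keys[i] = ids[order[i]];
            docIds[i] = order[i];
        }
    }

    // Returns the doc id for the key, or -1 if absent: O(log n) comparisons, no IO.
    public int lookup(String key) {
        int pos = Arrays.binarySearch(keys, key);
        return pos >= 0 ? docIds[pos] : -1;
    }

    public static void main(String[] args) {
        InMemoryPkIndex pk = new InMemoryPkIndex(new String[] {"id-42", "id-07", "id-19"});
        System.out.println(pk.lookup("id-19")); // prints 2
        System.out.println(pk.lookup("id-99")); // prints -1
    }
}
```

In the NRT test, each delete by `id` likewise has to resolve a key to a document, so the cost of this lookup is paid on every reopen that carries buffered deletes.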
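
The bgroup entries rely on all documents of a group being indexed together as one contiguous block, which is what lets BlockGroupingCollector group in a single pass. A simplified sketch of why that precondition helps, assuming per-document scores are already computed and that the last document of each block is flagged in a boolean array (the real collector is given a filter identifying those last documents). Only each group's best score is kept here, whereas the real collector also gathers the top documents inside each group; `BlockGroupingSketch`, `Group`, and `topGroups` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of single-pass "block" grouping: each group's documents
// were indexed as one contiguous block, so a group is complete as soon as its
// last document has been seen. Not Lucene's BlockGroupingCollector API.
public class BlockGroupingSketch {

    // A finished group, identified by the doc id where its block starts.
    static final class Group {
        final int firstDoc;
        final double topScore;

        Group(int firstDoc, double topScore) {
            this.firstDoc = firstDoc;
            this.topScore = topScore;
        }
    }

    // scores[d] is doc d's score (NaN = doc did not match the query);
    // groupEnd[d] is true when d is the last doc of its block.
    static List<Group> topGroups(double[] scores, boolean[] groupEnd, int topN) {
        // Min-heap on the group's best score, so the weakest kept group is evicted first.
        PriorityQueue<Group> heap =
                new PriorityQueue<Group>(Comparator.comparingDouble((Group g) -> g.topScore));
        int blockStart = 0;
        double best = Double.NaN; // best score seen so far in the current block
        for (int doc = 0; doc < scores.length; doc++) {
            if (!Double.isNaN(scores[doc]) && (Double.isNaN(best) || scores[doc] > best)) {
                best = scores[doc];
            }
            if (groupEnd[doc]) {
                // Block finished: the group can be finalized now, with no second pass.
                if (!Double.isNaN(best)) {
                    heap.add(new Group(blockStart, best));
                    if (heap.size() > topN) {
                        heap.poll();
                    }
                }
                blockStart = doc + 1;
                best = Double.NaN;
            }
        }
        List<Group> result = new ArrayList<>(heap);
        result.sort(Comparator.comparingDouble((Group g) -> g.topScore).reversed());
        return result;
    }

    public static void main(String[] args) {
        double nan = Double.NaN;
        double[] scores = {1.0, 3.0, nan, 2.0, nan, 0.5};
        boolean[] groupEnd = {false, true, false, false, true, true};
        for (Group g : topGroups(scores, groupEnd, 2)) {
            System.out.println("group@" + g.firstDoc + " top=" + g.topScore);
        }
        // prints group@0 top=3.0, then group@2 top=2.0
    }
}
```

Without the block layout, documents of one group could appear anywhere in the index, so a collector could not know a group was finished until the whole index had been scanned.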

Switched from a traditional spinning-magnets hard drive (Western Digital Caviar Green, 1 TB) to a 240 GB OCZ Vertex III SSD. This change gave a small increase in indexing rate, drastically reduced variance in the NRT reopen time (NRT is IO intensive), and did not affect query performance (as expected, since the postings are small enough to fit into the OS's IO cache).

Concurrent flushing, a major improvement to Lucene, was committed. On highly concurrent hardware (the machine running these tests has 24 cores) this can result in a tremendous increase in Lucene's indexing throughput. Before this change, flushing a segment in IndexWriter was single-threaded and blocked all other indexing threads; after this change, each indexing thread flushes its own segment without blocking indexing in the other threads.

Unfortunately, the index produced by concurrent flushing varies from night to night in how many segments it contains, which is a further source of noise in the search results. Some queries did get slower, because the index now has more segments.
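
The before/after contrast in the concurrent-flushing entry can be sketched with plain Java threads. This is a simplified sketch, assuming each indexing thread owns a private buffer (loosely in the spirit of Lucene's per-thread document writers) and publishes it as its own "segment" once a hypothetical `maxBufferedDocs` limit is reached, without taking any global lock. A "segment" here is just a list of document strings, and `PerThreadFlushSketch` and `run` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: every indexing thread flushes its own segment independently.
public class PerThreadFlushSketch {

    // Flushed segments; a concurrent queue lets threads publish without blocking each other.
    private final ConcurrentLinkedQueue<List<String>> segments = new ConcurrentLinkedQueue<>();
    private final int maxBufferedDocs;

    PerThreadFlushSketch(int maxBufferedDocs) {
        this.maxBufferedDocs = maxBufferedDocs;
    }

    // Body of one indexing thread: buffer docs privately, flush its own segment when full.
    void indexDocs(List<String> docs) {
        List<String> buffer = new ArrayList<>(); // private to this thread, so no locking
        for (String doc : docs) {
            buffer.add(doc);
            if (buffer.size() >= maxBufferedDocs) {
                segments.add(buffer);       // "flush": publish this thread's segment
                buffer = new ArrayList<>(); // keep indexing into a fresh buffer
            }
        }
        if (!buffer.isEmpty()) {
            segments.add(buffer); // final flush of the partially filled buffer
        }
    }

    // Runs one indexing task per thread and returns all flushed segments.
    static List<List<String>> run(int threads, int docsPerThread, int maxBufferedDocs) {
        PerThreadFlushSketch writer = new PerThreadFlushSketch(maxBufferedDocs);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            List<String> docs = new ArrayList<>();
            for (int d = 0; d < docsPerThread; d++) {
                docs.add("t" + t + "-doc" + d);
            }
            pool.submit(() -> writer.indexDocs(docs));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(writer.segments);
    }

    public static void main(String[] args) {
        List<List<String>> segs = run(4, 10, 4);
        int total = segs.stream().mapToInt(List::size).sum();
        System.out.println(segs.size() + " segments, " + total + " docs");
        // prints 12 segments, 40 docs
    }
}
```

Because this sketch flushes on a fixed per-thread document count, `run(4, 10, 4)` always yields 12 segments holding 40 docs. Lucene instead triggers flushes from the RAM used across all indexing threads, and which documents land in which thread's buffer depends on scheduling, which is one plausible reason the segment count varies from night to night.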
