14 Oct 13:02

javanna

eadc07c

10.0.0 Latest

System requirements

Lucene 10.0 requires JDK 21 or newer.

API changes

KNN vector values now have a random-access API.
Deprecated APIs have been removed and a number of API changes have been made. Please consult the migration guide for an extensive list of changes and the actions to take to migrate to 10.0.

New Features

A new IndexInput#prefetch API has been added, allowing query evaluation logic to let the Directory know about regions of data that are about to be read. This helps perform I/O concurrently under the hood. MMapDirectory implements this API using the madvise system call and the MADV_WILLNEED flag on Linux and Mac OS.
Lucene now supports sparse indexing on doc values via FieldType#setDocValuesSkipIndexType. The sparse index will record the minimum and maximum values per block of doc IDs. Used in conjunction with index sorting to cluster similar documents together, this allows for very space-efficient and CPU-efficient filtering.
Search concurrency is now decoupled from the index geometry, so that an index can be searched using any number of threads, regardless of its number of segments.
K-means clustering on vectors has been added.
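
To illustrate why a per-block minimum and maximum makes filtering cheap, here is a minimal sketch. It is not Lucene's implementation or on-disk format: the class name, the method names, and the block size are all assumptions made for the example. A range filter checks each block's [min, max] summary first and reads the individual values only for blocks that can contain a match.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not Lucene's API or file format.
final class MinMaxSkipIndexSketch {
    private final long[] values;   // one value per doc ID
    private final int blockSize;   // assumed block size, picked by the caller
    private final long[] blockMin; // smallest value in each block
    private final long[] blockMax; // largest value in each block
    int blocksVisited;             // blocks whose values were read by the last query

    MinMaxSkipIndexSketch(long[] values, int blockSize) {
        this.values = values;
        this.blockSize = blockSize;
        int blocks = (values.length + blockSize - 1) / blockSize;
        blockMin = new long[blocks];
        blockMax = new long[blocks];
        for (int b = 0; b < blocks; b++) {
            long min = Long.MAX_VALUE;
            long max = Long.MIN_VALUE;
            int end = Math.min(values.length, (b + 1) * blockSize);
            for (int doc = b * blockSize; doc < end; doc++) {
                min = Math.min(min, values[doc]);
                max = Math.max(max, values[doc]);
            }
            blockMin[b] = min;
            blockMax[b] = max;
        }
    }

    // Returns the doc IDs whose value lies in [lower, upper].
    List<Integer> docsInRange(long lower, long upper) {
        blocksVisited = 0;
        List<Integer> hits = new ArrayList<>();
        for (int b = 0; b < blockMin.length; b++) {
            if (blockMax[b] < lower || blockMin[b] > upper) {
                continue; // the block summary rules out any match: skip it
            }
            blocksVisited++;
            int end = Math.min(values.length, (b + 1) * blockSize);
            for (int doc = b * blockSize; doc < end; doc++) {
                if (values[doc] >= lower && values[doc] <= upper) {
                    hits.add(doc);
                }
            }
        }
        return hits;
    }
}
```

When the values are clustered, as they would be with index sorting on the same field, most blocks have a narrow range and a selective filter touches only a few of them.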

Improvements

Lucene now opens files with the MADV_RANDOM advice by default on Linux and Mac OS. This results in better efficiency for indexes that exceed the size of the page cache, but can make it slower to load indexes in the page cache. It is possible to revert to the MADV_NORMAL read advice by default by passing -Dorg.apache.lucene.store.defaultReadAdvice=NORMAL as a JVM startup flag.
Snowball dictionaries have been upgraded, resulting in improved tokenization. This may require reindexing to ensure consistency of search results with pre-10.0 indexes.
The expressions module now uses MethodHandles and Dynamic Class-File Constants (JEP 309) in combination with hidden classes (JEP 371) to implement strict, type-safe calls to external functions. This makes it easier to extend expressions with custom functions in a secure way, because runtime linking of custom functions is no longer the responsibility of the expressions scripting engine. In addition, the hidden classes created by the expressions engine no longer suffer from global classloader locks.
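
As background for the expressions change, the sketch below shows the plain JDK mechanism involved: the caller resolves a custom function to a `MethodHandle` itself, so the access check and linking happen in the caller's code. It uses only `java.lang.invoke`; the class name and the `halve` function are hypothetical, and how the resulting map is handed to the expressions module is deliberately not shown (see the migration guide for that).

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.Map;

// Hypothetical illustration of caller-side linking with MethodHandles.
final class CustomFunctionHandleSketch {
    // A made-up custom function that an expression might want to call.
    public static double halve(double x) {
        return x / 2.0;
    }

    // Resolves the function with the caller's own Lookup and exposes it by name.
    static Map<String, MethodHandle> customFunctions() {
        try {
            MethodHandle halve = MethodHandles.lookup().findStatic(
                CustomFunctionHandleSketch.class,
                "halve",
                MethodType.methodType(double.class, double.class));
            return Map.of("halve", halve);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not link custom function", e);
        }
    }

    // Invokes a registered function; invokeExact requires the exact (double)double type.
    static double call(String name, double arg) {
        try {
            return (double) customFunctions().get(name).invokeExact(arg);
        } catch (Throwable t) {
            throw new IllegalStateException("custom function failed", t);
        }
    }
}
```

Because the handle carries its exact type, a mismatch in arguments or return type fails at link or call time rather than producing a silently wrong result.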

... plus a multitude of helpful bug fixes!

28 Sep 20:19

ChrisHegarty

releases/lucene/9.12.0

e913796

9.12.0

Security Fixes

Deserialization of Untrusted Data vulnerability in Apache Lucene Replicator - CVE-2024-45772

New Features

Improve intra-merge parallelism for many value types. (Ben Trent)
Add support for JDK 23 to the Panama Vectorization Provider. (Chris Hegarty)

Improvements

Add Intervals.regexp and Intervals.range methods to produce IntervalsSource for regexp and range queries. (Mayya Sharipova)
Remove support for writing 8-bit scalar vector quantization. 4-bit and 7-bit quantization are still supported. (Michael McCandless)

Optimizations

Inline postings skip data to improve performance of queries that need skipping such as conjunctions. (Adrien Grand)
Optimizations to the decoding logic of blocks of postings. (Adrien Grand, Uwe Schindler, Greg Miller)
Avoid performance degradation when closing shared mapped segment data. (Chris Hegarty, Michael Gibney, Uwe Schindler)

... plus a multitude of helpful bug fixes!

27 Jun 13:46

iverase

releases/lucene/9.11.1

0c087df

9.11.1

Bug Fixes

Fix performance regression in NumericComparator.
Remove intra-merge parallelism for everything except HNSW graph merges.
Fix bug that prevented adding a parent field to an index with no fields.
Fix IndexOutOfBoundsException thrown in DefaultPassageFormatter by unordered matches.
StringValueFacetCounts stops throwing NPE when faceting over an empty match-set.

06 Jun 14:29

benwtrent

releases/lucene/9.11.0

d433394

9.11.0

New Features

Add support for posix_madvise to MMapDirectory: If running on Linux/macOS and Java 21 or later, MMapDirectory uses IOContext to pass suitable MADV flags to kernel of operating system. This may improve paging logic especially when working with large indexes under memory pressure.
Expand support for new scalar bit levels for HNSW vectors. This includes 4-bit vectors and an option to compress them to gain a 50% reduction in memory usage.
Recursive graph bisection is now supported on indexes that have blocks.
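
The 50% figure for compressed 4-bit vectors follows from simple arithmetic: a 4-bit value stored in its own byte wastes half the byte, so packing two values per byte halves the storage. The sketch below shows one way to do that. It is an assumption-laden illustration, not Lucene's actual layout: the class name, the method names, and the choice to pair adjacent dimensions are all made up for the example.

```java
// Hypothetical illustration only: Lucene's real packing layout may differ.
final class NibblePackingSketch {
    // Packs values in the range 0..15 two per byte (even index: low nibble, odd index: high nibble).
    static byte[] pack(byte[] quantized) {
        byte[] packed = new byte[(quantized.length + 1) / 2];
        for (int i = 0; i < quantized.length; i++) {
            int v = quantized[i] & 0x0F;
            if ((i & 1) == 0) {
                packed[i >> 1] |= (byte) v;
            } else {
                packed[i >> 1] |= (byte) (v << 4);
            }
        }
        return packed;
    }

    // Restores the original one-value-per-byte form; length is the number of dimensions.
    static byte[] unpack(byte[] packed, int length) {
        byte[] out = new byte[length];
        for (int i = 0; i < length; i++) {
            int b = packed[i >> 1] & 0xFF;
            out[i] = (byte) ((i & 1) == 0 ? (b & 0x0F) : (b >>> 4));
        }
        return out;
    }
}
```

For a vector with an even number of dimensions, `pack` returns exactly half as many bytes as it was given.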

Improvements

MergeScheduler can now provide an executor for intra-merge parallelism. The first implementation is the ConcurrentMergeScheduler.
Upgrade icu4j to version 74.2.

Optimizations

Use RWLock to access LRUQueryCache to reduce contention.
Speed up multi-segment HNSW graph search for diversifying child kNN queries.
Add a MemorySegment vector scorer, which scores vectors without copying them on-heap. This can improve search latency by almost 2x for byte vectors.
Switch to using optimized, primitive collections where possible to improve performance and heap utilization.

Full Changelog: releases/lucene/9.10.0...releases/lucene/9.11.0

20 Feb 17:21

jpountz

releases/lucene/9.10.0

695c0ac

9.10.0

New Features

Support for similarity-based vector searches, i.e., finding all nearest neighbors whose similarity to a query vector is greater than a configured threshold. See [Byte|Float]VectorSimilarityQuery.
Index sorting is now compatible with block joins. See IndexWriterConfig#setParentField.
MMapDirectory now takes advantage of the now finalized JDK foreign memory API internally when running on Java 22 (or later). This was only supported with Java 19 to 21 until now.
SIMD vectorization now takes advantage of JDK vector incubator on Java 22. This was only supported with Java 20 or 21 until now.
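
The similarity-based search above differs from top-k search in that the result size is not fixed: every vector above the threshold is a hit. A minimal brute-force sketch of that contract follows. It is not how [Byte|Float]VectorSimilarityQuery is implemented; the class name, the method names, and the use of a dot product as the similarity function are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: a linear scan, not Lucene's query implementation.
final class SimilarityThresholdSketch {
    // Dot product, used here as the (assumed) similarity function.
    static float dot(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Returns the index of every vector whose similarity to the query exceeds the threshold.
    static List<Integer> search(float[][] vectors, float[] query, float threshold) {
        List<Integer> hits = new ArrayList<>();
        for (int doc = 0; doc < vectors.length; doc++) {
            if (dot(vectors[doc], query) > threshold) {
                hits.add(doc);
            }
        }
        return hits;
    }
}
```

With a top-k query the caller chooses how many hits to return; here the threshold decides, so the result may be empty or may contain most of the index.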

Optimizations

Tail postings are now encoded using group-varint. This yielded speedups on queries that match lots of terms that have short postings lists in Lucene's nightly benchmarks.
Range queries on points now exit earlier when evaluating a segment that has no matches. This will improve performance when intersected with other queries that have a high up-front cost such as multi-term queries.
BooleanQueries that mix SHOULD and FILTER clauses now propagate minimum competitive scores to the SHOULD clauses, yielding significant speedups for top-k queries sorted by descending score.
IndexSearcher#count has been optimized on pure disjunctions of two term queries.
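
For readers unfamiliar with group-varint, the sketch below shows the general idea: four integers share one tag byte that records how many bytes (1 to 4) each integer uses, so the decoder reads lengths once per group instead of testing a continuation bit on every byte. The tag bit order and little-endian byte order used here are assumptions for the example, as are the class and method names; Lucene's actual encoding may differ in detail.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration of group-varint; layout details are assumed.
final class GroupVarintSketch {
    // Number of bytes (1..4) needed to hold v as an unsigned 32-bit value.
    static int numBytes(int v) {
        if ((v & 0xFFFFFF00) == 0) return 1;
        if ((v & 0xFFFF0000) == 0) return 2;
        if ((v & 0xFF000000) == 0) return 3;
        return 4;
    }

    // Encodes exactly four ints: one tag byte, then each value in little-endian order.
    static byte[] encodeGroup(int[] four) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int tag = 0;
        for (int i = 0; i < 4; i++) {
            tag |= (numBytes(four[i]) - 1) << (6 - 2 * i); // 2 bits of length per value
        }
        out.write(tag);
        for (int v : four) {
            int shift = 0;
            for (int n = numBytes(v); n > 0; n--) {
                out.write(v >>> shift); // write(int) stores only the low 8 bits
                shift += 8;
            }
        }
        return out.toByteArray();
    }

    // Decodes one group produced by encodeGroup.
    static int[] decodeGroup(byte[] in) {
        int tag = in[0] & 0xFF;
        int pos = 1;
        int[] four = new int[4];
        for (int i = 0; i < 4; i++) {
            int n = ((tag >>> (6 - 2 * i)) & 0x3) + 1;
            int v = 0;
            for (int shift = 0; n > 0; n--, shift += 8) {
                v |= (in[pos++] & 0xFF) << shift;
            }
            four[i] = v;
        }
        return four;
    }
}
```

Small values, which dominate short postings lists, cost one byte each plus a quarter of a tag byte.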
