Release notes for v0.18.0 (#1275)
lintool authored Sep 26, 2022
1 parent 5fab143 commit 11124e9
Showing 2 changed files with 65 additions and 8 deletions.
27 changes: 19 additions & 8 deletions README.md
@@ -19,16 +19,26 @@ For additional details, [our paper](https://dl.acm.org/doi/10.1145/3404835.34632

## Important Note: Lucene 8 to Lucene 9 Transition

The [PyPI release 0.17.1](https://pypi.org/project/pyserini/0.17.1/) at commit [`33c87c`](https://github.com/castorini/pyserini/commit/33c87c982d543d65e0ba1b4c94ee865fd9a6040e) (2022/08/13) is the last official Pyserini release built on Lucene 8, based on [Anserini v0.14.4](https://github.com/castorini/anserini/releases/tag/anserini-0.14.4).
Main Anserini trunk has been upgraded to Lucene 9.3 and the latest release, [Anserini v0.15.0](https://github.com/castorini/anserini/releases/tag/anserini-0.15.0), is built on that version.
tl;dr — Pyserini just underwent a transition from Lucene 8 to Lucene 9.
Main trunk is currently based on Lucene 9, but pre-built indexes are still based on Lucene 8.

This is an important but disruptive upgrade, as indexes built with Lucene 8 are not backwards compatible with Lucene 9 code (see [Anserini #1952](https://github.com/castorini/anserini/issues/1952)).
There is a workaround, but we have yet to implement in Pyserini.
Furthermore, Lucene 8 code is _not_ able to read indexes built with Lucene 9.
An upgrade to Lucene 9 is necessary to use Lucene's HNSW indexes, which will increase the capabilities of Pyserini and open up the design space of dense/sparse hybrids.
More details:

We are working hard on a corresponding Pyserini upgrade right now.
For a development installation, make sure you grab the `anserini-0.15.0-fatjar.jar` from [here](https://repo1.maven.org/maven2/io/anserini/anserini/0.15.0/) to drop into `pyserini/resources/jars` to make sure that you're using Lucene 9.
+ [PyPI v0.17.1](https://pypi.org/project/pyserini/0.17.1/) (commit [`33c87c`](https://github.com/castorini/pyserini/commit/33c87c982d543d65e0ba1b4c94ee865fd9a6040e), released 2022/08/13) is the last Pyserini release built on Lucene 8, based on [Anserini v0.14.4](https://github.com/castorini/anserini/releases/tag/anserini-0.14.4).
Thereafter, Anserini trunk was upgraded to Lucene 9.
+ [PyPI v0.18.0](https://pypi.org/project/pyserini/0.18.0/) (commit [`5fab14`](https://github.com/castorini/pyserini/commit/5fab143f64ed067ecf619c7d83ecd846aa494fbe), released 2022/09/26) is built on [Anserini v0.15.0](https://github.com/castorini/anserini/releases/tag/anserini-0.15.0), using Lucene 9.
Thereafter, Pyserini trunk advanced to Lucene 9.

**What's the impact?**
Indexes built with Lucene 8 are not fully compatible with Lucene 9 code (see [Anserini #1952](https://github.com/castorini/anserini/issues/1952)).
The workaround, which has been implemented in Pyserini, is to disable consistent tie-breaking.
This happens automatically if a Lucene 8 index is detected.
However, Lucene 9 code running on Lucene 8 indexes will give slightly different results than Lucene 8 code running on Lucene 8 indexes.
Since pre-built indexes are still based on Lucene 8, some experiments will exhibit small score differences.
Note that Lucene 8 code is _not_ able to read indexes built with Lucene 9.
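
To make the two ideas above concrete, here is a minimal, self-contained sketch in plain Python. It is not Pyserini's or Anserini's implementation: the function names, the `(docid, score)` hit representation, and the `1e-3` tolerance are all assumptions chosen for illustration. The first function shows one common meaning of consistent tie-breaking (equal scores ordered by docid); the second shows how two runs might be compared while tolerating small score differences.

```python
from typing import List, Tuple

# Hypothetical hit representation for this sketch: (docid, score).
Hit = Tuple[str, float]


def rank_with_tie_breaking(hits: List[Hit]) -> List[Hit]:
    """Sort by descending score; break score ties by ascending docid."""
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


def compare_runs(run_a: List[Hit], run_b: List[Hit], tol: float = 1e-3) -> List[str]:
    """Return docids that appear in only one run, or whose scores
    differ by more than `tol` (an assumed tolerance, not an official one)."""
    scores_a = dict(run_a)
    scores_b = dict(run_b)
    mismatched = []
    for docid in sorted(set(scores_a) | set(scores_b)):
        if docid not in scores_a or docid not in scores_b:
            mismatched.append(docid)
        elif abs(scores_a[docid] - scores_b[docid]) > tol:
            mismatched.append(docid)
    return mismatched


if __name__ == "__main__":
    # Made-up scores, purely illustrative.
    print(rank_with_tie_breaking([("d3", 1.5), ("d1", 2.0), ("d2", 1.5)]))
    print(compare_runs([("d1", 2.0), ("d2", 1.5)], [("d1", 2.0004), ("d2", 1.6)]))
```

When reproducing older results on pre-built Lucene 8 indexes, a tolerance-based comparison like this may be more useful than an exact diff of run files, since both scores and the order of tied documents can shift slightly.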

**Why is this necessary?**
Although disruptive, an upgrade to Lucene 9 is necessary to take advantage of Lucene's HNSW indexes, which will increase the capabilities of Pyserini and open up the design space of dense/sparse hybrids.

## Installation

@@ -686,6 +696,7 @@ The following guides provide step-by-step instructions:

## Release History

+ v0.18.0 (w/ Anserini v0.15.0): September 26, 2022 [[Release Notes](docs/release-notes/release-notes-v0.18.0.md)] (First release based on Lucene 9)
+ v0.17.1 (w/ Anserini v0.14.4): August 13, 2022 [[Release Notes](docs/release-notes/release-notes-v0.17.1.md)] (Final release based on Lucene 8)
+ v0.17.0 (w/ Anserini v0.14.3): May 28, 2022 [[Release Notes](docs/release-notes/release-notes-v0.17.0.md)]
+ v0.16.1 (w/ Anserini v0.14.3): May 12, 2022 [[Release Notes](docs/release-notes/release-notes-v0.16.1.md)]
46 changes: 46 additions & 0 deletions docs/release-notes/release-notes-v0.18.0.md
@@ -0,0 +1,46 @@
# Pyserini Release Notes (v0.18.0)

+ **Release date:** September 26, 2022
+ **Anserini dependency:** v0.15.0

## Summary of Changes

+ Upgraded to Lucene 9 (w/ fixes to unit and integration tests).
+ Added automatic detection of Lucene 8 indexes, which disables consistent tie-breaking to handle Lucene 8/9 incompatibilities (see [Anserini #1952](https://github.com/castorini/anserini/issues/1952)).
+ Improved support for dense vector encoding.
+ Fixed minor bug with `trec_eval`.

## Contributors

### This Release

Sorted by number of commits:

+ Jimmy Lin ([lintool](https://github.com/lintool))
+ Xinyu (Crystina) Zhang ([crystina-z](https://github.com/crystina-z))
+ Ogundepo Odunayo ([ToluClassics](https://github.com/ToluClassics))

### All Time

All contributors with five or more commits, sorted by number of commits, [according to GitHub](https://github.com/castorini/pyserini/graphs/contributors):

+ Jimmy Lin ([lintool](https://github.com/lintool))
+ Xueguang Ma ([MXueguang](https://github.com/MXueguang))
+ Yuqi Liu ([yuki617](https://github.com/yuki617))
+ Johnson Han ([x65han](https://github.com/x65han))
+ Stephanie Hu ([stephaniewhoo](https://github.com/stephaniewhoo))
+ Xinyu (Crystina) Zhang ([crystina-z](https://github.com/crystina-z))
+ Manveer Tamber ([manveertamber](https://github.com/manveertamber))
+ Arthur Chen ([ArthurChen189](https://github.com/ArthurChen189))
+ Jack Lin ([jacklin64](https://github.com/jacklin64))
+ Hang Li ([hanglics](https://github.com/hanglics))
+ Ronak Pradeep ([ronakice](https://github.com/ronakice))
+ Matt J. H. Yang ([justram](https://github.com/justram))
+ Chris Kamphuis ([Chriskamphuis](https://github.com/Chriskamphuis))
+ Habeeb Shopeju ([HAKSOAT](https://github.com/HAKSOAT))
+ Shengyao Zhuang ([ArvinZhuang](https://github.com/ArvinZhuang))
+ Sailesh Nankani ([saileshnankani](https://github.com/saileshnankani))
+ Xinyu Mavis Liu ([x389liu](https://github.com/x389liu))
+ Zeynep Akkalyoncu Yilmaz ([zeynepakkalyoncu](https://github.com/zeynepakkalyoncu))
+ Ogundepo Odunayo ([ToluClassics](https://github.com/ToluClassics))
+ Pepijn Boers ([PepijnBoers](https://github.com/PepijnBoers))
