- Change scoring to penalize a query in qrels without any results
- Limit scoring to top 1000 results
- Add qld support for PSQ
- Add patapsco-query command line tool for querying lucene index built with patapsco
- Fixed a bug with handling new lines in text normalization
- Collect memory and timing stats into log file for qsub
- Add additional escaping in PSQ query handling
- Don't write out scoring config to artifact configs
- Maintain separate full config and run config when using artifacts from previous runs
- Adds Lucene query parsing
- Fixes bug with batch mode when running partial pipelines
- Fixes bug with batch size larger than input iterable
- Adds PSQ
- Increases Java heap size
- Better support for using symbolic links as output directories
- Adds a character size limit for documents (1 million)
- Adds a rule based Farsi stemmer (parsivar)
- Collect warnings from qsub log files into base directory of run
- Updates to pyserini 0.13.0 which has fixed RM3
- Adds better logging of queries (query expansion and explanations)
- Handling recursion error in porter stemmer caused by bad MT decode
- Fixed arguments to reranker script so that ints and floats are accepted
- Not storing raw documents in Lucene to save space and memory
- Added flag to turn off text processing checks to support PSQ
- Added support for slurm
- Added code parameter to run.parallel for adding code to bash scripts
- Upped Java heap size to prevent out of memory errors due to large documents
- Fixed issue with using symbolic links as output directories
- Update stanza to 1.2.1 to fix a tokenization bug
- Fixes reranker shell input so that jsonl is used
- Adds a lang field to the top level of topic jsonl object
- Fixes creation of sqlite database for parallel jobs
- Using tabs to separate columns in scoring output to match trec_eval
- Adds support in retrieval for query likelihood model and pseudo relevance feedback
- Topics can now be filtered by lang_supported field
- Additional validation added for catching bad data in topics and qrels
- Symbolic links are resolved for input paths (documents, topics, qrels)
- Conda environment includes GPU version of pytorch
- Lucene index built with term vectors to support RM3
- Sqlite document database uses table name 'patapsco'
- Sqlite document database stores docs in JSON rather than pickles