Add to onboarding reproduction logs + instructions for Windows (#2583)
setarehbabajani authored Aug 30, 2024
1 parent 859c7bb commit e0a9578
Showing 3 changed files with 35 additions and 2 deletions.
20 changes: 20 additions & 0 deletions bin/run.bat
@@ -0,0 +1,20 @@
@echo off
REM This script is a Windows equivalent of bin/run.sh.
REM It finds the latest fatjar and runs the specified Anserini command

setlocal enabledelayedexpansion

cd /d "%~dp0"

REM Locating the latest fatjar in the target directory
for /f "delims=" %%f in ('dir /b /o-n ..\target\*-fatjar.jar 2^>nul') do (
set FATJAR=..\target\%%f
goto :found
)

echo No fatjar found in target directory!
exit /b 1

:found
REM Running the specified command using the found fatjar with memory and module settings
java -cp "!FATJAR!" -Xms512M -Xmx64G --add-modules jdk.incubator.vector %*
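
The lookup in `bin/run.bat` can be read as follows: `dir /b /o-n` lists matching jar names in reverse name order, so "latest" means the name that sorts last, not the most recently modified file. Below is a minimal Python sketch of that logic, not part of this commit. The function names are hypothetical, and the case-insensitive string sort only approximates how `dir` orders names.

```python
from pathlib import Path


def find_latest_fatjar(target_dir):
    """Return the fatjar whose name sorts last, or None if there is none.

    Assumption: a case-insensitive string sort approximates `dir /o-n`.
    """
    names = sorted(
        (p.name for p in Path(target_dir).glob("*-fatjar.jar")),
        key=str.lower,
        reverse=True,
    )
    return Path(target_dir) / names[0] if names else None


def build_java_command(fatjar, args):
    """Assemble the same java invocation that run.bat issues."""
    return [
        "java", "-cp", str(fatjar),
        "-Xms512M", "-Xmx64G",
        "--add-modules", "jdk.incubator.vector",
        *args,  # main class and its options, like %* in the batch script
    ]
```

Because the choice is made by name, a jar with a lexically larger version string wins even if an older build was compiled more recently.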
14 changes: 13 additions & 1 deletion docs/experiments-msmarco-passage.md
@@ -3,6 +3,7 @@
This page contains instructions for running BM25 baselines on the [MS MARCO *passage* ranking task](https://microsoft.github.io/msmarco/).
Note that there is a separate [MS MARCO *document* ranking task](experiments-msmarco-doc.md).
This exercise will require a machine with >8 GB RAM and >15 GB free disk space.
If you're using a Windows machine, equivalent commands are provided alongside the Unix-like (Linux/macOS) commands.

If you're a Waterloo student traversing the [onboarding path](https://github.com/lintool/guide/blob/master/ura.md), [start here](start-here.md).
@@ -101,6 +102,12 @@ bin/run.sh io.anserini.index.IndexCollection \
-generator DefaultLuceneDocumentGenerator \
-threads 9 -storePositions -storeDocvectors -storeRaw
```
For Windows:
```bash
bin\run.bat io.anserini.index.IndexCollection -collection JsonCollection -input collections\msmarco-passage\collection_jsonl -index indexes\msmarco-passage\lucene-index-msmarco -generator DefaultLuceneDocumentGenerator -threads 9 -storePositions -storeDocvectors -storeRaw
```



In this case, Lucene creates what is known as an **inverted index**.

@@ -125,6 +132,10 @@ bin/run.sh io.anserini.search.SearchCollection \
-parallelism 4 \
-bm25 -bm25.k1 0.82 -bm25.b 0.68 -hits 1000
```
For Windows:
```bash
bin\run.bat io.anserini.search.SearchCollection -index indexes\msmarco-passage\lucene-index-msmarco -topics collections\msmarco-passage\queries.dev.small.tsv -topicReader TsvInt -output runs\run.msmarco-passage.dev.small.tsv -format msmarco -parallelism 4 -bm25 -bm25.k1 0.82 -bm25.b 0.68 -hits 1000
```

This is the **retrieval** (or **search**) phase.
We're performing retrieval _in batch_, on a set of queries.
@@ -507,4 +518,5 @@ The BM25 run with default parameters `k1=0.9`, `b=0.4` roughly corresponds to th
+ Results reproduced by [@daisyyedda](https://github.com/daisyyedda) on 2024-08-02 (commit [`3885b5c`](https://github.com/castorini/anserini/commit/3885b5c25178d2a88fc3b953d572b518ef0d1da6))
+ Results reproduced by [@natek-1](https://github.com/natek-1) on 2024-08-05 (commit [`b467d4a`](https://github.com/castorini/anserini/commit/b467d4ade64ba99810b554bfa47655958b9477b2))
+ Results reproduced by [@emily-emily](https://github.com/emily-emily) on 2024-08-15 (commit [`28a98d0`](https://github.com/castorini/anserini/commit/28a98d05d1d379cd9133fce151779e2f312b3806))
+ Results reproduced by [@npjd](https://github.com/npjd) on 2024-08-17 (commit [`46b6834`](https://github.com/castorini/anserini/commit/46b68345b0ee614f511b87c9f66cee399e1308c5))
+ Results reproduced by [@npjd](https://github.com/npjd) on 2024-08-17 (commit [`46b6834`](https://github.com/castorini/anserini/commit/46b68345b0ee614f511b87c9f66cee399e1308c5))
+ Results reproduced by [@setarehbabajani](https://github.com/setarehbabajani) on 2024-08-30 (commit [`859c7bb`](https://github.com/castorini/anserini/commit/859c7bbadd39693e5890a758e89135c04ab811ee))
3 changes: 2 additions & 1 deletion docs/start-here.md
@@ -402,4 +402,5 @@ If you think this guide can be improved in any way (e.g., you caught a typo or t
+ Results reproduced by [@daisyyedda](https://github.com/daisyyedda) on 2024-08-02 (commit [`3885b5c`](https://github.com/castorini/anserini/commit/3885b5c25178d2a88fc3b953d572b518ef0d1da6))
+ Results reproduced by [@natek-1](https://github.com/natek-1) on 2024-08-05 (commit [`b467d4a`](https://github.com/castorini/anserini/commit/b467d4ade64ba99810b554bfa47655958b9477b2))
+ Results reproduced by [@emily-emily](https://github.com/emily-emily) on 2024-08-14 (commit [`28a98d0`](https://github.com/castorini/anserini/commit/28a98d05d1d379cd9133fce151779e2f312b3806))
+ Results reproduced by [@npjd](https://github.com/npjd) on 2024-08-17 (commit [`46b6834`](https://github.com/castorini/anserini/commit/46b68345b0ee614f511b87c9f66cee399e1308c5))
+ Results reproduced by [@npjd](https://github.com/npjd) on 2024-08-17 (commit [`46b6834`](https://github.com/castorini/anserini/commit/46b68345b0ee614f511b87c9f66cee399e1308c5))
+ Results reproduced by [@setarehbabajani](https://github.com/setarehbabajani) on 2024-08-29 (commit [`859c7bba`](https://github.com/castorini/anserini/commit/859c7bbadd39693e5890a758e89135c04ab811ee))
