LTR feature rename + docs #1496

Merged
merged 156 commits into from
Apr 4, 2021
Changes from 155 commits
09bb05d
cmd interface draft
nsndimt Sep 26, 2020
a048212
change pom.xml accordingly
Sep 27, 2020
3cdff3d
separate cli file
nsndimt Sep 28, 2020
c6ea60e
fix null pointer problem by remove static
nsndimt Sep 28, 2020
ac4a768
remove more static
nsndimt Sep 28, 2020
5b49dcd
add benchmark
nsndimt Oct 5, 2020
127ce6f
add benchmark
nsndimt Oct 5, 2020
8d0dcc7
fix benchmark
nsndimt Oct 5, 2020
7d6c76a
add idf caching to bm25
nsndimt Oct 5, 2020
eab462b
add idf caching to avgidf,tfidf
nsndimt Oct 5, 2020
1f405e6
Merge branch 'cmdline' of https://github.com/nsndimt/anserini into cm…
stephaniewhoo Oct 6, 2020
8d42ac5
add overall execution time and modified print format
stephaniewhoo Oct 6, 2020
84febdd
modify based on PR comments: indentation + list
stephaniewhoo Oct 7, 2020
c88cbbe
utils indentation + cli print debug msg format
stephaniewhoo Oct 8, 2020
c5e7167
ContentContext draft
nsndimt Oct 14, 2020
9fc26f5
ContentContext draft
nsndimt Oct 14, 2020
88b47bc
ContentContext QueryContext draft
nsndimt Oct 14, 2020
d5ab11a
ContentContext QueryContext draft
nsndimt Oct 15, 2020
a42fb58
ContentContext QueryContext draft
nsndimt Oct 15, 2020
d2c509e
rewrite count bigram
nsndimt Oct 16, 2020
b90aa6e
rewrite count bigram
nsndimt Oct 16, 2020
cb9fcd8
add pooler
nsndimt Oct 17, 2020
fd2e913
rewrite features SCQ, SCS and TFIDF
stephaniewhoo Oct 17, 2020
b472216
Merge branch 'more_feature' of https://github.com/nsndimt/anserini in…
stephaniewhoo Oct 17, 2020
d4f612e
add termFreq to ContentContext
stephaniewhoo Oct 17, 2020
b50a64e
move on without some features that will not be used immediately
nsndimt Oct 18, 2020
ca42a04
discuss tfidf implementation later;
nsndimt Oct 18, 2020
f70c477
refine name
nsndimt Oct 18, 2020
1e6f5d2
fix feature name;
nsndimt Oct 19, 2020
8a9a922
fix LMDir
nsndimt Oct 19, 2020
5d5661b
fix LMDir again
Oct 21, 2020
8b8b9e2
Pooler and Statistic Class
nsndimt Oct 21, 2020
0e921d1
fix avgictf, avgscq, varpooler
nsndimt Oct 25, 2020
beeb1eb
Proximity for bm25 Sum of BM25 query bigrams within an unordered wind…
stephaniewhoo Oct 25, 2020
7011660
Merge branch 'more_feature' of https://github.com/nsndimt/anserini in…
stephaniewhoo Oct 25, 2020
378f2a8
add feature DPH
yuki617 Oct 26, 2020
b1bfe33
fix avgictf;
nsndimt Oct 31, 2020
8883985
add tpscore feature
yuki617 Nov 1, 2020
43e4925
tpDist implementation
stephaniewhoo Nov 3, 2020
86b3d40
Merge branch 'master' into more_feature
nsndimt Nov 6, 2020
dd2ffaa
sdm draft
nsndimt Nov 6, 2020
d4093b5
implement SDM
nsndimt Nov 8, 2020
42c87f5
refactor field implementation
nsndimt Nov 8, 2020
020dad2
add getalldocids
nsndimt Nov 13, 2020
608d3f5
add fieldlevel stats
nsndimt Nov 16, 2020
e3ea020
add context feature
nsndimt Nov 16, 2020
aa2c884
stop + entropy feature
stephaniewhoo Nov 17, 2020
9b97259
merge conflict resolved
stephaniewhoo Nov 17, 2020
34669eb
add stop ratio
stephaniewhoo Nov 17, 2020
33903bb
bert as a feature draft
stephaniewhoo Nov 21, 2020
1c01e52
fix context feature
nsndimt Nov 21, 2020
0d14ad1
add two pooler
nsndimt Nov 21, 2020
9221af5
fix NTFIDF
nsndimt Nov 21, 2020
6b31226
update FeatureExtractorCli
nsndimt Nov 21, 2020
17b27c4
fix SDM,StopCover,StopRatio,Entropy
nsndimt Nov 22, 2020
bf4f7f3
fix NTFIDF
nsndimt Nov 23, 2020
1f81129
BM25 mean mean implementation
stephaniewhoo Nov 28, 2020
252124e
BM25 mean, min, max implementation and typo in bm25
stephaniewhoo Nov 29, 2020
0ba528d
fix typo
stephaniewhoo Nov 29, 2020
3cf6577
BM25mean with Pool
stephaniewhoo Dec 2, 2020
4bb1d1d
add BM25 min/max/hmean/var/conf feature
yuki617 Dec 5, 2020
8fb25fa
quartile
stephaniewhoo Dec 7, 2020
40d91f3
fix nan;
nsndimt Dec 12, 2020
c897664
fix compile error
nsndimt Dec 12, 2020
a9f2e1a
fix compile error
nsndimt Dec 12, 2020
fc1c350
Merge branch 'more_feature' of https://github.com/nsndimt/anserini in…
stephaniewhoo Dec 13, 2020
7a49815
merge BM25 pre-retrieval features
yuki617 Dec 14, 2020
0f4af41
fix a typo in bm25 pre-retrieval merge
yuki617 Dec 15, 2020
333733c
fix a bug on bm25 pre-retrieval feature
yuki617 Dec 16, 2020
a0d81f9
add field to entropy stopcover stopratio
nsndimt Dec 16, 2020
5ea9dc1
Merge branch 'more_feature' of https://github.com/nsndimt/anserini in…
nsndimt Dec 16, 2020
015ab40
change the order of parameter in BM25 pre-retrieval feature
yuki617 Dec 16, 2020
12f7519
Merge branch 'more_feature' of https://github.com/nsndimt/anserini in…
yuki617 Dec 16, 2020
e414256
add IBMModel1
nsndimt Dec 21, 2020
10f8313
add queryunlemma and querybert
yuki617 Dec 22, 2020
c6aef87
fix nulpinter bug
yuki617 Dec 23, 2020
7f55369
add IBMModel new analyzer
yuki617 Dec 23, 2020
cb3744f
add white space
nsndimt Dec 23, 2020
a081c08
fix query length;
nsndimt Dec 24, 2020
fa63825
fix query name
nsndimt Dec 24, 2020
bde038f
fix query name
nsndimt Dec 24, 2020
164d614
fix query json parsing
nsndimt Dec 24, 2020
3f7578a
fix path
Dec 24, 2020
3abb9fc
disable context feature
nsndimt Dec 24, 2020
db2860c
fix field is empty
nsndimt Dec 24, 2020
88790e5
fix zero in IBMModel1
nsndimt Dec 25, 2020
bc8d765
fix static which do not allow more than one IBMModel
nsndimt Dec 25, 2020
f6cdde1
fix static which do not allow more than one IBMModel
nsndimt Dec 25, 2020
85b383e
two level of query context refactoring
stephaniewhoo Dec 27, 2020
6277914
fix pre-retrieval bug
yuki617 Dec 29, 2020
9f48a62
fix pre-retrieval bug
yuki617 Dec 29, 2020
22db081
fix NAN;
nsndimt Dec 29, 2020
0bc9a3a
reverse back to the previous pre-retrieval logic
yuki617 Dec 29, 2020
d369221
merge
yuki617 Dec 29, 2020
264567e
fix small typo
yuki617 Dec 29, 2020
3ada0c5
fix DPH NaN bug
stephaniewhoo Dec 29, 2020
1fa062b
fix bm25Min;
nsndimt Dec 30, 2020
0e9210a
return byte array instead of json
nsndimt Jan 1, 2021
8fbb766
fix median pooler
nsndimt Jan 1, 2021
3b5839c
fix debugOutput
nsndimt Jan 2, 2021
a3d1aac
add entityJson string to document context
stephaniewhoo Jan 3, 2021
28cebe2
Merge branch 'more_feature' of https://github.com/nsndimt/anserini in…
stephaniewhoo Jan 3, 2021
bd67fd1
add entity map in document context
stephaniewhoo Jan 3, 2021
4650bf1
add entityRule implementation + query entities
stephaniewhoo Jan 4, 2021
a4878aa
switch to map old json format
stephaniewhoo Jan 4, 2021
703b6e2
rename filedcontext;
nsndimt Jan 4, 2021
a37cda3
add null check to pass test
stephaniewhoo Jan 4, 2021
ffad4d9
Merge branch 'more_feature' of https://github.com/nsndimt/anserini in…
nsndimt Jan 4, 2021
ce26fdd
rename filedcontext;
nsndimt Jan 4, 2021
b587d17
seperate diff entity rules and add query raw
stephaniewhoo Jan 4, 2021
1ee03ea
Merge branch 'more_feature' of https://github.com/nsndimt/anserini in…
stephaniewhoo Jan 4, 2021
0b70336
split match and count;
nsndimt Jan 4, 2021
cfcb9fd
remove unnecessary pre-retrieval computation;
nsndimt Jan 5, 2021
ec18c62
Merge branch 'more_feature' of https://github.com/nsndimt/anserini in…
stephaniewhoo Jan 7, 2021
a3261ce
fix add ibmmodel1
nsndimt Jan 10, 2021
338cd28
fix docsize feature name
nsndimt Jan 12, 2021
6bbed7a
Merge branch 'more_feature' of https://github.com/nsndimt/anserini in…
stephaniewhoo Jan 15, 2021
3568b3f
Merge branch 'master' of https://github.com/castorini/anserini into m…
stephaniewhoo Jan 15, 2021
8e9024b
fix according to pr comment and add license
stephaniewhoo Jan 15, 2021
73b3495
Merge branch 'master' of https://github.com/castorini/anserini into m…
stephaniewhoo Jan 18, 2021
c0bc666
Merge branch 'master' of https://github.com/castorini/anserini into m…
stephaniewhoo Jan 18, 2021
2b2ce0f
fix stopwords typo
stephaniewhoo Jan 18, 2021
c2b499d
Merge branch 'master' of https://github.com/castorini/anserini into m…
stephaniewhoo Jan 18, 2021
7b6ffc6
stat for features
stephaniewhoo Jan 28, 2021
57682f2
fix nan bug for features
stephaniewhoo Jan 28, 2021
2f4acdb
add pooler name in log and fix entrophy clone
stephaniewhoo Jan 28, 2021
248a3d8
typo fix
stephaniewhoo Jan 28, 2021
5d7031d
merge tfIdfStat scqStat;
nsndimt Jan 30, 2021
f3d299d
fix tfIdfStat name
nsndimt Jan 30, 2021
5b8a7a2
fix AvgPooler bug when list size == 0
nsndimt Jan 31, 2021
1e22ed0
update pre-retrieval subtraction
yuki617 Jan 31, 2021
a0528f2
Merge branch 'more_feature' of https://github.com/nsndimt/anserini in…
stephaniewhoo Feb 19, 2021
4a7f67a
debug config
stephaniewhoo Feb 19, 2021
ddb686e
Merge branch 'master' of https://github.com/castorini/anserini into m…
stephaniewhoo Mar 16, 2021
05eb4b7
add unit test for TFIDF
stephaniewhoo Mar 20, 2021
31329ef
Merge branch 'master' of https://github.com/castorini/anserini into m…
stephaniewhoo Mar 20, 2021
7e39b1b
test cases added
stephaniewhoo Mar 22, 2021
3b5b06f
Merge branch 'master' of https://github.com/castorini/anserini into m…
stephaniewhoo Mar 22, 2021
22205e3
only features in paper
stephaniewhoo Mar 23, 2021
fff216d
unit test cases
stephaniewhoo Mar 23, 2021
6a6491a
test cases
stephaniewhoo Mar 23, 2021
d21ecfe
remove json file
stephaniewhoo Mar 24, 2021
b3767b8
add unit test for tpscore and ibm
yuki617 Mar 26, 2021
a9be1a1
remove bm25 preretrieval
stephaniewhoo Mar 28, 2021
39a4a9d
Merge branch 'master' of https://github.com/castorini/anserini into p…
stephaniewhoo Mar 30, 2021
0e1bfe4
rename test for consistency and add docs for features
stephaniewhoo Mar 31, 2021
014c009
Merge branch 'master' into paper_features
stephaniewhoo Mar 31, 2021
c7a9ae1
rename for java convention
stephaniewhoo Apr 2, 2021
a11eaa8
Merge branch 'master' of https://github.com/castorini/anserini into p…
stephaniewhoo Apr 2, 2021
ffd658e
Merge branch 'paper_features' of https://github.com/nsndimt/anserini …
stephaniewhoo Apr 2, 2021
453f15f
change links based on renaming features
stephaniewhoo Apr 4, 2021
ea91210
links change
stephaniewhoo Apr 4, 2021
bdccabc
Merge branch 'master' into paper_features
stephaniewhoo Apr 4, 2021
849478d
test rename
stephaniewhoo Apr 4, 2021
736e232
Merge branch 'paper_features' of https://github.com/nsndimt/anserini …
stephaniewhoo Apr 4, 2021
7b80bea
Merge branch 'master' into paper_features
lintool Apr 4, 2021
87 changes: 87 additions & 0 deletions docs/ltr-features.md
@@ -0,0 +1,87 @@
# LTR Features
|Feature name |
|-------------------------------------------------|
|[IBM Model1](../src/main/java/io/anserini/ltr/feature/IbmModel1.java)
|[Sum of BM25](../src/main/java/io/anserini/ltr/feature/BM25Stat.java)
|Average of BM25
|Median of BM25
|Max of BM25
|Min of BM25
|MaxMinRatio of BM25
|[Sum of LMDir](../src/main/java/io/anserini/ltr/feature/LmDirStat.java)
|Average of LMDir
|Median of LMDir
|Max of LMDir
|Min of LMDir
|MaxMinRatio of LMDir
| [Sum of DFR\_GL2](../src/main/java/io/anserini/ltr/feature/DfrGl2Stat.java)
| Average of DFR\_GL2
| Median of DFR\_GL2
| Max of DFR\_GL2
| Min of DFR\_GL2
| MaxMinRatio of DFR\_GL2
| [Sum of DFR\_in\_expB2](../src/main/java/io/anserini/ltr/feature/DfrInExpB2Stat.java)
| Average of DFR\_in\_expB2
| Median of DFR\_in\_expB2
| Max of DFR\_in\_expB2
| Min of DFR\_in\_expB2
| MaxMinRatio of DFR\_in\_expB2
| [Sum of DPH](../src/main/java/io/anserini/ltr/feature/DphStat.java)
| Average of DPH
| Median of DPH
| Max of DPH
| Min of DPH
| MaxMinRatio of DPH
| [Sum of TF](../src/main/java/io/anserini/ltr/feature/TfStat.java)
| Average of TF
| Median of TF
| Max of TF
| Min of TF
| MaxMinRatio of TF
| [Sum of TFIDF](../src/main/java/io/anserini/ltr/feature/TfIdfStat.java)
| Average of TFIDF
| Median of TFIDF
| Max of TFIDF
| Min of TFIDF
| MaxMinRatio of TFIDF
| [Sum of Normalized TF](../src/main/java/io/anserini/ltr/feature/NormalizedTfStat.java)
| Average of Normalized TF
| Median of Normalized TF
| Max of Normalized TF
| Min of Normalized TF
| MaxMinRatio of Normalized TF
| [Sum of IDF](../src/main/java/io/anserini/ltr/feature/IdfStat.java)
| Average of IDF
| Median of IDF
| Max of IDF
| Min of IDF
| MaxMinRatio of IDF
| [Sum of ICTF](../src/main/java/io/anserini/ltr/feature/IcTfStat.java)
| Average of ICTF
| Median of ICTF
| Max of ICTF
| Min of ICTF
| MaxMinRatio of ICTF
| [UnorderedSequentialPairs with gap 3](../src/main/java/io/anserini/ltr/feature/UnorderedSequentialPairs.java)
| UnorderedSequentialPairs with gap 8
| UnorderedSequentialPairs with gap 15
| [OrderedSequentialPairs with gap 3](../src/main/java/io/anserini/ltr/feature/OrderedSequentialPairs.java)
| OrderedSequentialPairs with gap 8
| OrderedSequentialPairs with gap 15
| [UnorderedQueryPairs with gap 3](../src/main/java/io/anserini/ltr/feature/UnorderedQueryPairs.java)
| UnorderedQueryPairs with gap 8
| UnorderedQueryPairs with gap 15
| [OrderedQueryPairs with gap 3](../src/main/java/io/anserini/ltr/feature/OrderedQueryPairs.java)
| OrderedQueryPairs with gap 8
| OrderedQueryPairs with gap 15
| [Normalized TFIDF](../src/main/java/io/anserini/ltr/feature/NormalizedTfIdf.java)
| [ProbabilitySum](../src/main/java/io/anserini/ltr/feature/ProbalitySum.java)
| [Proximity](../src/main/java/io/anserini/ltr/feature/Proximity.java)
| [BM25-TP score](../src/main/java/io/anserini/ltr/feature/TpScore.java)
| [TP distance](../src/main/java/io/anserini/ltr/feature/TpDist.java)
| [Doc size](../src/main/java/io/anserini/ltr/feature/DocSize.java)
| [Query Length](../src/main/java/io/anserini/ltr/feature/QueryLength.java)
| [Query Coverage Ratio](../src/main/java/io/anserini/ltr/feature/QueryCoverageRatio.java)
| [Unique Term Count in Query](../src/main/java/io/anserini/ltr/feature/UniqueTermCount.java)
| [Matching Term Count](../src/main/java/io/anserini/ltr/feature/MatchingTermCount.java)
| [SCS](../src/main/java/io/anserini/ltr/feature/SCS.java)
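
Most rows in the table above are a per-term score (BM25, LMDir, TF, IDF, and so on) reduced to a single value by one of six poolers: Sum, Average, Median, Max, Min, and MaxMinRatio. The sketch below illustrates that reduction. It is not Anserini's pooler code: `PoolingSketch` and its method names are hypothetical, and two behaviours are assumptions made for the example, namely that an empty input pools to 0 (in the spirit of the "fix AvgPooler bug when list size == 0" commit) and that MaxMinRatio is max divided by min, reported as 0 when min is 0.

```java
import java.util.Arrays;

// Hypothetical helper, not Anserini's pooler classes:
// reduces the per-term scores of one query/document pair to a single feature value.
final class PoolingSketch {
  static double sum(double[] scores) {
    double total = 0.0;
    for (double s : scores) {
      total += s;
    }
    return total;
  }

  static double avg(double[] scores) {
    // Assumption: an empty list pools to 0 instead of NaN.
    return scores.length == 0 ? 0.0 : sum(scores) / scores.length;
  }

  static double median(double[] scores) {
    if (scores.length == 0) {
      return 0.0;
    }
    double[] sorted = scores.clone();
    Arrays.sort(sorted);
    int mid = sorted.length / 2;
    // Odd length: middle element. Even length: mean of the two middle elements.
    return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
  }

  static double max(double[] scores) {
    return scores.length == 0 ? 0.0 : Arrays.stream(scores).max().getAsDouble();
  }

  static double min(double[] scores) {
    return scores.length == 0 ? 0.0 : Arrays.stream(scores).min().getAsDouble();
  }

  static double maxMinRatio(double[] scores) {
    // Assumption: the ratio is max/min, reported as 0 when min is 0.
    double lowest = min(scores);
    return lowest == 0.0 ? 0.0 : max(scores) / lowest;
  }

  public static void main(String[] args) {
    double[] perTermScores = {1.5, 0.5, 3.0}; // made-up scores for three query terms
    System.out.println("sum=" + sum(perTermScores));
    System.out.println("avg=" + avg(perTermScores));
    System.out.println("median=" + median(perTermScores));
    System.out.println("max=" + max(perTermScores));
    System.out.println("min=" + min(perTermScores));
    System.out.println("maxMinRatio=" + maxMinRatio(perTermScores));
  }
}
```

Keeping the pooler separate from the scorer is what lets the table list six rows per scoring function from a single class such as `BM25Stat`.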
@@ -14,7 +14,7 @@
 * limitations under the License.
 */

-package io.anserini.ltr.feature;
+package io.anserini.ltr;

 import java.util.List;

@@ -14,7 +14,7 @@
 * limitations under the License.
 */

-package io.anserini.ltr.feature;
+package io.anserini.ltr;

 import java.util.List;

@@ -14,7 +14,7 @@
 * limitations under the License.
 */

-package io.anserini.ltr.feature;
+package io.anserini.ltr;

 import com.fasterxml.jackson.databind.JsonNode;
 import com.fasterxml.jackson.databind.ObjectMapper;
@@ -14,7 +14,7 @@
 * limitations under the License.
 */

-package io.anserini.ltr.feature;
+package io.anserini.ltr;

 import io.anserini.index.IndexArgs;
 import io.anserini.index.IndexReaderUtils;
@@ -14,7 +14,7 @@
 * limitations under the License.
 */

-package io.anserini.ltr.feature;
+package io.anserini.ltr;

 import java.io.FileNotFoundException;
 import java.io.IOException;
168 changes: 79 additions & 89 deletions src/main/java/io/anserini/ltr/FeatureExtractorCli.java
@@ -18,7 +18,6 @@

 import com.fasterxml.jackson.databind.ObjectMapper;
 import io.anserini.ltr.feature.*;
-import io.anserini.ltr.feature.base.*;
 import java.io.BufferedReader;
 import java.io.File;
 import java.io.FileInputStream;
@@ -31,7 +30,6 @@
 import org.kohsuke.args4j.CmdLineParser;
 import org.kohsuke.args4j.Option;

-//../indexes/lucene-index-msmarco-passage-doc-expanded-all
 public class FeatureExtractorCli {
 static class DebugArgs {
 @Option(name = "-index", metaVar = "[path]", required = true, usage = "Lucene index directory")
@@ -91,66 +89,64 @@ public static void addFeature(FeatureExtractorUtils utils, String queryField, St
 * utils.add(new Proximity(docField, queryField)); utils.add(new
 * TPscore(docField, queryField));
 */
-utils.add(new tpDist(docField, queryField));
-/*
-* utils.add(new DocSize(docField)); if (queryField == "analyzed" && docField ==
-* "contents") { utils.add(new QueryLength(queryField)); utils.add(new
-* QueryCoverageRatio(docField, queryField)); utils.add(new
-* UniqueTermCount(queryField)); }
-*
-* utils.add(new MatchingTermCount(docField, queryField)); utils.add(new
-* SCS(docField, queryField));
-*
-* utils.add(new tfStat(new AvgPooler(), docField, queryField)); utils.add(new
-* tfStat(new MedianPooler(), docField, queryField)); utils.add(new tfStat(new
-* SumPooler(), docField, queryField)); utils.add(new tfStat(new MinPooler(),
-* docField, queryField)); utils.add(new tfStat(new MaxPooler(), docField,
-* queryField)); utils.add(new tfStat(new MaxMinRatioPooler(), docField,
-* queryField));
-*
-* utils.add(new tfIdfStat(true, new AvgPooler(), docField, queryField));
-* utils.add(new tfIdfStat(true, new MedianPooler(), docField, queryField));
-* utils.add(new tfIdfStat(true, new SumPooler(), docField, queryField));
-* utils.add(new tfIdfStat(true, new MinPooler(), docField, queryField));
-* utils.add(new tfIdfStat(true, new MaxPooler(), docField, queryField));
-* utils.add(new tfIdfStat(true, new MaxMinRatioPooler(), docField,
-* queryField));
-*
-* utils.add(new normalizedTfStat(new AvgPooler(), docField, queryField));
-* utils.add(new normalizedTfStat(new MedianPooler(), docField, queryField));
-* utils.add(new normalizedTfStat(new SumPooler(), docField, queryField));
-* utils.add(new normalizedTfStat(new MinPooler(), docField, queryField));
-* utils.add(new normalizedTfStat(new MaxPooler(), docField, queryField));
-* utils.add(new normalizedTfStat(new MaxMinRatioPooler(), docField,
-* queryField));
-*
-* utils.add(new idfStat(new AvgPooler(), docField, queryField)); utils.add(new
-* idfStat(new MedianPooler(), docField, queryField)); utils.add(new idfStat(new
-* SumPooler(), docField, queryField)); utils.add(new idfStat(new MinPooler(),
-* docField, queryField)); utils.add(new idfStat(new MaxPooler(), docField,
-* queryField)); utils.add(new idfStat(new MaxMinRatioPooler(), docField,
-* queryField));
-*
-* utils.add(new ictfStat(new AvgPooler(), docField, queryField)); utils.add(new
-* ictfStat(new MedianPooler(), docField, queryField)); utils.add(new
-* ictfStat(new SumPooler(), docField, queryField)); utils.add(new ictfStat(new
-* MinPooler(), docField, queryField)); utils.add(new ictfStat(new MaxPooler(),
-* docField, queryField)); utils.add(new ictfStat(new MaxMinRatioPooler(),
-* docField, queryField));
-*
-* utils.add(new UnorderedSequentialPairs(3, docField, queryField));
-* utils.add(new UnorderedSequentialPairs(8, docField, queryField));
-* utils.add(new UnorderedSequentialPairs(15, docField, queryField));
-* utils.add(new OrderedSequentialPairs(3, docField, queryField)); utils.add(new
-* OrderedSequentialPairs(8, docField, queryField)); utils.add(new
-* OrderedSequentialPairs(15, docField, queryField)); utils.add(new
-* UnorderedQueryPairs(3, docField, queryField)); utils.add(new
-* UnorderedQueryPairs(8, docField, queryField)); utils.add(new
-* UnorderedQueryPairs(15, docField, queryField)); utils.add(new
-* OrderedQueryPairs(3, docField, queryField)); utils.add(new
-* OrderedQueryPairs(8, docField, queryField)); utils.add(new
-* OrderedQueryPairs(15, docField, queryField));
-*/
+utils.add(new TpDist(docField, queryField));
+
+utils.add(new DocSize(docField));
+if (queryField == "analyzed" && docField == "contents"){
+utils.add(new QueryLength(queryField));
+utils.add(new QueryCoverageRatio(docField, queryField));
+utils.add(new UniqueTermCount(queryField)); }
+
+utils.add(new MatchingTermCount(docField, queryField));
+utils.add(new SCS(docField, queryField));
+
+utils.add(new TfStat(new AvgPooler(), docField, queryField));
+utils.add(new TfStat(new MedianPooler(), docField, queryField));
+utils.add(new TfStat(new SumPooler(), docField, queryField));
+utils.add(new TfStat(new MinPooler(), docField, queryField));
+utils.add(new TfStat(new MaxPooler(), docField, queryField));
+utils.add(new TfStat(new MaxMinRatioPooler(), docField, queryField));
+
+utils.add(new TfIdfStat(true, new AvgPooler(), docField, queryField));
+utils.add(new TfIdfStat(true, new MedianPooler(), docField, queryField));
+utils.add(new TfIdfStat(true, new SumPooler(), docField, queryField));
+utils.add(new TfIdfStat(true, new MinPooler(), docField, queryField));
+utils.add(new TfIdfStat(true, new MaxPooler(), docField, queryField));
+utils.add(new TfIdfStat(true, new MaxMinRatioPooler(), docField, queryField));
+
+utils.add(new NormalizedTfStat(new AvgPooler(), docField, queryField));
+utils.add(new NormalizedTfStat(new MedianPooler(), docField, queryField));
+utils.add(new NormalizedTfStat(new SumPooler(), docField, queryField));
+utils.add(new NormalizedTfStat(new MinPooler(), docField, queryField));
+utils.add(new NormalizedTfStat(new MaxPooler(), docField, queryField));
+utils.add(new NormalizedTfStat(new MaxMinRatioPooler(), docField, queryField));
+
+utils.add(new IdfStat(new AvgPooler(), docField, queryField));
+utils.add(new IdfStat(new MedianPooler(), docField, queryField));
+utils.add(new IdfStat(new SumPooler(), docField, queryField));
+utils.add(new IdfStat(new MinPooler(), docField, queryField));
+utils.add(new IdfStat(new MaxPooler(), docField, queryField));
+utils.add(new IdfStat(new MaxMinRatioPooler(), docField, queryField));
+
+utils.add(new IcTfStat(new AvgPooler(), docField, queryField));
+utils.add(new IcTfStat(new MedianPooler(), docField, queryField));
+utils.add(new IcTfStat(new SumPooler(), docField, queryField));
+utils.add(new IcTfStat(new MinPooler(), docField, queryField));
+utils.add(new IcTfStat(new MaxPooler(), docField, queryField));
+utils.add(new IcTfStat(new MaxMinRatioPooler(), docField, queryField));
+
+utils.add(new UnorderedSequentialPairs(3, docField, queryField));
+utils.add(new UnorderedSequentialPairs(8, docField, queryField));
+utils.add(new UnorderedSequentialPairs(15, docField, queryField));
+utils.add(new OrderedSequentialPairs(3, docField, queryField));
+utils.add(new OrderedSequentialPairs(8, docField, queryField));
+utils.add(new OrderedSequentialPairs(15, docField, queryField));
+utils.add(new UnorderedQueryPairs(3, docField, queryField));
+utils.add(new UnorderedQueryPairs(8, docField, queryField));
+utils.add(new UnorderedQueryPairs(15, docField, queryField));
+utils.add(new OrderedQueryPairs(3, docField, queryField));
+utils.add(new OrderedQueryPairs(8, docField, queryField));
+utils.add(new OrderedQueryPairs(15, docField, queryField));

}
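
Review note on the hunk above: the re-enabled guard `queryField == "analyzed" && docField == "contents"` compares `String` references, not characters. It holds for the literal arguments passed from `main`, because Java interns string literals, but it would fail for a field name built at runtime. A minimal sketch of the value-based check follows; `FieldCheckSketch` and `isAnalyzedContents` are hypothetical names introduced for illustration, and only the two field names come from the diff.

```java
// Hypothetical demo class; only the field names "analyzed" and "contents" come from the diff.
final class FieldCheckSketch {
  static boolean isAnalyzedContents(String queryField, String docField) {
    // equals() compares characters; calling it on the literal also tolerates null arguments.
    return "analyzed".equals(queryField) && "contents".equals(docField);
  }

  public static void main(String[] args) {
    // A string created at runtime (for example, parsed from a config file)
    // is a different object from the interned literal.
    String queryField = new String("analyzed");
    String docField = new String("contents");

    System.out.println(queryField == "analyzed");                  // prints false
    System.out.println(isAnalyzedContents(queryField, docField));  // prints true
  }
}
```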

@@ -168,16 +164,16 @@ public static void main(String[] args) throws IOException, ExecutionException, I
 }

 FeatureExtractorUtils utils = new FeatureExtractorUtils(cmdArgs.indexDir, cmdArgs.threads);
-// addFeature(utils, "analyzed", "contents");
-// addFeature(utils, "analyzed", "predict");
-// addFeature(utils, "text_unlemm", "text_unlemm");
+addFeature(utils, "analyzed", "contents");
+addFeature(utils, "analyzed", "predict");
+addFeature(utils, "text_unlemm", "text_unlemm");
 addFeature(utils, "text_bert_tok", "text_bert_tok");

-// addFeature(utils,"text","text");
-// addFeature(utils,"text_unlemm","text_unlemm");
-// addFeature(utils,"text_bert_tok","text_bert_tok");
-// System.out.println("Load IBM Models");
-// utils.add(new
+addFeature(utils,"text","text");
+addFeature(utils,"text_unlemm","text_unlemm");
+addFeature(utils,"text_bert_tok","text_bert_tok");
+//System.out.println("Load IBM Models");
+//utils.add(new
// IBMModel1("../FlexNeuART/collections/msmarco_doc/derived_data/giza/title_unlemm",
// "text_unlemm",
// "title_unlemm", "text_unlemm"));
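
Review note on the hunk above: with the comments removed, `addFeature` is called seven times, and the pairs `("text_unlemm", "text_unlemm")` and `("text_bert_tok", "text_bert_tok")` each appear twice. If the repetition is unintended, one way to guard against it is to collect the pairs in an insertion-ordered set before registering them. The sketch below assumes that intent; `FieldPairSketch` and `uniquePairs` are hypothetical names, and it deliberately does not call the real `addFeature`.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: drops repeated (queryField, docField) pairs, keeping first-seen order.
final class FieldPairSketch {
  static Set<List<String>> uniquePairs(String[][] pairs) {
    Set<List<String>> unique = new LinkedHashSet<>();
    for (String[] pair : pairs) {
      // List.of gives value-based equals/hashCode, so repeated pairs collapse.
      unique.add(List.of(pair[0], pair[1]));
    }
    return unique;
  }

  public static void main(String[] args) {
    // The seven calls in the hunk, in order.
    String[][] calls = {
      {"analyzed", "contents"},
      {"analyzed", "predict"},
      {"text_unlemm", "text_unlemm"},
      {"text_bert_tok", "text_bert_tok"},
      {"text", "text"},
      {"text_unlemm", "text_unlemm"},
      {"text_bert_tok", "text_bert_tok"},
    };
    for (List<String> pair : uniquePairs(calls)) {
      // In the real CLI this is where addFeature(utils, queryField, docField) would go.
      System.out.println(pair.get(0) + " -> " + pair.get(1));
    }
  }
}
```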
@@ -218,7 +214,6 @@ public static void main(String[] args) throws IOException, ExecutionException, I
 while (qids.size() > 0) {
 lastQid = qids.remove(0);
 List<debugOutput> outputArray = utils.getDebugResult(lastQid);
-// System.out.println(String.format("Qid:%s\tLine:%d",lastQid,offset));
 for (debugOutput res : outputArray) {
 for (int i = 0; i < names.size(); i++) {
 time[i] += res.time.get(i);
@@ -238,7 +233,6 @@ public static void main(String[] args) throws IOException, ExecutionException, I
 while (qids.size() > 0) {
 lastQid = qids.remove(0);
 List<debugOutput> outputArray = utils.getDebugResult(lastQid);
-// System.out.println(String.format("Qid:%s\tLine:%d",lastQid,offset));
 for (debugOutput res : outputArray) {
 for (int i = 0; i < names.size(); i++) {
 time[i] += res.time.get(i);
@@ -251,24 +245,20 @@ public static void main(String[] args) throws IOException, ExecutionException, I
 throw e;
 }
 }
-// long executionEnd = System.nanoTime();
-// long sumtime = 0;
-// for(int i = 0; i < names.size(); i++){
-// sumtime += time[i];
-// }
-// for(int i = 0; i < names.size(); i++){
-// System.out.println(names.get(i)+" takes
-// "+String.format("%.2f",time[i]/1000000000.0) + "s, accounts for "+
-// String.format("%.2f", time[i]*100.0/sumtime) + "%");
-// }
+long executionEnd = System.nanoTime();
+long sumtime = 0;
+for(int i = 0; i < names.size(); i++){
+sumtime += time[i];
+}
+for(int i = 0; i < names.size(); i++){
+System.out.println(names.get(i)+" takes "+String.format("%.2f",time[i]/1000000000.0) + "s, accounts for "+
+String.format("%.2f", time[i]*100.0/sumtime) + "%");
+}
 utils.close();
 reader.close();
-//
-// long end = System.nanoTime();
-// long overallTime = end - start;
-// long overhead = overallTime-(executionEnd - executionStart);
-// System.out.println("The program takes
-// "+String.format("%.2f",overallTime/1000000000.0) + "s, where the overhead
-// takes " + String.format("%.2f",overhead/1000000000.0) +"s");
+long end = System.nanoTime();
+long overallTime = end - start;
+long overhead = overallTime-(executionEnd - executionStart);
+System.out.println("The program takes "+String.format("%.2f",overallTime/1000000000.0) + "s, where the overhead takes " + String.format("%.2f",overhead/1000000000.0) +"s");
}
}
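
Review note on the timing code re-enabled above: it sums per-feature nanosecond counters and prints each feature's seconds and its share of the total. If no time was recorded, `sumtime` is 0 and `time[i]*100.0/sumtime` evaluates to NaN. The sketch below shows the same report with that case guarded. `TimingSketch` and `report` are hypothetical names, the feature names in `main` are examples only, and `Locale.ROOT` is an added choice so the decimal separator does not depend on the machine's locale.

```java
import java.util.Locale;

// Hypothetical helper mirroring the per-feature timing report in FeatureExtractorCli.
final class TimingSketch {
  // Returns one line per feature: seconds spent and share of the summed feature time.
  static String[] report(String[] names, long[] nanos) {
    long total = 0;
    for (long t : nanos) {
      total += t;
    }
    String[] lines = new String[names.length];
    for (int i = 0; i < names.length; i++) {
      // Guard: with no recorded time, report a 0% share instead of NaN.
      double share = total == 0 ? 0.0 : nanos[i] * 100.0 / total;
      lines[i] = String.format(Locale.ROOT, "%s takes %.2fs, accounts for %.2f%%",
          names[i], nanos[i] / 1_000_000_000.0, share);
    }
    return lines;
  }

  public static void main(String[] args) {
    String[] names = {"BM25", "DocSize"};              // example feature names
    long[] nanos = {3_000_000_000L, 1_000_000_000L};   // made-up timings
    for (String line : report(names, nanos)) {
      System.out.println(line);
    }
  }
}
```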
6 changes: 0 additions & 6 deletions src/main/java/io/anserini/ltr/FeatureExtractorUtils.java
@@ -17,15 +17,9 @@
package io.anserini.ltr;

import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import io.anserini.index.IndexArgs;
import io.anserini.ltr.feature.*;
import io.anserini.ltr.feature.base.*;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.apache.logging.log4j.core.tools.picocli.CommandLine;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
@@ -14,7 +14,7 @@
 * limitations under the License.
 */

-package io.anserini.ltr.feature;
+package io.anserini.ltr;

 import java.util.List;

@@ -14,7 +14,7 @@
 * limitations under the License.
 */

-package io.anserini.ltr.feature;
+package io.anserini.ltr;

 import java.util.List;

@@ -14,7 +14,7 @@
 * limitations under the License.
 */

-package io.anserini.ltr.feature;
+package io.anserini.ltr;

 import java.util.Collections;
 import java.util.List;