Expose index statistics #128

Merged (3 commits) on May 23, 2020

Conversation

PepijnBoers
Contributor

This is the Pyserini side of castorini/anserini#1218.

@PepijnBoers PepijnBoers requested a review from lintool May 23, 2020 11:18
@@ -336,3 +336,22 @@ def convert_collection_docid_to_internal_docid(self, docid: str) -> int:
The Lucene internal ``docid`` corresponding to the external collection ``docid``.
"""
return self.object.convertDocidToLuceneDocid(self.reader, docid)

def stats(self) -> dict:
Member

Should we add type hints here?
And below, should we do some explicit type checking?

Contributor Author

All keys are of type str, so that would be possible, but the values vary between int and str. I'm not sure how to hint or type-check this without the code getting messy.

def stats(self) -> Dict[str, int or str]:

Looks kinda strange to me, thoughts?
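
A side note on the hint above: in Python, `int or str` is evaluated as an ordinary expression and yields `int`, so that annotation would silently drop `str`. Below is a minimal sketch of how the mixed value type could be spelled with `typing.Union` instead. The alias `IndexStats`, the function `stats_example`, and the key `documents` are hypothetical names used only for illustration; they are not part of this PR.

```python
from typing import Dict, Union

# `int or str` is a boolean expression; because the class `int` is truthy,
# it evaluates to `int`, so Dict[str, int or str] means Dict[str, int].
assert (int or str) is int

# Hypothetical alias (an assumption, not taken from the PR).
IndexStats = Dict[str, Union[int, str]]


def stats_example() -> IndexStats:
    """Return placeholder statistics with both int and str values."""
    # Placeholder data for illustration only; not real index statistics.
    return {
        "documents": 3,
        "title": "(indexOption: DOCS, hasVectors: false)",
    }
```

Whether the extra alias is worth it over a plain `dict` is a judgment call; it mainly helps static checkers and readers.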

Member

Can the values be strings too?

Contributor Author

Yes, the original printIndexStats in Anserini's IndexUtils also showed the stored fields as strings, e.g. for the 'title' field:

(indexOption: DOCS, hasVectors: false)

Member

Hrm, I see.
Yea, def stats(self) -> dict: is fine I think.
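
For reference, the explicit type checking raised earlier in this thread could still be done with a plain `dict` return type. The following is only a sketch under the assumption stated in the thread (keys are `str`, values are `int` or `str`). The helper `check_stats` and the alias `StatValue` are hypothetical and do not exist in Pyserini.

```python
from typing import Dict, Union

# Hypothetical alias for the two value types mentioned in the thread.
StatValue = Union[int, str]


def check_stats(stats: dict) -> Dict[str, StatValue]:
    """Raise TypeError unless every key is a str and every value is an int or str."""
    for name, value in stats.items():
        if not isinstance(name, str):
            raise TypeError(
                f"statistic name must be str, got {type(name).__name__}"
            )
        # bool is a subclass of int, so it is rejected explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, str)):
            raise TypeError(
                f"statistic {name!r} must be int or str, got {type(value).__name__}"
            )
    return stats
```

A check like this keeps the signature simple while still failing loudly if the Java side ever returns an unexpected type.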

Returns
-------
dict
Index statistics as a dictionary of statistic's name to statistic.
Member

Would it be nice to just list the statistics we provide, so users don't need to go hunting in the Java code?

@lintool lintool merged commit cdba0e9 into master May 23, 2020
@lintool lintool deleted the expose-index-stats branch May 23, 2020 13:37

2 participants