Disk usage has increased in the nightly Logging benchmark #103002

Closed
ChrisHegarty opened this issue Dec 5, 2023 · 3 comments · Fixed by #103601
Labels: >refactoring, :Search/Search (Search-related issues that do not fall into other categories), Team:Search (Meta label for search team)
Milestone: 8.12

Comments

@ChrisHegarty
Contributor

ChrisHegarty commented Dec 5, 2023

Disk usage in the nightly Logging benchmark [1] increased on 2023-12-04 from 182 GB to 190 GB, a rise of about 4.4%. This coincides with the upgrade of Elasticsearch to Lucene 9.9.0.

[Screenshot taken 2023-12-05 at 17:36:34]

Lucene 9.9.0 changed the postings (doc IDs) back from PFOR (patched frame of reference) to FOR (frame of reference) encoding [2]. This is the most likely cause of the additional disk usage.

[1] https://elasticsearch-benchmarks.elastic.co/#tracks/logging/nightly/default/30d (see the nightly-elastic/logs-io chart)
[2] apache/lucene#12741

Note from the Lucene 9.9.0 changelog: "GITHUB#12696: Change Postings back to using FOR in Lucene99PostingsFormat. Freqs, positions and offset keep using PFOR."
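To illustrate why moving doc IDs from PFOR back to FOR can cost disk space, here is a simplified size model in Java. It is a sketch under stated assumptions, not Lucene's actual encoding code: the block size of 128, the cap of 7 exceptions per block, the 8 bits of high-bit room per exception and the flat 16-bit cost per exception are illustrative parameters. FOR packs every value in a block at the bit width of the block's largest value; PFOR packs at a narrower width and stores a few outliers ("exceptions") separately.

```java
import java.util.Arrays;

// Simplified size model comparing FOR and PFOR packing of one postings block.
// Illustrative assumptions only; this is not Lucene's on-disk format.
public class ForVsPfor {
    static final int BLOCK_SIZE = 128;          // assumed values per block
    static final int MAX_EXCEPTIONS = 7;        // assumed cap on patched values per block
    static final int EXCEPTION_HIGH_BITS = 8;   // assumed room for an exception's extra high bits
    static final int EXCEPTION_COST_BITS = 16;  // assumed cost: 1 byte position + 1 byte high bits

    /** Number of bits needed to represent a non-negative value (0 for 0). */
    static int bitsRequired(long v) {
        return v == 0 ? 0 : 64 - Long.numberOfLeadingZeros(v);
    }

    /** FOR: every (non-negative) value is packed at the bit width of the block maximum. */
    static long forBits(long[] block) {
        long max = 0;
        for (long v : block) {
            max = Math.max(max, v);
        }
        return (long) block.length * bitsRequired(max);
    }

    /** PFOR: pack at a narrower width and store the k largest values as exceptions. */
    static long pforBits(long[] block) {
        int n = block.length;
        if (n == 0) {
            return 0;
        }
        long[] sorted = block.clone();
        Arrays.sort(sorted);
        int maxBits = bitsRequired(sorted[n - 1]);
        long best = (long) n * maxBits; // k = 0 exceptions is plain FOR
        for (int k = 1; k <= MAX_EXCEPTIONS && k < n; k++) {
            // All but the k largest values must fit in this width.
            int width = bitsRequired(sorted[n - 1 - k]);
            // The largest value's surplus bits must fit in the exception's high-bits slot.
            if (maxBits - width > EXCEPTION_HIGH_BITS) {
                continue;
            }
            long cost = (long) n * width + (long) k * EXCEPTION_COST_BITS;
            best = Math.min(best, cost);
        }
        return best;
    }

    public static void main(String[] args) {
        long[] skewed = new long[BLOCK_SIZE];
        Arrays.fill(skewed, 1L); // dense run of consecutive doc IDs (delta 1)
        skewed[0] = 200L;        // one large gap
        System.out.println("FOR bits:  " + forBits(skewed));  // 1024
        System.out.println("PFOR bits: " + pforBits(skewed)); // 144
    }
}
```

In this model a single large delta forces FOR to spend 8 bits on all 128 values, while PFOR pays for the outlier once. On a block with uniform deltas the two sizes are equal, so the real-world difference depends on how skewed the doc-ID deltas in each block are.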

ChrisHegarty added the :Search/Search, >refactoring and Team:Search labels Dec 5, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@ChrisHegarty
Contributor Author

Elasticsearch can add back PFOR encoding for postings in our own codec.

ChrisHegarty added this to the 8.12 milestone Dec 5, 2023
@martijnvg
Member

Maybe a heuristic for using PFOR could be whether an index is the backing index of a data stream? (mapperService.mappingLookup().isDataStreamTimestampFieldEnabled() would do the trick.)
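The suggested heuristic could look roughly like the hypothetical Java sketch below. Only the method name isDataStreamTimestampFieldEnabled() comes from the comment above; MappingInfo, PostingsEncoding and chooseEncoding are illustrative stand-ins, not Elasticsearch APIs, and a real implementation would live in the codec's per-field postings-format selection.

```java
// Hypothetical sketch of the proposed heuristic; none of these types are Elasticsearch APIs.
public class PostingsHeuristic {
    /** Stand-in for mapperService.mappingLookup(); only the method name is taken from the comment. */
    interface MappingInfo {
        boolean isDataStreamTimestampFieldEnabled();
    }

    /** Hypothetical choice of doc-ID encoding for an index's postings. */
    enum PostingsEncoding { FOR, PFOR }

    static PostingsEncoding chooseEncoding(MappingInfo mapping) {
        // Assumed rationale: data stream backing indices benefit most from the smaller
        // PFOR encoding; other indices keep Lucene 9.9's default FOR doc-ID encoding.
        return mapping.isDataStreamTimestampFieldEnabled()
                ? PostingsEncoding.PFOR
                : PostingsEncoding.FOR;
    }

    public static void main(String[] args) {
        MappingInfo dataStreamBackingIndex = () -> true;
        MappingInfo regularIndex = () -> false;
        System.out.println(chooseEncoding(dataStreamBackingIndex)); // PFOR
        System.out.println(chooseEncoding(regularIndex));           // FOR
    }
}
```

Keying the decision on the index mapping, rather than on each field, keeps the choice stable for every segment of the same index.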
