-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Default to mmapfs within hybridfs #8508
Conversation
Gradle Check (Jenkins) Run Completed with:
|
Gradle Check (Jenkins) Run Completed with:
|
server/src/main/java/org/opensearch/common/settings/IndexScopedSettings.java
Outdated
Show resolved
Hide resolved
Gradle Check (Jenkins) Run Completed with:
|
Codecov Report
@@ Coverage Diff @@
## main #8508 +/- ##
============================================
- Coverage 71.19% 71.12% -0.08%
- Complexity 57455 57476 +21
============================================
Files 4777 4777
Lines 270712 270746 +34
Branches 39566 39572 +6
============================================
- Hits 192729 192555 -174
- Misses 61782 62052 +270
+ Partials 16201 16139 -62
|
@navneet1v, +1!! That seems right to me! I think the @msokolov I'm curious if you have thoughts about mmapping Lucene's |
@nknize - In addition to that, this should always have been the default behavior of hybrid directory. Given it replaced the MMapDirectory until 6.x.
My bad if I suggested so, but "Lucene" does not do the opposite (it uses MMap by default on all supported instance types). HybridDirectory is a concept introduced by Elasticsearch in 7.0, that was plagued with several bugs until 7.4.
We are not doing that with this PR!? The only difference being default is mmap instead of nio for unspecified files |
RIght... that's what I suggest we don't do.. if Lucene introduces a read once and forget file that we're not explicitly specifying, why would we default to mmap it? That doesn't make sense. If Lucene introduces a hot file that stays in the page cache, then we should explicitly MMap that file in the corresponding Lucene upgrade PR. |
@jainankitk That was the purpose of this comment in the old codebase:
I think that was lost in some OpenSearch PRs. |
It won't matter as the page cache will eventually evict that file. And, since it was just read once it cannot cause page cache thrashing |
Hi @nknize yes it makes sense to me to use MMAP for these vector files (.vec and .vex), and shouldn't be needed for .vem although I wonder if it would really make any difference to include those as well? |
Also, just an operational anecdote from managed service. I don't remember seeing any customer issues due to file being mmapped instead of niofs. But, remember several issues the other way due to FST Offheap (.tip) file being offheap in 7.1. @backslasht @Bukhtawar @shwetathareja @itiyamas @rishabhmaurya - Do you remember any managed service customer issue root caused to page cache thrashing and using niofs instead of mmap solved the issue? |
@mikecan - We have discussed this a bit offline as well. What do you think about keeping mmap as default fallback instead of niofs? Lucene uses mmap everywhere right? |
@msokolov we did try to add .vem file to mmap, but we went against that approach as we were not seeing the performance gains atleast on our benchmarks of 1M sift-128 dataset |
This is true. However,
+1 That's a good data point. However I had several customer issues where search time "randomly" took 60ms (violating a 50ms QoS requirement) because of cache misses on some shards (appearing random). It would be a shame if an increase in evictions was a cause? But there's always tradeoffs... Do we have any concrete testing to justify the default change? The point of #3837 was to make the mmapped files configurable so a user could always just add it. I know they can do the same here (just the other way around) so it's not a one way door, nor a hill I'd die on.. |
I wish we had concrete root cause for this. There are lot of other variables as well at play like gc, query type, etc.
The default change is primarily based on performance degradation, and no complains with 6.8 whatsoever. I also recently realized this, I had been blaming offheap all along, while it was working absolutely fine for Lucene. Of course, niofs to blame for this issue
Yeah, I am also in favor of keeping mmap default and revisit if it causes any regressions (don't expect any based on Lucene and 6.8 experience). Also in the short term, the default Opensearch behavior is not changing at all. Hopefully, it will keep someone like me from multi-year wild goose chase, just because I missed BufferedIndexInput in the flame graph (thanks @mikecan!) |
We do. It's an AWS issue and the related open source issue that was opened is #1536
I think your talking about Daniel's commit here: elastic/elasticsearch@f0052b1 But that came in 7.0.0 and also defaults to MMap for hot index files while leaving metadata files to nio. Or are you talking about @mikecan's issue elastic/elasticsearch#16983 which also was primarily focused on the compound file format (hot index files)? I'm not sure this is reason to default everything to MMap? But it's true that small files could just get evicted so I'm not sure there's a strong justification either way. In those circumstances I like to take a more conservative approach and lean into good benchmark testing before changing defaults. That's why I suggest we revert this for 2.10 and lean into benchmark testing for 2.11. Update:
Oh I see the motivation now. That chase leading to default everything MMap and only revisit nio if necessary. That makes sense and thanks for the context! Hmmm. I'm not sure I'm completely convinced on that blanket change as a prevention mechanism? But y'all are running this at some good scale so maybe there's some numbers that can help guide the choice a bit better? @mikecan do you think performance hit due to any cache evictions from mmapping files by default are negligible? We'd only hit this if Lucene introduces new files that aren't explicitly mmapped, which shouldn't be often. |
* Default to mmapfs within hybridfs Signed-off-by: David Zane <davizane@amazon.com> * Add index setting validation func Signed-off-by: David Zane <davizane@amazon.com> * Reviewer comments Signed-off-by: David Zane <davizane@amazon.com> * Clean up, mmap.extensions validation Signed-off-by: David Zane <davizane@amazon.com> * Deprecation flag, build all_ext list Signed-off-by: David Zane <davizane@amazon.com> * Make nioExtensions unmodifiable Signed-off-by: David Zane <davizane@amazon.com> --------- Signed-off-by: David Zane <davizane@amazon.com> Signed-off-by: Kiran Reddy <kkreddy@amazon.com>
We are not doing that, the default behavior for all existing file extensions is same as before. Its just instead of specifying files to mmap, user specifies files to nio, else it is mmapped. The default list of nio files ensures no behavior change between say 2.9 (without this PR) and 2.10 (with the PR)
What are we benchmarking against? Since the behavior has not changed, benchmark won't show anything!? |
Sorry. I should've said "I'm not sure this is reason to default everything <that's not explicitly marked> as mmapped". I know the change does not switch all files to be mmapped. It's selecting
Right!! That's partly my point. Why make the change at all then if we don't show any benefit? Alternatively I think we can possibly simulate something? Maybe a micro benchmark that introduces periodic page cache evictions and measures an impact on search? Maybe that will be too contrived? Not sure but this is a good discussion to have before we decide to release it as a core change. |
The rationale here is that in the hypothetical situation of a new type of Lucene file, it's better to err on the side of mmap than nio. It's hard to quantify that because hypothetical situations can be constructed either way. The one concrete data point we have from @jainankitk is #825 where "err on the side of mmap" would have been the better behavior. I think the ideal situation here is that HybridFsDirectory would have the ability to detect a previously unknown Lucene file, and then cause a failure somewhere in the CI pipeline (unit or integration tests or nightly benchmarks or (worst case) release testing). Then we would be forced to make an explicit choice about how to treat this file and the default behavior would not matter. Is that feasible? I think we'd want that behavior in any case because it is better to make an informed decision versus relying on the default. |
There are 2 sides to the what if. And IMO, other side is much more disastrous. Also, given Lucene itself uses MMapDirectory by default, the other way is more likely possibility as it would be completely untested by Lucene microbenchmarks.
While I agree on this change not being urgent, I am not sure if we will have any micro benchmark in near future. It does seem too contrived! :) Also, I will rephrase as no short term benefit, which is the case with most code refactoring/rewrite
+1. Although, even while we are making the explicit choice (taking time to run benchmarks/tests, etc.), we need default fallback. And IMO, mmap is still better than nio in most cases for that duration |
Except we don't have anything to substantiate this claim. So making the change is based on conjecture. There are Elasticsearch issues to substantiate the reason for selecting certain files to NIOFs as well.. otherwise everything would be MMap in Elasticsearch and this would be a non-issue. So I"m not so quick to take that assumption on blind faith. I like to lean into actual testing to substantiate. Paper cuts add up fast... that doesn't seem to be the sentiment lately for some strange reason.
Why? I think calling it disastrous is a stretch given that MMap is still a configurable option. Remember, we can still set files to default to MMap when we upgrade lucene versions. Whether we decide to keep this blind change or not. I think this project needs to start leaning into doing a better job substantiating performance claims before unilaterally changing on faith. |
@andrross, to your point
It was suggested on pull request #8508 (comment) but not taken forward |
@nknize , @jainankitk , @reta @andrross There is long discussion happening on this feature and I am not sure if I am able to understand all of it. But one thing which atleast I am looking here is, are we reverting this change or not. Given that the release for 2.10 is in few days, and it has impact on the builds and performance. I want to make sure that no surprises are seen at the end during the release stage. Next steps as per my understanding: |
@navneet1v since there are concerns with the a) feature itself b) the approach that led to this particular implementation, in my opinion reverting it before the 2.10.0 release and restarting the discussion on what is the way to implement it taken all the risks / gains into account is the way to go, @nknize @andrross @jainankitk please share you opinion |
I gave this bit more thought and mmap should be kept as fallback default in hybrid directory for following reasons (probably mentioned separately, collating them together):
To me, mmap seems better fallback choice when we don't know much about the file access pattern. Specific users that are resource constrained can always use niofs explicitly.
Fair enough, disastrous is strong word. I take that back. Probably, still hurting from FST offheap! :(
I think this towards both zstd compression and this issue. Although, they are much different things. As per me, falling back to niofs for hybrid directory was a mistake in Elasticsearch 7.0 itself. I do not see any convincing data points in any of the PRs or issues. Recently learnt about it, and trying to address it for long term. |
I am okay with either approach. Although, we should agree on concerns and what data points we are looking for. Also, given this is settings addition / removal PR, committing / reverting / re-committing, sounds tricky to me. But maybe, it is simpler than how it seems in my head. |
In addition to the fallback option, we can log warning whenever new file extension is encountered. It should force us to evaluate every “new” file we encounter. Thank you @andrross for the logging suggestion. |
This is all mechanical work. It may be annoying, but it is simple. The new setting is much more of a one way door though. Once it goes out into the 2.10 release, we're committing to support it through at least the 3.x releases.
Agreed we likely will get limited coverage in unit and integration tests. We'd probably want to add something to nightly benchmarks to fail or cut issues on warnings in log statements.
This is the key point for me. What happens to the resource constrained user if nio was actually the better choice but we mmap'd it? If the answer is that performance is a little bit worse, then mmap-by-default is reasonable. But if there is real risk of the OS OOM-killer killing the node, then maybe nio by default is the better choice. @jainankitk What do you think? |
That's a good point!
There is no risk of OOM-killer afaik. Just the performance is worse as OS spends more time figuring out if there is empty page. The same is confirmed by this comment in ES PR |
* Default to mmapfs within hybridfs Signed-off-by: David Zane <davizane@amazon.com> * Add index setting validation func Signed-off-by: David Zane <davizane@amazon.com> * Reviewer comments Signed-off-by: David Zane <davizane@amazon.com> * Clean up, mmap.extensions validation Signed-off-by: David Zane <davizane@amazon.com> * Deprecation flag, build all_ext list Signed-off-by: David Zane <davizane@amazon.com> * Make nioExtensions unmodifiable Signed-off-by: David Zane <davizane@amazon.com> --------- Signed-off-by: David Zane <davizane@amazon.com> Signed-off-by: Kaushal Kumar <ravi.kaushal97@gmail.com>
* Default to mmapfs within hybridfs Signed-off-by: David Zane <davizane@amazon.com> * Add index setting validation func Signed-off-by: David Zane <davizane@amazon.com> * Reviewer comments Signed-off-by: David Zane <davizane@amazon.com> * Clean up, mmap.extensions validation Signed-off-by: David Zane <davizane@amazon.com> * Deprecation flag, build all_ext list Signed-off-by: David Zane <davizane@amazon.com> * Make nioExtensions unmodifiable Signed-off-by: David Zane <davizane@amazon.com> --------- Signed-off-by: David Zane <davizane@amazon.com> Signed-off-by: Ivan Brusic <ivan.brusic@flocksafety.com>
* Default to mmapfs within hybridfs Signed-off-by: David Zane <davizane@amazon.com> * Add index setting validation func Signed-off-by: David Zane <davizane@amazon.com> * Reviewer comments Signed-off-by: David Zane <davizane@amazon.com> * Clean up, mmap.extensions validation Signed-off-by: David Zane <davizane@amazon.com> * Deprecation flag, build all_ext list Signed-off-by: David Zane <davizane@amazon.com> * Make nioExtensions unmodifiable Signed-off-by: David Zane <davizane@amazon.com> --------- Signed-off-by: David Zane <davizane@amazon.com> Signed-off-by: Shivansh Arora <hishiv@amazon.com>
Description
Currently OpenSearch code contains an explicit list of file extensions which load using mmap from hybridfs. Other file extensions default to nio. This PR flips the logic to keep a list of nio file extensions while all others default to mmap. This will prevent any future regressions in case Lucene adds new segment file type that should be using mmap.
Related Issues
Resolves #8297
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.