Replace the ConcurrentOpenHashMap as much as possible #23215
Comments
#12729 shows the
Here is an example:
final var map = new ConcurrentHashMap<String, String>();
map.computeIfAbsent("A", __ -> null);
System.out.println(map.size());
final var map2 = new ConcurrentOpenHashMap<String, String>();
map2.computeIfAbsent("A", __ -> null);
System.out.println(map2.size());
Outputs:
The caller of |
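The printed outputs above were not preserved. On the JDK side, `ConcurrentHashMap.computeIfAbsent` is documented to record no mapping when the mapping function returns `null`. The following minimal sketch checks only that JDK behavior; it does not touch `ConcurrentOpenHashMap`, and the class and method names are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;

public class ComputeIfAbsentNullDemo {
    // Calls computeIfAbsent with a mapping function that returns null and
    // reports how many entries the map holds afterwards.
    static int sizeAfterNullMapping() {
        ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
        String result = map.computeIfAbsent("A", key -> null);
        // JDK contract: the null result is returned to the caller and
        // no entry is inserted, so "A" stays absent.
        if (result != null || map.containsKey("A")) {
            throw new IllegalStateException("unexpected mapping for A");
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeAfterNullMapping()); // prints 0
    }
}
```

Callers that rely on `computeIfAbsent` therefore have to treat a `null` return as "nothing was stored".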
I added some benchmarks with different numbers of threads (by modifying the
TL;DR There is no reason to use
computeIfAbsent: https://gist.github.com/BewareMyPower/1083937e30cb0f5a63be74c4ee9c5559
2 threads
4 threads
8 threads
16 threads
Conclusion
get: https://gist.github.com/BewareMyPower/b6254ec64932c3cf86b6ec7433a631da
2 threads
4 threads
8 threads
16 threads
32 threads
Conclusion
When the number of threads is small, |
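The benchmark code itself lives only in the linked gists. Apart from throughput, one property worth keeping in mind when comparing `computeIfAbsent` implementations under contention is that the JDK documents `ConcurrentHashMap.computeIfAbsent` as atomic: the mapping function is applied at most once per key. The sketch below is not the gist benchmark; it is a hypothetical check (class name and key are assumptions) that runs the same thread counts against one key and counts how often the mapping function runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ComputeIfAbsentContention {
    // Starts `threads` tasks that call computeIfAbsent on the same key at
    // about the same time and returns how many times the mapping function ran.
    static int mappingFunctionCalls(int threads) throws Exception {
        ConcurrentHashMap<String, Object> map = new ConcurrentHashMap<>();
        AtomicInteger calls = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Object>> results = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                results.add(pool.submit(() -> {
                    start.await(); // every task waits here until released
                    return map.computeIfAbsent("topic", k -> {
                        calls.incrementAndGet();
                        return new Object();
                    });
                }));
            }
            start.countDown(); // release all tasks together
            Object first = results.get(0).get();
            for (Future<Object> f : results) {
                // Every caller must observe the same instance.
                if (f.get() != first) {
                    throw new IllegalStateException("callers saw different values");
                }
            }
        } finally {
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.SECONDS);
        }
        return calls.get();
    }

    public static void main(String[] args) throws Exception {
        for (int threads : new int[] {2, 4, 8, 16}) {
            System.out.println(threads + " threads -> "
                    + mappingFunctionCalls(threads) + " call(s)");
        }
    }
}
```

Each line should report exactly one call, because a fresh map is used per run and the mapping function never returns `null`.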
I support this initiative. |
Okay, let me split #23216 into multiple PRs later. |
#23217 is the first PR to replace the |
It might be a big change. Is it possible to keep |
It will produce much more garbage code in Pulsar. The idea of reducing code changes seems good, but the
ConcurrentOpenHashMap.<TopicName, PersistentOfflineTopicStats>newBuilder()./* ... */build();
You would still need to keep the builder with all its meaningless parameters, like
public List<K> keys() {
List<K> keys = new ArrayList<>((int) size());
forEach((key, value) -> keys.add(key));
return keys;
}
public List<V> values() {
List<V> values = new ArrayList<>((int) size());
forEach((key, value) -> values.add(value));
return values;
}
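With the official `ConcurrentHashMap`, neither helper has to be carried over: `keySet()` and `values()` return live views with weakly consistent iterators, and the few callers that need a snapshot can copy them explicitly. A minimal sketch under that assumption; the class, method names and keys are hypothetical, not Pulsar code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class KeysValuesSketch {
    // Copying equivalents of the keys()/values() helpers quoted above,
    // for callers that really need a snapshot list.
    static <K, V> List<K> keysSnapshot(Map<K, V> map) {
        return new ArrayList<>(map.keySet());
    }

    static <K, V> List<V> valuesSnapshot(Map<K, V> map) {
        return new ArrayList<>(map.values());
    }

    // Reads through the live values() view without copying anything.
    static long sumValues(Map<String, Long> map) {
        long total = 0;
        for (long v : map.values()) { // weakly consistent, never throws CME
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        // A plain constructor replaces the builder: no concurrency level
        // or fill/shrink factors have to be chosen.
        Map<String, Long> stats = new ConcurrentHashMap<>();
        stats.put("topic-a", 1L);
        stats.put("topic-b", 2L);

        System.out.println(sumValues(stats));           // prints 3
        System.out.println(keysSnapshot(stats).size()); // prints 2
    }
}
```

The copy becomes an explicit choice at the call site instead of a cost paid on every `keys()`/`values()` call.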
So just let me split the huge PR into multiple relatively small PRs. |
OK. It sounds like we had better replace the whole |
Now all PRs are merged so that Pulsar 4.0.0 won't have any |
PR list
Search before asking
Motivation
There was a discussion in Sep. 2023 about replacing Customized Map with ConcurrentOpenHashMap. In this issue, I'd like to focus on ConcurrentOpenHashMap.
Here is the only advantage of this map.
Let me list the disadvantages.
1. Bad performance
This map was added in the initial commit (in 2016). However, its implementation was based on the Java 7 implementation of ConcurrentHashMap, which uses segment-based locking. The Java team abandoned this design in Java 8. #20647 ran a benchmark and found that its performance was much worse than that of the current ConcurrentHashMap provided by the Java standard library. You can also search online for the advantages of the Java 8 design, or just ask ChatGPT.
Besides, the frequently used keys() and values() methods just copy the keys and values into a new list, while ConcurrentHashMap returns thread-safe views of its internal table, so users can choose whether to make a copy.
Anyway, proving that the performance is worse than ConcurrentHashMap would require more tests and research, so this is the least important reason.
2. Lack of updates
This class has rarely been updated. The only change I can remember is the shrink support added two years ago (#14663).
From apache/bookkeeper#3061, we can see that the motivation was the frequent Full GCs caused by this implementation. However, adding a shrink method makes the class harder to use. There are already many parameters to tune; see its builder:
Many xxxFactors and the concurrency level. It's hard to determine proper default values, and it makes the class hard for new developers to modify.
3. Bad debug experience
Here is what I saw when I debugged the topics maintained in a BrokerService.
As you can see, there are 16 sections, and I have to iterate over all of them and expand the table array to find the target topic.
Let's compare it with the official ConcurrentHashMap (I replaced it locally).
Besides, it's even harder to analyze in a heap dump.
4. Not friendly to new developers
Many places just use it as a concurrent hash map. What reason would new developers have not to use the official ConcurrentHashMap, which is developed and continuously improved by a professional team? Just to reduce node allocation, even as the JVM GC keeps improving?
As I've mentioned, this class may have been introduced in the Java 7 era. Now the minimum required Java version on the broker side is 17. We have ZGC. We have Shenandoah GC. We have many more JVM developers building better GCs. I doubt that this advantage still matters.
I cannot think of a reason to choose this hard-to-maintain class over the well-maintained official ConcurrentHashMap.
For example, when I maintained KoP, I encountered a deadlock in ConcurrentLongHashMap (which may have a similar implementation): streamnative/kop#620. It's hard to know whether this case has been fixed, so I had to switch to the official ConcurrentHashMap.
Solution
Replace ConcurrentOpenHashMap with the official Java ConcurrentHashMap.
Alternatives
N/A
Anything else?
Java Concurrent HashMap Improvements over the years https://medium.com/@vikas.taank_40391/java-concurrent-hashmap-improvements-over-the-years-8d8b7be6ce37
Are you willing to submit a PR?