Replace the ConcurrentOpenHashMap as much as possible #23215

Closed
6 tasks done
BewareMyPower opened this issue Aug 23, 2024 · 9 comments

@BewareMyPower
Contributor

BewareMyPower commented Aug 23, 2024

PR list

Search before asking

  • I searched in the issues and found nothing similar.

Motivation

There was a discussion in Sep. 2023 about replacing the customized maps (like ConcurrentOpenHashMap) with standard ones. In this issue, I'll focus on ConcurrentOpenHashMap.

Here is the only advantage of this map, quoting its Javadoc:

Provides similar methods as a {@code ConcurrentMap<K,V>} but since it's an open hash map with linear probing, no node allocations are required to store the values.

Let me count the disadvantages.

1. Bad performance

This map was added in the initial commit (in 2016). However, the implementation is essentially the Java 7 design of ConcurrentHashMap, which uses segment-based locking. The Java team abandoned that design in Java 8.

#20647 ran a benchmark and found the performance was much worse than the ConcurrentHashMap provided by the Java standard library. You can also search online for the pros of the Java 8 design, or just ask ChatGPT.

Besides, the frequently used keys() and values() methods copy all keys and values into a new list, while ConcurrentHashMap returns thread-safe views and leaves it up to the caller whether to make a copy.

Anyway, proving the performance is worse than ConcurrentHashMap requires more tests and research, so this is the least important reason.

2. Lack of updates

This class has rarely been updated. The only change I can remember is the shrink support added two years ago in #14663.

From apache/bookkeeper#3061, we can see the motivation was the frequent full GCs caused by this implementation. However, adding shrink support makes the map even harder to use. There are already many parameters to tune; see its builder:

    public static class Builder<K, V> {
        int expectedItems = DefaultExpectedItems;
        int concurrencyLevel = DefaultConcurrencyLevel;
        float mapFillFactor = DefaultMapFillFactor;
        float mapIdleFactor = DefaultMapIdleFactor;
        float expandFactor = DefaultExpandFactor;
        float shrinkFactor = DefaultShrinkFactor;
        boolean autoShrink = DefaultAutoShrink;

There are many xxxFactor parameters plus the concurrency level. It's hard to determine proper default values, and it also makes the class hard for new developers to modify.
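By contrast, the JDK map needs at most an initial capacity; since Java 8 the concurrencyLevel constructor argument is treated as nothing more than a sizing hint. A minimal sketch (the generic types are illustrative):

    // No fill/idle/expand/shrink factors and no concurrency level to tune: since Java 8
    // the map resizes itself and locks at bin granularity, so only the expected size matters.
    ConcurrentHashMap<String, Object> map = new ConcurrentHashMap<>(1024 /* expected items */);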

3. Bad debugging experience

Here is what I saw when debugging the topics map maintained in BrokerService:

[screenshot: debugger view of the ConcurrentOpenHashMap holding the topics, split into 16 sections]

As you can see, there are 16 sections, and I have to iterate over all of them and expand each table array to find the target topic.

Let's compare it with the official ConcurrentHashMap (I replaced it locally)

[screenshot: debugger view of the same topics map as a java.util.concurrent.ConcurrentHashMap]

Besides, it's even harder to analyze in the heap dump.

4. Not friendly to new developers

Many places just use it as an ordinary concurrent hash map. What reason do new developers have not to use the official ConcurrentHashMap, which is developed and continuously improved by a professional team? Just to reduce node allocations, given how much JVM GCs have improved?

As mentioned, this class was likely introduced in the Java 7 era. The minimum required Java version on the broker side is now 17. We have ZGC, we have Shenandoah, and many JVM developers keep building better collectors. I doubt the claimed advantage still matters.

I cannot think of a reason to choose this hard-to-maintain class over the well-maintained official ConcurrentHashMap.

For example, when I maintained KoP, I ran into a deadlock in ConcurrentLongHashMap (a similar implementation), see streamnative/kop#620, and it was hard to know whether that case had been fixed. So I had to switch to the official ConcurrentHashMap.

Solution

Replace ConcurrentOpenHashMap with the official Java ConcurrentHashMap.

Alternatives

N/A

Anything else?

Java Concurrent HashMap Improvements over the years https://medium.com/@vikas.taank_40391/java-concurrent-hashmap-improvements-over-the-years-8d8b7be6ce37

Are you willing to submit a PR?

  • I'm willing to submit a PR!
@BewareMyPower added the type/enhancement label Aug 23, 2024
@BewareMyPower BewareMyPower self-assigned this Aug 23, 2024
@BewareMyPower
Contributor Author

BewareMyPower commented Aug 24, 2024

#12729 shows that ConcurrentOpenHashMap allows null values. It adds a public removeNullValue method to handle them, which is very error-prone.

Here is an example:

        // java.util.concurrent.ConcurrentHashMap: a null mapping result means no entry is recorded
        final var map = new ConcurrentHashMap<String, String>();
        map.computeIfAbsent("A", __ -> null);
        System.out.println(map.size());

        // ConcurrentOpenHashMap: the null value is actually stored as an entry
        final var map2 = new ConcurrentOpenHashMap<String, String>();
        map2.computeIfAbsent("A", __ -> null);
        System.out.println(map2.size());

Outputs:

0
1

The caller of ConcurrentOpenHashMap#computeIfAbsent has to manually check the return value and call removeNullValue to handle the case where a null value was inserted, as sketched below. What a terrible API design :(
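For illustration, the extra handling such a caller needs looks roughly like this (a sketch, assuming the removeNullValue(key) method added by #12729; loadValueOrNull is a hypothetical loader that may return null):

    // With ConcurrentOpenHashMap, a provider that returns null still creates a mapping,
    // so the caller has to detect that and clean it up afterwards.
    String value = map2.computeIfAbsent("A", __ -> loadValueOrNull("A")); // hypothetical loader
    if (value == null) {
        map2.removeNullValue("A"); // assumed signature from #12729
    }
    // With java.util.concurrent.ConcurrentHashMap, a null result simply means no entry is created.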

@BewareMyPower
Contributor Author

I added some benchmarks with different thread counts (by modifying the BenchmarkState.numThreads field).

TL;DR

There is no reason to use ConcurrentOpenHashMap for the sake of performance.

computeIfAbsent

https://gist.github.com/BewareMyPower/1083937e30cb0f5a63be74c4ee9c5559

  1. Generate 10000 keys in range [0, 100) in advance
  2. Call computeIfAbsent in N threads on all keys (a minimal JMH sketch of this setup follows)
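The full harness is in the gist linked above; the sketch below only illustrates the shape of the computeIfAbsent benchmark under the two steps described (class and benchmark names match the result tables, but the body is an assumption, not the gist's code; the ConcurrentOpenHashMap and synchronized-map variants follow the same pattern).

    import java.util.List;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ThreadLocalRandom;
    import java.util.function.Consumer;
    import java.util.stream.IntStream;
    import org.openjdk.jmh.annotations.*;

    @State(Scope.Benchmark)
    public class ConcurrentHashMapTest {

        int numThreads = 2;       // edited by hand for each run (2, 4, 8, 16)
        List<Integer> keys;       // 10000 keys drawn from [0, 100)
        ExecutorService executor;

        @Setup(Level.Trial)
        public void setup() {
            keys = IntStream.range(0, 10_000)
                    .mapToObj(i -> ThreadLocalRandom.current().nextInt(100))
                    .toList();
            executor = Executors.newFixedThreadPool(numThreads);
        }

        @TearDown(Level.Trial)
        public void tearDown() {
            executor.shutdownNow();
        }

        // One "op" = every thread calling computeIfAbsent on all 10000 keys once.
        @Benchmark
        @BenchmarkMode(Mode.Throughput)
        public void testOfficialConcurrentHashMap() throws InterruptedException {
            final var map = new ConcurrentHashMap<Integer, Integer>();
            runInAllThreads(key -> map.computeIfAbsent(key, k -> k));
        }

        // Runs the given operation over all pre-generated keys in numThreads threads.
        private void runInAllThreads(Consumer<Integer> op) throws InterruptedException {
            final var latch = new CountDownLatch(numThreads);
            for (int i = 0; i < numThreads; i++) {
                executor.execute(() -> {
                    keys.forEach(op);
                    latch.countDown();
                });
            }
            latch.await();
        }
    }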

2 threads

Benchmark                                             Mode  Cnt     Score      Error  Units
ConcurrentHashMapTest.testConcurrentOpenHashMap      thrpt    5  2875.969 ±  189.894  ops/s
ConcurrentHashMapTest.testOfficialConcurrentHashMap  thrpt    5  5252.836 ± 1312.977  ops/s
ConcurrentHashMapTest.testSynchronizedHashMap        thrpt    5   996.854 ±   21.998  ops/s

ConcurrentHashMap is 1.8x faster than the ConcurrentOpenHashMap.

4 threads

Benchmark                                             Mode  Cnt     Score      Error  Units
ConcurrentHashMapTest.testConcurrentOpenHashMap      thrpt    5  1460.855 ±  152.361  ops/s
ConcurrentHashMapTest.testOfficialConcurrentHashMap  thrpt    5  3750.708 ± 1324.286  ops/s
ConcurrentHashMapTest.testSynchronizedHashMap        thrpt    5   270.024 ±   11.736  ops/s

ConcurrentHashMap is 2.5x faster than the ConcurrentOpenHashMap.

8 threads

Benchmark                                             Mode  Cnt     Score     Error  Units
ConcurrentHashMapTest.testConcurrentOpenHashMap      thrpt    5   648.256 ± 278.038  ops/s
ConcurrentHashMapTest.testOfficialConcurrentHashMap  thrpt    5  2383.684 ± 663.103  ops/s
ConcurrentHashMapTest.testSynchronizedHashMap        thrpt    5   130.503 ±   4.253  ops/s

ConcurrentHashMap is 3.6x faster than ConcurrentOpenHashMap.

16 threads

Benchmark                                             Mode  Cnt     Score     Error  Units
ConcurrentHashMapTest.testConcurrentOpenHashMap      thrpt    5   259.008 ± 204.781  ops/s
ConcurrentHashMapTest.testOfficialConcurrentHashMap  thrpt    5  1443.469 ± 368.230  ops/s
ConcurrentHashMapTest.testSynchronizedHashMap        thrpt    5    59.875 ±   2.984  ops/s

ConcurrentHashMap is 5.5x faster than ConcurrentOpenHashMap

Conclusion

ConcurrentHashMap clearly performs much better, and the gap widens as the number of threads increases.

get

https://gist.github.com/BewareMyPower/b6254ec64932c3cf86b6ec7433a631da

  1. Generate 5000 random keys (might be duplicated) in range [0, 10000) and insert them
  2. Call get on each key from 0 to 10000 (inclusive) in N threads

2 threads

Benchmark                                                Mode  Cnt     Score      Error  Units
ConcurrentHashMapGetTest.testConcurrentOpenHashMap      thrpt    5  3741.910 ±  206.337  ops/s
ConcurrentHashMapGetTest.testOfficialConcurrentHashMap  thrpt    5  5807.285 ± 1896.396  ops/s
ConcurrentHashMapGetTest.testSynchronizedHashMap        thrpt    5   870.238 ±   66.370  ops/s

4 threads

Benchmark                                                Mode  Cnt     Score      Error  Units
ConcurrentHashMapGetTest.testConcurrentOpenHashMap      thrpt    5  3485.099 ±  269.270  ops/s
ConcurrentHashMapGetTest.testOfficialConcurrentHashMap  thrpt    5  3777.292 ± 1120.207  ops/s
ConcurrentHashMapGetTest.testSynchronizedHashMap        thrpt    5   256.690 ±   16.778  ops/s

8 threads

Benchmark                                                Mode  Cnt     Score     Error  Units
ConcurrentHashMapGetTest.testConcurrentOpenHashMap      thrpt    5  2318.738 ± 135.056  ops/s
ConcurrentHashMapGetTest.testOfficialConcurrentHashMap  thrpt    5  2568.236 ± 828.029  ops/s
ConcurrentHashMapGetTest.testSynchronizedHashMap        thrpt    5   119.749 ±   2.051  ops/s

16 threads

Benchmark                                                Mode  Cnt     Score     Error  Units
ConcurrentHashMapGetTest.testConcurrentOpenHashMap      thrpt    5  1711.945 ± 128.223  ops/s
ConcurrentHashMapGetTest.testOfficialConcurrentHashMap  thrpt    5  1459.255 ± 431.960  ops/s
ConcurrentHashMapGetTest.testSynchronizedHashMap        thrpt    5    59.398 ±   2.085  ops/s

32 threads

Benchmark                                                Mode  Cnt     Score     Error  Units
ConcurrentHashMapGetTest.testConcurrentOpenHashMap      thrpt    5   863.625 ±  36.957  ops/s
ConcurrentHashMapGetTest.testOfficialConcurrentHashMap  thrpt    5  1001.633 ± 222.766  ops/s
ConcurrentHashMapGetTest.testSynchronizedHashMap        thrpt    5    28.798 ±   4.943  ops/s

Conclusion

When the number of threads is small, ConcurrentHashMap performs better, though the two are close. Notably, with 16 threads ConcurrentOpenHashMap performs better, which might be because its default concurrency level is 16. With 32 threads, ConcurrentHashMap performs better again.

@eolivelli
Contributor

I support this initiative.
It is a little bit scary to change this everywhere; maybe we can have a few smaller PRs?

@BewareMyPower
Contributor Author

Okay, let me split #23216 into multiple PRs later.

@BewareMyPower
Contributor Author

#23217 is the first PR, replacing the ConcurrentOpenHashMap in the load manager. However, it's a bug fix rather than an improvement, because the current use of the concurrent hash map there is wrong.

@geniusjoe

It might be a big change. Is it possible to keep the ConcurrentOpenHashMap interface and use ConcurrentHashMap as its underlying implementation? That way we might only need to modify the ConcurrentOpenHashMap file.

@BewareMyPower
Contributor Author

That approach would leave much more garbage code in Pulsar.

The idea of reducing code changes sounds good, but ConcurrentOpenHashMap is not fully compatible with ConcurrentHashMap. Let me go through the incompatibilities one by one.

  1. Constructor changes.
                ConcurrentOpenHashMap.<TopicName,
                        PersistentOfflineTopicStats>newBuilder()./* ... */build();

You still need to keep the builder with all meaningless parameters like concurrencyLevel, which is confusing.

  2. Non-standard keys() and values() APIs
    public List<K> keys() {
        List<K> keys = new ArrayList<>((int) size());
        forEach((key, value) -> keys.add(key));
        return keys;
    }

    public List<V> values() {
        List<V> values = new ArrayList<>((int) size());
        forEach((key, value) -> values.add(value));
        return values;
    }

ConcurrentHashMap does not provide these methods because of the overhead of copying keys or values into a new collection (a migration sketch follows this list).

  3. The removeNullValue method, which is really wrong and should not exist.
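Regarding point 2, here is a sketch of what the migration looks like at a typical call site (the map and its types are illustrative): copy the view only when a snapshot is really needed, otherwise iterate it directly.

    var map = new ConcurrentHashMap<String, String>(); // illustrative types
    // ConcurrentHashMap exposes weakly consistent, thread-safe views of its keys and values.
    List<String> keys = List.copyOf(map.keySet());     // replaces ConcurrentOpenHashMap#keys()
    List<String> values = List.copyOf(map.values());   // replaces ConcurrentOpenHashMap#values()
    // Or avoid the copy entirely and iterate the view / use forEach:
    map.forEach((key, value) -> System.out.println(key + " -> " + value));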

So just let me split the huge PR into multiple relatively small PRs.

@geniusjoe

geniusjoe commented Sep 19, 2024

OK, it sounds like we'd better replace ConcurrentOpenHashMap entirely.

@BewareMyPower
Contributor Author

Now all PRs are merged, so Pulsar 4.0.0 won't have any ConcurrentOpenHashMap. Closing this issue.
