Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PIP 79][common,broker,client] Change PartitionedTopicStats to support partial partitioned producer #10534

Closed
wants to merge 6 commits into from

Conversation

equanz
Copy link
Contributor

@equanz equanz commented May 11, 2021

Master Issue: https://github.com/apache/pulsar/wiki/PIP-79%3A-Reduce-redundant-producers-from-partitioned-producer

Motivation

Please see the PIP document.
This is a part of implementations. Before review it, please check #10279. After #10279 is merged, I will rebase to top of master branch.

Modifications

  • Change PartitionedTopicStats to support partial partitioned producer like this(#10279)

For backward compatibility reason, I introduce new producer property producerStatsKey. If this property is same, then associate publisher stats as same producer at partitioned producer stats.

I think about using producerName instead of introducing producerStatsKey, but I think we should not use it. Because currently producerName is configured by the system(different for each partition) or yourself(same between all partitions). Therefore, if we use producerName as association key, we should change system behavior (I think this is breaking changes) or request to user that "you should set producerName yourself if you want to associate publisher stats per partitioned producer".

If any suggestions, please comment to this PR.

Verifying this change

  • Make sure that the change passes the CI checks.

This change added tests and can be verified as follows:

  • Added test for partitioned producer stats

Does this pull request potentially affect one of the following parts:

If yes was chosen, please highlight the changes

Documentation

  • Does this pull request introduce a new feature? (no)

@nkurihar nkurihar added area/broker area/client type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages type/PIP labels May 11, 2021
@Vanlightly
Copy link
Contributor

@equanz Could you elaborate on what the problem is? If each partitioned producer creates producers on a subset of partitions, how do the stats become incorrect?

@equanz
Copy link
Contributor Author

equanz commented Aug 13, 2021

@Vanlightly Of course. I think the main issue is stats collecting logic.

if (this.publishers.size() != stats.publishers.size()) {
for (int i = 0; i < stats.publishers.size(); i++) {
PublisherStatsImpl publisherStats = new PublisherStatsImpl();
this.publishers.add(publisherStats.add(stats.publishers.get(i)));
}
} else {
for (int i = 0; i < stats.publishers.size(); i++) {
this.publishers.get(i).add(stats.publishers.get(i));
}
}

It causes these issues.

  1. If each partition has different number of producers, then calculate publisher stats separetelly
    • e.g. if full partitioned producer and partial partitioned producer(or simply single producer) are connected to the partitioned topic which has 2 partitions, then broker calclulates 3 publisher stats each other
    • expect: each partitioned producer is accumulated as single publisher stats
  2. If each partition has same number of producer, then calculate publisher stats as single partitioned producer
    • e.g. if each partition has single producer(all producers were created separetelly), then broker calculates publisher stats as single publisher
    • expect: all producers are accumulated separetelly

@Vanlightly
Copy link
Contributor

@equanz Let me know if I understand correctly. We publish aggregated stats, where we aggregate by partitioned producer, and optionally we also publish stats per partition with no aggregation. So the issue is that currently the stats collection code expects each partition to have the same number of producers on each partition, and uses that as a way of aggregating the stats for each partitioned producer.

So we need another way of aggregating by partitioned producer. Moreover, it looks to me like the current strategy is already broken seeing as the numbers won't always match (neither numbers nor order) such as when producers come and go.

Using producer name as an aggregation key is not good because if the producer name is not set by the user, then each internal producer will have a different producer name (generated by the broker).

So the fix is to add another producer identity (producerStatsKey) that is shared across all producers of the same partitioned producer. Then aggregate on that.

The producerStatsKey is set by the broker, to ensure it is globally unique. The partitioned producer creates a single producer when the partitioned producer is started. Then when additional internal producers for other partitions are required, they are created with the same producerStatsKey as the first producer. That way they all share the same key.

Is that summary correct?

Additional question:

  • Could we not use the producerName for aggregation and use the same strategy of starting a single producer, then making all others inherit their name from the first (when the user doesn't set the name)?

@equanz
Copy link
Contributor Author

equanz commented Aug 17, 2021

@Vanlightly

Moreover, it looks to me like the current strategy is already broken seeing as the numbers won't always match (neither numbers nor order) such as when producers come and go.

I think so too. Therefore, not only partial producer stats, but also total producer stats will be calculated correctly by this feature.

Is that summary correct?

Yes.

Could we not use the producerName for aggregation and use the same strategy of starting a single producer, then making all others inherit their name from the first (when the user doesn't set the name)?

We can choose the strategy described above, but I think it will break some existing behavior(currently, internal producer's name aren't same. / if we use the same producerName in the partitioned topic, then they will be aggregated.).

If the change described above is approved in the community, I'll change the strategy.

@equanz
Copy link
Contributor Author

equanz commented Sep 13, 2021

I'll try to implement this feature based on above comment (use the producerName for aggregation and use the same strategy of starting a single producer).

@equanz
Copy link
Contributor Author

equanz commented Feb 7, 2022

Close this PR because #12401 has been merged.

@equanz equanz closed this Feb 7, 2022
@equanz equanz deleted the pip_79_partitioned_stats_pr branch February 7, 2022 07:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/broker area/client type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages type/PIP
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants