Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[monitoring][broker][metadata] add metadata store metrics #17041

Merged
merged 15 commits into from
Sep 20, 2022

Conversation

tjiuming
Copy link
Contributor

Motivation

there is no direct metrics for MetadataStore, the PR is supposed to fix it.

Documentation

Check the box below or label this PR directly.

Need to update docs?

  • doc-required
    (Your PR needs to update docs and you will update later)

  • doc-not-needed
    (Please explain why)

  • doc
    (Your PR contains doc changes)

  • doc-complete
    (Docs have been already added)

@codelipenghui codelipenghui added type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages area/metrics area/metadata labels Aug 11, 2022
@codelipenghui codelipenghui added this to the 2.12.0 milestone Aug 11, 2022
Copy link
Contributor

@codelipenghui codelipenghui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also consider to add metrics for the AbstractBatchedMetadataStore, it will help us to tune the batch operations.

@codelipenghui
Copy link
Contributor

@asafm Please help review this PR.

@asafm
Copy link
Contributor

asafm commented Aug 11, 2022

@codelipenghui done

@tjiuming
Copy link
Contributor Author

@asafm @codelipenghui there will be more than 1 metadataStore in a broker, when I develop #17072, I think record its metrics separately is better, so I've updated the PR. PTAL

@tjiuming tjiuming requested review from codelipenghui and asafm and removed request for codelipenghui and asafm August 18, 2022 17:28
@asafm
Copy link
Contributor

asafm commented Aug 31, 2022

@asafm Please help review this PR

@tjiuming done

@asafm
Copy link
Contributor

asafm commented Sep 4, 2022

I commented @tjiuming
Aside from style notations, there are 3 main things I would improve:

  1. Metadata store name is not indicative enough - see comment
  2. Add latency for failed ops - see comment
  3. Move tests out of PrometheusMetricsTest - see comment

@tjiuming
Copy link
Contributor Author

tjiuming commented Sep 5, 2022

I commented @tjiuming Aside from style notations, there are 3 main things I would improve:

  1. Metadata store name is not indicative enough - see comment
  2. Add latency for failed ops - see comment
  3. Move tests out of PrometheusMetricsTest - see comment

for 1, if I change protected AbstractMetadataStore() to protected AbstractMetadataStore(String metadataStoreName), there will be about 100+ places need to fix, I don't want to make such a huge change.
for 2, resolved
for 3, as I commented above, I added metadata store test here to ensure all metrics outputs as expected.

@asafm
Copy link
Contributor

asafm commented Sep 6, 2022

for 1, if I change protected AbstractMetadataStore() to protected AbstractMetadataStore(String metadataStoreName),

Maybe I misunderstand. You have 5 classes that extend AbstractMetadataStore, which are not abstract, right? If I understand correctly, you only need to modify them, no?

@tjiuming
Copy link
Contributor Author

tjiuming commented Sep 6, 2022

for 1, if I change protected AbstractMetadataStore() to protected AbstractMetadataStore(String metadataStoreName),

Maybe I misunderstand. You have 5 classes that extend AbstractMetadataStore, which are not abstract, right? If I understand correctly, you only need to modify them, no?

not only them, Factory and related tests also need to fix. Toooooo many tests referenced it

@asafm
Copy link
Contributor

asafm commented Sep 6, 2022

for 1, if I change protected AbstractMetadataStore() to protected AbstractMetadataStore(String metadataStoreName),

Maybe I misunderstand. You have 5 classes that extend AbstractMetadataStore, which are not abstract, right? If I understand correctly, you only need to modify them, no?

not only them, Factory and related tests also need to fix. Toooooo many tests referenced it

Ok, I took a closer look at the code, and you are correct. You fix the factory, and give the proper names (apparently instances of metadata stores are created in numerous places, tests excluded), but you end with at least 67 places in the tests that instantiate metadata using the factory and those needs to be fixed.

But, it got me thinking: What is the purpose of this PR? We want to know the operations count and latency on the metadata stores. How do we want them broken down?

  • Purpose? Pulsar Metdata Store, Pulsar Configuration Store, BK metadata store, etc...
  • Type? ZK, etcD?
  • component: metadata store, global configuration store.

What will help the operator understand and pinpoint?

In the current implementation in this PR, you don't know which metadata store it refers to: Is the metadata store or configuration store? Is it the metadata store used by the BK client, or something else?

What you see is "metadata-store-0" which tells you nothing.
If it reveals nothing, maybe we don't need it at all - the name that is.

So that's why I'm asking the questions above.
I think those questions are important.

@tjiuming
Copy link
Contributor Author

tjiuming commented Sep 7, 2022

@asafm Yes, you are right and I understand your concern, but actually, we can know that which metadata store is configuration store and which stores broker metadata, just takes a little skill:

In PulsarService.java, code as below:

            localMetadataStore = createLocalMetadataStore(localMetadataSynchronizer);
            localMetadataStore.registerSessionListener(this::handleMetadataSessionEvent);

            coordinationService = new CoordinationServiceImpl(localMetadataStore);

            if (config.isConfigurationStoreSeparated()) {
                configMetadataSynchronizer = StringUtils.isNotBlank(config.getConfigurationMetadataSyncEventTopic())
                        ? new PulsarMetadataEventSynchronizer(this, config.getConfigurationMetadataSyncEventTopic())
                        : null;
                configurationMetadataStore = createConfigurationMetadataStore(configMetadataSynchronizer);
                shouldShutdownConfigurationMetadataStore = true;
            } else {
                configurationMetadataStore = localMetadataStore;
                shouldShutdownConfigurationMetadataStore = false;
            }

Broker metadata store initialized before configuration metadata store, so, metadata-store-0 is broker metadata store and metadata-store-1 is configuration metadata store

@asafm
Copy link
Contributor

asafm commented Sep 7, 2022

@asafm Yes, you are right and I understand your concern, but actually, we can know that which metadata store is configuration store and which stores broker metadata, just takes a little skill:

In PulsarService.java, code as below:

            localMetadataStore = createLocalMetadataStore(localMetadataSynchronizer);
            localMetadataStore.registerSessionListener(this::handleMetadataSessionEvent);

            coordinationService = new CoordinationServiceImpl(localMetadataStore);

            if (config.isConfigurationStoreSeparated()) {
                configMetadataSynchronizer = StringUtils.isNotBlank(config.getConfigurationMetadataSyncEventTopic())
                        ? new PulsarMetadataEventSynchronizer(this, config.getConfigurationMetadataSyncEventTopic())
                        : null;
                configurationMetadataStore = createConfigurationMetadataStore(configMetadataSynchronizer);
                shouldShutdownConfigurationMetadataStore = true;
            } else {
                configurationMetadataStore = localMetadataStore;
                shouldShutdownConfigurationMetadataStore = false;
            }

Broker metadata store initialized before configuration metadata store, so, metadata-store-0 is broker metadata store and metadata-store-1 is configuration metadata store

  1. Do you find that a reasonable user experience for a Pulsar user? IMO, they will get quite frustrated and mad when they read that, where all they wanted to do was to know the metrics of metadata store access.
  2. Reiterating my question in the original comment - how of the options I listed above, what is your goal when breaking down the metadata store metrics? (see the question in the previous comment)

@codelipenghui
Copy link
Contributor

@tjiuming

Broker metadata store initialized before configuration metadata store, so, metadata-store-0 is broker metadata store and metadata-store-1 is configuration metadata store

IMO, we should improve this part. Instead of adding docs to tell users which one is configuration store, and which one is metadata store, we should add the type to the metrics like type=configuration/metadata. I also consider adding the metadataStoreUrl, but it can be a long string value.

And we should also add the cluster label. Users can have one Prometheus service for multiple clusters, e.g. https://github.com/apache/pulsar/wiki/PIP-8:-Pulsar-beyond-1M-topics

@tjiuming
Copy link
Contributor Author

tjiuming commented Sep 7, 2022

@tjiuming

Broker metadata store initialized before configuration metadata store, so, metadata-store-0 is broker metadata store and metadata-store-1 is configuration metadata store

IMO, we should improve this part. Instead of adding docs to tell users which one is configuration store, and which one is metadata store, we should add the type to the metrics like type=configuration/metadata. I also consider adding the metadataStoreUrl, but it can be a long string value.

And we should also add the cluster label. Users can have one Prometheus service for multiple clusters, e.g. https://github.com/apache/pulsar/wiki/PIP-8:-Pulsar-beyond-1M-topics

@codelipenghui cluster label will be added automatically in PrometheusMetricsGeneratorUtils.generateSystemMetrics(...)
for metadata store name, if I change protected AbstractMetadataStore() to protected AbstractMetadataStore(String name), there will be toooo many places need to fix, WDYT?

@codelipenghui
Copy link
Contributor

@tjiuming

cluster label will be added automatically in PrometheusMetricsGeneratorUtils.generateSystemMetrics(...)
for metadata store name

Oh, I see. Good to know.

if I change protected AbstractMetadataStore() to protected AbstractMetadataStore(String name), there will be toooo many places need to fix, WDYT?

I think it's the 100% right thing to do. If we don't have this change, we might need to explain to many users what the name exactly means. And I think that should be an important point that we missed before, if you are troubleshooting problems with the heap dump, it is hard to determine which one is the configuration store and which one is the metastore.

@tjiuming
Copy link
Contributor Author

tjiuming commented Sep 13, 2022

I think it's the 100% right thing to do. If we don't have this change, we might need to explain to many users what the name exactly means. And I think that should be an important point that we missed before, if you are troubleshooting problems with the heap dump, it is hard to determine which one is the configuration store and which one is the metastore.

@codelipenghui I've tried to change public AbstractMetadataStore() to public AbstractMetadataStore(String name), there will be faaaar beyond 100 places need to change, I think we should consider it veeeery carefully

@codelipenghui
Copy link
Contributor

@tjiuming I think most of them are from tests? After I checked the source code, I think we can just add a new field name to MetadataStoreConfig, just keep the empty name by default. So that we can only modify the method createConfigurationMetadataStore and PulsarMetadataStateStoreProviderImpl.init?

@tjiuming
Copy link
Contributor Author

@codelipenghui @asafm I've added metadataStoreName to MetadataStore, PTAL

@tjiuming
Copy link
Contributor Author

/pulsarbot run-failure-checks

Copy link
Contributor

@codelipenghui codelipenghui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments, please check.

@codelipenghui
Copy link
Contributor

@asafm Please help review again.

Copy link
Contributor

@asafm asafm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change makes it much more user-friendly. Thanks for the effort !

@tjiuming tjiuming requested a review from asafm September 18, 2022 14:07
Copy link
Contributor

@asafm asafm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯
Thanks for bearing with me for this PR :) I think it was worth it for Pulsar user experience :)

@codelipenghui codelipenghui merged commit 37f0a2d into apache:master Sep 20, 2022
@tjiuming tjiuming deleted the dev/metadata_store_stats branch September 22, 2022 11:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/metadata area/metrics type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants