Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Instrumentation PoC (WIP) #4216

Closed
wants to merge 11 commits into from
Closed

Instrumentation PoC (WIP) #4216

wants to merge 11 commits into from

Conversation

chrzaszcz
Copy link
Member

@chrzaszcz chrzaszcz commented Jan 24, 2024

This PR initiates the instrumentation rework, which will be done int the feature/instrument branch.

Main goals of this PR:

  • Provide a generic API module mongoose_instrument for setting up and executing instrumentation in the code. It shouldn't be limited to metrics.
  • Add minimal instrumentation handler modules for Exometer (which we have already connected to Graphite exporter) and Prometheus (now exposed via HTTP).
  • Make instrumentation handlers configurable, and provide default configuration in TOML.
  • Automatically set up (and tear down) instrumentation in gen_mod, just like we do with hooks.
  • Update one module to use mongoose_instrument. I chose mod_mam to ensure that the new approach can handle complex cases.
  • Still support the old mongoose_metrics to have all tests passing.

Note: Individual commits contain much more information.

@mongoose-im
Copy link
Collaborator

mongoose-im commented Jan 24, 2024

dynamic_domains_pgsql_mnesia_25 / pgsql_mnesia / 9281bd6
Reports root


ldap_mnesia_25 / ldap_mnesia / 9281bd6
Reports root


pgsql_mnesia_25 / pgsql_mnesia / 9281bd6
Reports root


ldap_mnesia_26 / ldap_mnesia / 9281bd6
Reports root


pgsql_mnesia_26 / pgsql_mnesia / 9281bd6
Reports root


internal_mnesia_26 / internal_mnesia / 9281bd6
Reports root


dynamic_domains_mysql_redis_26 / mysql_redis / 9281bd6
Reports root


mysql_redis_26 / mysql_redis / 9281bd6
Reports root


dynamic_domains_pgsql_mnesia_26 / pgsql_mnesia / 9281bd6
Reports root


pgsql_cets_26 / pgsql_cets / 9281bd6
Reports root


mssql_mnesia_26 / odbc_mssql_mnesia / 9281bd6
Reports root


dynamic_domains_mssql_mnesia_26 / odbc_mssql_mnesia / 9281bd6
Reports root


elasticsearch_and_cassandra_26 / elasticsearch_and_cassandra_mnesia / 9281bd6
Reports root


small_tests_25 / small_tests / 9281bd6
Reports root / small


small_tests_26 / small_tests / 9281bd6
Reports root / small


small_tests_26_arm64 / small_tests / 9281bd6
Reports root / small

@chrzaszcz chrzaszcz force-pushed the metrics-poc branch 2 times, most recently from b3af1ca to afcc464 Compare January 24, 2024 14:51
@mongoose-im

This comment was marked as outdated.

@mongoose-im

This comment was marked as outdated.

Copy link

codecov bot commented Jan 24, 2024

Codecov Report

Attention: 18 lines in your changes are missing coverage. Please review.

Comparison is base (fbd57db) 84.35% compared to head (5f6ebe4) 84.40%.

Files Patch % Lines
src/mam/mod_mam_muc.erl 53.33% 7 Missing ⚠️
src/mam/mod_mam_muc_rdbms_arch_async.erl 60.00% 2 Missing ⚠️
src/mam/mod_mam_pm.erl 90.00% 2 Missing ⚠️
src/mam/mod_mam_rdbms_arch_async.erl 60.00% 2 Missing ⚠️
src/mam/mod_mam_elasticsearch_arch.erl 0.00% 1 Missing ⚠️
src/mam/mod_mam_muc_elasticsearch_arch.erl 0.00% 1 Missing ⚠️
src/metrics/mongoose_instrument.erl 94.11% 1 Missing ⚠️
src/metrics/mongoose_instrument_exometer.erl 93.33% 1 Missing ⚠️
src/metrics/mongoose_instrument_prometheus.erl 96.15% 1 Missing ⚠️
Additional details and impacted files
@@                  Coverage Diff                   @@
##           feature/instrument    #4216      +/-   ##
======================================================
+ Coverage               84.35%   84.40%   +0.04%     
======================================================
  Files                     552      556       +4     
  Lines                   33507    33534      +27     
======================================================
+ Hits                    28266    28303      +37     
+ Misses                   5241     5231      -10     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@mongoose-im

This comment was marked as outdated.

@mongoose-im

This comment was marked as outdated.

@chrzaszcz chrzaszcz force-pushed the metrics-poc branch 2 times, most recently from 7adcbca to 4b53b92 Compare January 24, 2024 15:21
@mongoose-im

This comment was marked as outdated.

@mongoose-im

This comment was marked as outdated.

@mongoose-im

This comment was marked as outdated.

@mongoose-im

This comment was marked as outdated.

@mongoose-im

This comment was marked as outdated.

@mongoose-im

This comment was marked as outdated.

@mongoose-im

This comment was marked as outdated.

@mongoose-im

This comment was marked as outdated.

@mongoose-im

This comment was marked as outdated.

@mongoose-im

This comment was marked as outdated.

@mongoose-im

This comment was marked as outdated.

@chrzaszcz chrzaszcz force-pushed the metrics-poc branch 3 times, most recently from 1b717f2 to 22950b8 Compare February 14, 2024 14:19
@mongoose-im

This comment was marked as outdated.

@mongoose-im

This comment was marked as outdated.

@mongoose-im

This comment was marked as outdated.

@mongoose-im

This comment was marked as outdated.

Key requirements:
- Each event is identified by its name (atom) and a map of labels.
  Possible labels (keys) are strictly limited to eliminate mistakes.
- Config passed to 'set_up' should contain generic specs of the
  instrumentation, e.g. #{metrics => {time => histogram}} indicates
  the need for a histogram metric based on the 'time' measurement, but
  there is no reliance on particular metric backends. The config
  should be extensible as well, and allow specifying the need for
  instruments other than metrics. Particular handlers return a boolean
  indicating if they are interested in handling this event or not.
- A duplicated call to 'set_up' should fail.
- The call to 'execute' should call all the attached handlers.
- The call to 'span' should execute the measured function F.
  A measure callback MeasureF is then used to determine the resulting
  measurements. Execution time is passed to MeasureF alongside the
  result from F.
- A call to 'execute' or 'span' should fail if there was no 'setup'
  for the provided event name and labels.
The goal is to have a PoC for a single module.
Functionality will be extended as needed.

- Only spiral and histogram metrics are supported.
- All metrics are mapped to 3-element lists:
    [EventName, HostTypeLabel, MetricName]
  This means that only host-type metrics are supported.
- There is no support for 'all_metrics_are_global'
- The dependency on mongoose_metrics will be removed in the future.
- 'spiral' metrics are implemented as counters, because Prometheus
  allows rate calculation for counters.
- 'histogram' metrics are implemented as histograms
  with exponential buckets: 2^0, 2^1, ... 2^30.
  The last boundary is more than 10^9, which is e.g. ~18 minutes expressed
  in microseconds. Buckets could be made configurable later to save
  space and computation resources for small metrics, and to allow
  negative values.
Any mongoose_instrument_* modules are allowed for now,
which technically allows mongoose_instrument_registry,
but it shoudn't be a real issue.
We can rename if needed.

Intrumentation is configured globally (i.e. not per host type).
It returns a list of instrument specs, just like hooks/1 returns hooks.

Instrumentation is set up before hooks, and torn down after them.
- Enable exometer and prometheus for now.
  It would be needed to pass tests.
- In the future we would most likely only enable prometheus by
  default.
- Prometheus HTTP endpoint is exposed by default.
Unify event names: mod_mam_pm_* and mod_mam_muc_*

No functional changes other than metric renaming
All instrumentation is done with mongoose_instrument now.
The checks in mongoose_instrument would cause an exception.
@chrzaszcz chrzaszcz changed the base branch from master to feature/instrument February 15, 2024 09:01
@mongoose-im
Copy link
Collaborator

mongoose-im commented Feb 15, 2024

elasticsearch_and_cassandra_26 / elasticsearch_and_cassandra_mnesia / 5f6ebe4
Reports root/ big
OK: 417 / Failed: 0 / User-skipped: 43 / Auto-skipped: 0


small_tests_25 / small_tests / 5f6ebe4
Reports root / small


small_tests_26 / small_tests / 5f6ebe4
Reports root / small


small_tests_26_arm64 / small_tests / 5f6ebe4
Reports root / small


ldap_mnesia_25 / ldap_mnesia / 5f6ebe4
Reports root/ big
OK: 2275 / Failed: 0 / User-skipped: 856 / Auto-skipped: 0


dynamic_domains_pgsql_mnesia_25 / pgsql_mnesia / 5f6ebe4
Reports root/ big
OK: 4421 / Failed: 0 / User-skipped: 111 / Auto-skipped: 0


ldap_mnesia_26 / ldap_mnesia / 5f6ebe4
Reports root/ big
OK: 2275 / Failed: 0 / User-skipped: 856 / Auto-skipped: 0


dynamic_domains_pgsql_mnesia_26 / pgsql_mnesia / 5f6ebe4
Reports root/ big
OK: 4421 / Failed: 0 / User-skipped: 111 / Auto-skipped: 0


dynamic_domains_mysql_redis_26 / mysql_redis / 5f6ebe4
Reports root/ big
OK: 4388 / Failed: 0 / User-skipped: 144 / Auto-skipped: 0


internal_mnesia_26 / internal_mnesia / 5f6ebe4
Reports root/ big
OK: 2415 / Failed: 0 / User-skipped: 716 / Auto-skipped: 0


pgsql_cets_26 / pgsql_cets / 5f6ebe4
Reports root/ big
OK: 4356 / Failed: 0 / User-skipped: 178 / Auto-skipped: 0


dynamic_domains_mssql_mnesia_26 / odbc_mssql_mnesia / 5f6ebe4
Reports root/ big
OK: 4418 / Failed: 0 / User-skipped: 114 / Auto-skipped: 0


mysql_redis_26 / mysql_redis / 5f6ebe4
Reports root/ big
OK: 4789 / Failed: 0 / User-skipped: 139 / Auto-skipped: 0


pgsql_mnesia_26 / pgsql_mnesia / 5f6ebe4
Reports root/ big
OK: 4810 / Failed: 0 / User-skipped: 118 / Auto-skipped: 0


pgsql_mnesia_25 / pgsql_mnesia / 5f6ebe4
Reports root/ big
OK: 4810 / Failed: 0 / User-skipped: 118 / Auto-skipped: 0


mssql_mnesia_26 / odbc_mssql_mnesia / 5f6ebe4
Reports root/ big
OK: 4807 / Failed: 0 / User-skipped: 121 / Auto-skipped: 0

@chrzaszcz chrzaszcz mentioned this pull request Feb 15, 2024
@chrzaszcz
Copy link
Member Author

Split into #4223 and #4224

@chrzaszcz chrzaszcz closed this Feb 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants