Added Metrics plugin #6916

Merged
Changes from 38 commits
47 commits
fdbe83c
Update conf.py
amyblais Jan 16, 2024
047ff86
Merge branch 'master' into v9.5-documentation
amyblais Jan 16, 2024
3428299
Merge branch 'master' into v9.5-documentation
amyblais Jan 16, 2024
0167586
Merge branch 'master' into v9.5-documentation
amyblais Jan 16, 2024
e01c0dc
Merge branch 'master' into v9.5-documentation
amyblais Jan 23, 2024
a3bb495
Merge branch 'master' into v9.5-documentation
amyblais Jan 25, 2024
0bd8c9e
Merge branch 'master' into v9.5-documentation
amyblais Jan 26, 2024
8fd2f9b
Merge branch 'master' into v9.5-documentation
amyblais Jan 30, 2024
d949f17
Merge branch 'master' into v9.5-documentation
amyblais Feb 2, 2024
c17c450
Merge branch 'master' into v9.5-documentation
amyblais Feb 5, 2024
047f02c
Merge branch 'master' into v9.5-documentation
amyblais Feb 6, 2024
a64b360
Merge branch 'master' into v9.5-documentation
amyblais Feb 7, 2024
05236bc
Clarified that system roles also control access to related API endpoi…
cwarnermm Feb 7, 2024
ad58680
Merge branch 'master' into v9.5-documentation
amyblais Feb 8, 2024
baac423
Added new error codes page (#6908)
cwarnermm Feb 8, 2024
5b94410
Remove legacy MySQL references (#6896)
cwarnermm Feb 8, 2024
5ed3230
Removed "Sync" to align with product naming (#6898)
cwarnermm Feb 8, 2024
4848226
Clarified Ent/Pro differentiation (#6899)
cwarnermm Feb 8, 2024
5aa0b42
Clarified that bots don't count as active users (#6902)
cwarnermm Feb 8, 2024
fa3dff1
Added interactive demo link (#6905)
cwarnermm Feb 8, 2024
558363a
Added link to Kubernetes YAML docs via mattermost repo (#6906)
cwarnermm Feb 9, 2024
e9114d0
Clarified supported AllowCorsFrom values (#6897)
cwarnermm Feb 9, 2024
b4169a4
Updated legacy E10/E20 label text (#6900)
cwarnermm Feb 12, 2024
54eca69
Added transcription end user docs (#6910)
cwarnermm Feb 12, 2024
2da80c8
Added Metrics plugin
cwarnermm Feb 12, 2024
5973dd6
Update source/scale/performance-monitoring.rst
cwarnermm Feb 12, 2024
2529798
Removed HA modes details
cwarnermm Feb 13, 2024
5d9c048
Merge branch 'master' into performance-metrics-plugin
cwarnermm Feb 14, 2024
ed7eba3
Merge branch 'master' into performance-metrics-plugin
cwarnermm Feb 15, 2024
149f3a8
Merge branch 'master' into performance-metrics-plugin
cwarnermm Feb 27, 2024
d2d47ff
Merge branch 'master' into performance-metrics-plugin
cwarnermm Feb 29, 2024
49affeb
dashboards-how-to (#6935)
isacikgoz Feb 29, 2024
157bc93
Merge branch 'master' into performance-metrics-plugin
cwarnermm Mar 14, 2024
7a677d3
Split perf monitoring into 3 parts: 3rd party integrations, plugin, &…
cwarnermm Mar 18, 2024
f75a180
Merge branch 'master' into performance-metrics-plugin
cwarnermm Mar 18, 2024
6551479
Merge branch 'master' into performance-metrics-plugin
cwarnermm Mar 18, 2024
946d533
Incorporated reviewer feedback
cwarnermm Mar 18, 2024
a27f167
Merge branch 'performance-metrics-plugin' of https://github.com/matte…
cwarnermm Mar 18, 2024
90c7b79
Update source/guides/scale-mattermost.rst
cwarnermm Mar 21, 2024
81a6ffe
Update source/guides/scale-mattermost.rst
cwarnermm Mar 21, 2024
4433fc0
Merge branch 'master' into performance-metrics-plugin
cwarnermm Mar 21, 2024
bf5ec96
Incorporated reviewer feedback, added redirects, & updated links
cwarnermm Mar 21, 2024
fa8c8f8
Merge branch 'master' into performance-metrics-plugin
cwarnermm Mar 26, 2024
4f51694
Merge branch 'master' into performance-metrics-plugin
cwarnermm Mar 26, 2024
3955788
Fixed broken link
cwarnermm Mar 26, 2024
01b9dec
Merge branch 'master' into performance-metrics-plugin
cwarnermm Apr 12, 2024
051da5f
Merge branch 'mattermost-supported-integrations' into performance-met…
cwarnermm Apr 24, 2024
8 changes: 6 additions & 2 deletions source/guides/scale-mattermost.rst
@@ -17,7 +17,9 @@ Scale Mattermost
Scale up to 88000 users </scale/scale-to-88000-users>
High availability cluster </scale/high-availability-cluster>
Elasticsearch </scale/elasticsearch>
Performance monitoring </scale/performance-monitoring>
Monitor performance using the Metrics plugin </scale/metrics-plugin>
Monitor performance using dashboards </scale/performance-monitoring>
Performance monitoring metrics </scale/performance-monitoring-metrics>
Mattermost performance alerting guide </scale/performance-alerting>

Scale and monitor your Mattermost deployment.
@@ -33,5 +35,7 @@ Scale and monitor your Mattermost deployment.
* :doc:`Scale up to 88000 users </scale/scale-to-88000-users>` - Learn how to scale Mattermost to up to 88000 users.
* :doc:`High availability cluster </scale/high-availability-cluster>` - Maintain Mattermost service during outages and hardware failures with redundant infrastructure.
* :doc:`Elasticsearch </scale/elasticsearch>` - Enhance search performance with Elasticsearch.
* :doc:`Performance monitoring </scale/performance-monitoring>` - Use Prometheus and Grafana to monitor the health and performance of your Mattermost cluster.
* :doc:`Monitor performance using the Mattermost Metrics plugin </scale/metrics-plugin>` - Use the Mattermost Metrics plugin when Prometheus and Grafana aren't available.
* :doc:`Monitor performance using dashboards </scale/performance-monitoring>` - Use Prometheus and Grafana to monitor the health and performance of your Mattermost cluster.
* :doc:`Performance monitoring metrics </scale/performance-monitoring-metrics>` - The custom and standard Go metrics available for monitoring system performance.
* :doc:`Mattermost performance alerting guide </scale/performance-alerting>` - Learn strategies and best practices for monitoring your Mattermost cluster.
44 changes: 44 additions & 0 deletions source/scale/metrics-plugin.rst
@@ -0,0 +1,44 @@
Monitor performance using the Metrics plugin
============================================

.. include:: ../_static/badges/ent-cloud-selfhosted.rst
  :start-after: :nosearch:

.. |plus-icon| image:: ../images/plus_F0415.svg
  :alt: Open menus using the plus icon.

The `Mattermost Metrics plugin <https://github.com/mattermost/mattermost-plugin-metrics/>`__ is an alternative tool to collect application metrics from Mattermost that doesn't require you to install and integrate `Prometheus <https://prometheus.io/>`__ and `Grafana <https://grafana.org/>`__ with Mattermost.

The Metrics plugin can be installed on Mattermost v6.3 and later. It collects and stores the :doc:`same performance monitoring metrics </scale/performance-monitoring-metrics>` that Prometheus collects, without requiring you to deploy Prometheus or Grafana. Data is collected every minute and is stored where the plugin is running. The data is synchronized to either a cloud-based or local file store every hour, and is retained for 15 days.
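
As a rough sizing aid, the collection interval, sync interval, and retention period above imply how many samples each metric series accumulates. The sketch below only does that arithmetic; the function names are hypothetical and aren't part of the plugin.

```python
def samples_per_series(collect_interval_minutes: int = 1, retention_days: int = 15) -> int:
    """Number of samples a single series holds over the retention window."""
    minutes_retained = retention_days * 24 * 60
    return minutes_retained // collect_interval_minutes


def syncs_in_retention(sync_interval_hours: int = 1, retention_days: int = 15) -> int:
    """Number of file store synchronizations that fall inside the retention window."""
    return (retention_days * 24) // sync_interval_hours


if __name__ == "__main__":
    print(samples_per_series())   # samples kept per series
    print(syncs_in_retention())   # hourly syncs in 15 days
```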

Using the Mattermost Metrics plugin, you can download the collected data and share it with Mattermost to understand application performance, troubleshoot system stability and performance issues, and inform root cause analysis.

Install the Mattermost Metrics plugin
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. Download the latest version from the `Mattermost release page <https://github.com/mattermost/mattermost-plugin-metrics/releases>`__.
2. In the System Console, go to **Plugins > Plugin Management** to upload the plugin file. Alternatively, you can place the plugin file in the Mattermost server's plugin directory manually.
3. Enable the plugin in the System Console.

To use the dump file generated by the plugin, clone the `Dockprom <https://github.com/stefanprodan/dockprom>`__ repository, and change the Prometheus data volume to point to the dump that you downloaded. The downloaded file is compressed, so you must decompress it before you can use it.
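
The decompression step could be scripted as in the minimal sketch below. It assumes the dump is a gzip-compressed tar archive, which this page doesn't confirm; the ``extract_dump`` helper and the example paths are hypothetical. Only extract archives that you trust.

```python
import tarfile
from pathlib import Path


def extract_dump(archive_path: str, target_dir: str) -> Path:
    """Extract a gzip-compressed tar dump into target_dir and return that directory.

    Assumption: the dump is a .tar.gz archive. Adjust the mode if yours differs.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, mode="r:gz") as archive:
        archive.extractall(target)
    return target


if __name__ == "__main__":
    # Hypothetical paths: point the Prometheus data volume at the target directory.
    print(extract_dump("metrics-dump.tar.gz", "/Path/To/Dump/Directory"))
```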

The volume configuration for Prometheus should look like the code below in the ``docker-compose.yml`` file:

.. code:: yaml

  volumes:
    - ./prometheus:/etc/prometheus
    - /Path/To/Dump/Directory:/prometheus/data

Once you set this up, run ``docker-compose`` as described in the `Dockprom repository <https://github.com/stefanprodan/dockprom?tab=readme-ov-file#install>`__.

You can also use our `Mattermost Performance Monitoring v2 <https://grafana.com/grafana/dashboards/15582>`__ dashboard by importing it into Grafana:

1. Open Grafana (``localhost:3000`` by default), and log in.
2. Go to the **Plus** |plus-icon| icon on the left sidebar, and then select **Import**.
3. Enter the dashboard ID (``15582``) in the **Grafana.com Dashboard** field, and then select **Load** to fetch the dashboard.

What's collected?
-----------------

Mattermost provides :ref:`custom metrics <scale/performance-monitoring-metrics:custom Mattermost metrics>` and :ref:`standard Go metrics <scale/performance-monitoring-metrics:standard go metrics>` that can be used to monitor your system's performance.
194 changes: 194 additions & 0 deletions source/scale/performance-monitoring-metrics.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,194 @@
Performance monitoring metrics
==============================

.. include:: ../_static/badges/ent-cloud-selfhosted.rst
  :start-after: :nosearch:

Mattermost provides the following performance monitoring statistics to integrate with Prometheus and Grafana.

Custom Mattermost metrics
~~~~~~~~~~~~~~~~~~~~~~~~~

The following is a list of custom Mattermost metrics that can be used to monitor your system's performance:

API metrics
^^^^^^^^^^^

- ``mattermost_api_time``: The total time in seconds to execute a given API handler.

Caching metrics
^^^^^^^^^^^^^^^

- ``mattermost_cache_etag_hit_total``: The total number of ETag cache hits for a specific cache.
- ``mattermost_cache_etag_miss_total``: The total number of ETag cache misses for an API call.
- ``mattermost_cache_mem_hit_total``: The total number of memory cache hits for a specific cache.
- ``mattermost_cache_mem_invalidation_total``: The total number of memory cache invalidations for a specific cache.
- ``mattermost_cache_mem_miss_total``: The total number of memory cache misses for a specific cache.

The above metrics can be used to calculate ETag and memory cache hit rates over time.

.. image:: ../images/perf_monitoring_caching_metrics.png
  :alt: Example caching metrics, including ETag hit rate and memory cache hit rate, in a self-hosted Mattermost deployment.
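
A hit rate over a time window is the increase in hits divided by the increase in hits plus misses. The following is a minimal sketch of that calculation from two samples of each counter; the function name and sample values are hypothetical, and a dashboard would normally compute this in a Prometheus query instead.

```python
def window_hit_rate(hits_start: float, hits_end: float,
                    misses_start: float, misses_end: float) -> float:
    """Cache hit rate between two samples of the hit and miss counters.

    Returns 0.0 when there were no lookups in the window.
    """
    hits = hits_end - hits_start
    misses = misses_end - misses_start
    lookups = hits + misses
    return hits / lookups if lookups > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical samples of mattermost_cache_mem_hit_total and
    # mattermost_cache_mem_miss_total taken at the start and end of a window.
    print(window_hit_rate(100, 190, 20, 30))
```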

Cluster metrics
^^^^^^^^^^^^^^^

- ``mattermost_cluster_cluster_request_duration_seconds``: The total duration in seconds of the inter-node cluster requests.
- ``mattermost_cluster_cluster_requests_total``: The total number of inter-node requests.
- ``mattermost_cluster_event_type_totals``: The total number of cluster requests sent for any type.

Database metrics
^^^^^^^^^^^^^^^^

- ``mattermost_db_master_connections_total``: The total number of connections to the master database.
- ``mattermost_db_read_replica_connections_total``: The total number of connections to all the read replica databases.
- ``mattermost_db_search_replica_connections_total``: The total number of connections to all the search replica databases.
- ``mattermost_db_store_time``: The total time in seconds to execute a given database store method.
- ``mattermost_db_replica_lag_abs``: Absolute lag time based on binlog distance/transaction queue length.
- ``mattermost_db_replica_lag_time``: The time taken for the replica to catch up.

Database connection metrics
^^^^^^^^^^^^^^^^^^^^^^^^^^^

- ``max_open_connections``: The maximum number of open connections to the database.
- ``open_connections``: The number of established connections both in use and idle.
- ``in_use_connections``: The number of connections currently in use.
- ``idle_connections``: The number of idle connections.
- ``wait_count_total``: The total number of connections waited for.
- ``wait_duration_seconds_total``: The total time blocked waiting for a new connection.
- ``max_idle_closed_total``: The total number of connections closed due to the maximum idle connections being reached.
- ``max_idle_time_closed_total``: The total number of connections closed due to the connection maximum idle time configured.
- ``max_lifetime_closed_total``: The total number of connections closed due to the connection maximum lifetime configured.
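
These values can be combined into pool health indicators, such as how close the pool is to its limit and how long callers wait for a connection on average. The sketch below assumes the metrics mirror Go's ``database/sql`` pool statistics, where a maximum of ``0`` means no limit; the function names are hypothetical.

```python
from typing import Optional


def pool_utilization(in_use_connections: int, max_open_connections: int) -> Optional[float]:
    """Fraction of the connection limit currently in use.

    Returns None when no limit is configured (assumed to be reported as 0).
    """
    if max_open_connections <= 0:
        return None
    return in_use_connections / max_open_connections


def mean_wait_seconds(wait_duration_seconds_total: float, wait_count_total: int) -> float:
    """Average time spent waiting for a connection; 0.0 if nothing waited."""
    if wait_count_total <= 0:
        return 0.0
    return wait_duration_seconds_total / wait_count_total


if __name__ == "__main__":
    print(pool_utilization(45, 300))
    print(mean_wait_seconds(12.0, 48))
```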

HTTP metrics
^^^^^^^^^^^^

- ``mattermost_http_errors_total``: The total number of HTTP API errors.
- ``mattermost_http_request_duration_seconds``: The total duration in seconds of the HTTP API requests.
- ``mattermost_http_requests_total``: The total number of HTTP API requests.

.. image:: ../images/perf_monitoring_http_metrics.png
  :alt: Example HTTP metrics, including number of API errors per minute, number of API requests per minute, and mean request time per minute, in a self-hosted Mattermost deployment.
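
Because these metrics are ever-increasing counters, per-window figures such as an error ratio come from the increase between samples rather than from the raw values. The sketch below is a simplified illustration with hypothetical function names and sample data: it treats any drop in a counter as a restart from zero, and it doesn't reproduce the extrapolation that Prometheus applies in its own rate calculations.

```python
from typing import Sequence


def counter_increase(samples: Sequence[float]) -> float:
    """Total increase across consecutive samples of a counter.

    A sample lower than its predecessor is treated as a counter reset,
    so the new value is counted as an increase from zero.
    """
    increase = 0.0
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            increase += current - previous
        else:
            increase += current
    return increase


def error_ratio(error_samples: Sequence[float], request_samples: Sequence[float]) -> float:
    """Share of requests that ended in an error over the sampled window."""
    requests = counter_increase(request_samples)
    if requests == 0:
        return 0.0
    return counter_increase(error_samples) / requests


if __name__ == "__main__":
    # Hypothetical samples of mattermost_http_errors_total and
    # mattermost_http_requests_total.
    print(error_ratio([0, 2, 5], [100, 200, 350]))
```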

Login and session metrics
^^^^^^^^^^^^^^^^^^^^^^^^^

- ``mattermost_http_websockets_total``: The total number of WebSocket connections to the server.
- ``mattermost_login_logins_fail_total``: The total number of failed logins.
- ``mattermost_login_logins_total``: The total number of successful logins.

Mattermost channels metrics
^^^^^^^^^^^^^^^^^^^^^^^^^^^

- ``mattermost_post_broadcasts_total``: The total number of WebSocket broadcasts sent because a post was created.
- ``mattermost_post_emails_sent_total``: The total number of emails sent because a post was created.
- ``mattermost_post_file_attachments_total``: The total number of file attachments created because a post was created.
- ``mattermost_post_pushes_sent_total``: The total number of mobile push notifications sent because a post was created.
- ``mattermost_post_total``: The total number of posts created.
- ``mattermost_post_webhooks_totals``: The total number of webhook posts created.

.. image:: ../images/perf_monitoring_messaging_metrics.png
  :alt: Example Mattermost channels metrics, including messages per minute, broadcasts per minute, emails sent per minute, mobile push notifications per minute, and number of file attachments per minute, in a self-hosted Mattermost deployment.

Process metrics
^^^^^^^^^^^^^^^

- ``mattermost_process_cpu_seconds_total``: Total user and system CPU time spent in seconds.
- ``mattermost_process_max_fds``: Maximum number of open file descriptors.
- ``mattermost_process_open_fds``: Number of open file descriptors.
- ``mattermost_process_resident_memory_bytes``: Resident memory size in bytes.
- ``mattermost_process_start_time_seconds``: Start time of the process since Unix epoch in seconds.
- ``mattermost_process_virtual_memory_bytes``: Virtual memory size in bytes.

Search metrics
^^^^^^^^^^^^^^

- ``mattermost_search_posts_searches_duration_seconds_sum``: The total duration, in seconds, of search query requests.
- ``mattermost_search_posts_searches_duration_seconds_count``: The total number of search query requests.
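
Dividing the change in the ``_sum`` metric by the change in the ``_count`` metric gives the mean search duration for a window. A minimal sketch, with a hypothetical function name and sample values:

```python
def mean_search_seconds(sum_start: float, sum_end: float,
                        count_start: float, count_end: float) -> float:
    """Mean duration of the searches that ran between two samples."""
    searches = count_end - count_start
    if searches <= 0:
        return 0.0
    return (sum_end - sum_start) / searches


if __name__ == "__main__":
    # Hypothetical window: 40 searches that took 6 seconds in total.
    print(mean_search_seconds(12.0, 18.0, 100, 140))
```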

WebSocket metrics
^^^^^^^^^^^^^^^^^

- ``mattermost_websocket_broadcasts_total``: The total number of WebSocket broadcasts sent by type.
- ``mattermost_websocket_event_total``: The total number of WebSocket events sent by type.

Logging metrics
^^^^^^^^^^^^^^^

- ``logger_queue_used``: Current logging queue level(s).
- ``logger_logged_total``: The total number of logging records emitted.
- ``logger_error_total``: The total number of logging errors.
- ``logger_dropped_total``: The total number of logging records dropped.
- ``logger_blocked_total``: The total number of logging records blocked.

Debugging metrics
^^^^^^^^^^^^^^^^^

- ``mattermost_system_server_start_time``: Server start time. Set to the current time on server start.
- ``mattermost_jobs_active``: The number of active jobs. Incremented when a job starts and decremented when the job ends.

Use ``mattermost_system_server_start_time`` to dynamically add an annotation corresponding to the event.

.. image:: ../images/mattermost_system_server_start_time.png
  :alt: Example debugging metrics, including number of messages per second, in a self-hosted Mattermost deployment.

Use ``mattermost_jobs_active`` to display an active jobs chart.

.. image:: ../images/mattermost_active_jobs_chart.png
  :alt: Example debugging metrics, including active jobs, in a self-hosted Mattermost deployment.

Or, use ``mattermost_jobs_active`` to dynamically add a range annotation corresponding to jobs being active.

.. image:: ../images/mattermost_dynamic_range_annotation.png
  :alt: Example debugging metrics, including number of messages per second, in a self-hosted Mattermost deployment.

Use annotations to streamline analysis when a job is long running, such as an LDAP synchronization job.

.. note::
  Jobs with a runtime shorter than the Prometheus polling interval are unlikely to be visible. Grafana performs range queries over the raw Prometheus time series data and renders an event each time the value changes.
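
The effect described in the note can be illustrated with a small simulation. The sketch below models a single job as a gauge that is ``1`` while the job runs and samples it at a fixed polling interval; the function names, timings, and the single-job model are assumptions made for illustration only.

```python
from typing import List, Tuple


def sampled_gauge(job_start: int, job_end: int,
                  poll_interval: int, horizon: int) -> List[Tuple[int, int]]:
    """Return (time, value) pairs as seen by a poller sampling every poll_interval seconds."""
    samples = []
    t = 0
    while t <= horizon:
        # The gauge reads 1 only while the job is running at the sample time.
        samples.append((t, 1 if job_start <= t < job_end else 0))
        t += poll_interval
    return samples


def job_visible(samples: List[Tuple[int, int]]) -> bool:
    """A job shows up on the chart only if at least one sample caught it running."""
    return any(value for _, value in samples)


if __name__ == "__main__":
    short_job = sampled_gauge(job_start=20, job_end=28, poll_interval=15, horizon=60)
    long_job = sampled_gauge(job_start=20, job_end=50, poll_interval=15, horizon=60)
    print(job_visible(short_job), job_visible(long_job))
```

In this example, the 8-second job starts and ends between two polls, so no sample records it, while the 30-second job spans at least one poll.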

Standard Go metrics
~~~~~~~~~~~~~~~~~~~

.. include:: ../_static/badges/allplans-cloud-selfhosted.rst
  :start-after: :nosearch:

The performance monitoring feature provides standard Go metrics for HTTP server runtime profiling data and system monitoring, such as:

- ``go_memstats_alloc_bytes`` for memory usage
- ``go_goroutines`` for number of goroutines
- ``go_gc_duration_seconds`` for garbage collection duration
- ``go_memstats_heap_objects`` for object tracking on the heap

To learn how to set up runtime profiling, see the `pprof package Go documentation <https://pkg.go.dev/net/http/pprof>`__. You can also visit the ``<ip>:<port>/metrics`` page for a complete list of metrics with descriptions.

.. note::
  A Mattermost Enterprise license is required to connect to ``/metrics`` using HTTP.

If enabled, you can run the profiler with the following command:

``go tool pprof http://localhost:<port>/debug/pprof/profile?seconds=<duration>``

Replace ``localhost`` with the server name as needed. The profiling reports are available at ``<ip>:<port>`` and include:

- ``/debug/pprof/profile?seconds=30`` for CPU profiling
- ``/debug/pprof/cmdline`` for the command line of the running process
- ``/debug/pprof/symbol`` for symbol profiling
- ``/debug/pprof/trace`` for trace profiling
- ``/debug/pprof/goroutine`` for goroutine profiling
- ``/debug/pprof/heap`` for heap profiling
- ``/debug/pprof/threadcreate`` for threads profiling
- ``/debug/pprof/block`` for block profiling
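
When collecting several of these reports, it can help to build the URLs programmatically. The following is a minimal sketch with a hypothetical ``pprof_url`` helper; the host and port in the example are placeholders for your own server's values.

```python
from typing import Optional
from urllib.parse import urlencode


def pprof_url(host: str, port: int, profile: str, seconds: Optional[int] = None) -> str:
    """Build the URL of a pprof report, optionally with a sampling duration."""
    url = f"http://{host}:{port}/debug/pprof/{profile}"
    if seconds is not None:
        url += "?" + urlencode({"seconds": seconds})
    return url


if __name__ == "__main__":
    # Placeholder host and port; pass the result to `go tool pprof`.
    print(pprof_url("localhost", 8067, "profile", seconds=30))
    print(pprof_url("localhost", 8067, "heap"))
```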

.. image:: ../images/perf_monitoring_go_metrics.png
  :alt: Example Go metrics for HTTP server runtime profiling data and system monitoring, including memory usage, goroutines, and garbage collection duration, in a self-hosted Mattermost deployment.

Frequently asked questions
--------------------------

Why are chart labels difficult to distinguish?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The chart labels used in server filters and legends are based on the hostname of your machines. If the hostnames are similar, then it will be difficult to distinguish the labels.

You can either set more descriptive hostnames for your machines, or change the display name with a ``relabel_config`` in your `Prometheus configuration <https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config>`__.
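
To show what a relabeling rule of this kind does, the sketch below imitates a ``replace`` action in plain Python: the source label value must match the whole regular expression, and the target label is then set from the replacement template. This is an illustration only, not Prometheus code. The label names, hostname pattern, and function name are hypothetical, and the replacement uses Python's ``\1`` group syntax where Prometheus uses ``$1``.

```python
import re
from typing import Dict


def relabel_replace(labels: Dict[str, str], source_label: str, regex: str,
                    replacement: str, target_label: str) -> Dict[str, str]:
    """Return a copy of labels with target_label rewritten when regex matches.

    The regex must match the entire source value; otherwise the labels
    are returned unchanged.
    """
    relabeled = dict(labels)
    match = re.fullmatch(regex, labels.get(source_label, ""))
    if match is not None:
        relabeled[target_label] = match.expand(replacement)
    return relabeled


if __name__ == "__main__":
    # Hypothetical target with a hard-to-read hostname.
    target = {"instance": "ip-10-0-3-17.ec2.internal:8067"}
    print(relabel_replace(target, "instance", r"ip-10-0-3-(\d+)\..*",
                          r"app-node-\1", "server"))
```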