Admin visibility into federation status #7982
Comments
I agree that it would be nice to link these things from somewhere, the install guide or elsewhere, during Synapse setup. They're on the matrix.org Synapse guides page, but a nice bullet point list of "Now you've got Synapse installed, here's the various things you can do from here!" somewhere would be great. There is work underway to set up different guides for different Synapse use cases at the moment, which may address this problem somewhat. As for yourself, yes, Prometheus metrics will give you insight into the overall health of your instance, but they're by no means a quick, simple digest of problems. That doesn't quite exist yet, I'm afraid. There is a wiki article on understanding some of these graphs. Other than that, we often recommend searching through the logs. Yes, it's obviously not the most friendly user interface, but it'll get the job done. I suspect what you want, as well as a graphical installation tool for Synapse, will be part of our current goal of making setup and maintenance of Synapse easier for sysadmins over the coming months.
A dashboard like that would probably be built on top of the admin API. There is actually a third-party project already doing that here: https://github.com/Awesome-Technologies/synapse-admin. Element Matrix Services uses the admin API extensively for its dashboards as well. I don't think any of these dashboards will present internal errors and their causes, though. A starting point for a project doing so could come from just parsing Synapse's logs, as they give you request information, timings, errors, etc.
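As a rough illustration of the log-parsing idea, here is a minimal Python sketch that counts error lines per logger in a plain-text log. The line layout it expects (`timestamp - logger - lineno - LEVEL - request - message`) is an assumption based on a typical Synapse log config; adjust `LINE_RE` to match whatever format your own logging config produces. The function name `summarize_errors` is made up for this example.

```python
import re
from collections import Counter

# ASSUMPTION: each log line looks like
#   "<date> <time> - <logger> - <lineno> - <LEVEL> - <request> - <message>"
# This depends entirely on the formatter in your log config.
LINE_RE = re.compile(
    r"^(?P<time>\S+ \S+) - (?P<logger>\S+) - (?P<lineno>\d+) - "
    r"(?P<level>[A-Z]+) - (?P<request>\S*) - (?P<message>.*)$"
)


def summarize_errors(lines, levels=("ERROR", "CRITICAL")):
    """Count log lines at the given levels, grouped by logger name.

    Lines that do not match the assumed layout (for example traceback
    continuation lines) are skipped.
    """
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match and match.group("level") in levels:
            counts[match.group("logger")] += 1
    return counts
```

Passing an open log file to `summarize_errors` and calling `.most_common(10)` on the result would give a quick list of the noisiest loggers, which at least narrows down where to start reading.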
Unfortunately, given the activity level of my server, it's often impractical or extremely tedious, if not outright impossible, to find relevant errors in the logs unless I already know what I'm looking for. Some kind of automated parsing would certainly help.
To my knowledge it's not kept up to date, but maybe something like https://github.com/turt2live/matrix-monitor-bot could be extended for these purposes? Would love to see a project like this revived/modernized.
@chr-1x A quick tip for that is back-paginating through a search of "ERROR ". On another topic, Synapse does have (very) limited support for structured logging, which outputs log lines as JSON objects rather than text and may help in building a parsing tool: https://github.com/matrix-org/synapse/blob/master/docs/structured_logging.md
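If the logs are emitted as one JSON object per line, the filtering becomes simpler than matching text. A minimal sketch in Python follows; the key names `level`, `namespace` and `log` are assumptions about the JSON layout and are exposed as parameters so they can be changed to match the actual output. The function name `errors_by_namespace` is hypothetical.

```python
import json
from collections import defaultdict


def errors_by_namespace(lines, level_key="level", name_key="namespace", msg_key="log"):
    """Group ERROR-level messages by logger from JSON-lines log output.

    ASSUMPTION: every log line is a JSON object carrying the level, the
    logger name and the message under the given keys. Lines that are not
    valid JSON objects are ignored.
    """
    grouped = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # not JSON, e.g. stray plain-text output
        if not isinstance(record, dict):
            continue
        if str(record.get(level_key, "")).upper() == "ERROR":
            grouped[record.get(name_key, "unknown")].append(record.get(msg_key, ""))
    return dict(grouped)
```

The result maps each logger name to the list of its error messages, so a small script could print the loggers sorted by how many errors they produced.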
This could be part of the requested Admin UI.
Related to #10562 and #10553, which also list some relevant existing Prometheus metrics to look at.
I'm thinking about starting on an API for this. Does anybody have a hint about where to start or where to find more information? synapse/synapse/storage/databases/main/transactions.py Lines 429 to 444 in 2b82ec4
synapse/synapse/storage/databases/main/transactions.py Lines 156 to 168 in 2b82ec4
As the admin of a large Synapse server (3000 users), I frequently end up in situations where users report issues sending or receiving messages from other homeservers. (Often this is matrix.org, but sometimes it's other large homeservers such as kde.org or pine64.org.) I currently have little visibility into what could be causing these issues. Is it an issue with my homeserver, or the remote? If it's on our end, where should I be looking for problems?
In particular, there are a few key questions I don't currently have a way to answer:
I would love it if this information were exposed in some kind of dashboard, but failing that, an addition to the admin API would be acceptable. (Note, though, that I don't really have any insight into what's currently included in the admin API or how I would access it.) Looking around in the docs folder of the repo, I've found an unfinished-looking document on room statistics and some information on Prometheus metrics, which is unhelpful to me as I don't use Prometheus (maybe I should, but it's not mentioned anywhere in the README or setup instructions).