How do I backup Sentry 10+? #364

Closed
nogweii opened this issue Jan 27, 2020 · 22 comments · Fixed by getsentry/develop#258

@nogweii

nogweii commented Jan 27, 2020

I've just finished setting up Sentry 10.1.0.dev0 (1713221b5d6f182853c0d71f51100464ceada7de) today, along with SAML authentication with my Keycloak server. This is all very nice and awesome, so I'd like to keep it around, even in the case of server failure.

Which leads me to the question in the title: with the new architecture involving Kafka, Snuba, etc., what do I need to include in my backups? A Postgres dump is already included; is there anything more?

@BYK BYK self-assigned this Jan 27, 2020
@BYK
Member

BYK commented Jan 27, 2020

If you back up all the named volumes defined in the install script here:

https://github.com/getsentry/onpremise/blob/bc6d3b47e257057587e29153947c1ba223160416/install.sh#L72-L79

you should be good. The critical ones are sentry-postgres and sentry-clickhouse. That said, Redis holds the stats and some in-flight data for the task queues, and the same goes for Kafka and Zookeeper. sentry-data holds all the data you have uploaded to the Sentry backend, such as avatars, source maps, or symbol files. Finally, sentry-symbolicator holds the cache for Symbolicator, which is not critical but is good for performance.
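For reference, a minimal sketch of that approach, assuming the named volumes created by install.sh and that the stack is stopped first so no service is writing:

```bash
# Sketch only: archive each named volume to ./backup on the host
# by mounting it read-only into a throwaway container.
mkdir -p backup
for volume in sentry-data sentry-postgres sentry-redis sentry-zookeeper \
              sentry-kafka sentry-clickhouse sentry-symbolicator; do
  docker run --rm \
    -v "${volume}:/source:ro" \
    -v "$(pwd)/backup:/backup" \
    alpine tar czf "/backup/${volume}.tar.gz" -C /source .
done
```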

@BYK
Member

BYK commented Jan 27, 2020

cc @mattrobenolt in case I'm missing anything.

@mattrobenolt
Contributor

Seems reasonable.

@mingfang

@nogweii
Can you please share the steps you took to set up SAML with Keycloak?

I tried everything for a week, but it still fails with this error:

Authentication error: SAML SSO failed, https://sentry.<myhost>/saml/metadata/sentry/ is not a valid audience for this Response

Thanks in advance for your help.

@BYK
Member

BYK commented Apr 20, 2020

@mingfang this issue doesn't seem like the right place for that question. I strongly recommend using the forum for this.

@mingfang
Copy link

mingfang commented Apr 22, 2020

Here are the steps for anyone trying to integrate Sentry SAML with Keycloak.

Keycloak

1-create client, Clients -> Create
Client ID = https://<sentry url>/saml/metadata/sentry/
Client Protocol = saml
Client SAML Endpoint = https://<sentry url>/saml/acs/sentry/
*must include trailing slash

2-edit the client created in #1 and set
IDP Initiated SSO URL Name = sentry

3-Remove Client Scopes
Assigned Default Client Scopes
select role_list
Remove selected

4-add username Mapper
Name = username
Mapper Type = User Property
Property = Username
SAML Attribute Name = username

5-add email Mapper
Name = email
Mapper Type = User Property
Property = Email
SAML Attribute Name = email

Sentry

1-Register Identity Provider -> IdP Data

2- Entity ID
Keycloak -> Realm Settings -> General -> Endpoints -> SAML 2.0 Identity Provider Metadata
entityID=https://<keycloak url>/auth/realms/<realm>

3- Single Sign On URL = https://<keycloak url>/auth/realms/<realm>/protocol/saml/clients/sentry

4- x509 public certificate
Keycloak -> Realm Settings -> Keys -> Certificate -> copy and paste the long cert string

5- Attribute Mappings
IdP User ID = username
User Email = email
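A quick way to cross-check steps 2 and 4 is to pull the realm's SAML descriptor directly. A sketch only; the descriptor path is an assumption based on the legacy /auth prefix used above and may differ between Keycloak versions:

```bash
# Fetch the realm's SAML IdP metadata and extract the entityID and signing certificate.
curl -s "https://<keycloak url>/auth/realms/<realm>/protocol/saml/descriptor" \
  | grep -oE 'entityID="[^"]+"|<ds:X509Certificate>[^<]+</ds:X509Certificate>'
```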

@MarcusRiemer

The list provided by @BYK is helpful:

https://github.com/getsentry/onpremise/blob/bc6d3b47e257057587e29153947c1ba223160416/install.sh#L72-L79

But as far as I understand Docker, this is not the full story. Backing up Docker volumes usually happens by mounting the volume into a container that writes its contents out to the host filesystem.

In order to properly back up the data inside those volumes, one has to look through docker-compose.yml to see which parts of each container's filesystem require a backup. From what I can see, this boils down to the following (a sketch using these paths follows the list):

sentry-data: /data
sentry-postgres: /var/lib/postgresql/data
sentry-redis: /data
sentry-zookeeper: /var/lib/zookeeper/data
sentry-kafka: /var/lib/kafka/data
sentry-clickhouse: /var/lib/clickhouse
sentry-symbolicator: /data
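For example, with the --volumes-from approach described in the Docker documentation, archiving the Postgres data could look like this. A sketch only; the container name is an assumption, so check docker-compose ps for the real one:

```bash
# Archive the Postgres data directory (path from the mapping above)
# into ./backup on the host via a throwaway container.
docker run --rm \
  --volumes-from sentry_onpremise_postgres_1 \
  -v "$(pwd)/backup:/backup" \
  alpine tar czf /backup/sentry-postgres.tar.gz -C /var/lib/postgresql/data .
```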

@Bessonov

Bessonov commented May 4, 2020

@MarcusRiemer

Backing up docker volumes usually seems to happen by mounting that volume in a container that writes the contents of some mounted folder to the host filesystem.

Backing up to the host filesystem should only be a temporary step; the files should then be moved to network or cloud storage. I've created a helper to push the files directly from the container to S3. But copying raw files isn't the right way to back up databases. Maybe it works for Redis with AOF, but I'm sure this breaks for Postgres unless the database is offline. From my point of view, using specialized tools is always better.

So backing up Sentry online is a very complex task. I'm not even sure it's possible to create a consistent backup in this case.
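To illustrate the "temporary on the host, then off-site" step, something like the following; a sketch, assuming the AWS CLI is configured, with the bucket name and prefix as placeholders:

```bash
# Push the local backup archives to S3, then remove the host copies.
aws s3 cp ./backup "s3://my-sentry-backups/$(date +%F)/" --recursive
rm -rf ./backup
```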

@MarcusRiemer

Oh yes, that is of course correct. I implicitly assumed that the backup would be done when Sentry is down.

Otherwise one has to somehow coordinate the state of the different data stores, and should probably use dedicated tools like pg_dump.

@BYK
Member

BYK commented May 6, 2020

Docker folks recommend the "extract from container" method here: https://docs.docker.com/storage/volumes/#backup-restore-or-migrate-data-volumes

We can obviously improve this but don't have the resources to invest in it currently. If anyone is willing to lend a helping hand, we'd definitely review and guide the patch.

@srstsavage

Are these assumptions true?

  • Backups of sentry-postgres and sentry-data will contain all non-event sentry config, e.g. users, avatars, etc. Restoring only these two would basically restore sentry config completely, but without any actual event data.
  • A backup of sentry-clickhouse contains all event data, other than events currently in the incoming queue.
  • A backup of sentry-redis holds stats; these aren't essential for a restore, but they can be preserved by restoring the redis volume.
  • sentry-zookeeper and sentry-kafka only hold data about incoming events; if it's acceptable to lose any in-flight events at the time of backup, these don't need to be backed up.
  • The sentry-symbolicator cache will be rebuilt if not restored.

Trying to figure out what makes sense for a periodic backup with some tolerance for in-flight event loss. Assuming the above are true, it seems like sentry-postgres, sentry-data, sentry-clickhouse, and sentry-redis would be sufficient to preserve most data, and only sentry-postgres and sentry-data are needed if restoring to the same config with a clean event slate is desired. True?

@BYK
Member

BYK commented May 29, 2020

Only the following two are not correct/accurate:

A backup of sentry-redis holds stats; these aren't essential for a restore, but they can be preserved by restoring the redis volume.

Now the stats are also held in Clickhouse. Redis holds in-flight or pending job data and some other things such as sessions. You may still lose things if you don't restore this volume, but the losses likely won't be disastrous.

sentry-zookeeper and sentry-kafka only hold data about incoming events; if it's acceptable to lose any in-flight events at the time of backup, these don't need to be backed up.

Kafka is now the main communication pipeline between services, so it holds any in-flight data passing between them: events to be processed, events to be post-processed, event outcomes, and soon session information for release health. None of this data should be terrible to lose, but again, you may lose some real data if these volumes are purged.

@srstsavage

Perfect, thanks for the quick feedback!

@Apollon77

Hey, has anyone done a persistent PostgreSQL backup by calling pg_dump or similar before backing up the volumes (potentially the same for Redis with a BGSAVE)? Could you share the commands to run when Sentry is deployed via Docker?

@alexislefebvre

alexislefebvre commented Nov 5, 2020

I mounted the volumes into host directories to make backups easier. But it looks like using such volumes for sentry-zookeeper and sentry-kafka caused issues, so I went back to the default configuration for those containers.

The issue was that the post processors showed errors when following these steps: #478 (comment)

@BYK
Member

BYK commented Nov 9, 2020

I mounted the volumes into host directories to make backups easier. But it looks like using such volumes for sentry-zookeeper and sentry-kafka caused issues, so I went back to the default configuration for those containers.

This is most probably due to some permission or user ID conflicts. We'll be publishing a backup and restore guide in the coming months.

@luca-rath

#364 (comment) Do the containers need to be stopped before creating the backups of the volumes? Or do you think it's fine to keep them running? I'm just asking because I'm a little afraid of data corruption if new events arrive during the backup process.

@mattrobenolt
Contributor

To be safe, yeah, they should be stopped.
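A minimal wrapper for that, as a sketch: it stops the whole stack, archives the two volumes called out as critical earlier in the thread, and brings everything back up (extend the volume list as needed):

```bash
# Stop the stack so no service is writing, archive, then restart.
docker-compose down
mkdir -p backup
for volume in sentry-postgres sentry-clickhouse; do
  docker run --rm \
    -v "${volume}:/source:ro" \
    -v "$(pwd)/backup:/backup" \
    alpine tar czf "/backup/${volume}.tar.gz" -C /source .
done
docker-compose up -d
```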

@luca-rath

Is there any good solution to back up ClickHouse while the containers are running? Postgres is no problem.
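One option sometimes used for online ClickHouse snapshots is ALTER TABLE ... FREEZE, which hard-links table parts under shadow/ inside the ClickHouse data directory. A sketch only; the clickhouse service name is an assumption taken from the compose file, not every table type supports FREEZE, and how well this fits Sentry's schema is untested here:

```bash
# Freeze every table in the default database; the hard-linked parts
# end up under /var/lib/clickhouse/shadow/ inside the container.
docker-compose exec -T clickhouse clickhouse-client --query "SHOW TABLES" \
  | while read -r table; do
      docker-compose exec -T clickhouse clickhouse-client \
        --query "ALTER TABLE ${table} FREEZE"
    done
```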

@Apollon77

Can anyone tell me the command to do a Postgres backup to another directory... for Sentry?

@luca-rath

We use docker exec -t $POSTGRES_CONTAINER_NAME pg_dump -c -U postgres postgres | gzip > $BACKUP_PATH
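For the restore side, and for the Redis snapshot asked about above, something along these lines; a sketch, where the container names, the default Postgres credentials, and the /data path inside the Redis container are assumptions:

```bash
# Restore the gzipped dump into the running Postgres container.
gunzip -c "$BACKUP_PATH" | docker exec -i "$POSTGRES_CONTAINER_NAME" psql -U postgres postgres

# Redis: SAVE writes a complete dump.rdb synchronously, then copy it out.
docker exec "$REDIS_CONTAINER_NAME" redis-cli SAVE
docker cp "$REDIS_CONTAINER_NAME:/data/dump.rdb" ./redis-dump.rdb
```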

@maxlapshin

It is not clear how to back up all configuration and login information without the actual event data.
