This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Purge old cleartext on my homeserver #2964

Closed
lofidevops opened this issue Mar 9, 2018 · 17 comments

@lofidevops

User story: I am a system administrator setting up a homeserver for strangers. I want to delete any cleartext messages older than 30 days, so I have limited access to user content.

Is this possible with Synapse? Dendrite?

Related: #2963

@neilisfragile
Contributor

You can redact messages via API, though the db will still contain them.

What you probably want is the purge API, which is in master but not in 0.26. I don't have a release date for 0.27, but I can confirm that the purge API will be part of that release.
https://github.com/matrix-org/synapse/blob/develop/docs/admin_api/purge_history_api.rst

Note that if the server federates, the room data will also live on every other server that has joined the room.

@neilisfragile
Contributor

FWIW, the 0.27.0 release candidate should be out later today.

@ukcb

ukcb commented Mar 28, 2018

Here is my little private maintenance script, using PostgreSQL on localhost, as an example:

#!/bin/bash

logger "$0 started."

HOMEBASE="http://localhost"
ADMIN="@admin:matrix.example.org"

DBNAME="synapse"

TOKEN=$(sudo -u postgres psql -t -A --dbname="$DBNAME"  --command="SELECT token FROM access_tokens WHERE user_id='$ADMIN' ORDER BY id DESC LIMIT 1;" 2>/dev/null)

TIME='30 days ago'
# Unix timestamp in milliseconds (%3N requires GNU date)
UNIX_TIMESTAMP=$(date +%s%3N --date='TZ="UTC" '"$TIME")
ROOMS=$(sudo -u postgres psql -t -A --dbname="$DBNAME" --command="SELECT room_id FROM rooms;" 2>/dev/null)

echo "### MATRIX MAINTENANCE"
echo "### purge history at $TIME:"

date --date='TZ="UTC" '"$TIME"

for ROOM_NAME in $ROOMS; do
    echo "ROOM_ID: $ROOM_NAME"
    # Quote the URL and the timestamp so the shell does not word-split or glob them.
    curl --silent --header "Content-Type: application/json" --request POST \
        --data '{"purge_up_to_ts":'"$UNIX_TIMESTAMP"',"delete_local_events": true}' \
        "$HOMEBASE:8008/_matrix/client/r0/admin/purge_history/$ROOM_NAME?access_token=$TOKEN"
done

echo "### purge media cache:"
curl --silent --request POST "$HOMEBASE:8008/_matrix/client/r0/admin/purge_media_cache?before_ts=$UNIX_TIMESTAMP&access_token=$TOKEN"

echo "### list rooms:"
sudo -u postgres psql -t -A --dbname="$DBNAME" --command="SELECT room_id, name FROM room_names;" 2>/dev/null

echo "### done."

logger "$0 stopped."

exit 0

# eof
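
Room IDs contain characters such as `!` and `:`. These are generally tolerated in a URL path, but a more cautious variant of the loop above might percent-encode the room ID before building the request URL. A minimal sketch, assuming ASCII-only room IDs; `urlencode` is a hypothetical helper written for illustration, not part of Synapse or curl:

```bash
#!/bin/bash

# Hypothetical helper: percent-encode every character of $1 that is not an
# RFC 3986 "unreserved" character. Assumes ASCII-only input.
urlencode() {
    local s="$1" out="" c i
    for (( i = 0; i < ${#s}; i++ )); do
        c="${s:i:1}"
        case "$c" in
            [a-zA-Z0-9.~_-]) out+="$c" ;;
            # printf with a leading quote yields the character's numeric code
            *) out+=$(printf '%%%02X' "'$c") ;;
        esac
    done
    printf '%s\n' "$out"
}

urlencode '!abc:example.org'   # prints %21abc%3Aexample.org
```

In the loop, the request URL would then use `$(urlencode "$ROOM_NAME")` in place of `$ROOM_NAME`.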

@makedir

makedir commented May 4, 2018

@ukcb Why PostgreSQL? The default config of Matrix uses sqlite3, so how would that work?

@ukcb

ukcb commented May 5, 2018

PostgreSQL gives me more options. https://github.com/matrix-org/synapse#using-postgresql

Sorry, I don't use sqlite3 here.
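
For readers on the SQLite backend, the two database lookups in the script (the admin access token and the list of room IDs) could probably be done with the `sqlite3` command-line tool instead of `psql`. A minimal sketch, assuming the same `rooms` and `access_tokens` tables exist in the SQLite database; the file name `homeserver.db` and the helper names `list_rooms` and `latest_token` are assumptions made for illustration:

```bash
#!/bin/bash

# Assumption: path to the Synapse SQLite database file.
DBFILE="${DBFILE:-homeserver.db}"
ADMIN="@admin:matrix.example.org"

# Print one room_id per line; the database is opened read-only.
list_rooms() {
    sqlite3 -readonly "$1" "SELECT room_id FROM rooms;"
}

# Print the most recently issued access token for the given user ID.
latest_token() {
    sqlite3 -readonly "$1" \
        "SELECT token FROM access_tokens WHERE user_id='$2' ORDER BY id DESC LIMIT 1;"
}
```

With these in place, `ROOMS=$(list_rooms "$DBFILE")` and `TOKEN=$(latest_token "$DBFILE" "$ADMIN")` would stand in for the two `psql` calls; the `curl` requests stay the same.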

@makedir

makedir commented May 5, 2018

@ukcb But does this even work? #2540 says this API does not remove everything, for whatever reason. I don't get why there is no way to clear all channel data after x days.

@ukcb

ukcb commented May 5, 2018

The script is only as good as the API. I have noticed myself that the API does not delete everything (see also #3148 and #3189).

@makedir

makedir commented May 5, 2018

@ukcb But how do admins then properly purge old data? It can't be that it is stored forever and the server runs low on disk space.

@ukcb

ukcb commented May 5, 2018

I hope that it will eventually work completely. Nothing is forever. :-)

@makedir

makedir commented May 5, 2018

@ukcb But that doesn't make any sense. Why aren't you making a proper script that deletes all database entries older than timestamp x? Doesn't that work? Hasn't someone done that already? I looked for a script but couldn't find any except yours, which just uses the nonsense API.

@ukcb

ukcb commented May 6, 2018

@makedir The script is not that important; it should only drive the API and not modify the database directly. I hope that at some point there will be settings in Matrix that make such scripts superfluous. At the moment, the script does a good job for me, even if it does not delete everything.
Of course, I could delete everything in the database without an API, but that's not the purpose of the script.

@neilisfragile
Contributor

We plan to add more administrative functionality to Synapse later this year, the idea being that admins can have greater control over data storage etc.

For now, the purge API is your best bet.

If you are using your homeserver for anything other than very light loads, I strongly suggest migrating to Postgres.

@makedir

makedir commented May 9, 2018

@neilisfragile There should be some easy way for an admin to access things like that, for example via the Riot client: if you're an admin, just go into the channel settings and click "purge data and media older than 30 days", or set this channel to auto-purge them after 90 days, or something like that.

@ukcb

ukcb commented May 9, 2018

@neilisfragile As a server admin, I would like to have a central purging option for all rooms. I am concerned with data minimisation in the sense of the GDPR.

@neilisfragile
Contributor

@makedir - we'd probably want to make it an admin feature of the server itself rather than tie it into any given client.

@ukcb - Nods, this is a popular feature request, though we talked in #1941 about why I don't believe it is a prerequisite for GDPR. As I say, we'll definitely be working towards improved server admin tooling. If you can't wait that long, PRs are always welcome!

@richvdh
Member

richvdh commented May 11, 2018

So is this not solved by scripting the purge API?

@rubo77
Contributor

rubo77 commented Nov 2, 2018

In the script in my pull request #1034, the data is really deleted (unless it is re-federated from another homeserver).
