Make AlmaLinux.org more privacy friendly #409

Open
codyro opened this issue Dec 4, 2023 · 4 comments
Labels
enhancement New feature or request


@codyro
Member

codyro commented Dec 4, 2023

Change the following Matomo settings to increase visitor privacy by restricting and anonymizing the data we collect.

Proposed changes:

  • Anonymize Visitors' IP addresses
  • Force tracking without cookies

    Enabling this option will automatically update matomo.js so that it contains additional code ensuring no tracker uses cookies. Additionally, Matomo will ignore all tracking cookies on the server side.

  • Remove query parameters from referrer (e.g., for a visit referred from hawkhost.com/some-secret-portal?sid=12983719kawdkjaweq, the ?sid=12983719kawdkjaweq portion would be stripped)
  • Regularly delete old raw data from the database (proposed: 90 or 180 days)
  • Delete old aggregated report data (proposed: 90 or 180 days)
  • Anonymize previously tracked raw data
    • IP Addresses
    • Locations (re-evaluates the location based on the anonymized IP; at least 2 bytes of the IP will be anonymized)
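The referrer clean-up proposed above amounts to keeping only the scheme, host, and path of the referring URL. A minimal Python sketch, assuming the referrer arrives as a plain URL string; `strip_referrer_query` is a hypothetical helper for illustration, not Matomo's implementation, and it also drops any #fragment:

```python
from urllib.parse import urlsplit, urlunsplit


def strip_referrer_query(referrer: str) -> str:
    """Return the referrer URL without its query string or fragment.

    Hypothetical helper illustrating the proposed setting; Matomo's own
    code may handle edge cases differently.
    """
    parts = urlsplit(referrer)
    # Keep scheme, host and path; blank out the query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


print(strip_referrer_query(
    "https://hawkhost.com/some-secret-portal?sid=12983719kawdkjaweq"
))
# https://hawkhost.com/some-secret-portal
```

The session ID never reaches storage, while the referring site and page remain available for reports.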

Cookie consent

While they're annoying, we should display a cookie consent banner if we are setting any cookies (e.g., Matomo [1]). We should consider whether we want this to be a global change or only displayed to visitors in countries covered by GDPR, etc.

We can utilize some of Matomo's built-in functionality for this, as well as its instructions for using other consent managers: https://developer.matomo.org/guides/tracking-consent

It is worth noting that if we implement the anonymization changes proposed above, French visitors don't need to be offered a tracking consent popup: https://matomo.org/faq/how-to/how-do-i-configure-matomo-without-tracking-consent-for-french-visitors-cnil-exemption/

[1] We need to verify which cookies are being set, and re-evaluate this if we opt to enable Force tracking without cookies.

Create a privacy policy & tell users exactly what data is collected and how it is utilized

The scope of this goes beyond just AlmaLinux.org, but this is a good starting point. Here is a list of the data Matomo collects by default (far less would be collected if we implement the changes above).

https://developer.matomo.org/guides/tracking-consent

codyro self-assigned this Dec 4, 2023
@codyro
Member Author

codyro commented Dec 4, 2023

cc @bennyvasquez @jonathanspw

@jonathanspw
Member

Anonymize Visitors' IP addresses

This would prevent us from turning the data into a geographic heatmap, would it not? I don't think we should do this. This data is relevant for things like determining which areas may need more mirrors, where we should focus translation efforts to serve as many users as possible, and even which local laws we should be aware of and try to respect (with GDPR/cookie consent being the whole start of this GH issue).

Force tracking without cookies

I'm all for this. Cookie popups are stupidly annoying, so if we can avoid them by avoiding cookies in the first place, all the better. If there's a perceived improvement in privacy from it, then it's a further win.

Remove query parameters from referrer (ex, visiting from hawkhost.com/some-secret-portal?sid=12983719kawdkjaweq would strip the latter portion)

Seems fine to me.

Regularly delete old raw data from the database (proposed: 90 or 180)

What is the raw data you're referring to? Does Matomo log the raw requests?

Delete old aggregated report data (proposed: 90 or 180)

This seems counterproductive and I don't think it should be deleted.

Anonymize previously tracked raw data

See the first response about this above.

Create a privacy policy & tell users exactly what data is collected and how it is utilized

100% agree, of course.

@codyro
Member Author

codyro commented Dec 4, 2023

Anonymize Visitors' IP addresses

This would prevent us from turning the data into a geographic heatmap, would it not? I don't think we should do this. This data is relevant for things like determining which areas may need more mirrors, where we should focus translation efforts to serve as many users as possible, and even which local laws we should be aware of and try to respect (with GDPR/cookie consent being the whole start of this GH issue).

I would need to double-check, but I don't believe so. It would anonymize a few bytes of the IP and determine the location from the anonymized IP instead of the raw IP, which should be sufficient for those needs.
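Masking works by zeroing trailing bytes of the address before any geolocation lookup, so the lookup only ever sees a network prefix. A minimal Python sketch for IPv4 only; `anonymize_ipv4` is a hypothetical helper, not Matomo's code, IPv6 handling is deliberately left out, and 203.0.113.77 is a documentation address:

```python
import ipaddress


def anonymize_ipv4(ip: str, masked_bytes: int = 2) -> str:
    """Zero the last `masked_bytes` octets of an IPv4 address.

    Hypothetical illustration of byte-masking anonymization; IPv6 and
    Matomo's exact rules are out of scope here.
    """
    if not 0 <= masked_bytes <= 4:
        raise ValueError("masked_bytes must be between 0 and 4")
    addr = ipaddress.IPv4Address(ip)
    prefix_len = 32 - 8 * masked_bytes
    # The network address of addr/prefix_len is addr with its host bits zeroed.
    network = ipaddress.IPv4Network(f"{addr}/{prefix_len}", strict=False)
    return str(network.network_address)


print(anonymize_ipv4("203.0.113.77"))     # 203.0.0.0
print(anonymize_ipv4("203.0.113.77", 1))  # 203.0.113.0
```

With 2 bytes masked, only the /16 prefix survives; how much location precision that leaves depends on the geolocation database, which is worth testing before enabling the setting.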

If you're referring to analytics/logs from things like our mirrors, there will be a separate issue in the appropriate repository to hash that out once we kickstart the privacy policy.

Locations (Re-evaluates the location based on the anonymized IP (at least 2 bytes of the IP will be anonymized))

Regularly delete old raw data from the database (proposed: 90 or 180)

What is the raw data you're referring to? Does Matomo log the raw requests?

https://matomo.org/guide/apis/raw-data/
https://matomo.org/faq/new-to-piwik/what-is-the-difference-between-raw-data-and-report-data/

Delete old aggregated report data (proposed: 90 or 180)

This seems counterproductive and I don't think it should be deleted.

Counterproductive to what? This issue aims to increase privacy & transparency on the data we collect and how long we collect it for.

Is this data something we need to hang onto forever? Is there a point where it becomes less useful for marketing purposes (e.g., after a year, two years, etc.)? If the aggregated reports are based on the anonymized data (after we make some changes), it's less important. It's worth discussing further.

@bennyvasquez
Sponsor Member

So, to clarify one thing, we do actually have a privacy policy, but it's badly in need of updating.

Anonymize Visitors' IP addresses

This would prevent us from turning the data into a geographic heatmap, would it not? I don't think we should do this. This data is relevant for things like determining which areas may need more mirrors, where we should focus translation efforts to serve as many users as possible, and even which local laws we should be aware of and try to respect (with GDPR/cookie consent being the whole start of this GH issue).

I would need to double-check, but I don't believe so. It would anonymize a few bytes of the IP and determine the location from the anonymized IP instead of the raw IP, which should be sufficient for those needs.

As long as we can still get the information we need to serve our community out of it, I'm fine with anonymizing. I.e., does it shift from "somewhere in Germany" to "somewhere in Europe", or from "somewhere in Berlin" to "somewhere in Germany"? The former is too much loss IMO. The latter would be fine.

Regularly delete old raw data from the database (proposed: 90 or 180)

What is the raw data you're referring to? Does Matomo log the raw requests?

https://matomo.org/guide/apis/raw-data/ https://matomo.org/faq/new-to-piwik/what-is-the-difference-between-raw-data-and-report-data/

I'm fine with deleting raw data. I'd say 180 days for now to be safe, since data can't be recovered once deleted, and then we can discuss reducing it in a year.
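A 180-day retention window is just a date cutoff: raw visit rows older than `now - 180 days` are purged, while anything newer is kept. The Python sketch below only illustrates that arithmetic; `partition_by_retention` and the sample dates are hypothetical, and in practice Matomo applies the deletion setting itself.

```python
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 180  # value proposed in this thread


def partition_by_retention(visit_times, now, retention_days=RETENTION_DAYS):
    """Split visit timestamps into (kept, purged) around a cutoff.

    Anything strictly older than `now - retention_days` would be deleted.
    Hypothetical helper, for illustration only.
    """
    cutoff = now - timedelta(days=retention_days)
    kept = [t for t in visit_times if t >= cutoff]
    purged = [t for t in visit_times if t < cutoff]
    return kept, purged


now = datetime(2024, 4, 2, tzinfo=timezone.utc)
visits = [
    datetime(2023, 9, 1, tzinfo=timezone.utc),   # older than 180 days
    datetime(2024, 3, 15, tzinfo=timezone.utc),  # recent
]
kept, purged = partition_by_retention(visits, now)
print(len(kept), len(purged))  # 1 1
```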

Delete old aggregated report data (proposed: 90 or 180)

This seems counterproductive and I don't think it should be deleted.

Counterproductive to what? This issue aims to increase privacy & transparency on the data we collect and how long we collect it for.

Is this data something we need to hang onto forever? Is there a point where it becomes less useful for marketing purposes (e.g., after a year, two years, etc.)? If the aggregated reports are based on the anonymized data (after we make some changes), it's less important. It's worth discussing further.

If it's already been anonymized, I'm not sure how deleting aggregated reports would increase privacy. Even if we opt not to anonymize the collected data, aggregated reports are intentionally abstracted already.
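The point that aggregated reports are already abstracted can be illustrated with a toy roll-up: the report stores counts per dimension, not individual visits. A Python sketch with made-up rows (masked documentation addresses); it is not Matomo's archiving code, only the general idea:

```python
from collections import Counter

# Hypothetical raw visit rows: (anonymized IP, country code, page path)
raw_visits = [
    ("203.0.0.0", "DE", "/download"),
    ("198.51.0.0", "DE", "/blog"),
    ("192.0.0.0", "US", "/download"),
]

# An aggregated report keeps only counts per dimension; IPs and
# individual rows do not survive the roll-up.
visits_per_country = Counter(country for _ip, country, _path in raw_visits)
print(dict(visits_per_country))  # {'DE': 2, 'US': 1}
```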

I'd say we revisit this and the cookie consent discussion once we've decided on and tested what adjusting the original settings would ultimately do.

So, to restate, I think this is where we should go from here:

Adjust the Matomo settings thusly:

Proposed changes:

  • Anonymize Visitors' IP addresses
  • Force tracking without cookies
  • Remove query parameters from referrer (e.g., for a visit referred from hawkhost.com/some-secret-portal?sid=12983719kawdkjaweq, the ?sid=12983719kawdkjaweq portion would be stripped)
  • Retain raw data for 180 days
  • Anonymize previously tracked raw data
    • IP Addresses
    • Locations (re-evaluates the location based on the anonymized IP; at least 2 bytes of the IP will be anonymized) (after we confirm that we're okay with the changes)

Privacy policy

  • Updates will cover things beyond this change, but should include caveats addressing the concerns raised here.

bennyvasquez added the enhancement (New feature or request) label Apr 2, 2024
bennyvasquez self-assigned this Apr 2, 2024
Development

No branches or pull requests

3 participants