Integrate with DLP #182

Open
raserva opened this issue Mar 2, 2022 · 4 comments
Labels
enhancement (New feature or request), iceboxed, proposal (Something might be useful and should be discussed)

Comments

@raserva
Contributor

raserva commented Mar 2, 2022

Add integration with Cloud Data-Loss Prevention (DLP). This will allow us to redact sensitive information from requests & responses in our audit logs. POC here: #176

raserva added the enhancement (New feature or request) and proposal (Something might be useful and should be discussed) labels Mar 2, 2022
@sethvargo
Contributor

I looked through the POC. A few thoughts:

  • Having DLP in-the-loop for each incoming audit entry feels expensive. Are we sure this is the right architecture?
  • How much of this will be user-configurable?
  • I'm curious about the decision to scrub instead of reject. If I can convince DLP that my email address is PII (hint: it's pretty easy), I could get away with bad things now.

@raserva
Contributor Author

raserva commented Mar 3, 2022

> Are we sure this is the right architecture?

Nope. That was a POC, not intended to be a final solution. I wanted to see:
A) how easy it was to integrate
B) whether it could be done in real time

The solutions we've discussed up to this point have generally run in real time. An asynchronous, non-real-time solution would be cheaper, but we thought the additional complexity was unnecessary given our low TPS.
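To make the trade-off concrete, here is a minimal sketch of the two shapes being compared. It is not the POC code: `process_inline`, `AsyncRedactor`, and the `redact` and `write` callables are all hypothetical stand-ins, and no real DLP client is involved. The inline version blocks the caller on redaction; the queued version returns immediately but needs a worker, shutdown logic, and a place to put failures, which is the extra complexity mentioned above.

```python
import queue
import threading


def process_inline(entry, redact, write):
    """Real-time path: the caller waits for redaction before the entry is written."""
    write(redact(entry))


class AsyncRedactor:
    """Queued path: entries are redacted and written by a background worker."""

    def __init__(self, redact, write):
        self._redact = redact
        self._write = write
        self._queue = queue.Queue()
        self.errors = []  # failures are collected instead of surfacing to the caller
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, entry):
        # Returns immediately; redaction happens later on the worker thread.
        self._queue.put(entry)

    def _run(self):
        while True:
            entry = self._queue.get()
            try:
                if entry is None:  # sentinel: stop the worker
                    return
                self._write(self._redact(entry))
            except Exception as exc:
                self.errors.append(exc)
            finally:
                self._queue.task_done()

    def close(self):
        # Flush everything that was submitted, then stop the worker.
        self._queue.put(None)
        self._queue.join()
        self._worker.join()
```

With a single worker, entries are written in submission order, but the caller no longer learns whether redaction succeeded at the time of the call.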

> How much of this will be user-configurable?

Undecided. Part of this work would likely be to create a design and decide whether we want a standard configuration for redaction or to allow users to create their own redaction configs. Other solutions in this area (Dryad) use a standard configuration for all.

> I'm curious about the decision to scrub instead of reject

It only scrubs the request and response values, NOT the principal or other fields in the audit log. So your email would still be associated with whatever data you access; it's just the accessed data that would be redacted. You can see this in the example provided in the POC:

{
    ...
      },
      "request": {
        "message": "[EMAIL_ADDRESS] [DATE]",
        "target": "3c04c892-c532-4f70-a27d-52d6b5cc3ec5",
      },
      "method_name": "abcxyz.test.Talker/Hello",
      "authentication_info": {
        "principal_email": "rsrv@tycho.joonix.net"
      },
    },
    ...
},

Originally:

{
    ...
      },
      "request": {
        "message": "me@example.com 3/4/2020",
        "target": "3c04c892-c532-4f70-a27d-52d6b5cc3ec5",
      },
      "method_name": "abcxyz.test.Talker/Hello",
      "authentication_info": {
        "principal_email": "rsrv@tycho.joonix.net"
      },
    },
    ...
},
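The field-scoping described above can be sketched as follows. This is an illustration only, not the POC implementation: `redact_text` uses two simple regular expressions as a stand-in for the DLP call, and `scrub_entry` and the pattern table are hypothetical names. The point is that only string values under `request` and `response` are rewritten, while `authentication_info` and every other field pass through untouched.

```python
import copy
import re

# Stand-in detectors (assumption): a real integration would call Cloud DLP here.
_PATTERNS = [
    ("EMAIL_ADDRESS", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("DATE", re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")),
]


def redact_text(text):
    """Replace each detected value with its info type in square brackets."""
    for info_type, pattern in _PATTERNS:
        text = pattern.sub(f"[{info_type}]", text)
    return text


def scrub_entry(entry):
    """Return a copy of the entry with only request/response strings redacted."""
    scrubbed = copy.deepcopy(entry)
    for field in ("request", "response"):
        payload = scrubbed.get(field)
        if isinstance(payload, dict):
            for key, value in payload.items():
                if isinstance(value, str):
                    payload[key] = redact_text(value)
    return scrubbed
```

Applied to the original entry above, this sketch rewrites `request.message` and leaves `principal_email` as it was, so the caller stays attributable.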

@sethvargo
Contributor

Thanks for the reply. That all makes sense. I think the "DLP in the path of the request" is still a concern we should discuss more. It impacts our maximum availability and we'd need to look into quota and QPS bits.

@yolocs
Contributor

yolocs commented Mar 5, 2022

The meta point: I meant DLP integration to be a "quick" feature that adds potential value to lumberjack. There isn't any clear requirement for it. So if it's going to complicate the lumberjack architecture by a lot (which is likely the case with async DLP processing), we should table the idea.

> Having DLP in-the-loop for each incoming audit entry feels expensive. Are we sure this is the right architecture?

Considering that the volume of audit logs won't be on the same scale as debug logs, I think the higher cost should be acceptable. Plus, this is meant to be an optional feature. E.g. an org could require product teams not to log req/resp (we have a knob for that) if sensitive data is expected there.

> How much of this will be user-configurable?

We will minimally have:

  • toggle on/off DLP integration
  • choose a default DLP config to use
  • best-effort DLP (toggle), meaning if the DLP check fails, ignore the error and continue the audit logging

> I'm curious about the decision to scrub instead of reject. If I can convince DLP that my email address is PII (hint: it's pretty easy), I could get away with bad things now.

Scrub vs. reject could be a global config (in addition to the ones above).
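As a rough sketch of how these knobs might fit together (all names here are hypothetical, not lumberjack's actual configuration): `DLPOptions` carries the on/off toggle, the config name, the best-effort toggle, and a global scrub-or-reject mode. The `inspect` callable is an assumed stand-in for the DLP call that returns the scrubbed entry plus a flag saying whether anything sensitive was found.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    SCRUB = "scrub"
    REJECT = "reject"


@dataclass(frozen=True)
class DLPOptions:
    enabled: bool = False         # toggle DLP integration on/off
    config_name: str = "default"  # which DLP config to use (hypothetical name)
    best_effort: bool = True      # on DLP failure, keep logging instead of failing
    mode: Mode = Mode.SCRUB       # global scrub vs. reject choice


class RejectedEntry(Exception):
    """Raised in REJECT mode when sensitive data is detected."""


def apply_dlp(entry, options, inspect):
    """Apply the options to one audit entry.

    `inspect(entry)` is an assumed stand-in for the DLP call; it returns
    (scrubbed_entry, found_sensitive) and may raise on failure.
    """
    if not options.enabled:
        return entry
    try:
        scrubbed, found = inspect(entry)
    except Exception:
        if options.best_effort:
            return entry  # DLP failed: continue audit logging with the entry as-is
        raise
    if found and options.mode is Mode.REJECT:
        raise RejectedEntry("sensitive data detected in audit entry")
    return scrubbed
```

One consequence worth noting in a design: under this reading of best-effort, a DLP outage means entries are logged unredacted, so the toggle trades redaction guarantees for availability.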
