Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: adding support for data uri sanitisation #31721

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

sahil-seth
Copy link

@sahil-seth sahil-seth commented Oct 1, 2024

Changes

This PR aims to address this feature request #31719 where we can now sanitise possible data uris coming in the logs.

Context

Documentation (please check one with an [x])

  • No documentation update is required

How I've tested my work (please select one)

I have verified these changes via:

  • Newly added/modified unit tests, or

lib/logger/utils.ts Outdated Show resolved Hide resolved
@sahil-seth sahil-seth changed the title adding support for data uri sanitisation feat: adding support for data uri sanitisation Oct 1, 2024
@sahil-seth
Copy link
Author

@rarkins should we go ahead with this change?
The workflows seem to be waiting an approval from a maintainer

Copy link
Collaborator

@rarkins rarkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These Regex seem too aggressive and will match some unintentional/irrelevant data messages too

@sahil-seth
Copy link
Author

@rarkins Good point, but do you have any examples that you can think of, which might match to the new regex but should not ideally?
Asking cause it might help me optimise the regex quicker.

@rarkins
Copy link
Collaborator

rarkins commented Oct 1, 2024

No, because there are infinite examples possible, so you should provide some examples of where data uri's need redacting and we make the Regex tighter. For example should it occur only at the start of a field?

@sahil-seth
Copy link
Author

Here are some data URI examples which I aimed to redact with this change:

data:application/pkcs8;kid=example-key-123;base64,MIIBIjANBgkqhkiG9w0BAp1s1F3lQfZ9c/GbE=
data:application/jwt;kid=jwt-key-789;base64,eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9
data:application/x-pem-file;kid=pem-cert-202;base64,LS0tLS1CZWdpbiBDRVJUSUZJQ0FURS0tLS0t
data:audio/wav;kid=audio-key-707;base64,UklGRjIAAABXQVZFZm10IBAAAAABAAEAESsAACJWAAACABAAZGF0YYAAAA

Also, upon a quick check with an AI assistant, the regex doesn't seem to match unnecessary strings apart from URLs or data URI like fields.

@sahil-seth
Copy link
Author

For example should it occur only at the start of a field?

Yes, it can. It might be safer to redact the whole string after data cause it takes away the knowledge of what kind of key it is, whether it is jwt or pkcs8 or pem-file along with the key itself.

WDYT?

@rarkins
Copy link
Collaborator

rarkins commented Oct 1, 2024

Please provide a full log example from today (you can redact parts of it yourself). Eg where does this appear today?

@sahil-seth
Copy link
Author

Screenshot 2024-10-01 at 12 28 30 PM

Here is an example

@rarkins
Copy link
Collaborator

rarkins commented Oct 1, 2024

Great, so can we start with the case where we match against ^data: and it's the whole value of the log message? (as in your examples)

@sahil-seth
Copy link
Author

@rarkins I'm not sure If I got you this time, could you please elaborate on the suggestion?

@sahil-seth
Copy link
Author

@rarkins just following up on above message,
do you mean we update the regex to include the strings starting with data: ?

@rarkins
Copy link
Collaborator

rarkins commented Oct 3, 2024

Yes, the regex should be strict to match only where the log content starts and ends with a data URI

@viceice
Copy link
Member

viceice commented Oct 3, 2024

we can even more tighten the regex to also require the content-type after data:

^data:[0-9a-z-]+/[0-9a-z-]+;.+

@sahil-seth
Copy link
Author

sahil-seth commented Oct 4, 2024

@rarkins @viceice
Just to clarify and make sure we are on the same page,

Are we talking about dataUriCredRe OR urlRe when we talk about tightening the regex?

lib/logger/utils.ts Outdated Show resolved Hide resolved
@rarkins
Copy link
Collaborator

rarkins commented Oct 4, 2024

It's better if you revert your changes to urlRe

@sahil-seth
Copy link
Author

sahil-seth commented Oct 4, 2024

@rarkins I've reverted the change to urlRe and added a "if" case for data URIs using the regex suggested above, LMK if this looks fine?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants