Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Require all audit events to be able to be trimmed #45931

Merged
merged 3 commits into from
Sep 10, 2024

Conversation

capnspacehook
Copy link
Contributor

@capnspacehook capnspacehook commented Aug 27, 2024

We have encountered several audit events that sometimes are too large for a specific backend to process. Historically we've just addressed the offending events on a per-event basis. This PR requires all audit events to implement TrimToMaxSize to trim themselves to prevent this from happening again, and/or making it much easier to deal with large events in the future.

The TrimToMaxSize method has been implemented for all events, however for many events you will notice that the event is returned unchanged. When implementing these methods I only trimmed fields that are generally controlled by user input, or that can easily grow quite large (error messages for example). Many events have no such fields and as such don't attempt to trim themselves. Trimming general event metadata (UserMetadata, ConnectionMetadata, etc) was not in scope but this can be easily added in the future.

The existing code that calls TrimToMaxSize methods have been modified to take into account the fact that all audit events implement TrimToMaxSize and that trimming to a specific size may not be possible.

I recommend you review commit by commit. Most of the additions are TrimToMaxSize methods which are very repetitive and verbose.

Fixes https://github.com/gravitational/teleport-private/issues/1299.
Fixes #36442.

changelog: allow all audit events to be trimmed if necessary

@capnspacehook capnspacehook marked this pull request as ready for review August 27, 2024 20:40
@github-actions github-actions bot added audit-log Issues related to Teleports Audit Log size/xl labels Aug 27, 2024
@public-teleport-github-review-bot

@capnspacehook - this PR will require admin approval to merge due to its size. Consider breaking it up into a series smaller changes.

api/types/events/api.go Outdated Show resolved Hide resolved
lib/events/events_test.go Outdated Show resolved Hide resolved
api/types/events/api.go Outdated Show resolved Hide resolved
@capnspacehook capnspacehook force-pushed the capnspacehook/audit-event-trim-methods branch from a6a2172 to c6dc83b Compare August 29, 2024 17:21
@greedy52
Copy link
Contributor

This implementation does fix #36442 which will prevent the event handler from crashing.

But I don't think this fully addresses https://github.com/gravitational/teleport-private/issues/1299. Big events are now partially logged but the trimmed parts are lost.

FYI, database events are duplicated to session recordings so it's kinda ok that db events are partially logged. But other types will likely lose the trimmed parts forever.

@capnspacehook
Copy link
Contributor Author

@greedy52 trimming isn't ideal, but it should be a relatively rare occurrence. Since only certain fields are trimmed, most fields will still have the original data fully intact. Currently larger events are dropped, so trimming certain fields is better than dropping the whole thing.

When I was discussing this with @r0mant he did mention that it would probably be possible to spit large events into several smaller events, but for now we are just going to trim events because events that are too large are pretty rare and trimming is much simpler to implement.

@greedy52
Copy link
Contributor

greedy52 commented Sep 6, 2024

When I was discussing this with @r0mant he did mention that it would probably be possible to spit large events into several smaller events, but for now we are just going to trim events because events that are too large are pretty rare and trimming is much simpler to implement.

Not opposing to this solution. Just saying this does not fully fix https://github.com/gravitational/teleport-private/issues/1299. We should either leave the issue open, or create a separate issue to track that trimmed parts are missing from audit logs.

api/types/events/api.go Outdated Show resolved Hide resolved
event = v.TrimToMaxSize(MaxProtoMessageSizeBytes)
default:
event = event.TrimToMaxSize(MaxProtoMessageSizeBytes)
if event.Size() > MaxProtoMessageSizeBytes {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this still be possible after trimming, or will it be rare? Any way to prevent the event handler from crashing when running into this case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will still be possible but should be very rare, as some metadata is not trimmed but is unlikely to be very large. I don't think returning an error here will cause the event handler to crash, an error could be returned here before and I don't believe this caused the event handler to crash.

@capnspacehook capnspacehook force-pushed the capnspacehook/audit-event-trim-methods branch from c6dc83b to 27f6641 Compare September 9, 2024 22:05
@capnspacehook capnspacehook force-pushed the capnspacehook/audit-event-trim-methods branch from 27f6641 to 18bf453 Compare September 10, 2024 19:59
@capnspacehook capnspacehook added this pull request to the merge queue Sep 10, 2024
Merged via the queue into master with commit df91f1a Sep 10, 2024
43 of 44 checks passed
@capnspacehook capnspacehook deleted the capnspacehook/audit-event-trim-methods branch September 10, 2024 20:35
@public-teleport-github-review-bot

@capnspacehook See the table below for backport results.

Branch Result
branch/v14 Failed
branch/v15 Failed
branch/v16 Failed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Event handler loop fails on large database events using Athena audit backend
4 participants