Overview

As highlighted by #17, some services may integrate events into their log streams at unknown and unpredictable intervals. This is not an issue if all log entries are delayed by a consistent amount of time; however, it becomes a challenge when only a subset of these log entries are integrated into the log stream after this unpredictable delay.
If a single pointer is used to track the status of collected logs, delayed log events may result in dropped or missed data.
To mitigate this issue for vendors known to be affected - such as Google Workspaces and GitHub - a `lag` parameter (#20) was previously added to Grove. This parameter delays the collection of all logs by a configured number of minutes, giving the vendor's log stream the opportunity to become consistent. However, the use of `lag` results in logs which are available immediately being delayed to account for the slowest log entries. In the case of Google Workspaces, this may be in the realm of hours for login events.
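To illustrate the trade-off, a minimal sketch of how a `lag` parameter shifts the collection window into the past (the `collection_window` helper is hypothetical, not Grove's actual implementation):

```python
from datetime import datetime, timedelta, timezone


def collection_window(pointer: datetime, lag_minutes: int) -> tuple[datetime, datetime]:
    """Compute the (start, end) window for a collection run.

    `lag_minutes` pushes the end of the window back from "now", giving the
    vendor's log stream time to become consistent. Every entry - including
    ones already available - waits at least `lag_minutes` before collection.
    """
    end = datetime.now(timezone.utc) - timedelta(minutes=lag_minutes)
    return pointer, end
```

With a lag large enough to cover the slowest entries (hours, in the Google Workspaces login case), even entries available immediately are held back for that entire interval.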
Proposal
In order to ensure that logs are collected when they are available, this feature request is to implement a retrospection feature in Grove.
This feature will allow periodic retrospection over previously collected windows, with deduplication of log events which have already been collected. This allows collection to be performed aggressively - logs which are made available immediately are collected as soon as possible - while "slow" log entries are still collected rather than missed.
Retrospection will be implemented as a generic feature which can be turned on or off for a given connector as required. This is to ensure that future vendors with these constraints can be handled consistently, and without the need for special "once-off" treatment.
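As a rough sketch of how an aggressive collection pass and a later retrospection pass could share one deduplicating code path (the `fetch` callable, the set-based `cache`, and the function name are all hypothetical stand-ins, not Grove's actual API):

```python
def collect_with_retrospection(fetch, cache: set, start, end) -> list:
    """Collect entries in [start, end), skipping ones already seen.

    `fetch` yields (entry_id, entry) tuples from the vendor; `cache` stands
    in for Grove's cache backend. A retrospection run simply re-invokes this
    over a window that was already collected: late-arriving entries are new
    to the cache and are returned, while duplicates are discarded.
    """
    new = []
    for entry_id, entry in fetch(start, end):
        if entry_id in cache:
            continue  # already collected on an earlier pass
        cache.add(entry_id)
        new.append(entry)
    return new
```

Because the same routine serves both the initial pass and retrospection, a connector could enable or disable retrospection purely through scheduling, without special-case collection logic.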
Considerations
Deduplication will need to be performed on a per-log-entry basis. This will increase the amount of data stored in the cache, and the volume of reads and writes to the cache.
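One way to limit the per-entry cache footprint is to store a fixed-size digest of each entry rather than the entry itself. A sketch, with a hypothetical `entry_key` helper (not part of Grove):

```python
import hashlib
import json


def entry_key(entry: dict) -> str:
    """Derive a stable, fixed-size cache key for a log entry.

    Serialising with sorted keys makes the digest independent of field
    order, so the same logical entry always maps to the same key.
    """
    payload = json.dumps(entry, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

Storing a 64-character digest per entry keeps cache writes constant-size regardless of entry size, though the read/write volume still grows with the number of entries retrospected.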