[BEAM-12164]: Add SDF for reading change stream records #16514
Merged
Adds ReadChangeStreamPartitionDoFn, a splittable DoFn (SDF) that reads partitions from change streams and processes their records. It receives a change stream name, a partition, a start time, and an end time, and then issues a change stream query with those parameters.
Within a change stream, three types of records can be received:
1. Data records
2. Heartbeat records
3. Child partition records
Upon receiving (1), the function updates the watermark with the record's commit timestamp and emits the record into the output PCollection.
Upon receiving (2), the function updates the watermark with the record's timestamp, but it does not emit any record into the PCollection.
Upon receiving (3), the function updates the watermark with the record's timestamp and writes the new child partitions into the metadata table. These partitions will later be scheduled by the DetectNewPartitions component.
Once the change stream query for the element partition finishes, the function marks the partition as finished in the metadata table and terminates.
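
The per-record handling described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: the class names (`StreamRecord`, `DataRecord`, `HeartbeatRecord`, `ChildPartitionsRecord`, `PartitionReaderSketch`) are hypothetical stand-ins, plain lists stand in for the output PCollection and the metadata table, and keeping the watermark monotonic with `Math.max` is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for the three record types (not Beam classes).
abstract class StreamRecord {
  final long timestampMillis;
  StreamRecord(long timestampMillis) { this.timestampMillis = timestampMillis; }
}

final class DataRecord extends StreamRecord {
  final String payload;
  DataRecord(long commitTimestampMillis, String payload) {
    super(commitTimestampMillis);
    this.payload = payload;
  }
}

final class HeartbeatRecord extends StreamRecord {
  HeartbeatRecord(long timestampMillis) { super(timestampMillis); }
}

final class ChildPartitionsRecord extends StreamRecord {
  final List<String> childPartitions;
  ChildPartitionsRecord(long timestampMillis, List<String> childPartitions) {
    super(timestampMillis);
    this.childPartitions = childPartitions;
  }
}

final class PartitionReaderSketch {
  long watermarkMillis = Long.MIN_VALUE;
  // Stands in for the output PCollection.
  final List<String> output = new ArrayList<>();
  // Stands in for child partition rows written to the metadata table.
  final List<String> newPartitions = new ArrayList<>();
  // Stands in for the "finished" state of this partition in the metadata table.
  boolean finished = false;

  void process(StreamRecord record) {
    // Every record type advances the watermark (monotonicity is an assumption here).
    watermarkMillis = Math.max(watermarkMillis, record.timestampMillis);
    if (record instanceof DataRecord) {
      // (1) Data records are emitted downstream.
      output.add(((DataRecord) record).payload);
    } else if (record instanceof ChildPartitionsRecord) {
      // (3) Child partitions are recorded so they can be scheduled later.
      newPartitions.addAll(((ChildPartitionsRecord) record).childPartitions);
    }
    // (2) Heartbeat records only advance the watermark; nothing is emitted.
  }

  void run(List<StreamRecord> queryResults) {
    for (StreamRecord record : queryResults) {
      process(record);
    }
    // The query for this partition is exhausted: mark the partition as finished.
    finished = true;
  }
}
```

In this sketch, passing a data record, a heartbeat, and a child partitions record to `run` emits only the data record's payload, records the child partitions, and leaves the watermark at the latest timestamp seen.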