[BEAM-12164]: Add SDF for reading change stream records #16514

Merged 1 commit on Jan 21, 2022
@thiagotnunes (Contributor) commented Jan 14, 2022

Adds ReadChangeStreamPartitionDoFn, a splittable DoFn (SDF) that reads a partition of a change stream and processes its records. The component receives a change stream name, a partition, a start time, and an end time, and then issues a change stream query with those parameters.

Within a change stream, three types of records can be received:

  1. A Data record
  2. A Heartbeat record
  3. A Child partitions record

Upon receiving (1), the function updates the watermark with the record's commit timestamp and emits the record into the output PCollection.
Upon receiving (2), the function updates the watermark with the record's timestamp, but it does not emit any record into the PCollection.
Upon receiving (3), the function updates the watermark with the record's timestamp and writes the new child partitions into the metadata table. These partitions will later be scheduled by the DetectNewPartitions component.

Once the change stream query for the element partition finishes, the function marks the partition as finished in the metadata table and terminates.
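
The record handling described above can be illustrated with a minimal, Beam-free sketch. It is not the code in this PR: every class, field, and method name below is hypothetical, the metadata table is stood in for by an in-memory list, and taking the maximum so that the watermark never moves backwards is an assumption rather than something stated in the description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: hypothetical stand-ins for the three record types and the SDF state.
class ChangeStreamDispatchSketch {

  interface ChangeStreamRecord {
    long timestampMillis();
  }

  static final class DataRecord implements ChangeStreamRecord {
    final long commitTimestampMillis;
    final String payload;

    DataRecord(long commitTimestampMillis, String payload) {
      this.commitTimestampMillis = commitTimestampMillis;
      this.payload = payload;
    }

    // For data records the relevant timestamp is the commit timestamp.
    public long timestampMillis() {
      return commitTimestampMillis;
    }
  }

  static final class HeartbeatRecord implements ChangeStreamRecord {
    final long timestamp;

    HeartbeatRecord(long timestamp) {
      this.timestamp = timestamp;
    }

    public long timestampMillis() {
      return timestamp;
    }
  }

  static final class ChildPartitionsRecord implements ChangeStreamRecord {
    final long timestamp;
    final List<String> childTokens;

    ChildPartitionsRecord(long timestamp, List<String> childTokens) {
      this.timestamp = timestamp;
      this.childTokens = childTokens;
    }

    public long timestampMillis() {
      return timestamp;
    }
  }

  static final class PartitionProcessor {
    long watermarkMillis = Long.MIN_VALUE;
    final List<DataRecord> output = new ArrayList<>();     // stand-in for the output PCollection
    final List<String> metadataTable = new ArrayList<>();  // stand-in for the metadata table
    boolean finished = false;

    void process(ChangeStreamRecord record) {
      // All three record types advance the watermark (max() is an assumption).
      watermarkMillis = Math.max(watermarkMillis, record.timestampMillis());
      if (record instanceof DataRecord) {
        // (1) Data records are emitted downstream.
        output.add((DataRecord) record);
      } else if (record instanceof ChildPartitionsRecord) {
        // (3) Child partitions are recorded so they can be scheduled later.
        metadataTable.addAll(((ChildPartitionsRecord) record).childTokens);
      }
      // (2) Heartbeats only move the watermark; nothing is emitted.
    }

    // Called once the query for this partition has returned all of its records.
    void finish() {
      finished = true;
    }
  }

  static PartitionProcessor runDemo() {
    PartitionProcessor processor = new PartitionProcessor();
    processor.process(new DataRecord(1000L, "row-change"));
    processor.process(new HeartbeatRecord(2000L));
    processor.process(new ChildPartitionsRecord(3000L, Arrays.asList("child-a", "child-b")));
    processor.finish();
    return processor;
  }

  public static void main(String[] args) {
    PartitionProcessor p = runDemo();
    System.out.println("emitted=" + p.output.size()
        + " watermark=" + p.watermarkMillis
        + " children=" + p.metadataTable
        + " finished=" + p.finished);
  }
}
```

In the demo, only the data record reaches the output, the heartbeat contributes nothing but a watermark advance, and the two child partition tokens end up in the stand-in metadata list before the partition is marked finished.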

@thiagotnunes (Contributor Author) commented

retest this please

@thiagotnunes (Contributor Author) commented Jan 14, 2022

R: @pabloem

@pabloem (Member) commented Jan 21, 2022

ah this PR is surprisingly easy to follow. I think it makes sense to me. LGTM

@pabloem pabloem merged commit f43789a into apache:master Jan 21, 2022