This requires several improvements: first, non-merging support, and then merging support.

First is simply supporting and allowing custom WindowFns at all, specifically "non-merging" windows. This largely means having a comparable window representation for grouping by key in Prism.
Goal:
This should allow the Python ValidatesRunner test "test_custom_window_type" to pass.
beam/sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py, line 1120 (at commit f3e6c66)
There are likely some Java side tests that require this as well.
There shouldn't be many changes needed for this part, since it mostly requires allowing windows to be grouped by byte equality of their encoding; identical windows should encode to the same bytes.
We likely need to handle that here, among other places.
beam/sdks/go/pkg/beam/runners/prism/internal/engine/data.go, line 72 (at commit 2d6d55b)
We also need to implement a reasonable comparable type for typex.Window for use within Prism.

Custom WindowFns have the following Coder:
https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/org/apache/beam/model/pipeline/v1/beam_runner_api.proto#L1025
This is largely a timestamp followed by arbitrary encoded bytes. We'll sometimes need to length-prefix that coder, as directed by the runner, when the coder isn't a standard one.
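For within-Prism grouping, a minimal sketch of one possible comparable representation (the type and names here are hypothetical, not the actual Prism types): store the window's max timestamp plus the encoded window bytes as a string, so identical windows compare equal and can key a Go map.

```go
// A sketch only: the type below is hypothetical, not Prism's code.
package main

import (
	"fmt"

	"github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/mtime"
)

// customWindowKey is a comparable stand-in for an arbitrary custom window:
// the window's max timestamp plus its encoded payload as a string. Because
// all fields are comparable, values can serve directly as Go map keys when
// grouping elements by (key, window).
type customWindowKey struct {
	MaxTimestamp mtime.Time
	Encoded      string // raw encoded window bytes; identical windows encode identically
}

func main() {
	// Group element counts by window purely on byte equality of the encoding.
	counts := map[customWindowKey]int{}
	w := customWindowKey{MaxTimestamp: mtime.EndOfGlobalWindowTime, Encoded: string([]byte{0x01, 0x02})}
	counts[w]++
	counts[w]++
	fmt.Println(counts[w]) // 2
}
```

Grouping by such a key gives byte-equality semantics without Prism needing to understand the window's internal structure.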
Second is allowing for custom merging of windows.

Windowing strategies that need this (sessions generally, but custom WindowFns in particular) set this field:
https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/org/apache/beam/model/pipeline/v1/beam_runner_api.proto#L1130
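If the field in question is the strategy's merge status, a rough sketch of the check could look like the following (assuming the pipeline proto package is imported as pipepb, as elsewhere in Prism; the helper name is made up):

```go
// Sketch only; not Prism's actual code.
package internal

import (
	pipepb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1"
)

// needsWindowMerging reports whether a windowing strategy requires the
// custom window-merging path before a GBK for it can fire.
func needsWindowMerging(ws *pipepb.WindowingStrategy) bool {
	return ws.GetMergeStatus() == pipepb.MergeStatus_NEEDS_MERGE
}
```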
Testing here will be from the Python ValidatesRunner side, and likely various Java benchmarks and tests.
beam/sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py, line 1105 (at commit f3e6c66)
The trick here is that we need to create and send a new SideCar stage, specific to handling the merge information, before processing a GBK.
The stage will contain the merge windows transform defined here:
https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/org/apache/beam/model/pipeline/v1/beam_runner_api.proto#L313
Its URN is defined here in Prism:
beam/sdks/go/pkg/beam/runners/prism/internal/urns/urns.go, line 75 (at commit 2d6d55b)
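As a rough sketch of the transforms such a SideCar stage might hand to the SDK (everything here is illustrative: the function name is made up, and the assumption that the merge_windows payload is the serialized custom WindowFn should be checked against the proto; the source/sink URNs are the usual fn-api data-channel ones):

```go
// Sketch only; not Prism's actual stage construction.
package internal

import (
	pipepb "github.com/apache/beam/sdks/v2/go/pkg/beam/model/pipeline_v1"
)

// buildMergeWindowsTransforms sketches the three transforms a SideCar stage
// would need: a runner DataSource feeding the SDK's merge_windows transform,
// whose output is returned through a runner DataSink.
func buildMergeWindowsTransforms(windowFnPayload []byte) map[string]*pipepb.PTransform {
	return map[string]*pipepb.PTransform{
		"source": {
			UniqueName: "merge_source",
			Spec:       &pipepb.FunctionSpec{Urn: "beam:runner:source:v1"},
			Outputs:    map[string]string{"o0": "pc_in"},
		},
		"merge": {
			UniqueName: "merge_windows",
			Spec: &pipepb.FunctionSpec{
				Urn:     "beam:transform:merge_windows:v1",
				Payload: windowFnPayload, // assumed: the serialized custom WindowFn
			},
			Inputs:  map[string]string{"i0": "pc_in"},
			Outputs: map[string]string{"o0": "pc_out"},
		},
		"sink": {
			UniqueName: "merge_sink",
			Spec:       &pipepb.FunctionSpec{Urn: "beam:runner:sink:v1"},
			Inputs:     map[string]string{"i0": "pc_out"},
		},
	}
}
```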
Basically, once we've determined the GBK is firing, we first group by all the windows we currently have (including windows that may not be ready to fire yet), and then send those to a custom stage consisting of a DataSource, the MergeWindows transform, and a DataSink.
From there we aggregate the data for the new merged windows from their constituent unmerged windows for a given key.
In other words, we need to produce a bundle for this custom stage whenever we might need one for a given key, and only aggregate for that key once the watermark threshold has passed for it.
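A minimal, self-contained sketch of that regrouping step (the types and names are made up for illustration, not Prism's code): given the merge decisions for one key, fold the per-window state of each constituent unmerged window into its merged window.

```go
package main

import "fmt"

// mergeResult is a hypothetical record of one merge decision reported back by
// the merge windows transform: several unmerged windows collapse into one.
type mergeResult struct {
	Merged     string   // encoded merged window
	Components []string // encoded windows that were merged away
}

// regroupForKey folds per-window aggregated state (here just element counts)
// into the merged windows for a single key. Windows that were not part of any
// merge keep their existing state.
func regroupForKey(state map[string]int, merges []mergeResult) map[string]int {
	out := make(map[string]int, len(state))
	merged := make(map[string]bool)
	for _, m := range merges {
		for _, c := range m.Components {
			out[m.Merged] += state[c]
			merged[c] = true
		}
	}
	for w, v := range state {
		if !merged[w] {
			out[w] += v
		}
	}
	return out
}

func main() {
	// Two session-like windows for one key merge into a single wider window.
	state := map[string]int{"[1,5)": 3, "[3,8)": 2, "[20,25)": 1}
	merges := []mergeResult{{Merged: "[1,8)", Components: []string{"[1,5)", "[3,8)"}}}
	fmt.Println(regroupForKey(state, merges)) // map[[1,8):5 [20,25):1]
}
```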
The ability to have SideCar stages associated with a given stage for meta processing is very useful, and will pay dividends for the Drain implementation and for side input mapping.
https://beam.apache.org/documentation/programming-guide/#session-windows shows that merging windows is per key.
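To make the per-key behaviour concrete, a small illustrative sketch (not runner code): overlapping windows merge only within a single key, so another key's windows are untouched.

```go
package main

import (
	"fmt"
	"sort"
)

type interval struct{ start, end int }

// mergeOverlapping merges overlapping intervals, the way session windows for a
// single key collapse; each key's windows are merged independently.
func mergeOverlapping(ws []interval) []interval {
	sort.Slice(ws, func(i, j int) bool { return ws[i].start < ws[j].start })
	var out []interval
	for _, w := range ws {
		if n := len(out); n > 0 && w.start <= out[n-1].end {
			if w.end > out[n-1].end {
				out[n-1].end = w.end
			}
			continue
		}
		out = append(out, w)
	}
	return out
}

func main() {
	byKey := map[string][]interval{
		"a": {{1, 5}, {3, 8}}, // overlapping: merge to [1,8)
		"b": {{1, 5}},         // no other windows for "b": stays [1,5)
	}
	for k, ws := range byKey {
		fmt.Println(k, mergeOverlapping(ws))
	}
}
```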
Given the complexity of this, we may split this into a 2nd issue though.