[Bug]: Windowed Streaming OnTimer State Wiped #32599
Comments
A related note for the Prism runner for testing (I don't see a tag for the Prism runner, otherwise I would have added it).
donothing_test.go
package poc

import (
	"testing"

	"github.com/apache/beam/sdks/v2/go/pkg/beam"
	"github.com/apache/beam/sdks/v2/go/pkg/beam/testing/passert"
	"github.com/apache/beam/sdks/v2/go/pkg/beam/testing/ptest"
)

func TestDoNothingTransform(t *testing.T) {
	beam.Init()
	pipeline, scope, col := ptest.CreateList([]string{"foo"})
	col = beam.AddFixedKey(scope, col)
	col = DoNothingTransform(scope, col)
	// If I comment out this passert, all retries don't occur.
	passert.EqualsList(scope, col, []string{"foo"})
	ptest.RunAndValidate(t, pipeline)
}

func TestMain(m *testing.M) {
	ptest.MainWithDefault(m, "prism")
}
I am seeing similar behavior, where the logs show all retries don't execute if I don't include the passert clause.
With passert
Without passert
If I understand correctly the issue is:
That does sound like an issue with Dataflow. As for Prism, this does seem like an edge case, with or without real-time execution. But technically it's not related to whether windowing is applied. I'd recommend filing it as a separate issue, even though it uses more or less the same pipeline. Some notes:
Technically this seems to be around looping timers, along with ProcessingTime weirdness.
What happened?
OnTimer for processing time on the streaming Dataflow runner has its state wiped after an indeterminate number of retries when windowing is applied to the PCollection. I believe the compatibility matrix for streaming Dataflow says this should work.
I am adding an unbounded source to the pipeline, separate from this ParDo, to get Dataflow to launch the pipeline as streaming.
Using github.com/apache/beam/sdks/v2 v2.59.0
donothing.go
main.go
Dataflow Logs
PS: newest log at top
With windowing
Without windowing
Issue Priority
Priority: 2 (default / most bugs should be filed as P2)
Issue Components