Flink Go XVR tests fail on TestXLang_...: Insufficient number of network buffers #21094

Open
damccorm opened this issue Jun 4, 2022 · 5 comments

Comments

@damccorm
Contributor

damccorm commented Jun 4, 2022

When running the cross-language test suites, Flink fails on TestXLang_Multi with the following error:


19:29:14 2021/08/27 02:29:14  (): java.io.IOException: Insufficient number of network buffers: required
17, but only 16 available. The total number of network buffers is currently set to 2048 of 32768 bytes
each. You can increase this number by setting the configuration keys 'taskmanager.memory.network.fraction',
'taskmanager.memory.network.min', and 'taskmanager.memory.network.max'.
19:29:14 2021/08/27 02:29:14
Job state: FAILED
19:29:14 --- FAIL: TestXLang_Multi (6.26s)

This doesn't seem to be a parallelism problem (go test is run with "-p 1" as expected) and is only happening on this specific test.
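To make the numbers in the error easier to compare across runs, here is a minimal Go sketch that pulls the counts out of the message and derives the shortfall and the total pool size. It is not part of the Beam test suite; `bufferShortage`, `parseShortage`, and the regular expression are hypothetical names written against the message text quoted in this issue, and the sample messages are shortened copies of the log lines reported in this thread.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// bufferShortage holds the numbers reported in Flink's
// "Insufficient number of network buffers" message (hypothetical helper type).
type bufferShortage struct {
	Required, Available, Total, BufferBytes int
}

// shortageRE matches the message wording quoted in this issue; other Flink
// versions may phrase it differently.
var shortageRE = regexp.MustCompile(
	`required (\d+), but only (\d+) available\. ` +
		`The total number of network buffers is currently set to (\d+) of (\d+) bytes`)

// parseShortage extracts the four counts from a log message. Whitespace is
// collapsed first so that messages wrapped across lines still match.
func parseShortage(msg string) (bufferShortage, error) {
	flat := strings.Join(strings.Fields(msg), " ")
	m := shortageRE.FindStringSubmatch(flat)
	if m == nil {
		return bufferShortage{}, fmt.Errorf("no network buffer shortage found in message")
	}
	var n [4]int
	for i := range n {
		v, err := strconv.Atoi(m[i+1])
		if err != nil {
			return bufferShortage{}, err
		}
		n[i] = v
	}
	return bufferShortage{n[0], n[1], n[2], n[3]}, nil
}

// Missing is the number of buffers the task still needed.
func (s bufferShortage) Missing() int { return s.Required - s.Available }

// PoolMiB is the total size of the network buffer pool in MiB.
func (s bufferShortage) PoolMiB() int { return s.Total * s.BufferBytes / (1 << 20) }

func main() {
	const tail = " available. The total number of network buffers is currently set to 2048 of 32768 bytes each."
	msgs := []string{
		"Insufficient number of network buffers: required 17, but only 16" + tail,
		"Insufficient number of network buffers: required 17, but only 0" + tail,
		"Insufficient number of network buffers: required 16, but only 10" + tail,
	}
	for _, msg := range msgs {
		s, err := parseShortage(msg)
		if err != nil {
			fmt.Println("skip:", err)
			continue
		}
		fmt.Printf("required=%d available=%d missing=%d pool=%d MiB\n",
			s.Required, s.Available, s.Missing(), s.PoolMiB())
	}
}
```

In every report the pool is the same 2048 buffers of 32768 bytes (64 MiB); only the number of buffers still available differs. That may point to buffers being held elsewhere when the failing task starts, rather than to a pool that is too small for a single task.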

Imported from Jira BEAM-12815. Original Jira may contain additional context.
Reported by: danoliveira.

@damccorm
Contributor Author

damccorm commented Oct 5, 2022

This hasn't happened for a long time

@damccorm damccorm closed this as completed Oct 5, 2022
@kileys kileys added the done & done label Oct 25, 2022
@Abacn
Contributor

Abacn commented Nov 16, 2022

This is happening again after switching to Flink 1.15 for testing, and the suite is now perma-red:

Task :runners:flink:1.15:job-server:validatesCrossLanguageRunnerGoUsingJava FAILED

05:30:04 Caused by: java.io.IOException: Insufficient number of network buffers: required 17, but only 0 available. The total number of network buffers is currently set to 2048 of 32768 bytes each. You can increase this number by setting the configuration keys 'taskmanager.memory.network.fraction', 'taskmanager.memory.network.min', and 'taskmanager.memory.network.max'.
05:30:04 	at org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.internalCreateBufferPool(NetworkBufferPool.java:483)
05:30:04 	at org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.createBufferPool(NetworkBufferPool.java:461)
05:30:04 	at org.apache.flink.runtime.io.network.partition.ResultPartitionFactory.lambda$createBufferPoolFactory$0(ResultPartitionFactory.java:279)
05:30:04 	at org.apache.flink.runtime.io.network.partition.ResultPartition.setup(ResultPartition.java:160)
05:30:04 	at org.apache.flink.runtime.io.network.partition.BufferWritingResultPartition.setup(BufferWritingResultPartition.java:97)
05:30:04 	at org.apache.flink.runtime.taskmanager.Task.setupPartitionsAndGates(Task.java:959)
05:30:04 	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:652)
05:30:04 	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563)
05:30:04 	at java.lang.Thread.run(Thread.java:750)
05:30:04 2022/11/16 10:30:03  (): java.io.IOException: Insufficient number of network buffers: required 17, but only 0 available. The total number of network buffers is currently set to 2048 of 32768 bytes each. You can increase this number by setting the configuration keys 'taskmanager.memory.network.fraction', 'taskmanager.memory.network.min', and 'taskmanager.memory.network.max'.
05:30:04 2022/11/16 10:30:03 Job state: FAILED
05:30:04     ptest.go:108: Failed to execute job: job go0testxlang0partition0717-jenkins-1116102953-b57d468_c2e367a3-5ff1-4ff6-a475-1faea47d2c99 failed
05:30:04 --- FAIL: TestXLang_Partition (13.36s)
05:30:04 FAIL
05:30:04 FAIL	github.com/apache/beam/sdks/v2/go/test/integration/xlang	82.614s

@Abacn Abacn reopened this Nov 16, 2022
@Abacn Abacn added P2 and removed P3, done & done labels Nov 16, 2022
@Abacn
Contributor

Abacn commented Nov 16, 2022

Note that TestXLang_Multi is skipped on the Flink runner, which is why the failure was no longer seen on that test:

05:29:50 === RUN   TestXLang_Multi
05:29:50     integration.go:309: Test TestXLang_Multi is currently filtered for runner flink
05:29:50 --- SKIP: TestXLang_Multi (0.00s)

@lukecwik
Member

It also fails on other Flink XLang tests:

02:32:22 Caused by: java.io.IOException: Insufficient number of network buffers: required 16, but only 10 available. The total number of network buffers is currently set to 2048 of 32768 bytes each. You can increase this number by setting the configuration keys 'taskmanager.memory.network.fraction', 'taskmanager.memory.network.min', and 'taskmanager.memory.network.max'.
02:32:22 	at org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.internalCreateBufferPool(NetworkBufferPool.java:483)
02:32:22 	at org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.createBufferPool(NetworkBufferPool.java:461)
02:32:22 	at org.apache.flink.runtime.io.network.partition.ResultPartitionFactory.lambda$createBufferPoolFactory$0(ResultPartitionFactory.java:279)
02:32:22 	at org.apache.flink.runtime.io.network.partition.ResultPartition.setup(ResultPartition.java:160)
02:32:22 	at org.apache.flink.runtime.io.network.partition.SortMergeResultPartition.setup(SortMergeResultPartition.java:190)
02:32:22 	at org.apache.flink.runtime.taskmanager.Task.setupPartitionsAndGates(Task.java:959)
02:32:22 	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:652)
02:32:22 	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563)
02:32:22 	at java.lang.Thread.run(Thread.java:750)
02:32:22 2022/11/15 10:32:22  (): java.io.IOException: Insufficient number of network buffers: required 16, but only 10 available. The total number of network buffers is currently set to 2048 of 32768 bytes each. You can increase this number by setting the configuration keys 'taskmanager.memory.network.fraction', 'taskmanager.memory.network.min', and 'taskmanager.memory.network.max'.
02:32:22 2022/11/15 10:32:22 Job state: FAILED
02:32:22     ptest.go:108: Failed to execute job: job go0testxlang0partition0705-jenkins-1115103209-155225b7_dbefca6f-b4ba-44ad-8304-8266dad6660c failed
02:32:22 --- FAIL: TestXLang_Partition (16.33s)
02:32:22 FAIL
02:32:22 FAIL	github.com/apache/beam/sdks/v2/go/test/integration/xlang	92.103s

@lukecwik lukecwik changed the title from "Flink Go XVR tests fail on TestXLang_Multi: Insufficient number of network buffers" to "Flink Go XVR tests fail on TestXLang_...: Insufficient number of network buffers" Nov 16, 2022
@damccorm
Contributor Author

This should be generally fixed by #24228. There are a few cleanup items left, though, so I'll leave this open.


4 participants