DeleteRecords can crash on empty partial response #3476
Comments
Hi there, I've hit crashes similar to the one in the coredump. I saw the PR is about ready. Is there a timeline for merging the fix? Thank you.
Great analysis, Chris!
edenhill added a commit that referenced this issue on Aug 18, 2021
edenhill added a commit that referenced this issue on Aug 24, 2021
edenhill added a commit that referenced this issue on Aug 25, 2021
jacobpath pushed a commit to pathccm/librdkafka that referenced this issue on Jun 6, 2023
parent d2bc749 author garrett528 <andrew.garrett@compass.com> 1625669334 -0400 committer Jacob Lee <jacob.lee@pathccm.com> 1686019633 -0700 gpgsig -----BEGIN SSH SIGNATURE----- U1NIU0lHAAAAAQAAAH8AAAAic2stZWNkc2Etc2hhMi1uaXN0cDI1NkBvcGVuc3NoLmNvbQ AAAAhuaXN0cDI1NgAAAEEE/KxKuQeycJHYJkNEqsJPsQqQxVl1ftFETXL0PMawe+tBCMrH AiNd2GpQHEKTqzopO72+yiqWDjpM10WrTyzXBAAAAARzc2g6AAAAA2dpdAAAAAAAAAAGc2 hhNTEyAAAAeAAAACJzay1lY2RzYS1zaGEyLW5pc3RwMjU2QG9wZW5zc2guY29tAAAASQAA ACBhH8xrzkQR+w6xy86JjJ6tC6udVA0Xn4VgdX3YEEZ25QAAACEA//YouC+q94g0jxjA8D tL+R+SGXR8782VjNc2vO5hS6YBAAAfoQ== -----END SSH SIGNATURE----- sasl: Enable AWS_MSK_IAM SASL mechanism (confluentinc#3402) AWS_MSK_IAM is a new SASL mechanism for authenticating clients to AWS MSK Kafka clusters and use IAM-based controls to set Kafka ACLs and permissions. This change provides support to allow clients to pass AWS credentials at runtime which is used to build the SASL payload and authenticate clients to IAM enabled MSK clusters. It adds a new SASL mechanism, AWS_MSK_IAM, as well as configuration options to set the following: * AWS access key id * AWS secret access key * AWS region * AWS security token The SASL handshake requires a specific payload that is described here: https://github.com/aws/aws-msk-iam-auth Add curl to doozer build Address comments (UrbanCompass#5) Reduce Travis-CI runtime * Reduce number of jobs when not building a tag * Run unit tests if no tag, and local quick suite (old default) when tagged. * Combine some jobs. Travis ARM64: build static lib Travis: Disable C99 for all builds but the integration test build .. since it hampers the use of assembler (asm()) on arm64. 
Keep session alive when receiving heartbeat responses during rebalancing add changelog message Update Changelog Add cleanup-s3.py script Move Admin request arguments to result op to make them available on merge (confluentinc#3476) Fix test 0055 now when flush() does not wait for linger.ms Adds support for buildling on illumos mklove: Use curl for module downloads .. instead of wget, since we rely on curl elsewhere. Verify checksum of source dependencies and bump to OpenSSL 1.1.1l, zstd 1.5.0 Travis: login with docker account to avoid rate-limiting Docker dotnet images have changed names, updated. rxidle and txidle were stats emitted as unsigned 64, now signed (confluentinc#3519) Fix a small error due to the unreleased lock before program exit Fix a small error due to the unreleased lock skm->lock before program exit. mklove: make zlib test program compilable The test program that is used at compile-time to detect whether zlib is available fails to compile due to `NULL` being undefined: ``` _mkltmpyos55w.c:5:20: error: use of undeclared identifier 'NULL' z_stream *p = NULL; ^ 1 error generated. ``` This means that zlib availability is only automatically detected when using pkg-config. Import `stddef.h` (which defines `NULL`) in the test program, allowing zlib to be automatically detected via a compilation check. sasl: Enable STS credential refresh (UrbanCompass#7) Define IOV_MAX as 1024 if not defined Removed check int and added debug Fixes error handling for error responses from STS (UrbanCompass#10) mklove: make zlib test program compilable The test program that is used at compile-time to detect whether zlib is available fails to compile due to `NULL` being undefined: ``` _mkltmpyos55w.c:5:20: error: use of undeclared identifier 'NULL' z_stream *p = NULL; ^ 1 error generated. ``` This means that zlib availability is only automatically detected when using pkg-config. 
Import `stddef.h` (which defines `NULL`) in the test program, allowing zlib to be automatically detected via a compilation check. Travis: New secure env vars AppVeyor: rotate access keys Travis: show sha256sums of artifacts prior to deploy Add MSVC 140 runtimes (for packaging) Add 'ssl.ca.pem' property (confluentinc#2380) Improve nuget release script - Verify artifact file contents and architectures. - Verify that artifact attributes match. - Get README, CONFIG,.. etc, from artifacts instead of local source tree (which may not match the released version). Bump to version 1.8.2 (Skipping 1.8.1 due to dotnet release with that number) mklove: fix static bundle .a generation on osx mklove: portable checksum checking for downloads mklove: allow --source-deps-only OpenSSL builds on OSX Don't build ancient OSX Sierra artifacts Travis: reduce build minutes (tagged jobs) Travis: use --source-deps-only for dependencies instead of using homebrew Homebrew is fantastically slow to update to Travis-CI, and it is burning build credits like crazy. mklove: added mklove_patch mklove: show more of failed build logs mklove openssl installer: workaround build issue in 1.1.1l on osx. Apply OpenSSL PR 16409 patch to fix 1.1.1l build issues on OSX Travis: Remove -Werror from OSX worker since OpenSSL builds have quite a few warnings mklove: try both wget and curl for archive downloads Don't overwrite ssl.ca.location on OSX (confluentinc#3566) Travis: bump Linux base builder from trusty to xenial to circumvent ISRG cert expiry .. which causes older versions of OpenSSL+curl to fail to download OpenSSL.. 
AddOffsetsToTxn Refresh errors did not trigger coord refresh (confluentinc#3571) Ensure timers are started even if timeout is 0 Transactional producer: Fix possible message loss on OUT_OF_ORDER_SEQ error (confluentinc#3575) Mock push_request_errors() appended the errors in reverse order Update list of supported KIPs Add rd_buf_new() Import cJSON v1.7.14 URL: https://github.com/DaveGamble/cJSON Tag: v1.7.14 SHA: d2735278ed1c2e4556f53a7a782063b31331dbf7 Added HTTP(S) client using cURL Add HTTP(S) client using cURL Fix uninitialized warning on msvc Remove commented-out printfs Remove stray license include in librdkafka vcxproj librdkafka.vcxproj: remove stale OpenSSL paths and enable Vcpkg manifests mklove: but all built deps in the same destdir and set up compiler flags accordingly This fixes some issues when dependency B depends on dependency A, in this case for libcurl that depends on OpenSSL, to make it find the OpenSSL libraries, pkg-config files, etc. mklove: don't include STATIC_LIB_..s in BUILT_WITH mklove: Some autoconf versions seem to need a full path to $INSTALL curl: disable everything but HTTP(S) Added string splitter and kv splitter OAuth/OIDC: Add fields to client configuration (confluentinc#3510) Implement native Win32 IO/Queue scheduler (WSAWaitForMultipleEvents) This removes the internal loopback connections (one per known broker) that were previously used to trigger io-based queue wakeups. Add vcpkg_installed to gitignore Left-trim spaces from string configuration values This makes it easier to use Bash on Windows where a prefixing / is translated into the MinGW32 file system root. Mark rd_kafka_conf_kv_split as unused .. until it's used. 
rd_kafka_queue_get_background() now creates the background thread Added custom SASL callback queue Fix test flags for 0122 and 0126 Test 0119: remove unused code Direct questions to the github discussions forum to keep issue load down Add clang-format style checking and fixing Add Python style checking and fixing Run style-checker with Github Actions Automatic style fixes using 'make style-fix' Manual style fixes of Python code Avoid use of FILE* BIOs to circumvent OpenSSL_Applink requirement on Windows (confluentinc#3554) Added README for fork (UrbanCompass#15) merge upstream 2022 04 08 (UrbanCompass#17) * Fix memory leak in admin requests Fix a memory leak introduces in ca1b30e in which the arguments to an admin request were not being freed. Discovered by the test suite for rust-rdkafka [0]. [0]: https://github.com/fede1024/rust-rdkafka/pull/397/checks?check_run_id=3914902373 * Fix MinGW Travis build issues by breaking test execution into a separate script * ACL Admin Apis: CreateAcls, DescribeAcls, DeleteAcls * Minor ACL API adjustments and some small code tweaks * Add ACL support to CHANGELOG * Retrieve jwt token from token provider (@jliunyu, confluentinc#3560) * Fixed typo * MsgSets with just aborted msgs raised a MSG_SIZE error, and fix backoff (confluentinc#2993) This also removes fetch backoffs on underflows (truncated responses). * test 0129: style fix * test 0105: Fix race condition * Idempotent producer: save state for removed partitions .. in case they come back. To avoid silent message loss. * Remove incorrect comment on mock API * Fix rkbuf_rkb assert on malformed JoinGroupResponse.metadata * clusterid() would fail if there were no topics in metadata (confluentinc#3620) * sasl.oauthbearer.extensions should be optional Fixes confluentinc/confluent-kafka-python#1269. 
* Added AK 3.1.0 to test versions * Changelog updates * Bump version to v1.9.0 * sasl.oauthbearer.scope should be optional According to the section 4.4.2 of RFC 6749, the scope is optional in the access token request in client credentials flow. And indeed, for OIDC providers that I find in the wild such as Amazon Cognito, the scope _is_ optional. If the scope is omitted from the request, then the returned access token will contain any and all scope(s) that are configured for the client. See https://datatracker.ietf.org/doc/html/rfc6749#section-4.4.2 * Fix hang in list_groups() when cluster is unavailable (confluentinc#3705) This was caused by holding on to an old broker state version that got outdated and caused an infinite loop, rather than a timeout. * Style fixes * Integration test for OIDC (confluentinc#3646) * Test for trivup * integration test * Update code style for existing code at rdkafka_sasl_oauthbearer_oidc.c * Handle review comment * tiny fix * Handle review comments * misc.c style fix * Test fixes: OIDC requires AK 3.1, not 3.0 * Test 0113: reset security.protocol when using mock cluster * Travis: use Py 3.8 (not 3.5) on Xenial builder * Travis: bump integration test from AK 2.7.0 to 2.8.1 * Fix README release wording * Improve subscribe() error documentation * Fix linger.ms/message.timeout.ms config checking (confluentinc#3709) * Replace deprecated zookeeper flag with bootstrap (@ladislavmacoun, confluentinc#3700) * Replace deprecated zookeeper flag with bootstrap Fixes: confluentinc#3699 Signed-off-by: Ladislav Macoun <ladislavmacoun@gmail.com> * Add backwards compatibility Signed-off-by: Ladislav Macoun <ladislavmacoun@gmail.com> * Add assertion for cmd fitting inside buffer Signed-off-by: Ladislav Macoun <ladislavmacoun@gmail.com> * Increase command buffer Signed-off-by: Ladislav Macoun <ladislavmacoun@gmail.com> * Save one superfluous message timeout toppar scan * Update to fedora:35 to fix the CentOS 8 build mock epel-8-x86_64 is now broken in 
fedora:33: https://bugzilla.redhat.com/show_bug.cgi?id=2049024 Update to fedora:35 with mock configs: centos+epel-7-x86_64 centos-stream+epel-8-x86_64 * Add link to tutorial on Confluent Developer Also fix indenting of bullet list * Grooming (compilation warnings, potential issues) Signed-off-by: Sergio Arroutbi <sarroutb@redhat.com> * fix: acl binding enum checks (@emasab, confluentinc#3741) * checking enums values when creating or reading AclBinding and AclBindingFilter * AclBinding destroy array function * acl binding unit tests * warnings and fix for unknown enums, test fixes * int sizes matching the read size * pointer to the correct broker * cmake: Use CMAKE_INSTALL_LIBDIR this ensures that it is portable across platforms e.g. ppc64/linux uses lib64 not lib Signed-off-by: Khem Raj <raj.khem@gmail.com> * Trigger op callbacks regardless for unhandled types in consume_batch_queue() et.al. (confluentinc#3263) * AppVeyor: Use Visual Studio 2019 image to build since 2015 has TLS problems The 2015 image fails to donwload openssl due to TLS 1.2 not being available, or something along those lines. * mklove: add LD_LIBRARY_PATH to libcurl builder so that runtime checks pass * Travis: build alpine & manylinux builds with --source-deps-only This avoids relying on distro installed packages, which isn't very robust. * Nuget Debian build: use --source-deps-only to avoid external dependencies * RPM test: Use ubi8 image instead of centos:8 .. 
since centos is no more * Curl 7.82.0 * mklove: curl now requires CoreFoundation and SystemConfiguration frameworks on osx * Test 0128: skip if there's no oauthbearer support * Test 0128: make thread-safe * Test 0077: reduce flakyness by expediting compaction * Update to zlib 1.2.12 and OpenSSL 1.1.1n * vcpkg: revoke to zlib 1.2.11 since 1.2.12 is not yet available (as vcpkg) * Travis: Disable mingw dynamic build for now (gcc breakage) GCC 11 adds a new symbol that is not available in the mingw/msys2 libstdc++, which makes it impossible to run applications that were built. Until that's fixed we disable this worker since it will fail anyway. * mklove: fix formatting of skipped pkg-config checks * Fix lock order for rk_init_lock to avoid deadlock (non-released regression) * vcpkg version bumps * Update release instructions * Make dynamic MinGW build copy DLLs instead of trying to manipulate PATH (@neptoess, confluentinc#3787) * Make dynamic MinGW build copy DLLs instead of trying to manipulate PATH * Remove tag requirement on MinGW dynamic build Co-authored-by: Bill Rose <neptoess@gmail.com> * Fix regression from last PR: curl_ldflags * Reset stored offset on assign() and prevent offsets_store() for unassigned partitions * Include broker_id in offset reset logs and corresponding consumer errors (confluentinc#3785) * Txn: properly handle PRODUCER_FENCED in InitPid reply * Provide reason to broker thread wakeups in debug logs This will make troubleshooting easier * rdkafka_performance: include broker in DR printouts * Make SUBTESTS=.. match all of the subtest format string * Added file io abstraction * rdkafka_performance: cut down on the number of poll calls in full-rate mode * Added test.mock.broker.rtt * Log mock broker bootstrap.servers addresses when test.mock.num.brokers is set * Mock brokers now allow compressed ProduceRequests No decompression or validation is performed. 
* Made rd_buf_read|peek_iXX() type safe * SUB_TEST_SKIP() format verification * Statistics: let broker.wakeups metric cover all broker wakeups, both IO and cnds * Improved producer queue wakeups * Broker thread: don't block on IO if there are ops available * vcpkg: Update to zlib 1.2.12 * Fix some win32 compilation warnings * Proper use of rd_socket_close() on Win32 Regression during v1.9.0 development * Test 0101: missing return after Test::Skip() * seek() doc clarification (confluentinc#3004) * Documentation updates * style-check* now fails on style warnings * Automatic style fixes * Some OIDC documentation fixes * Fix for AWS_MSK_IAM * Update for new method signature Co-authored-by: Nikhil Benesch <nikhil.benesch@gmail.com> Co-authored-by: Bill Rose <neptoess@gmail.com> Co-authored-by: Emanuele Sabellico <emasab@gmail.com> Co-authored-by: Magnus Edenhill <magnus@edenhill.se> Co-authored-by: Jing Liu <jl5311@nyu.edu> Co-authored-by: Matt Clarke <matt.clarke@ess.eu> Co-authored-by: Leo Singer <leo.singer@ligo.org> Co-authored-by: Ladislav <ladislavmacoun@gmail.com> Co-authored-by: Ladislav Snizek <ladislav.snizek@cdn77.com> Co-authored-by: Lance Shelton <lance.shelton@hammerspace.com> Co-authored-by: Robin Moffatt <robin@rmoff.net> Co-authored-by: Sergio Arroutbi <sarroutb@redhat.com> Co-authored-by: Khem Raj <raj.khem@gmail.com> Co-authored-by: Bill Rose <wwriv1991@gmail.com> merge upstream 2022 08 01 (UrbanCompass#19) Co-authored-by: Bill Rose <neptoess@gmail.com> Co-authored-by: Magnus Edenhill <magnus@edenhill.se> Co-authored-by: Nikhil Benesch <nikhil.benesch@gmail.com> Co-authored-by: Emanuele Sabellico <emasab@gmail.com> Co-authored-by: Jing Liu <jl5311@nyu.edu> Co-authored-by: Matt Clarke <matt.clarke@ess.eu> Co-authored-by: Leo Singer <leo.singer@ligo.org> Co-authored-by: Ladislav <ladislavmacoun@gmail.com> Co-authored-by: Ladislav Snizek <ladislav.snizek@cdn77.com> Co-authored-by: Lance Shelton <lance.shelton@hammerspace.com> Co-authored-by: Robin 
Moffatt <robin@rmoff.net> Co-authored-by: Sergio Arroutbi <sarroutb@redhat.com> Co-authored-by: Khem Raj <raj.khem@gmail.com> Co-authored-by: Bill Rose <wwriv1991@gmail.com> Co-authored-by: Dmytro Milinevskyi <dmytro.milinevskyi@datadoghq.com> Co-authored-by: Mikhail Avdienko <whitearchey@gmail.com> Co-authored-by: wding <yangwding@gmail.com> Co-authored-by: Shawn <wangxiaofan0529@gmail.com> Co-authored-by: ihsinme <ihsinme@gmail.com> Co-authored-by: Emanuele Sabellico <esabellico@confluent.io> Co-authored-by: Roman Schmitz <rschmitz@confluent.io> Co-authored-by: Miklos Espak <miklos@smartcow.ai> Co-authored-by: Alice Rum <wyvie@wyvie.org> Co-authored-by: Eli Smaga <eli@confluent.io>
Description
It appears that DeleteRecords can crash when a broker shuts down mid-request.
We only observe this as of `v1.7.0`, with an rdkafka client that tunnels its connections to the Kafka cluster through an HTTP proxy, using the CONNECT method via the connect callback. When brokers turn over without the proxy in the path, we see no issue. We'd previously been using `v1.6.0` and `v1.6.1` with no issue.
I fully admit this setup is strange and may very well be to blame, or at least may be causing things to fail in a way that librdkafka would otherwise handle gracefully (without the HTTP proxy in the picture).
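For readers unfamiliar with this kind of setup, here is a minimal sketch of the HTTP CONNECT exchange that such a connect callback would perform once its socket is connected to the proxy. It is not the client code from this report: `proxy_format_connect` and `proxy_reply_ok` are hypothetical helper names, and the socket handling itself (connecting to the proxy, sending the request, reading the reply) is left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format the HTTP CONNECT request that asks a proxy to
 * open a tunnel to host:port. Returns the request length, or -1 if buf is
 * too small to hold the whole request. */
int proxy_format_connect(char *buf, size_t size, const char *host, int port) {
        assert(buf != NULL && host != NULL);
        int n = snprintf(buf, size,
                         "CONNECT %s:%d HTTP/1.1\r\n"
                         "Host: %s:%d\r\n"
                         "\r\n",
                         host, port, host, port);
        return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Hypothetical helper: return 1 if the proxy's reply status line carries a
 * 2xx status (tunnel established), otherwise 0. */
int proxy_reply_ok(const char *reply) {
        int major, minor, status;
        if (sscanf(reply, "HTTP/%d.%d %d", &major, &minor, &status) != 3)
                return 0;
        return status >= 200 && status < 300;
}
```

In librdkafka the callback itself is registered with `rd_kafka_conf_set_connect_cb()`. Once the tunnel is up, the client only sees the proxy's socket, so a broker going away behind the proxy can surface as a request timeout rather than an immediate disconnect, which fits the behavior described in this report.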
Here's the backtrace from our coredump:
`rdkafka_admin.c:3267` points here. This segfaults because `partitions` is null.
After noticing the WARN/ERROR logs, I added a few additional logs (I don't have full debug logs right now). I've reproduced this using sasl_plaintext as noted below, but nothing I've seen suggests SASL is related or to blame; it's just the config I tested with.
This ~60s timeout corresponds to the default for `socket.timeout.ms`, which serves as the default timeout for admin network requests according to `CONFIGURATION.md`.
It looks like when the connection times out (because broker shutdown has severed the proxy -> broker connection), `rd_kafka_DeleteRecords_response_merge` doesn't gracefully handle the error. This is made clearer when the logic is compared to `rd_kafka_DeleteGroups_response_merge`, which checks whether the given `rko_partial` has an error [source].
At minimum, a patch like the following seems reasonable to prevent a segfault: chrisbeard@ca94570
I'm happy to send this patch over if what I've outlined above is reasonable.
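To illustrate the shape of the guard described above, here is a self-contained sketch of a merge step that checks the partial result's error before touching its partition list. This is not the linked patch and not librdkafka's code: `partial_result_t`, `merged_result_t`, `response_merge` and `MERGED_MAX` are hypothetical stand-ins chosen only to show the pattern.

```c
#include <assert.h>
#include <stddef.h>

#define MERGED_MAX 16 /* hypothetical capacity of the merged result */

/* Hypothetical stand-in for one broker's partial result. */
typedef struct partial_result_s {
        int err;                  /* 0 on success, non-zero if the request failed */
        const long long *offsets; /* NULL when the request failed */
        size_t cnt;               /* number of entries in offsets */
} partial_result_t;

/* Hypothetical stand-in for the merged (fanout) result. */
typedef struct merged_result_s {
        int err;                  /* first error seen, 0 if none */
        long long offsets[MERGED_MAX];
        size_t cnt;
} merged_result_t;

/* Merge one partial result into the merged result. The error check comes
 * first, so a failed request with no partition list is never dereferenced. */
void response_merge(merged_result_t *merged, const partial_result_t *partial) {
        assert(merged != NULL && partial != NULL);

        if (partial->err || partial->offsets == NULL) {
                /* Record the first failure and skip the merge entirely. */
                if (!merged->err)
                        merged->err = partial->err ? partial->err : -1;
                return;
        }

        for (size_t i = 0; i < partial->cnt && merged->cnt < MERGED_MAX; i++)
                merged->offsets[merged->cnt++] = partial->offsets[i];
}
```

With a guard like this, a timed-out request only sets the merged error and leaves previously merged partitions untouched, instead of dereferencing a null list.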
How to reproduce
I'll outline at a high level what I've done to reproduce, but I don't have a code sample that's shareable. I don't expect anyone to actually attempt to reproduce this, but the steps might be helpful regardless.
Checklist
IMPORTANT: We will close issues where the checklist has not been completed.
Please provide the following information:
librdkafka version: `v1.7.0`
Apache Kafka version: `2.5.1`
Operating system: `RHEL 7.6`
Provide logs (with `debug=..` as necessary) from librdkafka