Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-17104: [C++] Workaround protobuf ABI issues #13634

Closed
wants to merge 13 commits into from

Conversation

pitrou
Copy link
Member

@pitrou pitrou commented Jul 18, 2022

The protocol buffers ABI unfortunately depends on whether NDEBUG is enabled or not.
When installing gRPC using a package manager (Homebrew, conda...), most often it is built with NDEBUG enabled.
Some hackery is then needed to avoid missing symbols on Arrow debug builds...

Should also fix ARROW-16520.

@pitrou
Copy link
Member Author

pitrou commented Jul 18, 2022

@github-actions crossbow submit -g cpp

@pitrou pitrou force-pushed the ARROW-17104-grpc-abi branch from fdafabe to 1d78dd6 Compare July 18, 2022 11:41
@pitrou
Copy link
Member Author

pitrou commented Jul 18, 2022

@lidavidm @raulcd This at least works for me with the conda-forge libprotobuf package.

@pitrou pitrou force-pushed the ARROW-17104-grpc-abi branch from 1d78dd6 to a6a445c Compare July 18, 2022 11:49
@github-actions
Copy link

@github-actions
Copy link

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

@github-actions
Copy link

Revision: a6a445c

Submitted crossbow builds: ursacomputing/crossbow @ actions-f4bb15b092

Task Status
test-build-cpp-fuzz Github Actions
test-conda-cpp Github Actions
test-conda-cpp-valgrind Azure
test-debian-10-cpp-amd64 Github Actions
test-debian-10-cpp-i386 Github Actions
test-debian-11-cpp-amd64 Github Actions
test-debian-11-cpp-i386 Github Actions
test-fedora-35-cpp Github Actions
test-ubuntu-18.04-cpp Github Actions
test-ubuntu-18.04-cpp-release Github Actions
test-ubuntu-18.04-cpp-static Github Actions
test-ubuntu-20.04-cpp Github Actions
test-ubuntu-20.04-cpp-14 Github Actions
test-ubuntu-20.04-cpp-17 Github Actions
test-ubuntu-20.04-cpp-bundled Github Actions
test-ubuntu-20.04-cpp-thread-sanitizer Github Actions
test-ubuntu-22.04-cpp Github Actions

@raulcd
Copy link
Member

raulcd commented Jul 18, 2022

Do we require a higher protobuf version for substrait with this change? I think the failures on Ubuntu 18.04 are due to version 3.0.0 being used.
Found Protobuf: /usr/lib/x86_64-linux-gnu/libprotobuf.so;-pthread (found suitable version "3.0.0", minimum required is "3.0.0")
From what I can see metadata_lite.h was added on >=3.3.x
Not present on 3.2 branch:
https://github.com/protocolbuffers/protobuf/tree/3.2.x/src/google/protobuf
present on 3.3.:
https://github.com/protocolbuffers/protobuf/tree/3.3.x/src/google/protobuf

@pitrou
Copy link
Member Author

pitrou commented Jul 18, 2022

@raulcd No, this should hopefully be fixed using a compile-time check.

Comment on lines +25 to +27
// FIXME there are too many inclusions here; besides, arrow/compute/exec/util.h
// includes even more, including arrow/util/logging.h which is forbidden
// in public headers.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pitrou
Copy link
Member Author

pitrou commented Jul 18, 2022

Unfortunately the MacOS Python issue still occurs:
https://github.com/apache/arrow/runs/7391963825?check_suite_focus=true#step:5:98

@lidavidm @kou does one of you have a MacOS machine to debug this?

@pitrou pitrou force-pushed the ARROW-17104-grpc-abi branch from 5301ee1 to 9582bd8 Compare July 18, 2022 16:32
@lidavidm
Copy link
Member

I usually SSH into the Ursa Mac machines to debug this sort of issue. I could take a look but it might be a few days.

@pitrou
Copy link
Member Author

pitrou commented Jul 18, 2022

@paleolimbot Can you help me debug this R CI failure? I cannot even reproduce locally (build succeeds here with Docker):
https://github.com/apache/arrow/runs/7393839585?check_suite_focus=true#step:7:4753

@paleolimbot
Copy link
Member

Seems like #include <arrow/engine/substrait/api.h> is not sufficient for our usage in the R package? Maybe #include <arrow/engine/substrait/serde.h> is needed?

@pitrou
Copy link
Member Author

pitrou commented Jul 18, 2022

Well, arrow/engine/substrait/api.h includes arrow/engine/substrait/serde.h.

@paleolimbot
Copy link
Member

I also cannot reproduce locally (MacOS, M1). I will try locally on Linux as well...I don't quite understand the failure mode...even if ARROW_HOME was picking up a different version of Arrow than that of the commit, are there any versions of engine/substrait/api.h that have ever excluded the basic functionality of Substrait?

@pitrou
Copy link
Member Author

pitrou commented Jul 18, 2022

are there any versions of engine/substrait/api.h that have ever excluded the basic functionality of Substrait?

Apparently not. It was like that from the start.

@paleolimbot
Copy link
Member

I can't reproduce locally on Linux either. One other thing to try is change all ValueOrStop() and StopIfNotOk() calls to arrow::ValueOrStop()/arrow::StopIfNotOk(). I don't know why it's not complaining about any of the other ValueOrStop()/StopIfNotOk() calls in the file, but that does seem to be part of the problem?

@pitrou
Copy link
Member Author

pitrou commented Jul 18, 2022

I'm not sure. TBH I'm a bit surprised these calls work at all in other instances. Perhaps because of argument-dependent lookup?

In any case, I think the arrow::engine::DeserializePlans and arrow::engine::internal errors should be investigated first.

@pitrou
Copy link
Member Author

pitrou commented Jul 18, 2022

I'm a trying a cache-less build...

@pitrou pitrou force-pushed the ARROW-17104-grpc-abi branch from 3b2a345 to 9582bd8 Compare July 18, 2022 18:09
@pitrou
Copy link
Member Author

pitrou commented Jul 18, 2022

Ok, I'm relieved that the issue isn't due to caching at least.

@@ -22,6 +22,10 @@
#include <string>
#include <vector>

// FIXME there are too many inclusions here; besides, arrow/compute/exec/util.h
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// FIXME there are too many inclusions here; besides, arrow/compute/exec/util.h
// TODO(ARROW-17109): there are too many inclusions here; besides, arrow/compute/exec/util.h

@pitrou pitrou force-pushed the ARROW-17104-grpc-abi branch from 4874f99 to 9f281de Compare July 19, 2022 14:01
@pitrou
Copy link
Member Author

pitrou commented Jul 19, 2022

Ok, my latest attempt failed again.

@pitrou pitrou changed the title ARROW-17104: [C++] Workaround gRPC ABI issues ARROW-17104: [C++] Workaround protobuf ABI issues Jul 19, 2022
@pitrou
Copy link
Member Author

pitrou commented Jul 19, 2022

Well, finally it seems the issue may be fixed in the next upstream release:
protocolbuffers/protobuf#10271

@kou
Copy link
Member

kou commented Jul 20, 2022

I've fixed GLib and Python failures.

@kou
Copy link
Member

kou commented Jul 20, 2022

@github-actions crossbow submit -g cpp -g python

@github-actions
Copy link

Revision: 8b0b941

Submitted crossbow builds: ursacomputing/crossbow @ actions-f8f2014cd2

Task Status
test-build-cpp-fuzz Github Actions
test-conda-cpp Github Actions
test-conda-cpp-valgrind Azure
test-conda-python-3.10 Github Actions
test-conda-python-3.7 Github Actions
test-conda-python-3.7-hdfs-2.9.2 Github Actions
test-conda-python-3.7-hdfs-3.2.1 Github Actions
test-conda-python-3.7-kartothek-latest Github Actions
test-conda-python-3.7-kartothek-master Github Actions
test-conda-python-3.7-pandas-0.24 Github Actions
test-conda-python-3.7-pandas-latest Github Actions
test-conda-python-3.7-spark-v3.1.2 Github Actions
test-conda-python-3.8 Github Actions
test-conda-python-3.8-hypothesis Github Actions
test-conda-python-3.8-pandas-latest Github Actions
test-conda-python-3.8-pandas-nightly Github Actions
test-conda-python-3.8-spark-v3.2.0 Github Actions
test-conda-python-3.9 Github Actions
test-conda-python-3.9-dask-latest Github Actions
test-conda-python-3.9-dask-master Github Actions
test-conda-python-3.9-pandas-master Github Actions
test-conda-python-3.9-spark-master Github Actions
test-debian-10-cpp-amd64 Github Actions
test-debian-10-cpp-i386 Github Actions
test-debian-11-cpp-amd64 Github Actions
test-debian-11-cpp-i386 Github Actions
test-debian-11-python-3 Azure
test-fedora-35-cpp Github Actions
test-fedora-35-python-3 Azure
test-ubuntu-18.04-cpp Github Actions
test-ubuntu-18.04-cpp-release Github Actions
test-ubuntu-18.04-cpp-static Github Actions
test-ubuntu-20.04-cpp Github Actions
test-ubuntu-20.04-cpp-14 Github Actions
test-ubuntu-20.04-cpp-17 Github Actions
test-ubuntu-20.04-cpp-bundled Github Actions
test-ubuntu-20.04-cpp-thread-sanitizer Github Actions
test-ubuntu-20.04-python-3 Azure
test-ubuntu-22.04-cpp Github Actions

@kou
Copy link
Member

kou commented Jul 20, 2022

https://app.travis-ci.com/github/apache/arrow/jobs/577173431#L2887

In file included from /arrow/cpp/src/arrow/flight/protocol_internal.h:22,
                 from /arrow/cpp/src/arrow/flight/serialization_internal.h:22,
                 from /arrow/cpp/src/arrow/flight/client.cc:39,
                 from src/arrow/flight/CMakeFiles/arrow_flight_objlib.dir/Unity/unity_0_cxx.cxx:4:
/arrow/cpp/src/arrow/util/protobuf_abi_fixup_internal.h:39:2: error: #error "Protobuf already included, fixup can't be applied!"
   39 | #error "Protobuf already included, fixup can't be applied!"
      |  ^~~~~

It seems that we need a wrapper for Flight.pb.cc like cpp/src/arrow/flight/transport/grpc/protocol_grpc_internal.cc.

@pitrou
Copy link
Member Author

pitrou commented Jul 20, 2022

Now that this is fixed upstream, my inclination would be to wait for conda-forge and homebrew packages to be updated.
We can also add an error message when compiling in debug mode with the offending protobuf versions.

@kou
Copy link
Member

kou commented Jul 21, 2022

I think that we need this because other packaging system may ship Protobuf that has this problem.
If you don't want to complete this, can I take over this?

@pitrou
Copy link
Member Author

pitrou commented Jul 21, 2022

I think that we need this because other packaging system may ship Protobuf that has this problem.

Which packaging systems ship the offending protobuf version?
This is mainly for us and our CI. If you compile Arrow in release mode, there is no problem. Third-party users are unlikely to compile Arrow in debug mode...

If you don't want to complete this, can I take over this?

@kou You can. But the fix is extremely brittle, so I'm in favour of not doing it since it's fixed upstream.

@kou
Copy link
Member

kou commented Jul 21, 2022

This is mainly for us and our CI. If you compile Arrow in release mode, there is no problem.

Ah, I see. OK. We can close this.

@raulcd
Copy link
Member

raulcd commented Jul 21, 2022

Should we still merge 9e10f6c independently? I think is required to fix the:

Program glib-mkenums mkenums found: NO
c_glib/arrow-glib/meson.build:216:0: ERROR: Program 'glib-mkenums mkenums' not found or not executable

Which happens on master?

@pitrou
Copy link
Member Author

pitrou commented Jul 21, 2022

Right, and the added cpp/src/arrow/public_api_test.cc test (together with the necessary source code fixes) would be nice to get in as well. So let's keep this PR open until they're addressed by another PR?

@pitrou
Copy link
Member Author

pitrou commented Aug 24, 2022

I opened #13965 for the public API test improvement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants