Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace conda with pyenv to fix incorrect libstdc++ use in jammy CI. #6966

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

ssheorey
Copy link
Member

@ssheorey ssheorey commented Sep 13, 2024

Type

  • Bug fix (non-breaking change which fixes an issue): Fixes #
  • New feature (non-breaking change which adds functionality). Resolves #
  • Breaking change (fix or feature that would cause existing functionality to not work as expected) Resolves #

Motivation and Context

CUDA jammy CI (python tests) fail since pytest causes the conda libstdc++ library to be loaded instead of the system libstdc++ library. This is older and prevents Open3D cuda pybind lib from loading.

Checklist:

  • I have run python util/check_style.py --apply to apply Open3D code style
    to my code.
  • This PR changes Open3D behavior or adds new functionality.
    • Both C++ (Doxygen) and Python (Sphinx / Google style) documentation is
      updated accordingly.
    • I have added or updated C++ and / or Python unit tests OR included test
      results
      (e.g. screenshots or numbers) here.
  • I will follow up and update the code if CI fails.
  • For fork PRs, I have selected Allow edits from maintainers.

Description

  • Replace python install from conda with pyenv python install. This uses the system libraries.
  • Use gcc-12 instead of gcc-11 when building in Ubuntu 22.04. gcc-11 leads to a seg fault in tensormap in Release mode.

Copy link

update-docs bot commented Sep 13, 2024

Thanks for submitting this pull request! The maintainers of this repository would appreciate if you could update the CHANGELOG.md based on your changes.

@ssheorey
Copy link
Member Author

ssheorey commented Sep 17, 2024

Seg fault in python test_tensormap.py here when built with gcc 11 (Ubuntu 20.04 jammy) in release config. No seg fault in debug or relwithdebinfo configs.

...
def test_tensormap(device):
...
    # __delitem__ operator. SEGFAULT
    with pytest.raises(RuntimeError) as excinfo:
        del tm.positions
        assert 'cannot be deleted' in str(excinfo.value)
...
   # Set primary key. SEGFAULT
    with pytest.raises(KeyError) as e:
        tm.primary_key = o3c.Tensor.ones((2, 3), dtype, device)
...
    # Get unknown attributes. SEGFAULT
    with pytest.raises(KeyError) as e:
        normals = tm.normals

Fix: switch to gcc 13.

Update nanoflannimpl to fix this warning as error:

    inlined from ‘open3d::core::nns::impl::{anonymous}::_KnnSearchCPU<float, int, open3d::core::nns::NeighborSearchAllocator<float, int>, 1>(open3d::core::nns::NanoFlannIndexHolderBase*, int64_t*, size_t, const float*, size_t, const float*, size_t, int, bool, bool, open3d::core::nns::NeighborSearchAllocator<float, int>&)::<lambda(const float*, const float*, size_t)>’ at /home/ssheorey/Documents/Open3D/Code/Open3D/cpp/open3d/core/nns/NanoFlannImpl.h:126:5,
    inlined from ‘open3d::core::nns::impl::{anonymous}::_RadiusSearchCPU<float, int, open3d::core::nns::NeighborSearchAllocator<float, int>, 1>(open3d::core::nns::NanoFlannIndexHolderBase*, int64_t*, size_t, const float*, size_t, const float*, size_t, const float*, bool, bool, bool, bool, open3d::core::nns::NeighborSearchAllocator<float, int>&)::<lambda(const tbb::detail::d1::blocked_range<long unsigned int>&)>’ at /home/ssheorey/Documents/Open3D/Code/Open3D/cpp/open3d/core/nns/NanoFlannImpl.h:258:41:
/usr/include/c++/13/bits/new_allocator.h:168:33: error: ‘void operator delete(void*, std::size_t)’ called on pointer ‘<unknown>’ with nonzero offset [4, 9223372036854775804] [-Werror=free-nonheap-object]

gcc-11 leads to a seg fault in tensormap in Release mode.
Copy link
Contributor

@benjaminum benjaminum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@@ -147,8 +140,9 @@ void _KnnSearchCPU(NanoFlannIndexHolderBase *holder,
for (size_t valid_i = 0; valid_i < num_valid; ++valid_i) {
TIndex idx = result_indices[valid_i];
if (ignore_query_point &&
points_equal(&queries[i * dimension],
&points[idx * dimension], dimension)) {
std::equal(&queries[i * dimension],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! This should significantly reduce the number of heap allocations

bash Miniconda3-latest-Linux-x86_64.sh -b; \
rm Miniconda3-latest-Linux-x86_64.sh; \
curl https://pyenv.run | bash \
&& pyenv update \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was working on switching to Miniforge in #6717 😅

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

miniforge likely won't fix this issue, since it's a copy of conda and likely uses it's on libstdc++ as well.

@ssheorey
Copy link
Member Author

Seg fault with Python, CUDA 11.8, 12.1 and Ubuntu 22.04. No issue with C++, the open3d-cpu package, or if CUDA_VISIBLE_DEVICES is empty. No issue on Ubuntu 20.04 or Windows.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants