
WIP - show how to split the build #148

Closed. coryan wants to merge 14 commits from demo-split-build.

Conversation

@coryan (Contributor) commented on Sep 10, 2023

Checklist

  • Used a personal fork of the feedstock to propose changes
  • Bumped the build number (if the version is unchanged)
  • Reset the build number to 0 (if the version changed)
  • Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)
  • Ensured the license file is being packaged.

Part of the work for #141

@conda-forge-webservices (Contributor) commented:

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

@conda-forge-webservices (Contributor) commented:

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

I do have some suggestions for making it better though...

For recipe:

  • It looks like the 'google-cloud-cpp-full' output doesn't have any tests.

@conda-forge-webservices (Contributor) commented:

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

@h-vetinari (Member) left a comment:

This looks amazing @coryan!

I just have one question out of curiosity; basically, it looks like we can open a PR for a new google-cloud-cpp-core feedstock.

There's just one thing that stood out to me from the logs, which is that we'll have to decide where the protos go, so that something like the below doesn't happen.

ClobberWarning: This transaction has incompatible packages due to a shared path.
  packages: local/linux-64::libgoogle-cloud-2.15.1-h8d7e28b_0, local/linux-64::libgoogle-cloud-full-2.15.1-h8d7e28b_0
  path: 'lib/libgoogle_cloud_cpp_iam_v2_protos.so.2.15.1'
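(For anyone debugging the clobber: a quick way to find the overlapping files is to compare the file lists of the two local packages. A minimal sketch in bash, assuming the .tar.bz2 artifacts named in the warning are in the current directory:)

    # List the contents of both local packages, then print the paths they share;
    # any shared path is exactly what triggers the ClobberWarning on co-installation.
    tar -tjf libgoogle-cloud-2.15.1-h8d7e28b_0.tar.bz2      | sort > core-files.txt
    tar -tjf libgoogle-cloud-full-2.15.1-h8d7e28b_0.tar.bz2 | sort > full-files.txt
    comm -12 core-files.txt full-files.txt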

Also, I checked the timings of the builds, and even though we've reduced to only bigtable,iam,pubsub,storage,spanner, those builds take very close to 6h already. While this would unblock us for now, these components will likely see their own growth over time, so I just wanted to check your thoughts on the long-term viability of that split, resp. of that set of core libraries...

cmake -G "Ninja" ^
-S . -B build_full ^
-DGOOGLE_CLOUD_CPP_ENABLE=kms ^
-DGOOGLE_CLOUD_CPP_USE_INSTALLED_COMMON=ON ^
@h-vetinari (Member) commented on the hunk above:

What expectations are hiding behind this switch? I presume something very close to bigtable,iam,pubsub,storage,spanner?

@coryan (Contributor, author) commented on Sep 15, 2023:

> There's just one thing that stood out to me from the logs, which is that we'll have to decide where the protos go, so that something like the below doesn't happen.

That seems like a bug, or maybe a feature that depends on iam and needs to be in the same package as iam... Let me experiment locally and get back to you.

> Also, I checked the timings of the builds, and even though we've reduced to only bigtable,iam,pubsub,storage,spanner, those builds take very close to 6h already.

This PR is (or was) still building nearly everything. It just does so in stages that we can break out into new feedstocks. I changed the build script to build a smaller subset and to time each step. Let's see how that looks.

I expect the build times will shrink once these are split into separate feedstocks, as the feedstock for "everything else" will not need to compile bigtable, pubsub, etc.

> What expectations are hiding behind this switch? I presume something very close to bigtable,iam,pubsub,storage,spanner?

There are about a dozen libraries [1] that are used by all features [2]. This flag says that at least these common libraries are already installed and do not need to be built. In almost all cases the features do not depend on each other, but there are a handful of exceptions.
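Concretely, the staged build looks something like the sketch below (bash, for the Linux build; only GOOGLE_CLOUD_CPP_ENABLE and GOOGLE_CLOUD_CPP_USE_INSTALLED_COMMON appear in this PR, while the __common__ value for the first stage and the install prefix are assumptions for illustration):

    # Stage 1: build and install only the common libraries.
    # (the __common__ value is an assumption, shown for illustration)
    cmake -G Ninja -S . -B build_common \
        -DGOOGLE_CLOUD_CPP_ENABLE=__common__ \
        -DCMAKE_INSTALL_PREFIX="${PREFIX}"
    cmake --build build_common --target install

    # Stage 2: build the selected features against the installed common
    # libraries; USE_INSTALLED_COMMON=ON skips rebuilding them.
    cmake -G Ninja -S . -B build_features \
        -DGOOGLE_CLOUD_CPP_ENABLE=bigtable,iam,pubsub,storage,spanner \
        -DGOOGLE_CLOUD_CPP_USE_INSTALLED_COMMON=ON \
        -DCMAKE_INSTALL_PREFIX="${PREFIX}"
    cmake --build build_features --target install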

I think we should discuss the structure and contents of these new feedstocks. This PR sketches how to build one feedstock with the most popular features, and a second feedstock with "everything else". Maybe we should have three ("common", "most popular", "everything else"). I don't think we want to go to the extreme of creating nearly 120 feedstocks (one per feature).

We also need to think about the subpackages within each feedstock. We may want to create separate subpackages for some large features (aiplatform, bigtable, pubsub, spanner, and storage come to mind), but we should leave most features in one large subpackage.

Footnotes

  1. libgoogle_cloud_cpp_common.so, libgoogle_cloud_cpp_grpc_utils.so, libgoogle_cloud_cpp_rest_internal.so, libgoogle_cloud_cpp_rest_protobuf_internal.so, libgoogle_cloud_cpp_iam_credentials_v1_iamcredentials_protos.so, libgoogle_cloud_cpp_iam_v1_policy_protos.so, libgoogle_cloud_cpp_longrunning_operations_protos.so, libgoogle_cloud_cpp_rpc_status_protos.so.

  2. bigtable and storage are examples of "features"; there are about 120 of those. Each has at least two libraries: libgoogle_cloud_cpp_${feature}.so and libgoogle_cloud_cpp_${feature}_protos.so.

@coryan (Contributor, author) commented:

The build times just for bigtable,...,storage are closer to 1h. That suggests the build times for "everything else" are going to be closer to 5h, which maybe is not enough headroom. We may need to split that into two separate feedstocks.

@h-vetinari (Member) commented:

> I think we should discuss the structure and contents of these new feedstocks.

I've thought a bit more about the "add -core" vs. "add -full" side of things, and while I can live with both, I'm now slightly more in favour of the -full one. That's because I think it'll be less common to require all of the libraries than to just require the "common" bits. So I think we should make the rarer use-case use the extra output, in this case google-cloud-cpp-full (or -all) or something.

The plan would then be

  • switch this feedstock to define google-cloud-cpp (from the POV of conda-forge) as the minimum necessary to enable later builds to use -DGOOGLE_CLOUD_CPP_USE_INSTALLED_COMMON=ON.
  • add another feedstock for google-cloud-cpp-all as the maximal package
    • assuming we can still build everything except -common there, do so
    • if not, split off big features on an as-needed/as-justified basis into separate feedstocks, which then get pulled into the final google-cloud-cpp-all in that feedstock.

Regarding naming, we could use this as an opportunity to align the naming with conda-forge/conda-forge.github.io#1073 et al. (it's a bit of a sprawling issue, sorry...; compare also the soon-to-be-complete boost unification to +/- libboost{,-headers,-devel})

I could imagine the following:

| lib | depends on | content |
| --- | --- | --- |
| libgoogle-cloud | - | common libraries |
| libgoogle-cloud-devel | libgoogle-cloud | + headers and CMake/pkgconfig metadata |
| libgoogle-cloud-<feat> | libgoogle-cloud | a library for a given feature we had to break out of -all because it doesn't compile together with "everything else" anymore |
| libgoogle-cloud-<feat>-devel | libgoogle-cloud-<feat> | + headers and CMake/pkgconfig metadata |
| libgoogle-cloud-all | libgoogle-cloud, libgoogle-cloud-<feat> | all the libs |
| libgoogle-cloud-all-devel | libgoogle-cloud-all | + headers and CMake/pkgconfig metadata |
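For illustration, consumption would then look like this (hypothetical package names from the table above, with storage standing in for <feat>):

    # just the common libraries plus headers/metadata, e.g. for
    # downstream builds using -DGOOGLE_CLOUD_CPP_USE_INSTALLED_COMMON=ON:
    conda install libgoogle-cloud-devel
    # a single split-out feature (hypothetical name):
    conda install libgoogle-cloud-storage-devel
    # everything:
    conda install libgoogle-cloud-all-devel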

I don't care so much about -all vs. -full, but we do have a couple of the former already

@coryan (Contributor, author) commented:

The subpackage names (libgoogle-cloud-<feat> and libgoogle-cloud-<feat>-devel) make sense to me, and I think you understand the tradeoffs in the Conda ecosystem better than I do, so you have a better sense of what they should be named. I assume it is not a problem to break existing packages (e.g. arrow) that depend on libgoogle-cloud?

Do we need to make libgoogle-cloud-<feat>-devel depend on libgoogle-cloud-devel? I assume that is needed to get the common headers and CMake/pkgconfig metadata?

I need to make some changes to make libgoogle-cloud and libgoogle-cloud-devel as small as possible. I assume we can apply those changes as a patch in the feedstock if we want to start before the next google-cloud-cpp release.

@h-vetinari (Member) commented:

> Do we need to make libgoogle-cloud-<feat>-devel depend on libgoogle-cloud-devel? I assume that is needed to get the common headers and CMake/pkgconfig metadata?

You're right about that of course, that was an oversight.

> I assume it is not a problem to break existing packages (e.g. arrow) that depend on libgoogle-cloud?

The runtime dependence on libgoogle-cloud will still be there; the only issue is to change the name of the host dependency in the recipe, but we can add a nudge to do that via a linter (or directly change it with the migration).

@coryan (Contributor, author) commented on Oct 14, 2023:

Closing for now and moving the discussion back to #141

@coryan closed this Oct 14, 2023
@coryan deleted the demo-split-build branch October 14, 2023 18:28