WIP - show how to split the build #148
Conversation
Hi! This is the friendly automated conda-forge-linting service. I just wanted to let you know that I linted all conda-recipes in your PR. I do have some suggestions for making it better though... For recipe:
This looks amazing @coryan!
I just have one question out of curiosity, but basically it looks like we can open a PR for a new `google-cloud-cpp-core-feedstock` feedstock.
There's just one thing that stood out to me from the logs, which is that we'll have to decide where the protos go, so that something like the below doesn't happen.
```
ClobberWarning: This transaction has incompatible packages due to a shared path.
  packages: local/linux-64::libgoogle-cloud-2.15.1-h8d7e28b_0, local/linux-64::libgoogle-cloud-full-2.15.1-h8d7e28b_0
  path: 'lib/libgoogle_cloud_cpp_iam_v2_protos.so.2.15.1'
```
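A clash like this can be detected mechanically before the solver complains. A minimal sketch (the file lists below are illustrative, not the actual contents of these packages) that finds paths shared between two package manifests:

```python
# Find paths that two conda packages would both install (a "clobber").
# The manifests here are made up for illustration; in practice they
# would come from each package's paths.json.

def find_clobbers(files_a, files_b):
    """Return the set of paths present in both package manifests."""
    return set(files_a) & set(files_b)

libgoogle_cloud = [
    "lib/libgoogle_cloud_cpp_common.so.2.15.1",
    "lib/libgoogle_cloud_cpp_iam_v2_protos.so.2.15.1",
]
libgoogle_cloud_full = [
    "lib/libgoogle_cloud_cpp_iam_v2_protos.so.2.15.1",  # shipped twice -> clobber
    "lib/libgoogle_cloud_cpp_spanner.so.2.15.1",
]

print(sorted(find_clobbers(libgoogle_cloud, libgoogle_cloud_full)))
```

Deciding which output owns each proto library would make this intersection empty.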
Also, I checked the timings of the builds, and even though we've reduced to only `bigtable,iam,pubsub,storage,spanner`, those builds take very close to 6h already. While this would unblock us for now, these components will surely see their own growth over time, so just to check your thoughts on the long-term viability of that split, resp. of that set of core libraries...
```
cmake -G "Ninja" ^
    -S . -B build_full ^
    -DGOOGLE_CLOUD_CPP_ENABLE=kms ^
    -DGOOGLE_CLOUD_CPP_USE_INSTALLED_COMMON=ON
```
What expectations are hiding behind this switch? I presume something very close to `bigtable,iam,pubsub,storage,spanner`?
> There's just one thing that stood out to me from the logs, which is that we'll have to decide where the protos go, so that something like the below doesn't happen.
That seems like a bug, or maybe a feature that depends on `iam` and needs to be in the same package as `iam`... let me experiment locally and get back to you.
> Also, I checked the timings of the builds, and even though we've reduced to only `bigtable,iam,pubsub,storage,spanner`, those builds take very close to 6h already.
This PR is (or was) still building nearly everything. It just does so in stages that we can break into new feedstocks. I changed the build script to build a smaller subset and to time each step. Let's see how that looks.
I expect the build times will shrink once these are split into separate feedstocks, as the feedstock for "everything else" will not need to compile `bigtable`, `pubsub`, etc.
> What expectations are hiding behind this switch? I presume something very close to `bigtable,iam,pubsub,storage,spanner`?
There are about a dozen libraries[^1] that are used by all features.[^2] This flag says that at least these common libraries are already installed and do not need to be built. In almost all cases the features do not depend on each other, but there are a handful of exceptions.
I think we should discuss the structure and contents of these new feedstocks. This PR sketches how to build one feedstock with the most popular features, and a second feedstock with "everything else". Maybe we should have three ("common", "most popular", "everything else"). I don't think we want to go to the extreme of creating nearly 120 feedstocks (one per feature).
We also need to think about the subpackages within each feedstock. We may want to create separate subpackages for some large features (`aiplatform`, `bigtable`, `pubsub`, `spanner`, and `storage` come to mind), but we should leave most features in one large subpackage.
Footnotes

[^1]: `libgoogle_cloud_cpp_common`, `google_cloud_cpp_grpc_utils.so`, `libgoogle_cloud_cpp_rest_internal.so`, `libgoogle_cloud_cpp_rest_protobuf_internal.so`, `libgoogle_cloud_cpp_iam_credentials_v1_iamcredentials_protos.so`, `libgoogle_cloud_cpp_iam_v1_policy_protos.so`, `libgoogle_cloud_cpp_longrunning_operations_protos.so`, `libgoogle_cloud_cpp_rpc_status_protos.so`.

[^2]: `bigtable` and `storage` are examples of "features"; there are about 120 of those. Each has at least two libraries: `libgoogle_cloud_cpp_${feature}.so` and `libgoogle_cloud_cpp_${feature}_protos.so`.
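The per-feature naming convention from that footnote can be sketched as follows (assuming only the minimal two-libraries-per-feature pattern; the feature list here is illustrative, and some features ship more libraries):

```python
# Sketch of the naming convention: each feature ships at least
# libgoogle_cloud_cpp_<feature>.so and
# libgoogle_cloud_cpp_<feature>_protos.so.

def feature_libraries(feature):
    """Return the minimal set of shared libraries a feature ships."""
    return [
        f"libgoogle_cloud_cpp_{feature}.so",
        f"libgoogle_cloud_cpp_{feature}_protos.so",
    ]

# Two example features; the real list has around 120 entries.
for feature in ["bigtable", "storage"]:
    print(feature_libraries(feature))
```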
The build times just for `bigtable,...,storage` are closer to 1h. That suggests the build times for "everything else" are going to be closer to 5h, which maybe is not enough headroom. We may need to split that into two separate feedstocks.
> I think we should discuss the structure and contents of these new feedstocks.
I've thought a bit more about the "add `-core`" vs. "add `-full`" side of things, and while I can live with both, I think I'm now slightly more in favour of the `-full` one. That's because I think it'll be less common to require all of the libraries than to just require the "common" bits. So I think we should make the rarer use-case use the extra output, in this case `google-cloud-cpp-full` (or `-all`) or something.
The plan would then be:

- switch this feedstock to define `google-cloud-cpp` (from the POV of conda-forge) as the minimum necessary to enable later builds to use `-DGOOGLE_CLOUD_CPP_USE_INSTALLED_COMMON=ON`.
- add another feedstock for `google-cloud-cpp-all` as the maximal package
  - assuming we can still build everything except `-common` there, do so
  - if not, split off big features on an as-needed/as-justified basis into separate feedstocks, which then get pulled into the final `google-cloud-cpp-all` in that feedstock.
Regarding naming, we could use this as an opportunity to align the naming with conda-forge/conda-forge.github.io#1073 et al. (it's a bit of a sprawling issue, sorry...; compare also the soon-to-be-complete boost unification to +/- `libboost{,-headers,-devel}`).
I could imagine the following:
| lib | depends on | content |
|---|---|---|
| `libgoogle-cloud` | - | common libraries |
| `libgoogle-cloud-devel` | `libgoogle-cloud` | + headers and CMake/pkgconfig metadata |
| `libgoogle-cloud-<feat>` | `libgoogle-cloud` | a library for a given feature we had to break out of `-all` because it doesn't compile together with "everything else" anymore |
| `libgoogle-cloud-<feat>-devel` | `libgoogle-cloud-<feat>` | + headers and CMake/pkgconfig metadata |
| `libgoogle-cloud-all` | `libgoogle-cloud`, `libgoogle-cloud-<feat>` | all the libs |
| `libgoogle-cloud-all-devel` | `libgoogle-cloud-all` | + headers and CMake/pkgconfig metadata |
I don't care so much about `-all` vs. `-full`, but we do have a couple of the former already.
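The layering in that table could be sketched as a small dependency map (package names as proposed above; the resolver below is a toy, purely to illustrate what installing each output would pull in):

```python
# Toy dependency map for the proposed package layout.
# Keys and values mirror the "depends on" column of the table above.

DEPENDS = {
    "libgoogle-cloud": [],
    "libgoogle-cloud-devel": ["libgoogle-cloud"],
    "libgoogle-cloud-<feat>": ["libgoogle-cloud"],
    "libgoogle-cloud-<feat>-devel": ["libgoogle-cloud-<feat>"],
    "libgoogle-cloud-all": ["libgoogle-cloud", "libgoogle-cloud-<feat>"],
    "libgoogle-cloud-all-devel": ["libgoogle-cloud-all"],
}

def transitive_deps(pkg, depends=DEPENDS):
    """Return every package pulled in (transitively) by installing pkg."""
    seen = set()
    stack = [pkg]
    while stack:
        for dep in depends[stack.pop()]:
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen

print(sorted(transitive_deps("libgoogle-cloud-all-devel")))
```

(Whether `libgoogle-cloud-<feat>-devel` should also depend on `libgoogle-cloud-devel` is exactly the question raised further down.)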
The subpackage names (`libgoogle-cloud-<feat>` and `libgoogle-cloud-<feat>-devel`) make sense to me, and I think you understand the tradeoffs in the Conda ecosystem better than I do. I think you have a better sense of what they should be named. I assume it is not a problem to break existing packages (e.g. arrow) that depend on `libgoogle-cloud`?
Do we need to make `libgoogle-cloud-<feat>-devel` depend on `libgoogle-cloud-devel`? I assume that is needed to get the common headers and CMake/pkgconfig metadata?
I need to make some changes to make `libgoogle-cloud` and `libgoogle-cloud-devel` as small as possible. I assume we can apply the patch if we want to start these changes before the next `google-cloud-cpp` release.
> Do we need to make `libgoogle-cloud-<feat>-devel` depend on `libgoogle-cloud-devel`? I assume that is needed to get the common headers and CMake/pkgconfig metadata?
You're right about that of course, that was an oversight.
> I assume it is not a problem to break existing packages (e.g. arrow) that depend on `libgoogle-cloud`?
The runtime dependence on `libgoogle-cloud` will still be there; the only issue is to change the name of the host dependency in the recipe, but we can add a nudge to do that via a linter (or directly change it with the migration).
Closing for now and moving the discussion back to #141.
Checklist

- [ ] Reset the build number to `0` (if the version changed)
- [ ] Re-rendered with the latest `conda-smithy` (Use the phrase `@conda-forge-admin, please rerender` in a comment in this PR for automated rerendering)

Part of the work for #141