Consistent naming for library-packages #1073

Open
h-vetinari opened this issue May 28, 2020 · 17 comments

Comments

@h-vetinari
Member

In the context of moving pyarrow into the arrow-cpp-feedstock, there was some discussion of naming the outputs, and I brought up the following:

Regarding the naming conventions, I think it might be worth renaming arrow-cpp to libarrow. This is in line with a lot of other feedstocks, but obviously a matter of taste. In this case, we'd have to have a compatibility output of arrow-cpp that depends on libarrow, which could be dropped after a few versions.

@isuruf made me aware of this in the context of faiss. Other examples I can think of off the top of my head are blas & lapack, openblas, opencv, postgresql, gdal, plus a whole bunch more (non-exhaustive).

Of course, this recipe has some counter examples to this: aws-sdk-cpp, boost-cpp, grpc-cpp, thrift-cpp, but I think they are far in the minority and the same argument could be made for renaming those (maybe worth noting that at least the last two seem to have been started by the arrow team).

Following @isuruf's and @xhochy's input, I'm opening this issue here. Also note @isuruf's comment:

@isuruf: -cpp was a trend that I started with boost-cpp. That was a mistake. I'm in favour of changing it, but this is not the correct place to raise the issue.

@wolfv
Member

wolfv commented May 28, 2020

I agree that we should have a general discussion of naming: not only in the context of libraries, but also with bindings.

E.g. should it be py-openimageio or python-openimageio or openimageio-py...

@h-vetinari
Member Author

That's a valid point @wolfv, but IMO worth splitting into a separate discussion/issue (not least because it's partly intertwined with the naming choices that the upstream packages make themselves).

@jjhelmus
Contributor

I'm in favor of using libexample for the name of the example library package.
cpp is a particularly confusing name, as cpp is both a common suffix for C++ source files (foo.cpp) and the name of the C preprocessor.

@h-vetinari
Member Author

So what's the plan for this? Slowly migrate packages (resp. feedstocks) from example-cpp to libexample? Are there even other possible options (this seems out of scope for normal migrations)?

@CJ-Wright
Member

The need for shifting the GH repos themselves puts these migrations a bit beyond the ken of our normal migration tooling, although we could have a mini-migrator that updates the requirements to use the new names. The mini-migrator would provide the least disruption and be most effective if we could get this ready to go before the next python release.
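
As a rough illustration of what such a mini-migrator might do to a recipe's requirements, here is a minimal sketch. The `RENAMES` table and both helper functions are hypothetical and not part of any conda-forge tooling; a real migrator would also have to cope with Jinja expressions and selectors in `meta.yaml`.

```python
# Hypothetical rename table; arrow-cpp -> libarrow is the rename proposed in this issue.
RENAMES = {"arrow-cpp": "libarrow"}


def rename_requirement(req, renames=RENAMES):
    """Swap the package name in a requirement string, keeping any version spec."""
    name, sep, rest = req.strip().partition(" ")
    return renames.get(name, name) + sep + rest


def migrate(requirements, renames=RENAMES):
    """Apply the rename table to every requirement in a list."""
    return [rename_requirement(r, renames) for r in requirements]


print(migrate(["arrow-cpp >=0.17", "python"]))
```

Requirements whose name is not in the table pass through unchanged, so such a step could in principle be applied to every feedstock without touching unrelated dependencies.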

@h-vetinari
Member Author

[...] if we could get this ready to go before the next python release.

That's at the beginning of October. Depending on the number of feedstocks, this seems a bit ambitious (but not impossible).

Should we just start modifying the output names of the various feedstocks (while maintaining example-cpp as a compat-output that depends only on the new libexample)?

Do we have a list of affected packages?

@h-vetinari
Member Author

Opened an RFC for arrow: conda-forge/arrow-cpp-feedstock#158

If this approach can be applied more generally, I'd be happy to chip in a few PRs for affected packages.

@pearu

pearu commented Jun 27, 2020

Should the rename happen in place, or by introducing a new feedstock and archiving the old one?

conda-forge/arrow-cpp-feedstock#158 (comment)

@h-vetinari
Member Author

As outlined in my answer there, all the feedstock-based operations are IMO much higher in terms of effort / complexity / possible disruption, not least because of the number of mutex packages that would be necessary to prevent parallel installation of the packages from the old & new feedstocks.

This can be much more easily achieved by keeping the old output names as compatibility outputs that depend exactly on the subpackage with the new output name.
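
To make the compatibility-output idea concrete, here is a minimal sketch that builds a recipe-like `outputs` structure as plain Python data. The helper `compat_outputs` is hypothetical and only for illustration; the generated string uses conda-build's `pin_subpackage(..., exact=True)` Jinja function, which pins a sibling output to exactly the same build.

```python
def compat_outputs(old_name, new_name):
    """Return a recipe-like `outputs` list: the real package plus a thin wrapper."""
    exact_pin = '{{ pin_subpackage("%s", exact=True) }}' % new_name
    return [
        # The new output carries the actual library files.
        {"name": new_name},
        # The old name stays installable but only pulls in the new output.
        {"name": old_name, "requirements": {"run": [exact_pin]}},
    ]


for output in compat_outputs("arrow-cpp", "libarrow"):
    print(output)
```

Because the old name ships no files of its own and always resolves to the matching build of the new output, the two names cannot drift apart or clobber each other.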

@xhochy
Member

xhochy commented Jun 29, 2020

There is no need for mutex packages; setting the correct run_constrained should be enough. This is how we prevent defaults' libboost from being installed at the same time as conda-forge's boost-cpp: https://github.com/conda-forge/boost-cpp-feedstock/blob/992bc86a87a05f9935ae8049b5c21bc9a80cedc7/recipe/meta.yaml#L33

I would, though, set up new repositories with fixed names for the libraries. I don't think that our infrastructure supports renaming repositories (I tried that once and failed heavily).

@h-vetinari
Member Author

I've started to pick this up again (i.e. doing xyz-cpp -> libxyz), first with abseil (also presented at two core dev meetings), and once that is digested, hopefully sometime soon with grpc.

This is going to be a slow process (due to the involvement with the pinning), but I don't mind that.

However, stumbling over conda-forge/staged-recipes#19764, there are cases that explicitly distinguish between the libraries built for C resp. C++. There is some prior art on this (both in c-f as well as upstream) where e.g. LLVM has both libclang & libclang-cpp.

I'm wondering if we should "graduate" this to a general principle, of which I could imagine two flavours:

  1. [more disruptive] always name C-only libraries libxyz and libraries containing (also) C++ libxyz-cpp
  2. [less disruptive] libraries (either C/C++) get named libxyz unless they have separate builds for C & C++, in which case we use libxyz for the C interface and libxyz-cpp for the C++ one
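
The two flavours can be written down as a small decision function. This is only a sketch of the rules as stated in the list above; the function name and its parameters are made up for illustration.

```python
def library_name(base, contains_cpp, has_separate_c_and_cpp_builds, flavour):
    """Name a single library output under flavour 1 or flavour 2."""
    if flavour == 1:
        # Flavour 1: anything containing C++ gets the -cpp suffix.
        use_suffix = contains_cpp
    elif flavour == 2:
        # Flavour 2: the suffix only disambiguates a C++ build from a sibling C build.
        use_suffix = contains_cpp and has_separate_c_and_cpp_builds
    else:
        raise ValueError("flavour must be 1 or 2")
    return "lib" + base + ("-cpp" if use_suffix else "")


# A C++ library without a separate C build:
print(library_name("xyz", True, False, flavour=1))  # libxyz-cpp
print(library_name("xyz", True, False, flavour=2))  # libxyz
```

Under either flavour, a project with separate C and C++ builds ends up with the libclang / libclang-cpp pattern; the flavours differ only for C++ libraries that have no C counterpart.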

Thoughts?

@carterbox
Member

Why shouldn't package names follow upstream naming conventions? If the developers of a package have named their package foo-cpp or foo, then why would we publish it as libfoo? Or is this discussion solely about naming package outputs, for example when libraries, binaries, and other artifacts of a package are to be separated?

@h-vetinari
Member Author

h-vetinari commented Aug 25, 2022

Why shouldn't package names follow upstream naming conventions?

Because occasionally there are more relevant considerations, and having a consistent lib prefix qualifies IMO. Many libs already diverge from their upstream naming: some through an added "-cpp" suffix (arrow, boost[1], grpc, etc.), some through an added "lib" prefix (many), some with neither despite clearly qualifying (e.g. jpeg, json, zstd), some publishing both original & prefixed names (e.g. zlib), and even some that publish libs that don't get published upstream (tensorflow, faiss).

Our artefacts only "live" in the conda ecosystem anyway (where many python packages don't follow the PyPI names for various reasons either), and for very similar reasons, we can choose to make our lives easier on the library side, by enforcing (within reason) a consistent approach in our ecosystem.

PS. Funnily enough, even for packages that call themselves foo or foo-cpp, the output produced by their build scripts is often libfoo.so, which is another reason to prefer the lib-prefix.

Footnotes

  1. See Isuru's comment quoted in the OP:
    "-cpp was a trend that I started with boost-cpp. That was a mistake. I'm in favour of changing it, [...]"

@h-vetinari
Member Author

we can choose to make our lives easier on the library side, by enforcing (within reason) a consistent approach in our ecosystem.

I should perhaps word this more strongly than "make our lives easier". By having a consistent setup for libs (e.g. run-exports, cmake & pkgconfig stuff, static-vs-shared setup if applicable, etc.) that's also clearly recognizable by a common naming scheme, we'd make navigating / understanding / contributing to the respective feedstocks much easier for both new and experienced members.

We could then go from "I have to figure out how this bespoke recipe works" to "ah, that's a feedstock following the library-pattern" (ideally accompanied by some documentation in the knowledge base, obviously).

@carterbox
Member

even some that publish libs that don't get published upstream

I think this is the best answer to my question of why we don't follow upstream naming, i.e. in most cases upstream developers don't have naming conventions for separating their artifacts into multiple outputs. The packages which you mention (see tensorflow, zlib, boost, arrow) all have strange output names because conda recipe maintainers are trying to indicate the contents of the output from the name or avoid collisions with other packages on the channel.

I'm in favor of a consistent naming scheme. However, I think a libfoo output should only contain libs, headers, and build system config files. For example, the jpeg, zstd, and dav1d packages are not only libs; they also contain command line utilities, so I don't think renaming them to libfoo is appropriate. However, if the libs and headers were split into a separate output, then that new output would be libfoo.
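
A minimal sketch of that "library-only" rule, under the assumption that libs, headers, and build system config files all live below lib/ and include/ in a Unix-style install prefix (real packages may differ, e.g. CMake files under share/ or a different layout on Windows). The helper names and the example file list are illustrative, not taken from an actual package listing.

```python
from pathlib import PurePosixPath

# Assumed top-level directories that a library-only output may populate.
LIBRARY_DIRS = {"lib", "include"}


def split_files(files):
    """Partition installed paths into (library files, everything else)."""
    lib_files, other_files = [], []
    for f in files:
        parts = PurePosixPath(f).parts
        target = lib_files if parts and parts[0] in LIBRARY_DIRS else other_files
        target.append(f)
    return lib_files, other_files


def is_library_only(files):
    """True if nothing falls outside the assumed library directories."""
    return not split_files(files)[1]


files = ["lib/libzstd.so", "include/zstd.h", "lib/pkgconfig/libzstd.pc", "bin/zstd"]
print(split_files(files))
```

With a command line tool under bin/ present, the package as a whole would not qualify for a lib name, while the first list returned by `split_files` is what a separate libfoo output would contain.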

@h-vetinari
Member Author

[...] packages are not only libs, they also contain command line utilities, so I don't think renaming them to libfoo is appropriate. However, if the libs and headers were split into a separate output then that new output would be libfoo.

I broadly agree with all of that (though there are already exceptions the other way around too, like libprotobuf containing protoc).

@xhochy
Member

xhochy commented Aug 26, 2022

I would prefer that, in the cases where we have a binary, we split it into a separate output (if we are touching the package anyway) and really keep packages with a lib prefix library-only. Having libprotobuf in build always confuses me, even though I have maintained that package for a long time.

For example, in the case of thrift-cpp, I have split up the package (https://github.com/conda-forge/thrift-cpp-feedstock/blob/f5a6473f172fe475bafdec30555ffcb1dcfe9652/recipe/meta.yaml#L81), as the compiler was 90% of the package size. Here the naming transition was simpler, as the initial package was called thrift-cpp, so the new outputs didn't clash with the existing packages.
