De-vendor prebuilt binaries to ease packaging for Linux distros #469

Open
pombredanne opened this issue Jan 29, 2017 · 21 comments

@pombredanne
Contributor

pombredanne commented Jan 29, 2017

@pabs3 and @matthieucan suggested to me on the OFTC #debian-qa IRC channel that ScanCode could be packaged as a Debian Linux package.

@dvc94ch in #288 is looking for help packaging ScanCode for the Guix Linux distro.

Packaging for a distro should be a non-event, but today it would be a tad involved.
ScanCode is already buildable as a wheel, which is a good first step towards this.

As a second step, any pre-built binaries that are vendored as a user convenience in the src/*/bin directories should be de-vendored from the main source tree and provided instead either:

  • as another Python package (which can be vendored again, so that the current user experience of downloading or cloning and then running without any other install still works)
  • as distro-provided packages, so that a distro package of ScanCode can declare proper dependencies on other distro packages

This way, distro packaging would be vastly simplified.

@pabs3

pabs3 commented Jan 29, 2017 via email

@pombredanne
Contributor Author

@pabs3 wrote

Generally, I suggest a good way to replace embedded code copies is to create binary packages for Windows/macOS/etc and only embed deps in the binary packages that can't be installed via binary package deps.

I agree this is the way to go.

Python already has a system for binary packages, and I should be able to use it to ship plain binaries for the few pure native executables and libraries (such as libarchive, libmagic, p7zip and a few more) as extra packages.

And I can build these as variants for Windows, Mac, and Linux (32 and 64 bit), replacing the vendored directories at https://github.com/nexB/scancode-toolkit/tree/develop/src/typecode/bin or https://github.com/nexB/scancode-toolkit/tree/develop/src/extractcode/bin ...

I still want to keep the download-and-run experience with no dependency needed besides Python proper, on all three OSes, and this should work alright. I wish that installing dependent packages from sources or prebuilt binaries on Windows and Mac were a well-solved problem, but this is not the case.

So the way could be to break these binaries apart such that:

  1. they would still be available, but provided in an optional Python package, so that the current experience of fetching or cloning and then running without build-essential-like deps stays the same

  2. when packaged for a distro or when building from sources, you could instead use the corresponding distro packages (such as https://packages.debian.org/jessie/libarchive13 or https://packages.debian.org/jessie/p7zip or https://packages.debian.org/jessie/file and a few more)

One difficulty is that the available Debian versions of these deps may not match the exact versions that ScanCode expects... but that should not be a major issue to deal with (though it will need testing for sure).

@pombredanne
Contributor Author

@roscopecoltran for your issue #636 related to the pre-built binaries it is best to do this here

@pombredanne pombredanne added this to the v2.1 milestone Jun 23, 2017
@pombredanne
Contributor Author

pombredanne commented Aug 13, 2017

This is also needed for the work of @maxyz.
The approach would likely be to extend ScanCode's new plugin mechanism by @yashdsaraf to provide these binaries as optional plugins (e.g. optional Python packages), or to get them from the PATH as a last resort.

@pombredanne pombredanne modified the milestones: v2.1, v3.0 Oct 4, 2017
@bhavishyagopesh

@pombredanne So is this done now (I mean a PyPI package containing all binaries)? I would like to work on this packaging problem (and eventually #487).

@pombredanne
Contributor Author

@bhavishyagopesh Thanks! No this is not done.
The gist of it: check how the commoncode.command code works and how there are bundled pre-built binaries in bin/ dirs in extractcode and typecode.

Ideally, we want a plugin system using the same approach as the ScanCode plugins with pluggy, such that a plugin can provide a certain binary for an os/arch and the code uses it instead of the bundled binaries. Then, if no plugin provides a binary, a command or DLL should be used from the PATH as a last resort. This way a given binary can either be provided with ScanCode as a plugin, or not be provided at all and obtained from the PATH, as provided by a distro for instance.

@pombredanne
Contributor Author

pombredanne commented Nov 3, 2017

@bhavishyagopesh any progress on your side? I just got a new report on a related issue in #834 by @spbkelt so since you want to work on this issue, I'd need to know what's up and what could be an ETA?

@bhavishyagopesh

@pombredanne I went through your suggestion. I need to confirm a few things:

  1. Where do we intend to keep the binaries eventually.
  2. So almost all platform(os/arch) detection is being done by commoncode.system and we just need to fetch the matching binaries using plugin similar to commoncode.command.
  3. Where will the plugin code go, in a separate repo?...

Thanks for your patience.

@pombredanne
Contributor Author

Where do we intend to keep the binaries eventually.

In plugins, one for each os/arch and binary, with a fallback to the PATH env var if not provided.

So almost all platform(os/arch) detection is being done by commoncode.system and we just need to fetch the matching binaries using plugin similar to commoncode.command.
Where will the plugin code go, in a separate repo?...

Plugins can go in a module for now, but they should use the same pluggy-based approach designed by @yashdsaraf: they should provide a path to a binary for an os/arch. Eventually they will live in separate repos.
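
As an illustration of "provide a path to a binary for an os/arch", a plugin could be little more than a lookup table. The sketch below is an assumption-laden example: it uses the standard library `platform` module instead of commoncode.system, and the table name and paths are placeholders rather than real ScanCode locations.

```python
import platform

# Hypothetical table mapping (os, arch) to a bundled binary path.
# The paths are placeholders, not real ScanCode locations.
BUNDLED_7Z = {
    ("linux", "x86_64"): "bin/linux-64/7z",
    ("darwin", "x86_64"): "bin/mac-64/7z",
    ("windows", "amd64"): "bin/win-64/7z.exe",
}


def current_os_arch():
    """Return a normalized (os, arch) pair such as ("linux", "x86_64")."""
    return platform.system().lower(), platform.machine().lower()


def provided_location(table, os_arch=None):
    """
    Return the path registered for this os/arch, or None so that the
    caller can fall back to the PATH.
    """
    return table.get(os_arch or current_os_arch())
```

Returning None for an unknown os/arch, rather than raising, keeps the PATH fallback decision in the caller.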

@bhavishyagopesh

@pombredanne I'm not quite sure of this but would something like this work:
https://gist.github.com/bhavishyagopesh/7d139dc0cbfc22a0778dde7672f04b5b

and now, e.g., libmagic.py can call this plugin instead of load_lib(), and the binaries would remain in a folder containing this module...

I think everything like finding the os/arch and load_lib() is being done by commoncode.command... so just call them, and if they fail, search using ctypes.CDLL.
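
A rough sketch of that "try the provided path, then search with ctypes" idea follows. The function name is hypothetical and this is not the code from the gist or from commoncode.command; it only uses `ctypes.CDLL` and `ctypes.util.find_library` from the standard library.

```python
import ctypes
import ctypes.util
import os


def load_native_library(name, provided_path=None):
    """
    Load a shared library with ctypes. Prefer `provided_path` (for example
    a path returned by a plugin); otherwise ask the system for `name`.
    Raise LookupError if neither works.
    """
    if provided_path and os.path.exists(provided_path):
        return ctypes.CDLL(provided_path)
    # find_library takes the bare name without "lib" prefix or file suffix,
    # e.g. "magic" for libmagic.
    system_lib = ctypes.util.find_library(name)
    if system_lib:
        return ctypes.CDLL(system_lib)
    raise LookupError(f"Native library not found: {name}")
```

For libmagic this would be called as `load_native_library("magic", provided_path=...)`, where the provided path would come from a plugin when one is installed.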

Thanks.

@pombredanne
Contributor Author

@bhavishyagopesh yes, something along these lines would likely work... a PR is welcome and is the best place to discuss and review!

nishakm pushed a commit to nishakm/scancode-toolkit that referenced this issue Dec 14, 2017
Since Scancode binaries are built on Debian based machines, and the
prebuilt dependencies are named differently on RedHat based machines
py.test will not be able to find the libbz2.so library on those
machines.
See https://github.com/nexB/scancode-toolkit/issues/443

Added a note about how to work around this issue by symbolically
linking the existing libbz2.so on the filesystem to the expected
name.

This should be removed once
aboutcode-org#469 is solved

Signed-off-by: Nisha K <nishak@vmware.com>
JonoYang pushed a commit that referenced this issue Feb 8, 2018
Since Scancode binaries are built on Debian based machines, and the
prebuilt dependencies are named differently on RedHat based machines
py.test will not be able to find the libbz2.so library on those
machines.
See https://github.com/nexB/scancode-toolkit/issues/443

Added a note about how to work around this issue by symbolically
linking the existing libbz2.so on the filesystem to the expected
name.

This should be removed once
#469 is solved

Signed-off-by: Nisha K <nishak@vmware.com>
yash-nisar pushed a commit to yash-nisar/scancode-toolkit that referenced this issue Feb 8, 2018
Since Scancode binaries are built on Debian based machines, and the
prebuilt dependencies are named differently on RedHat based machines
py.test will not be able to find the libbz2.so library on those
machines.
See https://github.com/nexB/scancode-toolkit/issues/443

Added a note about how to work around this issue by symbolically
linking the existing libbz2.so on the filesystem to the expected
name.

This should be removed once
aboutcode-org#469 is solved

Signed-off-by: Nisha K <nishak@vmware.com>
@pombredanne
Contributor Author

Just for reference, the gist is that we are using pre-built libmagic, libarchive, 7z and a few others as a convenience, with some commoncode.command code to pick the right path on each OS. This makes it hard to use system libs instead when ScanCode is packaged as a Linux distro package.

So we need to find a way to still have this option (e.g. by using one or more optional wheels with these natives) and otherwise use a system-provided lib (e.g. a distro-provided native).

This is partly about de-bundling and packaging the natives as wheels, partly about having some pluggable way to use those or fall back to a system lib when the wheels with natives are not available, and partly about updating the code that loads these natives (either for spawns or ctypes) to use one or the other.

@pombredanne
Contributor Author

@bhavishyagopesh You never submitted a PR for this, are you still interested?

pombredanne added a commit that referenced this issue Aug 9, 2018
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Aug 10, 2018
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Aug 10, 2018
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Aug 10, 2018
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Aug 10, 2018
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Aug 10, 2018
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Aug 10, 2018
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Aug 10, 2018
Implement this in extractcode and typecode to provide 7zip, libarchive
and libmagic through plugins instead of using vendored binaries in the
core scancode

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Aug 10, 2018
- Some binaries initially committed were not correct for a given
architecture. Also several provided locations were not correct and have
been updated.

- All location keys are also inlined in plugins code to avoid recursive
imports

- Extra consistency checks are done on provided locations (they must
exist)

The version reported in p7zip plugins and ABOUT files was 9.20.1 but
it is 9.38.1: this has been fixed.

In sevenzip CLI calls, the empty password arg is now last on the CLI
args to avoid any ambiguity with the file being extracted.

The overall command execute2 logging has also been updated.

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Aug 10, 2018
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne added a commit that referenced this issue Aug 10, 2018
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
@pombredanne
Contributor Author

Using plugins for third-party binaries is now merged in develop and has been available in releases since v2.9.3.
I still need to document how this can be used when creating a distro package for ScanCode... best would be to actually create the packaging here... help would be very welcome!

@pombredanne
Contributor Author

@dankegel you wrote :

I've got a bad feeling about this.
Do you know why this was originally set up in a platform-specific way?

Originally posted by @dankegel in https://github.com/nexB/scancode-toolkit/pull/1437/files/e3c4bd64d7587b2d55f1432c3828a5bf066f212b

This is a bit of a wart... but this is the easiest way I found to create wheels on one OS for all OSes: these wheels do not need any native compilation since they embed only prebuilt binaries, and each of them has a shared object and binaries for a specific OS.

The whole story is in this ticket, though ping me there if this is not clear. The purpose of these /plugins and their somewhat complex machinery is to support different usage modes for ScanCode:

  1. used as an app where everything is self-contained, built and bundled in one archive. These are the release archives and a GitHub zip download.

  2. installed as a lib or app with pip install, where all the deps are available on PyPI including these wheels for each platform. For instance https://pypi.org/project/extractcode-7z/#files and https://pypi.org/project/extractcode-libarchive/#files

  3. packaged as a lib or app in BSD and Linux distros. Here the native prebuilt binaries bundled in 1. or available in 2. would instead be regular distro packages, and a plugin would provide a path to these rather than provide the binaries proper.

The main driver has been to help distros. And changing build flags is not something that should be done lightly, as these have been selected very carefully after a lot of trial and error and testing.

@ideepika

ideepika commented Apr 7, 2019

Is this project open for GSoC 2019? I am late, but I am familiar with Linux and the package managers used in Ubuntu and Fedora, as well as git, Python, C, and C++. Can I work on some task and create a proposal for this project? @pombredanne can you orient me?

@pombredanne
Contributor Author

@dexter816 welcome! The devendoring is mostly done and you can see the resulting plugins in https://github.com/nexB/scancode-toolkit/tree/develop/plugins
Each of these contains a prebuilt binary and a small Python shim plugin that provides a path to its bundled binary, e.g. a LocationProviderPlugin subclass.

The thing that is missing to complete this story would be one or more small plugins using the same kind of code... but bundling no binary and instead providing access to system-installed package binaries (for instance, access to the libarchive .so installed through a standard system package on Debian, Ubuntu, etc. instead of the one vendored in https://github.com/nexB/scancode-toolkit/blob/develop/plugins/extractcode-libarchive-manylinux1_x86_64/src/extractcode_libarchive/__init__.py).

Such a plugin could for instance detect the distro with https://pypi.org/project/distro/
and, based on that, provide proper paths to the system libarchive. It should fail loudly if the required binaries are missing and suggest a proper installation command.
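
A minimal sketch of such a system-library lookup could look like this. To stay standard-library only it reads /etc/os-release directly instead of using the `distro` package suggested above; the function names are hypothetical and the install hints are illustrative, so package names should be checked against each distro's package index.

```python
import ctypes.util

# Illustrative install hints keyed by the ID field of /etc/os-release.
# These are assumptions for the example, not a vetted list.
INSTALL_HINTS = {
    "debian": "apt-get install libarchive13",
    "ubuntu": "apt-get install libarchive13",
    "fedora": "dnf install libarchive",
}


def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dict, stripping quotes."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key] = value.strip().strip('"').strip("'")
    return values


def detect_distro_id(path="/etc/os-release"):
    """Return the lowercase distro ID, or an empty string if unknown."""
    try:
        with open(path, encoding="utf-8") as f:
            return parse_os_release(f.read()).get("ID", "").lower()
    except OSError:
        return ""


def system_libarchive_location(distro_id=None):
    """Return the system libarchive name or path, or fail loudly with a hint."""
    found = ctypes.util.find_library("archive")
    if found:
        return found
    distro_id = detect_distro_id() if distro_id is None else distro_id
    hint = INSTALL_HINTS.get(distro_id, "install libarchive with your package manager")
    raise RuntimeError(f"libarchive was not found on this system; try: {hint}")
```

Note that on Linux `ctypes.util.find_library` typically returns a file name such as `libarchive.so.13` rather than a full path, which is still enough to pass to `ctypes.CDLL`.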

@pombredanne
Contributor Author

@aj4ayushjain at this stage I think this is all completed and working, correct?

@aj4ayushjain
Collaborator

aj4ayushjain commented Oct 13, 2019

@pombredanne Naming of the plugins is still an issue that needs to be resolved when used with the Debian package.
http://dpaste.com/0A828GQ

scancode
Traceback (most recent call last):
  File "/usr/bin/scancode", line 6, in <module>
    from pkg_resources import load_entry_point
  File "/home/ayush/.local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 3250, in <module>
    @_call_aside
  File "/home/ayush/.local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 3234, in _call_aside
    f(*args, **kwargs)
  File "/home/ayush/.local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 3263, in _initialize_master_working_set
    working_set = WorkingSet._build_master()
  File "/home/ayush/.local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 583, in _build_master
    ws.require(__requires__)
  File "/home/ayush/.local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 900, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/home/ayush/.local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 786, in resolve
    raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'extractcode-libarchive' distribution was not found and is required by scancode-toolkit
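
One way to narrow down an error like this is to check which plugin distributions are actually visible to the interpreter. The sketch below is only a diagnostic aid under stated assumptions: it uses `importlib.metadata` from Python 3.8+ rather than the Python 2.7 `pkg_resources` seen in the trace, the function name is hypothetical, and the distribution names are the ones mentioned in this thread.

```python
from importlib import metadata

# Distribution names taken from the PyPI links and the error in this thread.
REQUIRED = ["extractcode-libarchive", "extractcode-7z"]


def missing_distributions(names):
    """Return the subset of `names` that have no installed distribution metadata."""
    missing = []
    for name in names:
        try:
            metadata.distribution(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

Running `missing_distributions(REQUIRED)` in the same environment as the failing `scancode` command would show whether the plugin is absent or only installed under a different distribution name.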

@pombredanne
Contributor Author

@aj4ayushjain I copied the trace here... it is better to have it self-contained.
The 'extractcode-libarchive' distribution was not found.. --> on which branch does this happen? What do you think the root cause of the problem is? You alluded to naming... can you elaborate?

@pombredanne pombredanne modified the milestones: v3.1 , v3.2 Oct 15, 2019
@pombredanne
Contributor Author

At this stage all pre-built binaries have been "devendored" and are buildable (eventually from sources) from https://github.com/nexB/scancode-plugins and we have variants that can be used to rely on system-provided binaries instead.
I am keeping this open as we need to craft porting instructions for distro package maintainers.

@pombredanne pombredanne removed this from the v3.3 milestone Sep 24, 2021