Dockerfile (alpine based) + some quick ideas/suggestions #636

Open
roscopecoltran opened this issue Jun 14, 2017 · 30 comments

Comments

@roscopecoltran

Hi guys,

Hope you are all well !

I found your great repo while working on a personal project that also aims to do some code analysis with a distributed bot, working as a virtual agent to build a semi-structured database of meta information about some open source projects (mainly SQL and graph based).

Context:
The goal is to automatically enrich some of my starred repos, with a virtual assistant/bot, by 'left joining' them with, or applying some GraphQL queries on, data such as known-framework detection, GitHub stats or GitHub trends, in order to extract more metadata from them.

In a nutshell, it is all about having a more convenient local search engine for my starred repos and creating a domain-specific topic modeling API. By detecting some patterns, I want to manage a dynamic tree of events that could be dedicated to sub-tasks such as generating Docker/Docker Compose files from a database of snippets if some dependencies are detected or matched.

examples of pattern detection to build dynamic dockerfiles:

  • requirements.txt -> append py-pip to the apk packages to install in an Alpine based dockerfile; the same mapping could be done for an Ubuntu based dockerfile with the apt package name, python-pip.
    • Docker alpine: apk add --no-cache --no-progress --update py-pip
    • Docker ubuntu: apt-get install python-pip
  • CMakeLists.txt -> add cmake, add some additional helpers for building C/C++ based projects.
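
To make the idea above concrete, here is a minimal sketch of such a marker-file-to-package mapping. Everything in it is hypothetical: the `PACKAGE_RULES` table, the function names and the exact package names (which vary between distro releases) are assumptions for illustration, not part of ScanCode or any existing tool.

```python
from pathlib import Path

# Hypothetical mapping from marker files to the packages each base image needs.
# Package names are illustrative and may differ between distro releases.
PACKAGE_RULES = {
    "requirements.txt": {"alpine": ["py-pip"], "ubuntu": ["python-pip"]},
    "CMakeLists.txt": {
        "alpine": ["cmake", "build-base"],
        "ubuntu": ["cmake", "build-essential"],
    },
}

# Install command prefix per base image (assumed, adjust to taste).
INSTALL_PREFIX = {
    "alpine": "apk add --no-cache --no-progress",
    "ubuntu": "apt-get install -y",
}


def detect_packages(repo_dir, distro):
    """Return the sorted packages implied by marker files at the top of repo_dir."""
    found = set()
    for marker, per_distro in PACKAGE_RULES.items():
        if (Path(repo_dir) / marker).is_file():
            found.update(per_distro[distro])
    return sorted(found)


def render_install_line(repo_dir, distro):
    """Build a Dockerfile RUN line, or an empty string if nothing matched."""
    packages = detect_packages(repo_dir, distro)
    if not packages:
        return ""
    return "RUN {} {}".format(INSTALL_PREFIX[distro], " ".join(packages))
```

Calling `render_install_line(".", "alpine")` in a checkout that contains a `requirements.txt` would yield one `RUN apk add ...` line that a generator could append to a Dockerfile template.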

1. Deploy local instances with Docker/Docker Compose

For some tasks, it seems clear that a content lexer, grep or parser would provide faster responses and results (dependency scanning, licences); so that's how I found your project, and it's really cool :-)

So first of all, as a dev ops ^^, I was wondering if it would not be easier to bundle scancode into some docker/docker-compose files, of course alpine based in order to keep the size of containers reasonable.

2. Some suggestions/features

So far, I have started to give your project a try, and it looks like I will create a fork with some personal ideas/features close to the context mentioned above. I wanted to share them with you, so any feedback or experience sharing is tremendously welcome :-)

- Administration panel

- Topic modelling

- Code parsing/search

- Starred GitHub managers

3. Questions related to your roadmap:

Last question: do you plan to migrate your stack of scripts to Python 3? Is it something that you have on the roadmap and would like community help with?

Ps. sorry for the long post, but I was really inspired by your work ^^, so thanks for reading it all :-)

Please have a great day !

Cheers,
Richard

@pombredanne
Contributor

@roscopecoltran this sounds like an awesome plan!

Having a Dockerfile would be great indeed.

Some other notes OTH:

  • about Python3: yes the plan is to port to Python... and this should happen pretty soon after the release of v2.0.0. Help is definitely welcome. The plan was to eventually support only Python 3.6 and up, and the question is whether or not to continue supporting Python 2.

  • something that is likely to help you a lot is the upcoming plugin architecture by @yashdsaraf

@roscopecoltran
Author

roscopecoltran commented Jun 16, 2017

Hey,

Thanks for the reply !

It would be awesome to have flow-based processing for starting concurrent tasks on the code, defined in some human-readable YAML files ^^.

Also, for any additional meta info about a project, I would recommend using searx, as it can be flexible on the scope of the metasearch used to gather public/web related info, if you want to build a context around the code audit.

Have a good week-end :-)

Cheers,
Richard

@pombredanne
Contributor

btw the ticket for Python 3 is #295 and also #442

@roscopecoltran
Author

Bonjour Philippe, :-)

Hope you are all well !

I am bundling scancode into a docker alpine container but some errors occurred.

I think that some tweaks are required while running the configure script. Mainly, it would be great to use fallbacks to find the library outside the scancode project, like operating a search in standard locations, and keeping the musl order, eg. paths = ['/lib', '/usr/local/lib', '/usr/lib'], or maybe to trigger a build event if the pre-compiled lib failed to be loaded.
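
As a rough illustration of the fallback idea, here is a sketch that tries a bundled library first and then the standard locations in the order listed above. The function names and the example library file name are hypothetical and this is not how ScanCode's configure script or loader actually works; only `ctypes.CDLL` is a real API here.

```python
import ctypes
import os

# Search order suggested above (musl-style standard locations).
STANDARD_LIB_DIRS = ["/lib", "/usr/local/lib", "/usr/lib"]


def candidate_paths(lib_name, bundled_dir=None, search_dirs=STANDARD_LIB_DIRS):
    """List paths to try: the bundled copy first, then each standard location."""
    dirs = ([bundled_dir] if bundled_dir else []) + list(search_dirs)
    return [os.path.join(d, lib_name) for d in dirs]


def load_with_fallback(lib_name, bundled_dir=None, search_dirs=STANDARD_LIB_DIRS):
    """Return the first library that loads, or raise OSError listing every failure."""
    errors = []
    for path in candidate_paths(lib_name, bundled_dir, search_dirs):
        if not os.path.isfile(path):
            errors.append("{}: not found".format(path))
            continue
        try:
            return ctypes.CDLL(path)
        except OSError as exc:  # e.g. "symbol not found" on a libc mismatch
            errors.append("{}: {}".format(path, exc))
    raise OSError("could not load {}:\n{}".format(lib_name, "\n".join(errors)))
```

With something like `load_with_fallback("libmagic.so.1", bundled_dir="/path/to/bundled/lib")` (both arguments are placeholders), a pre-compiled glibc build that fails to relocate on musl would be skipped in favour of a system copy installed through apk, if one exists.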

In our case, I got the following error while trying to link the libmagic2.so shared library: the configure script fails with "__snprintf_chk: symbol not found" (probably due to musl-dev or glibc).

From my understanding, these functions are used to find and map the pre-compiled libmagic shared lib.

The easier solution would be to use an Ubuntu based container, but the size of an Ubuntu container is such nonsense to me:

❯ docker images | awk '{print $1"\t"$2"\t"$7" "$8}'
REPOSITORY      TAG       SIZE
sample_alpine   latest    167.6 MB
sample_ubuntu   latest    447.8 MB
<none>          <none>    187.9 MB
ubuntu          latest    187.9 MB
redis           latest    109.3 MB
mongo           latest    261.6 MB
golang          latest    709.5 MB
alpine          latest    5.249 MB

Refs used:

Waiting for your input/point of view on that question. :-)

Cheers,
Richard

@pombredanne
Contributor

@roscopecoltran this is effectively a todo: the prebuilt binaries need to be unbundled ... this is tracked in #469

@znerd

znerd commented Jul 17, 2018

Here's a simple Dockerfile based on Alpine Linux 3.8 and Python 2.7.15:

FROM python:2.7.15-alpine
MAINTAINER Ernst de Haan "ernst.dehaan@mindcurv.com"

ARG SCANCODE_VERSION

RUN apk add build-base libxml2-dev libxslt-dev linux-headers
RUN pip install scancode-toolkit==${SCANCODE_VERSION}

CMD [ "/usr/local/bin/scancode" ]

Source can also be found here:

And the container here:

Here's how to build it with a version tag:

docker build -t scancode-toolkit:latest -t scancode-toolkit:2.2.1 --build-arg SCANCODE_VERSION=2.2.1 .

@znerd

znerd commented Jul 17, 2018

Hmm, while a simple scancode --help works OK:

docker run -it mindcurv/scancode-toolkit:2.2.1 scancode --help

…a real scan doesn't work yet. I'm running into (apparently) the same issue as @roscopecoltran :

$ docker run -it mindcurv/scancode-toolkit:2.2.1 scancode --format json . scancode_result.json
Scanning files for: licenses, copyrights, packages with 1 process(es)...
Building license detection index...Traceback (most recent call last):
  File "/usr/local/bin/scancode", line 11, in <module>
    sys.exit(scancode())
  File "/usr/local/lib/python2.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/scancode/utils.py", line 74, in main
    standalone_mode=standalone_mode, **extra)
  File "/usr/local/lib/python2.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python2.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python2.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/scancode/cli.py", line 490, in scancode
    pre_scan_plugins=pre_scan_plugins)
  File "/usr/local/lib/python2.7/site-packages/scancode/cli.py", line 572, in scan
    get_index(False)
  File "/usr/local/lib/python2.7/site-packages/licensedcode/cache.py", line 188, in get_index
    _LICENSES_INDEX = get_or_build_index_through_cache()
  File "/usr/local/lib/python2.7/site-packages/licensedcode/cache.py", line 108, in get_or_build_index_through_cache
    from licensedcode.index import LicenseIndex
  File "/usr/local/lib/python2.7/site-packages/licensedcode/index.py", line 47, in <module>
    from licensedcode import match
  File "/usr/local/lib/python2.7/site-packages/licensedcode/match.py", line 36, in <module>
    from licensedcode import query
  File "/usr/local/lib/python2.7/site-packages/licensedcode/query.py", line 32, in <module>
    import typecode
  File "/usr/local/lib/python2.7/site-packages/typecode/__init__.py", line 27, in <module>
    from typecode.contenttype import get_type
  File "/usr/local/lib/python2.7/site-packages/typecode/contenttype.py", line 47, in <module>
    from typecode import magic2
  File "/usr/local/lib/python2.7/site-packages/typecode/magic2.py", line 221, in <module>
    libmagic = load_lib()
  File "/usr/local/lib/python2.7/site-packages/typecode/magic2.py", line 214, in load_lib
    lib = ctypes.CDLL(magic_so)
  File "/usr/local/lib/python2.7/ctypes/__init__.py", line 366, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: Error relocating /usr/local/lib/python2.7/site-packages/typecode/bin/linux-64/lib/libmagic.so: __snprintf_chk: symbol not found

I will do some more research.
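
For context on the error above: `__snprintf_chk` is one of the fortified wrappers that glibc exports for code built with `_FORTIFY_SOURCE`, and musl does not provide these symbols, so a library pre-built against glibc cannot be relocated on plain Alpine. One quick way to spot such binaries is sketched below. It is only a byte-level heuristic and an assumption on my side (the function name is made up); a real check would inspect the ELF dynamic symbol table with a tool such as readelf or nm.

```python
import re

# glibc's fortify wrappers are named like __snprintf_chk or __memcpy_chk.
# The lookahead rejects longer names such as __stack_chk_fail.
FORTIFY_SYMBOL = re.compile(rb"__[a-z0-9_]+?_chk(?![A-Za-z0-9_])")


def find_fortify_symbols(path):
    """Return the sorted, de-duplicated fortify-style names found in the file's bytes."""
    with open(path, "rb") as handle:
        data = handle.read()
    return sorted({m.group(0).decode("ascii") for m in FORTIFY_SYMBOL.finditer(data)})
```

Running `find_fortify_symbols()` on the bundled `libmagic.so` from the traceback would be expected to list `__snprintf_chk` among its results, which hints that the file needs glibc or a rebuild against musl.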

@znerd

znerd commented Jul 17, 2018

I've created a 2nd Dockerfile, this time based on Debian Stretch. That seems to work.

Stuff is over here:

@pombredanne
Contributor

pombredanne commented Jul 17, 2018

@znerd Thank you for figuring all this out and sorry for some of the troubles: keep me posted as you progress!

At the moment, an Alpine container would require a bit of manual work, as there are some bundled, pre-built binaries that would need to be rebuilt by hand for this to work with an Alpine-style static build context.

Things should work fine with Debian/Ubuntu/CentOS/Fedora/Suse and similar

Here are two other examples: https://github.com/clearlydefined/tool-images/blob/0392771a408dbfb2ab5fcd88f702c43f207aa4ce/scancode/Dockerfile

https://github.com/clearlydefined/crawler/blob/1567d78abb7c1ee00c4ef89129ae9f1c56c92df4/Dockerfile

I am not sure which version of Scancode you used but I would suggest using the latest 2.9.x pre v3 releases.

@sschuberth
Collaborator

I'm currently experimenting with a minimal Dockerfile that already has glibc installed:

FROM frolvlad/alpine-python2

RUN pip install scancode-toolkit

But building the image gives

Collecting extractcode-libarchive (from scancode-toolkit)
ERROR: Could not find a version that satisfies the requirement extractcode-libarchive (from scancode-toolkit) (from versions: none)
ERROR: No matching distribution found for extractcode-libarchive (from scancode-toolkit)

@pombredanne, any idea what the issue is?

@pombredanne
Contributor

@sschuberth sorry for the late reply...
ScanCode has a dep on extractcode-libarchive and this contains a prebuilt native that may not be happy with an Alpine static setup.
What is likely is that the glibc provided there may not match?

Could you try to use a release archive with this Alpine setup instead to eliminate some moving parts?

Just out of curiosity, is using Alpine really providing such big benefits for the pain it brings?

@sschuberth
Collaborator

What is likely is that the glibc provided there may not match?

What are the criteria by which glibc is matched?

Could you try to use a release archive with this Alpine setup instead to eliminate some moving parts?

Will try.

Just out of curiosity, is using Alpine really providing such big benefits for the pain it brings?

Well, actually ScanCode is the first project I have run into that resists running easily on Alpine. But yes, image-size wise I believe it's worth the effort.

@pombredanne
Contributor

@sschuberth

What are the criteria by which glibc is matched?

I have no idea :D

That said, it (making things work on Alpine) is something that could maybe be dealt with in the GSoC project of @aj4ayushjain ?

@sschuberth
Collaborator

@pombredanne, so when installing from source while building the Alpine-Docker-image I get

Collecting extractcode-libarchive (from scancode-toolkit===3.0.2.post620.415d0c892->-r /scancode-toolkit/etc/conf/base.txt (line 10))
  Could not find a version that satisfies the requirement extractcode-libarchive (from scancode-toolkit===3.0.2.post620.415d0c892->-r /scancode-toolkit/etc/conf/base.txt (line 10)) (from versions: )
No matching distribution found for extractcode-libarchive (from scancode-toolkit===3.0.2.post620.415d0c892->-r /scancode-toolkit/etc/conf/base.txt (line 10))
* Installing components ...

Failed to execute command:
pip install --upgrade --no-index --no-cache-dir --find-links="/scancode-toolkit/thirdparty" -r "/scancode-toolkit/etc/conf/base.txt". Aborting...

That is not much more telling about why exactly extractcode-libarchive could not be installed.

@aj4ayushjain
Collaborator

aj4ayushjain commented May 10, 2019

@sschuberth
What the error means is that there are no binaries of the extractcode-libarchive plugin for Alpine.
@pombredanne can't we build a prebuilt native binary for alpine and fix this? If there is enough time available I will do it, but Alpine is a whole new world for me so I need to study it first.

@sschuberth
Collaborator

Ok, this Dockerfile gets me a bit further:

FROM frolvlad/alpine-python2

RUN apk add --no-cache py-icu

# Override PIP's glibc detection, see https://github.com/pypa/pip/issues/3969.
RUN echo "manylinux1_compatible = True" > /usr/lib/python2.7/_manylinux.py

RUN pip install --prefer-binary scancode-toolkit

But now it still wants to build intbitset using gcc:

  Running setup.py install for intbitset: started
    Running setup.py install for intbitset: finished with status 'error'
    ERROR: Complete output from command /usr/bin/python -u -c 'import setuptools, tokenize;__file__='"'"'/tmp/pip-install-4aegy3/intbitset/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-rymTcl/install-record.txt --single-version-externally-managed --compile:
    ERROR: running install
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-2.7
    copying intbitset/intbitset_helper.py -> build/lib.linux-x86_64-2.7
    copying intbitset/version.py -> build/lib.linux-x86_64-2.7
    running egg_info
    writing requirements to intbitset/intbitset.egg-info/requires.txt
    writing intbitset/intbitset.egg-info/PKG-INFO
    writing top-level names to intbitset/intbitset.egg-info/top_level.txt
    writing dependency_links to intbitset/intbitset.egg-info/dependency_links.txt
    reading manifest file 'intbitset/intbitset.egg-info/SOURCES.txt'
    reading manifest template 'MANIFEST.in'
    warning: no files found matching '*.css' under directory 'docs/_themes'
    warning: no files found matching '*.css_t' under directory 'docs/_themes'
    warning: no files found matching '*.conf' under directory 'docs/_themes'
    warning: no files found matching '*.html' under directory 'docs/_themes'
    warning: no files found matching 'COPYING' under directory 'docs/_themes'
    warning: no files found matching 'README' under directory 'docs/_themes'
    warning: no files found matching '*.html' under directory 'docs/_templates'
    writing manifest file 'intbitset/intbitset.egg-info/SOURCES.txt'
    running build_ext
    building 'intbitset' extension
    creating build/temp.linux-x86_64-2.7
    creating build/temp.linux-x86_64-2.7/intbitset
    gcc -fno-strict-aliasing -Os -fomit-frame-pointer -g -DNDEBUG -Os -fomit-frame-pointer -g -DTHREAD_STACK_SIZE=0x100000 -fPIC -I/usr/include/python2.7 -c intbitset/intbitset.c -o build/temp.linux-x86_64-2.7/intbitset/intbitset.o -O3 -march=core2 -mtune=native
    unable to execute 'gcc': No such file or directory
    error: command 'gcc' failed with exit status 1
    ----------------------------------------
ERROR: Command "/usr/bin/python -u -c 'import setuptools, tokenize;__file__='"'"'/tmp/pip-install-4aegy3/intbitset/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-rymTcl/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-install-4aegy3/intbitset/
WARNING: You are using pip version 19.1, however version 19.1.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

@sschuberth
Collaborator

can't we build a prebuilt native binary for alpine

@aj4ayushjain, that's basically what I had been asking for in #1262, but it turns out to be quite laborious, and as mentioned here I by now believe the better approach is to use an Alpine Docker image that has https://github.com/sgerrand/alpine-pkg-glibc installed, which is what I'm trying right now.

@earlyster

can't we build a prebuilt native binary for alpine

@aj4ayushjain, that's basically what I had been asking for in #1262, but it turns out to be quite laborious, and as mentioned here I by now believe the better approach is to use an Alpine Docker image that has https://github.com/sgerrand/alpine-pkg-glibc installed, which is what I'm trying right now.

I tried installing alpine-pkg-glibc and it didn't work for me. Were you able to get this to work? I am still getting this error:

Traceback (most recent call last):
  File "/usr/bin/scancode", line 7, in <module>
    from scancode.cli import scancode
  File "/usr/lib/python2.7/site-packages/scancode/cli.py", line 206, in <module>
    plugin_classes, plugin_options = PluginManager.load_plugins()
  File "/usr/lib/python2.7/site-packages/plugincode/__init__.py", line 190, in load_plugins
    mgr_setup = manager.setup()
  File "/usr/lib/python2.7/site-packages/plugincode/__init__.py", line 217, in setup
    self.manager.load_setuptools_entrypoints(entrypoint)
  File "/usr/lib/python2.7/site-packages/pluggy/manager.py", line 292, in load_setuptools_entrypoints
    plugin = ep.load()
  File "/usr/lib/python2.7/site-packages/importlib_metadata/__init__.py", line 90, in load
    module = import_module(match.group('module'))
  File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
  File "/usr/lib/python2.7/site-packages/summarycode/generated.py", line 38, in <module>
    import typecode.contenttype
  File "/usr/lib/python2.7/site-packages/typecode/__init__.py", line 27, in <module>
    from typecode.contenttype import get_type
  File "/usr/lib/python2.7/site-packages/typecode/contenttype.py", line 54, in <module>
    from typecode import magic2
  File "/usr/lib/python2.7/site-packages/typecode/magic2.py", line 205, in <module>
    libmagic = load_lib()
  File "/usr/lib/python2.7/site-packages/typecode/magic2.py", line 99, in load_lib
    return command.load_shared_library(dll, libdir)
  File "/usr/lib/python2.7/site-packages/commoncode/command.py", line 222, in load_shared_library
    lib = ctypes.CDLL(dll_path)
  File "/usr/lib/python2.7/ctypes/__init__.py", line 366, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: Error relocating /usr/lib/python2.7/site-packages/typecode_libmagic/lib/libmagic.so: __snprintf_chk: symbol not found

@pombredanne
Contributor

@earlyster sorry for that! @sschuberth if you have an Alpine Dockerfile that works, we can make this part of the repo here alright.

@earlyster also the work that @aj4ayushjain is doing on packaging in general (and debian in particular) will likely help here too as we would possibly have a clearer way to get things from Alpine packages.

@earlyster

Thanks @pombredanne for the update!

@sschuberth
Collaborator

sschuberth commented Jul 12, 2019

IIRC the last version of my Alpine-based Dockerfile has the same __snprintf_chk: symbol not found error despite glibc being installed. I currently have no time to look further into this.

@earlyster

@sschuberth ok, so we are in the same boat: not able to use ScanCode with an Alpine-based Docker image.

@cpunekar

cpunekar commented Aug 2, 2019

Trying to run on alpine but it's not working. Any updates on this?

@sschuberth
Collaborator

I finally managed to create an Alpine-based Docker image with Python 3.6 that is able to run ScanCode. Feel free to give it a try: https://github.com/sschuberth/docker-files/blob/4ebd681dfef5a8142b92c1157edd4f5495f0706b/scancode/Dockerfile

@pombredanne
Contributor

@sschuberth you rock!
Do you know if the base image used in https://github.com/sschuberth/docker-files/blob/4ebd681dfef5a8142b92c1157edd4f5495f0706b/scancode/Dockerfile#L2 is something reliable?
As a first gut reaction to:

# See https://github.com/sgerrand/alpine-pkg-glibc/issues/111#issuecomment-466301535.
FROM frolvlad/alpine-miniconda3:python3.6

... relying on frolvlad as a trusted base image makes me cringe a bit.

@sschuberth
Collaborator

That depends on your definition of "trusted" 😉 I was also a bit reluctant at first as it's "just some random single user" (and not e.g. a company / foundation) maintaining these images. But @frolvlad seems to be very active in the Docker / Alpine community and quite a few people seem to be using his images. So I decided for myself to trust these base images.

@pombredanne
Contributor

It feels a tad involved to me if I dig a little:

  1. we ultimately start from an alpine:3.11 base image, which I would assume is the standard base one.
    https://github.com/Docker-Hub-frolvlad/docker-alpine-glibc/blob/master/Dockerfile
  2. that then fetches a pre-built binary from https://github.com/sgerrand/alpine-pkg-glibc/releases
  3. and finally these are installed on top https://github.com/Docker-Hub-frolvlad/docker-alpine-miniconda3/blob/master/Dockerfile

Just curious, how big a startup speed gain and size gain do you get with this?

@sschuberth
Collaborator

I haven't bothered to measure this so far TBH...

@pombredanne
Contributor

FWIW, there is now support for Alpine's musl in Python wheels (the musllinux platform tag, a sibling of manylinux), including support on PyPI. This could pave the way to supporting Alpine images.
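
To illustrate what that means in practice, here is a small sketch that reads the platform tags from a wheel file name and says which libc family the wheel targets. The helper names are made up for this example; for real work the `packaging` library is the better tool for parsing and matching tags.

```python
def wheel_platform_tags(filename):
    """Return the platform tags from a wheel filename (the last dash-separated field)."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: {}".format(filename))
    platform_field = filename[:-len(".whl")].split("-")[-1]
    # Several tags can be compressed into one field, separated by dots.
    return platform_field.split(".")


def libc_family(filename):
    """Classify a wheel as 'musl', 'glibc', 'any' or 'other' from its platform tags."""
    tags = wheel_platform_tags(filename)
    if any(tag.startswith("musllinux_") for tag in tags):
        return "musl"
    if any(tag.startswith("manylinux") for tag in tags):
        return "glibc"
    if tags == ["any"]:
        return "any"
    return "other"
```

A wheel whose name ends in something like `-cp310-cp310-musllinux_1_1_x86_64.whl` is classified as "musl", which is the kind of binary pip can install on Alpine without the glibc workarounds tried earlier in this thread.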

@frol

frol commented Feb 2, 2022

Oh, wow, I am quite amused to find these types of topics and articles throughout the internet using my Alpine-baked images here and there (frolvlad is my handle on Docker Hub). Well, I would not trust a random guy on the internet to provide a base image for some mission-critical software, but you are free to just copy-paste the contents of the Dockerfile to your image based on plain alpine image if you wish 🤷
