-
-
Notifications
You must be signed in to change notification settings - Fork 309
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
docker: implement multi-stage-builds for ubuntu #3170
Conversation
P.S.: Backporting may make sense (e.g. after 8.3.1 release)... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, thank you! I just added some tiny remarks.
```bash | ||
# Test basic functionality | ||
$ docker run -ti \ | ||
-v /opt/src/grass:/grassdb \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my understanding /grassdb
is the (mounted) destination in the container where the grass data is stored, not the source code. I would distinguish between data and software and use a different target directory. In the end it is not really that important but it might prevent confusion.
Co-authored-by: Carmen Tawalika <mmacata@users.noreply.github.com>
Is the (download?) size a big issue with Docker images these days? I think the original idea was that it's okay to take extra space. It would be good to avoid the majority of people ending up building their own images because they need datum grids or C addons. |
docker/ubuntu/Dockerfile
Outdated
COPY . /src/grass_build/ | ||
|
||
WORKDIR /src/grass_build | ||
|
||
# Cleanup potentially leftover GISRC file with wrong path to "demolocation" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Line after (168, I can't comment directly on it), there is a RUN rm
line. This will create a new layer, to remove files in the build stage image, but the files should be cleaned in the same step as they are created to not waste space. This is correct if the files to remove were added from the COPY step. In that case, maybe consider using a .dockerignore file (https://docs.docker.com/engine/reference/builder/#dockerignore-file) to not have them transferred in the first place.
Since this rm is not in the final stage, it would not impact the image produced at the end.
docker/ubuntu/Dockerfile
Outdated
# Unset environmental variables to avoid later compilation issues | ||
ENV INTEL "" | ||
ENV MYCFLAGS "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the removed lines, there is no other compilation after, so these ENV are now useless, since these ENV don't affect the env when a new "FROM" is used.
docker/ubuntu/Dockerfile
Outdated
# Copy GRASS GIS from build image | ||
COPY --from=build /usr/local/bin/grass* /usr/local/bin/ | ||
COPY --from=build /usr/local/grass84 /usr/local/grass84/ | ||
COPY --from=build /src/site-packages /usr/lib/python3.10/ | ||
COPY --from=build /usr/lib/gdalplugins /usr/lib/gdalplugins |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider using COPY --link --from=build
in order to not invalidate the cache below if the files changed. https://docs.docker.com/engine/reference/builder/#copy---link When using COPY --link
, the files are copied as an independent scratch layer, and it's content can be changed without affecting the cache of others. It is useful in multi-stage builds when combining files built in independent stages at the end, since the files copied are really independent and no other RUN commands should be unconditionally run below.
Here, there is only the RUN ln
that we might need to check that the behaviour is still correct.
Note: this needs to have
# syntax=docker/dockerfile:1
Or
# syntax=docker/dockerfile:1.4
as the first line, and use the buildx builder. Though usually, multi-stage builds are used with buildx, since it can optimismes what stages to build and built in parallel. The ancien builder would build all stages, needed or not, and wouldn't be faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Included in 4d4da07
The symbolic link seems to work.
Personally, I would be OK with adding more dev-packages and datum grids. Basic build tools for addons are kept already and central dev-packages (gdal-dev, geos-dev) are available in the resulting image too. The question is where do we want to draw the line? Do we btw. have an overview over the dependencies for all official addons, so we could make an informed choice of which packages to include in the image(s)? Maybe we want to provide (in another PR) different sets of docker images like other projects do: "slim", "full", "with_addons", "python3.9", ... For now, changes to the Ubuntu Dockerfile in this PR are mostly oriented on the Dockerfile for Alpine. So it is more similar to the Alpine image... |
Co-authored-by: Edouard Choinière <27212526+echoix@users.noreply.github.com>
Not an overview, but we have some additional dependencies installed in the CI. Some info could be extracted from the Makefiles and imports in Python.
It makes sense for Alpine to create a slim image since that's the idea behind Alpine. That does not apply to Ubuntu. Ubuntu may suggest the opposite than Alpine. The invented meaning of the word ubuntu the comes to my mind. Urban Dictionary:
|
Here is a list of imports of non-standard python packages in the AddOns repo:
So the list is quite long and some packages are relatively large. Python packages on the other hand can be installed as normal user. So maybe those are actually not that important to include? C-libraries (that require root/admin access) are unfortunately not as straight forward to identify from the code. And even if people with C-knowledge can guess the name of the library from the include, the package of the distro can have different names / versions again.... Stuff I assume I identified are:
But I may have overlooked quite abit. And then there is R and R packages that are used by some addons... Should R be added as well? Packages can be installed by normal users... So, the questions remains what is a good level of "completeness" with regards to addon dependencies in Docker images (dependency list should be maintained at some point)... I have no objections against adding stuff, but help from people with overview esp. of the C- (C++?)-addons would be very much appreciated. |
C:
One feature of the addons repo is that the tools there can have an arbitrary dependency (R, CGAL, ... or Java, Hadoop, ...). With the current build and install system, it is hard to support dependencies in addons in a cross-platform way (this might be better with CMake). So leaving this for a custom image would be quite justifiable here. On the other hand, an Ubuntu image or Linux in general is the one place where a hardcoded path in Makefile may just work. Anyway, when I was suggesting not to make the image too slim, I was more thinking about being able to install addons (in C, C++) which require the same |
Addressed in: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Downloading proj vdatum could be a good candidate for a separate stage, see full comment :)
docker/ubuntu/Dockerfile
Outdated
RUN mkdir vdatum && \ | ||
cd vdatum && \ | ||
wget -q http://download.osgeo.org/proj/vdatum/usa_geoid2012.zip && unzip -j -u usa_geoid2012.zip -d /usr/share/proj; \ | ||
wget -q http://download.osgeo.org/proj/vdatum/usa_geoid2009.zip && unzip -j -u usa_geoid2009.zip -d /usr/share/proj; \ | ||
wget -q http://download.osgeo.org/proj/vdatum/usa_geoid2003.zip && unzip -j -u usa_geoid2003.zip -d /usr/share/proj; \ | ||
wget -q http://download.osgeo.org/proj/vdatum/usa_geoid1999.zip && unzip -j -u usa_geoid1999.zip -d /usr/share/proj; \ | ||
wget -q http://download.osgeo.org/proj/vdatum/vertcon/vertconc.gtx && mv vertconc.gtx /usr/share/proj; \ | ||
wget -q http://download.osgeo.org/proj/vdatum/vertcon/vertcone.gtx && mv vertcone.gtx /usr/share/proj; \ | ||
wget -q http://download.osgeo.org/proj/vdatum/vertcon/vertconw.gtx && mv vertconw.gtx /usr/share/proj; \ | ||
wget -q http://download.osgeo.org/proj/vdatum/egm96_15/egm96_15.gtx && mv egm96_15.gtx /usr/share/proj; \ | ||
wget -q http://download.osgeo.org/proj/vdatum/egm08_25/egm08_25.gtx && mv egm08_25.gtx /usr/share/proj; \ | ||
cd .. && \ | ||
rm -rf vdatum |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If these vdatum downloads aren't needed for the compilation of grass, these can be a really good candidate to be in a separate stage, and copied over with a COPY --link
. Using a fast base image, that is, either a small image or an image already downloaded, will allow to download in parallel to the build. Copying them over in a COPY --link
layer will make that layer not change when the rest of the build changes, and could be cached locally (and could even be the same if used in other Docker images that would result in the same layer hash).
I couldn't add the comment to these lines to the file, so I added them to the specific commit that reverted a change.
Thanks @echoix I am learning a lot here (still relatively new to docker). Did you mean something like: 1822d00 ? Additional follow-up on datum grids: datum grids fetched before this PR are meant for proj < 7, right? So I changed the source here as the docker image now uses proj > 9.0. Pleasea anyone, correct me if I am wrong. The mirroring of the entire https://cdn.proj.org/ should be also less prone to accidental ommissions... It is more data though, (703 MB in datum grids)... |
What about this alternative:
If set to ON, enable the capability to use remote grids stored on CDN (Content Delivery Network) storage, when grids are not available locally.
hence, using |
Please have a look at the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m good for the Docker part! When the deduplication was mentioned by neteller the other day, that was the advanced multi-stage « conditional » way I was thinking, but didn’t think it was appropriate for me to suggest it this early, but it was really well done. For the other files, I can’t really comment more than looks good to me.
I get a little experience and practice on advanced Docker and CI when collaborating/maintaining Megalinter, as the bulk of the repo is building lots of Docker images, and pushing the CI workflows.
We may adjust the way docker images are built and pushed in a follow up PR, and remove the ubuntu_wxgui directory in such a PR... |
Taking the liberty to merge. Since this is mainly packaging / shipping related, I would like to suggest to backport it after 8.3.1 is released, so "current" and "main" images become as comparable as possible!?! Please feel free to remove the backport label if anyone disagrees... |
This was merged with broken Additional checks check which is actually catching an issue relevant to this PR namely enforcing intentional code duplicity. |
- Copies over the Ubuntu Dockerfile edited in #3170 to the main Dockerfile in the root of the repo. - Adds a note at the top for future edits to the file (in both files). Additionally, this includes a temporary fix for pytest-pylint where new version finds more issues than the old one, so we need to use the old one before we fix the issues.
* test grass.script.setup * test script for docker images * black * add test instructions * multi-stage-build similar to alpine * Apply suggestions from code review Co-authored-by: Carmen Tawalika <mmacata@users.noreply.github.com> * add test for GDAL-plugin * copy GDAL-plugin and set path * Update docker/ubuntu/Dockerfile Co-authored-by: Edouard Choinière <27212526+echoix@users.noreply.github.com> * ignore dist.* in docker builds * address review comments * separate datum grid (proj >= 7) stage * use network grids * GUI build argument * GUI build argument --------- Co-authored-by: ninsbl <stbl@nve.no> Co-authored-by: Carmen Tawalika <mmacata@users.noreply.github.com> Co-authored-by: Edouard Choinière <27212526+echoix@users.noreply.github.com>
- Copies over the Ubuntu Dockerfile edited in OSGeo#3170 to the main Dockerfile in the root of the repo. - Adds a note at the top for future edits to the file (in both files). Additionally, this includes a temporary fix for pytest-pylint where new version finds more issues than the old one, so we need to use the old one before we fix the issues.
* test grass.script.setup * test script for docker images * black * add test instructions * multi-stage-build similar to alpine * Apply suggestions from code review Co-authored-by: Carmen Tawalika <mmacata@users.noreply.github.com> * add test for GDAL-plugin * copy GDAL-plugin and set path * Update docker/ubuntu/Dockerfile Co-authored-by: Edouard Choinière <27212526+echoix@users.noreply.github.com> * ignore dist.* in docker builds * address review comments * separate datum grid (proj >= 7) stage * use network grids * GUI build argument * GUI build argument --------- Co-authored-by: ninsbl <stbl@nve.no> Co-authored-by: Carmen Tawalika <mmacata@users.noreply.github.com> Co-authored-by: Edouard Choinière <27212526+echoix@users.noreply.github.com>
- Copies over the Ubuntu Dockerfile edited in OSGeo#3170 to the main Dockerfile in the root of the repo. - Adds a note at the top for future edits to the file (in both files). Additionally, this includes a temporary fix for pytest-pylint where new version finds more issues than the old one, so we need to use the old one before we fix the issues.
* test grass.script.setup * test script for docker images * black * add test instructions * multi-stage-build similar to alpine * Apply suggestions from code review Co-authored-by: Carmen Tawalika <mmacata@users.noreply.github.com> * add test for GDAL-plugin * copy GDAL-plugin and set path * Update docker/ubuntu/Dockerfile Co-authored-by: Edouard Choinière <27212526+echoix@users.noreply.github.com> * ignore dist.* in docker builds * address review comments * separate datum grid (proj >= 7) stage * use network grids * GUI build argument * GUI build argument --------- Co-authored-by: ninsbl <stbl@nve.no> Co-authored-by: Carmen Tawalika <mmacata@users.noreply.github.com> Co-authored-by: Edouard Choinière <27212526+echoix@users.noreply.github.com>
- Copies over the Ubuntu Dockerfile edited in OSGeo#3170 to the main Dockerfile in the root of the repo. - Adds a note at the top for future edits to the file (in both files). Additionally, this includes a temporary fix for pytest-pylint where new version finds more issues than the old one, so we need to use the old one before we fix the issues.
Do you plan to backport this PR, @ninsbl ? |
This PR implements multi-stage-builds for ubuntu, comparable to how they are set up for alpine. It also adds a test script for docker files, that should be usable across different distributions...
Image size is reduced to ~1.6 GB (from > 2GB), while some new features are added and datum grids are removed.
A major change is also that GIS packages now come from
ubuntugis-unstable
, so the image ships more recent versions of GDAL, GEOS. In addition, PDAL does not have to be compiled...Some dev-packages (e.g. gdal-dev, geos-dev) are kept for addon installation, but they still take quite some space so, I am happy to remove some more, depending on what you consider a good size/functionality trade-off.
Adresses: #776