Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docker: implement multi-stage-builds for ubuntu #3170

Merged
merged 15 commits into from
Oct 13, 2023
Merged

Conversation

ninsbl
Copy link
Member

@ninsbl ninsbl commented Sep 21, 2023

This PR implements multi-stage-builds for ubuntu, comparable to how they are set up for alpine. It also adds a test script for docker files, that should be usable across different distributions...

Image size is reduced to ~1.6 GB (from > 2GB), while some new features are added and datum grids are removed.

A major change is also that GIS packages now come from ubuntugis-unstable, so the image ships more recent versions of GDAL, GEOS. In addition, PDAL does not have to be compiled...

Some dev-packages (e.g. gdal-dev, geos-dev) are kept for addon installation, but they still take quite some space so, I am happy to remove some more, depending on what you consider a good size/functionality trade-off.

Adresses: #776

@ninsbl ninsbl added the docker Docker related label Sep 21, 2023
@ninsbl ninsbl added this to the 8.4.0 milestone Sep 21, 2023
@ninsbl
Copy link
Member Author

ninsbl commented Sep 21, 2023

P.S.: Backporting may make sense (e.g. after 8.3.1 release)...

Copy link
Contributor

@mmacata mmacata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, thank you! I just added some tiny remarks.

docker/ubuntu/Dockerfile Outdated Show resolved Hide resolved
docker/ubuntu/Dockerfile Outdated Show resolved Hide resolved
docker/ubuntu/Dockerfile Outdated Show resolved Hide resolved
docker/ubuntu/Dockerfile Outdated Show resolved Hide resolved
```bash
# Test basic functionality
$ docker run -ti \
-v /opt/src/grass:/grassdb \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my understanding /grassdb is the (mounted) destination in the container where the grass data is stored, not the source code. I would distinguish between data and software and use a different target directory. In the end it is not really that important but it might prevent confusion.

Co-authored-by: Carmen Tawalika <mmacata@users.noreply.github.com>
@wenzeslaus
Copy link
Member

...Image size is reduced to ~1.6 GB (from > 2GB), while some new features are added and datum grids are removed.
...
Some dev-packages (e.g. gdal-dev, geos-dev) are kept for addon installation, but they still take quite some space...

Is the (download?) size a big issue with Docker images these days? I think the original idea was that it's okay to take extra space. It would be good to avoid the majority of people ending up building their own images because they need datum grids or C addons.

COPY . /src/grass_build/

WORKDIR /src/grass_build

# Cleanup potentially leftover GISRC file with wrong path to "demolocation"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Line after (168, I can't comment directly on it), there is a RUN rm line. This will create a new layer, to remove files in the build stage image, but the files should be cleaned in the same step as they are created to not waste space. This is correct if the files to remove were added from the COPY step. In that case, maybe consider using a .dockerignore file (https://docs.docker.com/engine/reference/builder/#dockerignore-file) to not have them transferred in the first place.

Since this rm is not in the final stage, it would not impact the image produced at the end.

Comment on lines 227 to 229
# Unset environmental variables to avoid later compilation issues
ENV INTEL ""
ENV MYCFLAGS ""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the removed lines, there is no other compilation after, so these ENV are now useless, since these ENV don't affect the env when a new "FROM" is used.

Comment on lines 255 to 259
# Copy GRASS GIS from build image
COPY --from=build /usr/local/bin/grass* /usr/local/bin/
COPY --from=build /usr/local/grass84 /usr/local/grass84/
COPY --from=build /src/site-packages /usr/lib/python3.10/
COPY --from=build /usr/lib/gdalplugins /usr/lib/gdalplugins
Copy link
Member

@echoix echoix Sep 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using COPY --link --from=build in order to not invalidate the cache below if the files changed. https://docs.docker.com/engine/reference/builder/#copy---link When using COPY --link, the files are copied as an independent scratch layer, and it's content can be changed without affecting the cache of others. It is useful in multi-stage builds when combining files built in independent stages at the end, since the files copied are really independent and no other RUN commands should be unconditionally run below.

Here, there is only the RUN ln that we might need to check that the behaviour is still correct.

Note: this needs to have

# syntax=docker/dockerfile:1

Or

# syntax=docker/dockerfile:1.4

as the first line, and use the buildx builder. Though usually, multi-stage builds are used with buildx, since it can optimismes what stages to build and built in parallel. The ancien builder would build all stages, needed or not, and wouldn't be faster.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Included in 4d4da07

The symbolic link seems to work.

docker/ubuntu/Dockerfile Outdated Show resolved Hide resolved
@ninsbl
Copy link
Member Author

ninsbl commented Sep 22, 2023

Is the (download?) size a big issue with Docker images these days? I think the original idea was that it's okay to take extra space. It would be good to avoid the majority of people ending up building their own images because they need datum grids or C addons.

Personally, I would be OK with adding more dev-packages and datum grids. Basic build tools for addons are kept already and central dev-packages (gdal-dev, geos-dev) are available in the resulting image too. The question is where do we want to draw the line?

Do we btw. have an overview over the dependencies for all official addons, so we could make an informed choice of which packages to include in the image(s)?

Maybe we want to provide (in another PR) different sets of docker images like other projects do: "slim", "full", "with_addons", "python3.9", ...

For now, changes to the Ubuntu Dockerfile in this PR are mostly oriented on the Dockerfile for Alpine. So it is more similar to the Alpine image...

ninsbl and others added 2 commits September 22, 2023 22:11
Co-authored-by: Edouard Choinière <27212526+echoix@users.noreply.github.com>
@wenzeslaus
Copy link
Member

Do we btw. have an overview over the dependencies for all official addons, so we could make an informed choice of which packages to include in the image(s)?

Not an overview, but we have some additional dependencies installed in the CI. Some info could be extracted from the Makefiles and imports in Python.

For now, changes to the Ubuntu Dockerfile in this PR are mostly oriented on the Dockerfile for Alpine. So it is more similar to the Alpine image...

It makes sense for Alpine to create a slim image since that's the idea behind Alpine. That does not apply to Ubuntu. Ubuntu may suggest the opposite than Alpine. The invented meaning of the word ubuntu the comes to my mind. Urban Dictionary:

Ubuntu is an ancient african word, meaning "I can't configure Debian"

@ninsbl
Copy link
Member Author

ninsbl commented Sep 25, 2023

Here is a list of imports of non-standard python packages in the AddOns repo:

cf_units
cvxopt
h5py
hdfs
ipdb
jinja2
keras
landsatxplore
lxml
matplotlib
mlpy
netCDF4
nnbathy
osgeo (GDAL)
owslib
pandas
PIL
ply
pycsw
pyearth
pyexcel_ods3
pygbif
pygments
pymodis
pyproj
pyspatialite
pysptools
requests
richdem
rpy2
scipy
seaborn
sentinelsat
skimage
sklearn
sqlalchemy
thredds_crawler
tqdm
urllib

So the list is quite long and some packages are relatively large. Python packages on the other hand can be installed as normal user. So maybe those are actually not that important to include?

C-libraries (that require root/admin access) are unfortunately not as straight forward to identify from the code. And even if people with C-knowledge can guess the name of the library from the include, the package of the distro can have different names / versions again.... Stuff I assume I identified are:

flex
fftw
OpenMP
CGAL

But I may have overlooked quite abit.

And then there is R and R packages that are used by some addons... Should R be added as well? Packages can be installed by normal users...

So, the questions remains what is a good level of "completeness" with regards to addon dependencies in Docker images (dependency list should be maintained at some point)...

I have no objections against adding stuff, but help from people with overview esp. of the C- (C++?)-addons would be very much appreciated.

@wenzeslaus
Copy link
Member

C:

  • OpenMP is an optional core dependency. You likely want this in the image.
  • flex, fftw: Technically optional, but crucial. Do others agree?
  • CGAL: Addons-only

One feature of the addons repo is that the tools there can have an arbitrary dependency (R, CGAL, ... or Java, Hadoop, ...). With the current build and install system, it is hard to support dependencies in addons in a cross-platform way (this might be better with CMake). So leaving this for a custom image would be quite justifiable here. On the other hand, an Ubuntu image or Linux in general is the one place where a hardcoded path in Makefile may just work.

Anyway, when I was suggesting not to make the image too slim, I was more thinking about being able to install addons (in C, C++) which require the same *-dev packages the core requires, not necessary about being able to install and run an arbitrary addon tool.

@ninsbl
Copy link
Member Author

ninsbl commented Sep 26, 2023

Anyway, when I was suggesting not to make the image too slim, I was more thinking about being able to install addons (in C, C++) which require the same *-dev packages the core requires, not necessary about being able to install and run an arbitrary addon tool.

Addressed in:
4d4da07
Now central dev-packages (OpenMP, fftw, and others) are included in the final image, together with datum grids.

Copy link
Member

@echoix echoix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Downloading proj vdatum could be a good candidate for a separate stage, see full comment :)

Comment on lines 98 to 110
RUN mkdir vdatum && \
cd vdatum && \
wget -q http://download.osgeo.org/proj/vdatum/usa_geoid2012.zip && unzip -j -u usa_geoid2012.zip -d /usr/share/proj; \
wget -q http://download.osgeo.org/proj/vdatum/usa_geoid2009.zip && unzip -j -u usa_geoid2009.zip -d /usr/share/proj; \
wget -q http://download.osgeo.org/proj/vdatum/usa_geoid2003.zip && unzip -j -u usa_geoid2003.zip -d /usr/share/proj; \
wget -q http://download.osgeo.org/proj/vdatum/usa_geoid1999.zip && unzip -j -u usa_geoid1999.zip -d /usr/share/proj; \
wget -q http://download.osgeo.org/proj/vdatum/vertcon/vertconc.gtx && mv vertconc.gtx /usr/share/proj; \
wget -q http://download.osgeo.org/proj/vdatum/vertcon/vertcone.gtx && mv vertcone.gtx /usr/share/proj; \
wget -q http://download.osgeo.org/proj/vdatum/vertcon/vertconw.gtx && mv vertconw.gtx /usr/share/proj; \
wget -q http://download.osgeo.org/proj/vdatum/egm96_15/egm96_15.gtx && mv egm96_15.gtx /usr/share/proj; \
wget -q http://download.osgeo.org/proj/vdatum/egm08_25/egm08_25.gtx && mv egm08_25.gtx /usr/share/proj; \
cd .. && \
rm -rf vdatum
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If these vdatum downloads aren't needed for the compilation of grass, these can be a really good candidate to be in a separate stage, and copied over with a COPY --link. Using a fast base image, that is, either a small image or an image already downloaded, will allow to download in parallel to the build. Copying them over in a COPY --link layer will make that layer not change when the rest of the build changes, and could be cached locally (and could even be the same if used in other Docker images that would result in the same layer hash).

I couldn't add the comment to these lines to the file, so I added them to the specific commit that reverted a change.

@ninsbl
Copy link
Member Author

ninsbl commented Sep 27, 2023

Thanks @echoix I am learning a lot here (still relatively new to docker).

Did you mean something like: 1822d00 ?

Additional follow-up on datum grids: datum grids fetched before this PR are meant for proj < 7, right? So I changed the source here as the docker image now uses proj > 9.0. Pleasea anyone, correct me if I am wrong. The mirroring of the entire https://cdn.proj.org/ should be also less prone to accidental ommissions... It is more data though, (703 MB in datum grids)...

@neteler
Copy link
Member

neteler commented Sep 27, 2023

The mirroring of the entire https://cdn.proj.org/ should be also less prone to accidental ommissions... It is more data though, (703 MB in datum grids)...

What about this alternative:

PROJ_NETWORK (New in version 7.0.0):

If set to ON, enable the capability to use remote grids stored on CDN (Content Delivery Network) storage, when grids are not available locally.

hence, using
export PROJ_NETWORK=ON
?

@ninsbl ninsbl requested a review from echoix September 29, 2023 06:31
@ninsbl
Copy link
Member Author

ninsbl commented Sep 29, 2023

A simplification and de-duplication is most welcome.

Please have a look at the build_arg GUI.

Copy link
Member

@echoix echoix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m good for the Docker part! When the deduplication was mentioned by neteller the other day, that was the advanced multi-stage « conditional » way I was thinking, but didn’t think it was appropriate for me to suggest it this early, but it was really well done. For the other files, I can’t really comment more than looks good to me.

I get a little experience and practice on advanced Docker and CI when collaborating/maintaining Megalinter, as the bulk of the repo is building lots of Docker images, and pushing the CI workflows.

@ninsbl
Copy link
Member Author

ninsbl commented Sep 29, 2023

We may adjust the way docker images are built and pushed in a follow up PR, and remove the ubuntu_wxgui directory in such a PR...

@ninsbl ninsbl merged commit 0bc5039 into OSGeo:main Oct 13, 2023
17 of 18 checks passed
@ninsbl
Copy link
Member Author

ninsbl commented Oct 13, 2023

Taking the liberty to merge. Since this is mainly packaging / shipping related, I would like to suggest to backport it after 8.3.1 is released, so "current" and "main" images become as comparable as possible!?!

Please feel free to remove the backport label if anyone disagrees...

@ninsbl ninsbl modified the milestones: 8.4.0, 8.3.2 Oct 13, 2023
@wenzeslaus
Copy link
Member

This was merged with broken Additional checks check which is actually catching an issue relevant to this PR namely enforcing intentional code duplicity.

wenzeslaus pushed a commit that referenced this pull request Oct 19, 2023
- Copies over the Ubuntu Dockerfile edited in #3170 to the main Dockerfile in the root of the repo.
- Adds a note at the top for future edits to the file (in both files).

Additionally, this includes a temporary fix for pytest-pylint where new version finds more issues than the old one, so we need to use the old one before we fix the issues.
landam pushed a commit to landam/grass that referenced this pull request Oct 25, 2023
* test grass.script.setup

* test script for docker images

* black

* add test instructions

* multi-stage-build similar to alpine

* Apply suggestions from code review

Co-authored-by: Carmen Tawalika <mmacata@users.noreply.github.com>

* add test for GDAL-plugin

* copy GDAL-plugin and set path

* Update docker/ubuntu/Dockerfile

Co-authored-by: Edouard Choinière <27212526+echoix@users.noreply.github.com>

* ignore dist.* in docker builds

* address review comments

* separate datum grid (proj >= 7) stage

* use network grids

* GUI build argument

* GUI build argument

---------

Co-authored-by: ninsbl <stbl@nve.no>
Co-authored-by: Carmen Tawalika <mmacata@users.noreply.github.com>
Co-authored-by: Edouard Choinière <27212526+echoix@users.noreply.github.com>
landam pushed a commit to landam/grass that referenced this pull request Oct 25, 2023
- Copies over the Ubuntu Dockerfile edited in OSGeo#3170 to the main Dockerfile in the root of the repo.
- Adds a note at the top for future edits to the file (in both files).

Additionally, this includes a temporary fix for pytest-pylint where new version finds more issues than the old one, so we need to use the old one before we fix the issues.
neteler pushed a commit to nilason/grass that referenced this pull request Nov 7, 2023
* test grass.script.setup

* test script for docker images

* black

* add test instructions

* multi-stage-build similar to alpine

* Apply suggestions from code review

Co-authored-by: Carmen Tawalika <mmacata@users.noreply.github.com>

* add test for GDAL-plugin

* copy GDAL-plugin and set path

* Update docker/ubuntu/Dockerfile

Co-authored-by: Edouard Choinière <27212526+echoix@users.noreply.github.com>

* ignore dist.* in docker builds

* address review comments

* separate datum grid (proj >= 7) stage

* use network grids

* GUI build argument

* GUI build argument

---------

Co-authored-by: ninsbl <stbl@nve.no>
Co-authored-by: Carmen Tawalika <mmacata@users.noreply.github.com>
Co-authored-by: Edouard Choinière <27212526+echoix@users.noreply.github.com>
neteler pushed a commit to nilason/grass that referenced this pull request Nov 7, 2023
- Copies over the Ubuntu Dockerfile edited in OSGeo#3170 to the main Dockerfile in the root of the repo.
- Adds a note at the top for future edits to the file (in both files).

Additionally, this includes a temporary fix for pytest-pylint where new version finds more issues than the old one, so we need to use the old one before we fix the issues.
HuidaeCho pushed a commit to HuidaeCho/grass that referenced this pull request Jan 9, 2024
* test grass.script.setup

* test script for docker images

* black

* add test instructions

* multi-stage-build similar to alpine

* Apply suggestions from code review

Co-authored-by: Carmen Tawalika <mmacata@users.noreply.github.com>

* add test for GDAL-plugin

* copy GDAL-plugin and set path

* Update docker/ubuntu/Dockerfile

Co-authored-by: Edouard Choinière <27212526+echoix@users.noreply.github.com>

* ignore dist.* in docker builds

* address review comments

* separate datum grid (proj >= 7) stage

* use network grids

* GUI build argument

* GUI build argument

---------

Co-authored-by: ninsbl <stbl@nve.no>
Co-authored-by: Carmen Tawalika <mmacata@users.noreply.github.com>
Co-authored-by: Edouard Choinière <27212526+echoix@users.noreply.github.com>
HuidaeCho pushed a commit to HuidaeCho/grass that referenced this pull request Jan 9, 2024
- Copies over the Ubuntu Dockerfile edited in OSGeo#3170 to the main Dockerfile in the root of the repo.
- Adds a note at the top for future edits to the file (in both files).

Additionally, this includes a temporary fix for pytest-pylint where new version finds more issues than the old one, so we need to use the old one before we fix the issues.
@neteler
Copy link
Member

neteler commented Jan 10, 2024

P.S.: Backporting may make sense (e.g. after 8.3.1 release)...

Do you plan to backport this PR, @ninsbl ?

@landam
Copy link
Member

landam commented Feb 9, 2024

P.S.: Backporting may make sense (e.g. after 8.3.1 release)...

Do you plan to backport this PR, @ninsbl ?

There are unfortunately conflicts. @ninsbl Is the backport to 8.3 still relevant (8.4.0 is quite close)?

@landam landam modified the milestones: 8.3.2, 8.4.0 Feb 9, 2024
@landam landam assigned landam and ninsbl and unassigned landam Feb 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docker Docker related
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants