Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Dockerfile Rework #56

Closed
wants to merge 10 commits into from
Closed

Conversation

seoester
Copy link
Contributor

@seoester seoester commented Feb 3, 2018

I re-wrote the Dockerfile to be more in line with common Dockerfiles.

The only significant difference is the user running promgen: I removed the promgen user, that means that the web app and the worker are now run as root. This simplifies the permissions required for the configuration files. Running applications in docker as root is a common strategy.

@codecov-io
Copy link

codecov-io commented Feb 3, 2018

Codecov Report

Merging #56 into master will increase coverage by 0.66%.
The diff coverage is 0%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #56      +/-   ##
==========================================
+ Coverage   47.14%   47.81%   +0.66%     
==========================================
  Files          76       72       -4     
  Lines        2507     2405     -102     
==========================================
- Hits         1182     1150      -32     
+ Misses       1325     1255      -70
Impacted Files Coverage Δ
promgen/settings.py 0% <0%> (ø)
migrations/0015_delete_stat.py
migrations/0030_exporter_enabled.py
plugins.py
migrations/0038_audit_user.py
migrations/0001_squashed_0044_common-rules.py
migrations/0006_auto_20161019_1214.py
migrations/0037_shard_proxy.py
migrations/0034_auto_20170622_0518.py
migrations/0003_setting.py
... and 139 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fef9b9c...1312d6f. Read the comment docs.

Dockerfile Outdated

ENV CONFIG_DIR=/etc/promgen

USER promgen
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to admit I'm quite ignorant of docker best practices, but I thought it was generally discourated to run as root. I know Prometheus 2.0 was changed to not run as root ( prometheus/prometheus#1637 ) though I think that also caused issues for some other things. Hmm.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, it is not as simple as I wrote in the description.

It is theoretically not possible for the docker root user to break out (and gain host root privileges). In production, containers are usually run as a non-root user to add an additional layer of security. And to prevent attackers from compromising the (ephemeral) container fs, of course.

Using nobody as the user to execute commands would be good for security and does not pose an issue for the Promgen Web App. The worker, however, needs write access to an external / shared volume.
Adding a new user to a docker image would require very specific configuration on the host system: Creating a new isolated user with the same uid and granting access only to the docker volume mount point.
By using root the "self-contained" aspect of the docker image is preserved. In more advanced or production deployments the user will (and should) most likely be changed to a custom user by the operator.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started reading up on the issue, as I'm not the most up-to-date either.

While most agree that it's best not to run as root, there is no consensus on what to do instead.
To me, it seems best to run under a random but static uid by default.

For Promgen the situation is slightly more complicated, as setting up write access to the external mount point is not optional but required for running the worker.

@kfdm
Copy link
Collaborator

kfdm commented Feb 6, 2018

I also need to test a bit more, but can probably move the 'apk del' to the end to make the container a bit smaller

diff --git a/Dockerfile b/Dockerfile
index 733a9a2..75e5547 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -22,11 +22,12 @@ RUN set -ex; \
 	apk add --no-cache curl tar; \
 	curl -L -s $PROMETHEUS_DOWNLOAD_URL \
 		| tar -xz -C /usr/local/bin --strip-components=1 prometheus-${PROMETHEUS_VERSION}.linux-amd64/promtool; \
-	apk del curl tar; \
-	rm -rf /var/cache/apk; \
 	mkdir -p /etc/prometheus; \
 	pip install -r /tmp/requirements.txt; \
 	pip install -e .; \
+	apk del curl tar build-base; \
+	rm -rf /var/cache/apk; \
 	SECRET_KEY=1 promgen collectstatic --noinput;
 
 EXPOSE 8000
REPOSITORY                                TAG                 IMAGE ID            CREATED              SIZE
promgen                                   test                3425ff233702        About a minute ago   442MB
<none>                                    <none>              54c546587f72        21 minutes ago       588MB

@seoester
Copy link
Contributor Author

seoester commented Feb 6, 2018

Nice savings :)
I'll add WIP to the title for now as to indicate that we're still actively amending this.

@seoester seoester changed the title Dockerfile Rework [WIP] Dockerfile Rework Feb 6, 2018
Dockerfile Outdated Show resolved Hide resolved
Docker images for python 3.5.3 are no longer built
 * Env vars only relevant during build now use ARG
 * Combined two ENV instructions
 * Removed PIP_NO_CACHE_DIR declaration (unclear why )
@seoester
Copy link
Contributor Author

I rebuilt the container from the current master branch, and the size has been greatly reduced 👍

promgen-test                                                        latest              0c63338ab2a0        45 hours ago        195MB

@seoester
Copy link
Contributor Author

The primary open question is: What user should promgen run as? (comments)

The most secure option is to create a new user in the container with a random but static uid and gid. However, this poses a problem because the worker has to write files to an (external) docker volume. By running as such a user, operators have to configure appropriate permissions for the prometheus configuration files.

Some options:

  • Run as user root. Operators should change the user in production environments.
  • Run as user promgen (as above).
    • Describe how to set permissions in the "getting started with docker-compose" documentation.
    • In docker-compose.yml, an additional service could be used to properly set up file permissions. This service could also bootstrap the configuration completely, requiring no manual setup before docker-compose up.
    • Change the user of the worker service in docker-compose.yml to root.

@kfdm
Copy link
Collaborator

kfdm commented Aug 1, 2018

Yeah, I do not have a good answer to that since I am quite unaware of the best practices. I think I will yield that decision to what you think is best :)

@seoester
Copy link
Contributor Author

seoester commented Aug 2, 2018

I had some further ideas for a more holistic approach to the docker image, I introduced the following changes:

  • Removed bash, docker-entrypoint.sh now uses /bin/sh
  • Perform migrations whenever starting a web or web-dev container.
    • migrate is idempotent on consecutive executions 👍
    • By setting PROMGEN_DISABLE_MIGRATE to '1' migrations are disabled (perhaps useful when running with multiple containers)
  • Removed the docker-compose-bootstrap command
  • By default: Use user promgen with id 16395.
  • Added a init-config command which runs bootstrap and automatically fixes permissions when executed as root user.
    • Added --no-input flag to bootstrap so it can be executed without asking the user for additional information
  • collectstatic and provide a mountable volume to serve from a separate container.
    • In Dockerfile, collecstatic to STATIC_ROOT=/usr/src/app/static
    • The files are copied over to the volume at /srv/promgen/www (/static) in the docker-entrypoint.sh file.
  • Enable overwriting STATIC_URL via env var

The aim is to have a simpler setup experience as well as to accomodate production setups.
Auto migrations are used to allow a docker image upgrade without requiring any manual intervention.

To do:

  • Prometheus: Write configuration file (docker-compose context)
  • Alertmanager: Write configuration file (docker-compose context)

Copy link
Collaborator

@kfdm kfdm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like some good changes though I think I need a few extra comments to help me wrap my head around the specific implementation details :)

Dockerfile Outdated
EXPOSE 8000
RUN set -ex; \
addgroup -g 16395 promgen; \
adduser -D -u 16395 -G promgen promgen; \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there something special about the value 16395 here ? Shouldn't be a problem, I'm just curious why you chose this value :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is nothing special about this value, it is an arbitrary uid >> 1000. This is the random but static uid I wrote about before.

Some projects choose to simply run adduser as the first command and hope for the same uid to be allocated every time. For promgen, the uid is of the upmost importance, because if permissions are misconfigured, the worker will be unable to write any configuration files. That's why I think explicitly setting the uid is a good idea.

Because the uid is much greater than 1000, it is unlikely to map to any user on the local system.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Should be fine, I just wish there was more authoritative sources about values like this 😓
I agree blindly trusting adduser is scary.
For my own deployment I use 9090:9090 for Prometheus (to match the default port) but I suppose in Promgen's case we can't easily do something similar.

#!/bin/bash
#!/bin/sh

function check_migrate() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving this to a function to reuse is a good idea 👍

}

function copy_static() {
if [ ! -f /srv/promgen/www/static/.present ]; then
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just reading the code, I'm not entirely sure the purpose for having both /srv/promgen/www and /usr/src/app/static. Is this so that one can mount the directory and then serve the files from nginx/apache or for some other reason?
I think a comment either in the file or the docker instructions will be necessary for clarification :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the reason is the docker volume management: Every time the container is created, a new volume is allocated and mounted at the path /srv/promgen/www. This means all content in the directory (e.g. created in the Dockerfile) will be invisible.
So instead, on the start of the web app, the files are copied across, if they don't exist yet. The .present file represents a minor security consideration: When not forbidden by the web server, the file will be accessible by anyone with HTTP access. I decided to only run collectstatic in the Dockerfile, just in case it involves some computationally complex work.

Other containers can then mount the volume (e.g. with the --volumes-from flag, by mounting a custom volume in both containers or by using the bind volumes).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One consideration instead of adding a new file (.present) perhaps can just test for a file that we expect to be there, /srv/promgen/www/static/js/promgen.js otherwise, seems like a reasonable explanation :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good 👍

shift
set -- gunicorn "promgen.wsgi:application" "$@"

exec promgen bootstrap --noinput "$@"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could be mistaken, but I think using exec here means that this shell will be replaced by the running program so the later steps will not run.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant to check this, you're absolutely right, exec just wraps the corresponding syscall.

@seoester
Copy link
Contributor Author

seoester commented Aug 6, 2018

I made the changes we discussed and also removed /etc/promgen as a volume. Operators can still choose to mount a volume there, but I think it doesn't make sense to have it declared in the Dockerfile.

Still have to properly test everything though.

@github-actions
Copy link

github-actions bot commented Nov 8, 2019

This pr is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days

@github-actions github-actions bot added the Stale label Nov 8, 2019
@seoester
Copy link
Contributor Author

Some of the ideas implemented in the PR are still valid but must be re-applied against the current version. I'll re-apply next week.

@kfdm kfdm removed the Stale label Nov 11, 2019
@codecov-commenter
Copy link

Codecov Report

Merging #56 into master will not change coverage.
The diff coverage is 0.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master      #56   +/-   ##
=======================================
  Coverage   47.81%   47.81%           
=======================================
  Files          72       72           
  Lines        2405     2405           
=======================================
  Hits         1150     1150           
  Misses       1255     1255           
Impacted Files Coverage Δ
promgen/settings.py 0.00% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 58d9445...1312d6f. Read the comment docs.

@kfdm kfdm mentioned this pull request Aug 10, 2022
@kfdm
Copy link
Collaborator

kfdm commented Aug 10, 2022

Probably quite out of date at this point 😅
Closing in favor of #402

@kfdm kfdm closed this Aug 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants