
Intro Dockerfile to ease help adoption. #3003

Closed
wants to merge 1 commit

Conversation

sanscore
Contributor

@ericholscher, I have an idea.

First, setting up a RTD instance is hard. When setting up a long-running service, a little bit of pain is fine. However, for potential contributors and evaluators, that initial barrier can drive them away. By creating and distributing a Dockerfile, we give these users a very simple way to run RTD locally. This way, they can run and play with RTD, which should hopefully help generate more interest.

Commit Message

Users of this Dockerfile can be either contributors or administrators.

Contributors will be able to make changes and see those changes by
rebuilding the image.

Administrators will be able to quickly evaluate RTD without having to
invest a lot of time.

Build:
$ docker build --tag rtfd-server .

Run:
$ docker run -p 8000:8000 rtfd-server

I'm not intending for this PR to be accepted as-is, but to help generate ideas. I can speak to the design decisions so far.

Notes about the design:

  • This Dockerfile tries to follow the Installation Instructions.
  • It uses python:2.7.13
    • It could use python:3.6.1, but that's currently untested.
    • I haven't attempted it, but using readthedocs/build:latest might be more interesting because it would allow the flexibility of choosing the Python version.
    • I attempted to use python:2.7.13-onbuild, but it will only work if I refactor ./requirements.txt.
  • Installs redis-server
    • I don't think that apt-get starts Redis; if so, a RUN command will have to be added.
  • Installs elasticsearch
    • Same concern as with redis-server.

I could rewrite the Dockerfile as a docker-compose.yml file. That would allow for splitting out the RTD, the database, redis, and elasticsearch to their own containers, and add an nginx instance as well.
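A minimal sketch of what that split could look like (service names, image tags, and versions here are illustrative assumptions, not part of this PR):

```yaml
# Hypothetical docker-compose.yml sketch -- one container per service,
# with RTD itself built from the Dockerfile in this repo.
version: '2'
services:
  web:
    build: .
    ports:
      - "8000:8000"
    depends_on:
      - redis
      - elasticsearch
  redis:
    image: redis:3
  elasticsearch:
    image: elasticsearch:1.7
```

With this layout the Dockerfile no longer needs to apt-get install redis-server or elasticsearch at all.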

At present, running the container has one obvious issue:

$ docker run -p 8000:8000 rtfd-server
System check identified some issues:

WARNINGS:
?: (guardian.W001) Guardian authentication backend is not hooked. You can add this in settings as eg: `AUTHENTICATION_BACKENDS = ('django.contrib.auth.backends.ModelBackend', 'guardian.backends.ObjectPermissionBackend')`.

I haven't investigated the problem, but the warning does give a good idea on how to resolve it.
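For reference, the fix the warning itself suggests is a one-line settings change (the exact backend list may need adjusting for RTD's settings layout):

```python
# Django settings fragment -- registers django-guardian's
# object-permission backend alongside the default ModelBackend,
# exactly as the system-check warning suggests.
AUTHENTICATION_BACKENDS = (
    'django.contrib.auth.backends.ModelBackend',
    'guardian.backends.ObjectPermissionBackend',
)
```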

Thoughts?

@sanscore
Contributor Author

Ignore the -onbuild comment. We wouldn't want to use it because I am using apt-get. With the current way, if there are any changes to requirements or the project, then RUN apt-get ... will hit the cache instead of re-running. That's a really, really good quality to have in a Dockerfile.

With the docker-compose.yml idea, then the apt-get lines would become unnecessary. 🤔

@ericholscher
Member

This is definitely an interesting idea. We maintain a set of docker images, which we use for building the docs: https://github.com/rtfd/readthedocs-docker-images -- I think the reason we've shied away from this in the past is that it required running docker in docker, which seems weird. We do support local building within the instance though, so we could have the docker image just build things that way -- it isn't exactly like production, but it at least gets things moving.

I think one big question I have is how to manage development with this setup. I think we need some way to map the underlying RTD checkout on disk to the one in the Docker image, so that people can actually do dev against their changes.

@sanscore
Contributor Author

sanscore commented Jul 17, 2017

I think one big question I have is how to manage development with this setup. I think we need some way to map the underlying RTD checkout on disk to the one in the Docker image, so that people can actually do dev against their changes.

The current Dockerfile uses COPY . /usr/src/app to copy whatever changes are present at build time. However, that would require a docker build ... every time a developer wants to test a change. So, that might not be very obvious or desirable.

The alternative to COPY would be to use a volume; docker run ... -v .:/usr/src/app .... However, that would create problems with build artifacts (like dev.db and the other files listed in .dockerignore).
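For the record, the volume variant would look roughly like this (untested sketch; note that older docker versions require an absolute host path for -v, hence $(pwd)):

```shell
# Bind-mount the current checkout into the container so edits on the
# host are visible immediately, without rebuilding the image.
docker run -p 8000:8000 -v "$(pwd)":/usr/src/app rtfd-server
```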

I'll get this Dockerfile into better shape, and explore the docker-compose idea. I'm fairly certain that a docker container can run docker images itself, but I'll have to explore that (I think it's just a matter of installing the docker binary).

Docker-compose will allow for externalizing Redis and Elasticsearch, plus the user won't have to pass -p 8000:8000 -v .:/usr/src/app.

@sanscore
Contributor Author

This is definitely an interesting idea. We maintain a set of docker images, which we use for building the docs: https://github.com/rtfd/readthedocs-docker-images

Assuming it's trivial to run docker inside docker, then there's a very good reason to use FROM readthedocs/build:latest over python:2.7.13/python:3.6.1. That is, the user will already have readthedocs/build:latest in their local cache, and avoid having to pull two images.

I'll make this change as well.

@ericholscher
Member

We can base this one on normal images -- especially because our builder images are huge due to the dependencies for building PDFs. That adds gigs of stuff to the container, so it would likely be best to leave those out for dev.

Another note, a long time ago someone worked on a docker compose setup as well: #1133 -- it is likely outdated, but might be a good starting point.

@sanscore
Contributor Author

Oh, wow! I did not expect readthedocs/build to be 8.5GB in size. At work, I had to spin my own docker image for Python 2.7.x (CentOS-based) which ended up being 2+GB, but that didn't have any PDF requirements (obviously).

readthedocs/build:latest

$ docker images --all readthedocs/build
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
readthedocs/build   latest              efe4a3eea182        4 months ago        8.452 GB

python:2.7.13

$ docker images --all python
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
python              2.7.13              320a06f42b5f        10 days ago         673.2 MB

Thanks for the input and for referring me to the previous effort.

@agjohnson
Contributor

I have some minor changes the Dockerfile should get, but first, some conceptual problems:

Relying on COPY for development doesn't seem like an optimal solution here. Ideally, the package path is mounted instead, so live development is possible. COPY requires rebuilding the image on each code change, no?

Docker doesn't need to run in docker, but the host's Docker socket file/port needs to be forwarded to the guest container.
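For anyone unfamiliar with that pattern, forwarding the host's socket looks roughly like this (sketch; the docker CLI still has to be installed inside the image for this to be useful):

```shell
# Mount the host's Docker socket into the container. The docker client
# inside then talks to the host's daemon directly -- no docker-in-docker.
docker run -v /var/run/docker.sock:/var/run/docker.sock rtfd-server
```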

I don't really want to support the non-docker build option much longer. Requiring an 8GB image download is awful, but we haven't been developing against the local builder for probably years now.

@agjohnson added the labels "Needed: design decision" (a core team decision is required), "Needed: more information" (a reply from issue author is required), and "PR: work in progress" (pull request is not ready for full review) on Jul 27, 2017
@sanscore
Contributor Author

I haven't been able to put much time into this in the past week, but I am still contemplating the issues that have been raised so far.

Relying on COPY for development doesn't seem like an optimal solution here. Ideally, the package path is mounted instead, so live development is possible. COPY requires rebuilding the image on each code change, no?

First, to answer the last part: Yes, that's correct.

As for volumes, I've purposely avoided using them because I didn't want to persist created files. The SQLite database, specifically, causes issues when running RTD natively and then inside a container. (I haven't tried it vice versa.)

But, I hadn't considered the idea of being able to make local changes and have them take effect immediately in the container. I'm not sure how that would work. 🤔

Is it just a matter of mounting the current directory as a volume, i.e. replacing COPY with VOLUME /usr/src/app, and, to run, adding docker run ... -v .:/usr/src/app ...?
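Side by side, the two approaches would look something like this (sketch; paths assume the current Dockerfile layout, and the two directives are alternatives, not meant to coexist):

```dockerfile
# Approach 1: bake the code in at build time (what the PR does now).
# Requires a `docker build` after every change.
COPY . /usr/src/app

# Approach 2: declare a mount point and bind-mount the checkout at
# run time, so edits on the host are visible live:
#   docker run ... -v "$(pwd)":/usr/src/app ...
VOLUME /usr/src/app
```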

Docker doesn't need to run in docker, but the host's Docker socket file/port needs to be forwarded to the guest container.

This is something that I still don't know much about, so I'll have to read up on it. But, TMK, to run "Docker inside a container", you just need to install the Docker client, and it magically(?) connects to and uses the host's Docker daemon. ... I could be entirely wrong about this, so feel free to correct me.

I don't really want to support the non-docker build option much longer.

That seems like a good idea to me.

Requiring an 8GB image download is awful, but we haven't been developing against the local builder for probably years now.

Any idea what is making up the bulk of the image size? I haven't worked with it too much, but peeking at the Dockerfile, I suspect that texlive is to blame. Whatever the case is, we could replace the pseudo/meta packages (i.e. texlive-full) with a pared-down list of the packages that are actually used. Which could be messy.

Either somehow collect data on which texlive packages are actually used and keep those, OR just remove it all and see who complains the loudest, then fix the issues on a case-by-case basis.

Quick research: I combined the apt-get install texlive lines, and apt-get gave me an estimate of ~4GB for texlive:

$ apt-get install texlive-fonts-extra texlive-latex-extra-doc texlive-publishers-doc texlive-pictures-doc texlive-lang-english texlive-lang-japanese texlive-full texlive-fonts-recommended latex-cjk-chinese-arphic-bkai00mp latex-cjk-chinese-arphic-gbsn00lp latex-cjk-chinese-arphic-gkai00mp
...
The following packages will be upgraded:
  libsystemd0 systemd
2 upgraded, 488 newly installed, 0 to remove and 21 not upgraded.
Need to get 2015 MB of archives.
After this operation, 4161 MB of additional disk space will be used.
Do you want to continue? [Y/n] n
Abort.

@sanscore
Contributor Author

sanscore commented Jul 31, 2017

Relying on COPY for development doesn't seem like an optimal solution here. Ideally, the package path is mounted instead, so live development is possible. COPY requires rebuilding the image on each code change, no?

First, to answer the last part: Yes, that's correct.

Footnote: This Dockerfile is written in such a way that it will take advantage of Docker's build cache as much as possible. The first build will take a while, but apt-get should never rerun unless the dev specifically changes the FROM line or adds a package to the apt-get line.

Likewise, the pip install line should not rerun unless the requirements are modified.

That said, any modifications to the code base will force the COPY to rerun and thus everything after it.
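Concretely, the layer ordering that makes this caching work looks like this (simplified sketch, not the literal Dockerfile; it assumes a single requirements.txt, which as noted above would need refactoring):

```dockerfile
# Layers are cached top-down; changing a line invalidates it and
# everything below it, but nothing above it.
FROM python:2.7.13
# Rerun only if this line itself (or the FROM line) changes:
RUN apt-get update && apt-get install -y redis-server elasticsearch
# Copy only the requirements first, so pip reruns only when they change:
COPY requirements.txt /usr/src/app/
RUN pip install -r /usr/src/app/requirements.txt
# Any code change invalidates from here down -- but nothing above:
COPY . /usr/src/app
```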

Moving to a docker-compose.yml solution would externalize elasticsearch and redis, so there's no need to install anything.

As far as switching from COPY to VOLUME, I tend to prefer COPY, so I don't know offhand how using VOLUME affects the (re-)build process.

@maxirus
Contributor

maxirus commented Jan 6, 2018

Wanted to cross-reference my comment on issue #3245 (comment) which addresses this issue.

@agjohnson
Contributor

We're doing some house cleaning, and so I'm going to close this PR up. I think there are several pieces that we would have discussed in a design phase:

  • If the docker container doesn't mount the local filesystem for our application code, it's not very helpful for local development
  • RTD is not interested in maintaining a docker image for hosting RTD, we wouldn't use docker like this in production either
  • I'm not thoroughly convinced docker needs to be a solution here. See contrib/supervisor.conf as an example -- it already brings up the services we need in a more reliable pattern. Perhaps docker is helpful for running dependencies like postgres/redis/elasticsearch. I do this on linux, but OSX has different mechanisms for installing/running all of these, so this still isn't super helpful.
