Docker dev environments
The /docker directory contains a docker-compose configuration for setting up instances of the METASPACE platform for personal and development use.
This configuration has not been secured for production use and should not be deployed to any server with the intent of providing public access without first considering the security implications. In particular, administrative ports for back-end services have not been closed in this configuration, and easily guessable passwords are used and stored in plain text.
By default, Docker binds published container ports to all network interfaces, which is a serious security issue here because Postgres and other services use hard-coded passwords in this docker config. It's recommended to set up a firewall before creating your dev environment to avoid exposing these ports.
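Note that published container ports can bypass host-level firewalls such as ufw, because Docker manages its own iptables rules. The Docker documentation recommends putting restrictions in the DOCKER-USER chain instead; a minimal sketch (the interface name `eth0` is an assumption, check yours with `ip link`):

```shell
# Drop all traffic arriving on the external interface that is destined for
# published container ports; connections from the host itself are unaffected.
# Replace eth0 with your machine's external interface.
sudo iptables -I DOCKER-USER -i eth0 -j DROP
```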
On Ubuntu:

```
sudo apt-get install docker-compose git
sudo snap install docker
```
After installation, log out and log back in to start the Docker daemon. If this step isn't done, docker will hang when trying to start containers.
If running macOS, Docker Desktop's configuration makes a massive difference to performance:
- If everything is extremely slow, try disabling "Use gRPC FUSE for file sharing"
- Resources -> Advanced -> CPUs - set to maximum. Docker can share CPUs with macOS so it's not a problem
- Resources -> Advanced -> Memory - set to at least 4GB for web dev, 16GB if you plan to do a lot of dataset annotation. Docker CANNOT share this memory with macOS, so don't slide it to maximum or it will slow down the rest of your computer
The full metaspace repository is mounted into most containers, and projects are run from your checked-out code. This makes it much easier to make live code changes.
Running `setup-dev-env.sh` will copy the pre-made docker config files into the projects in this repository and start the docker containers. After running this, edit `docker/.env` and customize it. It's a good idea to move `DATA_ROOT` somewhere outside of the git repository directory.
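As an illustration, the relevant line in `docker/.env` might end up looking like this (the path is an example, not a default):

```
# docker/.env (excerpt); DATA_ROOT moved outside the git checkout
DATA_ROOT=/home/me/metaspace-data
```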
To avoid disruption due to changes in the docker-compose files while updating from git, changing branches, etc., it's recommended to make a copy of the docker-compose.yml file that is excluded from git:

- Copy `docker-compose.yml` to `docker-compose.custom.yml` (this file is already .gitignored)
- Update `.env` with `COMPOSE_FILE=docker-compose.custom.yml`
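Those two steps can be sketched as shell commands; the temp-dir setup below just stands in for the real `docker/` directory so the snippet is self-contained:

```shell
# Demonstration in a throwaway directory; in practice run the cp/echo
# from the docker/ directory of your metaspace checkout.
cd "$(mktemp -d)"
touch docker-compose.yml .env        # stand-ins for the real files

cp docker-compose.yml docker-compose.custom.yml
echo 'COMPOSE_FILE=docker-compose.custom.yml' >> .env

grep COMPOSE_FILE .env               # confirm docker-compose will use the copy
```

With `COMPOSE_FILE` set, every `docker-compose` invocation in that directory reads `docker-compose.custom.yml`, so upstream changes to `docker-compose.yml` no longer affect your running setup.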
Webapp and graphql are set to auto-reload when code changes, but they'll need to be restarted if dependencies change. The api, update-daemon and annotate-daemon containers must be manually restarted for code changes to take effect.
Add these to your `~/.bashrc` or `~/.bash_profile` (assuming you're using bash):
```bash
export DOCKER_DEV_ROOT=~/dev/metaspace/docker # Change this to your checkout directory
dc() {
    local status
    pushd "$DOCKER_DEV_ROOT" > /dev/null
    docker-compose "$@"
    status=$?  # capture outside a subshell so the exit code survives
    popd > /dev/null
    return $status
}
dcr() {
    dc kill "$@" ; dc up -d --no-deps --no-recreate "$@"
}
alias dclogs="dc logs -f --tail 0 api webapp graphql annotate-daemon update-daemon lithops-daemon off-sample"
alias dcstart="dc up -d ; dclogs"
alias dcstop="dc kill"
```
Then force your shell to reload this file with:
```bash
exec bash
```
- This adds `dc` as a shortcut to run Docker Compose in the METASPACE docker directory from anywhere
- To start the environment: `dcstart` (starts containers and then shows their logs; note that some initialization output might be missed due to the delay between starting and attaching to the logs)
- To stop the environment: `dcstop` or `dc kill` (note: don't use `dc down`, as it deletes the containers, making them take much longer to start later)
- To restart one or more containers: `dcr graphql webapp` (`dcr`'s kill/re-up logic is much faster than `dc restart`)
- To follow the logs of all METASPACE containers: `dclogs`
Running `setup-dev-env.sh` should set up individual projects' config files to a working state, though some adjustments may be required if container names or credentials have changed. The most common causes of runtime errors in new development environments are mismatched credentials and incorrect service names.
This mapping is necessary to allow your browser to directly access the MinIO storage container at http://storage:9000. Additionally, it allows you to run some code both inside and outside Docker without needing to change config files. Add the following to your `/etc/hosts`:
```
0.0.0.0 nginx
0.0.0.0 redis
0.0.0.0 elasticsearch
0.0.0.0 postgres
0.0.0.0 rabbitmq
0.0.0.0 api
0.0.0.0 graphql
0.0.0.0 off-sample
0.0.0.0 storage
```
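If you'd rather generate these lines than type them, a short loop prints them all; append the output to `/etc/hosts` yourself (e.g. with `sudo tee -a /etc/hosts`):

```shell
# Print one hosts entry per METASPACE service name.
for host in nginx redis elasticsearch postgres rabbitmq api graphql off-sample storage; do
    printf '0.0.0.0 %s\n' "$host"
done
```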
Note: The IP address `0.0.0.0` seems to have the best success rate for addressing Docker, but in some situations `127.0.0.1` may work better.
(Skip this section if `setup-dev-env.sh` ran successfully)
- Install molecular databases: `docker-compose run --rm api /sm-engine/install-dbs.sh`
- (Optional) Install molecular images for display on the annotations page: `./fetch-mol-images.sh`
  - It's usually best to skip this step - it's very slow and creates hundreds of thousands of files inside the project directory, which can slow down VSCode/PyCharm.
  - These images are only used in the Molecules section of the Annotations page. If they're not installed, it will just show harmless "No image" placeholders.
- Install the ML scoring model: `docker-compose run --rm api /sm-engine/install-scoring-model.sh`
- http://localhost:8999/ - Main site
Development tools:

- localhost:9200 - Elasticsearch REST endpoint. Can be accessed with GUIs such as Elasticvue and dejavu
- localhost:5432 - Postgres server. Can be used with e.g. DataGrip. Username: `postgres`, Password: `postgres`, Database: `sm`
- http://localhost:15672/ - RabbitMQ management interface
- http://localhost:9000/ - MinIO storage server. Access Key: `minioadmin`, Secret Key: `minioadmin`. Files uploaded to MinIO (datasets, iso images, etc.) are stored on your filesystem in `$DATA_ROOT/s3` (based on the path in `.env`) and it's safe to directly access/delete them
Watching application logs:

```
docker-compose logs --tail 5 -f api update-daemon annotate-daemon lithops-daemon graphql webapp
```

Rebuilding the Elasticsearch index:

```
docker-compose run --rm api /sm-engine/rebuild-es-index.sh
```
- Register through the METASPACE web UI
- Use the email verification link to verify your account (it can be found in the graphql logs, or in your inbox if AWS credentials are set up)
- Update your role to `admin` in the `graphql.user` table in the database. If you don't have a DB UI set up yet, you can do this instead:

  ```
  docker-compose exec postgres psql sm postgres -c "UPDATE graphql.user SET role = 'admin' WHERE email = '<your email address>';"
  ```
When Docker runs containers in a VM, only directory-based volumes can be mounted. This is why the nginx/elasticsearch services have custom dockerfiles that create symlinks to files in a mounted config directory, rather than mounting the files directly.
There's no cross-platform way to do the /etc/timezone mounts, but they're optional and can just be commented out on non-Linux systems.
The graphql container is based on "alpine" images, which use musl instead of glibc to minimize the image size. This causes incompatibility with binary dependencies (mainly bcrypt). There's currently no easy way to have bcrypt work both inside and outside the docker container with the same installation, so fixing one will break the other. In normal development it's easy to avoid calling bcrypt at all (thus preventing crashes) by using Google-based authentication in METASPACE.
To fix bcrypt for your host environment (and break it for the docker environment), run `npm rebuild bcrypt --update-binary` in the `/metaspace/graphql` directory.

To fix bcrypt for the docker environment (and break it for your host environment), run `dc exec graphql npm rebuild bcrypt --update-binary` (this needs the `dc` shell function from above installed).
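A quick way to check which environment bcrypt currently works in is a node one-liner that loads the native binding (run from `/metaspace/graphql`; the container check assumes the `dc` shell function from earlier is installed):

```
# On the host: prints "host bcrypt OK" only if the native binding loads.
node -e "require('bcrypt')" && echo "host bcrypt OK"

# Inside the graphql container:
dc exec graphql node -e "require('bcrypt')" && echo "container bcrypt OK"
```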