Docker dev environments

Description

The /docker directory contains a docker-compose configuration for setting up instances of the METASPACE platform for personal and development use.

This configuration has not been secured for production use and should not be deployed to any server with the intent of providing public access without first considering the security implications. In particular, administrative ports for back-end services have not been closed in this configuration, and easily guessable passwords are used and stored in plain text.

Usage

Security

By default, Docker publishes mapped container ports on all network interfaces, which is a serious security issue because Postgres and other services use hard-coded passwords in this Docker config. Set up a firewall before creating your dev environment to avoid exposing these ports.
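
In addition to a firewall, one way to reduce exposure is to publish ports on the loopback interface only, since Docker Compose accepts port mappings of the form IP:HOST_PORT:CONTAINER_PORT. The helper below is a minimal sketch under that assumption. The name localhost_port is hypothetical and not part of this repository, and it only rewrites a single mapping string rather than editing any YAML file for you.

```shell
# Hypothetical helper (not part of this repository): rewrite a
# "HOST_PORT:CONTAINER_PORT" mapping so it is published on loopback only.
localhost_port() {
    case "$1" in
        *.*.*.*:*:*) printf '%s\n' "$1" ;;            # already has an IPv4 bind address: unchanged
        *:*)         printf '127.0.0.1:%s\n' "$1" ;;  # HOST:CONTAINER -> 127.0.0.1:HOST:CONTAINER
        *)           printf '%s\n' "$1" ;;            # container port only: unchanged
    esac
}
```

For example, localhost_port 5432:5432 prints 127.0.0.1:5432:5432, which could replace a matching entry under ports: in your own copy of the compose file. This does not replace a firewall, and the passwords remain guessable to anything running on the same machine.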

Install Docker

On Ubuntu:

sudo apt-get install docker-compose git
sudo snap install docker

After installation, log out and log back in to start the Docker daemon. If you skip this step, Docker will hang when trying to start containers.

If running macOS, Docker's configuration makes a massive difference:

If everything is extremely slow, try disabling "Use gRPC FUSE for file sharing".
Resources -> Advanced -> CPUs - set to maximum. Docker can share CPUs with macOS, so this is not a problem.
Resources -> Advanced -> Memory - set to at least 4GB for web dev, or 16GB if you plan to do a lot of dataset annotation. Docker CANNOT share this memory with macOS, so don't slide it to maximum or it will slow down the rest of your computer.

Development installation

The full metaspace repository is mounted into most containers and projects are run from your checked-out code. This makes it much easier to make live code changes.

Running setup-dev-env.sh will copy the pre-made docker config files into the projects in this repository and start the docker containers. After running this, edit docker/.env and customize it. It's a good idea to move DATA_ROOT somewhere outside of the git repository directory.

To avoid disruption due to changes in the docker-compose files while updating from git, changing branches, etc. it's recommended to make a copy of the docker-compose.yml file that is excluded from git:

Copy docker-compose.yml to docker-compose.custom.yml (this file is already .gitignored)
Update .env with COMPOSE_FILE=docker-compose.custom.yml
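
The two steps above can be sketched as a small shell function. This is only an illustration: the name use_custom_compose is hypothetical, and it assumes it is pointed at the docker directory that contains docker-compose.yml and .env.

```shell
# Hypothetical helper (not part of this repository).
# $1: path to the docker directory (defaults to the current directory)
use_custom_compose() {
    local dir="${1:-.}"
    # Keep an existing custom file so local edits are not overwritten
    [ -f "$dir/docker-compose.custom.yml" ] ||
        cp "$dir/docker-compose.yml" "$dir/docker-compose.custom.yml" || return 1
    touch "$dir/.env"
    # Drop any previous COMPOSE_FILE line, then append the new one
    grep -v '^COMPOSE_FILE=' "$dir/.env" > "$dir/.env.tmp" || true
    echo 'COMPOSE_FILE=docker-compose.custom.yml' >> "$dir/.env.tmp"
    mv "$dir/.env.tmp" "$dir/.env"
}
```

Running use_custom_compose ~/dev/metaspace/docker (adjust the path to your checkout) is safe to repeat, because the copy is skipped when the custom file already exists and .env ends up with a single COMPOSE_FILE line.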

Webapp and graphql are set to auto-reload if code changes, but they'll need to be restarted if dependencies change. Api, update-daemon and annotate-daemon will need to be manually restarted for code changes to take effect.

Recommended bash aliases

Add these to your ~/.bashrc or ~/.bash_profile (assuming you're using bash):

export DOCKER_DEV_ROOT=~/dev/metaspace/docker # Change this to your checkout directory
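dc() {
    local status
    # Run docker-compose from the METASPACE docker directory, then restore the caller's directory
    pushd "$DOCKER_DEV_ROOT" > /dev/null || return 1
    docker-compose "$@"
    status=$?   # captured in the current shell so the real exit code is returned
    popd > /dev/null
    return $status
}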
dcr() {
    dc kill "$@" ; dc up -d --no-deps --no-recreate "$@"
}
alias dclogs="dc logs -f --tail 0 api webapp graphql annotate-daemon update-daemon lithops-daemon off-sample"
alias dcstart="dc up -d ; dclogs"
alias dcstop="dc kill"

Then force your shell to reload this file with:

exec bash

This adds dc as a shortcut to run Docker Compose in the METASPACE docker directory from anywhere.
To start the environment: dcstart (starts containers and then shows their logs. Note that some initialization output might be missed due to the delay between starting and logging)
To stop the environment: dcstop, which is equivalent to dc kill (note: don't use dc down as it deletes the containers, making them take much longer to start later)
To restart one or more containers: dcr graphql webapp (dcr's kill/re-up logic is much faster than dc restart)
To follow the logs of all METASPACE containers: dclogs

Configuration

Running setup-dev-env.sh should set up individual projects' config files to a working state, though some adjustments may be required if container names or credentials have changed. The most common causes of runtime errors in new development environments are mismatched credentials and incorrect service names.

Add docker container names to your hosts file

This mapping is necessary to allow your browser to directly access the MinIO storage container at http://storage:9000. It also allows you to run some code both inside and outside Docker without needing to change config files.

Add the following to your /etc/hosts:

0.0.0.0         nginx
0.0.0.0         redis
0.0.0.0         elasticsearch
0.0.0.0         postgres
0.0.0.0         rabbitmq
0.0.0.0         api
0.0.0.0         graphql
0.0.0.0         off-sample
0.0.0.0         storage

Note: The IP address 0.0.0.0 seems to have the best success rate for addressing Docker, but in some situations 127.0.0.1 may work better.
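
To avoid duplicate entries when this step is repeated, the missing lines can be generated first and appended afterwards. The following is a minimal sketch: missing_host_entries and METASPACE_HOSTS are hypothetical names, not part of this repository, and the function only prints lines without modifying any file.

```shell
# Service names taken from the hosts list above
METASPACE_HOSTS="nginx redis elasticsearch postgres rabbitmq api graphql off-sample storage"

# Hypothetical helper: print "IP NAME" lines for service names that have no
# entry yet in the given hosts file.
# $1: hosts file (default /etc/hosts), $2: IP address (default 0.0.0.0)
missing_host_entries() {
    local hosts_file="${1:-/etc/hosts}"
    local ip="${2:-0.0.0.0}"
    local name
    for name in $METASPACE_HOSTS; do
        # Compare against whole hostname fields, skipping commented-out lines
        if ! awk -v n="$name" '
            $1 ~ /^#/ { next }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }
        ' "$hosts_file"; then
            printf '%s %s\n' "$ip" "$name"
        fi
    done
}
```

Review the output of missing_host_entries, then append it with missing_host_entries | sudo tee -a /etc/hosts. Pass 127.0.0.1 as the second argument if that address works better on your system.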

Import data

(Skip this section if setup-dev-env.sh ran successfully)

Install Molecular Databases: docker-compose run --rm api /sm-engine/install-dbs.sh
(Optional) Install molecular images for display on the annotations page: ./fetch-mol-images.sh
- It's usually best to skip this step. It's very slow and creates 100,000s of files inside the project directory, which can slow down VSCode/PyCharm.
- These images are only used in the Molecules section of the Annotations page. If they're not installed, it will just show harmless "No image" placeholders.
Install the ML Scoring Model: docker-compose run --rm api /sm-engine/install-scoring-model.sh
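
If you need to run the required steps by hand, a small wrapper can run them in order and stop at the first failure. This is a sketch only: import_metaspace_data and the COMPOSE override are hypothetical, and the optional molecular images step is deliberately left out.

```shell
# Hypothetical wrapper (not part of this repository): run the two required
# import steps in order, stopping at the first failure.
# COMPOSE can be overridden, e.g. COMPOSE="echo docker-compose" for a dry run.
import_metaspace_data() {
    local compose="${COMPOSE:-docker-compose}"
    $compose run --rm api /sm-engine/install-dbs.sh &&
        $compose run --rm api /sm-engine/install-scoring-model.sh
}
```

Run it from the docker directory. With COMPOSE="echo docker-compose" import_metaspace_data, the commands are printed instead of executed, which is a quick way to check what would run.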

Accessing METASPACE

http://localhost:8999/ - Main site

Development tools:

localhost:9200 - Elasticsearch REST endpoint. Can be accessed with GUIs such as Elasticvue and dejavu
localhost:5432 - Postgres server. Can be used with e.g. DataGrip. Username: postgres, Password: postgres, Database: sm
http://localhost:15672/ - RabbitMQ management interface
http://localhost:9000/ - MinIO storage server. Access Key: minioadmin, Secret Key: minioadmin. Files uploaded to MinIO (datasets, iso images, etc.) are stored on your filesystem in $DATA_ROOT/s3 (based on the path in .env), and it's safe to directly access/delete them.
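
A quick way to see which of the HTTP endpoints above are reachable is to probe them with curl. The sketch below assumes curl is installed. The function names are hypothetical, and Postgres is not included because it does not speak HTTP.

```shell
# HTTP endpoints taken from the list above, one "name url" pair per line
metaspace_http_endpoints() {
    cat <<'EOF'
webapp http://localhost:8999/
elasticsearch http://localhost:9200/
rabbitmq http://localhost:15672/
minio http://localhost:9000/
EOF
}

# Hypothetical helper: report which endpoints answer an HTTP request.
# Any HTTP response counts as "up", including error pages.
check_metaspace_endpoints() {
    metaspace_http_endpoints | while read -r name url; do
        if curl -s -o /dev/null --max-time 2 "$url"; then
            echo "$name up ($url)"
        else
            echo "$name down ($url)"
        fi
    done
}
```

Call check_metaspace_endpoints after starting the containers. A service reported as down may simply still be initializing, so check its logs before assuming a configuration problem.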

Watching application logs:

docker-compose logs --tail 5 -f api update-daemon annotate-daemon lithops-daemon graphql webapp

Rebuilding the Elasticsearch index:

docker-compose run --rm api /sm-engine/rebuild-es-index.sh

Creating an admin user

Register through the METASPACE web UI
Use the email verification link to verify your account (can be found in the graphql logs, or your inbox if the AWS credentials are set up)
Update your user type to admin in the graphql.user table in the database. If you don't have a DB UI set up yet, you can do this instead: docker-compose exec postgres psql sm postgres -c "UPDATE graphql.user SET role = 'admin' WHERE email = '<your email address>';"
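
The database step can be wrapped in a function that takes the email address as an argument. This is a minimal sketch: admin_sql and make_admin are hypothetical names, and the quoting assumes PostgreSQL's default string handling, where a single quote inside a literal is escaped by doubling it.

```shell
# Hypothetical helper: build the UPDATE statement for a given email address.
# Single quotes in the address are doubled so they stay inside the SQL literal.
admin_sql() {
    local email
    email=$(printf '%s' "$1" | sed "s/'/''/g")
    printf "UPDATE graphql.user SET role = 'admin' WHERE email = '%s';\n" "$email"
}

# Hypothetical wrapper: run the statement in the postgres container.
# Must be run from the docker directory, like the command above.
make_admin() {
    docker-compose exec postgres psql sm postgres -c "$(admin_sql "$1")"
}
```

Use it as make_admin you@example.com after registering and verifying the account. Running admin_sql on its own prints the statement, so you can review it first.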

Non-Linux host support

When Docker runs containers in a VM, only directory-based volumes can be mounted. This is why the nginx and elasticsearch services have custom Dockerfiles that create symlinks to files in a mounted config directory, rather than mounting the files directly.

There's no cross-platform way to do the /etc/timezone mounts, but they're optional and can just be commented out on non-Linux systems.

Bcrypt binary incompatibilities / fixing random segfaults in graphql

The graphql container is based on "alpine" images, which use musl instead of glibc to minimize the image size. This causes incompatibility with binary dependencies (mainly bcrypt). There's currently no easy way to have bcrypt work both inside and outside the docker container with the same installation, so fixing one will break the other. In normal development it's easy to avoid calling bcrypt at all (thus preventing crashes) by using Google-based authentication in METASPACE.

To fix bcrypt for your host environment (and break it for the docker environment), run npm rebuild bcrypt --update-binary in the /metaspace/graphql directory

To fix bcrypt for the docker environment (and break it for your host environment), run dc exec graphql npm rebuild bcrypt --update-binary (this needs the dc shell function from the recommended bash aliases installed).
