The entire system can be run from this repository. The following components are part of this system:
- Mauro Data Mapper [maurodatamapper] - Mauro Data Mapper
- Postgres 12 [postgres] - Postgres Database
- Mauro Data Mapper Docker
If using Windows or OSX you will need to install Docker.
The minimum supported versions of Docker are:
- Engine: 19.03.0+
- Compose: 1.25.0+
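A quick way to check an installation against these minimums is to compare version strings in the shell. The following is a minimal sketch and not part of this repository: the helper names version_ge and check_minimum are made up for illustration, and it assumes a sort that supports -V (version sort), as GNU coreutils does.

```shell
# version_ge A B: succeeds when version A is greater than or equal to version B.
# Assumes `sort -V` (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# check_minimum NAME INSTALLED REQUIRED: prints a one-line verdict and
# returns non-zero when the installed version is too old.
check_minimum() {
    if version_ge "$2" "$3"; then
        echo "$1 $2 OK (>= $3)"
    else
        echo "$1 $2 is too old (need >= $3)"
        return 1
    fi
}

# Example usage (requires docker and docker-compose on the PATH):
#   check_minimum Engine  "$(docker version --format '{{.Server.Version}}')" 19.03.0
#   check_minimum Compose "$(docker-compose version --short)" 1.25.0
```

With the Ubuntu apt-get version mentioned below, the Compose check would report that 1.17.1 is too old.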
⚠️ If you're running on Ubuntu: the default version of docker-compose installed with apt-get is 1.17.1, and you might get the error message:
Building docker compose ERROR: Need service name for --build-arg option
In this case, you should uninstall docker-compose and re-install directly from Docker, following the instructions here: [https://docs.docker.com/compose/install/]
We advise a minimum of 2 CPUs and 4GB RAM just to run this system. This does not allow for the requirements of the operating system itself, therefore we recommend a server with 4 CPUs and 8GB RAM.
The default install of Docker on Linux gives the docker engine unlimited access to the server's resources. However, if running on Windows or Mac OS X, the Docker Toolbox will need to be configured.
This should be possible using the normal git checkout command. However, it is possible you're on a server where SSH is firewalled, in which case you can use the following [SSH over HTTPS document](https://docs.github.com/en/free-pro-team@latest/github/authenticating-to-github/using-ssh-over-the-https-port).
Once cloned then running the standard docker-compose build command will build the images necessary to run the services.
# Build the entire system
$ docker-compose build
Updating an already running system can be performed in one of two ways. The preferred method is to pull the latest version tag from the repository and then rebuild the mauro-data-mapper service. However, this may be hard if multiple changes have been made to the docker-compose.yml file and you're not familiar enough with git to handle stashing and merging.
# Update an already built system
# Fetch the latest commits
$ git fetch
# Stash any local changes
$ git stash
# Checkout/pull the version you want to update to
# e.g. git checkout B4.4.1_F6.0.0
$ git checkout <TAG>
# Unstash local changes, you may need to resolve any merge conflicts
$ git stash pop
# Build the new image
$ docker-compose build mauro-data-mapper
# Start the update
$ docker-compose up -d mauro-data-mapper
The alternative method is to use the update command script and pass in the new versions you want to update to. The downside with this method is if we have made any changes to the Dockerfiles or base versions you will not have them.
# Update an already built system
# e.g ./update -b 4.4.1 -f 6.0.0
$ ./update -b <BACKEND_VERSION> -f <FRONTEND_VERSION>
Additional plugins can be found at the Mauro Data Mapper Plugins organisation page. A complete list with versions can also be found in the installation documentation. Please note that while we will do our best to keep this page up-to-date, there may be circumstances where it is behind. We therefore recommend using our official GitHub Plugins organisation to find the latest releases and all available plugins.
Each of these can be added as runtimeOnly dependencies by adding them to the ADDITIONAL_PLUGINS build argument for the mauro-data-mapper service build.
These dependencies should be provided as a semi-colon separated list in the gradle style. They will be split and each will be added as a runtimeOnly dependency.
Example
mauro-data-mapper:
  build:
    context: mauro-data-mapper
    args:
      ADDITIONAL_PLUGINS: "uk.ac.ox.softeng.maurodatamapper.plugins:mdm-plugin-excel:3.0.0"
This will add the Excel plugin to the dependencies.gradle file:
runtimeOnly uk.ac.ox.softeng.maurodatamapper.plugins:mdm-plugin-excel:3.0.0
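The splitting of the semi-colon separated list can be pictured with a short shell sketch. This is only an illustration of the behaviour described above, not the script the image build actually uses; the function name plugins_to_gradle and the second plugin coordinate (com.example:hypothetical-plugin:1.0.0) are made up for the example.

```shell
# plugins_to_gradle LIST: print one runtimeOnly line per entry of a
# semi-colon separated list of gradle style coordinates (group:name:version).
plugins_to_gradle() {
    printf '%s\n' "$1" | tr ';' '\n' | while IFS= read -r dep; do
        # Skip empty entries, e.g. those produced by a trailing semi-colon
        if [ -n "$dep" ]; then
            echo "runtimeOnly $dep"
        fi
    done
}

# The second coordinate is hypothetical and only shows the list format
plugins_to_gradle "uk.ac.ox.softeng.maurodatamapper.plugins:mdm-plugin-excel:3.0.0;com.example:hypothetical-plugin:1.0.0"
```

Running it prints one runtimeOnly line for each of the two entries, in the order they were supplied.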
You can use dynamic versioning to add dependencies. However, this comes with the risk that it pulls a version which does not comply with your expected version of mdm-application-build/mdm-core, which may cause conflicts with other plugins. We therefore do not advise this approach.
Example
mauro-data-mapper:
  build:
    context: mauro-data-mapper
    args:
      ADDITIONAL_PLUGINS: "uk.ac.ox.softeng.maurodatamapper.plugins:mdm-plugin-excel:3.+"
This will add the latest minor version of the Excel plugin.
If running multiple docker-compose instances then they will all make use of the same initial images, therefore you only need to run the ./make script once per server.
Some servers have the 22 SSH port firewalled for external connections. If this is the case you can change the base_images/sdk_base/ssh/config file:
- comment out the Hostname field that's currently active
- uncomment both commented out Hostname and Port fields
This will allow git to work using the 443 port which will not be blocked.
By adding variables to the <service>.environment section of the docker-compose.yml file you can pass them into the container as environment variables. These will override any existing configuration variables which are used by default. Any defaults and normally used environment variables can be found in the relevant service's Dockerfile at the ENV command.
- POSTGRES_PASSWORD - This sets the postgres user password for the service. As per the documentation at Postgres Docker Hub, it must be set for a docker postgres container. We have set a default but you can override if desired. If you do override it, you will also need to change the PGPASSWORD env variable in the mauro-data-mapper section.
- DATABASE_USERNAME - This is the username which will be created inside the Postgres instance to own the database which the MDM service will use. The username is also used by the MDM service to connect to the postgres instance, therefore if you change this you MUST also supply it in the environment args for the MDM service.
- DATABASE_PASSWORD - This is the password set for the DATABASE_USERNAME. It is the password used by the MDM service to connect to this postgres container.
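The pairing between the two services can be sketched as a docker-compose.yml fragment. All values below are placeholders chosen for illustration, and the fragment assumes the MDM side receives the credentials through the dataSource.username and dataSource.password properties; check the repository's own docker-compose.yml for the actual defaults.

```yaml
services:
  postgres:
    environment:
      POSTGRES_PASSWORD: example-postgres-password   # placeholder value
      DATABASE_USERNAME: example-user                # placeholder value
      DATABASE_PASSWORD: example-user-password       # placeholder value
  mauro-data-mapper:
    environment:
      PGPASSWORD: example-postgres-password          # must match POSTGRES_PASSWORD
      dataSource.username: example-user              # must match DATABASE_USERNAME
      dataSource.password: example-user-password     # must match DATABASE_PASSWORD
```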
Any grails configuration property found in any of the plugin.yml or application.yml files can be overridden through environment variables. They simply need to be provided in the "dot notation" form rather than the "YML new line" format.
e.g. application.yml
database:
  host: localhost
would be overridden by docker-compose.yml
services:
  mauro-data-mapper:
    environment:
      database.host: another-host
However, to make life simpler and to avoid too many variables in the docker-compose.yml file, we have supplied 2 additional methods of overriding the defaults. These replace all of the environment variable settings used in docker-compose.yml by previous releases.
The preference order for loaded sources of properties is
- Environment Variables
- runtime.yml
- build.yml
- application.yml
- plugin.yml - there are multiple versions of these as each plugin we build may supply their own
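To illustrate the ordering, consider the same property defined in two of these sources. The sketch below reuses the database.host property from the earlier example; the values are placeholders, and the two YAML documents stand for two separate files.

```yaml
# build.yml (lower preference)
database:
  host: postgres
---
# runtime.yml (higher preference than build.yml)
database:
  host: another-host
```

Under the preference order listed above, the runtime.yml value would be the one used. Setting database.host as an environment variable would in turn take preference over both files.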
The build.yml file is built into the MDM service when the image is built and is a standard grails configuration file. Therefore any properties which can be safely set at build time for the image should be set into this file. This includes any properties which may be shared between multiple instances of MDM which all start from the same image.
Our recommendation is that if only running 1 instance of MDM from 1 cloned repository then you should load all your properties into the build.yml file. For this reason we have supplied the build.yml file with all the properties which we either require to be overridden or expect may want to be overridden.
The runtime.yml file will be loaded into the container via the docker-compose.yml file. This is intended as the replacement for environment variable overrides, where each running container might have specifically set properties which differ from a common shared image.
NOTE: Do not change the environment variable runtime.config.path as this denotes the path inside the container where the config file will be found.
The following variables need to be overridden/set when starting up a new mauro-data-mapper image. Usually this is done in the docker-compose.yml file. It should not be done in the Dockerfile as each instance which starts up may use different values.
- grails.cors.allowedOrigins - Should be set to a single FQDN URL which is the host where MDM will be accessed from. If using a proxy to break SSL then the origin would be the hostname where the proxy sits, not the host of the server running the docker containers. The origin must include the protocol, i.e. https or http
- maurodatamapper.authority.name - A unique name used to distinguish a running MDM instance.
- maurodatamapper.authority.url - The full URL to the location of the catalogue. This is considered a unique identifier to distinguish any instance from another and therefore no 2 instances should use the same URL.
- simplejavamail.smtp.username - To allow the catalogue to send emails this needs to be a valid username for the simplejavamail.smtp.host
- simplejavamail.smtp.password - To allow the catalogue to send emails this needs to be a valid password for the simplejavamail.smtp.host and simplejavamail.smtp.username
- simplejavamail.smtp.host - This is the FQDN of the mail server to use when sending emails
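As a rough sketch, these settings could be supplied through the environment section of the mauro-data-mapper service. Every value below is a made-up placeholder (example.org hosts, names and credentials) and must be replaced with the details of your own deployment.

```yaml
services:
  mauro-data-mapper:
    environment:
      grails.cors.allowedOrigins: "https://mdm.example.org"       # placeholder, include the protocol
      maurodatamapper.authority.name: "Example MDM Instance"      # placeholder
      maurodatamapper.authority.url: "https://mdm.example.org"    # placeholder
      simplejavamail.smtp.username: "mailer@example.org"          # placeholder
      simplejavamail.smtp.password: "example-password"            # placeholder
      simplejavamail.smtp.host: "smtp.example.org"                # placeholder
```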
- PGPASSWORD - This is the postgres user's password for the postgres server. This is an environment variable set to allow the MDM service to wait till the postgres service has completely finished starting up. It is only used to confirm the Postgres server is running and databases exist. After this it is not used again. If you change POSTGRES_PASSWORD you must change this to match. This can ONLY be overridden in the docker-compose.yml file.
- CATALINA_OPTS - Java Opts to be passed to Tomcat. This can ONLY be overridden in the docker-compose.yml file.
- database.host - The host of the database. If using docker-compose this should be left as postgres or changed to the name of the database service.
- database.port - The port of the database.
- database.name - The name of the database which the catalogue data will be stored in.
- dataSource.username - Username to use to connect to the database. See the Postgres service environment variables for more information.
- dataSource.password - Password to use to connect to the database. See the Postgres service environment variables for more information.
- simplejavamail.smtp.port - The port to use when sending emails.
- simplejavamail.smtp.transportstrategy - The transport strategy to use when sending emails.
- hibernate.search.default.indexBase - The directory to store the lucene index files in.
Database: The system is designed to use the postgres service provided in the docker-compose file, therefore there should be no need to alter any of these settings. Only make alterations if running postgres as a separate service outside of docker-compose.
Email: The standard email properties will allow emails to be sent to a specific SMTP server.
Please see the mc-to-mdm-migration repository for details.
You will need to have started up this docker service once to ensure the database and volume exist for the Mauro Data Mapper.
The default docker-machine in a Windows or Mac OS X environment has 1 CPU and 1GB RAM, which is not enough to run the Mauro Data Mapper system.
On Linux the docker machine is the host machine so there is no need to build or remove anything.
If using the Native Docker then edit the preferences of the Docker application and increase the RAM to at least 4GB. You will probably need to restart Docker after doing this.
If using the Docker Toolbox then you will need to perform the following in a 'docker' terminal.
# Stop the default docker machine
$ docker-machine stop default
# Remove the default machine
$ docker-machine rm default
# Replace with a more powerful machine (4096 is the minimum recommended RAM, if you can give it more then do so)
$ docker-machine create --driver virtualbox --virtualbox-cpu-count "-1" --virtualbox-memory "4096" default
When controlling Docker Machine via your terminal shell it is useful to set the default docker machine.
Type the following at the command line, or add it to the appropriate bash profile file:
eval "$(docker-machine env default)"
If not you may see the following error: Cannot connect to the Docker daemon. Is the docker daemon running on this host?
Continually building docker images will leave a lot of loose snapshot images floating around. Occasionally make use of:
- Clean up stopped containers - docker rm $(docker ps -a -q)
- Clean up untagged images - docker rmi $(docker images | grep "^<none>" | awk '{print $3}')
- Clean up dangling volumes - docker volume rm $(docker volume ls -qf dangling=true)
You can make life easier by adding the following to the appropriate bash profile file:
alias docker-rmi='docker rmi $(docker images -q --filter "dangling=true")'
alias docker-rm='docker rm $(docker ps -a -q)'
alias docker-rmv='docker volume rm $(docker volume ls -qf dangling=true)'
Remove all stopped containers first, then remove all untagged images.
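Docker also ships built-in prune subcommands that cover similar ground. The following is an alternative sketch rather than the approach this README uses; the function name docker_cleanup is made up, and the -f flag skips the confirmation prompt.

```shell
# docker_cleanup: remove stopped containers, dangling images and unused
# volumes using Docker's built-in prune subcommands, containers first.
docker_cleanup() {
    docker container prune -f
    docker image prune -f
    docker volume prune -f
}
```

Be aware that pruning volumes deletes the data in any volume not attached to a container, and which volumes qualify can differ between Docker versions, so take care if the containers have been removed with docker-compose down.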
A useful tool is Dockviz. Ever since docker did away with docker images --tree you can't see all the layers of images and therefore how much floating mess you have.
Add the following to the appropriate bash profile file:
alias dockviz="docker run --privileged -it --rm -v /var/run/docker.sock:/var/run/docker.sock nate/dockviz"
Then in a new terminal you can run dockviz images -t to see the tree. The program can also produce dot notation files for a graphical view.
When you supply multiple files, docker-compose combines them into a single configuration. Compose builds the configuration in the order you supply the files. Subsequent files override and add to their predecessors.
# Apply the .dev yml file, create and start the containers in the background
$ docker-compose -f docker-compose.yml -f docker-compose.dev.yml up -d
# Apply the .prod yml file, create and start the containers in the background
$ docker-compose -f docker-compose.yml -f docker-compose.prod.yml up -d
We recommend adding the following lines to the appropriate bash profile file:
alias docker-compose-dev="docker-compose -f docker-compose.yml -f docker-compose.dev.yml"
alias docker-compose-prod="docker-compose -f docker-compose.yml -f docker-compose.prod.yml"
This will allow you to start compose in dev mode without all the extra file definitions.
Before running please read the parameters section first.
With docker and docker-compose installed, run the following:
# Build all the images
$ docker-compose-dev build
# Start all the components up
$ docker-compose up -d
# To only start 1 service
# This will also start up any of the services the named service depends on (defined by `links` or `depends_on`)
$ docker-compose up [SERVICE]
# To push all the output to the background add `-d`
$ docker-compose up -d [SERVICE]
# Stop background running and remove the containers
$ docker-compose down
# To update an already running service
$ docker-compose-dev build [SERVICE]
$ docker-compose up -d --no-deps [SERVICE]
# To run in production mode
$ docker-compose-prod up -d [SERVICE]
If you run everything in the background, use Kitematic to see the individual container logs. You can also do this when running in the foreground, and it is easier as it splits out each of the containers.
If you start only a single service, then stopping that service will not stop the dependencies that docker started to allow the named service to start.
The default compose file will pull the correct version images from Bintray, or a locally defined docker repository.
There is an extra override docker-compose file for development, which currently opens up the ports in:
- postgres
The .dev compose file builds all of the images; the standard compose file and .prod versions do not build new images.
Try to keep images as small as possible.
While links and depends_on make sure the services a service requires are brought up first, Docker only waits until they are running, NOT until they are actually ready. The wait scripts allow testing to make sure the service is actually available.
Note: If requiring postgres and using any of the Alpine Linux base images then the Dockerfile will need to add the following:
RUN apk add postgresql-client
- If not requiring any dependencies then just set CMD ["arg1", ...] and the args will be passed to the ENTRYPOINT
- If requiring dependencies then set the ENTRYPOINT to the wait script and the CMD to CMD ["process", "arg1", ...]
Note: We should be able to override the ENTRYPOINT in the docker-compose file, but for some reason it's not then passing the CMD args through.
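The idea behind a wait script can be sketched as a small retry loop. This is a generic illustration and not the wait script shipped in this repository: the function name wait_for and the WAIT_INTERVAL variable are made up, and the pg_isready probe in the usage comment is an assumed choice from the PostgreSQL client tools.

```shell
# wait_for ATTEMPTS COMMAND...: run COMMAND repeatedly until it succeeds,
# giving up after ATTEMPTS tries. Returns 0 on success, 1 on timeout.
# WAIT_INTERVAL (seconds between tries, default 1) is a hypothetical knob.
wait_for() {
    attempts="$1"
    shift
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "${WAIT_INTERVAL:-1}"
    done
    return 1
}

# Example usage inside an entrypoint script:
#   wait_for 30 pg_isready -h postgres -p 5432 || exit 1
#   exec "$@"   # hand over to the process and args supplied via CMD
```

Ending the script with exec "$@" is what lets the CMD values described above run as the container's main process once the dependency is ready.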
Docker recommends using COPY instead of ADD unless the source is a URL or a tar file which ADD can retrieve and/or unzip.
Think carefully about what is required and what ports need to be passed through.
If the port only needs to be available to other docker services then use expose.
If the port needs to be open outside (e.g. the LabKey port 8080) then use ports.
If the ports option is used this opens the port from the service to the outside world. It does not affect exposed ports between services, so if a service (e.g. postgres with port 5432) exposes a port then any service which uses links to postgres will be able to find the database at postgresql://postgres:5432
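The difference between the two options can be sketched in a compose fragment. The host port mapping on mauro-data-mapper is an assumed value chosen for illustration and does not reflect this repository's actual configuration.

```yaml
services:
  postgres:
    expose:
      - "5432"        # reachable by other services only, not from the host
  mauro-data-mapper:
    ports:
      - "8082:8080"   # assumed mapping: host port 8082 to container port 8080
    depends_on:
      - postgres      # can reach the database at postgres:5432
```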
All work should be done on the develop branch.