
How do I configure a Qanary pipeline using Docker?

heinpa edited this page May 3, 2022 · 7 revisions

tl;dr

In the simplest case you can start all relevant services (pipeline, components and triplestore) in the same Docker network (most likely host) and connect the services using default parameters only.

However, in most production use cases the networking parameters need to be changed, most importantly the host, port and triplestore settings.

This guide goes over the different configurations that might be necessary for the pipeline.

Recommended Properties to be configured

There are several properties which might be configured, depending on the context:

  • server.host: the host on which the pipeline will be listening
  • server.port: the port on which the pipeline will be listening

If a Stardog triplestore is used, then the following properties need to be defined:

  • stardog.url
  • stardog.username
  • stardog.password
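As a minimal sketch, the following shell function checks that the three Stardog settings are present in the environment before the pipeline is started. The variable names are the upper-case forms used in the docker-compose examples of this guide; the function name check_stardog_env is hypothetical and not part of Qanary.

```shell
#!/bin/sh
# Hypothetical pre-flight check: report every Stardog setting that is
# missing or empty in the current environment.
# Returns 0 if all three variables are set, 1 otherwise.
check_stardog_env() {
    missing=0
    for name in STARDOG_URL STARDOG_USERNAME STARDOG_PASSWORD; do
        # Indirect lookup of the variable whose name is stored in $name
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

A possible usage, assuming the variables are exported in the current shell (docker run -e NAME without a value passes the variable through from the host environment):

check_stardog_env && docker run -e STARDOG_URL -e STARDOG_USERNAME -e STARDOG_PASSWORD qanary-pipeline:latest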

Note: You may implement your own triplestore connector that requires different properties.

Additionally, the property qanary.triplestore provides a fallback option if:

  • the Stardog triplestore is running on a different host to the pipeline
  • you cannot define your own triplestore and no other options are available

SSL settings can be configured using the following properties:

  • server.ssl.enabled: (boolean) enable SSL
  • server.ssl.key-store: the path to the key store containing the certificate, e.g. classpath:keystore.p12
  • server.ssl.key-store-password: the password used to access the key store
  • server.ssl.key-store-type: the type of key store (JKS or PKCS12)
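The SSL properties above could be passed to a container through a Docker env file. The sketch below is an assumption-laden example, not the project's documented procedure: the function names, the port 8443 and the key store paths are hypothetical, and the variable names assume that the pipeline follows Spring Boot's relaxed binding rule for environment variables (dots become underscores, dashes are removed, letters are upper-cased).

```shell
#!/bin/sh
# Hypothetical helper: write the SSL settings into a Docker env file.
# $1 = target env file
# $2 = key store path as seen inside the container
# $3 = key store password
write_ssl_env() {
    cat > "$1" <<EOF
SERVER_SSL_ENABLED=true
SERVER_SSL_KEYSTORE=$2
SERVER_SSL_KEYSTOREPASSWORD=$3
SERVER_SSL_KEYSTORETYPE=PKCS12
EOF
}

# Hypothetical helper: start the pipeline with that env file and mount the
# key store read-only at the path that was written into the file.
# $1 = env file
# $2 = absolute path of the key store on the host
# $3 = key store path inside the container
# The published port 8443 is an arbitrary choice for this example.
run_pipeline_with_ssl() {
    docker run --env-file "$1" -v "$2:$3:ro" -p 8443:8080 qanary-pipeline:latest
}
```

A possible usage, with placeholder paths and password:

write_ssl_env pipeline-ssl.env /certs/keystore.p12 changeit
run_pipeline_with_ssl pipeline-ssl.env /absolute/path/to/keystore.p12 /certs/keystore.p12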

Note: You can find more information about enabling SSL in this guide: How-do-I-improve-the-security-of-my-implementation

To override the default values set in application.properties the use of environment variables is encouraged. This is further described in this guide: How do I configure Qanary services using Docker containers
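The environment variable names used in this guide (for example SERVER_PORT for server.port and STARDOG_URL for stardog.url) can be derived mechanically from the property names. The helper below is a small sketch of that naming convention, assuming Spring Boot's relaxed binding rule for environment variables; the function name prop_to_env is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: derive the environment variable name for a property.
# Dots become underscores, dashes are removed, letters are upper-cased.
prop_to_env() {
    printf '%s\n' "$1" \
        | sed -e 's/\./_/g' -e 's/-//g' \
        | tr '[:lower:]' '[:upper:]'
}

# Print the variable names for some of the properties listed above
prop_to_env server.port
prop_to_env stardog.url
prop_to_env server.ssl.key-store
```

The resulting name can then be passed to a container with docker run -e, as in the port and host examples of this guide.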

Configurations for Networking

Port configuration

Usually Docker containers are self-contained. To access an application running on port 8080 inside a container, you need to map this port to a port on the host machine with -p <machine>:<container> so that it can be reached from outside the container (e.g. from the Internet).

Alternatively, you may run the service inside the Docker host network, which automatically publishes the service on the host port that matches the internal port. In that case you need to override the default value of server.port to change where your Qanary pipeline will be available in this network!

Example:

  • the pipeline is listening on port 8080 (default, as specified by property server.port) inside the docker container,
  • it needs to be available on port 8000 on the Internet
  • you have two options:
    1. map the internal port 8080 to your server port 8000 with docker run -p 8000:8080 qanary-pipeline:latest (recommended for production) OR
    2. change the server.port to 8000 and run the pipeline in the host network with docker run -e SERVER_PORT=8000 --net host qanary-pipeline:latest

Note: Using the host network is only encouraged if all services (i.e. pipeline, components, triplestore) can be started in this mode as well (see section below)!

Host configuration

In a production environment it might not be possible to start all components and the pipeline in one host network. In such a case http://localhost is not an option for networking.

Here, the property server.host needs to reflect the actual host where the Qanary pipeline service is running, so that the correct address of the pipeline can be communicated to external services if required (for example when loading local resources with SPARQL queries).

docker run -e SERVER_HOST=http://example.pipeline example-pipeline:latest

For more information about networking between the Qanary pipeline and its components, please see the guide: How-do-I-configure-a-Qanary-component-using-Docker?

Docker-compose

The configurations shown above using the standard docker run command can easily be applied in a docker-compose.yml file. To start an instance of the latest Qanary pipeline that is listening on http://example.pipeline:8000 and is connecting to a triplestore endpoint at http://example.triplestore:5820 the configuration could look like this:

version: "3.5"
services:

  pipeline:
    image: qanary/qanary-pipeline:latest
    environment:
      - "SERVER_HOST=http://example.pipeline"
      - "STARDOG_URL=http://example.triplestore:5820/"
      - "STARDOG_USERNAME=admin"
      - "STARDOG_PASSWORD=admin"
    ports:
      - "8000:8080"

Note: If the pipeline, components and the triplestore are all available on the same host network, you might define a pipeline similar to this example:

version: "3.5"
services:

  pipeline:
    image: qanary/qanary-pipeline:latest
    environment:
      - "SERVER_PORT=8000"
      - "STARDOG_URL=http://localhost:5820/"
      - "STARDOG_USERNAME=admin"
      - "STARDOG_PASSWORD=admin"
    network_mode: host