Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Support multiple database #175

Closed
H6LS1S opened this issue Feb 20, 2020 · 8 comments
Closed

Proposal: Support multiple database #175

H6LS1S opened this issue Feb 20, 2020 · 8 comments

Comments

@H6LS1S
Copy link

H6LS1S commented Feb 20, 2020

Description:

Add support for creating multiple databases when starting a container. This would be convenient for docker-compose. Implementation in other container.

Example:

/docker-compose.yaml
version: '3.5'
services:
  database:
    image: 'mdillon/postgis'
    container_name: Database
    ports:
      - '5432:5432'
    environment:
      POSTGRES_DB: db1, db2, db3
      POSTGRES_USER: dev
      POSTGRES_PASSWORD: dev

Solutions:

Replace the variable POSTGRES_DB with the expression $(echo $POSTGRES_DB | tr ',' ' '). This should give us the desired result:

  • If the string contains a comma-separated listing, break it into loop arguments
  • If the line contains only the phrase, then do nothing
/initdb-postgis.sh
#!/bin/sh

set -e

# Perform all actions as $POSTGRES_USER
export PGUSER="$POSTGRES_USER"

# Create the 'template_postgis' template db
"${psql[@]}" <<- 'EOSQL'
CREATE DATABASE template_postgis IS_TEMPLATE true;
EOSQL

# Load PostGIS into both template_database and $POSTGRES_DB
for DB in template_postgis $(echo $POSTGRES_DB | tr ',' ' '); do
	echo "Loading PostGIS extensions into $DB"
	"${psql[@]}" --dbname="$DB" <<-'EOSQL'
		CREATE EXTENSION IF NOT EXISTS postgis;
		CREATE EXTENSION IF NOT EXISTS postgis_topology;
		CREATE EXTENSION IF NOT EXISTS fuzzystrmatch;
		CREATE EXTENSION IF NOT EXISTS postgis_tiger_geocoder;
EOSQL
done
@ImreSamu
Copy link
Member

imho: we need some risk assessment, because the upstream image ( postgres) is not supporting this method ( or this is not the recommended way )
see: docker-library/postgres#240 (comment) ;

@H6LS1S
Copy link
Author

H6LS1S commented Feb 20, 2020

Ok, how long can it take and are there any alternatives? I re-read several related issues on this topic, and most of them are already closed. It all comes down to the fact that the container user must independently override the initialization script. So why not extend this method here, since it image is not a direct Postgres provider?

@phillipross
Copy link
Contributor

I agree with @ImreSamu, risk assessment is necessary to make sure compatibility can be maintained with the postgres images.

While it's not been explicitly stated/documented anywhere that I know of, the postgis images can ideally be used as drop-in replacements for the postgres images. The postgres images can be considered "upstream" and the changes that go into the postgis images need to be analyzed to make sure this compatibility isn't broken. And as much as possible, the extension facilities and recommended usage patterns prescribed by the upstream image should be used and continue to work with the postgis images.

In this specific case, I don't see how the implementation @HELSIS666 lays out would break compatibility, but it doesn't exactly follow the upstream recommendations. The reference that @ImreSamu linked to does begin to spell out ways of accomplishing the same thing.

Given all of this, I actually have a slightly different proposal:

Maybe creating multiple databases within a single container should be done by a separate script which reads different environment variables and provides a solution that is decoupled from what is inherited from the postgres images? Maybe a script that reads different environment variables such as POSTGRES_DBS instead of POSTGRES_DB, for example. The environment variables would be different from what has been established upstream, but have slightly clearer semantics in that multiple dbs can be created. And it would work in addition to the current init script which is modeled after the postgres image. The benefit of this decoupling is that the current init script can evolve along side the postgres image init script much easier. It's easier to maintain that way, and has less chance of breaking as the postgres image evolves. Something I envisioning happening immediately after adding support for initialization of multiple dbs is then the notion of creating multiple dbs with nonuniform credentials and then it just goes on from there with all the new suggestions. Establishing a separate script which is run after the current init script provides a separate extension hook right from the start.

@phillipross
Copy link
Contributor

In light of apparent agreement of my most recent comment...

@HELSIS666 would you like to contribute a PR that would implement a separate script which we can include in the image and provide functionality to initialize multiple databases based on separate optional environment variables?

Alternatively, we can continue discussion here about the specifics of an approach or implementation, and only start working toward a PR once we feel we have a better understanding and more consensus on how to move forward.

If it's the case that nobody has free time or desire to work on the issue and just use the prescribed image customization patterns documented in the upstream image, we can close out this issue.

@md5
Copy link
Contributor

md5 commented Feb 22, 2020

@phillipross I would recommend a slightly different approach

  1. Split initdb-postgis.sh into two scripts, one of which is just the part in the loop
  2. Call that script from initdb-postgis.sh, passing all arguments explicitly (as opposed to using PGUSER on line 6)
  3. Document the other script

With that approach, anyone can add additional init scripts of their own in a derived image that call "the other script" based on whatever environment variables they choose. This can either be done in an image derived FROM postgis/postgis or by bind mounting in the init script(s).

Edit: I guess the missing part is that the base image is actually creating the database for the core DB. I feel like that part is easy enough to include in "the other script" above. Either that or ask the postgres image to split out any special database creation logic into a separate script in the same way I recommend above.

cc/ @tianon @yosifkit

@yosifkit
Copy link

We did a lot of refactoring and "functionalizing" in docker-library/postgres#496; combine that with docker-library/postgres#452 and it should be possible to call any function you want from the orginial entrypoint. The functions not beginning with an underscore are considered the "public" interface of the script when sourcing it (and from sourced initdb.d scripts) and so will be kept as a stable as possible (functionality, names, and arguments will be very unlikely to change). On the flip side, functions beginning with an underscore are "private" and can change at any time.

So something like this should work (in a non-executable sh file):

#!/bin/bash

docker_process_sql --dbname postgres --set db="$myVariable" <<-'EOSQL'
	CREATE DATABASE :"db" ;
EOSQL

I would have said to just use docker_setup_db, but this function was designed to do more than just create a database. Although that is all it does right now in the postgres image, it could have other functionality added later since the MySQL version also does loading of timezone data, root password, and non-root user creation. The description when creating the function was "create initial DB and non-root user with access" (I need to put descriptions of these functions in an easier place); we didn't envision needing a function just for creating a database when it is basically just one line.

TL;DR: I'd recommend just creating any extra databases via sql.


Side note: docker_process_sql is the drop-in replacement for "${psql[@]}", but "${psql[@]}" is still supported for backwards compatibility.

@H6LS1S
Copy link
Author

H6LS1S commented Mar 6, 2020

In light of apparent agreement of my most recent comment...

@HELSIS666 would you like to contribute a PR that would implement a separate script which we can include in the image and provide functionality to initialize multiple databases based on separate optional environment variables?

Alternatively, we can continue discussion here about the specifics of an approach or implementation, and only start working toward a PR once we feel we have a better understanding and more consensus on how to move forward.

If it's the case that nobody has free time or desire to work on the issue and just use the prescribed image customization patterns documented in the upstream image, we can close out this issue.

Yes, I think it’s worth more to analyze all the possible solutions, since I forgot that the databases have their own settings, roles, etc. Would be nice to have an optional interface that will wait for a list of databases settings. Mounting a custom initialization script and settings file should solve my problems, but I would like to hear all the comments on this

@nyurik
Copy link
Contributor

nyurik commented Mar 11, 2020

I think this task should really be moved to the upstream postgres repo and discussed there. While possibly a good idea, I think it should be the PG parent that makes this possible, whereas postgis image should be as simple as possible, and just add various versions of postgis (and related) extensions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants