Postgis vector provider 189 #240

Merged
merged 17 commits into from
Sep 22, 2024
1 change: 1 addition & 0 deletions .golangci.yml
@@ -20,6 +20,7 @@ linters-settings:
- '60'
- '360'
- '255'
- '2.0'
revive:
rules:
- name: call-to-gc
4 changes: 4 additions & 0 deletions docs/development/modules/ROOT/pages/glossary.adoc
@@ -13,3 +13,7 @@ The name of the layer as specified in incoming requests. The layer name is used,
== Provider

A Provider is any source of geospatial data that defines what is seen in a layer. Tilegroxy includes a number of distinct Provider implementations. These often make web calls to an external service, but in many cases they simply map to another Provider implementation with a mutation applied.

== Operator

The Operator is the administrator who creates the tilegroxy configuration and uses it in their environment. This is in contrast to the User, who consumes the tiles in some web application. Operator input (primarily configuration) is considered trusted and so receives a much lesser degree of scrutiny; for instance, in a provider that uses a database, it can be possible to inject arbitrary SQL with Operator input. Any input coming from a User, in contrast, is treated with a presumption of maliciousness.
16 changes: 10 additions & 6 deletions docs/operation/modules/ROOT/nav.adoc
@@ -3,15 +3,16 @@
** xref:configuration/layer.adoc[]
** xref:configuration/provider/index.adoc[]
*** xref:configuration/provider/proxy.adoc[]
*** xref:configuration/provider/url_template.adoc[]
*** xref:configuration/provider/effect.adoc[]
*** xref:configuration/provider/blend.adoc[]
*** xref:configuration/provider/cgi.adoc[]
*** xref:configuration/provider/custom.adoc[]
*** xref:configuration/provider/effect.adoc[]
*** xref:configuration/provider/fallback.adoc[]
*** xref:configuration/provider/static.adoc[]
*** xref:configuration/provider/ref.adoc[]
*** xref:configuration/provider/custom.adoc[]
*** xref:configuration/provider/postgisvector.adoc[]
*** xref:configuration/provider/static.adoc[]
*** xref:configuration/provider/transform.adoc[]
*** xref:configuration/provider/cgi.adoc[]
*** xref:configuration/provider/url_template.adoc[]
** xref:configuration/cache/index.adoc[]
*** xref:configuration/cache/none.adoc[]
*** xref:configuration/cache/multi.adoc[]
@@ -25,6 +26,8 @@
*** xref:configuration/authentication/static_key.adoc[]
*** xref:configuration/authentication/jwt.adoc[]
*** xref:configuration/authentication/custom.adoc[]
** xref:configuration/datastores/index.adoc[]
*** xref:configuration/datastores/postgresql.adoc[]
** xref:configuration/secret/index.adoc[]
*** xref:configuration/secret/aws_secrets_manager.adoc[]
** xref:configuration/server.adoc[]
@@ -41,6 +44,7 @@
** xref:commands/seed.adoc[]
** xref:commands/test.adoc[]
* xref:extensibility.adoc[]
* xref:telemetry.adoc[]
* xref:migrate-tilestache.adoc[]
* xref:productionizing.adoc[]
* xref:security.adoc[]
* xref:telemetry.adoc[]
@@ -0,0 +1,5 @@
= Datastores

The Datastores configuration defines connections to shared resources, primarily databases, that are used by certain providers. Each datastore definition creates a connection pool when the application is running. Like other entities, each datastore uses a parameter called "name" to select the type of datastore, which controls the specific list of configuration parameters available. Every datastore must also define an "ID", which can be any string and is referenced by the corresponding provider configuration, usually as a `datastore` parameter. See the following sections for the list of supported datastores.

Datastore configurations are only used by providers; caches must have their connection information defined inline.
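
To make the ID-based linkage concrete, here is a minimal Go sketch of how a provider's `datastore` value could be resolved against the configured datastores. The `Datastore` struct and `FindDatastore` function are hypothetical names chosen for illustration; they are not taken from tilegroxy's source.

```go
package main

import "fmt"

// Datastore is a hypothetical, trimmed-down view of one entry under "datastores".
type Datastore struct {
	Name string // selects the datastore type, e.g. "postgresql"
	ID   string // arbitrary string that providers reference
}

// FindDatastore returns the datastore with the given ID and verifies that it
// is of the type the calling provider expects.
func FindDatastore(stores []Datastore, id, wantName string) (Datastore, error) {
	for _, ds := range stores {
		if ds.ID != id {
			continue
		}
		if ds.Name != wantName {
			return Datastore{}, fmt.Errorf("datastore %q has name %q, expected %q", id, ds.Name, wantName)
		}
		return ds, nil
	}
	return Datastore{}, fmt.Errorf("no datastore with ID %q is configured", id)
}
```

Failing fast on an unknown ID or a mismatched type at startup, rather than on the first tile request, is one reasonable design choice for this kind of lookup.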
@@ -0,0 +1,85 @@
= PostgreSQL

Defines a connection pool to a link:https://www.postgresql.org/[PostgreSQL] database. It uses link:https://github.com/jackc/pgx?tab=readme-ov-file#supported-go-and-postgresql-versions[pgx] under the hood, which supports PostgreSQL 12 and higher.

Name should be "postgresql"

== Configuration options:

[cols="1,3,1,1,1"]
|===
| Parameter | Description | Type | Required | Default

| ID
| The unique identifier of the datastore used to reference this datastore in a provider
| string
| Yes
| None

| Host
| The hostname to use to connect to the database
| string
| No
| localhost

| Port
| The port to use to connect to the database
| uint16
| No
| 5432

| User
| The user to use to authenticate with postgresql
| string
| No
| postgres

| Password
| The password to use to authenticate with postgresql
| string
| No
| None

| Database
| The name of the database to connect to
| string
| No
| postgres

| MinConnections
| The minimum number of connections to keep in reserve in the connection pool
| int
| No
| 10

| MaxConnections
| The maximum number of connections to allow in the connection pool. Ensure your postgresql instance's `max_connections` is configured high enough to accommodate this setting
| int
| No
| 30

| IdleTimeout
| The amount of time (in seconds) to allow a connection to sit idle before it is removed from the pool
| int
| No
| 600 (10 minutes)

| Lifetime
| The maximum amount of time (in seconds) to allow a connection to live in the pool. A jitter of 10% is automatically applied.
| int
| No
| 86400 (1 day)
|===


== Example:

----
datastores:
  - name: postgresql
    id: pg-database-0
    host: localhost
    user: postgres
    password: password
    database: postgres
----
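
As a rough illustration of how these parameters could translate into a pgx pool, the sketch below builds a connection string using the pool query parameters that pgxpool's `ParseConfig` accepts (`pool_min_conns`, `pool_max_conns`, `pool_max_conn_idle_time`, `pool_max_conn_lifetime`, `pool_max_conn_lifetime_jitter`). The `PostgresConfig` struct and its methods are hypothetical, the defaults assume the documented 10 minutes and 1 day are expressed in seconds, and tilegroxy may well configure its pool differently.

```go
package main

import (
	"fmt"
	"net/url"
	"time"
)

// PostgresConfig is a hypothetical struct mirroring the documented parameters.
type PostgresConfig struct {
	Host           string
	Port           uint16
	User           string
	Password       string
	Database       string
	MinConnections int
	MaxConnections int
	IdleTimeout    int // seconds
	Lifetime       int // seconds
}

// DefaultPostgresConfig returns the defaults listed in the table above.
func DefaultPostgresConfig() PostgresConfig {
	return PostgresConfig{
		Host: "localhost", Port: 5432, User: "postgres", Database: "postgres",
		MinConnections: 10, MaxConnections: 30,
		IdleTimeout: 600, Lifetime: 86400,
	}
}

// ConnString builds a postgres:// URL carrying the pool settings as query
// parameters. The 10% jitter is derived from the lifetime (an assumption).
func (c PostgresConfig) ConnString() string {
	lifetime := time.Duration(c.Lifetime) * time.Second
	idle := time.Duration(c.IdleTimeout) * time.Second

	q := url.Values{}
	q.Set("pool_min_conns", fmt.Sprint(c.MinConnections))
	q.Set("pool_max_conns", fmt.Sprint(c.MaxConnections))
	q.Set("pool_max_conn_idle_time", idle.String())
	q.Set("pool_max_conn_lifetime", lifetime.String())
	q.Set("pool_max_conn_lifetime_jitter", (lifetime / 10).String())

	user := url.User(c.User)
	if c.Password != "" {
		user = url.UserPassword(c.User, c.Password)
	}
	u := url.URL{
		Scheme:   "postgres",
		User:     user,
		Host:     fmt.Sprintf("%s:%d", c.Host, c.Port),
		Path:     "/" + c.Database,
		RawQuery: q.Encode(),
	}
	return u.String()
}
```

Because the pool keeps at least `MinConnections` open and can grow to `MaxConnections`, the database's `max_connections` needs headroom for every tilegroxy instance that shares it.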
40 changes: 27 additions & 13 deletions docs/operation/modules/ROOT/pages/configuration/index.adoc
@@ -1,24 +1,38 @@
= Configuration

Tilegroxy is heavily configuration driven. This document describes the various configuration options available. link:../examples/configurations/[Complete examples are available here.]
Tilegroxy is a configuration-driven application. This documentation describes the various configuration options available. Configuration can be supplied in either YAML or JSON format. The documentation is primarily written in YAML; however, advanced YAML features are avoided to make it easy to convert examples to JSON.

Some configuration sections (<<authentication,authentication>>, <<provider,provider>>, <<cache,cache>>, and <<secret,secret>>) support selecting different methods of operation that change the full list of parameters available. For example, a "proxy" provider requires a `url` parameter to get a map tile from another server while a "static" provider takes in a `image` to return for every request. You select these operating modes using a parameter called `name`. Since these entities are too dynamic to have fixed environment variables and frequently may require a secret to operate, any string parameters can be made to use an environment variable by specifying a value in the format of `env.ENV_VAR_NAME`. You can also use an external secret store <<secret,if configured>> by specifying a value in the format `secret.SECRET_NAME`
link:https://github.com/Michad/tilegroxy/tree/main/examples/configurations[Complete examples are available here.]

Configuration key names are case-insensitive unless indicated otherwise. Names are always lower case.
== Conventions

Parameter names (configuration keys) are case-insensitive unless indicated otherwise.

Names (see below) are always lower case.

Some parameters can be specified by environment variables, which must be upper case. Environment variables override config parameters, which in turn override default values.

== Entities

Some configuration sections (xref:configuration/authentication/index.adoc[authentication], xref:configuration/provider/index.adoc[provider], xref:configuration/cache/index.adoc[cache], xref:configuration/datastores/index.adoc[datastores] and xref:configuration/secret/index.adoc[secret]) support selecting different methods of operation that change the full list of parameters available. For example, a "proxy" provider requires a `url` parameter to get a map tile from another server, while a "static" provider takes an `image` to return for every request. You select these operating modes using a parameter called `name`.

Since these entities are too dynamic to have fixed environment variables and may frequently require a secret to operate, any string parameter can be made to use an environment variable by specifying a value in the format `env.ENV_VAR_NAME`. You can also use an external secret store xref:configuration/secret/index.adoc[if configured] by specifying a value in the format `secret.SECRET_NAME`.
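
The `env.` and `secret.` value formats can be pictured with a small Go sketch. `ResolveValue` and `SecretLookup` are hypothetical names used only for illustration, and the error handling shown is an assumption rather than tilegroxy's documented behavior.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// SecretLookup is a hypothetical hook standing in for a configured secret store.
type SecretLookup func(name string) (string, error)

// ResolveValue expands "env.NAME" and "secret.NAME" values and returns any
// other string unchanged.
func ResolveValue(raw string, secrets SecretLookup) (string, error) {
	if strings.HasPrefix(raw, "env.") {
		name := strings.TrimPrefix(raw, "env.")
		val, found := os.LookupEnv(name)
		if !found {
			return "", fmt.Errorf("environment variable %q is not set", name)
		}
		return val, nil
	}
	if strings.HasPrefix(raw, "secret.") {
		name := strings.TrimPrefix(raw, "secret.")
		if secrets == nil {
			return "", fmt.Errorf("secret %q requested but no secret store is configured", name)
		}
		return secrets(name)
	}
	return raw, nil
}
```

With this shape, a datastore `password` of `env.PG_PASSWORD` would be read from the environment at startup, while a value such as `password` would be used literally.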

== Structure

The following is the top-level configuration structure. All top-level keys are optional besides layers:

____
<<server,server>>: ... +
<<client,client>>: ... +
<<log,logging>>: ... +
<<telemetry,telemetry>>: ... +
<<error,error>>: ... +
<<secret,secret>>: ... +
<<authentication,authentication>>: ... +
<<cache,cache>>: ... +
<<layer,layers>>: +
- ...
xref:configuration/server.adoc[server]: ... +
xref:configuration/client.adoc[client]: ... +
xref:configuration/log.adoc[logging]: ... +
xref:configuration/telemetry.adoc[telemetry]: ... +
xref:configuration/error.adoc[error]: ... +
xref:configuration/secret/index.adoc[secret]: ... +
xref:configuration/authentication/index.adoc[authentication]: ... +
xref:configuration/cache/index.adoc[cache]: ... +
xref:configuration/datastores/index.adoc[datastores]: +
- ... +
xref:configuration/layer.adoc[layers]: +
- ... +
____
@@ -0,0 +1,78 @@
= Postgis Vector (MVT)

This provider pulls from a table/view in a link:https://www.postgresql.org/[PostgreSQL] database with a link:https://postgis.net/[Postgis] Geometry column and outputs in link:https://github.com/mapbox/vector-tile-spec[MVT] format. This requires Postgis 3.X with a corresponding version of PostgreSQL. This provider does not support raster or geography data.

The intent of this provider is to avoid needing to install and operate a separate server for light use-cases with standard table structures. The data is pulled from Postgis using built-in functions and a fixed format. If you need a highly customized query to pull the data from PostgreSQL, a dedicated server is recommended instead, such as link:https://mapserver.org/[Mapserver] (see the xref:configuration/provider/cgi.adoc[CGI provider]) or link:https://martin.maplibre.org/[Martin] (see the xref:configuration/provider/proxy.adoc[Proxy provider]).
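
To give a feel for what "built-in functions and a fixed format" can look like, the following Go sketch assembles a query from the Postgis functions `ST_TileEnvelope`, `ST_Transform`, `ST_AsMVTGeom` and `ST_AsMVT`. It is not the query tilegroxy actually runs: the `MVTQuery` struct, the `'default'` layer name, the rounding of the buffer and the use of the `margin` argument (which requires Postgis 3.1 or later) are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// MVTQuery is a hypothetical struct mirroring a few of the provider's parameters.
type MVTQuery struct {
	Table      string   // relation name, inserted as-is (trusted Operator input)
	GID        string   // feature ID column
	Geometry   string   // geometry column
	Attributes []string // extra columns to expose as attributes
	SourceSRID uint     // SRID of the stored geometries
	Extent     uint     // tile resolution
	Buffer     float64  // buffer as a fraction of the tile size
}

// BufferPixels converts the relative buffer into tile-coordinate units, which
// is what ST_AsMVTGeom expects. The rounding rule is an assumption.
func BufferPixels(extent uint, buffer float64) int {
	return int(math.Round(buffer * float64(extent)))
}

// BuildSQL returns a statement that takes zoom, x and y as $1, $2 and $3.
func (q MVTQuery) BuildSQL() string {
	cols := make([]string, 0, len(q.Attributes)+1)
	for _, c := range append([]string{q.GID}, q.Attributes...) {
		// Quoting keeps column names case-sensitive.
		cols = append(cols, fmt.Sprintf(`t."%s"`, c))
	}
	return fmt.Sprintf(`WITH bounds AS (
  SELECT ST_TileEnvelope($1, $2, $3) AS tile,
         ST_TileEnvelope($1, $2, $3, margin => %g) AS padded
), mvtgeom AS (
  SELECT %s,
         ST_AsMVTGeom(ST_Transform(t."%s", 3857), bounds.tile, %d, %d) AS geom
  FROM %s t, bounds
  WHERE t."%s" && ST_Transform(bounds.padded, %d)
)
SELECT ST_AsMVT(mvtgeom.*, 'default', %d, 'geom', '%s') FROM mvtgeom`,
		q.Buffer, strings.Join(cols, ", "), q.Geometry, q.Extent,
		BufferPixels(q.Extent, q.Buffer), q.Table, q.Geometry, q.SourceSRID,
		q.Extent, q.GID)
}
```

Under these assumptions, a buffer of 0.125 on an extent of 4096 corresponds to 512 tile units on each edge. The tile coordinates are bound as statement parameters, whereas the table and column names are interpolated directly, which is why they must come from the Operator only.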

This provider is one of a few that talk directly to a database, which brings with it special security concerns. Please see the xref:security.adoc[Security] documentation for a discussion of Tilegroxy's trust model.

Name should be "postgismvt"

Configuration options:

[cols="1,3,1,1,1"]
|===
| Parameter | Description | Type | Required | Default


| Datastore
| The ID of the datastore to use for retrieving data. The datastore must have a name of "postgresql". Also see the xref:configuration/datastores/index.adoc[Datastores] documentation.
| string
| Yes
| None

| Table
| The relation (table/view/materialized view) to pull data from (including schema if outside the default search path)
| string
| Yes
| None

| Extent
| The resolution of the vector tile. Decrease this to make tiles smaller but more "blocky"
| uint
| No
| 4096

| Buffer
| How much extra data off the edges of vector tiles to include. This helps avoid phantom grid-lines and enables consistency in icons/label placement between tiles. The buffer is relative to the size of a tile; 0 means no buffer and 1 means a buffer equal to size of a tile
| float64
| No
| 0.125

| GID
| The name of the feature ID column. This value is case-sensitive; this normally means it should be left in all-lowercase.
| string
| No
| gid

| Geometry
| The name of the geometry column. This value is case-sensitive; this normally means it should be left in all-lowercase.
| string
| No
| geom

| Attributes
| Any other columns from the table to include as attributes in the vector tile. This value is case-sensitive; this normally means it should be left in all-lowercase.
| []string
| No
| None

| Filter
| A SQL snippet to include inside the WHERE clause of the query used to retrieve data. This snippet can include the standard placeholders (see xref:configuration/provider/proxy.adoc[Proxy provider] for a list of these). The placeholder values are passed as parameters of a prepared statement to prevent SQL injection; apart from that, the Filter is inserted into the SQL as-is.
| string
| No
| None

| SourceSRID
| The link:https://postgis.net/docs/using_postgis_dbmanagement.html#spatial_ref_sys[SRID] of the geometries in the table. Mixed-projection tables are not supported.
| uint
| No
| 4326

| Limit
| A sanity limit on the number of geometries to include in each vector tile. Using this provider against very large tables can give poor performance when zoomed out; this parameter protects against intensive queries hanging until the request timeout limit is hit. There is no guarantee as to which geometries are skipped when the limit is hit, which can lead to a bad user experience, so this is only recommended as a secondary protection.
| uint
| No
| Unlimited

|===
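
The way Filter placeholders can become prepared-statement parameters is sketched below in Go. `BindFilter` is a hypothetical helper, not tilegroxy's implementation; it assumes placeholders take the `{name}` form shown in the example configuration and that earlier positional parameters (for instance the tile coordinates) are already taken.

```go
package main

import (
	"errors"
	"regexp"
	"strconv"
)

// placeholderRe matches placeholders such as {layer.year}.
var placeholderRe = regexp.MustCompile(`\{([a-zA-Z0-9_.]+)\}`)

// BindFilter replaces each {placeholder} with a positional parameter ($N) and
// returns the values to bind, in order. startAt is the first free index.
// Everything outside the placeholders is left untouched.
func BindFilter(filter string, values map[string]string, startAt int) (string, []string, error) {
	var args []string
	missing := ""
	sql := placeholderRe.ReplaceAllStringFunc(filter, func(m string) string {
		key := m[1 : len(m)-1] // strip the surrounding braces
		val, ok := values[key]
		if !ok {
			if missing == "" {
				missing = key
			}
			return m
		}
		args = append(args, val)
		return "$" + strconv.Itoa(startAt+len(args)-1)
	})
	if missing != "" {
		return "", nil, errors.New("unknown placeholder: " + missing)
	}
	return sql, args, nil
}
```

Because the user-supplied value travels as a bound parameter, a hostile `year` cannot change the shape of the statement. The surrounding snippet, however, is still raw SQL written by the Operator.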
36 changes: 36 additions & 0 deletions docs/operation/modules/ROOT/pages/security.adoc
@@ -0,0 +1,36 @@
= Security

Tilegroxy operates as a configuration-driven framework for flexibly serving data that comes from a wide variety of possible sources. This flexibility means that an operator can misconfigure tilegroxy in an insecure manner that opens it up to a wide variety of vulnerabilities. Tilegroxy cannot and does not attempt to prevent all possible misconfigurations; however, it aims to equip operators with the tools necessary to have a secure deployment when properly configured. This document describes the considerations one should have when deploying tilegroxy inside a security-sensitive environment.

== User Inputs

Tilegroxy draws a distinction between inputs provided by the Operator (probably you) and the User (the end user who makes tile requests). Operator input is inherently trusted and beyond preventing foot-guns, no attempt is made to prevent potentially malicious intent. User input meanwhile is treated as potentially hostile.

=== Layer Name Parameters

Tilegroxy supports parameterized layer names where each parameter is arbitrary user input. By default these parameters are not used for anything and merely provide flexibility in how one refers to a map layer. However, the Operator can configure these parameters to be used in a variety of ways, for instance placed inside of proxied URLs, inside of SQL as parameters in Prepared Statements, or as inputs in arbitrary code.

A `paramValidator` configuration option is available that allows defining a regular expression to validate each parameter. It is recommended that the Operator define restrictions as tight as possible, often allowing only alphanumeric values.
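
As an illustration of this kind of validation, the Go sketch below checks each layer parameter against a regular expression before it is used anywhere. `ValidateParams` is a hypothetical function, and the map-of-patterns shape is an assumption modelled on the example configuration in this change.

```go
package main

import (
	"fmt"
	"regexp"
)

// ValidateParams checks every parameter that has a validator against its
// regular expression. A missing parameter is validated as an empty string.
func ValidateParams(validators map[string]string, params map[string]string) error {
	for name, pattern := range validators {
		re, err := regexp.Compile(pattern)
		if err != nil {
			return fmt.Errorf("invalid validator for %q: %w", name, err)
		}
		if !re.MatchString(params[name]) {
			return fmt.Errorf("parameter %q failed validation", name)
		}
	}
	return nil
}
```

Anchoring the pattern with `^` and `$` matters here: without the anchors, `MatchString` accepts any value that merely contains a match, so `[0-9]{4}` alone would let through a value with arbitrary text around four digits.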

The configuration for tilegroxy should be treated the same as one would treat code: it should be carefully reviewed and kept in source control, and one should never trust complex configuration from third parties without vetting it for bugs and vulnerabilities.

=== Headers and Query Params

By default, tilegroxy ignores incoming HTTP headers and the query string. The Operator can use these in many of the same situations described in Layer Name Parameters. However, there is no facility for validating these inputs, so operators are advised to use extreme caution before trusting them.

== Downstream Responses

One of the primary use-cases for tilegroxy is to proxy tile requests to another HTTP(S) server. The responses are cached and returned to the user, either as-is or with a transformation applied. To guard against a third-party server returning malicious content, a few protections are available:

* Content Type validation - if the server returns an invalid content type, the request errors. By default only PNG and JPG images are allowed.
** Content Type mirroring - when returning a payload retrieved from a third-party HTTP service, tilegroxy returns the same content type as the server. This helps prevent tilegroxy from being used to deliver malicious JavaScript payloads disguised under a false content type.
* Content Size validation - if the server fails to return a content size header, or if the content size is greater than a limit (by default 10 MiB), the request errors. Tilegroxy will not allocate a buffer greater than the allowed size even if the server sends a larger response than advertised.
* TLS Validation - if the configured URL uses the https protocol, standard TLS certificate validation occurs. Tilegroxy uses default Go settings for validation, including a minimum version of TLS 1.2 and rejection of insecure ciphers.

Despite these protections, it's important that operators only use trusted services as the origin of their maps. Tilegroxy cannot protect against a third party returning a valid result containing offensive imagery. In addition, never allow user input to dictate the hostname used.
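
The content type and content size checks can be sketched in Go roughly as follows. `CheckHeaders` and `ReadLimited` are hypothetical helpers, and the MIME strings used in the usage note are assumptions; tilegroxy's own checks may differ in detail.

```go
package main

import (
	"errors"
	"io"
	"strings"
)

// CheckHeaders validates the upstream content type and advertised size before
// any of the body is read. contentLength is -1 when the header is absent.
func CheckHeaders(contentType string, contentLength int64, allowed []string, limit int64) error {
	ok := false
	for _, a := range allowed {
		if strings.EqualFold(contentType, a) {
			ok = true
			break
		}
	}
	if !ok {
		return errors.New("disallowed content type: " + contentType)
	}
	if contentLength < 0 {
		return errors.New("upstream did not report a content size")
	}
	if contentLength > limit {
		return errors.New("upstream response exceeds the size limit")
	}
	return nil
}

// ReadLimited reads at most contentLength bytes, so a server that sends more
// than it advertised cannot force a larger allocation.
func ReadLimited(body io.Reader, contentLength int64) ([]byte, error) {
	return io.ReadAll(io.LimitReader(body, contentLength))
}
```

A caller would run `CheckHeaders` with, say, `[]string{"image/png", "image/jpeg"}` and a limit of `10 << 20` bytes, then pass the response body to `ReadLimited` only if the check succeeds.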

== Authentication

Tilegroxy provides a flexible capability for xref:configuration/authentication/index.adoc[auth] but defaults to operating unauthenticated. The flagship mode is via JWTs provided via HTTP header, which provides a capability for both authentication and authorization by layer and geographic region.

Tilegroxy is primarily intended to be used as a microservice within a broader application ecosystem so implementing the specifics of how authentication should work end-to-end is left up to the operator in conjunction with their needs.
52 changes: 52 additions & 0 deletions examples/configurations/postgis.yml
@@ -0,0 +1,52 @@
server:
  headers:
    Access-Control-Allow-Origin: "*"
Logging:
  Main:
    Level: trace
cache:
  name: memory
datastores:
  - name: postgresql
    id: vector-database-0
    host: localhost
    user: postgres
    password: password
    database: postgres

layers:
  # This pulls from a US census TIGER data table containing all current counties. This data can be loaded with https://postgis.net/docs/Loader_Generate_Nation_Script.html
  - id: tiger_county
    provider:
      name: postgismvt
      datastore: vector-database-0
      table: tiger_data.county_all
      gid: gid
      geometry: the_geom
      attributes:
        - "name"
      sourcesrid: 4269
  # This pulls from a table detailing historic US counties. The layer includes a parameter that allows you to select an arbitrary year. This data can be found https://digital.newberry.org/ahcb/pages/United_States.html
  - id: counties_by_year
    pattern: counties_{year}
    paramValidator:
      "year": "^[0-9]{4}$"
    provider:
      name: postgismvt
      # Must match the ID of a datastore in the datastores section above that has a name of postgresql
      datastore: vector-database-0
      # Loaded like shp2pgsql -g geom -I US_HistCounties.shp histcounties | psql -h localhost
      table: public.histcounties
      gid: id_num
      geometry: geom
      # Controls the resolution of the resulting vector tiles. Defaults to 4096
      extent: 256
      # Includes extra data off the edges of vector tiles to avoid grid-lines and to enable consistency in icons/labels. Defaults to 1/8th
      buffer: 0.01
      # Indicate any other columns to include in the tiles. Case sensitive (all lowercase if you didn't quote the columns when loading). By default no extra columns are included.
      attributes:
        - "full_name"
      # A snippet that goes inside a WHERE clause. Note that the {layer.year} is loaded as a parameter in a prepared statement to prevent SQL injection from user input. However the rest of this is inserted into SQL as-is so be cautious that it comes from trusted parties.
      filter: "to_date({layer.year}, 'yyyy') BETWEEN start_date AND end_date"
      # Needs to match the geometries in the table
      sourcesrid: 4326