From 7296f804d74087e17493d6c9d5dc797df8dcbbfd Mon Sep 17 00:00:00 2001 From: Lana Brindley Date: Tue, 16 Nov 2021 11:15:02 +1000 Subject: [PATCH] "will" removal (#593) * API docs * Cloud * MST * Getting Started * How Tos * Top level pages * Code quick starts * Tutorials * Tutorials Part Deux * Contributing --- CONTRIBUTING.md | 4 +- api/add_compression_policy.md | 14 +- api/add_data_node.md | 22 +- api/add_dimension.md | 4 +- api/add_job.md | 2 +- api/add_reorder_policy.md | 10 +- api/add_retention_policy.md | 20 +- api/alter_job.md | 28 +- api/alter_table_compression.md | 10 +- api/approximate_row_count.md | 8 +- api/attach_data_node.md | 12 +- api/attach_tablespace.md | 12 +- api/chunks.md | 26 +- api/compress_chunk.md | 4 +- api/create_distributed_hypertable.md | 24 +- api/create_hypertable.md | 54 ++-- api/create_index.md | 14 +- api/create_materialized_view.md | 2 +- api/decompress_chunk.md | 12 +- api/delete_data_node.md | 6 +- api/delete_job.md | 6 +- api/detach_data_node.md | 6 +- api/detach_tablespace.md | 14 +- api/detach_tablespaces.md | 10 +- api/dimensions.md | 48 ++-- api/distributed-hypertables.md | 8 +- api/distributed_exec.md | 10 +- api/drop_chunks.md | 14 +- api/error.md | 2 +- api/first.md | 4 +- api/index.md | 4 +- api/jobs.md | 26 +- api/last.md | 4 +- api/move_chunk.md | 2 +- api/refresh_continuous_aggregate.md | 6 +- api/remove_compression_policy.md | 10 +- api/reorder_chunk.md | 8 +- api/rollup-stats.md | 12 +- api/set_number_partitions.md | 2 +- api/set_replication_factor.md | 10 +- api/show_chunks.md | 14 +- api/time-weighted-averages.md | 2 +- api/time_weight.md | 4 +- api/uddsketch.md | 2 +- api/x_intercept.md | 6 +- cloud/create-a-service.md | 4 +- cloud/customize-configuration.md | 4 +- cloud/scaling-a-service.md | 8 +- cloud/vpc-peering-aws/create.md | 25 +- cloud/vpc-peering-aws/index.md | 8 +- cloud/vpc-peering-aws/migrate.md | 32 +-- mst/ingest-data.md | 2 +- mst/mst-multi-node.md | 18 +- mst/security.md | 20 +- mst/viewing-service-logs.md | 2 +- mst/vpc-peering.md | 6 +- timescaledb/contribute-to-docs.md | 8 +- timescaledb/contribute-to-timescaledb.md | 6 +- .../getting-started/access-timescaledb.md | 12 +- timescaledb/getting-started/compress-data.md | 8 +- timescaledb/getting-started/create-cagg.md | 10 +- timescaledb/getting-started/data-retention.md | 6 +- timescaledb/getting-started/index.md | 2 +- .../getting-started/launch-timescaledb.md | 6 +- timescaledb/getting-started/migrate-data.md | 2 +- timescaledb/how-to-guides/alerting.md | 2 +- .../compression/about-compression.md | 6 +- .../compression/backfill-historical-data.md | 18 +- .../compression/decompress-chunks.md | 2 +- .../configuration/configuration.md | 16 +- .../configuration/postgres-config.md | 6 +- .../how-to-guides/configuration/telemetry.md | 12 +- .../configuration/timescaledb-config.md | 8 +- .../configuration/timescaledb-tune.md | 2 +- timescaledb/how-to-guides/connecting/index.md | 8 +- timescaledb/how-to-guides/connecting/psql.md | 20 +- .../real-time-aggregates.md | 2 +- .../continuous-aggregates/refresh-policies.md | 8 +- ...ta-retention-with-continuous-aggregates.md | 14 +- .../data-retention/manually-drop-chunks.md | 6 +- .../how-to-guides/data-tiering/move-data.md | 10 +- .../hyperfunctions/advanced-agg.md | 4 +- .../hyperfunctions/function-pipelines.md | 34 +-- .../hyperfunctions/time-weighted-averages.md | 2 +- .../hypertables/best-practices.md | 2 +- .../hypertables/distributed-hypertables.md | 2 +- timescaledb/how-to-guides/ingest-data.md | 2 +- 
.../installation-apt-debian.md | 12 +- .../installation-apt-ubuntu.md | 12 +- .../installation-docker.md | 13 +- .../installation-grafana.md | 8 +- .../installation-homebrew.md | 8 +- .../installation-source-windows.md | 4 +- .../installation-source.md | 2 +- .../installation-ubuntu-ami.md | 4 +- .../installation-windows.md | 6 +- .../install-timescaledb/installation-yum.md | 10 +- .../managed-service-for-timescaledb.md | 41 ++- .../migrate-data/different-db.md | 2 +- .../migrate-data/migrate-influxdb.md | 16 +- timescaledb/how-to-guides/psql-basics.md | 2 +- .../query-data/advanced-analytic-queries.md | 2 +- .../replication-and-ha/replication.md | 50 ++-- .../how-to-guides/schema-management/alter.md | 4 +- .../schema-management/constraints.md | 8 +- .../schema-management/indexing.md | 6 +- .../how-to-guides/schema-management/json.md | 2 +- .../schema-management/tablespaces.md | 8 +- .../schema-management/triggers.md | 8 +- timescaledb/how-to-guides/tooling.md | 2 +- .../how-to-guides/update-timescaledb/index.md | 2 +- .../update-timescaledb-1.md | 6 +- .../update-timescaledb-2.md | 8 +- .../update-timescaledb/update-timescaledb.md | 4 +- .../update-timescaledb/updating-docker.md | 4 +- .../update-timescaledb/upgrade-postgresql.md | 13 +- .../create-and-register.md | 2 +- .../how-to-guides/write-data/delete.md | 3 +- timescaledb/how-to-guides/write-data/index.md | 4 +- .../how-to-guides/write-data/insert.md | 4 +- .../how-to-guides/write-data/update.md | 2 +- .../how-to-guides/write-data/upsert.md | 10 +- timescaledb/index.md | 2 +- timescaledb/integrations/ingesting-data.md | 2 +- timescaledb/overview/core-concepts/chunks.md | 12 +- .../overview/core-concepts/compression.md | 12 +- .../core-concepts/continuous-aggregates.md | 6 +- .../overview/core-concepts/data-retention.md | 4 +- .../core-concepts/distributed-hypertables.md | 20 +- .../core-concepts/hypertables-and-chunks.md | 12 +- timescaledb/overview/core-concepts/scaling.md | 2 +- .../narrow-data-model.md | 2 +- timescaledb/overview/faq/faq-postgres.md | 2 +- timescaledb/overview/faq/faq-products.md | 4 +- .../timescaledb-vs-postgres.md | 12 +- .../release-notes/changes-in-timescaledb-2.md | 30 +- timescaledb/overview/release-notes/index.md | 33 ++- timescaledb/overview/why-timescaledb.md | 2 +- timescaledb/quick-start/dotnet.md | 43 ++- timescaledb/quick-start/golang.md | 22 +- timescaledb/quick-start/java.md | 40 +-- timescaledb/quick-start/node.md | 26 +- timescaledb/quick-start/python.md | 18 +- timescaledb/quick-start/ruby.md | 38 +-- .../tutorials/analyze-cryptocurrency-data.md | 4 +- .../fetch-and-ingest.md | 40 +-- .../analyze-intraday-stocks/index.md | 18 +- .../analyzing-nft-transactions.md | 266 +++++++++--------- .../analyze-nft-data/nft-schema-ingestion.md | 114 ++++---- .../aws-lambda/3rd-party-api-ingest.md | 4 +- .../aws-lambda/continuous-deployment.md | 4 +- .../grafana/create-dashboard-and-panel.md | 32 +-- .../grafana/geospatial-dashboards.md | 23 +- .../tutorials/grafana/grafana-variables.md | 24 +- timescaledb/tutorials/grafana/setup-alerts.md | 36 +-- .../grafana/visualize-missing-data.md | 6 +- timescaledb/tutorials/index.md | 2 +- .../monitor-django-with-prometheus.md | 6 +- .../tutorials/monitor-mst-with-prometheus.md | 14 +- .../nfl-analytics/advanced-analysis.md | 2 +- timescaledb/tutorials/nfl-analytics/index.md | 4 +- .../nfl-analytics/join-with-relational.md | 55 ++-- .../nfl-analytics/play-visualization.md | 2 +- timescaledb/tutorials/nfl-fantasy-league.md | 164 +++++------ 
timescaledb/tutorials/nyc-taxi-cab.md | 54 ++-- timescaledb/tutorials/promscale/index.md | 2 +- .../promscale/promscale-how-it-works.md | 14 +- .../tutorials/promscale/promscale-install.md | 16 +- .../promscale/promscale-run-queries.md | 10 +- timescaledb/tutorials/sample-datasets.md | 4 +- .../setting-up-mst-for-prometheus.md | 12 +- .../tutorials/simulate-iot-sensor-data.md | 6 +- .../tutorials/telegraf-output-plugin.md | 6 +- timescaledb/tutorials/time-series-forecast.md | 24 +- .../tutorials/visualize-with-tableau.md | 16 +- 175 files changed, 1259 insertions(+), 1258 deletions(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 9f0885f66351..327fd471b9d5 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -110,7 +110,7 @@ this, you *will* end up with merge conflicts. ```bash git checkout latest ``` - You will get a message like this: + You get a message like this: ```bash Switched to branch 'latest' Your branch is up to date with 'origin/latest'. @@ -127,7 +127,7 @@ this, you *will* end up with merge conflicts. ``` 1. If you are continuing work you began earlier, check out the branch that contains your work. For new work, create a new branch. Doing this regularly as - you are working will mean you keep your local copies up to date and avoid + you are working means you keep your local copies up to date and avoid conflicts. You should do it at least every day before you begin work, and again whenever you switch branches. diff --git a/api/add_compression_policy.md b/api/add_compression_policy.md index 8cd45668bb68..85f42554bb52 100644 --- a/api/add_compression_policy.md +++ b/api/add_compression_policy.md @@ -1,6 +1,6 @@ -# add_compression_policy() -Allows you to set a policy by which the system will compress a chunk -automatically in the background after it reaches a given age. +# add_compression_policy() +Allows you to set a policy by which the system compresses a chunk +automatically in the background after it reaches a given age. Note that compression policies can only be created on hypertables that already have compression enabled, e.g., via the [`ALTER TABLE`][compression_alter-table] command @@ -11,7 +11,7 @@ to set `timescaledb.compress` and other configuration parameters. |Name|Type|Description| |---|---|---| | `hypertable` |REGCLASS| Name of the hypertable| -| `compress_after` | INTERVAL or INTEGER | The age after which the policy job will compress chunks| +| `compress_after` | INTERVAL or INTEGER | The age after which the policy job compresses chunks| The `compress_after` parameter should be specified differently depending on the type of the time column of the hypertable: - For hypertables with TIMESTAMP, TIMESTAMPTZ, and DATE time columns: the time interval should be an INTERVAL type. @@ -22,9 +22,9 @@ the [integer_now_func][set_integer_now_func] to be set). |Name|Type|Description| |---|---|---| -| `if_not_exists` | BOOLEAN | Setting to true will cause the command to fail with a warning instead of an error if a compression policy already exists on the hypertable. Defaults to false.| +| `if_not_exists` | BOOLEAN | Setting to true causes the command to fail with a warning instead of an error if a compression policy already exists on the hypertable. Defaults to false.| -### Sample Usage +### Sample Usage Add a policy to compress chunks older than 60 days on the 'cpu' hypertable. 
``` sql
@@ -39,4 +39,4 @@ SELECT add_compression_policy('table_with_bigint_time', BIGINT '600000'); ``` -[compression_alter-table]: /api/:currentVersion:/compression/alter_table_compression/ -[set_integer_now_func]: /hypertable/set_integer_now_func \ No newline at end of file +[set_integer_now_func]: /hypertable/set_integer_now_func
diff --git a/api/add_data_node.md b/api/add_data_node.md index 2b1cd93c86a5..d268882d519b 100644 --- a/api/add_data_node.md +++ b/api/add_data_node.md
@@ -1,17 +1,17 @@ ## add_data_node() Community Add a new data node on the access node to be used by distributed -hypertables. The data node will automatically be used by distributed +hypertables. The data node is automatically used by distributed hypertables that are created after the data node has been added, while existing distributed hypertables require an additional [`attach_data_node`](/distributed-hypertables/attach_data_node). -If the data node already exists, the command will abort with either an +If the data node already exists, the command aborts with either an error or a notice depending on the value of `if_not_exists`. For security purposes, only superusers or users with necessary privileges can add data nodes (see below for details). When adding a -data node, the access node will also try to connect to the data node +data node, the access node also tries to connect to the data node and therefore needs a way to authenticate with it. TimescaleDB currently supports several different such authentication methods for flexibility (including trust, user mappings, password, and certificate
@@ -19,9 +19,9 @@ methods). Please refer to [Setting up Multi-Node TimescaleDB][multinode] for more information about node-to-node authentication. -Unless `bootstrap` is false, the function will attempt to bootstrap +Unless `bootstrap` is false, the function attempts to bootstrap the data node by: -1. Creating the database given in `database` that will serve as the +1. Creating the database given in `database` that serves as the new data node. 2. Loading the TimescaleDB extension in the new database. 3. Setting metadata to make the data node part of the distributed
@@ -43,13 +43,13 @@ after it is added. | Name | Description | |----------------------|-------------------------------------------------------| -| `database` | Database name where remote hypertables will be created. The default is the current database name. | +| `database` | Database name where remote hypertables are created. The default is the current database name. | | `port` | Port to use on the remote data node. The default is the PostgreSQL port used by the access node on which the function is executed. | | `if_not_exists` | Do not fail if the data node already exists. The default is `FALSE`. | | `bootstrap` | Bootstrap the remote data node. The default is `TRUE`. | -| `password` | Password for authenticating with the remote data node during bootstrapping or validation. A password only needs to be provided if the data node requires password authentication and a password for the user does not exist in a local password file on the access node. If password authentication is not used, the specified password will be ignored. | +| `password` | Password for authenticating with the remote data node during bootstrapping or validation. A password only needs to be provided if the data node requires password authentication and a password for the user does not exist in a local password file on the access node.
If password authentication is not used, the specified password is ignored. | -### Returns +### Returns | Column | Description | |---------------------|---------------------------------------------------| @@ -63,7 +63,7 @@ after it is added. #### Errors -An error will be given if: +An error is given if: * The function is executed inside a transaction. * The function is executed in a database that is already a data node. * The data node already exists and `if_not_exists` is `FALSE`. @@ -87,7 +87,7 @@ Note, however, that superuser privileges might still be necessary on the data node in order to bootstrap it, including creating the TimescaleDB extension on the data node unless it is already installed. -### Sample Usage +### Sample Usage Let's assume that you have an existing hypertable `conditions` and want to use `time` as the time partitioning column and `location` as @@ -111,4 +111,4 @@ SELECT create_distributed_hypertable('conditions', 'time', 'location'); ``` Note that this does not offer any performance advantages over using a -regular hypertable, but it can be useful for testing. \ No newline at end of file +regular hypertable, but it can be useful for testing. diff --git a/api/add_dimension.md b/api/add_dimension.md index f7b88f4273dc..cecb172408ad 100644 --- a/api/add_dimension.md +++ b/api/add_dimension.md @@ -97,8 +97,8 @@ queries. | `created` | BOOLEAN | True if the dimension was added, false when `if_not_exists` is true and no dimension was added. | When executing this function, either `number_partitions` or -`chunk_time_interval` must be supplied, which will dictate if the -dimension will use hash or interval partitioning. +`chunk_time_interval` must be supplied, which dictates if the +dimension uses hash or interval partitioning. The `chunk_time_interval` should be specified as follows: diff --git a/api/add_job.md b/api/add_job.md index 404127e63697..53821817e3a8 100644 --- a/api/add_job.md +++ b/api/add_job.md @@ -15,7 +15,7 @@ multiple example actions. |Name|Type|Description| |---|---|---| -| `config` | JSONB | Job-specific configuration (this will be passed to the function when executed) | +| `config` | JSONB | Job-specific configuration (this is passed to the function when executed) | | `initial_start` | TIMESTAMPTZ | Time of first execution of job | | `scheduled` | BOOLEAN | Set to `FALSE` to exclude this job from scheduling. Defaults to `TRUE`. | diff --git a/api/add_reorder_policy.md b/api/add_reorder_policy.md index 29e39d0a336d..c7b7ca2aee48 100644 --- a/api/add_reorder_policy.md +++ b/api/add_reorder_policy.md @@ -1,11 +1,11 @@ -## add_reorder_policy() Community +## add_reorder_policy() Community Create a policy to reorder chunks on a given hypertable index in the background. (See [reorder_chunk](/hypertable/reorder_chunk)). Only one reorder policy may -exist per hypertable. Only chunks that are the 3rd from the most recent will be +exist per hypertable. Only chunks that are the 3rd from the most recent are reordered to avoid reordering chunks that are still being inserted into. - Once a chunk has been reordered by the background worker it will not be + Once a chunk has been reordered by the background worker it is not reordered again. So if one were to insert significant amounts of data in to older chunks that have already been reordered, it might be necessary to manually re-run the [reorder_chunk](/api/latest/hypertable/reorder_chunk) function on older chunks, or to drop @@ -25,14 +25,14 @@ and re-create the policy if many older chunks have been affected. 
|---|---|---| | `if_not_exists` | BOOLEAN | Set to true to avoid throwing an error if the reorder_policy already exists. A notice is issued instead. Defaults to false. | -### Returns +### Returns |Column|Type|Description| |---|---|---| |`job_id`| INTEGER | TimescaleDB background job id created to implement this policy| -### Sample Usage +### Sample Usage ```sql diff --git a/api/add_retention_policy.md b/api/add_retention_policy.md index e5762ea94c1c..0f78da189182 100644 --- a/api/add_retention_policy.md +++ b/api/add_retention_policy.md @@ -1,8 +1,8 @@ -## add_retention_policy() Community +## add_retention_policy() Community Create a policy to drop chunks older than a given interval of a particular hypertable or continuous aggregate on a schedule in the background. (See [drop_chunks](/hypertable/drop_chunks)). -This implements a data retention policy and will remove data on a schedule. Only +This implements a data retention policy and removes data on a schedule. Only one retention policy may exist per hypertable. ### Required Arguments @@ -10,13 +10,13 @@ one retention policy may exist per hypertable. |Name|Type|Description| |---|---|---| | `relation` | REGCLASS | Name of the hypertable or continuous aggregate to create the policy for. | -| `drop_after` | INTERVAL or INTEGER | Chunks fully older than this interval when the policy is run will be dropped| +| `drop_after` | INTERVAL or INTEGER | Chunks fully older than this interval when the policy is run are dropped| -The `drop_after` parameter should be specified differently depending on the +The `drop_after` parameter should be specified differently depending on the type of the time column of the hypertable: -- For hypertables with TIMESTAMP, TIMESTAMPTZ, and DATE time columns: the time +- For hypertables with TIMESTAMP, TIMESTAMPTZ, and DATE time columns: the time interval should be an INTERVAL type. -- For hypertables with integer-based timestamps: the time interval should be an +- For hypertables with integer-based timestamps: the time interval should be an integer type (this requires the [integer_now_func](/hypertable/set_integer_now_func) to be set). ### Optional Arguments @@ -25,20 +25,20 @@ integer type (this requires the [integer_now_func](/hypertable/set_integer_now_f |---|---|---| | `if_not_exists` | BOOLEAN | Set to true to avoid throwing an error if the drop_chunks_policy already exists. A notice is issued instead. Defaults to false. | -### Returns +### Returns |Column|Type|Description| |---|---|---| |`job_id`| INTEGER | TimescaleDB background job id created to implement this policy| -### Sample Usage +### Sample Usage Create a data retention policy to discard chunks greater than 6 months old: ```sql SELECT add_retention_policy('conditions', INTERVAL '6 months'); ``` -Create a data retention policy with an integer-based time column: +Create a data retention policy with an integer-based time column: ```sql SELECT add_retention_policy('conditions', BIGINT '600000'); -``` \ No newline at end of file +``` diff --git a/api/alter_job.md b/api/alter_job.md index 98505bc1b3b7..9d8b8b94ed58 100644 --- a/api/alter_job.md +++ b/api/alter_job.md @@ -1,10 +1,10 @@ -## alter_job() Community +## alter_job() Community Actions scheduled via TimescaleDB's automation framework run periodically in a background worker. You can change the schedule of their execution using `alter_job`. To alter an existing job, you must refer to it by `job_id`. 
The `job_id` which executes a given action and its current schedule can be found -either in the `timescaledb_information.jobs` view, which lists information +either in the `timescaledb_information.jobs` view, which lists information about every scheduled action, as well as in `timescaledb_information.job_stats`. The `job_stats` view additionally contains information about when each job was last run and other useful statistics for deciding what the new schedule should be. @@ -20,28 +20,28 @@ last run and other useful statistics for deciding what the new schedule should b |Name|Type|Description| |---|---|---| | `schedule_interval` | INTERVAL | The interval at which the job runs | -| `max_runtime` | INTERVAL | The maximum amount of time the job will be allowed to run by the background worker scheduler before it is stopped | -| `max_retries` | INTEGER | The number of times the job will be retried should it fail | -| `retry_period` | INTERVAL | The amount of time the scheduler will wait between retries of the job on failure | +| `max_runtime` | INTERVAL | The maximum amount of time the job is allowed to run by the background worker scheduler before it is stopped | +| `max_retries` | INTEGER | The number of times the job is retried should it fail | +| `retry_period` | INTERVAL | The amount of time the scheduler waits between retries of the job on failure | | `scheduled` | BOOLEAN | Set to `FALSE` to exclude this job from being run as background job. | -| `config` | JSONB | Job-specific configuration (this will be passed to the function when executed)| +| `config` | JSONB | Job-specific configuration (this is passed to the function when executed)| | `next_start` | TIMESTAMPTZ | The next time at which to run the job. The job can be paused by setting this value to 'infinity' (and restarted with a value of now()). | -| `if_exists` | BOOLEAN | Set to true to avoid throwing an error if the job does not exist, a notice will be issued instead. Defaults to false. | +| `if_exists` | BOOLEAN | Set to true to avoid throwing an error if the job does not exist, a notice is issued instead. Defaults to false. | -### Returns +### Returns |Column|Type|Description| |---|---|---| | `job_id` | INTEGER | the id of the job being modified | | `schedule_interval` | INTERVAL | The interval at which the job runs | -| `max_runtime` | INTERVAL | The maximum amount of time the job will be allowed to run by the background worker scheduler before it is stopped | -| `max_retries` | INTEGER | The number of times the job will be retried should it fail | -| `retry_period` | INTERVAL | The amount of time the scheduler will wait between retries of the job on failure | -| `scheduled` | BOOLEAN | True if this job will be executed by the TimescaleDB scheduler. | -| `config` | JSONB | Job-specific configuration (this will be passed to the function when executed)| +| `max_runtime` | INTERVAL | The maximum amount of time the job is allowed to run by the background worker scheduler before it is stopped | +| `max_retries` | INTEGER | The number of times the job is retried should it fail | +| `retry_period` | INTERVAL | The amount of time the scheduler waits between retries of the job on failure | +| `scheduled` | BOOLEAN | True if this job is executed by the TimescaleDB scheduler. | +| `config` | JSONB | Job-specific configuration (this is passed to the function when executed)| | `next_start` | TIMESTAMPTZ | The next time at which to run the job. 
| -### Sample Usage +### Sample Usage ```sql SELECT alter_job(1000, schedule_interval => INTERVAL '2 days');
diff --git a/api/alter_table_compression.md b/api/alter_table_compression.md index 5e86fa318107..456b1b24d447 100644 --- a/api/alter_table_compression.md +++ b/api/alter_table_compression.md
@@ -10,12 +10,12 @@ ALTER TABLE SET (timescaledb.compress, timescaledb.compress_orderby timescaledb.compress_segmentby = ' [, ...]' ); ``` -#### Required Options +#### Required Options |Name|Type|Description| |---|---|---| | `timescaledb.compress` | BOOLEAN | Enable/Disable compression | -#### Other Options +#### Other Options |Name|Type|Description| |---|---|---| | `timescaledb.compress_orderby` | TEXT |Order used by compression, specified in the same way as the ORDER BY clause in a SELECT query. The default is the descending order of the hypertable's time column. |
@@ -24,12 +24,12 @@ timescaledb.compress_segmentby = ' [, ...]' ### Parameters |Name|Type|Description| |---|---|---| -| `table_name` | TEXT |Hypertable that will support compression | +| `table_name` | TEXT |Hypertable that supports compression | | `column_name` | TEXT | Column used to order by and/or segment by | -### Sample Usage +### Sample Usage Configure a hypertable that ingests device data to use compression. ```sql ALTER TABLE metrics SET (timescaledb.compress, timescaledb.compress_orderby = 'time DESC', timescaledb.compress_segmentby = 'device_id'); -``` \ No newline at end of file +```
diff --git a/api/approximate_row_count.md b/api/approximate_row_count.md index 66f42c76050f..f79712cf9c8a 100644 --- a/api/approximate_row_count.md +++ b/api/approximate_row_count.md
@@ -1,9 +1,9 @@ -## approximate_row_count() +## approximate_row_count() Get approximate row count for hypertable, distributed hypertable, or regular PostgreSQL table based on catalog estimates. This function support tables with nested inheritance and declarative partitioning. -The accuracy of approximate_row_count depends on the database having up-to-date statistics about the table or hypertable, which are updated by VACUUM, ANALYZE, and a few DDL commands. If you have auto-vacuum configured on your table or hypertable, or changes to the table are relatively infrequent, you might not need to explicitly ANALYZE your table as shown below. Otherwise, if your table statistics are too out-of-date, running this command will update your statistics and yield more accurate approximation results. +The accuracy of approximate_row_count depends on the database having up-to-date statistics about the table or hypertable, which are updated by VACUUM, ANALYZE, and a few DDL commands. If you have auto-vacuum configured on your table or hypertable, or changes to the table are relatively infrequent, you might not need to explicitly ANALYZE your table as shown below. Otherwise, if your table statistics are too out-of-date, running this command updates your statistics and yields more accurate approximation results. ### Required Arguments |Name|Type|Description| |---|---|---| | `relation` | REGCLASS | Hypertable or regular PostgreSQL table to get row count for. | -### Sample Usage +### Sample Usage Get the approximate row count for a single hypertable.
```sql @@ -25,4 +25,4 @@ The expected output: approximate_row_count ---------------------- 240000 -``` \ No newline at end of file +``` diff --git a/api/attach_data_node.md b/api/attach_data_node.md index dad45e701e8e..2d8875b94cc3 100644 --- a/api/attach_data_node.md +++ b/api/attach_data_node.md @@ -3,10 +3,10 @@ Attach a data node to a hypertable. The data node should have been previously created using [`add_data_node`](/distributed-hypertables/add_data_node). -When a distributed hypertable is created it will by default use all +When a distributed hypertable is created, by default it uses all available data nodes for the hypertable, but if a data node is added -*after* a hypertable is created, the data node will not automatically -be used by existing distributed hypertables. +*after* a hypertable is created, the data node is not automatically +used by existing distributed hypertables. If you want a hypertable to use a data node that was created later, you must attach the data node to the hypertable using this @@ -23,7 +23,7 @@ function. | Name | Description | |-------------------|-----------------------------------------------| -| `if_not_attached` | Prevents error if the data node is already attached to the hypertable. A notice will be printed that the data node is attached. Defaults to `FALSE`. | +| `if_not_attached` | Prevents error if the data node is already attached to the hypertable. A notice is printed that the data node is attached. Defaults to `FALSE`. | | `repartition` | Change the partitioning configuration so that all the attached data nodes are used. Defaults to `TRUE`. | ### Returns @@ -34,7 +34,7 @@ function. | `node_hypertable_id` | Hypertable id on the remote data node | | `node_name` | Name of the attached data node | -### Sample Usage +### Sample Usage Attach a data node `dn3` to a distributed hypertable `conditions` previously created with @@ -53,4 +53,4 @@ hypertable_id | node_hypertable_id | node_name You must add a data node to your distributed database first with [`add_data_node`](/distributed-hypertables/add_data_node) first before attaching it. - \ No newline at end of file + diff --git a/api/attach_tablespace.md b/api/attach_tablespace.md index 14823a2305aa..129eada91ffb 100644 --- a/api/attach_tablespace.md +++ b/api/attach_tablespace.md @@ -1,4 +1,4 @@ -## attach_tablespace() +## attach_tablespace() Attach a tablespace to a hypertable and use it to store chunks. A [tablespace][postgres-tablespaces] is a directory on the filesystem @@ -10,11 +10,11 @@ there. Please review the standard PostgreSQL documentation for more TimescaleDB can manage a set of tablespaces for each hypertable, automatically spreading chunks across the set of tablespaces attached -to a hypertable. If a hypertable is hash partitioned, TimescaleDB will -try to place chunks that belong to the same partition in the same +to a hypertable. If a hypertable is hash partitioned, TimescaleDB +tries to place chunks that belong to the same partition in the same tablespace. Changing the set of tablespaces attached to a hypertable may also change the placement behavior. A hypertable with no attached -tablespaces will have its chunks placed in the database's default +tablespaces has its chunks placed in the database's default tablespace. ### Required Arguments @@ -29,7 +29,7 @@ being attached to a hypertable. Once created, tablespaces can be attached to multiple hypertables simultaneously to share the underlying disk storage. 
Associating a regular table with a tablespace using the `TABLESPACE` option to `CREATE TABLE`, prior to calling -`create_hypertable`, will have the same effect as calling +`create_hypertable`, has the same effect as calling `attach_tablespace` immediately following `create_hypertable`. ### Optional Arguments @@ -38,7 +38,7 @@ using the `TABLESPACE` option to `CREATE TABLE`, prior to calling |---|---|---| | `if_not_attached` | BOOLEAN |Set to true to avoid throwing an error if the tablespace is already attached to the table. A notice is issued instead. Defaults to false. | -### Sample Usage +### Sample Usage Attach the tablespace `disk1` to the hypertable `conditions`: diff --git a/api/chunks.md b/api/chunks.md index 9ef2073e28b8..ea9717e41431 100644 --- a/api/chunks.md +++ b/api/chunks.md @@ -1,16 +1,16 @@ -## timescaledb_information.chunks +## timescaledb_information.chunks Get metadata about the chunks of hypertables. This view shows metadata for the chunk's primary time-based dimension. -For information about a hypertable's secondary dimensions, +For information about a hypertable's secondary dimensions, the [dimensions view](/informational-views/dimensions/) should be used instead. If the chunk's primary dimension is of a time datatype, `range_start` and `range_end` are set. Otherwise, if the primary dimension type is integer based, `range_start_integer` and `range_end_integer` are set. -### Available Columns +### Available Columns |Name|Type|Description| |---|---|---| @@ -24,17 +24,17 @@ If the chunk's primary dimension is of a time datatype, `range_start` and | `range_end` | TIMESTAMP WITH TIME ZONE | End of the range for the chunk's dimension | | `range_start_integer` | BIGINT | Start of the range for the chunk's dimension, if the dimension type is integer based | | `range_end_integer` | BIGINT | End of the range for the chunk's dimension, if the dimension type is integer based | -| `is_compressed` | BOOLEAN | Is the data in the chunk compressed?

Note that for distributed hypertables, this is the cached compression status of the chunk on the access node. The cached status on the access node and data node will not be in sync in some scenarios. For example, if a user compresses or decompresses the chunk on the data node instead of the access node, or sets up compression policies directly on data nodes.

Use `chunk_compression_stats()` function to get real-time compression status for distributed chunks.| +| `is_compressed` | BOOLEAN | Is the data in the chunk compressed?

Note that for distributed hypertables, this is the cached compression status of the chunk on the access node. The cached status on the access node can be out of sync with the actual status on the data node in some scenarios. For example, if a user compresses or decompresses the chunk on the data node instead of the access node, or sets up compression policies directly on data nodes.

Use the `chunk_compression_stats()` function to get real-time compression status for distributed chunks.|
| | `time_partitioning_func` | REGCLASS | Function to convert incompatible primary time column values to compatible ones. The function must be `IMMUTABLE`. | | `replication_factor` | INTEGER | The number of data nodes to which the same data is written to. This is done by creating chunk copies on this amount of data nodes. Must be >= 1; default is 1. Read [the best practices](/timescaledb/latest/how-to-guides/hypertables/best-practices/) before changing the default. | | `data_nodes` | ARRAY | The set of data nodes used for the distributed hypertable. If not present, defaults to all data nodes known by the access node (the node on which the distributed hypertable is created). | @@ -39,7 +39,7 @@ when creating distributed hypertables. ### Sample Usage -Create a table `conditions` which will be partitioned across data +Create a table `conditions` which is partitioned across data nodes by the 'location' column. Note that the number of space partitions is automatically equal to the number of data nodes assigned to this hypertable (all configured data nodes in this case, as @@ -59,17 +59,17 @@ SELECT create_distributed_hypertable('conditions', 'time', 'location', **Space partitions:** As opposed to the normal [`create_hypertable` best practices][create-hypertable], space partitions are highly recommended for distributed hypertables. -Incoming data will be divided among data nodes based upon the space +Incoming data is divided among data nodes based upon the space partition (the first one if multiple space partitions have been defined). If there is no space partition, all the data for each time -slice will be written to a single data node. +slice is written to a single data node. **Time intervals:** Follow the same guideline in setting the `chunk_time_interval` as with [`create_hypertable`][create-hypertable], bearing in mind that the calculation needs to be based on the memory capacity of the data nodes. However, one additional thing to consider, assuming space partitioning is being used, is that the -hypertable will be evenly distributed across the data nodes, allowing +hypertable is evenly distributed across the data nodes, allowing a larger time interval. For example, assume you are ingesting 10GB of data per day and you @@ -80,17 +80,17 @@ most recent chunks). If space partitioning is not being used, the `chunk_time_interval` should be the same as the non-distributed case, as all of the incoming -data will be handled by a single node. +data is handled by a single node. **Replication factor:** The hypertable's `replication_factor` defines to how -many data nodes a newly created chunk will be replicated. That is, a chunk -with a `replication_factor` of three will exist on three separate data nodes, -and rows written to that chunk will be inserted (as part of a two-phase +many data nodes a newly created chunk is replicated. That is, a chunk +with a `replication_factor` of three exists on three separate data nodes, +and rows written to that chunk are inserted (as part of a two-phase commit protocol) to all three chunk copies. For chunks replicated more -than once, if a data node fails or is removed, no data will be lost, and writes +than once, if a data node fails or is removed, no data is lost, and writes can continue to succeed on the remaining chunk copies. However, the chunks -present on the lost data node will now be under-replicated. Currently, it is -not possible to restore under-replicated chunks, although this limitation will +present on the lost data node are now under-replicated. 
Currently, it is +not possible to restore under-replicated chunks, although this limitation might be removed in a future release. To avoid such inconsistency, we do not yet recommend using `replication_factor` > 1, and instead rely on physical replication of each data node if such fault-tolerance is required. diff --git a/api/create_hypertable.md b/api/create_hypertable.md index 6aa132d38e24..c2419697bcfb 100644 --- a/api/create_hypertable.md +++ b/api/create_hypertable.md @@ -1,4 +1,4 @@ -# create_hypertable() +# create_hypertable() Creates a TimescaleDB hypertable from a PostgreSQL table (replacing the latter), partitioned on time and with the option to partition on @@ -31,12 +31,12 @@ still work on the resulting hypertable. | `partitioning_func` | REGCLASS | The function to use for calculating a value's partition.| | `associated_schema_name` | REGCLASS | Name of the schema for internal hypertable tables. Default is "_timescaledb_internal". | | `associated_table_prefix` | TEXT | Prefix for internal hypertable chunk names. Default is "_hyper". | -| `migrate_data` | BOOLEAN | Set to TRUE to migrate any existing data from the `relation` table to chunks in the new hypertable. A non-empty table will generate an error without this option. Large tables may take significant time to migrate. Defaults to FALSE. | +| `migrate_data` | BOOLEAN | Set to TRUE to migrate any existing data from the `relation` table to chunks in the new hypertable. A non-empty table generates an error without this option. Large tables may take significant time to migrate. Defaults to FALSE. | | `time_partitioning_func` | REGCLASS | Function to convert incompatible primary time column values to compatible ones. The function must be `IMMUTABLE`. | -| `replication_factor` | INTEGER | If set to 1 or greater, will create a distributed hypertable. Default is NULL. When creating a distributed hypertable, consider using [`create_distributed_hypertable`](/distributed-hypertables/create_distributed_hypertable) in place of `create_hypertable`. | -| `data_nodes` | ARRAY | This is the set of data nodes that will be used for this table if it is distributed. This has no impact on non-distributed hypertables. If no data nodes are specified, a distributed hypertable will use all data nodes known by this instance. | +| `replication_factor` | INTEGER | If set to 1 or greater, creates a distributed hypertable. Default is NULL. When creating a distributed hypertable, consider using [`create_distributed_hypertable`](/distributed-hypertables/create_distributed_hypertable) in place of `create_hypertable`. | +| `data_nodes` | ARRAY | This is the set of data nodes that are used for this table if it is distributed. This has no impact on non-distributed hypertables. If no data nodes are specified, a distributed hypertable uses all data nodes known by this instance. | -### Returns +### Returns |Column|Type|Description| |---|---|---| @@ -46,32 +46,32 @@ still work on the resulting hypertable. | `created` | BOOLEAN | TRUE if the hypertable was created, FALSE when `if_not_exists` is true and no hypertable was created. | - If you use `SELECT * FROM create_hypertable(...)` you will get the return value formatted as a table with column headings. + If you use `SELECT * FROM create_hypertable(...)` you get the return value formatted as a table with column headings. The use of the `migrate_data` argument to convert a non-empty table can lock the table for a significant amount of time, depending on how much data is -in the table. 
It can also run into deadlock if foreign key constraints exist to +in the table. It can also run into deadlock if foreign key constraints exist to other tables. -If you would like finer control over index formation and other aspects of your +If you would like finer control over index formation and other aspects of your hypertable, [follow these migration instructions instead](/timescaledb/latest/how-to-guides/migrate-data). -When converting a normal SQL table to a hypertable, pay attention to how you handle -constraints. A hypertable can contain foreign keys to normal SQL table columns, -but the reverse is not allowed. UNIQUE and PRIMARY constraints must include the +When converting a normal SQL table to a hypertable, pay attention to how you handle +constraints. A hypertable can contain foreign keys to normal SQL table columns, +but the reverse is not allowed. UNIQUE and PRIMARY constraints must include the partitioning key. The deadlock is likely to happen when concurrent transactions simultaneously try -to insert data into tables that are referenced in the foreign key constraints +to insert data into tables that are referenced in the foreign key constraints and into the converting table itself. The deadlock can be prevented by manually -obtaining `SHARE ROW EXCLUSIVE` lock on the referenced tables before calling -`create_hypertable` in the same transaction, see +obtaining `SHARE ROW EXCLUSIVE` lock on the referenced tables before calling +`create_hypertable` in the same transaction, see [PostgreSQL documentation](https://www.postgresql.org/docs/current/sql-lock.html) for the syntax. -#### Units +#### Units The 'time' column supports the following data types: @@ -81,13 +81,13 @@ The 'time' column supports the following data types: | DATE | | Integer (SMALLINT, INT, BIGINT) | - The type flexibility of the 'time' column allows the use -of non-time-based values as the primary chunk partitioning column, as long as + The type flexibility of the 'time' column allows the use +of non-time-based values as the primary chunk partitioning column, as long as those values can increment. - For incompatible data types (e.g. `jsonb`) you can -specify a function to the `time_partitioning_func` argument which can extract + For incompatible data types (e.g. `jsonb`) you can +specify a function to the `time_partitioning_func` argument which can extract a compatible data type @@ -121,11 +121,11 @@ the dimension's key space, which is then divided across the partitions. The time column in `create_hypertable` must be defined as `NOT NULL`. If this is not already specified on table creation, - `create_hypertable` will automatically add this constraint on the + `create_hypertable` automatically adds this constraint on the table when it is executed. -### Sample Usage +### Sample Usage Convert table `conditions` to hypertable with just time partitioning on column `time`: ```sql @@ -183,7 +183,7 @@ SELECT create_hypertable('events', 'event', time_partitioning_func => 'event_sta ``` -#### Best Practices +#### Best Practices One of the most common questions users of TimescaleDB have revolves around configuring `chunk_time_interval`. @@ -194,14 +194,14 @@ manually-set intervals, users should specify a `chunk_time_interval` when creating their hypertable (the default value is 1 week). The interval used for new chunks can be changed by calling [`set_chunk_time_interval()`](/hypertable/set_chunk_time_interval). 
-The key property of choosing the time interval is that the chunk (including indexes) +The key property of choosing the time interval is that the chunk (including indexes) belonging to the most recent interval (or chunks if using space partitions) fit into memory. As such, we typically recommend setting the interval so that these chunk(s) comprise no more than 25% of main memory. -Make sure that you are planning for recent chunks from _all_ active hypertables +Make sure that you are planning for recent chunks from _all_ active hypertables to fit into 25% of main memory, rather than 25% per hypertable. @@ -227,8 +227,8 @@ function. **Space partitions:** In most cases, it is advised for users not to use -space partitions. However, if you create a distributed hypertable, it is -important to create space partitioning, see -[create_distributed_hypertable](/distributed-hypertables/create_distributed_hypertable). +space partitions. However, if you create a distributed hypertable, it is +important to create space partitioning, see +[create_distributed_hypertable](/distributed-hypertables/create_distributed_hypertable). The rare cases in which space partitions may be useful for non-distributed hypertables are described in the [add_dimension](/hypertable/add_dimension) section. diff --git a/api/create_index.md b/api/create_index.md index 16ff3087c3b6..646bdd0d353a 100644 --- a/api/create_index.md +++ b/api/create_index.md @@ -1,4 +1,4 @@ -## CREATE INDEX (Transaction Per Chunk) +## CREATE INDEX (Transaction Per Chunk) ```SQL CREATE INDEX ... WITH (timescaledb.transaction_per_chunk, ...); @@ -14,20 +14,20 @@ if a regular `CREATE INDEX` were called on that chunk, however other chunks are completely un-blocked. - This version of `CREATE INDEX` can be used as an alternative to + This version of `CREATE INDEX` can be used as an alternative to `CREATE INDEX CONCURRENTLY`, which is not currently supported on hypertables. -If the operation fails partway through, indexes may not be created on all +If the operation fails partway through, indexes may not be created on all hypertable chunks. If this occurs, the index on the root table of the hypertable -will be marked as invalid (this can be seen by running `\d+` on the hypertable). -The index will still work, and will be created on new chunks, but if you +is marked as invalid (this can be seen by running `\d+` on the hypertable). +The index still works, and is created on new chunks, but if you wish to ensure _all_ chunks have a copy of the index, drop and recreate it. -### Sample Usage +### Sample Usage Anonymous index ```SQL @@ -37,4 +37,4 @@ Other index methods ```SQL CREATE INDEX ON conditions(time, location) USING brin WITH (timescaledb.transaction_per_chunk); -``` \ No newline at end of file +``` diff --git a/api/create_materialized_view.md b/api/create_materialized_view.md index b35fadf03cc4..3717398fb2da 100644 --- a/api/create_materialized_view.md +++ b/api/create_materialized_view.md @@ -57,7 +57,7 @@ the hypertable's time column, and all aggregates must be parallelizable. #### Notes -- The view will be automatically refreshed (as outlined under +- The view is automatically refreshed (as outlined under [`refresh_continuous_aggregate`](/continuous-aggregates/refresh_continuous_aggregate/)) unless `WITH NO DATA` is given (`WITH DATA` is the default). 
- The `SELECT` query should be of the form specified in the syntax above, which is discussed in diff --git a/api/decompress_chunk.md b/api/decompress_chunk.md index 3ba7d18df85c..e040d51c8fa9 100644 --- a/api/decompress_chunk.md +++ b/api/decompress_chunk.md @@ -1,13 +1,13 @@ -## decompress_chunk() Community +## decompress_chunk() Community If you need to modify or add data to a chunk that has already been -compressed, you will need to decompress the chunk first. This is especially +compressed, you need to decompress the chunk first. This is especially useful for backfilling old data. Prior to decompressing chunks for the purpose of data backfill or updating you should first stop any compression policy that is active on the hypertable you plan to perform this operation on. Once the update and/or backfill is complete simply turn the policy back on -and the system will recompress your chunks. +and the system recompresses your chunks. ### Required Arguments @@ -19,11 +19,11 @@ and the system will recompress your chunks. |Name|Type|Description| |---|---|---| -| `if_compressed` | BOOLEAN | Setting to true will skip chunks that are not compressed. Defaults to false.| +| `if_compressed` | BOOLEAN | Setting to true skips chunks that are not compressed. Defaults to false.| -### Sample Usage +### Sample Usage Decompress a single chunk ``` sql SELECT decompress_chunk('_timescaledb_internal._hyper_2_2_chunk'); -``` \ No newline at end of file +``` diff --git a/api/delete_data_node.md b/api/delete_data_node.md index c4aaf82b9b81..0871fb0a4ab8 100644 --- a/api/delete_data_node.md +++ b/api/delete_data_node.md @@ -24,7 +24,7 @@ but is no longer synchronized. #### Errors -An error will be generated if the data node cannot be detached from +An error is generated if the data node cannot be detached from all attached hypertables. ### Required Arguments @@ -41,11 +41,11 @@ all attached hypertables. | `force` | BOOLEAN | Force removal of data nodes from hypertables unless that would result in data loss. Defaults to false. | | `repartition` | BOOLEAN | Make the number of space partitions equal to the new number of data nodes (if such partitioning exists). This ensures that the remaining data nodes are used evenly. Defaults to true. | -### Returns +### Returns A boolean indicating if the operation was successful or not. -### Sample Usage +### Sample Usage To delete a data node named `dn1`: ```sql diff --git a/api/delete_job.md b/api/delete_job.md index 8eb64f8721cc..96a1876a669f 100644 --- a/api/delete_job.md +++ b/api/delete_job.md @@ -1,9 +1,9 @@ -## delete_job() Community +## delete_job() Community Delete a job registered with the automation framework. This works for user-defined actions as well as policies. -If the job is currently running, the process will be terminated. +If the job is currently running, the process is terminated. ### Required Arguments @@ -11,7 +11,7 @@ If the job is currently running, the process will be terminated. |---|---|---| |`job_id`| INTEGER | TimescaleDB background job id | -### Sample Usage +### Sample Usage ```sql SELECT delete_job(1000); diff --git a/api/detach_data_node.md b/api/detach_data_node.md index 2fbb2a2475b5..21f636bb1bfe 100644 --- a/api/detach_data_node.md +++ b/api/detach_data_node.md @@ -19,9 +19,9 @@ partition across | Name | Type|Description | |---------------|---|-------------------------------------| -| `hypertable` | REGCLASS | Name of the distributed hypertable where the data node should be detached. 
If NULL, the data node will be detached from all hypertables. | +| `hypertable` | REGCLASS | Name of the distributed hypertable where the data node should be detached. If NULL, the data node is detached from all hypertables. | | `if_attached` | BOOLEAN | Prevent error if the data node is not attached. Defaults to false. | -| `force` | BOOLEAN | Force detach of the data node even if that means that the replication factor is reduced below what was set. Note that it will never be allowed to reduce the replication factor below 1 since that would cause data loss. | +| `force` | BOOLEAN | Force detach of the data node even if that means that the replication factor is reduced below what was set. Note that it is never allowed to reduce the replication factor below 1 since that would cause data loss. | | `repartition` | BOOLEAN | Make the number of space partitions equal to the new number of data nodes (if such partitioning exists). This ensures that the remaining data nodes are used evenly. Defaults to true. | ### Returns
@@ -48,7 +48,7 @@ up with under-replicated chunks. The only safe way to detach a data node is to first safely delete any data on it or replicate it to another data node. -### Sample Usage +### Sample Usage Detach data node `dn3` from `conditions`:
diff --git a/api/detach_tablespace.md b/api/detach_tablespace.md index ccc53b7f3bfd..23bd8d317964 100644 --- a/api/detach_tablespace.md +++ b/api/detach_tablespace.md
@@ -1,11 +1,11 @@ -## detach_tablespace() +## detach_tablespace() Detach a tablespace from one or more hypertables. This _only_ means -that _new_ chunks will not be placed on the detached tablespace. This +that _new_ chunks are not placed on the detached tablespace. This is useful, for instance, when a tablespace is running low on disk space and one would like to prevent new chunks from being created in the tablespace. The detached tablespace itself and any existing chunks -with data on it will remain unchanged and will continue to work as +with data on it remain unchanged and continue to work as before, including being available for queries. Note that newly inserted data rows may still be inserted into an existing chunk on the detached tablespace since existing data is not cleared from a detached
@@ -19,7 +19,7 @@ again be considered for chunk placement. | `tablespace` | TEXT | Tablespace to detach.| When giving only the tablespace name as argument, the given tablespace -will be detached from all hypertables that the current role has the +is detached from all hypertables that the current role has the appropriate permissions for. Therefore, without proper permissions, the tablespace may still receive new chunks after this command is issued.
@@ -33,11 +33,11 @@ is issued. | `if_attached` | BOOLEAN | Set to true to avoid throwing an error if the tablespace is not attached to the given table. A notice is issued instead. Defaults to false. | -When specifying a specific hypertable, the tablespace will only be +When specifying a specific hypertable, the tablespace is only detached from the given hypertable and thus may remain attached to other hypertables.
-### Sample Usage +### Sample Usage Detach the tablespace `disk1` from the hypertable `conditions`: @@ -51,4 +51,4 @@ user has permissions for: ```sql SELECT detach_tablespace('disk1'); -``` \ No newline at end of file +``` diff --git a/api/detach_tablespaces.md b/api/detach_tablespaces.md index 7cf9d1f17df9..8f495539e4fc 100644 --- a/api/detach_tablespaces.md +++ b/api/detach_tablespaces.md @@ -1,8 +1,8 @@ -## detach_tablespaces() +## detach_tablespaces() Detach all tablespaces from a hypertable. After issuing this command -on a hypertable, it will no longer have any tablespaces attached to -it. New chunks will instead be placed in the database's default +on a hypertable, it no longer has any tablespaces attached to +it. New chunks are instead placed in the database's default tablespace. ### Required Arguments @@ -11,10 +11,10 @@ tablespace. |---|---|---| | `hypertable` | REGCLASS | Hypertable to detach a the tablespace from.| -### Sample Usage +### Sample Usage Detach all tablespaces from the hypertable `conditions`: ```sql SELECT detach_tablespaces('conditions'); -``` \ No newline at end of file +``` diff --git a/api/dimensions.md b/api/dimensions.md index 20fb9674d264..0d621f96eeea 100644 --- a/api/dimensions.md +++ b/api/dimensions.md @@ -1,22 +1,22 @@ -## timescaledb_information.dimensions +## timescaledb_information.dimensions -Get metadata about the dimensions of hypertables, returning one row of metadata -for each dimension of a hypertable. For a time-and-space-partitioned -hypertable, for example, two rows of metadata will be returned for the +Get metadata about the dimensions of hypertables, returning one row of metadata +for each dimension of a hypertable. For a time-and-space-partitioned +hypertable, for example, two rows of metadata are returned for the hypertable. -A time-based dimension column has either an integer datatype -(bigint, integer, smallint) or a time related datatype +A time-based dimension column has either an integer datatype +(bigint, integer, smallint) or a time related datatype (timestamptz, timestamp, date). The `time_interval` column is defined for hypertables that use time datatypes. Alternatively, for hypertables that use integer datatypes, the `integer_interval` and `integer_now_func` columns are defined. -For space based dimensions, metadata is returned that specifies their number -of `num_partitions`. The `time_interval` and `integer_interval` columns are +For space based dimensions, metadata is returned that specifies their number +of `num_partitions`. The `time_interval` and `integer_interval` columns are not applicable for space based dimensions. - -### Available Columns + +### Available Columns |Name|Type|Description| |---|---|---| @@ -31,7 +31,7 @@ not applicable for space based dimensions. | `integer_now_func` | TEXT | integer_now function for primary dimension if the column type is integer based datatype| | `num_partitions` | SMALLINT | Number of partitions for the dimension | -### Sample Usage +### Sample Usage Get information about the dimensions of hypertables. 
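The expanded-display records shown below are the kind of output you get from querying the view directly; the exact statement is not shown in this diff, but it is along these lines:

```sql
-- One row per dimension: time dimensions populate time_interval,
-- space dimensions populate num_partitions
SELECT * FROM timescaledb_information.dimensions;
```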
@@ -51,9 +51,9 @@ column_name | time column_type | timestamp with time zone dimension_type | Time time_interval | 7 days -integer_interval | -integer_now_func | -num_partitions | +integer_interval | +integer_now_func | +num_partitions | -[ RECORD 2 ]-----+------------------------- hypertable_schema | public hypertable_name | dist_table @@ -61,9 +61,9 @@ dimension_number | 2 column_name | device column_type | integer dimension_type | Space -time_interval | -integer_interval | -integer_now_func | +time_interval | +integer_interval | +integer_now_func | num_partitions | 2 ``` @@ -83,9 +83,9 @@ column_name | a_col column_type | date dimension_type | Time time_interval | 7 days -integer_interval | -integer_now_func | -num_partitions | +integer_interval | +integer_now_func | +num_partitions | -[ RECORD 2 ]-----+---------------------------- hypertable_schema | public hypertable_name | hyper_2dim @@ -94,7 +94,7 @@ column_name | b_col column_type | timestamp without time zone dimension_type | Time time_interval | 7 days -integer_interval | -integer_now_func | -num_partitions | -``` \ No newline at end of file +integer_interval | +integer_now_func | +num_partitions | +``` diff --git a/api/distributed-hypertables.md b/api/distributed-hypertables.md index 93da9bc31c42..97c92d9c037a 100644 --- a/api/distributed-hypertables.md +++ b/api/distributed-hypertables.md @@ -1,13 +1,13 @@ # Distributed Hypertables Community Distributed hypertables are an extention of regular hypertables, available when -using a [multi-node installation][getting-started-multi-node] of TimescaleDB. -Distributed hypertables provide the ability to store data chunks across multiple +using a [multi-node installation][getting-started-multi-node] of TimescaleDB. +Distributed hypertables provide the ability to store data chunks across multiple data nodes for better scale-out performance. Most management APIs used with regular hypertable chunks also work with distributed -hypertables as documented in this section. You will also find a number of new APIs -specifically dealing with data nodes and a special API for executing SQL commands +hypertables as documented in this section. There are a number of new APIs for +specifically dealing with data nodes and a special API for executing SQL commands on data nodes. diff --git a/api/distributed_exec.md b/api/distributed_exec.md index 9a0e03429811..617ac643c98a 100644 --- a/api/distributed_exec.md +++ b/api/distributed_exec.md @@ -6,9 +6,9 @@ case is to create the roles and permissions needed in a distributed database. The procedure can run distributed commands transactionally, so a command -is executed either everywhere or nowhere. However, not all SQL commands can run in a -transaction. This can be toggled with the argument `transactional`. Note if the execution -is not transactional, a failure on one of the data node will require manual dealing with +is executed either everywhere or nowhere. However, not all SQL commands can run in a +transaction. This can be toggled with the argument `transactional`. Note if the execution +is not transactional, a failure on one of the data node requires manual dealing with any introduced inconsistency. Note that the command is _not_ executed on the access node itself and @@ -27,7 +27,7 @@ it is not possible to chain multiple commands together in one call. | `node_list` | ARRAY | An array of data nodes where the command should be executed. Defaults to all data nodes if not specified. 
| | `transactional` | BOOLEAN | Allows to specify if the execution of the statement should be transactional or not. Defaults to TRUE. | -### Sample Usage +### Sample Usage Create the role `testrole` across all data nodes in a distributed database: @@ -47,4 +47,4 @@ Create new databases `dist_database` on data nodes, which requires to set `trans ```sql CALL distributed_exec('CREATE DATABASE dist_database', transactional => FALSE); -``` \ No newline at end of file +``` diff --git a/api/drop_chunks.md b/api/drop_chunks.md index 197dba08263d..11c3edee22e9 100644 --- a/api/drop_chunks.md +++ b/api/drop_chunks.md @@ -1,4 +1,4 @@ -## drop_chunks() +## drop_chunks() Removes data chunks whose time range falls completely before (or after) a specified time. Shows a list of the chunks that were @@ -26,12 +26,12 @@ specified one. |Name|Type|Description| |---|---|---| | `newer_than` | INTERVAL | Specification of cut-off point where any full chunks newer than this timestamp should be removed. | -| `verbose` | BOOLEAN | Setting to true will display messages about the progress of the reorder command. Defaults to false.| +| `verbose` | BOOLEAN | Setting to true displays messages about the progress of the reorder command. Defaults to false.| The `older_than` and `newer_than` parameters can be specified in two ways: - **interval type:** The cut-off point is computed as `now() - - older_than` and similarly `now() - newer_than`. An error will be + older_than` and similarly `now() - newer_than`. An error is returned if an INTERVAL is supplied and the time column is not one of a `TIMESTAMP`, `TIMESTAMPTZ`, or `DATE`. @@ -48,12 +48,12 @@ in the future (i.e., erroneous entries), use a timestamp. When both arguments are used, the function returns the intersection of the resulting two ranges. For example, -specifying `newer_than => 4 months` and `older_than => 3 months` will drop all full chunks that are between 3 and -4 months old. Similarly, specifying `newer_than => '2017-01-01'` and `older_than => '2017-02-01'` will drop +specifying `newer_than => 4 months` and `older_than => 3 months` drops all full chunks that are between 3 and +4 months old. Similarly, specifying `newer_than => '2017-01-01'` and `older_than => '2017-02-01'` drops all full chunks between '2017-01-01' and '2017-02-01'. Specifying parameters that do not result in an overlapping -intersection between two ranges will result in an error. +intersection between two ranges results in an error. -### Sample Usage +### Sample Usage Drop all chunks from hypertable `conditions` older than 3 months: ```sql diff --git a/api/error.md b/api/error.md index 4aca912df381..56d7fe52099f 100644 --- a/api/error.md +++ b/api/error.md @@ -4,7 +4,7 @@ error(sketch UddSketch) RETURNS DOUBLE PRECISION ``` -This returns the maximum relative error that a percentile estimate will have +This returns the maximum relative error that a percentile estimate has relative to the correct value. This means the actual value falls in the range defined by `approx_percentile(sketch) +/- approx_percentile(sketch)*error(sketch)`. diff --git a/api/first.md b/api/first.md index 981b7fa2e5c0..14f8d5c18903 100644 --- a/api/first.md +++ b/api/first.md @@ -1,7 +1,7 @@ ## first() The `first` aggregate allows you to get the value of one column -as ordered by another. For example, `first(temperature, time)` will return the +as ordered by another. For example, `first(temperature, time)` returns the earliest temperature value based on time within an aggregate group. 
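For instance, a per-device query might look like the following sketch; the `metrics` table and its `device_id`, `temperature`, and `time` columns are assumed for illustration:

```sql
-- Earliest temperature for each device, ordered by the time column
SELECT device_id, first(temperature, time)
FROM metrics
GROUP BY device_id;
```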
### Required Arguments @@ -25,5 +25,5 @@ GROUP BY device_id; perform a sequential scan through their groups. They are primarily used for ordered selection within a `GROUP BY` aggregate, and not as an alternative to an `ORDER BY time DESC LIMIT 1` clause to find the - latest value (which will use indexes). + latest value (which uses indexes). diff --git a/api/index.md b/api/index.md index 71346661c38c..793e63e791aa 100644 --- a/api/index.md +++ b/api/index.md @@ -4,7 +4,7 @@ TimescaleDB is an open-source relational database for time-series data. We focus high volume time-series data fast and efficient on top of the solid foundation of PostgreSQL. -To manage the various aspects of TimescaleDB, you will need to become familiar +To manage the various aspects of TimescaleDB, you need to become familiar with the special SQL functions and VIEWs that we provide, as documented within the API reference. @@ -13,7 +13,7 @@ and our comprehensive documentation to help you become more familiar with what makes TimescaleDB tick and how you can be empowered to find new insights in your time-series data. To dig in further, consider reading: - * [Core Concepts][core-concepts]: This section describes the architecture of + * [Core Concepts][core-concepts]: This section describes the architecture of TimescaleDB and outlines the details of each major feature area. * [How-to Guides][how-to-guides]: A broad set of how-to guides grouped by TimescaleDB features aimed at helping you accomplish a specific task. diff --git a/api/jobs.md b/api/jobs.md index 093f4295938c..1b2d707cf55d 100644 --- a/api/jobs.md +++ b/api/jobs.md @@ -1,16 +1,16 @@ -## timescaledb_information.jobs -Shows information about all jobs registered with the automation framework. +## timescaledb_information.jobs +Shows information about all jobs registered with the automation framework. -### Available Columns +### Available Columns |Name|Type|Description| |---|---|---| |`job_id` | INTEGER | The id of the background job | |`application_name` | TEXT | Name of the policy or user defined action | |`schedule_interval` | INTERVAL | The interval at which the job runs | -|`max_runtime` | INTERVAL | The maximum amount of time the job will be allowed to run by the background worker scheduler before it is stopped | -|`max_retries` | INTEGER | The number of times the job will be retried should it fail | -|`retry_period` | INTERVAL | The amount of time the scheduler will wait between retries of the job on failure | +|`max_runtime` | INTERVAL | The maximum amount of time the job is allowed to run by the background worker scheduler before it is stopped | +|`max_retries` | INTEGER | The number of times the job is retried should it fail | +|`retry_period` | INTERVAL | The amount of time the scheduler waits between retries of the job on failure | |`proc_schema` | TEXT | Schema name of the function or procedure executed by the job | |`proc_name` | TEXT | Name of the function or procedure executed by the job | |`owner` | TEXT | Owner of the job | @@ -20,7 +20,7 @@ Shows information about all jobs registered with the automation framework. |`hypertable_schema` | TEXT | Schema name of the hypertable. NULL, if this is a user defined action.| |`hypertable_name` | TEXT | Table name of the hypertable. NULL, if this is a user defined action. | -### Sample Usage +### Sample Usage Get information about jobs. 
```sql @@ -37,7 +37,7 @@ proc_schema | _timescaledb_internal proc_name | policy_refresh_continuous_aggregate owner | postgres scheduled | t -config | {"start_offset": "20 days", "end_offset": "10 +config | {"start_offset": "20 days", "end_offset": "10 days", "mat_hypertable_id": 2} next_start | 2020-10-02 12:38:07.014042-04 hypertable_schema | _timescaledb_internal @@ -84,8 +84,8 @@ owner | postgres scheduled | t config | {"type": "function"} next_start | 2020-10-02 14:45:33.339885-04 -hypertable_schema | -hypertable_name | +hypertable_schema | +hypertable_name | -[ RECORD 2 ]-----+------------------------------ job_id | 1004 application_name | User-Defined Action [1004] @@ -99,6 +99,6 @@ owner | postgres scheduled | t config | {"type": "function"} next_start | 2020-10-02 14:45:33.353733-04 -hypertable_schema | -hypertable_name | -``` \ No newline at end of file +hypertable_schema | +hypertable_name | +``` diff --git a/api/last.md b/api/last.md index fa449f53e691..1e6790e05922 100644 --- a/api/last.md +++ b/api/last.md @@ -1,7 +1,7 @@ ## last() The `last` aggregate allows you to get the value of one column -as ordered by another. For example, `last(temperature, time)` will return the +as ordered by another. For example, `last(temperature, time)` returns the latest temperature value based on time within an aggregate group. ### Required Arguments @@ -28,5 +28,5 @@ ORDER BY interval DESC; perform a sequential scan through their groups. They are primarily used for ordered selection within a `GROUP BY` aggregate, and not as an alternative to an `ORDER BY time DESC LIMIT 1` clause to find the - latest value (which will use indexes). + latest value (which uses indexes). diff --git a/api/move_chunk.md b/api/move_chunk.md index 876c9cc6c0cb..e6c073b3977d 100644 --- a/api/move_chunk.md +++ b/api/move_chunk.md @@ -24,7 +24,7 @@ Tiering][using-data-tiering] documentation. |Name|Type|Description| |-|-|-| |`reorder_index`|REGCLASS|The name of the index (on either the hypertable or chunk) to order by| -|`verbose`|BOOLEAN|Setting to true will display messages about the progress of the move_chunk command. Defaults to false.| +|`verbose`|BOOLEAN|Setting to true displays messages about the progress of the move_chunk command. Defaults to false.| ### Sample usage diff --git a/api/refresh_continuous_aggregate.md b/api/refresh_continuous_aggregate.md index d16d98edfab1..67dfa2f1deb3 100644 --- a/api/refresh_continuous_aggregate.md +++ b/api/refresh_continuous_aggregate.md @@ -7,10 +7,10 @@ A continuous aggregate materializes aggregates in time buckets (e.g., min, max, average over 1 day worth of data), as determined by the `time_bucket` interval specified when the continuous aggregate was created. Therefore, when refreshing the continuous aggregate, only -buckets that completely fit within the refresh window will be +buckets that completely fit within the refresh window are refreshed. In other words, it is not possible to compute the aggregate -over, for example, half a bucket. Therefore, any buckets that do no -fit within the given refresh window will be excluded. +over, for example, half a bucket. Therefore, any buckets that do not +fit within the given refresh window are excluded. 
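For example, refreshing one month of daily buckets might look like the following sketch; the continuous aggregate name `conditions_summary_daily` and the window bounds are assumptions for illustration:

```sql
-- Only buckets that lie entirely inside the window are materialized
CALL refresh_continuous_aggregate('conditions_summary_daily', '2021-05-01', '2021-06-01');
```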
The function expects the window parameter values to have a time type that is compatible with the continuous aggregate's time bucket diff --git a/api/remove_compression_policy.md b/api/remove_compression_policy.md index e34bb3afbf24..029dbbe2fe25 100644 --- a/api/remove_compression_policy.md +++ b/api/remove_compression_policy.md @@ -1,5 +1,5 @@ -## remove_compression_policy() -If you need to remove the compression policy. To re-start policy-based compression again you will need to re-add the policy. +## remove_compression_policy() +If you need to remove the compression policy. To re-start policy-based compression again you need to re-add the policy. ### Required Arguments @@ -11,10 +11,10 @@ If you need to remove the compression policy. To re-start policy-based compressi |Name|Type|Description| |---|---|---| -| `if_exists` | BOOLEAN | Setting to true will cause the command to fail with a notice instead of an error if a compression policy does not exist on the hypertable. Defaults to false.| +| `if_exists` | BOOLEAN | Setting to true causes the command to fail with a notice instead of an error if a compression policy does not exist on the hypertable. Defaults to false.| -### Sample Usage +### Sample Usage Remove the compression policy from the 'cpu' table: ``` sql SELECT remove_compression_policy('cpu'); -``` \ No newline at end of file +``` diff --git a/api/reorder_chunk.md b/api/reorder_chunk.md index 3e4c327eaebe..3846abdaeab0 100644 --- a/api/reorder_chunk.md +++ b/api/reorder_chunk.md @@ -1,4 +1,4 @@ -## reorder_chunk() Community +## reorder_chunk() Community Reorder a single chunk's heap to follow the order of an index. This function acts similarly to the [PostgreSQL CLUSTER command][postgres-cluster] , however @@ -28,14 +28,14 @@ using [add_reorder_policy](/hypertable/add_reorder_policy/) is often much more c |Name|Type|Description| |---|---|---| | `index` | REGCLASS | The name of the index (on either the hypertable or chunk) to order by.| -| `verbose` | BOOLEAN | Setting to true will display messages about the progress of the reorder command. Defaults to false.| +| `verbose` | BOOLEAN | Setting to true displays messages about the progress of the reorder command. Defaults to false.| -### Returns +### Returns This function returns void. -### Sample Usage +### Sample Usage ```sql SELECT reorder_chunk('_timescaledb_internal._hyper_1_10_chunk', 'conditions_device_id_time_idx'); diff --git a/api/rollup-stats.md b/api/rollup-stats.md index 8332d0b3506b..a2d333fa6132 100644 --- a/api/rollup-stats.md +++ b/api/rollup-stats.md @@ -11,14 +11,14 @@ rollup( ) RETURNS StatsSummary2D ``` -This combines multiple outputs from the [`stats_agg()` function][stats_agg] function, -it works with both one and two dimensional statistical aggregates. -This is especially useful for re-aggregation in a continuous aggregate. -For example, bucketing by a larger[`time_bucket()`][time_bucket], +This combines multiple outputs from the [`stats_agg()` function][stats_agg] function, +it works with both one and two dimensional statistical aggregates. +This is especially useful for re-aggregation in a continuous aggregate. +For example, bucketing by a larger[`time_bucket()`][time_bucket], or re-grouping on other dimensions included in an aggregation. For use in [window function][postgres-window-functions] see the [`rolling`][rolling-stats]. -`rollup` will work in window function contexts, but `rolling` can be more efficient. +`rollup` works in window function contexts, but `rolling` can be more efficient. 
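A re-aggregation in a continuous aggregate context might look like the sketch below; the `measurements(ts, value)` table and the Toolkit `average()` accessor are assumptions for illustration:

```sql
-- Combine hourly statistical aggregates into daily ones with rollup()
WITH hourly AS (
    SELECT time_bucket('1 hour'::interval, ts) AS bucket,
           stats_agg(value) AS stats
    FROM measurements
    GROUP BY 1
)
SELECT time_bucket('1 day'::interval, bucket) AS day,
       average(rollup(stats)) AS daily_avg
FROM hourly
GROUP BY 1;
```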
For more information about statistical aggregation functions, see the [hyperfunctions documentation][hyperfunctions-stats-aggs]. @@ -59,4 +59,4 @@ GROUP BY 1; [hyperfunctions-stats-aggs]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/stats-aggs/ [time_bucket]: /hyperfunctions/time_bucket/ [postgres-window-functions]: https://www.postgresql.org/docs/current/tutorial-window.html -[rolling-stats]: /hyperfunctions/stats_aggs/rolling-stats/ \ No newline at end of file +[rolling-stats]: /hyperfunctions/stats_aggs/rolling-stats/ diff --git a/api/set_number_partitions.md b/api/set_number_partitions.md index e290367b6792..62fbe4a618ba 100644 --- a/api/set_number_partitions.md +++ b/api/set_number_partitions.md @@ -17,7 +17,7 @@ hypertable. The new partitioning only affects new chunks. | `dimension_name` | REGCLASS | The name of the space dimension to set the number of partitions for. | The `dimension_name` needs to be explicitly specified only if the -hypertable has more than one space dimension. An error will be thrown +hypertable has more than one space dimension. An error is thrown otherwise. ### Sample Usage diff --git a/api/set_replication_factor.md b/api/set_replication_factor.md index f0cffd42802a..aeb616b36057 100644 --- a/api/set_replication_factor.md +++ b/api/set_replication_factor.md @@ -1,13 +1,13 @@ ## set_replication_factor() Community Sets the replication factor of a distributed hypertable to the given value. Changing the replication factor does not affect the number of replicas for existing chunks. -Chunks created after changing the replication factor will be replicated +Chunks created after changing the replication factor are replicated in accordance with new value of the replication factor. If the replication factor cannot be satisfied, since the amount of attached data nodes is less than new replication factor, the command aborts with an error. If existing chunks have less replicas than new value of the replication factor, -the function will print a warning. +the function prints a warning. ### Required Arguments @@ -18,7 +18,7 @@ the function will print a warning. #### Errors -An error will be given if: +An error is given if: - `hypertable` is not a distributed hypertable. - `replication_factor` is less than `1`, which cannot be set on a distributed hypertable. - `replication_factor` is bigger than the number of attached data nodes. @@ -26,7 +26,7 @@ An error will be given if: If a bigger replication factor is desired, it is necessary to attach more data nodes by using [attach_data_node](/distributed-hypertables/attach_data_node). -### Sample Usage +### Sample Usage Update the replication factor for a distributed hypertable to `2`: ```sql @@ -45,4 +45,4 @@ SELECT set_replication_factor('conditions', 3); ERROR: too big replication factor for hypertable "conditions" DETAIL: The hypertable has 2 data nodes attached, while the replication factor is 3. HINT: Decrease the replication factor or attach more data nodes to the hypertable. -``` \ No newline at end of file +``` diff --git a/api/show_chunks.md b/api/show_chunks.md index 113d37a44f9a..63bad360818e 100644 --- a/api/show_chunks.md +++ b/api/show_chunks.md @@ -1,7 +1,7 @@ -## show_chunks() +## show_chunks() Get list of chunks associated with a hypertable. -Function accepts the following required and optional arguments. These arguments +Function accepts the following required and optional arguments. These arguments have the same semantics as the `drop_chunks` [function](/hypertable/drop_chunks). 
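A typical call looks something like this sketch, assuming a hypertable named `conditions`:

```sql
-- All chunks belonging to the hypertable
SELECT show_chunks('conditions');

-- Only chunks whose data is entirely older than three months
SELECT show_chunks('conditions', older_than => INTERVAL '3 months');
```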
### Required Arguments @@ -21,7 +21,7 @@ have the same semantics as the `drop_chunks` [function](/hypertable/drop_chunks) The `older_than` and `newer_than` parameters can be specified in two ways: - **interval type:** The cut-off point is computed as `now() - - older_than` and similarly `now() - newer_than`. An error will be returned if an INTERVAL is supplied + older_than` and similarly `now() - newer_than`. An error is returned if an INTERVAL is supplied and the time column is not one of a TIMESTAMP, TIMESTAMPTZ, or DATE. @@ -30,12 +30,12 @@ The `older_than` and `newer_than` parameters can be specified in two ways: SMALLINT / INT / BIGINT. The choice of timestamp or integer must follow the type of the hypertable's time column. When both arguments are used, the function returns the intersection of the resulting two ranges. For example, -specifying `newer_than => 4 months` and `older_than => 3 months` will show all full chunks that are between 3 and -4 months old. Similarly, specifying `newer_than => '2017-01-01'` and `older_than => '2017-02-01'` will show +specifying `newer_than => 4 months` and `older_than => 3 months` shows all full chunks that are between 3 and +4 months old. Similarly, specifying `newer_than => '2017-01-01'` and `older_than => '2017-02-01'` shows all full chunks between '2017-01-01' and '2017-02-01'. Specifying parameters that do not result in an overlapping -intersection between two ranges will result in an error. +intersection between two ranges results in an error. -### Sample Usage +### Sample Usage Get list of all chunks associated with a table: ```sql diff --git a/api/time-weighted-averages.md b/api/time-weighted-averages.md index 55cb8abb5180..a4851f18a019 100644 --- a/api/time-weighted-averages.md +++ b/api/time-weighted-averages.md @@ -1,7 +1,7 @@ # Time-weighted average functions Toolkit This section contains functions related to time-weighted averages. Time weighted averages are commonly used in cases where a time series is not evenly sampled, -so a traditional average will give misleading results. For more information +so a traditional average gives misleading results. For more information about time-weighted average functions, see the [hyperfunctions documentation][hyperfunctions-time-weight-average]. diff --git a/api/time_weight.md b/api/time_weight.md index 77ef614f9703..cc5c9b6c4a9e 100644 --- a/api/time_weight.md +++ b/api/time_weight.md @@ -27,7 +27,7 @@ For more information about time-weighted average functions, see the Note that `ts` and `value` can be `null`, however the aggregate is not evaluated -on `null` values and will return `null`, but it will not error on `null` inputs. +on `null` values and returns `null`, but does not error on `null` inputs. ### Returns @@ -53,7 +53,7 @@ FROM t; ``` ## Advanced usage notes -Most cases will work out of the box, but for power users, or those who want to +Most cases work out of the box, but for power users, or those who want to dive deeper, we've included a bit more context below. ### Interpolation methods details diff --git a/api/uddsketch.md b/api/uddsketch.md index 312b584c2752..c3a9b4dab726 100644 --- a/api/uddsketch.md +++ b/api/uddsketch.md @@ -32,7 +32,7 @@ later with these functions and using the `UddSketch` data as input. ## Required arguments |Name| Type |Description| |-|-|-| -|`size`|`INTEGER`|Maximum number of buckets in the sketch. 
Providing a larger value here will make it more likely that the aggregate will able to maintain the desired error, though will potentially increase the memory usage.| +|`size`|`INTEGER`|Maximum number of buckets in the sketch. Providing a larger value here makes it more likely that the aggregate is able to maintain the desired error, but potentially increases the memory usage.| |`max_error`|`DOUBLE PRECISION`|This is the starting maximum relative error of the sketch, as a multiple of the actual value. The true error may exceed this if too few buckets are provided for the data distribution.| |`value`|`DOUBLE PRECISION`|Column to aggregate| diff --git a/api/x_intercept.md b/api/x_intercept.md index 4bfb6d704787..ef91bfd36470 100644 --- a/api/x_intercept.md +++ b/api/x_intercept.md @@ -6,8 +6,8 @@ x_intercept( ) RETURNS DOUBLE PRECISION ``` -The x intercept of the [least squares fit][least-squares] line computed -from a two-dimensional statistical aggregate. +The x intercept of the [least squares fit][least-squares] line computed +from a two-dimensional statistical aggregate. For more information about statistical aggregate functions, see the [hyperfunctions documentation][hyperfunctions-stats-agg]. @@ -39,4 +39,4 @@ GROUP BY id, time_bucket('15 min'::interval, ts) [hyperfunctions-stats-agg]: timescaledb/:currentVersion:/how-to-guides/hyperfunctions/stats-aggs/ [stats-agg]:/hyperfunctions/stats_aggs/stats_agg/ -[least-squares]:https://en.wikipedia.org/wiki/Least_squares \ No newline at end of file +[least-squares]:https://en.wikipedia.org/wiki/Least_squares diff --git a/cloud/create-a-service.md b/cloud/create-a-service.md index 938718b0bfa1..cd6d1359fd06 100644 --- a/cloud/create-a-service.md +++ b/cloud/create-a-service.md @@ -77,9 +77,9 @@ Read about TimescaleDB features in our documentation: ### Keep testing during your free trial You're now on your way to a great start with Timescale! -You will have an unthrottled, 30-day free trial with Timescale Cloud to +You have an unthrottled, 30-day free trial with Timescale Cloud to continue to test your use case. Before the end of your trial, we encourage you -to add your credit card information. This will ensure a smooth transition after +to add your credit card information. This ensures a smooth transition after your trial period concludes. ### Summary diff --git a/cloud/customize-configuration.md b/cloud/customize-configuration.md index e6026ce27d5c..c8b1f193d8e2 100644 --- a/cloud/customize-configuration.md +++ b/cloud/customize-configuration.md @@ -14,7 +14,7 @@ and flexibility you need when running your workloads in our hosted environment. Modifications of most parameters can be applied without restarting the Timescale Cloud Service. However, as when modifying the compute resources -of a running Service, some settings will require that a restart be performed, +of a running Service, some settings require a restart, resulting in some brief downtime (usually about 30 seconds). @@ -30,7 +30,7 @@ Under the Settings tab, you can modify a limited set of the parameters that are most often modified in a TimescaleDB or PostgreSQL instance. To modify a configured value, click the value that you would like to change. This reveals an editable field to apply your change. Clicking anywhere outside of that field -will save the value to be applied. +saves the value to be applied. 
Change Timescale Cloud configuration parameters diff --git a/cloud/scaling-a-service.md b/cloud/scaling-a-service.md index fd36631cf730..d7a67a7995ec 100644 --- a/cloud/scaling-a-service.md +++ b/cloud/scaling-a-service.md @@ -5,7 +5,7 @@ at any time. This is extremely useful when users have a need to increase storage to do for any service. Before you modify the compute or storage settings for a Cloud Service, please -note the following limitations and when a change to these settings will result +note the following limitations and when a change to these settings results in momentary downtime. **Storage**: Storage changes are applied with no downtime, typically available @@ -18,7 +18,7 @@ within a few seconds. Other things to note about storage changes: decreases) can be applied at any time, however, please note the following: * **_There is momentary downtime_** while the compute settings are applied. In most cases, this downtime is less than 30 seconds. -* Because there will be an interruption to your service, you should plan +* Because this requires an interruption to your service, you should plan accordingly to have the settings applied at an appropriate service window. ## View service operation details @@ -84,7 +84,7 @@ plan for this before you begin! 1. In the `Increase disk size` field, adjust the slider to the new disk size. 1. Review the new allocations and costs in the comparison chart. 1. Click `Apply` to save your changes. If you have changed the CPU and memory - allocation, your service will go down briefly while the changes are applied. + allocation, your service goes down briefly while the changes are applied. Configure resource allocations @@ -119,4 +119,4 @@ to 10 TB in size. available within a few seconds. Configure autoscaling disk size - \ No newline at end of file + diff --git a/cloud/vpc-peering-aws/create.md b/cloud/vpc-peering-aws/create.md index cf92074d526a..572f4b6c5203 100644 --- a/cloud/vpc-peering-aws/create.md +++ b/cloud/vpc-peering-aws/create.md @@ -14,15 +14,14 @@ The VPCs created here are peered with your own VPC as part of the setup process. Click `Create VPC`, type a name for your new VPC, and provide an IPv4 CIDR block (E.G., `10.0.0.0/16` or `192.168.0.0/24`). Make sure that the CIDR block you choose for your Timescale Cloud VPC does not overlap with the AWS VPC you are using to create -a peering connection. If the CIDR blocks overlap, the peering process will fail. +a peering connection. If the CIDR blocks overlap, the peering process fails. You can always find the CIDR block of your AWS VPC from the AWS console. Create a new Timescale Cloud VPC -VPC peering can be enabled for free during your Timescale Cloud trial, but you will be -required to enter a valid payment method in order to create a VPC (even though you -will not yet be charged for it). +VPC peering can be enabled for free during your Timescale Cloud trial, but you are required to enter a valid payment method in order to create a VPC (even though you +are not yet charged for it). 
## Create a peering connection @@ -52,7 +51,7 @@ Make note of the peering connection ID (starting with `pcx-`) as it is used in t Peering information in AWS ## Network routing and security in AWS -Once you have accepted the peering connection, the two VPCs will now be peered; +Once you have accepted the peering connection, the two VPCs are peered; however, in order to use this peering connection, you need to update your VPC's route table to include the CIDR block of your peered Timescale Cloud VPC, and you also need to update your VPC's security groups. @@ -85,7 +84,7 @@ dashboard. From this view, click `Create security group` to create a new securit If you need to, you can use another security group which already exists in your VPC, -however, for simplicity we will assume the creation of a new security group. +however, for simplicity we assume the creation of a new security group. The AWS Security Groups dashboard @@ -97,18 +96,18 @@ which has been peered with your Cloud VPC. No inbound rules are required, so leave the inbound rules section empty. In the outbound rules section, select `Custom TCP` for the rule type. The protocol -should remain as TCP. The port range should be `5432`, which is the port which will -be used to connect to your Timescale Cloud services. The Destination should be set +should remain as TCP. The port range should be `5432`, which is the port which is +used to connect to your Timescale Cloud services. The Destination should be set to `Custom` and the value should be the CIDR block of your Cloud VPC. AWS may pre-populate the `Destination` column with the value `0.0.0.0/0`. Though this -value will certainly work, it is more "open" than needed, and should be deleted. +value certainly works, it is more "open" than needed, and should be deleted. -Finally, click `Create security group`. With this step, you will now be able to +Finally, click `Create security group`. With this step, you are now able to connect to any of your Timescale Cloud services attached to your peered VPC. In the next -section, you will learn how to create a Timescale Cloud service with a VPC attachment. +section, you learn how to create a Timescale Cloud service with a VPC attachment. ## Create a service with VPC attachment In the Timescale Cloud console, navigate to the @@ -119,8 +118,8 @@ Expand the dropdown menu under the `Select a VPC` step and select the VPC you cr previously. If you have multiple VPCs, select the VPC which you want your new service to be attached to. -Click `Create Service`, and Timescale Cloud will create your new service. Due to -selecting a VPC during setup, your new service will be created with an attachment to +Click `Create Service`, and Timescale Cloud creates your new service. Due to +selecting a VPC during setup, your new service is created with an attachment to your selected VPC. Create new service with VPC attachment diff --git a/cloud/vpc-peering-aws/index.md b/cloud/vpc-peering-aws/index.md index cc19bc24a515..8fdf86f12307 100644 --- a/cloud/vpc-peering-aws/index.md +++ b/cloud/vpc-peering-aws/index.md @@ -16,7 +16,7 @@ services to and from VPCs, and creating new services with VPC peering attachment To use Timescale Cloud VPC peering, you need your own cloud VPC, where your applications and infrastructure are already running. 
-If you do not have administrative access to your cloud provider account, you will need +If you do not have administrative access to your cloud provider account, you need to work with someone from your team with sufficient permissions to: - accept VPC peering requests, @@ -29,7 +29,7 @@ you may contact support to request a quota increase. -Once you have attached your Timescale Cloud service to a VPC, it will no longer be accessible -via the public internet. It will only be accessible via your AWS VPC which has been peered +Once you have attached your Timescale Cloud service to a VPC, it is no longer accessible +via the public internet. It is only accessible via your AWS VPC which has been peered with your Timescale Cloud VPC. - \ No newline at end of file + diff --git a/cloud/vpc-peering-aws/migrate.md b/cloud/vpc-peering-aws/migrate.md index b1da7a1ab82d..d02a1c6f8ce2 100644 --- a/cloud/vpc-peering-aws/migrate.md +++ b/cloud/vpc-peering-aws/migrate.md @@ -3,11 +3,11 @@ Timescale Cloud services may be migrated between VPCs within a Cloud project, and may also be migrated to and from the public network. Typically, once you have attached your service to a VPC, it should remain attached to ensure that your applications running in your AWS -VPC will have continued connectivity to your service. +VPC have continued connectivity to your service. Timescale Cloud uses a different DNS name for a Timescale service once it has been attached -to a VPC. This means that you will need to update your connection string if migrating a service +to a VPC. This means that you need to update your connection string if migrating a service from the public internet into a VPC, or vice-versa. @@ -30,15 +30,15 @@ to migrate your service into. Migrate from public network to VPC Once you have selected the VPC to migrate your service into, click `Attach VPC`. -You will then be prompted to confirm the migration. +You are then prompted to confirm the migration. Confirm migration into VPC -After confirming the migration, your service will be attached to the VPC you selected. +After confirming the migration, your service is attached to the VPC you selected. **These operations are not immediate, and also involve DNS changes which may take a few minutes to propagate.** -As mentioned on the confirmation modal, you will need to update your connection string +As mentioned on the confirmation modal, you need to update your connection string in order to connect to your service after migration. The `Service URL` back on the service details page is already updated to include the new DNS info, and should be used for connecting to your service. @@ -47,8 +47,8 @@ for connecting to your service. When migrating your service into a VPC, ensure that your AWS VPC's security groups allow network access from your AWS VPC to the Cloud VPC which your service has migrated into. Security group configuration was previously covered as part of -peering connection setup. Double-check to be sure, otherwise you will not be able -to connect to your Timescale Cloud service. +peering connection setup. Double-check to be sure, otherwise you can't +connect to your Timescale Cloud service. ## Migrate between VPCs @@ -58,24 +58,24 @@ it to another VPC within the same project. Migrate between VPCs To migrate between VPCs, expand the `Migrate into another VPC` menu and select the VPC -to migrate your service to. Then click `Migrate`. You will then be prompted to confirm +to migrate your service to. Then click `Migrate`. You are prompted to confirm the migration. 
Migrate between VPCs confirmation -After confirming the migration, your service will be detached from its previous VPC +After confirming the migration, your service is detached from its previous VPC and attached to the new VPC you selected. -In the case of VPC to VPC migration, the `Service URL` connection string will not -be updated, only the IP address which the DNS name is associated with will be updated. +In the case of VPC to VPC migration, the `Service URL` connection string is not +updated, only the IP address which the DNS name is associated with is updated. **Please allow a few minutes for the DNS record changes to propagate.** When migrating your service between VPCs, ensure that your AWS VPC's security groups allow network access from your AWS VPC to the Cloud VPC which your service has migrated into. Security group configuration was previously covered as part of -peering connection setup. Double-check to be sure, otherwise you will not be able -to connect to your Timescale Cloud service. +peering connection setup. Double-check to be sure, otherwise you can't +connect to your Timescale Cloud service. ### Migrate back to public network @@ -85,14 +85,14 @@ it back to the public network. Migrate back to public network To migrate your service back to the public network, click `Migrate back to public network`. -You will then be prompted to confirm the migration. +You are prompted to confirm the migration. Migrate back to public network confirm -After confirming the migration, your service will be detached from its previous VPC +After confirming the migration, your service is detached from its previous VPC and made accessible over the public internet. -As mentioned on the confirmation modal, you will need to update your connection string +As mentioned on the confirmation modal, you need to update your connection string in order to connect to your service after migration. The `Service URL` back on the service details page is already updated to include the new DNS info, and should be used for connecting to your service. diff --git a/mst/ingest-data.md b/mst/ingest-data.md index 0b2780811bc1..821520ec0833 100644 --- a/mst/ingest-data.md +++ b/mst/ingest-data.md @@ -85,7 +85,7 @@ Before you begin, make sure you have ``` We recommend that you set the number of workers lower than the number of available CPU cores on your client machine or server, to prevent the workers - having to compete for resources. This will help your ingest go faster. + having to compete for resources. This helps your ingest go faster. 1. *OPTIONAL:* If you don't want to use the `timescaledb-parallel-copy` tool, or if you have a very small dataset, you can use the PostgreSQL `COPY` command instead: diff --git a/mst/mst-multi-node.md b/mst/mst-multi-node.md index c697f58e19ce..77251b6e92c1 100644 --- a/mst/mst-multi-node.md +++ b/mst/mst-multi-node.md @@ -27,8 +27,8 @@ be accessed directly once joined to a multi-node cluster. A proper TimescaleDB cluster should have at least two data nodes to begin realizing the benefits of distributed hypertables. While it is technically -possible to add just one data node to a cluster, this will perform worse than a -single-node TimescaleDB instance and is not recommended. +possible to add just one data node to a cluster, this configuration performs +worse than a single-node TimescaleDB instance and is not recommended. ### Create services for access and data node services @@ -55,12 +55,12 @@ access node. 
To setup your first multi-node instance in Managed Service for TimescaleDB, you -will need to create new Services for the Access Node and Data Nodes. +need to create new Services for the Access Node and Data Nodes. ### Modify access node settings The hard work of handling distributed queries in a multi-node cluster is -handled by TimescaleDB for you. Some queries, however, will perform better in a +handled by TimescaleDB for you. Some queries, however, perform better in a distributed environment when TimescaleDB is configured to more efficiently push down some types of queries to the data nodes. @@ -89,10 +89,10 @@ for TimescaleDB Access node. With user mapping authentication, you don’t need to manage any new users, however, **you need to have the passwords for the `tsdbadmin` user from each -data node you will be adding to the cluster**. +data node you are adding to the cluster**. The main limitation of this approach is that any password changes to the -connected `tsdbadmin` user on a data node will break the mapping connection +connected `tsdbadmin` user on a data node breaks the mapping connection and impact normal cluster operations. Any time a password is changed on a data node, you'll need to complete the mapping process outlined below to re-establish the connection between the access node and the affected data node. You can read @@ -141,7 +141,7 @@ Description | ``` ### Add a user mapping for each data node -Now you can create a `USER MAPPING` that will enable communication between the +Now you can create a `USER MAPPING` that enables communication between the access node and data node: ```SQL @@ -178,7 +178,7 @@ with regular, single-node hypertables, there was often little benefit in specifying a partition key when creating the hypertable. With distributed hypertables, however, adding a partition key is essential to ensure that data is efficiently distributed across data nodes. Otherwise, all data for a specific -time range will go to one chunk on one node, rather than being distributed +time range goes to one chunk on one node, rather than being distributed across all available data nodes for the same time range. ## Adding additional database users (optional) @@ -214,7 +214,7 @@ server: 1. Multi-node clusters can still use _regular_, non-distributed features like regular hypertables, PostgreSQL tables, and continuous aggregations. The - data stored in any of these objects will reside only on the access node. + data stored in any of these objects reside only on the access node. 1. There is no limitation on the number of distributed hypertables a user can create on the access node. 1. Finally, remember that once a Service is marked as an access node or data diff --git a/mst/security.md b/mst/security.md index b79d802d19a7..78faa995b82e 100644 --- a/mst/security.md +++ b/mst/security.md @@ -18,7 +18,7 @@ Service-providing virtual machines are dedicated for a single customer, i.e. there is no multi-tenancy on a VM basis, and the customer data never leaves the machine, except when uploaded to the offsite backup location. -Virtual machines are not reused and will be terminated and wiped upon service +Virtual machines are not reused and are terminated and wiped upon service upgrade or termination. ## Data encryption @@ -27,7 +27,7 @@ service instances as well as service backups in cloud object storage. Service instances and the underlying VMs use full volume encryption using LUKS with a randomly generated ephemeral key per each instance and each volume. 
The -key is never re-used and will be trashed at the destruction of the instance, so +key is never re-used and is trashed at the destruction of the instance, so there's a natural key rotation with roll-forward upgrades. We use the LUKS default mode aes-xts-plain64:sha256 with a 512-bit key. @@ -78,7 +78,7 @@ No customer access to the virtual machine level is provided. ## Customer data privacy Customer data privacy is of utmost importance at Timescale and is covered by internal Security and Customer Privacy policies as well as the strict EU regulations. -Timescale operators will never access the customer data, unless explicitly +Timescale operators never access customer data, unless explicitly requested by the customer in order to troubleshoot a technical issue. The Timescale operations team has mandatory recurring training regarding the @@ -99,7 +99,7 @@ TimescaleDB provides the ability to configure, in a fine-grained manner, the set of source IP addresses and ranges, as well as connection ports, that can access your Managed Service for TimescaleDB services. -This tutorial will walk you through how to configure this capability. +This tutorial walks you through how to configure this capability. #### Before you start @@ -111,8 +111,8 @@ get signed up and create your first database instance. Once you have a database instance setup in the [Managed Service for TimescaleDB portal][timescale-mst-portal], browse to this service and click on the 'Overview' tab. In the 'Connection Information' -section, you will see the port number that is used for database connections. -This is the port we will protect by managing inbound access. +section, you can see the port number that is used for database connections. +This is the port you need to protect by managing inbound access. Timescale Cloud Overview tab @@ -122,7 +122,7 @@ Scroll down to find the 'Allowed IP Addresses' section. By default, this value i `0.0.0.0/0` which is actually wide-open. -This wide-open setting simplifies getting started since it will accept incoming traffic from all sources, but you will absolutely want to narrow this range. +This wide-open setting simplifies getting started since it accepts incoming traffic from all sources, but you absolutely want to narrow this range. If you are curious about how to interpret this [Classless Inter-Domain Routing][cidr-wiki] (CIDR) syntax, @@ -132,9 +132,9 @@ check out [this great online tool][cidr-tool] to help decipher CIDR. #### Step 3 - Change the allowed IP addresses section -Click 'Change' and adjust the CIDR value based on where your source traffic will come from. -For example, entering a value of `192.168.1.15/32` will ONLY allow incoming traffic from a -source IP of `192.168.1.15` and will deny all other traffic. +Click 'Change' and adjust the CIDR value based on where your source traffic is coming from. +For example, entering a value of `192.168.1.15/32` ONLY allows incoming traffic from a +source IP of `192.168.1.15` and denies all other traffic. #### Step 4 - Save your changes Click 'Save Changes' and see this take effect immediately. diff --git a/mst/viewing-service-logs.md b/mst/viewing-service-logs.md index d6c650216c29..1d971101e17d 100644 --- a/mst/viewing-service-logs.md +++ b/mst/viewing-service-logs.md @@ -11,7 +11,7 @@ TimescaleDB: recent events are available. Logs can be browsed back in time, but scrolling up several thousand lines is not very convenient. * [Command-line client][] supports programmatically downloading logs.
avn service l -ogs -S desc -f --project your-project-name your-service-name will show all stored logs. +ogs -S desc -f --project your-project-name your-service-name shows all stored logs. * [REST API][] endpoint is available for fetching the same information two above methods output, in case programmatic access is needed. diff --git a/mst/vpc-peering.md b/mst/vpc-peering.md index 1c754f883a0f..72430b1490b0 100644 --- a/mst/vpc-peering.md +++ b/mst/vpc-peering.md @@ -20,7 +20,7 @@ peered network or on public internet. In order to set up a VPC peering for your Managed Service for TimescaleDB project please submit a request in the VPC section of the dashboard. -When creating a new service, you can choose whether the service will be placed +When creating a new service, you can choose whether the service is placed in a VPC or not: The list of cloud providers and regions contains options like "Belgium - Google Cloud: Belgium" and "Belgium - Google Cloud: Belgium - Project VPC". Here selecting the former would create the service to non-VPC environment @@ -37,14 +37,14 @@ Peering connections can be requested with the VPC request, or added later. Note however that the VPC is not accessible until at least one connection has been created. -After the request has been submitted VPC peering will be automatically set up by +After the request has been submitted VPC peering is automatically set up by Managed Service for TimescaleDB, and the status is updated in the web console's VPC view together with instructions for starting peering with our network. Note that you'll need to accept a VPC peering connection request (AWS) or create a corresponding peering from your project to Managed Service for TimescaleDB's (Google) before Managed Service for TimescaleDB's backend can notice the peering is ready and traffic can be routed through it. After setting up your side, the -VPC peering will activate shortly on the Managed Service for TimescaleDB console. +VPC peering activates shortly on the Managed Service for TimescaleDB console. When you have submitted a VPC peering request, you can find cloud-specific identification details for your VPC by hovering your mouse over the `pending diff --git a/timescaledb/contribute-to-docs.md b/timescaledb/contribute-to-docs.md index e45307bfa4a3..f59b6717ee99 100644 --- a/timescaledb/contribute-to-docs.md +++ b/timescaledb/contribute-to-docs.md @@ -4,7 +4,7 @@ open for contribution from all community members. If you find errors or would like to add content to our docs, you can create a pull request using GitHub for review by our documentation team. This document contains everything you need to know about our writing style and standards, but don't worry too much if you -aren't sure what to write. Our documentation team will help you craft the +aren't sure what to write. Our documentation team helps you craft the perfect words when you have a PR ready. We also have some automation on our repository to help you. @@ -23,7 +23,7 @@ the [README][readme]. Before we accept any contributions, Timescale contributors need to sign the Contributor License Agreement (CLA). By signing a CLA, we can ensure that the community is free and confident in its ability to use your contributions. You -will be prompted to sign the CLA during the pull request process. +are prompted to sign the CLA during the pull request process. ## Resources When making style decisions, consult resources in this order: @@ -53,12 +53,12 @@ notice? 
Readers are often in an agitated state by the time they get to our documentation. Stressed readers jump around in the text, skip words, steps, or -paragraphs, and will quickly give up if things seem too complex. To mitigate +paragraphs, and can quickly give up if things seem too complex. To mitigate this, use short sentences, plain language, and a minimum number of eye-catching details such as admonitions. Never assume that because you've explained something earlier in a document, -readers will know it later in the document. You can use cross-references to help +readers know it later in the document. You can use cross-references to help guide readers to further information if they need it. ## Grammar diff --git a/timescaledb/contribute-to-timescaledb.md b/timescaledb/contribute-to-timescaledb.md index b7ac87c0c98d..8e8365aef67b 100644 --- a/timescaledb/contribute-to-timescaledb.md +++ b/timescaledb/contribute-to-timescaledb.md @@ -9,7 +9,7 @@ GitHub. Timescale documentation is hosted in a [GitHub repository][github-docs] and is open for contribution from all community members. If you find errors or would like to add content to our docs, this tutorial -will walk you through the process. +walks you through the process. ### Making minor changes If you want to make only minor changes to docs, you can make corrections @@ -19,7 +19,7 @@ an option to submit a pull request at the bottom of the page. ### Making larger contributions to docs In order to modify documentation, you should have a working knowledge -of [git][install-git] and [Markdown][markdown-tutorial]. You will +of [git][install-git] and [Markdown][markdown-tutorial]. You also need to create a GitHub account. Be sure to read the [Timescale docs contribution styleguide][timescale-docs-style]. @@ -30,7 +30,7 @@ you as you author your contribution. Before we accept any contributions, Timescale contributors need to sign the Contributor License Agreement (CLA). By signing a CLA, we can ensure that the community is free and confident in its -ability to use your contributions. You will be prompted to sign the +ability to use your contributions. You are prompted to sign the CLA during the pull request process. diff --git a/timescaledb/getting-started/access-timescaledb.md b/timescaledb/getting-started/access-timescaledb.md index 0680f0615ece..8b7fed09bce3 100644 --- a/timescaledb/getting-started/access-timescaledb.md +++ b/timescaledb/getting-started/access-timescaledb.md @@ -2,19 +2,19 @@ Now that you have TimescaleDB setup and running in Timescale Cloud, it's time to connect to your database. While this can be accomplished with many tools, `psql` -is the standard command line interface for interacting with a PostgreSQL +is the standard command line interface for interacting with a PostgreSQL or TimescaleDB instance. Below, we'll verify that you have `psql` installed and show you how to connect to your TimescaleDB database. ## Verify that `psql` is installed -**Before you start**, let's confirm that you already have `psql` installed. -In fact, if you’ve ever installed Postgres or TimescaleDB before, you likely already +**Before you start**, let's confirm that you already have `psql` installed. +In fact, if you’ve ever installed Postgres or TimescaleDB before, you likely already have `psql` installed. In a command line or terminal window, type the following command and press **Enter**. -If `psql` is installed, it will return the version number. Otherwise, you will +If `psql` is installed, it returns the version number. 
Otherwise, you receive an error. ```bash @@ -44,8 +44,8 @@ psql postgres://[USERNAME]:[PASSWORD]@[HOSTNAME]:[PORT]/[DATABASENAME]?sslmode=r ``` -Because the URL provided in the Timescale Cloud interface does not supply the -password, you will be prompted for the password in order to finish authenticating. +Because the URL provided in the Timescale Cloud interface does not supply the +password, you are prompted for the password in order to finish authenticating. If you want to save yourself time, you can add the password to the URL by adding a colon and the password between the username and the hostname as shown diff --git a/timescaledb/getting-started/compress-data.md b/timescaledb/getting-started/compress-data.md index c5ca6eca728d..8f0e67e84d55 100644 --- a/timescaledb/getting-started/compress-data.md +++ b/timescaledb/getting-started/compress-data.md @@ -10,7 +10,7 @@ All postgresql data types can be used in compression. -At a high level, TimescaleDB's built-in job scheduler framework will asynchronously convert recent data from an uncompressed row-based form to a compressed columnar form across chunks of TimescaleDB hypertables. +At a high level, TimescaleDB's built-in job scheduler framework asynchronously converts recent data from an uncompressed row-based form to a compressed columnar form across chunks of TimescaleDB hypertables. Let's enable compression on our hypertable and then look at two ways of compressing data: with an automatic policy or manually. @@ -43,7 +43,7 @@ We can also view the compression settings for our hypertables by using the `comp SELECT * FROM timescaledb_information.compression_settings; ``` -Now that compression is enabled, we need to schedule a policy to automatically compress data according to the settings defined above. We will set a policy to compress data older than 10 years by using the following query: +Now that compression is enabled, we need to schedule a policy to automatically compress data according to the settings defined above. We set a policy to compress data older than 10 years by using the following query: ```sql -- Add compression policy @@ -92,13 +92,13 @@ This is especially beneficial when backups and high-availability replicas are ta In addition to saving storage space and costs, compressing data might increase query performance on certain kinds of queries. Compressed data tends to be older data and older data tends to have different query patterns than recent data. -**Newer data tends to be queried in a shallow and wide fashion**. In this case, shallow refers to the length of time and wide refers to the range of columns queried. These are often debugging or "whole system" queries. For example, "Show me all the metrics for all cities in the last 2 days." In this case the uncompressed, row based format that is native to PostgreSQL will give us the best query performance. +**Newer data tends to be queried in a shallow and wide fashion**. In this case, shallow refers to the length of time and wide refers to the range of columns queried. These are often debugging or "whole system" queries. For example, "Show me all the metrics for all cities in the last 2 days." In this case the uncompressed, row based format that is native to PostgreSQL gives us the best query performance. **Older data tends to be queried in a deep and narrow fashion.** In this case, deep refers to the length of time and narrow refers to the range of columns queried. As data begins to age, queries tend to become more analytical in nature and involve fewer columns. 
For example, "Show me the average annual temperature for city A in the past 20 years". This type of queries greatly benefit from the compressed, columnar format. TimescaleDB's compression design allows you to get the best of both worlds: recent data is ingested in an uncompressed, row format for efficient shallow and wide queries, and then automatically converted to a compressed, columnar format after it ages and is most often queried using deep and narrow queries. -Here's an example of a deep and narrow query on our compressed data. It calculates the average temperature for New York City for all years in the dataset before 2010. Data for these years will be compressed, since we compressed all data older than 10 years with either our policy or the manual compression method above. +Here's an example of a deep and narrow query on our compressed data. It calculates the average temperature for New York City for all years in the dataset before 2010. Data for these years is compressed, since we compressed all data older than 10 years with either our policy or the manual compression method above. ```sql -- Deep and narrow query on compressed data diff --git a/timescaledb/getting-started/create-cagg.md b/timescaledb/getting-started/create-cagg.md index 15457a6f9e78..45746e432490 100644 --- a/timescaledb/getting-started/create-cagg.md +++ b/timescaledb/getting-started/create-cagg.md @@ -38,7 +38,7 @@ sampled at high frequency, and querying downsampled data over long time periods. Now that you're familiar with what Continuous Aggregates are, let's create our first continuous aggregate. Creating a continuous aggregate is a two step process: -first we define our view and second, we create a policies which will refresh the +first we define our view and second, we create a policies which refresh the continuous aggregate according to a schedule. We'll use the example of creating a daily aggregation of all weather metrics. @@ -149,7 +149,7 @@ so that you can spend time on feature development. You'll see policies for compression and data retention later in this **Getting started** section. -Let's create a policy which will auto-update the continuous aggregate every two weeks: +Let's create a policy which auto-updates the continuous aggregate every two weeks: ```sql -- create policy @@ -160,8 +160,8 @@ SELECT add_continuous_aggregate_policy('weather_metrics_daily', schedule_interval => INTERVAL '14 days'); ``` -The policy above will run every 14 days (`schedule_interval`). When it runs, it -will materialize data from between 6 months (`start_offset`) and 1 hour (`end_offset`) +The policy above runs every 14 days (`schedule_interval`). When it runs, it +materializes data from between 6 months (`start_offset`) and 1 hour (`end_offset`) of the time it executes, according to the query which defined the continuous aggregate `weather_metrics_daily`. @@ -219,7 +219,7 @@ by default). **With real-time aggregation turned off**, continuous aggregates only return results for data in the time period they have materialized (refreshed). If you query continuous aggregates for data newer than the last materialized time, it -will not return it or return stale results. +does not return it or returns stale results. 
**With real-time aggregation turned on**, you always receive up-to-date results, as querying a continuous aggregate returns data that is already materialized diff --git a/timescaledb/getting-started/data-retention.md b/timescaledb/getting-started/data-retention.md index 14c34fe70791..dca0cc25fe3c 100644 --- a/timescaledb/getting-started/data-retention.md +++ b/timescaledb/getting-started/data-retention.md @@ -54,8 +54,8 @@ SELECT drop_chunks('weather_metrics', INTERVAL '25 years'); ``` -This will drop all chunks from the hypertable conditions *that only include data -older than the specified duration* of 25 years, and will *not* delete any individual rows of data in chunks. +This drops all chunks from the hypertable conditions *that only include data +older than the specified duration* of 25 years, and does *not* delete any individual rows of data in chunks. ## Downsampling @@ -64,7 +64,7 @@ downsampling on our data. We can downsample high fidelity raw data into summarie via continuous aggregation and then discard the underlying raw observations from our hypertable, while retaining our aggregated version of the data. -We can also take this a step further, by applying data retention policies (or +We can also take this a step further, by applying data retention policies (or using drop_chunks) on continuous aggregates themselves, since they are a special kind of hypertable. The only restrictions at this time is that you cannot apply compression or continuous aggregation to these hypertables. diff --git a/timescaledb/getting-started/index.md b/timescaledb/getting-started/index.md index f68419cb3444..95fcd8a7e0ff 100644 --- a/timescaledb/getting-started/index.md +++ b/timescaledb/getting-started/index.md @@ -18,7 +18,7 @@ all the tools and connectors in the PostgreSQL ecosystem. If it works with PostgreSQL, it works with Timescale! ## Let's get up and running -This Getting Started section will give you a hands-on introduction to the +This Getting Started section gives you a hands-on introduction to the fundamentals of TimescaleDB. Using a real-world dataset, you'll learn definitions of key terms (like hypertables and chunks), mental models for working with TimescaleDB, as well as TimescaleDB's key features (like continuous aggregation, diff --git a/timescaledb/getting-started/launch-timescaledb.md b/timescaledb/getting-started/launch-timescaledb.md index 78db3bebaafc..c993d21d69cd 100644 --- a/timescaledb/getting-started/launch-timescaledb.md +++ b/timescaledb/getting-started/launch-timescaledb.md @@ -23,7 +23,7 @@ Provide your full name, email address, and a strong password to start: Sign up for Timescale Cloud -You will need to confirm your account by clicking the link you receive via +You need to confirm your account by clicking the link you receive via email. If you do not receive this link, please first check your spam folder and, failing that, please [contact us][contact-timescale]. @@ -41,7 +41,7 @@ and storage options**. But don't worry, if you want to do more with Timescale Cl after you've completed everything, you can always resize your service or create a new one in a few clicks! -After you select 'Create service', you will see confirmation of your service account and +After you select 'Create service', you can see confirmation of your service account and password information. You should save the information in this confirmation screen in a safe place: @@ -51,7 +51,7 @@ a safe place: If you forget your password in the future, you can reset your password from the *service dashboard*. 
-It will take a couple minutes for your service to be provisioned. When your database is +It takes a couple minutes for your service to be provisioned. When your database is ready for connection, you should see a green `Running` label above the service in the service dashboard. diff --git a/timescaledb/getting-started/migrate-data.md b/timescaledb/getting-started/migrate-data.md index 5de66e902be7..56c8dd916005 100644 --- a/timescaledb/getting-started/migrate-data.md +++ b/timescaledb/getting-started/migrate-data.md @@ -113,7 +113,7 @@ need `pg_dump` for exporting your schema and data. Migration falls into three main steps: -1. Copy over the database schema and choose which tables will become +1. Copy over the database schema and choose which tables should become hypertables (i.e., those that currently have time-series data). 1. Backup data to comma-separated values (CSV). 1. Import the data into TimescaleDB diff --git a/timescaledb/how-to-guides/alerting.md b/timescaledb/how-to-guides/alerting.md index 06fb14a8eedc..9c8f51172caf 100644 --- a/timescaledb/how-to-guides/alerting.md +++ b/timescaledb/how-to-guides/alerting.md @@ -6,7 +6,7 @@ There are a variety of different alerting solutions you can use in conjunction w Grafana is a great way to visualize and explore time-series data and has a first-class integration with TimescaleDB. Beyond data visualization, Grafana also provides alerting functionality to keep you notified of anomalies. -Within Grafana, you can [define alert rules][define alert rules] which are time-based thresholds for your dashboard data (e.g. “Average CPU usage greater than 80 percent for 5 minutes”). When those alert rules are triggered, Grafana will send a message via the chosen notification channel. Grafana provides integration with webhooks, email and more than a dozen external services including Slack and PagerDuty. +Within Grafana, you can [define alert rules][define alert rules] which are time-based thresholds for your dashboard data (e.g. “Average CPU usage greater than 80 percent for 5 minutes”). When those alert rules are triggered, Grafana sends a message via the chosen notification channel. Grafana provides integration with webhooks, email and more than a dozen external services including Slack and PagerDuty. To get started, first download and install [Grafana][Grafana-install]. Next, add a new [PostgreSQL datasource][PostgreSQL datasource] that points to your TimescaleDB instance. This data source was built by TimescaleDB engineers, and it is designed to take advantage of the database's time-series capabilities. From there, proceed to your dashboard and set up alert rules as described above. diff --git a/timescaledb/how-to-guides/compression/about-compression.md b/timescaledb/how-to-guides/compression/about-compression.md index 29c340ba426f..795bea1a78ef 100644 --- a/timescaledb/how-to-guides/compression/about-compression.md +++ b/timescaledb/how-to-guides/compression/about-compression.md @@ -70,7 +70,7 @@ queries are more likely to be shallow in time, and wide in columns. Generally, they are debugging queries, or queries that cover the whole system, rather than specific, analytic queries. An example of the kind of query more likely for new data is "show me the current CPU usage, disk usage, energy consumption, and I/O -for a particular server". When this is the case, the uncompressed data will have +for a particular server". 
When this is the case, the uncompressed data has better query performance, so the native PostgreSQL row-based format is the best option. @@ -190,7 +190,7 @@ ALTER TABLE example timescaledb.compress_orderby = 'device_id, time DESC'); ``` -Using those settings, the compressed table now shows each measurement in consecutive order, and the `cpu` values show a trend. This table will compress much better: +Using those settings, the compressed table now shows each measurement in consecutive order, and the `cpu` values show a trend. This table compresses much better: |time|device_id|cpu|disk_io|energy_consumption| |---|---|---|---|---| @@ -198,7 +198,7 @@ Using those settings, the compressed table now shows each measurement in consecu Putting items in `orderby` and `segmentby` columns often achieves similar results. In this same example, if you set it to segment by the `device_id` -column, it will have good compression, even without setting `orderby`. This is +column, it has good compression, even without setting `orderby`. This is because ordering only matters within a segment, and segmenting by device means that each segment represents a series if it is ordered by time. So, if segmenting by an identifier causes segments to become too small, try moving the diff --git a/timescaledb/how-to-guides/compression/backfill-historical-data.md b/timescaledb/how-to-guides/compression/backfill-historical-data.md index c80865b70df1..ec1d84033666 100644 --- a/timescaledb/how-to-guides/compression/backfill-historical-data.md +++ b/timescaledb/how-to-guides/compression/backfill-historical-data.md @@ -35,7 +35,7 @@ To use this procedure: If using a temp table, the table is automatically dropped at the end of your database session. If using a normal table, after you are done backfilling the -data successfully, you will likely want to truncate your table in preparation +data successfully, you want to truncate your table in preparation for the next backfill (or drop it completely). ## Manually decompressing chunks for backfill @@ -58,8 +58,8 @@ SELECT alter_job(, scheduled => false); ``` We have now paused the compress chunk policy from the hypertable which -will leave us free to decompress the chunks we need to modify via backfill or -update. To decompress the chunk(s) that we will be modifying, for each chunk: +leaves us free to decompress the chunks we need to modify via backfill or +update. To decompress the chunk(s) that need to be modified, for each chunk: ``` sql SELECT decompress_chunk('_timescaledb_internal._hyper_2_2_chunk'); @@ -73,7 +73,7 @@ SELECT decompress_chunk(i) from show_chunks('conditions', newer_than, older_than ``` -You need to run 'decompress_chunk' for each chunk that will be impacted +You need to run 'decompress_chunk' for each chunk that is impacted by your INSERT or UPDATE statement in backfilling data. Once your needed chunks are decompressed you can proceed with your data backfill operations. @@ -85,7 +85,7 @@ our compression policy job: SELECT alter_job(, scheduled => true); ``` -This job will re-compress any chunks that were decompressed during your backfilling +This job re-compresses any chunks that were decompressed during your backfilling operation the next time it runs. 
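Taken together, a typical backfill session looks something like the sketch below. The job ID and chunk name are placeholders; look up your own values in `timescaledb_information.jobs` and with `show_chunks()` before running anything like this.

```sql
-- Sketch only: substitute your real job ID and chunk names
SELECT alter_job(1000, scheduled => false);                         -- pause the compression policy
SELECT decompress_chunk('_timescaledb_internal._hyper_2_2_chunk');  -- decompress the affected chunk(s)
-- INSERT or UPDATE the historical rows here
SELECT alter_job(1000, scheduled => true);                          -- re-enable the policy
```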
To have it run immediately, you can expressly execute
the command via [`run_job`][run-job]:

@@ -96,10 +96,10 @@ CALL run_job();
```

## Future Work [](future-work)
One of the current limitations of TimescaleDB is that once chunks are converted
-into compressed column form, we do not allow updates and deletes of the data
-or changes to the schema without manual decompression, except as noted [above][compression-schema-changes].
-In other words, chunks are partially immutable in compressed form.
-Attempts to modify the chunks' data in those cases will either error or fail silently (as preferred by users).
+into compressed column form, we do not allow updates and deletes of the data
+or changes to the schema without manual decompression, except as noted [above][compression-schema-changes].
+In other words, chunks are partially immutable in compressed form.
+Attempts to modify the chunks' data in those cases either error or fail silently (as preferred by users).
We plan to remove this limitation in future releases.
diff --git a/timescaledb/how-to-guides/compression/decompress-chunks.md b/timescaledb/how-to-guides/compression/decompress-chunks.md
index 54b2bde641d4..1918e401dacf 100644
--- a/timescaledb/how-to-guides/compression/decompress-chunks.md
+++ b/timescaledb/how-to-guides/compression/decompress-chunks.md
@@ -15,7 +15,7 @@ or backfilling data:
trying to compress chunks that you are currently working on.
1. Decompress chunks.
1. Perform the insertion or backfill.
-1. Re-enable the compression policy. This will re-compress the chunks you worked on.
+1. Re-enable the compression policy. This re-compresses the chunks you worked on.

## Decompress chunks manually
There are several methods for selecting chunks and decompressing them.
diff --git a/timescaledb/how-to-guides/configuration/configuration.md b/timescaledb/how-to-guides/configuration/configuration.md
index f3baebd140d1..c28d4aafdbcb 100644
--- a/timescaledb/how-to-guides/configuration/configuration.md
+++ b/timescaledb/how-to-guides/configuration/configuration.md
@@ -60,7 +60,7 @@ work_mem = 26214kB
Is this okay? [(y)es/(s)kip/(q)uit]:
```

-These changes are then written to your `postgresql.conf` and will take effect
+These changes are then written to your `postgresql.conf` and take effect
on the next (re)start. If you are starting on a fresh instance and don't feel
the need to approve each group of changes, you can also automatically accept
and append the suggestions to the end of your `postgresql.conf` like so:
@@ -102,11 +102,11 @@ setting to the sum of your total number of databases and the
total number of concurrent background workers you want running at any given
point in time. You need a background worker allocated to each database to run
a lightweight scheduler that schedules jobs. On top of that, any additional
-workers you allocate here will run background jobs when needed.
+workers you allocate here run background jobs when needed.

For larger queries, PostgreSQL automatically uses parallel workers if they are
available. To configure this use the `max_parallel_workers` setting.
-Increasing this setting will improve query performance for
+Increasing this setting improves query performance for
larger queries. Smaller queries may not trigger parallel workers.
By default, this setting corresponds to the number of CPUs available.
Use the `--cpus` flag or the `TS_TUNE_NUM_CPUS` docker environment variable to change it.
@@ -120,7 +120,7 @@ workers).

By default, `timescaledb-tune` sets `timescaledb.max_background_workers` to 8.
In order to change this setting, use the `--max-bg-workers` flag or the `TS_TUNE_MAX_BG_WORKERS` docker environment variable. The `max_worker_processes` -setting will automatically be adjusted as well. +setting is automatically adjusted as well. ### Disk-write settings [](disk-write) @@ -191,18 +191,18 @@ workers. Default value is 8. #### `timescaledb.enable_2pc (bool)` [](enable_2pc) Enables two-phase commit for distributed hypertables. If disabled, it -will use a one-phase commit instead, which is faster but can result in +uses a one-phase commit instead, which is faster but can result in inconsistent data. It is by default enabled. #### `timescaledb.enable_per_data_node_queries (bool)` [](enable_per_data_node_queries) -If enabled, TimescaleDB will combine different chunks belonging to the +If enabled, TimescaleDB combines different chunks belonging to the same hypertable into a single query per data node. It is by default enabled. #### `timescaledb.max_insert_batch_size (int)` [](max_insert_batch_size) When acting as a access node, TimescaleDB splits batches of inserted -tuples across multiple data nodes. It will batch up to +tuples across multiple data nodes. It batches up to `max_insert_batch_size` tuples per data node before flushing. Setting this to 0 disables batching, reverting to tuple-by-tuple inserts. The default value is 1000. @@ -225,7 +225,7 @@ data nodes. It is by default enabled. #### `timescaledb.enable_remote_explain (bool)` [](enable_remote_explain) Enable getting and showing `EXPLAIN` output from remote nodes. This -will require sending the query to the data node, so it can be affected +requires sending the query to the data node, so it can be affected by the network connection and availability of data nodes. It is by default disabled. #### `timescaledb.remote_data_fetcher (enum)` [](remote_data_fetcher) diff --git a/timescaledb/how-to-guides/configuration/postgres-config.md b/timescaledb/how-to-guides/configuration/postgres-config.md index d63fdb3f61d3..b3b658dc2429 100644 --- a/timescaledb/how-to-guides/configuration/postgres-config.md +++ b/timescaledb/how-to-guides/configuration/postgres-config.md @@ -36,11 +36,11 @@ setting to the sum of your total number of databases and the total number of concurrent background workers you want running at any given point in time. You need a background worker allocated to each database to run a lightweight scheduler that schedules jobs. On top of that, any additional -workers you allocate here will run background jobs when needed. +workers you allocate here run background jobs when needed. For larger queries, PostgreSQL automatically uses parallel workers if they are available. To configure this use the `max_parallel_workers` setting. -Increasing this setting will improve query performance for +Increasing this setting improves query performance for larger queries. Smaller queries may not trigger parallel workers. By default, this setting corresponds to the number of CPUs available. Use the `--cpus` flag or the `TS_TUNE_NUM_CPUS` docker environment variable to change it. @@ -54,7 +54,7 @@ workers). By default, `timescaledb-tune` sets `timescaledb.max_background_workers` to 8. In order to change this setting, use the `--max-bg-workers` flag or the `TS_TUNE_MAX_BG_WORKERS` docker environment variable. The `max_worker_processes` -setting will automatically be adjusted as well. +setting is automatically adjusted as well. 
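If you prefer to set these values directly rather than through `timescaledb-tune`, a sketch like the following works from a `psql` session; the numbers are examples only, and most of these settings only take effect after a restart.

```sql
-- Example values only; size them for your own databases and workload
ALTER SYSTEM SET timescaledb.max_background_workers = 16;
ALTER SYSTEM SET max_worker_processes = 27;
ALTER SYSTEM SET max_parallel_workers = 8;
-- Restart PostgreSQL for the worker settings to take effect
```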
### Disk-write settings [](disk-write) diff --git a/timescaledb/how-to-guides/configuration/telemetry.md b/timescaledb/how-to-guides/configuration/telemetry.md index b60e552cff54..83f3c11bc599 100644 --- a/timescaledb/how-to-guides/configuration/telemetry.md +++ b/timescaledb/how-to-guides/configuration/telemetry.md @@ -51,7 +51,7 @@ a text string of the exact JSON that is sent to our servers. Additionally any content of the table `_timescaledb_catalog.metadata` which has `include_in_telemetry` set to `true` and the value of `timescaledb_telemetry.cloud` -will be included in the telemetry report. +is included in the telemetry report. Notably, telemetry reports a different set of values depending on the license that your TimescaleDB instance is running under. If you are using OSS or Community, @@ -61,8 +61,8 @@ as relevant. ## Version checking The database sends telemetry reports periodically in the background. -In response to the telemetry report, the database will receive the most recent -version of TimescaleDB available for installation. This version will be +In response to the telemetry report, the database receives the most recent +version of TimescaleDB available for installation. This version is recorded in the user's server logs, along with any applicable out-of-date version warnings. While you do not have to update immediately to the newest release, many users have reported that performance issues or bugs @@ -88,9 +88,9 @@ Alternatively, in a `psql` console, run: ALTER [SYSTEM | DATABASE | USER] { *db_name* | *role_specification* } SET timescaledb.telemetry_level=off ``` -If `ALTER DATABASE` is run, then this will disable telemetry for the specified +If `ALTER DATABASE` is run, then this disables telemetry for the specified database, but not for other databases in the instance. If `ALTER SYSTEM` is -run, this will disable telemetry for the entire instance. +run, this disables telemetry for the entire instance. Note that superuser privileges are necessary to run `ALTER SYSTEM`. After running the desired command, reload the new server configuration with `SELECT pg_reload_conf()` in order @@ -109,4 +109,4 @@ or run the following command in psql: ALTER [SYSTEM | DATABASE | USER] { *db_name* | *role_specification* } SET timescaledb.telemetry_level=basic ``` -[get_telemetry_report]: /api/:currentVersion:/administration/get_telemetry_report +[get_telemetry_report]: /api/:currentVersion:/administration/get_telemetry_report diff --git a/timescaledb/how-to-guides/configuration/timescaledb-config.md b/timescaledb/how-to-guides/configuration/timescaledb-config.md index 9bccf015751e..6eed5d27a6e6 100644 --- a/timescaledb/how-to-guides/configuration/timescaledb-config.md +++ b/timescaledb/how-to-guides/configuration/timescaledb-config.md @@ -18,18 +18,18 @@ workers. Default value is 8. ### `timescaledb.enable_2pc (bool)` [](enable_2pc) Enables two-phase commit for distributed hypertables. If disabled, it -will use a one-phase commit instead, which is faster but can result in +uses a one-phase commit instead, which is faster but can result in inconsistent data. It is by default enabled. ### `timescaledb.enable_per_data_node_queries` -If enabled, TimescaleDB will combine different chunks belonging to the +If enabled, TimescaleDB combines different chunks belonging to the same hypertable into a single query per data node. It is by default enabled. ### `timescaledb.max_insert_batch_size (int)` When acting as a access node, TimescaleDB splits batches of inserted -tuples across multiple data nodes. 
It will batch up to +tuples across multiple data nodes. It batches up to `max_insert_batch_size` tuples per data node before flushing. Setting this to 0 disables batching, reverting to tuple-by-tuple inserts. The default value is 1000. @@ -52,7 +52,7 @@ data nodes. It is by default enabled. ### `timescaledb.enable_remote_explain (bool)` Enable getting and showing `EXPLAIN` output from remote nodes. This -will require sending the query to the data node, so it can be affected +requires sending the query to the data node, so it can be affected by the network connection and availability of data nodes. It is by default disabled. ### `timescaledb.remote_data_fetcher (enum)` diff --git a/timescaledb/how-to-guides/configuration/timescaledb-tune.md b/timescaledb/how-to-guides/configuration/timescaledb-tune.md index 745ce7ca157f..029902ad7d1a 100644 --- a/timescaledb/how-to-guides/configuration/timescaledb-tune.md +++ b/timescaledb/how-to-guides/configuration/timescaledb-tune.md @@ -43,7 +43,7 @@ work_mem = 26214kB Is this okay? [(y)es/(s)kip/(q)uit]: ``` -These changes are then written to your `postgresql.conf` and will take effect +These changes are then written to your `postgresql.conf` and take effect on the next (re)start. If you are starting on fresh instance and don't feel the need to approve each group of changes, you can also automatically accept and append the suggestions to the end of your `postgresql.conf` like so: diff --git a/timescaledb/how-to-guides/connecting/index.md b/timescaledb/how-to-guides/connecting/index.md index 007585406215..6ad0a522ae87 100644 --- a/timescaledb/how-to-guides/connecting/index.md +++ b/timescaledb/how-to-guides/connecting/index.md @@ -5,8 +5,8 @@ we provide some basic instructions on connecting with popular tools for running queries against PostgreSQL. In most cases, if you don't find instructions for a tool below or the instructions -we provide don't seem to work in your scenario, the tooling website will certainly -have up-to-date instructions for connecting. +we provide don't seem to work in your scenario, the tooling website +has up-to-date instructions for connecting. Generally, you'll need to know the following information to connect any tool to your TimescaleDB instance: @@ -20,9 +20,9 @@ to your TimescaleDB instance: With that information ready to go, you can connect using one of these tools. -**[Connect with `psql`][connect-psql]**: `psql` is the standard command line interface for +**[Connect with `psql`][connect-psql]**: `psql` is the standard command line interface for interacting with a PostgreSQL or TimescaleDB instance and used in most of our tutorials and documentation. -[connect-psql]: /how-to-guides/connecting/psql/ \ No newline at end of file +[connect-psql]: /how-to-guides/connecting/psql/ diff --git a/timescaledb/how-to-guides/connecting/psql.md b/timescaledb/how-to-guides/connecting/psql.md index 2ace07c7c9a9..4abff6a321db 100644 --- a/timescaledb/how-to-guides/connecting/psql.md +++ b/timescaledb/how-to-guides/connecting/psql.md @@ -1,10 +1,10 @@ # Installing psql on Mac, Ubuntu, Debian, Windows -`psql` is the standard command line interface for interacting with a PostgreSQL +`psql` is the standard command line interface for interacting with a PostgreSQL or TimescaleDB instance. Here we explain how to install `psql` on various platforms. -**Before you start**, you should confirm that you don’t already have `psql` installed. 
-In fact, if you’ve ever installed Postgres or TimescaleDB before, you likely already +**Before you start**, you should confirm that you don’t already have `psql` installed. +In fact, if you’ve ever installed Postgres or TimescaleDB before, you likely already have `psql` installed. ```bash @@ -48,7 +48,7 @@ sudo apt-get install postgresql-client We recommend using the installer from [PostgreSQL.org][windows-installer]. ## Connect to your PostgreSQL server -In order to connect to your PostgreSQL server, you’ll need the following +In order to connect to your PostgreSQL server, you’ll need the following connection parameters: - Hostname - Port @@ -60,13 +60,13 @@ There are two ways to use these parameters to connect to your PostgreSQL databas ### Option 1: Supply parameters at the command line In this method, use parameter flags on the command line to supply the required -information to connect to a PostgreSQL database: +information to connect to a PostgreSQL database: ```bash psql -h HOSTNAME -p PORT -U USERNAME -W -d DATABASENAME ``` -Once you run that command, the prompt will ask you for your password. (This is the purpose +Once you run that command, the prompt asks you for your password. (This is the purpose of the `-W` flag.) ### Option 2: Use a service URI @@ -119,15 +119,15 @@ SELECT rates.description, COUNT(vendor_id) AS num_trips, FROM rides JOIN rates ON rides.rate_code = rates.rate_code WHERE rides.rate_code IN (2,3) AND pickup_datetime < '2016-02-01' - GROUP BY rates.description + GROUP BY rates.description ORDER BY rates.description; ``` It would be pretty common to make an error the first couple of times you attempt to write something that long in SQL. Instead of re-typing every line or character, -you can launch a `vim` editor using the `\e` command. Your previous command can -then be edited, and once you save ("Escape-Colon-W-Q") your edits, the command will -appear in the buffer. You will be able to get back to it by pressing the up arrow +you can launch a `vim` editor using the `\e` command. Your previous command can +then be edited, and once you save ("Escape-Colon-W-Q") your edits, the command +appears in the buffer. You can get back to it by pressing the up arrow in your Terminal window. Congrats! Now you have connected via `psql`. diff --git a/timescaledb/how-to-guides/continuous-aggregates/real-time-aggregates.md b/timescaledb/how-to-guides/continuous-aggregates/real-time-aggregates.md index 4d5a4488efbe..93e2f4aa3996 100644 --- a/timescaledb/how-to-guides/continuous-aggregates/real-time-aggregates.md +++ b/timescaledb/how-to-guides/continuous-aggregates/real-time-aggregates.md @@ -4,7 +4,7 @@ underlying hypertable. Real time aggregates use the aggregated data and add the most recent raw data to it to provide accurate and up to date results, without needing to aggregate data as it is being written. In TimescaleDB 1.7 and later, real time aggregates are enabled by default. When you create a continuous -aggregate view, queries to that view will include the most recent data, even if +aggregate view, queries to that view include the most recent data, even if it has not yet been aggregated. 
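Real time aggregation is controlled per view with the `timescaledb.materialized_only` option. As a rough sketch, assuming a continuous aggregate named `conditions_summary_daily`:

```sql
-- Sketch: turn real time aggregation off (serve materialized data only) and back on
ALTER MATERIALIZED VIEW conditions_summary_daily SET (timescaledb.materialized_only = true);
ALTER MATERIALIZED VIEW conditions_summary_daily SET (timescaledb.materialized_only = false);
```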
For more detail on the comparison between continuous and real time aggregates,
diff --git a/timescaledb/how-to-guides/continuous-aggregates/refresh-policies.md b/timescaledb/how-to-guides/continuous-aggregates/refresh-policies.md
index e5e772f67cf9..70f14ce0e23c 100644
--- a/timescaledb/how-to-guides/continuous-aggregates/refresh-policies.md
+++ b/timescaledb/how-to-guides/continuous-aggregates/refresh-policies.md
@@ -18,10 +18,10 @@ The policy takes three arguments:
* `schedule_interval`: the refresh interval in minutes or hours

If you set the `start_offset` or `end_offset` to NULL, the range is open-ended
-and will extend to the beginning or end of time. However, we recommend that you
+and extends to the beginning or end of time. However, we recommend that you
set the `end_offset` so that at least the most recent time bucket is excluded.
For time-series data that mostly contains writes that occur in time stamp order,
-the time buckets that see lots of writes will quickly have out-of-date
+the time buckets that see lots of writes quickly have out-of-date
aggregates. You get better performance by excluding the time buckets that are
getting a lot of writes.

@@ -45,13 +45,13 @@ It also does not refresh the last time bucket of the continuous aggregate.
Because it has an open-ended `start_offset` parameter, any data that is removed
from the table, for example with a DELETE or with `drop_chunks`, is also removed
from the continuous aggregate view. This means that the continuous aggregate
-will always reflect the data in the underlying hypertable.
+always reflects the data in the underlying hypertable.

If you want to keep data in the continuous aggregate even if it is removed from
the underlying hypertable, you can set the `start_offset` to match the
[data retention policy][sec-data-retention] on the source hypertable. For
example, if you have a retention policy that removes data older than one month, set
-`start_offset` to one month or less. This will set your policy so that it does
+`start_offset` to one month or less. This sets your policy so that it does
not refresh the dropped data.

diff --git a/timescaledb/how-to-guides/data-retention/data-retention-with-continuous-aggregates.md b/timescaledb/how-to-guides/data-retention/data-retention-with-continuous-aggregates.md
index 29abe7027c4c..d895fda2d82e 100644
--- a/timescaledb/how-to-guides/data-retention/data-retention-with-continuous-aggregates.md
+++ b/timescaledb/how-to-guides/data-retention/data-retention-with-continuous-aggregates.md
@@ -3,9 +3,9 @@
Extra care must be taken when using retention policies or `drop_chunks` calls on hypertables
which have [continuous aggregates][continuous_aggregates] defined on them.
Similar to a refresh of a materialized view, a refresh on a continuous aggregate
-will update the aggregate to reflect changes in the underlying source data. This means
+updates the aggregate to reflect changes in the underlying source data. This means
that any chunks that are dropped in the region still being refreshed by the
-continuous aggregate will cause the chunk data to disappear from the aggregate as
+continuous aggregate cause the chunk data to disappear from the aggregate as
well. If the intent is to keep the aggregate while dropping the underlying data, the
interval being dropped should not overlap with the offsets for the continuous aggregate.
@@ -21,11 +21,11 @@ WITH (timescaledb.continuous) AS SELECT add_continuous_aggregate_policy('conditions_summary_daily', '7 days', '1 day', '1 day'); ``` -This will create the `conditions_summary_daily` aggregate which will store the daily +This creates the `conditions_summary_daily` aggregate which stores the daily temperature per device from our `conditions` table. However, we have a problem here -if we're using our 24 hour retention policy from above, as our aggregate will capture -changes to the data for up to seven days. As a result, we will update the aggregate -when we drop the chunk from the table, and we'll ultimately end up with no data in our +if we're using our 24 hour retention policy from above, as our aggregate captures +changes to the data for up to seven days. As a result, you update the aggregate +when you drop the chunk from the table, and you'll ultimately end up with no data in the `conditions_summary_daily` table. We can fix this by replacing the `conditions` retention policy with one having a more @@ -38,7 +38,7 @@ SELECT add_retention_policy('conditions', INTERVAL '30 days'); It's worth noting that continuous aggregates are also valid targets for `drop_chunks` and retention policies. To continue our example, we now have our `conditions` table holding the last 30 days worth of data, and our `conditions_daily_summary` holding -average daily values for an indefinite window after that. The following will change +average daily values for an indefinite window after that. The following changes this to also drop the aggregate data after 600 days: ```sql diff --git a/timescaledb/how-to-guides/data-retention/manually-drop-chunks.md b/timescaledb/how-to-guides/data-retention/manually-drop-chunks.md index a5374bcdb425..2bc2e3c13a56 100644 --- a/timescaledb/how-to-guides/data-retention/manually-drop-chunks.md +++ b/timescaledb/how-to-guides/data-retention/manually-drop-chunks.md @@ -29,15 +29,15 @@ For example: SELECT drop_chunks('conditions', INTERVAL '24 hours'); ``` -This will drop all chunks from the hypertable `conditions` that _only_ -include data older than this duration, and will _not_ delete any +This drops all chunks from the hypertable `conditions` that _only_ +include data older than this duration, and does _not_ delete any individual rows of data in chunks. For example, if one chunk has data more than 36 hours old, a second chunk has data between 12 and 36 hours old, and a third chunk has the most recent 12 hours of data, only the first chunk is dropped when executing `drop_chunks`. Thus, in this scenario, -the `conditions` hypertable will still have data stretching back 36 hours. +the `conditions` hypertable still has data stretching back 36 hours. For more information on the `drop_chunks` function and related parameters, please review the [API documentation][drop_chunks]. diff --git a/timescaledb/how-to-guides/data-tiering/move-data.md b/timescaledb/how-to-guides/data-tiering/move-data.md index 83f0b8cb670b..15ea8451f251 100644 --- a/timescaledb/how-to-guides/data-tiering/move-data.md +++ b/timescaledb/how-to-guides/data-tiering/move-data.md @@ -3,10 +3,10 @@ The [`move_chunk`][api-move-chunk] function requires multiple tablespaces set up start with a quick review of how this works. ## Creating a tablespace -First, add a storage mount that will serve as a home for your new tablespace. This -process will differ based on how you are deployed, but your system administrator +First, add a storage mount that can serve as a home for your new tablespace. 
This +process differs based on how you are deployed, but your system administrator should be able to arrange setting up the mount point. The key here is to provision -your tablespace with storage that is appropriate for how its resident data will be used. +your tablespace with storage that is appropriate for how its resident data is used. To create a [tablespace][] in Postgres: @@ -16,7 +16,7 @@ OWNER postgres LOCATION '/mnt/history'; ``` -Here we are creating a tablespace called `history` that will be +Here we are creating a tablespace called `history` that is owned by the default `postgres` user, using the storage mounted at `/mnt/history`. ## Move Chunks [](move_chunks) @@ -59,7 +59,7 @@ SELECT tablename from pg_tables WHERE tablespace = 'history' and tablename like '_hyper_%_%_chunk'; ``` -As you will see, the target chunk is now listed as residing on `history`; we +The target chunk is now listed as residing on `history`; we can similarly validate the location of our index: ```sql diff --git a/timescaledb/how-to-guides/hyperfunctions/advanced-agg.md b/timescaledb/how-to-guides/hyperfunctions/advanced-agg.md index f00ff8e24e77..b3a7bf044cbf 100644 --- a/timescaledb/how-to-guides/hyperfunctions/advanced-agg.md +++ b/timescaledb/how-to-guides/hyperfunctions/advanced-agg.md @@ -37,7 +37,7 @@ percentiles, then choose `tdigest`. If you're more concerned about getting highly accurate median estimates, choose `uddsketch`. The algorithms differ in the way they estimate data. `uddsketch` has a stable -bucketing function, so it will always return the same percentile estimate for +bucketing function, so it always returns the same percentile estimate for the same underlying data, regardless of how it is ordered or re-aggregated. On the other hand, `tdigest` builds up incremental buckets based on the average of nearby points, which can result in some subtle differences in estimates based on @@ -57,7 +57,7 @@ absolute error of those at the low end of the range. This gets much more extreme if the data range is `[0,100]`. If having a stable absolute error is important to your use case, choose `tdigest`. -While both algorithms will probably get smaller and faster with future +While both algorithms are likely to get smaller and faster with future optimizations, `uddsketch` generally requires a smaller memory footprint than `tdigest`, and a correspondingly smaller disk footprint for any continuous aggregates. Regardless of the algorithm you choose, the best way to improve the diff --git a/timescaledb/how-to-guides/hyperfunctions/function-pipelines.md b/timescaledb/how-to-guides/hyperfunctions/function-pipelines.md index 43606a25ad4c..311de378df66 100644 --- a/timescaledb/how-to-guides/hyperfunctions/function-pipelines.md +++ b/timescaledb/how-to-guides/hyperfunctions/function-pipelines.md @@ -26,10 +26,10 @@ GROUP BY device_id; You can express the same query with a function pipeline like this: ```sql SELECT device_id, - toolkit_experimental.timevector(ts, val) - -> toolkit_experimental.sort() - -> toolkit_experimental.delta() - -> toolkit_experimental.abs() + toolkit_experimental.timevector(ts, val) + -> toolkit_experimental.sort() + -> toolkit_experimental.delta() + -> toolkit_experimental.abs() -> toolkit_experimental.sum() as volatility FROM measurements WHERE ts >= now()-'1 day'::interval @@ -76,10 +76,10 @@ operator. To put it more plainly, you can think of it as "do the next thing". 
A typical function pipeline could look something like this: ```sql SELECT device_id, - toolkit_experimental.timevector(ts, val) - -> toolkit_experimental.sort() - -> toolkit_experimental.delta() - -> toolkit_experimental.abs() + toolkit_experimental.timevector(ts, val) + -> toolkit_experimental.sort() + -> toolkit_experimental.delta() + -> toolkit_experimental.abs() -> toolkit_experimental.sum() as volatility FROM measurements WHERE ts >= now() - '1 day'::interval @@ -131,9 +131,9 @@ an output in a specified format. or they can produce an aggregate of the For example, a finalizer element that produces an output: ```sql SELECT device_id, - toolkit_experimental.timevector(ts, val) + toolkit_experimental.timevector(ts, val) -> toolkit_experimental.sort() - -> toolkit_experimental.delta() + -> toolkit_experimental.delta() -> toolkit_experimental.unnest() FROM measurements ``` @@ -141,9 +141,9 @@ FROM measurements Or a finalizer element that produces an aggregate: ```sql SELECT device_id, - toolkit_experimental.timevector(ts, val) + toolkit_experimental.timevector(ts, val) -> toolkit_experimental.sort() - -> toolkit_experimental.delta() + -> toolkit_experimental.delta() -> toolkit_experimental.time_weight() FROM measurements ``` @@ -209,7 +209,7 @@ floating point representation of the integer. For example: SELECT ( toolkit_experimental.timevector(time, value) -> toolkit_experimental.abs() - -> toolkit_experimental.unnest()).* + -> toolkit_experimental.unnest()).* FROM (VALUES (TimestampTZ '2021-01-06 UTC', 0.0 ), ( '2021-01-01 UTC', 25.0 ), ( '2021-01-02 UTC', 0.10), @@ -246,7 +246,7 @@ the second argument of the function. The available elements are: |`sub(N)`|Computes each value less `N`| These elements calculate `vector -> power(2)` by squaring all of the `values`, -and `vector -> logn(3)` will give the log-base-3 of each `value`. For example: +and `vector -> logn(3)` gives the log-base-3 of each `value`. For example: ```sql SELECT ( toolkit_experimental.timevector(time, value) @@ -277,7 +277,7 @@ Mathematical transforms are applied only to the `value` in each point in a `timevector` and always produce one-to-one output `timevectors`. Compound transforms can involve both the `time` and `value` parts of the points in the `timevector`, and they are not necessarily one-to-one. One or more points in the input can be used to produce zero or more points in the output. So, where -mathematical transforms will always produce `timevectors` of the same length, +mathematical transforms always produce `timevectors` of the same length, compound transforms can produce larger or smaller `timevectors` as an output. #### Delta transforms @@ -317,7 +317,7 @@ without a previous value. #### Fill method transform The `fill_to()` transform ensures that there is a point at least every -`interval`, if there is not a point, it will fill in the point using the method +`interval`, if there is not a point, it fills in the point using the method provided. The `timevector` must be sorted before calling `fill_to()`. 
The available fill methods are: @@ -625,7 +625,7 @@ available counter aggregate functions are: |`intercept()`|The y-intercept of the least squares fit line of the adjusted counter value| |`irate_left()`/`irate_right()`|Computes the instantaneous rate of change between the second and first points (left) or last and next-to-last points (right)| |`num_changes()`|Number of times the counter changed values| -|`num_elements()`|Number of items - any with the exact same time will have been counted only once| +|`num_elements()`|Number of items - any with the exact same time have been counted only once| |`num_changes()`|Number of times the counter reset| |`slope()`|The slope of the least squares fit line of the adjusted counter value| |`with_bounds(range)`|Applies bounds using the `range` (a `TSTZRANGE`) to the `CounterSummary` if they weren't provided in the aggregation step| diff --git a/timescaledb/how-to-guides/hyperfunctions/time-weighted-averages.md b/timescaledb/how-to-guides/hyperfunctions/time-weighted-averages.md index 6ab3ee3610bb..fe4568aa3a10 100644 --- a/timescaledb/how-to-guides/hyperfunctions/time-weighted-averages.md +++ b/timescaledb/how-to-guides/hyperfunctions/time-weighted-averages.md @@ -8,7 +8,7 @@ using data that is not evenly sampled is not always useful. For example, if you have a lot of ice cream in freezers, you need to make sure the ice cream stays within a 0-10℉ (-20 to -12℃) temperature range. The temperature in the freezer can vary if folks are opening and closing the door, -but the ice cream will only have a problem if the temperature is out of range +but the ice cream only has a problem if the temperature is out of range for a long time. You can set your sensors in the freezer to sample every five minutes while the temperature is in range, and every 30 seconds while the temperature is out of range. If the results are generally stable, but with some diff --git a/timescaledb/how-to-guides/hypertables/best-practices.md b/timescaledb/how-to-guides/hypertables/best-practices.md index 2af3946b0d03..e03ce261346d 100644 --- a/timescaledb/how-to-guides/hypertables/best-practices.md +++ b/timescaledb/how-to-guides/hypertables/best-practices.md @@ -90,7 +90,7 @@ following: - For each physical disk on a single instance, add a separate tablespace to the database. TimescaleDB actually allows you to add multiple tablespaces to a *single* hypertable (although under the -covers, each underlying chunk will be mapped by TimescaleDB to a +covers, each underlying chunk is mapped by TimescaleDB to a single tablespace / physical disk). - Configure a distributed hypertable that spreads inserts and queries diff --git a/timescaledb/how-to-guides/hypertables/distributed-hypertables.md b/timescaledb/how-to-guides/hypertables/distributed-hypertables.md index dae09161d559..0895a41a699c 100644 --- a/timescaledb/how-to-guides/hypertables/distributed-hypertables.md +++ b/timescaledb/how-to-guides/hypertables/distributed-hypertables.md @@ -146,7 +146,7 @@ the [`now()`][current_time] function to get the current transaction time. This function depends on the current timezone setting on each node. If the query includes a user-defined function (UDF) the access node assumes -that the function does not exist on the data nodes and therefore will not push +that the function does not exist on the data nodes and therefore does not push it down. 
TimescaleDB employs several optimizations to increase the likelihood of being
diff --git a/timescaledb/how-to-guides/ingest-data.md b/timescaledb/how-to-guides/ingest-data.md
index 7ea05cb278a2..8c6d07e8cd7d 100644
--- a/timescaledb/how-to-guides/ingest-data.md
+++ b/timescaledb/how-to-guides/ingest-data.md
@@ -70,7 +70,7 @@ The connector is designed to work with [Kafka Connect][kafka-connect] and to be
deployed to a Kafka Connect runtime service. Its purpose is to ingest change
events from PostgreSQL databases (i.e. TimescaleDB).

-The deployed connector will monitor one or more schemas within a TimescaleDB
+The deployed connector monitors one or more schemas within a TimescaleDB
server and writes all change events to Kafka topics, which can be independently
consumed by one or more clients. Kafka Connect can be distributed to provide
fault tolerance to ensure the connectors are running and continually keeping
diff --git a/timescaledb/how-to-guides/install-timescaledb/installation-apt-debian.md b/timescaledb/how-to-guides/install-timescaledb/installation-apt-debian.md
index 54b4cbc6ca13..81f0c3629d4a 100644
--- a/timescaledb/how-to-guides/install-timescaledb/installation-apt-debian.md
+++ b/timescaledb/how-to-guides/install-timescaledb/installation-apt-debian.md
@@ -1,6 +1,6 @@
## apt Installation (Debian) [](installation-apt-debian)

-This will install TimescaleDB via `apt` on Debian distros.
+This installs TimescaleDB via `apt` on Debian distros.

**Note: TimescaleDB requires PostgreSQL 12, 13, or 14.**

@@ -12,7 +12,7 @@


If you have another PostgreSQL installation not via `apt`,
-this will likely cause problems.
+this could cause problems.
If you wish to maintain your current version of PostgreSQL outside
of `apt`, we recommend installing from source. Otherwise, please be
@@ -35,8 +35,8 @@ sudo apt install timescaledb-2-postgresql-13
```

#### Upgrading from TimescaleDB 1.x
-If you are upgrading from TimescaleDB 1.x, the `apt` package will first
-uninstall the previous version of TimescaleDB and then install the latest TimescaleDB 2.0
+If you are upgrading from TimescaleDB 1.x, the `apt` package first
+uninstalls the previous version of TimescaleDB and then installs the latest TimescaleDB 2.0
binaries. The feedback in your terminal should look similar to the following:

```bash
@@ -60,7 +60,7 @@ EXTENSION update as discussed in [Updating Timescale to 2.0][update-tsdb-2].

#### Configure your database
There are a [variety of settings that can be configured][config] for your
-new database. At a minimum, you will need to update your `postgresql.conf`
+new database. At a minimum, you need to update your `postgresql.conf`
file to include `shared_preload_libraries = 'timescaledb'`. The easiest way to
get started is to run `timescaledb-tune`, which is installed by default when using `apt`:
```bash
sudo apt install timescaledb-tools
sudo timescaledb-tune
```

-This will ensure that our extension is properly added to the parameter
+This ensures that our extension is properly added to the parameter
`shared_preload_libraries` and offers suggestions for tuning memory, parallelism,
and other settings.
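Once PostgreSQL has been restarted with the new configuration, a quick way to confirm everything is wired up is to connect with `psql` and enable the extension; this sketch assumes a database you have already created:

```sql
-- Run against your own database after restarting PostgreSQL
CREATE EXTENSION IF NOT EXISTS timescaledb;
SELECT extname, extversion FROM pg_extension WHERE extname = 'timescaledb';
```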
diff --git a/timescaledb/how-to-guides/install-timescaledb/installation-apt-ubuntu.md b/timescaledb/how-to-guides/install-timescaledb/installation-apt-ubuntu.md index cd555a16208b..12f6900641de 100644 --- a/timescaledb/how-to-guides/install-timescaledb/installation-apt-ubuntu.md +++ b/timescaledb/how-to-guides/install-timescaledb/installation-apt-ubuntu.md @@ -1,6 +1,6 @@ ## apt Installation (Ubuntu) [](installation-apt-ubuntu) -This will install TimescaleDB via `apt` on Ubuntu distros. +This installs TimescaleDB via `apt` on Ubuntu distros. **Note: TimescaleDB requires PostgreSQL 12, 13, or 14.** @@ -14,7 +14,7 @@ non-obsolete releases. If you have another PostgreSQL installation not via `apt`, -this will likely cause problems. +this could cause problems. If you wish to maintain your current version of PostgreSQL outside of `apt`, we recommend installing from source. Otherwise, please be @@ -43,8 +43,8 @@ new Ubuntu package repository for the release of TimescaleDB 2.4. #### Upgrading from TimescaleDB 1.x -If you are upgrading from TimescaleDB 1.x, the `apt` package will first -uninstall the previous version of TimescaleDB and then install the latest TimescaleDB 2.0 +If you are upgrading from TimescaleDB 1.x, the `apt` package first +uninstalls the previous version of TimescaleDB and then install the latest TimescaleDB 2.0 binaries. The feedback in your terminal should look similar to the following: ```bash @@ -69,7 +69,7 @@ EXTENSION update as discussed in [Updating Timescale to 2.0][update-tsdb-2]. #### Configure your database There are a [variety of settings that can be configured][config] for your -new database. At a minimum, you will need to update your `postgresql.conf` +new database. At a minimum, you need to update your `postgresql.conf` file to include `shared_preload_libraries = 'timescaledb'`. The easiest way to get started is to run `timescaledb-tune`, which is installed by default when using `apt`: @@ -77,7 +77,7 @@ installed by default when using `apt`: sudo timescaledb-tune ``` -This will ensure that our extension is properly added to the parameter +This ensures that our extension is properly added to the parameter `shared_preload_libraries` as well as offer suggestions for tuning memory, parallelism, and other settings. diff --git a/timescaledb/how-to-guides/install-timescaledb/installation-docker.md b/timescaledb/how-to-guides/install-timescaledb/installation-docker.md index bf0596181e07..4ff497f53e92 100644 --- a/timescaledb/how-to-guides/install-timescaledb/installation-docker.md +++ b/timescaledb/how-to-guides/install-timescaledb/installation-docker.md @@ -15,11 +15,11 @@ docker run -d --name timescaledb -p 5432:5432 -e POSTGRES_PASSWORD=password time The -p flag binds the container port to the host port, meaning -anything that can access the host port will be able to access your TimescaleDB +anything that can access the host port can access your TimescaleDB container. This can be particularly dangerous if you do not set a PostgreSQL password at runtime using the `POSTGRES_PASSWORD` environment variable as we -do in the above command. Without that variable, the Docker container will -disable password checks for all database users. If you want to access the +do in the above command. Without that variable, the Docker container +disables password checks for all database users. 
If you want to access the container from the host but avoid exposing it to the outside world, you can explicitly have it bind to 127.0.0.1 instead of the public interface by using `-p 127.0.0.1:5432:5432`. @@ -28,7 +28,7 @@ Otherwise, you'll want to ensure that your host box is adequately locked down through security groups, IP Tables, or whatever you're using for access control. Note also that Docker binds the container by modifying your Linux IP Tables. For systems that use Linux UFW (Uncomplicated Firewall) for security -rules, this means that Docker will potentially override any UFW settings that +rules, this means that Docker potentially overrides any UFW settings that restrict the port you are binding to. If you are relying on UFW rules for network security, consider adding `DOCKER_OPTS="--iptables=false"` to `/etc/default/docker` to prevent Docker from overwriting IP Tables. @@ -51,7 +51,7 @@ docker exec -it timescaledb psql -U postgres Our Docker image is derived from the [official PostgreSQL image][official-image] and includes [alpine Linux][] as its OS. -While the above `run` command will pull the Docker image on demand, +While the above `run` command pulls the Docker image on demand, you can also -- and for upgrades, **need to** -- explicitly pull our image from [Docker Hub][]: @@ -70,7 +70,7 @@ volume should be stored/mounted via the `-v` flag. In particular, the above `docker run` command should now include some additional argument such as `-v /your/data/dir:/var/lib/postgresql/data`. -Note that creating a new container (`docker run`) will also create a new +Note that creating a new container (`docker run`) also creates a new volume unless an existing data volume is reused by reference via the -v parameter (e.g., `-v VOLUME_ID:/var/lib/postgresql/data`). Existing containers can be stopped (`docker stop`) and started again (`docker @@ -92,4 +92,3 @@ code, you should pull the tag `latest-pg12-oss` as an example. [official-image]: https://github.com/docker-library/postgres/ [alpine Linux]: https://alpinelinux.org/ [docker-data-volumes]: https://docs.docker.com/storage/volumes/ - diff --git a/timescaledb/how-to-guides/install-timescaledb/installation-grafana.md b/timescaledb/how-to-guides/install-timescaledb/installation-grafana.md index d3d8bf58decd..b829b2dd37e9 100644 --- a/timescaledb/how-to-guides/install-timescaledb/installation-grafana.md +++ b/timescaledb/how-to-guides/install-timescaledb/installation-grafana.md @@ -2,7 +2,7 @@ ### Prerequisites -You will need to [setup an instance of TimescaleDB][install-timescale]. +You need to [setup an instance of TimescaleDB][install-timescale]. ### Options for installing Grafana @@ -40,14 +40,14 @@ your TimescaleDB instance. ### Enable TimescaleDB within Grafana -Since we will be connecting to a TimescaleDB instance for this -tutorial, we will also want to check the option for 'TimescaleDB' in the +Since we are connecting to a TimescaleDB instance for this +tutorial, you also need to check the option for 'TimescaleDB' in the 'PostgreSQL details' section of the PostgreSQL configuration screen. ### Wrapping up You should also change the 'Name' of the database to something descriptive. This is -optional, but will inform others who use your Grafana dashboard what this data source +optional, but informs others who use your Grafana dashboard what this data source contains. Once done, click 'Save & Test'. 
You should receive confirmation that your database diff --git a/timescaledb/how-to-guides/install-timescaledb/installation-homebrew.md b/timescaledb/how-to-guides/install-timescaledb/installation-homebrew.md index 0b43a9811b85..01f59ec25472 100644 --- a/timescaledb/how-to-guides/install-timescaledb/installation-homebrew.md +++ b/timescaledb/how-to-guides/install-timescaledb/installation-homebrew.md @@ -1,6 +1,6 @@ ## Homebrew [](homebrew) -This will install both TimescaleDB *and* PostgreSQL via Homebrew. +This installs both TimescaleDB *and* PostgreSQL via Homebrew. **Note: TimescaleDB requires PostgreSQL 12, 13, or 14.** @@ -12,7 +12,7 @@ This will install both TimescaleDB *and* PostgreSQL via Homebrew. If you have another PostgreSQL installation -(such as through Postgres.app), the following instructions will +(such as through Postgres.app), the following instructions could cause problems. If you wish to maintain your current version of PostgreSQL outside of Homebrew we recommend installing from source. Otherwise please be sure to remove non-Homebrew installations before using this method. @@ -32,7 +32,7 @@ brew install timescaledb #### Configure your database There are a [variety of settings that can be configured][config] for your -new database. At a minimum, you will need to update your `postgresql.conf` +new database. At a minimum, you need to update your `postgresql.conf` file to include `shared_preload_libraries = 'timescaledb'`. The easiest way to get started is to run `timescaledb-tune`, which is installed as a dependency when you install via Homebrew: @@ -40,7 +40,7 @@ installed as a dependency when you install via Homebrew: timescaledb-tune ``` -This will ensure that our extension is properly added to the parameter +This ensures that our extension is properly added to the parameter `shared_preload_libraries` as well as offer suggestions for tuning memory, parallelism, and other settings. diff --git a/timescaledb/how-to-guides/install-timescaledb/installation-source-windows.md b/timescaledb/how-to-guides/install-timescaledb/installation-source-windows.md index 02d0bcbf503c..66fddc749853 100644 --- a/timescaledb/how-to-guides/install-timescaledb/installation-source-windows.md +++ b/timescaledb/how-to-guides/install-timescaledb/installation-source-windows.md @@ -24,7 +24,7 @@ git checkout # e.g., git checkout 2.5.0 ``` If you are using Visual Studio 2017 with the CMake and Git components, -you should be able to open the folder in Visual Studio, which will take +you should be able to open the folder in Visual Studio, which takes care of the rest. If you are using an earlier version of Visual Studio: @@ -49,7 +49,7 @@ cmake --build ./build --config Release --target install #### Update `postgresql.conf` -You will need to edit your `postgresql.conf` file to include +You need to edit your `postgresql.conf` file to include the TimescaleDB library, and then restart PostgreSQL. First, locate your `postgresql.conf` file: diff --git a/timescaledb/how-to-guides/install-timescaledb/installation-source.md b/timescaledb/how-to-guides/install-timescaledb/installation-source.md index d26554764085..6a00285ebd92 100644 --- a/timescaledb/how-to-guides/install-timescaledb/installation-source.md +++ b/timescaledb/how-to-guides/install-timescaledb/installation-source.md @@ -39,7 +39,7 @@ installed with. #### Update `postgresql.conf` -You will need to edit your `postgresql.conf` file to include +You need to edit your `postgresql.conf` file to include the TimescaleDB library, and then restart PostgreSQL. 
First, locate your `postgresql.conf` file: diff --git a/timescaledb/how-to-guides/install-timescaledb/installation-ubuntu-ami.md b/timescaledb/how-to-guides/install-timescaledb/installation-ubuntu-ami.md index 4091b629cdfb..d9a97546ed06 100644 --- a/timescaledb/how-to-guides/install-timescaledb/installation-ubuntu-ami.md +++ b/timescaledb/how-to-guides/install-timescaledb/installation-ubuntu-ami.md @@ -28,9 +28,9 @@ To launch the AMI, go to the `AMIs` section of your AWS EC2 Dashboard run the fo You can also use the image id to build an instance using Cloudformation, Terraform, the AWS CLI, or any other AWS deployment tool that supports building from public AMIs. -TimescaleDB is installed on the AMI, but you will still need to follow the steps for +TimescaleDB is installed on the AMI, but you still need to follow the steps for initializing a database with the TimescaleDB extension. See our [setup][] section for details. -Depending on your user/permission needs, you will also need to set up a postgres superuser for your +Depending on your user/permission needs, you also need to set up a postgres superuser for your database by following these [postgres instructions][]. Another possibility is using the operating system's `ubuntu` user and modifying the [pg_hba][]. diff --git a/timescaledb/how-to-guides/install-timescaledb/installation-windows.md b/timescaledb/how-to-guides/install-timescaledb/installation-windows.md index e48eb60a75de..4f58245ff8af 100644 --- a/timescaledb/how-to-guides/install-timescaledb/installation-windows.md +++ b/timescaledb/how-to-guides/install-timescaledb/installation-windows.md @@ -14,7 +14,7 @@ 1. Download the the .zip file for your PostgreSQL version - [12][windows-dl-12] or [13][windows-dl-13] or [14][windows-dl-14]. 1. Extract the zip file locally 1. Run `setup.exe`, making sure that PostgreSQL is not currently running -1. If successful, a `cmd.exe` window will pop open and you will see the following: +1. If successful, a `cmd.exe` window pops open and you see the following: ```bash TimescaleDB installation completed succesfully. @@ -29,12 +29,12 @@ in your database as discussed in [Updating Timescale to 2.0][update-tsdb-2]. #### Configure your database There are a [variety of settings that can be configured][config] for your -new database. At a minimum, you will need to update your `postgresql.conf` +new database. At a minimum, you need to update your `postgresql.conf` file to include `shared_preload_libraries = 'timescaledb'`. If you ran `timescaledb-tune` during the install, you are already done. If you did not, you can re-run the installer. -This will ensure that our extension is properly added to the parameter +This ensures that our extension is properly added to the parameter `shared_preload_libraries` as well as offer suggestions for tuning memory, parallelism, and other settings. diff --git a/timescaledb/how-to-guides/install-timescaledb/installation-yum.md b/timescaledb/how-to-guides/install-timescaledb/installation-yum.md index 0785a917fd6d..6e0b1050fabc 100644 --- a/timescaledb/how-to-guides/install-timescaledb/installation-yum.md +++ b/timescaledb/how-to-guides/install-timescaledb/installation-yum.md @@ -1,6 +1,6 @@ ## yum Installation [](installation-yum) -This will install both TimescaleDB *and* PostgreSQL via `yum` +This installs both TimescaleDB *and* PostgreSQL via `yum` (or `dnf` on Fedora). 
**Note: TimescaleDB requires PostgreSQL 12, 13, or 14.** @@ -13,7 +13,7 @@ This will install both TimescaleDB *and* PostgreSQL via `yum` If you have another PostgreSQL installation not -via `yum`, this will likely cause problems. +via `yum`, this could cause problems. If you wish to maintain your current version of PostgreSQL outside of `yum`, we recommend installing from source. Otherwise please be sure to remove non-`yum` installations before using this method. @@ -28,7 +28,7 @@ sudo yum install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-$( ``` Add TimescaleDB's third party repository and install TimescaleDB, -which will download any dependencies it needs from the PostgreSQL repo: +which downloads any dependencies it needs from the PostgreSQL repo: ```bash # Add timescaledb repo sudo tee /etc/yum.repos.d/timescale_timescaledb.repo < During your managed service for TimescaleDB trial, you have up to $300 USD in credits to use. -This will be sufficient to complete all our tutorials and run a few test projects. +This is sufficient to complete all our tutorials and run a few test projects. If you're interested in learning more about pricing of managed service for TimescaleDB, visit the [managed service for TimescaleDB pricing calculator][timescale-pricing]. Or, [contact us][contact] -and we will be happy to walk you through the best managed service for TimescaleDB configuration +and we are happy to walk you through the best managed service for TimescaleDB configuration for your use cases. Once you've selected your service options, click `Create Service`. -It will take a few minutes for your service to provision in your cloud. Now is +It takes a few minutes for your service to provision in your cloud. Now is a good time to familiarize yourself with some of the [features of TimescaleDB][using-timescale] and our [getting started tutorial][getting-started]. @@ -63,7 +63,7 @@ utility for configuring and maintaining PostgreSQL. We recommend ### Step 4: Connect to your database using psql -You will see a green `Running` label and a green dot under the "Nodes" column when +You can see a green `Running` label and a green dot under the "Nodes" column when your instance is ready for use. Once your instance is ready, navigate to the ‘Overview Tab’ of your Timescale @@ -121,7 +121,7 @@ TimescaleDB provides the ability to configure, in a fine-grained manner, the set of source IP addresses and ranges, as well as connection ports, that can access your managed service for TimescaleDB services. -This tutorial will walk you through how to configure this capability. +This tutorial walks you through how to configure this capability. #### Before you start @@ -132,8 +132,8 @@ get signed up and create your first database instance. Once you have a database instance setup in the [managed service for TimescaleDB portal][mst-portal], browse to this service and click on the 'Overview' tab. In the 'Connection Information' -section, you will see the port number that is used for database connections. This is -the port we will protect by managing inbound access. +section, you can see the port number that is used for database connections. This is +the port we are protecting by managing inbound access. Timescale Cloud Overview tab @@ -143,8 +143,8 @@ Scroll down to find the 'Allowed IP Addresses' section. By default, this value i `0.0.0.0/0` which is actually wide-open. 
- This wide-open setting simplifies getting started since it will accept incoming - traffic from all sources, but you will absolutely want to narrow this range. + This wide-open setting simplifies getting started since it accepts incoming + traffic from all sources, but you absolutely want to narrow this range. If you are curious about how to interpret this [Classless Inter-Domain Routing][cidr-wiki] (CIDR) syntax, @@ -154,9 +154,9 @@ check out [this great online tool][cidr-tool] to help decipher CIDR. #### Step 3 - Change the allowed IP addresses section -Click 'Change' and adjust the CIDR value based on where your source traffic will come from. -For example, entering a value of `192.168.1.15/32` will ONLY allow incoming traffic from a -source IP of `192.168.1.15` and will deny all other traffic. +Click 'Change' and adjust the CIDR value based on where your source traffic comes from. +For example, entering a value of `192.168.1.15/32` ONLY allows incoming traffic from a +source IP of `192.168.1.15` and denies all other traffic. #### Step 4 - Save your changes Click 'Save Changes' and see this take effect immediately. @@ -180,4 +180,3 @@ visit the [managed service for TimescaleDB Knowledge Base][mst-kb]. [using-timescale]: /overview/core-concepts/ [hello-timescale]: /tutorials/tutorial-hello-timescale [install-psql]: /how-to-guides/connecting/psql/ - diff --git a/timescaledb/how-to-guides/migrate-data/different-db.md b/timescaledb/how-to-guides/migrate-data/different-db.md index b0d95240c0d1..b0171ac74ca5 100644 --- a/timescaledb/how-to-guides/migrate-data/different-db.md +++ b/timescaledb/how-to-guides/migrate-data/different-db.md @@ -11,7 +11,7 @@ need `pg_dump` for exporting your schema and data. Migration falls into three main steps: -1. Copy over the database schema and choose which tables will become +1. Copy over the database schema and choose which tables become hypertables (i.e., those that currently have time-series data). 1. Backup data to comma-separated values (CSV). 1. Import the data into TimescaleDB diff --git a/timescaledb/how-to-guides/migrate-data/migrate-influxdb.md b/timescaledb/how-to-guides/migrate-data/migrate-influxdb.md index bdc6946250db..a3eeaabb3e44 100644 --- a/timescaledb/how-to-guides/migrate-data/migrate-influxdb.md +++ b/timescaledb/how-to-guides/migrate-data/migrate-influxdb.md @@ -8,7 +8,7 @@ It's easy to use, configurable, and most importantly, it's fast. ### Before we start -Before we start, you will need the following setup: +Before we start, you need the following setup: * A running instance of InfluxDB at a known location and a means to connect to it * [TimescaleDB installed][getting-started] and a means to connect to it * And if you need to import some sample data, the [InfluxDB Command Line Interface][influx-cmd] @@ -37,7 +37,7 @@ Available Commands: with the discovered schema ``` -You will see the help output for Outflux, a brief explanation of what it can do, the usage, and available commands. +You see the help output for Outflux, a brief explanation of what it can do, the usage, and available commands. For instructions on how to set up Outflux from source you can visit the [README][outflux-readme]. 
@@ -45,7 +45,7 @@ For instructions on how to set up Outflux from source you can visit the [README] If you don't already have an existing InfluxDB database, you can try Outflux by importing this example file with data written in the Influx Line Protocol found at https://timescaledata.blob.core.windows.net/datasets/outflux_taxi.txt -You can use the Influx CLI client to load the data. The file will first create the "outflux_tutorial" database and then do 2741 inserts. +You can use the Influx CLI client to load the data. The file first creates the "outflux_tutorial" database and then does 2741 inserts. ``` $ influx -import -path=outflux_taxit.txt -database=outflux_tutorial @@ -54,7 +54,7 @@ $ influx -import -path=outflux_taxit.txt -database=outflux_tutorial 2019/03/27 11:39:11 Failed 0 inserts ``` -The data in the file is without a timestamp so the current time of the Influx server will be used at the time of insert. +The data in the file is without a timestamp so the current time of the Influx server is used at the time of insert. All the data points belong to one measurement `taxi`. The points are tagged with location, rating, and vendor. Four fields are recorded: fare, mta_tax, tip, and tolls. The `influx` client assumes the server is available at `http://localhost:8086` by default. @@ -65,7 +65,7 @@ One of Outflux's features is the ability to discover the schema of an InfluxDB m We can now create a TimescaleDB hypertable ready to receive the demo data we inserted into the InfluxDB instance. If you followed the tutorial and inserted the data from the example, there should be a `taxi` measurement in the `outflux_tutorial` database in the InfluxDB instance. The `schema-transfer` command of Outflux can work with 4 schema strategies: -* `ValidateOnly`: checks if the TimescaleDB extension is installed, a specified database has a hypertable with the proper columns, and if it's partitioned properly, but will not perform modifications +* `ValidateOnly`: checks if the TimescaleDB extension is installed, a specified database has a hypertable with the proper columns, and if it's partitioned properly, but does not perform modifications * `CreateIfMissing`: runs all checks that `ValidateOnly` does and creates and properly partitions any missing hypertables * `DropAndCreate`: drops any existing table with the same name as the measurement, and creates a new hypertable and partitions it properly * `DropCascadeAndCreate`: performs the same action as DropAndCreate with the additional strength of executing a cascade table drop if there is an existing table with the same name as the measurement @@ -81,7 +81,7 @@ $ outflux schema-transfer outflux_tutorial taxi \ ``` The `schema-transfer` command is executed by specifying the database (`outflux_tutorial`) and then the measurements (`taxi`). -If no measurements are specified, all measurements in a database will be transferred. +If no measurements are specified, all measurements in a database are transferred. The location of the InfluxDB server is specified with the `--input-server` flag. The target database and other connection options are specified with the `--output-conn` flag. Here we're using the `postgres` user and database to connect to our server. @@ -132,9 +132,9 @@ $ outflux migrate outflux_tutorial taxi \ --schema-strategy=DropAndCreate ``` -Here we're using the DropAndCreate strategy that will drop any previous table named `cpu` and create it before piping the data.
+Here we're using the DropAndCreate strategy that drops any previous table named `cpu` and creates it before piping the data. The migrate command supports several flags that offer the user flexibility in the selection of data to be migrated. -One of them is the `--limit` flag that will only export the first N rows from the InfluxDB database ordered by time. +One of them is the `--limit` flag that only exports the first N rows from the InfluxDB database ordered by time. The output of the migrate command with a N=10 limit should look like this: ``` diff --git a/timescaledb/how-to-guides/psql-basics.md b/timescaledb/how-to-guides/psql-basics.md index 888521f4dea8..a200913c4bc5 100644 --- a/timescaledb/how-to-guides/psql-basics.md +++ b/timescaledb/how-to-guides/psql-basics.md @@ -2,7 +2,7 @@ Psql is the terminal-based front end to PostgreSQL, and is the primary tool used to communicate with your TimescaleDB instances. Below is a refresher on some essential -psql commands that you may come across in our documentation, and will find useful +psql commands that you may come across in our documentation, and could find useful as you explore PostgreSQL and TimescaleDB. For an in-depth breakdown of all commands, visit [psql's documentation](https://www.postgresql.org/docs/13/app-psql.html). diff --git a/timescaledb/how-to-guides/query-data/advanced-analytic-queries.md b/timescaledb/how-to-guides/query-data/advanced-analytic-queries.md index e00c0933e31e..7aae44b59404 100644 --- a/timescaledb/how-to-guides/query-data/advanced-analytic-queries.md +++ b/timescaledb/how-to-guides/query-data/advanced-analytic-queries.md @@ -332,7 +332,7 @@ CREATE TRIGGER create_vehicle_trigger ``` You could also implement this functionality without a separate metadata table by performing a [loose index scan][loose-index-scan] over the `location` -hypertable, although this will require more compute resources. +hypertable, although this requires more compute resources. [percentile_cont]: https://www.postgresql.org/docs/current/static/functions-aggregate.html#FUNCTIONS-ORDEREDSET-TABLE [toolkit-approx-percentile]: /api/:currentVersion:/hyperfunctions/percentile-approximation/ diff --git a/timescaledb/how-to-guides/replication-and-ha/replication.md b/timescaledb/how-to-guides/replication-and-ha/replication.md index 88c59c0f5884..9bf9b6fac5e2 100644 --- a/timescaledb/how-to-guides/replication-and-ha/replication.md +++ b/timescaledb/how-to-guides/replication-and-ha/replication.md @@ -7,7 +7,7 @@ requires schema synchronization between the primary and replica nodes and replicating partition root tables, which are [not currently supported][postgres-partition-limitations]. -This tutorial will outline the basic configuration needed to set up streaming +This tutorial outlines the basic configuration needed to set up streaming replication on one or more replicas, covering both synchronous and asynchronous options. It assumes you have at least two separate instances of TimescaleDB running. If you're using our [Docker Image][timescale-docker], we recommend @@ -26,7 +26,7 @@ see their [WAL Documentation](https://www.postgresql.org/docs/current/static/wal ## Configure the primary database Create a PostgreSQL user with a role that allows it to initialize streaming -replication. This will be the user each replica uses to stream from the primary +replication. This is the user each replica uses to stream from the primary database.
Run the command as the `postgres` user, or another user that is configured with superuser privileges on the database you're working with. @@ -66,7 +66,7 @@ you intend to have. streaming replication. * `max_replication_slots` - The total number of replication slots the primary database can support. See below for more information about replication slots. -* `listen_address` - Since remote replicas will be connecting to the primary to +* `listen_address` - Since remote replicas are connecting to the primary to stream the WAL, we'll need to make sure that the primary is not just listening on the local loopback. @@ -90,11 +90,11 @@ max_replication_slots = 1 synchronous_commit = off ``` -In this example, the WAL will be streamed to the replica, but the primary server -will not wait for confirmation that the WAL has been written to disk on either +In this example, the WAL is streamed to the replica, but the primary server +does not wait for confirmation that the WAL has been written to disk on either the primary or the replica. This is the most performant replication configuration, but it does carry the risk of a small amount of data loss in the -event of a system crash. It also makes no guarantees that the replica will be +event of a system crash. It also makes no guarantees that the replica is fully up to date with the primary, which could cause inconsistencies between read queries on the primary and the replica. @@ -131,9 +131,9 @@ host replication repuser /32 scram-sh ``` -The above settings will restrict replication connections to traffic coming +The above settings restrict replication connections to traffic coming from `REPLICATION_HOST_IP` as the PostgreSQL user `repuser` with a valid -password. `REPLICATION_HOST_IP` will be able to initiate streaming replication +password. `REPLICATION_HOST_IP` can initiate streaming replication from that machine without additional credentials. You may want to change the `address` and `method` values to match your security and network settings. Read more about `pg_hba.conf` in the [official documentation](https://www.postgresql.org/docs/current/static/auth-pg-hba-conf.html). @@ -148,7 +148,7 @@ by restoring the replica from a base backup of the primary instance. ### Create a base backup on the replica -Stop PostgreSQL. If the replica's PostgreSQL database already has data, you will +Stop PostgreSQL. If the replica's PostgreSQL database already has data, you need to remove it prior to running the backup. This can be done by removing the contents of the PostgreSQL data directory. To determine the location of the data directory, run `show data_directory;` in a `psql` shell. @@ -165,7 +165,7 @@ pg_basebackup -h -D -U repuser -vP -W ``` -The -W flag will prompt you for a password on the command line. This may +The -W flag prompts you for a password on the command line. This may cause problems for automated setups. If you are using password based authentication in an automated setup, you may need to make use of a [pgpass file](https://www.postgresql.org/docs/current/static/libpq-pgpass.html). @@ -225,7 +225,7 @@ LOG: database system is ready to accept read only connections LOG: started streaming WAL from primary at 0/3000000 on timeline 1 ``` -Any clients will be able to perform reads on the replica. Verify this +Any clients can perform reads on the replica. Verify this by running inserts, updates, or other modifications to your data on the primary and querying the replica to ensure they have been properly copied over. 
This is fully compatible with TimescaleDB's functionality, provided @@ -247,18 +247,18 @@ If `synchronous_standby_names` is empty, the settings `on`, `remote_apply`, transaction commits only wait for local flush to disk. -* `on` - Default value. The server will not return "success" until the WAL +* `on` - Default value. The server does not return "success" until the WAL transaction has been written to disk on the primary and any replicas. -* `off` - The server will return "success" when the WAL transaction has been +* `off` - The server returns "success" when the WAL transaction has been sent to the operating system to write to the WAL on disk on the primary, but - will not wait for the operating system to actually write it. This can cause + does not wait for the operating system to actually write it. This can cause a small amount of data loss if the server crashes when some data has not been - written, but it will not result in data corruption. Turning + written, but it does not result in data corruption. Turning `synchronous_commit` off is a well known PostgreSQL optimization for workloads that can withstand some data loss in the event of a system crash. * `local` - Enforces `on` behavior only on the primary server. -* `remote_write` - The database will return "success" to a client when the +* `remote_write` - The database returns "success" to a client when the WAL record has been sent to the operating system for writing to the WAL on the replicas, but before confirmation that the record has actually been persisted to disk. This is basically asynchronous commit except it waits @@ -267,7 +267,7 @@ transaction commits only wait for local flush to disk. * `remote_apply` - Requires confirmation that the WAL records have been written to the WAL *and* applied to the databases on all replicas. This provides the strongest consistency of any of the `synchronous_commit` - options. In this mode, replicas will always reflect the latest state of + options. In this mode, replicas always reflect the latest state of the primary, and the concept of replication lag (see [Replication Diagnostics](#view-replication-diagnostics)) is basically non-existent. @@ -283,31 +283,31 @@ Remote Apply | X | X | X | X | X An important complementary setting to `synchronous_commit` is `synchronous_standby_names`. This setting lists the names of all replicas the -primary database will support for synchronous replication, and configures *how* -the primary database will wait for them. The setting supports several +primary database supports for synchronous replication, and configures *how* +the primary database waits for them. The setting supports several different formats: -* `FIRST num_sync (replica_name_1, replica_name_2)` - This will wait for +* `FIRST num_sync (replica_name_1, replica_name_2)` - This waits for confirmation from the first `num_sync` replicas before returning "success". The list of replica_names determines the relative priority of the replicas. Replica names are determined by the `application_name` setting on the replicas. -* `ANY num_sync (replica_name_1, replica_name_2)` - This will wait for +* `ANY num_sync (replica_name_1, replica_name_2)` - This waits for confirmation from `num_sync` replicas in the provided list, regardless of their priority/position in the list. This is essentially a quorum function. 
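As a rough sketch of how these pieces fit together (the standby names `replica_1` and `replica_2` are placeholders for the `application_name` values configured on your replicas, and using `ALTER SYSTEM` here is simply an alternative to editing `postgresql.conf`):

```sql
-- Acknowledge a commit once the highest-priority available replica confirms it.
ALTER SYSTEM SET synchronous_standby_names = 'FIRST 1 (replica_1, replica_2)';

-- Pick the commit guarantee described above; 'remote_apply' is the strictest.
ALTER SYSTEM SET synchronous_commit = 'remote_apply';

-- Both settings can be picked up with a reload rather than a restart.
SELECT pg_reload_conf();
```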
-Any synchronous replication mode will force the primary to wait until all +Any synchronous replication mode forces the primary to wait until all required replicas have written the WAL or applied the database transaction, depending on the `synchronous_commit` level. This could cause the primary to hang indefinitely if a required replica crashes. When the replica -reconnects, it will replay any of the WAL it needs to catch up. Only then will +reconnects, it replays any of the WAL it needs to catch up. Only then can the primary resume writes. To mitigate this, provision more than the amount of nodes required under the `synchronous_standby_names` setting and list -them in the `FIRST` or `ANY` clauses. This will allow the primary to move +them in the `FIRST` or `ANY` clauses. This allows the primary to move forward as long as a quorum of replicas have written the most recent WAL -transaction. Replicas that were out of service will be able to reconnect and +transaction. Replicas that were out of service are able to reconnect and replay the missed WAL transactions asynchronously. diff --git a/timescaledb/how-to-guides/schema-management/alter.md b/timescaledb/how-to-guides/schema-management/alter.md index a25e0bc83b00..6560bf48b024 100644 --- a/timescaledb/how-to-guides/schema-management/alter.md +++ b/timescaledb/how-to-guides/schema-management/alter.md @@ -6,8 +6,8 @@ underlying chunk. This change can be a potentially expensive operation if it requires a rewrite of the underlying data. However, a common modification is to add a field with a -default value of NULL (if no DEFAULT clause is specified, then the default will -be NULL); such a schema modification is inexpensive. More details can be found +default value of NULL (if no DEFAULT clause is specified, then the default is +NULL); such a schema modification is inexpensive. More details can be found in the Notes section of the [PostgreSQL documentation on ALTER TABLE][postgres-alter-table]. [postgres-alter-table]: https://www.postgresql.org/docs/current/static/sql-altertable.html diff --git a/timescaledb/how-to-guides/schema-management/constraints.md b/timescaledb/how-to-guides/schema-management/constraints.md index a58f6a983da5..668f593a79fc 100644 --- a/timescaledb/how-to-guides/schema-management/constraints.md +++ b/timescaledb/how-to-guides/schema-management/constraints.md @@ -3,7 +3,7 @@ Hypertables support all standard PostgreSQL constraint types, with the exception of foreign key constraints on other tables that reference values in a hypertable. Creating, deleting, or altering constraints on -hypertables will propagate to chunks, accounting also for any indexes +hypertables propagates to chunks, accounting also for any indexes associated with the constraints. For instance, a table can be created as follows: ```sql CREATE TABLE conditions ( time TIMESTAMPTZ NOT NULL, temp FLOAT NOT NULL, device_id INTEGER CHECK (device_id > 0), location INTEGER REFERENCES locations (id), PRIMARY KEY(time, device_id) ); SELECT create_hypertable('conditions', 'time'); ``` -This table will only allow positive device IDs, non-null temperature -readings, and will guarantee unique time values for each device. It +This table only allows positive device IDs, non-null temperature +readings, and guarantees unique time values for each device. It also references values in another `locations` table via a foreign key constraint. Note that time columns used for partitioning do not allow -`NULL` values by default. TimescaleDB will automatically add a `NOT +`NULL` values by default. TimescaleDB automatically adds a `NOT NULL` constraint to such columns if missing.
For additional information on how to manage constraints, see the diff --git a/timescaledb/how-to-guides/schema-management/indexing.md b/timescaledb/how-to-guides/schema-management/indexing.md index 13015a147d16..bdb734038310 100644 --- a/timescaledb/how-to-guides/schema-management/indexing.md +++ b/timescaledb/how-to-guides/schema-management/indexing.md @@ -2,7 +2,7 @@ TimescaleDB supports the range of PostgreSQL index types, and creating, altering, or dropping an index on the hypertable ([PostgreSQL docs][postgres-createindex]) -will similarly be propagated to all its constituent chunks. +is similarly propagated to all its constituent chunks. Data is indexed via the SQL `CREATE INDEX` command. For instance, ```sql @@ -118,7 +118,7 @@ CREATE INDEX ON conditions (time DESC); Additionally, if the `create_hypertable` command specifies an optional "space partition" in addition to time (say, the `location` column), -TimescaleDB will automatically create the following index: +TimescaleDB automatically creates the following index: ```sql CREATE INDEX ON conditions (location, time DESC); @@ -127,4 +127,4 @@ CREATE INDEX ON conditions (location, time DESC); This default behavior can be overridden when executing the [`create_hypertable`][create_hypertable] command. -[create_hypertable]: /api/:currentVersion:/hypertable/create_hypertable/ \ No newline at end of file +[create_hypertable]: /api/:currentVersion:/hypertable/create_hypertable/ diff --git a/timescaledb/how-to-guides/schema-management/json.md b/timescaledb/how-to-guides/schema-management/json.md index 99317960b9ca..e22cdadfbe34 100644 --- a/timescaledb/how-to-guides/schema-management/json.md +++ b/timescaledb/how-to-guides/schema-management/json.md @@ -43,7 +43,7 @@ since it allows for more powerful queries: CREATE INDEX idxgin ON metrics USING GIN (data); ``` -Please note that this index will only optimize queries for which the WHERE clause +Please note that this index only optimizes queries for which the WHERE clause uses the `?`, `?&`, `?|`, or `@>` operator (for a description of these operators see [this table][json-operators] in the PostgreSQL docs). So you should make sure to structure your queries appropriately. diff --git a/timescaledb/how-to-guides/schema-management/tablespaces.md b/timescaledb/how-to-guides/schema-management/tablespaces.md index 117dabd8843f..ac3d18d956cc 100644 --- a/timescaledb/how-to-guides/schema-management/tablespaces.md +++ b/timescaledb/how-to-guides/schema-management/tablespaces.md @@ -25,12 +25,12 @@ of the dimensions is used to determine the tablespace assigned to a particular hypertable chunk. If a hypertable has one or more hash-partitioned ("space") dimensions, then the first hash-partitioned dimension is used. Otherwise, the first time dimension is used. This assignment -strategy ensures that hash-partitioned hypertables will have chunks +strategy ensures that hash-partitioned hypertables have chunks colocated according to hash partition, as long as the list of tablespaces attached to the hypertable remains the same. Modulo calculation is used to pick a tablespace, so there can be more partitions than tablespaces (e.g., if there are two tablespaces, partition number -three will use the first tablespace). +three uses the first tablespace). Note that attaching more tablespaces than there are partitions for the @@ -39,8 +39,8 @@ or additional partitions are added. This is especially true for hash-partitioned tables. 
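As a short, hedged illustration of the assignment behavior described above (the tablespace name, directory, and the `conditions` hypertable are placeholders, not taken from the original text):

```sql
-- Create a tablespace backed by a separate disk and attach it to a hypertable.
-- New chunks are then assigned across the attached tablespaces as described above.
CREATE TABLESPACE disk1 LOCATION '/mnt/disk1/pgdata';
SELECT attach_tablespace('disk1', 'conditions');
```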
-Hypertables that are only time-partitioned will add new -partitions continuously, and will therefore have chunks assigned to +Hypertables that are only time-partitioned add new +partitions continuously, and therefore have chunks assigned to tablespaces in a way similar to round-robin. diff --git a/timescaledb/how-to-guides/schema-management/triggers.md b/timescaledb/how-to-guides/schema-management/triggers.md index 2e97ad3939d7..12201d739224 100644 --- a/timescaledb/how-to-guides/schema-management/triggers.md +++ b/timescaledb/how-to-guides/schema-management/triggers.md @@ -1,8 +1,8 @@ ## Creating triggers TimescaleDB supports the full range of PostgreSQL triggers, and creating, -altering, or dropping triggers on the hypertable will similarly -propagate these changes to all of a hypertable's constituent chunks. +altering, or dropping triggers on the hypertable similarly +propagates these changes to all of a hypertable's constituent chunks. In the following example, let's say you want to create a new table `error_conditions` with the same schema as `conditions`, but designed @@ -11,7 +11,7 @@ signals a sensor error by sending a `temperature` or `humidity` having a value >= 1000. So, we'll take a two-step approach. First, let's create a function that -will insert data deemed erroneous into this second table: +inserts data deemed erroneous into this second table: ```sql CREATE OR REPLACE FUNCTION record_error() @@ -25,7 +25,7 @@ BEGIN END; $record_error$ LANGUAGE plpgsql; ``` -Second, create a trigger that will call this function whenever a new row is +Second, create a trigger that calls this function whenever a new row is inserted into the hypertable. ```sql diff --git a/timescaledb/how-to-guides/tooling.md b/timescaledb/how-to-guides/tooling.md index 594884246847..814dbab76710 100644 --- a/timescaledb/how-to-guides/tooling.md +++ b/timescaledb/how-to-guides/tooling.md @@ -8,7 +8,7 @@ We've created several open-source tools to help users make the most out of their `timescaledb-tune` is packaged along with our binary releases as a dependency, so if you installed one of our binary releases (including Docker), you should have access to the tool. Alternatively, with a standard Go environment, you can `go get` the repository to install it. -The tool will first analyze the existing `postgresql.conf` file to ensure that the TimescaleDB extension is appropriately installed, and then it will provide recommendations for memory, parallelism, WAL, and other settings. These changes are written to your `postgresql.conf` and will take effect on the next (re)start. If you are starting on fresh instance and don't feel the need to approve each group of changes, you can automatically accept and append the suggestions to the end of your `postgresql.conf`. +The tool first analyzes the existing `postgresql.conf` file to ensure that the TimescaleDB extension is appropriately installed, and then it provides recommendations for memory, parallelism, WAL, and other settings. These changes are written to your `postgresql.conf` and take effect on the next (re)start. If you are starting on fresh instance and don't feel the need to approve each group of changes, you can automatically accept and append the suggestions to the end of your `postgresql.conf`. For more information on how to get started with `timescaledb-tune`, visit the [GitHub repo][github-tstune]. 
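After the restart, one way to confirm that the tuned configuration is active is to check a few of the affected settings from `psql`. This is only a quick sanity check; the values shown depend entirely on your machine:

```sql
-- The extension can only load if timescaledb appears in this list.
SHOW shared_preload_libraries;

-- Examples of memory and parallelism settings that timescaledb-tune adjusts.
SHOW shared_buffers;
SHOW max_worker_processes;
```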
diff --git a/timescaledb/how-to-guides/update-timescaledb/index.md b/timescaledb/how-to-guides/update-timescaledb/index.md index a35b67ff51d4..301cedde516e 100644 --- a/timescaledb/how-to-guides/update-timescaledb/index.md +++ b/timescaledb/how-to-guides/update-timescaledb/index.md @@ -68,7 +68,7 @@ It must also be the first command you execute in the session. -This will upgrade TimescaleDB to the latest installed version, even if you +This upgrades TimescaleDB to the latest installed version, even if you are several versions behind. After executing the command, the psql `\dx` command should show the latest version: diff --git a/timescaledb/how-to-guides/update-timescaledb/update-timescaledb-1.md b/timescaledb/how-to-guides/update-timescaledb/update-timescaledb-1.md index 6780acb61037..9fa4c41f59b8 100644 --- a/timescaledb/how-to-guides/update-timescaledb/update-timescaledb-1.md +++ b/timescaledb/how-to-guides/update-timescaledb/update-timescaledb-1.md @@ -46,7 +46,7 @@ It must also be the first command you execute in the session. -This will upgrade TimescaleDB to the latest installed version, even if you +This upgrades TimescaleDB to the latest installed version, even if you are several versions behind. After executing the command, the psql `\dx` command should show the latest version: @@ -90,7 +90,7 @@ using you can run the following command: ```bash docker inspect timescaledb --format='{{range .Mounts }}{{.Type}}{{end}}' ``` -This command will return either `volume` or `bind`, corresponding +This command returns either `volume` or `bind`, corresponding to the two options below. 1. [Volumes][volumes] -- to get the current volume name use: @@ -116,7 +116,7 @@ docker rm timescaledb #### Step 4: Start new container [](update-docker-4) Launch a new container with the updated docker image, but pointing to -the existing mount point. This will again differ by mount type. +the existing mount point. This again differs by mount type. 1. For volume mounts you can use: ```bash diff --git a/timescaledb/how-to-guides/update-timescaledb/update-timescaledb-2.md b/timescaledb/how-to-guides/update-timescaledb/update-timescaledb-2.md index fe442821dc64..35604a5aa43f 100644 --- a/timescaledb/how-to-guides/update-timescaledb/update-timescaledb-2.md +++ b/timescaledb/how-to-guides/update-timescaledb/update-timescaledb-2.md @@ -40,7 +40,7 @@ TimescaleDB 2.2, upgrade in this order: ### Notice of breaking changes from TimescaleDB 1.3+ TimescaleDB 2.0 supports **in-place updates** just like previous releases. During -the update, scripts will automatically configure updated features to work as expected +the update, scripts automatically configure updated features to work as expected with TimescaleDB 2.0. Because this is our first major version release in two years, however, we’re providing additional guidance @@ -74,7 +74,7 @@ Whenever possible, prefer the most recent supported version, PostgreSQL 12. Plea Before starting the upgrade to TimescaleDB 2.0, **we highly recommend checking the database log for errors related to failed retention policies that were occurring in TimescaleDB 1.x** and then either remove them or update them to be compatible with existing continuous aggregates. Any remaining retention policies that are -still incompatible with the `ignore_invalidation_older_than` setting will automatically be disabled during +still incompatible with the `ignore_invalidation_older_than` setting are automatically disabled during the upgrade and a notice provided.
@@ -87,7 +87,7 @@ Read more about changes to continuous aggregates and data retension policies [he #### Step 1: Verify TimescaleDB 1.x policy settings (Optional) As discussed in the [Changes to TimescaleDB 2.0][changes-in-ts2] document, the APIs and setting names -that configure various policies are changing. The update process below will automatically configure +that configure various policies are changing. The update process below automatically configures new policies using your current configurations in TimescaleDB 1.x. If you would like to verify the policy settings after the update is complete, we suggest querying the informational views below and saving the output so that you can refer to it once the update is complete. @@ -137,7 +137,7 @@ triggering the load of a previous TimescaleDB version on session startup. It must also be the first command you execute in the session. -This will upgrade TimescaleDB to the latest installed version, even if you +This upgrades TimescaleDB to the latest installed version, even if you are several versions behind. After executing the command, the psql `\dx` command should show the latest version: diff --git a/timescaledb/how-to-guides/update-timescaledb/update-timescaledb.md b/timescaledb/how-to-guides/update-timescaledb/update-timescaledb.md index dc9e50079568..4bc972b1f400 100644 --- a/timescaledb/how-to-guides/update-timescaledb/update-timescaledb.md +++ b/timescaledb/how-to-guides/update-timescaledb/update-timescaledb.md @@ -2,7 +2,7 @@ Use these instructions to update TimescaleDB within the same major release version (for example, from TimescaleDB 2.1 to 2.2, or from 1.7 to 1.7.4). If you need to upgrade between -TimescaleDB 1.x and 2.x, see our [separate upgrade document][update-tsdb-2] +TimescaleDB 1.x and 2.x, see our [separate upgrade document][update-tsdb-2] for detailed instructions. ### TimescaleDB release compatibility @@ -45,7 +45,7 @@ It must also be the first command you execute in the session. -This will upgrade TimescaleDB to the latest installed version, even if you +This upgrades TimescaleDB to the latest installed version, even if you are several versions behind. After executing the command, the psql `\dx` command should show the latest version: diff --git a/timescaledb/how-to-guides/update-timescaledb/updating-docker.md b/timescaledb/how-to-guides/update-timescaledb/updating-docker.md index 16a0235d558b..ab4a61bf0830 100644 --- a/timescaledb/how-to-guides/update-timescaledb/updating-docker.md +++ b/timescaledb/how-to-guides/update-timescaledb/updating-docker.md @@ -28,7 +28,7 @@ using you can run the following command: ```bash docker inspect timescaledb --format='{{range .Mounts }}{{.Type}}{{end}}' ``` -This command will return either `volume` or `bind`, corresponding +This command returns either `volume` or `bind`, corresponding to the two options below. 1. [Volumes][volumes] -- to get the current volume name use: @@ -54,7 +54,7 @@ docker rm timescaledb #### Step 4: Start new container [](update-docker-4) Launch a new container with the updated docker image, but pointing to -the existing mount point. This will again differ by mount type. +the existing mount point. This again differs by mount type. 1. 
For volume mounts you can use: ```bash diff --git a/timescaledb/how-to-guides/update-timescaledb/upgrade-postgresql.md b/timescaledb/how-to-guides/update-timescaledb/upgrade-postgresql.md index cf68ff5a8f83..f6023126f59d 100644 --- a/timescaledb/how-to-guides/update-timescaledb/upgrade-postgresql.md +++ b/timescaledb/how-to-guides/update-timescaledb/upgrade-postgresql.md @@ -1,17 +1,16 @@ - # Upgrade PostgreSQL - -Each release of TimescaleDB is compatible with specific versions of PostgreSQL. Over time we will add support -for a newer version of PostgreSQL while simultaneously dropping support for an older versions. +Each release of TimescaleDB is compatible with specific versions of PostgreSQL. +Over time support is added for a newer version of PostgreSQL while +simultaneously dropping support for older versions. When the supported versions of PostgreSQL changes, you may need to upgrade the version of the **PostgreSQL instance** (e.g. from 10 to 12) before you can install the latest release of TimescaleDB. -To upgrade PostgreSQL, you have two choices, as outlined in the PostgreSQL online documentation. +To upgrade PostgreSQL, you have two choices, as outlined in the PostgreSQL online documentation. ### Use `pg_upgrade` [`pg_upgrade`][pg_upgrade] is a tool that avoids the need to dump all data and then import it -into a new instance of PostgreSQL after a new version is installed. Instead, `pg_upgrade` allows you to +into a new instance of PostgreSQL after a new version is installed. Instead, `pg_upgrade` allows you to retain the data files of your current PostgreSQL installation while binding the new PostgreSQL binary runtime to them. This is currently supported for all releases 8.4 and greater. @@ -20,7 +19,7 @@ runtime to them. This is currently supported for all releases 8.4 and greater. ``` ### Use `pg_dump` and `pg_restore` -When `pg_upgrade` is not an option, such as moving data to a new physical instance of PostgreSQL, using the +When `pg_upgrade` is not an option, such as moving data to a new physical instance of PostgreSQL, using the tried and true method of dumping all data in the database and then restoring into a database in the new instance is always supported with PostgreSQL and TimescaleDB. diff --git a/timescaledb/how-to-guides/user-defined-actions/create-and-register.md b/timescaledb/how-to-guides/user-defined-actions/create-and-register.md index 81999aec0770..20027320275a 100644 --- a/timescaledb/how-to-guides/user-defined-actions/create-and-register.md +++ b/timescaledb/how-to-guides/user-defined-actions/create-and-register.md @@ -25,7 +25,7 @@ as well as the schedule on which it is run. When registered, the action's `job_id` and `config` are stored in the TimescaleDB catalog. The `config` JSONB can be modified with [`alter_job`][api-alter_job]. -`job_id` and `config` will be passed as arguments when the procedure is +`job_id` and `config` are passed as arguments when the procedure is executed as background process or when expressly called with [`run_job`][api-run_job]. Register the created job with the automation framework. 
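For example, a job could be registered to run hourly along these lines. The procedure name and the `config` contents here are purely illustrative placeholders:

```sql
-- Sketch: register a user-defined action with the job scheduler.
-- 'my_user_defined_action' must be a procedure you have already created.
SELECT add_job('my_user_defined_action', '1h',
               config => '{"hypertable": "conditions"}');
```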
`add_job` returns the job_id diff --git a/timescaledb/how-to-guides/write-data/delete.md b/timescaledb/how-to-guides/write-data/delete.md index ee2ec243257e..c4a82290545b 100644 --- a/timescaledb/how-to-guides/write-data/delete.md +++ b/timescaledb/how-to-guides/write-data/delete.md @@ -1,7 +1,7 @@ # DELETE Data can be deleted from a hypertable using the standard `DELETE` SQL -command ([PostgreSQL docs][postgres-delete]), which will propagate +command ([PostgreSQL docs][postgres-delete]), which propagates down to the appropriate chunks that comprise the hypertable. ```sql @@ -26,4 +26,3 @@ For deleting old data, such as in the second example [postgres-delete]: https://www.postgresql.org/docs/current/static/sql-delete.html [postgres-vacuum]: https://www.postgresql.org/docs/current/static/sql-vacuum.html - diff --git a/timescaledb/how-to-guides/write-data/index.md b/timescaledb/how-to-guides/write-data/index.md index ba66f35eedd5..f20bd4fdf517 100644 --- a/timescaledb/how-to-guides/write-data/index.md +++ b/timescaledb/how-to-guides/write-data/index.md @@ -1,7 +1,7 @@ # Writing data If you are familiar with SQL, then the commands for writing to the database -will be familiar to you. TimescaleDB uses standard SQL commands for writing data, +should be familiar to you. TimescaleDB uses standard SQL commands for writing data, including INSERT, UPDATE, and DELETE as well as UPSERTs through ON CONFLICT statements; and it all works as expected with changes to hypertables propagating down to -individual chunks. \ No newline at end of file +individual chunks. diff --git a/timescaledb/how-to-guides/write-data/insert.md b/timescaledb/how-to-guides/write-data/insert.md index 2f6961d3d28f..37ee58115770 100644 --- a/timescaledb/how-to-guides/write-data/insert.md +++ b/timescaledb/how-to-guides/write-data/insert.md @@ -24,8 +24,8 @@ INSERT INTO conditions The rows that belong to a single batch INSERT command do **not** need to belong to the same chunk (by time interval or partitioning key). Upon receiving an `INSERT` command for multiple rows, the TimescaleDB -engine will determine which rows (sub-batches) belong to which chunks, -and will write them accordingly to each chunk in a single transaction. +engine determines which rows (sub-batches) belong to which chunks, +and writes them accordingly to each chunk in a single transaction. You can also specify that `INSERT` returns some or all of the inserted diff --git a/timescaledb/how-to-guides/write-data/update.md b/timescaledb/how-to-guides/write-data/update.md index 7cc2892ceb66..9cd84380c832 100644 --- a/timescaledb/how-to-guides/write-data/update.md +++ b/timescaledb/how-to-guides/write-data/update.md @@ -8,7 +8,7 @@ UPDATE conditions SET temperature = 70.2, humidity = 50.0 ``` An update command can touch many rows at once, i.e., the following -will modify all rows found in a 10-minute block of data. +modifies all rows found in a 10-minute block of data. ```sql UPDATE conditions SET temperature = temperature + 0.1 diff --git a/timescaledb/how-to-guides/write-data/upsert.md b/timescaledb/how-to-guides/write-data/upsert.md index a94fd6f1dbf7..0ebf891ff53d 100644 --- a/timescaledb/how-to-guides/write-data/upsert.md +++ b/timescaledb/how-to-guides/write-data/upsert.md @@ -13,7 +13,7 @@ index is created automatically when marking column(s) as `PRIMARY KEY` or with a `UNIQUE` constraint. 
Following the examples given above, an `INSERT` with an identical -timestamp and location as an existing row will succeed and create an +timestamp and location as an existing row succeeds and creates an additional row in the database. If, however, the `conditions` table had been created with a UNIQUE @@ -30,8 +30,8 @@ CREATE TABLE conditions ( ); ``` -then the second attempt to insert to this same time will normally -return an error. +then the second attempt to insert to this same time normally +returns an error. The above `UNIQUE` statement during table creation internally is similar to: @@ -54,7 +54,7 @@ Indexes: Now, however, the `INSERT` command can specify that nothing be done on a conflict. This is particularly important when writing many rows as -one batch, as otherwise the entire transaction will fail (as opposed +one batch, as otherwise the entire transaction fails (as opposed to just skipping the row that conflicts). ```sql @@ -90,7 +90,7 @@ If the schema were to have an additional column like `device` that is used TimescaleDB does not yet support using `ON CONFLICT ON CONSTRAINT` with a named key (e.g., `conditions_time_location_idx`), but much of this functionality can be captured by specifying the same columns as above with - a unique index/constraint. This limitation will be removed in a future version. + a unique index/constraint. This limitation is expected to be removed in a future version. diff --git a/timescaledb/index.md b/timescaledb/index.md index 7bec4430d3cf..5a92b57d677b 100644 --- a/timescaledb/index.md +++ b/timescaledb/index.md @@ -21,7 +21,7 @@ current version of TimescaleDB that you should be aware of. ### Getting started -The **[Getting started tutorial][getting-started]** will lead you through your first experience with +The **[Getting started tutorial][getting-started]** leads you through your first experience with TimescaleDB by setting up hypertables, importing data, running queries, and exploring the key features that make TimescaleDB a pleasure to use. diff --git a/timescaledb/integrations/ingesting-data.md b/timescaledb/integrations/ingesting-data.md index 32b01583a8a1..67f74a8278d3 100644 --- a/timescaledb/integrations/ingesting-data.md +++ b/timescaledb/integrations/ingesting-data.md @@ -65,7 +65,7 @@ The connector is designed to work with [Kafka Connect][kafka-connect] and to be deployed to a Kafka Connect runtime service. Its purpose is to ingest change events from PostgreSQL databases (i.e. TimescaleDB). -The deployed connector will monitor one or more schemas within a TimescaleDB +The deployed connector monitors one or more schemas within a TimescaleDB server and writes all change events to Kafka topics, which can be independently consumed by one or more clients. Kafka Connect can be distributed to provide fault tolerance to ensure the connectors are running and continually keeping diff --git a/timescaledb/overview/core-concepts/chunks.md b/timescaledb/overview/core-concepts/chunks.md index 2350b41397c5..cefa7c72b00c 100644 --- a/timescaledb/overview/core-concepts/chunks.md +++ b/timescaledb/overview/core-concepts/chunks.md @@ -10,9 +10,9 @@ different days belong to different chunks. TimescaleDB creates these chunks automatically as rows are inserted into the database. If the timestamp of a newly-inserted row belongs to a day not yet -present in the database, TimescaleDB will create a new chunk corresponding to -that day as part of the INSERT process.
Otherwise, TimescaleDB will -determine the existing chunk(s) to which the new row(s) belong, and +present in the database, TimescaleDB creates a new chunk corresponding to +that day as part of the INSERT process. Otherwise, TimescaleDB +determines the existing chunk(s) to which the new row(s) belong, and insert the rows into the corresponding chunks. The interval of a hypertable's partitioning can also be changed over time (e.g., to adapt to changing workload conditions, so in one example, a hypertable could initially create a new chunk @@ -27,10 +27,10 @@ We sometimes refer to hypertables partitioned by both time and this additional dimension as "time and space" partitions. This time-and-space partitioning is primarily used for *distributed hypertables*. -With such two-dimensional partitioning, each time interval will also be +With such two-dimensional partitioning, each time interval is also partitioned across multiple nodes comprising the distributed hypertables. In such cases, for the same hour, information about some portion of the -devices will be stored on each node. This allows multi-node TimescaleDB +devices are stored on each node. This allows multi-node TimescaleDB to parallelize inserts and queries for data during that time interval. [//]: # (Comment: We should include an image that shows a chunk picture of a @@ -43,7 +43,7 @@ A chunk includes constraints that specify and enforce its partitioning ranges, e.g., that the time interval of the chunk covers ['2020-07-01 00:00:00+00', '2020-07-02 00:00:00+00'), and all rows included in the chunk must have a time value within that -range. Any space partitions will be reflected as chunk constraints as well. +range. Any space partitions are reflected as chunk constraints as well. As these ranges and partitions are non-overlapping, all chunks in a hypertable are disjoint in their partitioning dimensional space. diff --git a/timescaledb/overview/core-concepts/compression.md b/timescaledb/overview/core-concepts/compression.md index d3b65cc67917..58239ae72102 100644 --- a/timescaledb/overview/core-concepts/compression.md +++ b/timescaledb/overview/core-concepts/compression.md @@ -9,19 +9,19 @@ historical, compressed data. Compression is powered by TimescaleDB’s built-in job scheduler framework. We leverage it to asynchronously convert individual chunks from an uncompressed row-based form to a compressed columnar form across a hypertable: Once a chunk -is old enough, the chunk will be transactionally converted from the row to columnar form. +is old enough, the chunk is transactionally converted from the row to columnar form. -With native compression, even though a single hypertable in TimescaleDB will -store data in both row and columnar forms, users don’t need to worry about -this: they will continue to see a standard row-based schema when querying data. +With native compression, even though a single hypertable in TimescaleDB +stores data in both row and columnar forms, users don’t need to worry about +this: they continue to see a standard row-based schema when querying data. This is similar to building a view on the decompressed columnar data. TimescaleDB enables this capability by both (1) transparently appending data stored in the standard row format with decompressed data from the columnar format, and (2) transparently decompressing individual columns from selected rows at query time. 
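In SQL terms, this conversion is configured with a compression setting on the hypertable plus a scheduled compression policy. The following is only a minimal sketch, assuming a hypertable like the `conditions` example used elsewhere in these docs and a `location` column to segment by; adjust the names and the age threshold for your own schema:

```sql
-- Mark the hypertable as compressible and choose a segmenting column
ALTER TABLE conditions SET (
  timescaledb.compress,
  timescaledb.compress_segmentby = 'location'
);

-- Let the job scheduler convert chunks older than seven days
-- from row form to compressed columnar form
SELECT add_compression_policy('conditions', INTERVAL '7 days');
```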
-During a query, uncompressed chunks will be processed normally, while data from -compressed chunks will first be decompressed and converted to a standard row +During a query, uncompressed chunks are processed normally, while data from +compressed chunks are first decompressed and converted to a standard row format at query time, before being appended or merged into other data. This approach is compatible with everything you expect from TimescaleDB, such as relational JOINs and analytical queries, as well as aggressive constraint exclusion diff --git a/timescaledb/overview/core-concepts/continuous-aggregates.md b/timescaledb/overview/core-concepts/continuous-aggregates.md index a88608fb6f86..a2b83f6ccb56 100644 --- a/timescaledb/overview/core-concepts/continuous-aggregates.md +++ b/timescaledb/overview/core-concepts/continuous-aggregates.md @@ -65,11 +65,11 @@ SELECT add_continuous_aggregate_policy('conditions_summary_hourly', schedule_interval => INTERVAL '1 h'); ``` -In this case, the continuous aggregate will be refreshed every hour +In this case, the continuous aggregate is refreshed every hour and refresh the last month's data. You can now run a normal `SELECT` on the continuous aggregate and it -will give you the aggregated data, for example, to select the hourly +gives you the aggregated data, for example, to select the hourly averages for device 1 during the first three months: ```sql @@ -91,7 +91,7 @@ support for this in a future version. ## Real-time aggregation [](real-time-aggregates) -A query on a continuous aggregate will, by default, use *real-time +A query on a continuous aggregate, by default, uses *real-time aggregation* (first introduced in TimescaleDB 1.7) to combine materialized aggregates with recent data from the source hypertable. By combining raw and materialized data in this way, diff --git a/timescaledb/overview/core-concepts/data-retention.md b/timescaledb/overview/core-concepts/data-retention.md index 3e51ad29e7a2..8c512f47197e 100644 --- a/timescaledb/overview/core-concepts/data-retention.md +++ b/timescaledb/overview/core-concepts/data-retention.md @@ -34,8 +34,8 @@ For example: SELECT drop_chunks('conditions', INTERVAL '24 hours'); ``` -This will drop all chunks from the hypertable `conditions` that _only_ -include data older than this duration, and will _not_ delete any +This drops all chunks from the hypertable `conditions` that _only_ +include data older than this duration, and does _not_ delete any individual rows of data in chunks. diff --git a/timescaledb/overview/core-concepts/distributed-hypertables.md b/timescaledb/overview/core-concepts/distributed-hypertables.md index 7330f4a73152..9beb8cbe596b 100644 --- a/timescaledb/overview/core-concepts/distributed-hypertables.md +++ b/timescaledb/overview/core-concepts/distributed-hypertables.md @@ -38,23 +38,23 @@ purposes, act just like a single instance of TimescaleDB from an operational per To ensure best performance, you should partition a distributed hypertable by both time and space. If you only partition data by -time, that chunk will have to fill up before the access node chooses +time, that chunk has to fill up before the access node chooses another data node to store the next chunk, so during that -chunk's time interval, all writes to the latest interval will be +chunk's time interval, all writes to the latest interval is handled by a single data node, rather than load balanced across all available data nodes. 
On the other hand, if you specify a space -partition, the access node will distribute chunks across multiple data +partition, the access node distributes chunks across multiple data nodes based on the space partition so that multiple chunks are created for a given chunk time interval, and both reads and writes to that -recent time interval will be load balanced across the cluster. +recent time interval are load balanced across the cluster. By default, we automatically set the number of space partitions equal to the -number of data nodes if a value is not specified. The system will also increase +number of data nodes if a value is not specified. The system also increases the number of space partitions, if necessary, when adding new data nodes. If setting manually, we recommend that the number of space partitions are equal or a multiple of the number of data nodes associated with the distributed hypertable for optimal data distribution across data nodes. In case of multiple -space partitions, only the first space partition will be used to determine +space partitions, only the first space partition is used to determine how chunks are distributed across servers. ## Scaling distributed hypertables @@ -63,12 +63,12 @@ As time-series data grows, a common use case is to add data nodes to expand the storage and compute capacity of distributed hypertables. Thus, TimescaleDB can be elastically scaled out by simply adding data nodes to a distributed database. -As mentioned earlier, TimescaleDB can (and will) adjust the number of space -partitions as new data nodes are added. Although existing chunks will not have -their space partitions updated, the new settings will be applied to newly +As mentioned earlier, TimescaleDB adjusts the number of space +partitions as new data nodes are added. Although existing chunks do not have +their space partitions updated, the new settings are applied to newly created chunks. Because of this behavior, we do not need to move data between data nodes when the cluster size is increased, and simply update how data is -distributed for the next time interval. Writes for new incoming data will +distributed for the next time interval. Writes for new incoming data leverage the new partitioning settings, while the access node can still support queries across all chunks (even those that were created using the old partitioning settings). Do note that although the number of space partitions diff --git a/timescaledb/overview/core-concepts/hypertables-and-chunks.md b/timescaledb/overview/core-concepts/hypertables-and-chunks.md index 8ac22b456f22..37b25b75ffa9 100644 --- a/timescaledb/overview/core-concepts/hypertables-and-chunks.md +++ b/timescaledb/overview/core-concepts/hypertables-and-chunks.md @@ -29,9 +29,9 @@ different days belong to different chunks. TimescaleDB creates these chunks automatically as rows are inserted into the database. If the timestamp of a newly-inserted row belongs to a day not yet -present in the database, TimescaleDB will create a new chunk corresponding to -that day as part of the INSERT process. Otherwise, TimescaleDB will -determine the existing chunk(s) to which the new row(s) belong, and +present in the database, TimescaleDB creates a new chunk corresponding to +that day as part of the INSERT process. Otherwise, TimescaleDB +determines the existing chunk(s) to which the new row(s) belong, and insert the rows into the corresponding chunks. 
The interval of a hypertable's partitioning can also be changed over time (e.g., to adapt to changing workload conditions, so in one example, a hypertable could initially create a new chunk @@ -46,10 +46,10 @@ We sometimes refer to hypertables partitioned by both time and this additional dimension as "time and space" partitions. This time-and-space partitioning is primarily used for *[distributed hypertables]*. -With such two-dimensional partitioning, each time interval will also be +With such two-dimensional partitioning, each time interval is also partitioned across multiple nodes comprising the distributed hypertables. In such cases, for the same hour, information about some portion of the -devices will be stored on each node. This allows multi-node TimescaleDB +devices are stored on each node. This allows multi-node TimescaleDB to parallelize inserts and queries for data during that time interval. [//]: # (Comment: We should include an image that shows a chunk picture of a) @@ -62,7 +62,7 @@ A chunk includes constraints that specify and enforce its partitioning ranges, e.g., that the time interval of the chunk covers ['2020-07-01 00:00:00+00', '2020-07-02 00:00:00+00'), and all rows included in the chunk must have a time value within that -range. Any space partitions will be reflected as chunk constraints as well. +range. Any space partitions are reflected as chunk constraints as well. As these ranges and partitions are non-overlapping, all chunks in a hypertable are disjoint in their partitioning dimensional space. diff --git a/timescaledb/overview/core-concepts/scaling.md b/timescaledb/overview/core-concepts/scaling.md index 314b26315bf3..25a19ab2f0e7 100644 --- a/timescaledb/overview/core-concepts/scaling.md +++ b/timescaledb/overview/core-concepts/scaling.md @@ -10,7 +10,7 @@ A single instance of PostgreSQL with TimescaleDB installed can often support the needs of very large datasets and application querying. In a regular PostgreSQL instance without TimescaleDB, a common problem with scaling database performance on a single machine is the significant cost/performance trade-off between memory -and disk. Eventually, our entire dataset will not fit in memory, and you will need +and disk. Eventually, the entire dataset does not fit in memory, and you need to write your data and indexes to disk. Once the data is sufficiently large that we can’t fit all pages of our indexes diff --git a/timescaledb/overview/data-model-flexibility/narrow-data-model.md b/timescaledb/overview/data-model-flexibility/narrow-data-model.md index 78acc4ea2d56..f7e8a245d0bd 100644 --- a/timescaledb/overview/data-model-flexibility/narrow-data-model.md +++ b/timescaledb/overview/data-model-flexibility/narrow-data-model.md @@ -38,4 +38,4 @@ same timestamp, since it requires writing a timestamp for each metric. This ulti results in higher storage and ingest requirements. Further, queries that correlate different metrics are also more complex, since each additional metric you want to correlate requires another JOIN. If you typically query multiple metrics together, it is both faster and easier to store them -in a wide table format, which we will cover in the following section. +in a wide table format, which we cover in the following section. 
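To make the JOIN cost concrete, here is a rough sketch of a narrow-model table and the self-join needed to read two metrics side by side. The table, column, and metric names are illustrative only, not taken from these docs:

```sql
-- Narrow model: one metric value per row
CREATE TABLE metrics (
  time        TIMESTAMPTZ NOT NULL,
  device_id   INTEGER     NOT NULL,
  metric_name TEXT        NOT NULL,
  value       DOUBLE PRECISION
);
SELECT create_hypertable('metrics', 'time');

-- Correlating two metrics costs an extra self-join per additional metric
SELECT cpu.time, cpu.device_id,
       cpu.value AS cpu_usage,
       mem.value AS mem_usage
FROM metrics cpu
JOIN metrics mem
  ON mem.time = cpu.time
 AND mem.device_id = cpu.device_id
 AND mem.metric_name = 'mem_usage'
WHERE cpu.metric_name = 'cpu_usage';
```

In the wide format described in the following section, the same read is a single scan over one table with one column per metric.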
diff --git a/timescaledb/overview/faq/faq-postgres.md b/timescaledb/overview/faq/faq-postgres.md index 17214ae89c69..b42a66979b94 100644 --- a/timescaledb/overview/faq/faq-postgres.md +++ b/timescaledb/overview/faq/faq-postgres.md @@ -17,7 +17,7 @@ not scale well to the volume of data that most time-series applications produce, when running on a single server. In particular, vanilla PostgreSQL has poor write performance for moderate tables, and this problem only becomes worse over time as data volume grows linearly in time. These problems emerge when table indexes can no longer fit in memory, -as each insert will translate to many disk fetches to swap in portions of the indexes' +as each insert translates to many disk fetches to swap in portions of the indexes' B-Trees. TimescaleDB solves this through its heavy utilization of time-space partitioning, even when running _on a single machine_. So all writes to recent time intervals are only to tables that remain in memory, and updating any diff --git a/timescaledb/overview/faq/faq-products.md b/timescaledb/overview/faq/faq-products.md index b068bd933ef5..118ec6b6fa64 100644 --- a/timescaledb/overview/faq/faq-products.md +++ b/timescaledb/overview/faq/faq-products.md @@ -106,7 +106,7 @@ our [top-rated support team][timescale-support]. Yes, all of SQL, including: secondary indexes, JOINs, window functions. In fact, to the outside world, TimescaleDB looks like a PostgreSQL database: You connect to the database as if it's PostgreSQL, and you can administer the database as if -it's PostgreSQL. Any tools and libraries that connect with PostgreSQL will +it's PostgreSQL. Any tools and libraries that connect with PostgreSQL automatically work with TimescaleDB. ## Why SQL? @@ -173,7 +173,7 @@ anything that speaks SQL (i.e., the entire PostgreSQL ecosystem). * If you already use and like PostgreSQL, and don't want to have to "give it up" and move to a NoSQL system in order to scale to larger volumes of data. * If you already chose to abandon PostgreSQL or another relational database for a Hadoop/NoSQL -system due to scaling concerns or issues. We will provide support for the migration back. +system due to scaling concerns or issues. We provide support for the migration back. ## What if my use case is simple key-value reads? For this scenario, in-memory or column-oriented databases are designed for diff --git a/timescaledb/overview/how-does-it-compare/timescaledb-vs-postgres.md b/timescaledb/overview/how-does-it-compare/timescaledb-vs-postgres.md index 34b9958e53e5..22cb6547adab 100644 --- a/timescaledb/overview/how-does-it-compare/timescaledb-vs-postgres.md +++ b/timescaledb/overview/how-does-it-compare/timescaledb-vs-postgres.md @@ -22,7 +22,7 @@ can no longer fit in memory. In particular, whenever a new row is inserted, the database needs to update the indexes (e.g., B-trees) for each of the table's indexed -columns, which will involve swapping one or more pages in from disk. +columns, which involves swapping one or more pages in from disk. Throwing more memory at the problem only delays the inevitable, and your throughput in the 10K-100K+ rows per second can crash to hundreds of rows per second once your time-series table is in the tens @@ -71,7 +71,7 @@ perform indexed lookups or table scans are similarly performant between PostgreSQL and TimescaleDB. 
For example, on a 100M row table with indexed time, hostname, and cpu -usage information, the following query will take less than 5ms for +usage information, the following query takes less than 5ms for each database: ```sql @@ -125,8 +125,8 @@ SELECT date_trunc('minute', time) AS minute, max(usage_user) LIMIT 5; ``` -We will be publishing more complete benchmarking comparisons between -PostgreSQL and TimescaleDB soon, as well as the software to replicate +We are always publishing more complete benchmarking comparisons between +PostgreSQL and TimescaleDB, as well as the software to replicate our benchmarks. The high-level result from our query benchmarking is that @@ -159,7 +159,7 @@ including some of the following: - **Last** and **first** aggregates: These functions allow you to get the value of one column as ordered by another. For - example, `last(temperature, time)` will return the latest + example, `last(temperature, time)` returns the latest temperature value based on time within a group (e.g., an hour). These type of functions enable very natural time-oriented queries. @@ -223,7 +223,7 @@ rather than at the row level, via its `drop_chunks` functionality. SELECT drop_chunks('conditions', INTERVAL '7 days'); ``` -This will delete all chunks (files) from the hypertable 'conditions' +This deletes all chunks (files) from the hypertable 'conditions' that only include data older than this duration, rather than deleting any individual rows of data in chunks. This avoids fragmentation in the underlying database files, which in turn avoids the need for diff --git a/timescaledb/overview/release-notes/changes-in-timescaledb-2.md b/timescaledb/overview/release-notes/changes-in-timescaledb-2.md index b5b3094c345e..a786be806cf4 100644 --- a/timescaledb/overview/release-notes/changes-in-timescaledb-2.md +++ b/timescaledb/overview/release-notes/changes-in-timescaledb-2.md @@ -18,7 +18,7 @@ you need to take. * **Dropping support for PostgreSQL versions 9.6 and 10:** As mentioned in [our upgrade documentation](/how-to-guides/update-timescaledb/), TimescaleDB 2.0 - no longer supports older PostgreSQL versions. You will need to be running PostgreSQL + no longer supports older PostgreSQL versions. You need to be running PostgreSQL version 11 or 12 to upgrade your TimescaleDB installation to 2.0. * **Continuous aggregates:** We have made major changes in the creation and management of continuous aggregates to address user feedback. @@ -38,7 +38,7 @@ about the impetus for these changes and our design decisions. For a more in-dept Once you have read through this guide and understand the impact that upgrading to the latest version may have on your existing application and infrastructure, please follow our [upgrade to TimescaleDB 2.0](/how-to-guides/update-timescaledb/update-timescaledb-2/) - documentation. You will find straight-forward instructions and recommendations to ensure + documentation. You can find straight-forward instructions and recommendations to ensure everything is updated and works correctly. @@ -156,7 +156,7 @@ return multiple columns and (possibly) multiple rows of information. * [`hypertable_detailed_size(hypertable)`](/api/:currentVersion:/hypertable/hypertable_detailed_size): The function has been renamed from `hypertable_relation_size(hypertable)`. Further, if the hypertable is distributed, -it will return multiple rows, one per each of the hypertable's data nodes. +it returns multiple rows, one per each of the hypertable's data nodes. 
* [`hypertable_size(hypertable)`](/api/:currentVersion:/hypertable/hypertable_size): Returns a single value giving the aggregated hypertable size, including both tables (chunks) and indexes. * [`chunks_detailed_size(hypertable)`](/api/:currentVersion:/hypertable/chunks_detailed_size): Returns @@ -228,7 +228,7 @@ In the example above, `CREATE MATERIALIZED VIEW `creates a continuous aggregate associated with it. Notice also that `WITH NO DATA` is specified at the end. This prevents the view from materializing data at creation time, instead deferring the population of aggregated data until the policy runs as a background job or as part of a manual refresh. Therefore, we recommend that users create continuous aggregates -using the `WITH NO DATA` option, especially if a significant amount of historical data will be materialized. +using the `WITH NO DATA` option, especially if a significant amount of historical data can be materialized. Once the Continuous Aggregate is created, calling `add_continuous_aggregate_policy` creates a continuous aggregate policy, which automatically materializes or refreshes the data following the schedule and rules @@ -246,9 +246,9 @@ The above example sets the refresh interval as between four weeks and two hours respectively). Therefore, if any late data arrives with timestamps within the last four weeks and is backfilled into the source hypertable, then the continuous aggregate view is updated with this old data the next time the policy executes. -This policy will, in the worst case, materialize the whole window every time it runs if data at least four weeks +This policy can, in the worst case, materialize the whole window every time it runs if data at least four weeks old continues to arrive and be inserted into the source hypertables. However, since a continuous aggregate tracks -changes since the last refresh, it will in most cases materialize a subset of the window that corresponds to the +changes since the last refresh, it can in most cases materialize a subset of the window that corresponds to the data that has actually changed. In this example, data backfilled more than 4 weeks ago is not rematerialized, nor does the continuous aggregate @@ -286,7 +286,7 @@ then scheduled via `add_job`. Note that `refresh_continuous_aggregate` only recomputes the aggregated time buckets that completely fall inside the given refresh window and are in a region that has seen changes in the underlying hypertable. Thus, if no changes have occurred in the underlying source data (that is, no data has been backfilled to the -region or no updates to existing data have been made), no materialization will be performed either over that +region or no updates to existing data have been made), no materialization is performed either over that region. This behavior is similar to the continuous aggregate policy and ensures more efficient operation. @@ -298,7 +298,7 @@ users should understand the interactions between data retention policy settings Before starting the upgrade to TimescaleDB 2.0, **we highly recommend checking the database log for errors related to failed retention policies that were occurring in TimescaleDB 1.x** and then either removing them or updating them to be compatible with existing continuous aggregates. Any remaining retention policies that are still incompatible -with the `ignore_invalidation_older_than` setting will automatically be disabled with a notice during the upgrade. 
+with the `ignore_invalidation_older_than` setting is automatically disabled with a notice during the upgrade. As an example, if a data retention policy on a hypertable is set for `drop_after => '4 weeks'`, then the policy associated with a continuous aggregate on that same hypertable should have a `start_offset` less than or equal @@ -317,12 +317,12 @@ be dropped by a retention policy, the retention policy would silently fail. Mak required users to modify settings in either the retention policy or the continuous aggregate, and even then some data wasn't always materialized as expected. -After upgrading to TimescaleDB 2.0, **retention policies will no longer fail due to incompatibilities with +After upgrading to TimescaleDB 2.0, **retention policies no longer fail due to incompatibilities with continuous aggregates** and users have to ensure that retention and continuous aggregate policies have the desired interplay. -Another change in 2.0 is that `drop_chunks` and the retention policy will no longer -automatically refresh continuous aggregates to account for changes in original hypertable +Another change in 2.0 is that `drop_chunks` and the retention policy no longer +automatically refreshes continuous aggregates to account for changes in original hypertable after the last refresh. Previously, the goal was to ensure that all updates were processed prior to dropping chunks in the original hypertable. In practice, it often didn't work as intended. @@ -384,7 +384,7 @@ one should simply use a refresh window that does not include that region of data always be refreshed at a later time, either manually or via a policy. To ensure that previously ignored backfill can be refreshed after the upgrade to TimescaleDB 2.0, the upgrade -process will mark the region older than the `ignore_invalidation_older_than` threshold as "requiring refresh". +process marks the region older than the `ignore_invalidation_older_than` threshold as "requiring refresh". This allows a manual refresh to bring a continuous aggregate up-to-date with the underlying source data. If the `ignore_invalidation_older_than` threshold was modified at some point to a longer interval, we recommend setting it back to the smaller interval prior to upgrading to ensure that all the backfill can be refreshed, @@ -413,7 +413,7 @@ statistics related to all jobs. ### Updating existing continuous aggregates [](updating-continuous-aggregates) If you have existing continuous aggregates and you update your database to TimescaleDB 2.0, the update scripts -will automatically reconfigure your continuous aggregates to use the new framework. +automatically reconfigure your continuous aggregates to use the new framework. In particular, the update process should: @@ -424,8 +424,8 @@ ather than `refresh_interval`). * Automatically configure `end_offset` to have an offset from `now()` equivalent to the old `refresh_lag` setting. * Mark all the data older than the interval `ignore_invalidation_older_than` as out-of-date, so that it can be refreshed. * Disable any retention policies that are failing due to being incompatible with the current setting of -`ignore_invalidation_older_than` on a continuous aggregate (as described above). Disabled policies will remain post -upgrade, but will not be scheduled to run (`scheduled=false `in` timescaledb_information.jobs`). If failing policies +`ignore_invalidation_older_than` on a continuous aggregate (as described above). 
Disabled policies remain after +upgrade, but are not scheduled to run (`scheduled=false `in` timescaledb_information.jobs`). If failing policies were to be migrated to 2.0 they would start to work again, but likely with unintended consequences. Therefore, any retention policies that are disabled post update should have their settings carefully reviewed before being enabled again. diff --git a/timescaledb/overview/release-notes/index.md b/timescaledb/overview/release-notes/index.md index 9850a45a498f..317936a76905 100644 --- a/timescaledb/overview/release-notes/index.md +++ b/timescaledb/overview/release-notes/index.md @@ -29,15 +29,15 @@ follow these [setup instructions][distributed-hypertables-setup]. - Continuous aggregates for distributed hypertables - Support for PostgreSQL 14 -- Experimental: Support for timezones in 'time_bucket_ng()', including the 'origin' argument +- Experimental: Support for timezones in 'time_bucket_ng()', including the 'origin' argument You can read more about this release on our [blog post](https://tsdb.co/timescaledb-2-5). -This release also contains bug fixes since the 2.4.2 release. +This release also contains bug fixes since the 2.4.2 release. - This release is medium priority for upgrade. We recommend that you upgrade at the next available opportunity. + This release is medium priority for upgrade. We recommend that you upgrade at the next available opportunity. @@ -63,7 +63,7 @@ Several bugs fixed, see the release notes for more details. Timescale is working hard on our next exciting features. To make that possible, we require functionality that is available in Postgres 12 and above. -For this reason, we removed support for PostgreSQL 11 in the TimescaleDB 2.4 release. +For this reason, we removed support for PostgreSQL 11 in the TimescaleDB 2.4 release. For TimescaleDB 2.5 and onwards, PostgreSQL 12, 13 or 14 are required. @@ -284,10 +284,9 @@ planning. Timescale is working hard on our next exciting features. To make that possible, we require functionality that is unfortunately absent on -PostgreSQL 11. For this reason, we will continue supporting PostgreSQL -11 until mid-June 2021. Sooner to that time, we will announce the -specific version of TimescaleDB in which PostgreSQL 11 support will -not be included going forward. +PostgreSQL 11. For this reason, we continue supporting PostgreSQL +11 only until mid-June 2021. At some point before that time, we are going to +announce in which version of TimescaleDB PostgreSQL 11 support is dropped. **Major features** * #2843 Add distributed restore point functionality @@ -501,7 +500,7 @@ This release also adds: Some of the changes above (e.g., continuous aggregates, updated informational views) do introduce breaking changes to APIs and are not -backwards compatible. While the update scripts in TimescaleDB 2.0 will +backwards compatible. While the update scripts in TimescaleDB 2.0 upgrade databases running TimescaleDB 1.x automatically, some of these API and feature changes may require changes to clients and/or upstream scripts that rely on the previous APIs. Before upgrading, we recommend @@ -855,10 +854,10 @@ This release adds the long-awaited support for PostgreSQL 12 to TimescaleDB. This release also adds a new default behavior when querying continuous aggregates that we call real-time aggregation. A query on a continuous -aggregate will now combine materialized data with recent data that has +aggregate now combines materialized data with recent data that has yet to be materialized. 
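As a purely hypothetical illustration (reusing the `conditions_summary_hourly` continuous aggregate named earlier in this changeset, and assuming it exposes a `bucket` time column), an ordinary `SELECT` on the view now returns both materialized buckets and buckets aggregated on the fly from rows that have not been materialized yet:

```sql
-- Recent buckets that the background job has not materialized yet
-- are computed from the raw hypertable at query time and merged in
SELECT *
FROM conditions_summary_hourly
WHERE bucket > now() - INTERVAL '12 hours';
```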
-Note that only newly created continuous aggregates will have this real-time +Note that only newly created continuous aggregates have this real-time query behavior, although it can be enabled on existing continuous aggregates with a configuration setting as follows: @@ -869,7 +868,7 @@ Community version of TimescaleDB (from Enterprise), including data reordering and data retention policies. **Deprecation notice:** Please note that with the release of Timescale 1.7, we are deprecating support for PostgreSQL 9.6.x and 10.x. -The current plan is that the Timescale 2.0 release later this year will only support PostgreSQL major versions 11.x, 12.x, or newer. +The current plan is that the Timescale 2.0 release later this year only supports PostgreSQL major versions 11.x, 12.x, or newer. **Major features** * #1807 Add support for PostgreSQL 12 @@ -904,7 +903,7 @@ for upgrading. In particular the fixes contained in this maintenance release address bugs in continuous aggregates, time_bucket_gapfill, partial index handling and drop_chunks. -For this release only, you will need to restart the database after upgrade before restoring a backup. +For this release only, you need to restart the database after upgrade before restoring a backup. **Minor features** * #1666 Support drop_chunks API for continuous aggregates @@ -1042,7 +1041,7 @@ underlying scans for parallel plans. For more information on this release, read the [announcement blog](https://blog.timescale.com/blog/building-columnar-compression-in-a-row-oriented-database), this [tutorial](/timescaledb/:currentVersion:/getting-started/compress-data/), and the [blog on data tiering](https://blog.timescale.com/blog/optimize-your-storage-costs-with-timescaledbs-data-tiering-functionality/). -**For this release only**, you will need to restart the database before running +**For this release only**, you need to restart the database before running `ALTER EXTENSION` **Major features** @@ -1223,10 +1222,10 @@ in the view. Continuous aggregates are somewhat similar to PostgreSQL materialized views, but unlike a materialized view, continuous -aggregates do not need to be refreshed manually; the view will be refreshed +aggregates do not need to be refreshed manually; the view is refreshed automatically in the background as new data is added, or old data is modified. Additionally, it does not need to re-calculate all of the data on -every refresh. Only new and/or invalidated data will be calculated. Since this +every refresh. Only new and/or invalidated data is calculated. Since this re-aggregation is automatic, it doesn’t add any maintenance burden to your database. @@ -1321,7 +1320,7 @@ We are excited to be introducing new time-series analytical functions, advanced This release adds code under a new license, LICENSE_TIMESCALE. This code can be found in `tsl`. -**For this release only**, you will need to restart the database before running +**For this release only**, you need to restart the database before running `ALTER EXTENSION` **Notable commits** diff --git a/timescaledb/overview/why-timescaledb.md b/timescaledb/overview/why-timescaledb.md index 69f7ed77f3e7..19973a2e3885 100644 --- a/timescaledb/overview/why-timescaledb.md +++ b/timescaledb/overview/why-timescaledb.md @@ -3,7 +3,7 @@ Is TimescaleDB the right choice for your legacy application or next great startup idea? 
-As you will read in the next section, Core Concepts, TimescaleDB has been built +In the next section, Core Concepts, TimescaleDB has been built and designed from the ground up to bring the power of relational details to the massive scale of time-series data. By building on the strong foundation of PostgreSQL and utilizing the extension ecosystem, TimescaleDB can help you bring meaning to diff --git a/timescaledb/quick-start/dotnet.md b/timescaledb/quick-start/dotnet.md index ea6eb7f88a89..6ab69529f3fa 100644 --- a/timescaledb/quick-start/dotnet.md +++ b/timescaledb/quick-start/dotnet.md @@ -12,8 +12,8 @@ In this Quick Start, you need to: You build the application one step at a time, adding new methods to the `TimescaleHelper` class which are called from the `Main` method of the application. When you have finished, the application code provides a brief template for further experimentation as you learn more about TimescaleDB and .NET with `Npgsql`. ## Prerequisites -Before you begin this Quick Start, make sure you have: -* At least some knowledge of SQL (structured query language). The tutorial will walk you through each SQL command, but it is helpful if you've seen SQL before. +Before you begin this Quick Start, make sure you have: +* At least some knowledge of SQL (structured query language). The tutorial walks you through each SQL command, but it is helpful if you've seen SQL before. * The latest compatible [.NET runtime installed](https://dotnet.microsoft.com/download/dotnet-framework) and accessible * TimescaleDB installed, either in a [self-hosted environment](http://docs.timescale.com/timescaledb/latest/how-to-guides/install-timescaledb/self-hosted/) or in the [cloud](https://www.timescale.com/timescale-signup). * A PostgreSQL query tool like [psql](https://docs.timescale.com/timescaledb/latest/how-to-guides/connecting/psql/) or any other PostgreSQL client (for example, DBeaver). You need this to explore the final TimescaleDB database. @@ -42,7 +42,7 @@ cd dotnet-tutorial dotnet new console ``` -1. Add the `Npgsql` package to your project which will be used to connect to TimescaleDB: +1. Add the `Npgsql` package to your project which is used to connect to TimescaleDB: ```bash dotnet add package Npgsql ``` @@ -57,25 +57,25 @@ namespace com.timescale.docs { class Program { - // - // This is the main method that will be called + // + // This is the main method that is called // by default when .NET builds this small application static void Main(string[] args) { // Create a new instance of our helper class. This class - // will contain all of the methods for interacting with + // contains all of the methods for interacting with // TimescaleDB for this tutorial TimescaleHelper ts = new TimescaleHelper(); - // Procedure - Connecting .NET to TimescaleDB: - // Verify that the program can connect + // Procedure - Connecting .NET to TimescaleDB: + // Verify that the program can connect // to the database and that TimescaleDB is installed! ts.CheckDatabaseConnection(); } } - // This class will contain all of the methods needed to complete the + // This class contains all of the methods needed to complete the // quick-start, providing a sample of each database operation in total // to refer to later. 
public class TimescaleHelper @@ -89,9 +89,9 @@ namespace com.timescale.docs // // This is the constructor for our TimescaleHelper class - // + // public TimescaleHelper(string host="localhost", string user="postgres", - string dbname="postgres", string password="password",string port="5432") + string dbname="postgres", string password="password",string port="5432") { Host=host; User=user; @@ -164,9 +164,9 @@ If you don't see the extension, check our troubleshooting section. ## Create a relational table [](create-relational-table) -When the application can successfully connect to TimescaleDB, you can create some relational data that your time-series data can reference when creating data and executing queries. +When the application can successfully connect to TimescaleDB, you can create some relational data that your time-series data can reference when creating data and executing queries. -The new functionality to create the table and insert data is added as a method to the `TimescaleHelper` class and called from the `Main` method of the program. +The new functionality to create the table and insert data is added as a method to the `TimescaleHelper` class and called from the `Main` method of the program. @@ -175,7 +175,7 @@ The new functionality to create the table and insert data is added as a method t ```csharp // // Procedure - Creating a relational table: - // Create a table for basic relational data and + // Create a table for basic relational data and // populate it with a few fake sensors // public void CreateRelationalData() { @@ -203,7 +203,7 @@ The new functionality to create the table and insert data is added as a method t new KeyValuePair("b","ceiling") }; - // Iterate over the list to insert it into the newly + // Iterate over the list to insert it into the newly // created relational table using parameter substitution foreach(KeyValuePair kvp in sensors) { @@ -251,12 +251,12 @@ A hypertable is the core architecture that many other TimescaleDB features is bu ### Creating a hypertable -1. Add a new method to the bottom of the `TimescaleHelper` class that will create a new table and convert it to a hypertable: +1. Add a new method to the bottom of the `TimescaleHelper` class that creates a new table and convert it to a hypertable: ```csharp // // Procedure - Creating a hypertable: // Create a new table to store time-series data and create - // a new TimescaleDB hypertable using the new table. It will be + // a new TimescaleDB hypertable using the new table. It is // partitioned on the 'time' column public void CreateHypertable(){ //use one connection to use for all three commands below. @@ -293,7 +293,7 @@ A hypertable is the core architecture that many other TimescaleDB features is bu ```csharp // Procedure - Creating a hypertable // Create a new table and make it a hypertable to store - // time-series data that we will generate + // the generated time-series data ts.CreateHypertable(); ``` @@ -337,7 +337,7 @@ Your Timescale database has all of the components necessary to start creating an { using (var conn = getConnection()) { - // This query will create one row of data every minute for each + // This query creates one row of data every minute for each // sensor_id, for the last 24 hours ~= 1440 readings per sensor var sql = @"INSERT INTO sensor_data SELECT generate_series(now() - interval '24 hour', @@ -368,7 +368,7 @@ Your Timescale database has all of the components necessary to start creating an 2. 
Call this method from the `Main` program **after** the `ts.CreateHypertable();` reference: ```csharp // Procedure - Insert time-series data - // Insert time-series data using the built-in + // Insert time-series data using the built-in // PostgreSQL function generate_series() ts.InsertData(); ``` @@ -440,7 +440,7 @@ After executing the query, iterate the results using the `NpgsqlDataReader` and ts.RunQueryExample(); ``` -3. Save and run the application again. As before, if you execute all of the methods in the `Main` program, your output should look similar to this. The values of the output will be different because we used the `random()` function to generate them: +3. Save and run the application again. As before, if you execute all of the methods in the `Main` program, your output should look similar to this. The values of the output are different because we used the `random()` function to generate them: ```bash $ dotnet run @@ -484,4 +484,3 @@ Now that you're able to connect, read, and write to a TimescaleDB instance from [Continuous Aggregates](/how-to-guides/continuous-aggregates/) [Try Other Sample Datasets](/tutorials/sample-datasets/) [Migrate your own Data](/how-to-guides/migrate-data/) - diff --git a/timescaledb/quick-start/golang.md b/timescaledb/quick-start/golang.md index 81022f1bcbee..012ba3a4c535 100644 --- a/timescaledb/quick-start/golang.md +++ b/timescaledb/quick-start/golang.md @@ -12,14 +12,14 @@ you'll learn how to: * [Execute a query on your Timescale database](#execute_query) ## Prerequisites -To complete this tutorial, you will need a cursory knowledge of the Structured Query -Language (SQL). The tutorial will walk you through each SQL command, but it will be +To complete this tutorial, you need a cursory knowledge of the Structured Query +Language (SQL). The tutorial walks you through each SQL command, but it is helpful if you've seen SQL before. To start, [install TimescaleDB][timescaledb-install]. Once your installation is complete, we can proceed to ingesting or creating sample data and finishing the tutorial. -You will also need: +You also need: * Go installed on your machine. ([Install instructions][golang-install]) * The [PGX driver][pgx-driver-github] for Go @@ -147,7 +147,7 @@ Congratulations, you've successfully connected to TimescaleDB using Go. ## Create a table [](create_table) -Note: For the rest of this tutorial, we will use a connection pool, since +Note: For the rest of this tutorial, you use a connection pool, since having concurrent connections is the most common use case. ### Step 1: Formulate your SQL statement @@ -375,7 +375,7 @@ the SQL statement to generate the data, called `queryDataGeneration`. Then we use the `.Query()` function to execute the statement and return our sample data. Then we store the data returned by our query in `results`, a slice of structs, -which we will then use as a source to insert data into our hypertable. +which is then used as a source to insert data into our hypertable. ```go // Generate data to insert @@ -413,7 +413,7 @@ which we will then use as a source to insert data into our hypertable. 
} results = append(results, r) } - // Any errors encountered by rows.Next or rows.Scan will be returned here + // Any errors encountered by rows.Next or rows.Scan are returned here if rows.Err() != nil { fmt.Fprintf(os.Stderr, "rows Error: %v\n", rows.Err()) os.Exit(1) @@ -525,7 +525,7 @@ func main() { } results = append(results, r) } - // Any errors encountered by rows.Next or rows.Scan will be returned here + // Any errors encountered by rows.Next or rows.Scan are returned here if rows.Err() != nil { fmt.Fprintf(os.Stderr, "rows Error: %v\n", rows.Err()) os.Exit(1) @@ -567,7 +567,7 @@ to be inserted. This can make ingestion of data slow. To speed up ingestion, we batch inserting data. Here's a sample pattern for how to do so, using the sample data generated in Step 0 -above. We will use the pgx `Batch` object +above, it uses the pgx `Batch` object: ```go package main @@ -630,7 +630,7 @@ func main() { } results = append(results, r) } - // Any errors encountered by rows.Next or rows.Scan will be returned here + // Any errors encountered by rows.Next or rows.Scan are returned here if rows.Err() != nil { fmt.Fprintf(os.Stderr, "rows Error: %v\n", rows.Err()) os.Exit(1) @@ -776,7 +776,7 @@ for your desired purpose. results = append(results, r) fmt.Printf("Time bucket: %s | Avg: %f\n", &r.Bucket, r.Avg) } - // Any errors encountered by rows.Next or rows.Scan will be returned here + // Any errors encountered by rows.Next or rows.Scan are returned here if rows.Err() != nil { fmt.Fprintf(os.Stderr, "rows Error: %v\n", rows.Err()) os.Exit(1) @@ -855,7 +855,7 @@ func main() { results = append(results, r) fmt.Printf("Time bucket: %s | Avg: %f\n", &r.Bucket, r.Avg) } - // Any errors encountered by rows.Next or rows.Scan will be returned here + // Any errors encountered by rows.Next or rows.Scan are returned here if rows.Err() != nil { fmt.Fprintf(os.Stderr, "rows Error: %v\n", rows.Err()) os.Exit(1) diff --git a/timescaledb/quick-start/java.md b/timescaledb/quick-start/java.md index c43159b7defa..9f74687736cb 100644 --- a/timescaledb/quick-start/java.md +++ b/timescaledb/quick-start/java.md @@ -13,25 +13,25 @@ In this tutorial, you'll learn how to: ## Pre-requisites -To complete this tutorial, you will need a cursory knowledge of the Structured Query Language (SQL). -The tutorial will walk you through each SQL command, but it will be helpful if you've seen SQL before. +To complete this tutorial, you need a cursory knowledge of the Structured Query Language (SQL). +The tutorial walks you through each SQL command, but it is helpful if you've seen SQL before. To start, [install TimescaleDB][timescaledb-install]. Once your installation is complete, you can proceed to ingesting or creating sample data and finishing the tutorial. -You will also need to install [Java Development Kit (JDK)][jdk] +You also need to install [Java Development Kit (JDK)][jdk] and [PostgreSQL Java Database Connectivity (JDBC) Driver][pg-jdbc-driver] as well. -All code is presented for Java 16 and above. +All code is presented for Java 16 and above. If you are working with older JDK versions, use legacy coding techniques. ## Connect Java to TimescaleDB [](new-database) ### Step 1: Create a new Java application -For simplicity, we will use the application in a single file as an example. +For simplicity, this tutorial uses the application in a single file as an example. You can use any of your favorite build tools, including `gradle` and `maven`. -Create a separate directory and navigate to it. 
+Create a separate directory and navigate to it. In it, create a text file with name and extension `Main.java` and the following content: ```java @@ -51,15 +51,15 @@ From the command line in the current directory, try running the application with java Main.java ``` -You should see the `Hello, World!` line output to your console. +You should see the `Hello, World!` line output to your console. In case of an error, refer to the documentation and check if the JDK was installed correctly. You don't have to create directory structure `./com/timescale/java` similar to package path in source file. You should just create a single java file in empty folder and run `java Main.java` from it. ### Step 2: Import Postgres JDBC driver -To work with the `PostgreSQL`, you need to import the appropriate `JDBC Driver`. -If you are using a dependency manager, include [PostgreSQL JDBC Driver as dependency][pg-jdbc-driver-dependency]. +To work with the `PostgreSQL`, you need to import the appropriate `JDBC Driver`. +If you are using a dependency manager, include [PostgreSQL JDBC Driver as dependency][pg-jdbc-driver-dependency]. In this case, download [jar artifact of JDBC Driver][pg-jdbc-driver-artifact] and place it next to the `Main.java` file. Now you can import the `JDBC Driver` into the Java application and display a list of available drivers for the check: @@ -105,11 +105,11 @@ Next, compose your connection string variable using this format: var connUrl = "jdbc:postgresql://host:port/dbname?user=username&password=password"; ``` -Full documentation on [the formation of the connection string][pg-jdbc-driver-conn-docs] +Full documentation on [the formation of the connection string][pg-jdbc-driver-conn-docs] can be found in the official documentation of the PostgreSQL JDBC Driver. -The above method of composing a connection string is for test or development purposes only, +The above method of composing a connection string is for test or development purposes only, for production purposes be sure to make sensitive details like your password, hostname, and port number environment variables. @@ -133,7 +133,7 @@ public class Main { } ``` -Run with the `java -cp *.jar Main.java` command +Run with the `java -cp *.jar Main.java` command and you should see this output: `{ApplicationName=PostgreSQL JDBC Driver}`. Congratulations, you've successfully connected to TimescaleDB using Java. @@ -198,19 +198,19 @@ Congratulations, you've successfully created a relational table in TimescaleDB u ## Generate a hypertable [](generate_hypertable) In TimescaleDB, the primary point of interaction with your data is a [hypertable][timescaledb-hypertable], -the abstraction of a single continuous table across all space and time intervals, +the abstraction of a single continuous table across all space and time intervals, such that one can query it via standard SQL. Virtually all user interactions with TimescaleDB are with hypertables. Creating tables and indexes, altering tables, inserting data, and selecting data, can (and should) all be executed on the hypertable. -A hypertable is defined by a standard schema with column names and types, +A hypertable is defined by a standard schema with column names and types, with at least one column specifying a time value. ### Step 1: Create sensors data table -First, we create `CREATE TABLE` SQL statement for our hypertable. +First, we create `CREATE TABLE` SQL statement for our hypertable. 
Notice how the hypertable has the compulsory time column: ```sql @@ -225,7 +225,7 @@ CREATE TABLE sensor_data ( Next, you can formulate the `SELECT` statement to convert the table we created in Step 1 into a hypertable. Note that you must specify the table name to convert to a hypertable -and its time column name as the two arguments, +and its time column name as the two arguments, as mandated by the [`create_hypertable` docs][timescaledb-hypertable-create-docs]: ```sql @@ -316,7 +316,7 @@ for (final var sensor : sensors) { You can insert a batch of rows into TimescaleDB in a couple of different ways. Let's see what it looks like to insert a number of rows with batching mechanism. -For simplicity's sake, we’ll use PostgreSQL to generate some sample time-series data in order +For simplicity's sake, we’ll use PostgreSQL to generate some sample time-series data in order to insert into the `sensor_data` hypertable: ```java @@ -429,7 +429,7 @@ public class Main { ``` -If you are inserting data from a CSV file, we recommend the [timescale-parallel-copy tool](https://github.com/timescale/timescaledb-parallel-copy), +If you are inserting data from a CSV file, we recommend the [timescale-parallel-copy tool](https://github.com/timescale/timescaledb-parallel-copy), which is a command line program for parallelizing PostgreSQL's built-in `COPY` functionality for bulk inserting data into TimescaleDB. @@ -456,7 +456,7 @@ Notice the use of placeholders for sensor type and location. ### Step 2: Execute the query -Now you can execute the query with the prepared statement and read out the result set +Now you can execute the query with the prepared statement and read out the result set for all `a`-type sensors located on the `floor`: ```java @@ -603,7 +603,7 @@ Congratulations 🎉, you've successfully executed a query on TimescaleDB using ## Next steps -Now that you're able to connect, read, and write to a TimescaleDB instance from your Java application, +Now that you're able to connect, read, and write to a TimescaleDB instance from your Java application, be sure to check out these advanced tutorials: * Get up and running with TimescaleDB with our [Getting Started][timescaledb-getting-started] tutorial. diff --git a/timescaledb/quick-start/node.md b/timescaledb/quick-start/node.md index f27b839f6263..ecdf2357f4c1 100644 --- a/timescaledb/quick-start/node.md +++ b/timescaledb/quick-start/node.md @@ -13,20 +13,20 @@ you'll learn how to: ## Prerequisites -To complete this tutorial, you will need a cursory knowledge of the Structured Query -Language (SQL). The tutorial will walk you through each SQL command, but it will be +To complete this tutorial, you need a cursory knowledge of the Structured Query +Language (SQL). The tutorial walks you through each SQL command, but it is helpful if you've seen SQL before. To start, [install TimescaleDB][install-timescale]. Once your installation is complete, we can proceed to ingesting or creating sample data and finishing the tutorial. -Obviously, you will need to [install Node][node-install] and the +Obviously, you need to [install Node][node-install] and the [Node Package Manager (npm)][npm-install] as well. ## Connect Node to TimescaleDB [](new-database) TimescaleDB is based on PostgreSQL and we can use common PostgreSQL tools to connect -your Node app to the database. In this example, we will use a common Node.js +your Node app to the database. This example uses a common Node.js Object Relational Mapper (ORM) called [Sequelize][sequelize-info]. 
### Step 1: Create your Node app @@ -37,7 +37,7 @@ Let's initialize a new Node app. From your command line, type the following: npm init -y ``` -This will create a `package.json` file in your directory, which contains all +This creates a `package.json` file in your directory, which contains all of the depenencies for your project: ```json @@ -164,7 +164,7 @@ project. From the command line, type the following: npx sequelize init ``` -This will create a `config/config.json` file in your project. We will need to +This creates a `config/config.json` file in your project. You need to modify it with the connection details we tested earlier. For the remainder of this application, we'll use a database called `node_test`. Here's a full example file. Again, note the `dialectOptions`. @@ -245,8 +245,8 @@ To start, create a database migration by running the following command: npx sequelize migration:generate --name add_tsdb_extension ``` -You will see a file that has the name `add_tsdb_extension` appended to it in -your `migrations` folder. Let's modify that file to look like this: +There is a file that has the name `add_tsdb_extension` appended to it in +your `migrations` folder. Modify that file to look like this: ```javascript 'use strict'; @@ -343,7 +343,7 @@ let PageLoads = sequelize.define('page_loads', { }); ``` -We will now be able to instantiate a `PageLoads` object and save it to the +You can now instantiate a `PageLoads` object and save it to the database. ## Generate hypertable [](create_hypertable) @@ -418,10 +418,10 @@ Now you have a working connection to your database, a table configured with the proper schema, and a hypertable created to more efficiently query data by time. Let's add data to the table. -In the `index.js` file, we will modify the `/` route like so to first get the +In the `index.js` file, modify the `/` route like so to first get the `user-agent` from the request object (`req`) and the current timestamp. Then, -we will save call the `create` method on our model (`PageLoads`), supplying -the user agent and timestamp parameters. The `create` call will execute +call the `create` method on our model (`PageLoads`), supplying +the user agent and timestamp parameters. The `create` call executes an `INSERT` on the database: ```javascript @@ -449,7 +449,7 @@ app.get('/', async (req, res) => { Each time the page is reloaded, we also want to display all information currently in the table. -To do this, we will once again modify the `/` route in our `index.js` file +To do this, modify the `/` route in our `index.js` file to call the Sequelize `findAll` function and retrieve all data from the `page_loads` table via the `PageLoads` model, like so: diff --git a/timescaledb/quick-start/python.md b/timescaledb/quick-start/python.md index c822c0b452f5..7bbe2397db69 100644 --- a/timescaledb/quick-start/python.md +++ b/timescaledb/quick-start/python.md @@ -13,7 +13,7 @@ you'll learn how to: ## Prerequisites Before you start, make sure you have: -* At least some knowledge of SQL (structured query language). The tutorial will walk you through each SQL command, +* At least some knowledge of SQL (structured query language). The tutorial walks you through each SQL command, but it is helpful if you've seen SQL before. * TimescaleDB installed, either in a [self-hosted environment][self-hosted-install] or [in the cloud][cloud-install] * The `psycopg2` library installed, [which you can install with pip][psycopg2-docs]. 
@@ -66,7 +66,7 @@ The above method of composing a connection string is for test or development pur ### Step 3: Connect to TimescaleDB using the psycopg2 connect function -Use the psycopg2 [connect function][psycopg2-connect] to create a new database session and create +Use the psycopg2 [connect function][psycopg2-connect] to create a new database session and create a new [cursor object][psycopg2-cursor] to interact with the database. In your `main` function, add the following lines: @@ -108,7 +108,7 @@ query_create_sensors_table = "CREATE TABLE sensors (id SERIAL PRIMARY KEY, type ### Step 2: Execute the SQL statement and commit changes Next, we execute the `CREATE TABLE` statement by opening a cursor, executing the -query from Step 1 and committing the query we executed in order to make the changes persistent. +query from Step 1 and committing the query we executed in order to make the changes persistent. Afterward, we close the cursor to clean up: ```python @@ -153,8 +153,8 @@ query_create_sensordata_table = """CREATE TABLE sensor_data ( ### Step 2: Formulate the SELECT statement to create your hypertable -Next, formulate a `SELECT` statement that converts the `sensor_data` table to a hypertable. Note that you must specify -the table name which you wish to convert to a hypertable and its time column name as the two arguments, as mandated by +Next, formulate a `SELECT` statement that converts the `sensor_data` table to a hypertable. Note that you must specify +the table name which you wish to convert to a hypertable and its time column name as the two arguments, as mandated by the [`create_hypertable` docs][create-hypertable-docs]: ```python @@ -200,7 +200,7 @@ for sensor in sensors: conn.commit() ``` -A cleaner way to pass variables to the `cursor.execute` function is to separate the formulation of our SQL +A cleaner way to pass variables to the `cursor.execute` function is to separate the formulation of our SQL statement, `SQL`, from the data being passed with it into the prepared statement, `data`: ```python @@ -231,7 +231,7 @@ from pgcopy import CopyManager ### Step 1: Get data to insert into database First we generate random sensor data using the `generate_series` function provided by PostgreSQL. -In the example query below, you will insert a total of 480 rows of data (4 readings, every 5 minutes, for 24 hours). +This example inserts a total of 480 rows of data (4 readings, every 5 minutes, for 24 hours). In your application, this would be the query that saves your time-series data into the hypertable. ```python @@ -359,12 +359,12 @@ If you want a list of dictionaries instead, you can define the cursor using [`Di cursor = conn.cursor(cursor_factory=psycopg2.extras.DictCursor) ``` -Using this cursor, `cursor.fetchall()` will return a list of dictionary-like objects. +Using this cursor, `cursor.fetchall()` returns a list of dictionary-like objects. ### Executing queries using prepared statements For more complex queries than a simple `SELECT *`, we can use prepared statements to ensure our queries are executed safely against the database. We write our -query using placeholders as shown in the sample code below. For more information about properly using placeholders +query using placeholders as shown in the sample code below. For more information about properly using placeholders in psycopg2, see the [basic module usage document][psycopg2-docs-basics]. 
```python diff --git a/timescaledb/quick-start/ruby.md b/timescaledb/quick-start/ruby.md index d7adbcdea95e..e3c22efdfd64 100644 --- a/timescaledb/quick-start/ruby.md +++ b/timescaledb/quick-start/ruby.md @@ -13,14 +13,14 @@ you'll learn how to: ## Prerequisites -To complete this tutorial, you will need a cursory knowledge of the Structured Query -Language (SQL). The tutorial will walk you through each SQL command, but it will be +To complete this tutorial, you need a cursory knowledge of the Structured Query +Language (SQL). The tutorial walks you through each SQL command, but it is helpful if you've seen SQL before. To start, [install TimescaleDB][install-timescale]. Once your installation is complete, we can proceed to ingesting or creating sample data and finishing the tutorial. -You will also need to [install Rails][rails-install]. +You also need to [install Rails][rails-install]. ## Connect Ruby to TimescaleDB [](new-database) @@ -32,7 +32,7 @@ database. TimescaleDB is a PostgreSQL extension. rails new my_app -d=postgresql ``` -Rails will finish creating and bundling your application, installing all required Gems in the process. +Rails finishes creating and bundling your application, installing all required Gems in the process. ### Step 2: Configure the TimescaleDB database @@ -62,7 +62,7 @@ default: &default ``` -Experienced Rails developers will want to set and retrieve environment variables for the username and password of the database. For the purposes of this quick start, we will hard code the `host`, `port`, `username`, and `password`. This is *not* advised for code or databases of consequence. +Experienced Rails developers might want to set and retrieve environment variables for the username and password of the database. For the purposes of this quick start, we hard code the `host`, `port`, `username`, and `password`. This is *not* advised for code or databases of consequence. Then configure the database name in the `development`, `test`, and `production` sections. Let's call our @@ -111,15 +111,15 @@ Now we can run the following `rake` command to create the database in TimescaleD rails db:create ``` -This will create the `my_app_db` database in your TimescaleDB instance and a `schema.rb` +This creates the `my_app_db` database in your TimescaleDB instance and a `schema.rb` file that represents the state of your TimescaleDB database. ## Create a relational table [](create_table) ### Step 1: Add TimescaleDB to your Rails migration -First, let's setup our database to include the TimescaleDB extension. We will -start by creating a migration: +First, let's setup our database to include the TimescaleDB extension. +Start by creating a migration: ```bash rails generate migration add_timescale @@ -148,7 +148,7 @@ rails db:migrate ``` -In order for the command to work, you will need to make sure there is a database named `postgres` in your TimescaleDB deployment. This database is sometimes not present by default. +In order for the command to work, you need to make sure there is a database named `postgres` in your TimescaleDB deployment. This database is sometimes not present by default. With `rails dbconsole` you can test that the extension has been added by running the `\dx` @@ -181,8 +181,8 @@ rails generate scaffold PageLoads user_agent:string ``` TimescaleDB requires that any `UNIQUE` or `PRIMARY KEY` indexes on your table -include all partitioning columns, which in our case is the time column. 
A new Rails model will -include a `PRIMARY KEY` index for `id` by default, so we need to either remove the +include all partitioning columns, which in our case is the time column. A new Rails model +includes a `PRIMARY KEY` index for `id` by default, so we need to either remove the column or make sure that the index includes time as part of a "composite key". @@ -276,7 +276,7 @@ class AddHypertable < ActiveRecord::Migration[5.2] end ``` -When we run `rails db:migrate` we will generate the hypertable. +Run `rails db:migrate` to generate the hypertable. We can confirm this in `psql` by running the `\d page_loads` command and seeing the following: @@ -297,15 +297,15 @@ Triggers: ## Insert rows into TimescaleDB [](insert_rows) -Let's create a new view and controller so that we can insert a value into -the database and see our results. When our view displays, we will store -the user agent and time into our database. +Create a new view and controller so that we can insert a value into +the database and see our results. When the view displays, you can store +the user agent and time into the database. ```bash rails generate controller static_pages home ``` -This will generate the view and controller files for a page called `/static_pages/home` +This generates the view and controller files for a page called `/static_pages/home` in our site. Let's first add a line to the `static_pages_controller.rb` file to retrieve the user agent of the site visitor's browser: @@ -317,8 +317,8 @@ class StaticPagesController < ApplicationController end ``` -Subsequently, in the `home.html.erb` file, we will print the `@agent` -variable we just created: +Subsequently, in the `home.html.erb` file, print the `@agent` +variable you just created: ```erb

<h1>StaticPages#home</h1>

@@ -426,7 +426,7 @@ ab -n 50000 -c 10 http://localhost:3000/static_pages/home Now, you can grab a tea and relax while it creates thousands of records in your first hypertable. You'll be able to count how many 'empty requests' your -Rails will support. +Rails supports. ## Counting requests per minute diff --git a/timescaledb/tutorials/analyze-cryptocurrency-data.md b/timescaledb/tutorials/analyze-cryptocurrency-data.md index 0255f0c89fed..7faf82c57cc9 100644 --- a/timescaledb/tutorials/analyze-cryptocurrency-data.md +++ b/timescaledb/tutorials/analyze-cryptocurrency-data.md @@ -23,8 +23,8 @@ You can also download the resources for this tutorial: September 2019. Follow the steps in Section 2 of this tutorial if you require fresh data) ## Prerequisites -To complete this tutorial, you will need a cursory knowledge of the Structured Query -Language (SQL). The tutorial will walk you through each SQL command, but it will be +To complete this tutorial, you need a cursory knowledge of the Structured Query +Language (SQL). The tutorial walks you through each SQL command, but it is helpful if you've seen SQL before. To start, [install TimescaleDB][install-timescale]. When your installation is diff --git a/timescaledb/tutorials/analyze-intraday-stocks/fetch-and-ingest.md b/timescaledb/tutorials/analyze-intraday-stocks/fetch-and-ingest.md index fed7ac0d0270..562a11726a16 100644 --- a/timescaledb/tutorials/analyze-intraday-stocks/fetch-and-ingest.md +++ b/timescaledb/tutorials/analyze-intraday-stocks/fetch-and-ingest.md @@ -9,7 +9,7 @@ In this step: ## Create a configuration file This is an optional step, but it is highly recommended that you do not store your password or other sensitive information -directly in your code. Instead, create a configuration file, for example `config.py`, and include your +directly in your code. Instead, create a configuration file, for example `config.py`, and include your database connection details and Alpha Vantage API key in there: ```python @@ -31,7 +31,7 @@ apikey = config.APIKEY ## Collect ticker symbols -In order to fetch intraday stock data, you will need to know which ticker symbols you want to analyze. +In order to fetch intraday stock data, you need to know which ticker symbols you want to analyze. First, let's collect a list of symbols so that we can fetch their data later. In general, you have a few options to gather a list of ticker symbols dynamically: @@ -63,13 +63,13 @@ Now you have a list of ticker symbols that you can use later to make requests to ### About the API -Alpha Vantage API provides 2 year historical intraday stock data in 1, 5, 15, or 30 minute -intervals. The API outputs a lot of data in a CSV file (around 2200 rows per symbol per -day, for a 1 minute interval), so it slices the dataset into one month buckets. This means +Alpha Vantage API provides 2 year historical intraday stock data in 1, 5, 15, or 30 minute +intervals. The API outputs a lot of data in a CSV file (around 2200 rows per symbol per +day, for a 1 minute interval), so it slices the dataset into one month buckets. This means that for one request for a single symbol, the most amount of data you can get is one month. - The maximum amount of historical intraday data is 24 months. To fetch the maximum - amount, you need to slice up your requests by month. For example, `year1month1`, - `year1month2`, and so on. Keep in mind that each request can only fetch data for one + The maximum amount of historical intraday data is 24 months. 
To fetch the maximum + amount, you need to slice up your requests by month. For example, `year1month1`, + `year1month2`, and so on. Keep in mind that each request can only fetch data for one symbol at a time. Here's an example API endpoint: @@ -110,7 +110,7 @@ def fetch_stock_data(symbol, month): CSV_URL = 'https://www.alphavantage.co/query?function=TIME_SERIES_INTRADAY_EXTENDED&' \ 'symbol={symbol}&interval={interval}&slice={slice}&apikey={apikey}' \ .format(symbol=symbol, slice=slice, interval=interval,apikey=apikey) - + # read CSV file directly into a pandas dataframe df = pd.read_csv(CSV_URL) @@ -123,15 +123,15 @@ def fetch_stock_data(symbol, month): df['time'] = pd.to_datetime(df['time'], format='%Y-%m-%d %H:%M:%S') # rename columns to match database schema - df = df.rename(columns={'time': 'time', - 'open': 'price_open', - 'close': 'price_close', + df = df.rename(columns={'time': 'time', + 'open': 'price_open', + 'close': 'price_close', 'high': 'price_high', 'low': 'price_low', 'volume': 'trading_volume'}) # convert the dataframe into a list of tuples ready to be ingested - return [row for row in df.itertuples(index=False, name=None)] + return [row for row in df.itertuples(index=False, name=None)] ``` ## Ingest data into TimescaleDB @@ -139,7 +139,7 @@ def fetch_stock_data(symbol, month): When you have the `fetch_stock_data` function working, and you can fetch the candlestick from the API, you can insert it into the database. To make the ingestion faster, use [pgcopy][pgcopy-docs] instead of ingesting -data row by row. TimescaleDB is packaged as an extension to PostgreSQL, meaning all the PostgreSQL tools you know and +data row by row. TimescaleDB is packaged as an extension to PostgreSQL, meaning all the PostgreSQL tools you know and love already work with TimescaleDB. ### Ingest data fast with pgcopy @@ -162,14 +162,14 @@ from pgcopy import CopyManager import config, psycopg2 # establish database connection -conn = psycopg2.connect(database=config.DB_NAME, - host=config.DB_HOST, - user=config.DB_USER, - password=config.DB_PASS, +conn = psycopg2.connect(database=config.DB_NAME, + host=config.DB_HOST, + user=config.DB_USER, + password=config.DB_PASS, port=config.DB_PORT) # column names in the database (pgcopy needs it as a parameter) -COLUMNS = ('time', 'symbol', 'price_open', 'price_close', +COLUMNS = ('time', 'symbol', 'price_open', 'price_close', 'price_low', 'price_high', 'trading_volume') # iterate over the symbols list @@ -216,7 +216,7 @@ time |symbol|price_open|price_close|price_low|price_high|trading_v ``` -Fetching and ingesting intraday data can take a while, so if you want to see results quickly, +Fetching and ingesting intraday data can take a while, so if you want to see results quickly, reduce the number of months, or limit the number of symbols. diff --git a/timescaledb/tutorials/analyze-intraday-stocks/index.md b/timescaledb/tutorials/analyze-intraday-stocks/index.md index 234570a2d9c4..36856ba25876 100644 --- a/timescaledb/tutorials/analyze-intraday-stocks/index.md +++ b/timescaledb/tutorials/analyze-intraday-stocks/index.md @@ -1,32 +1,32 @@ # Analyze historical intraday stock data -This tutorial is a step-by-step guide on how to collect, store, and analyze intraday stock data +This tutorial is a step-by-step guide on how to collect, store, and analyze intraday stock data with TimescaleDB. This tutorial has a few main steps: 1. [Design database schema][design-schema] - - You will create a table that will be capable of storing 1-min candlestick data. 
+ + You create a table that is capable of storing 1-min candlestick data. 2. [Fetch and ingest stock data][fetch-ingest] - - You will learn how to fetch data from the Alpha Vantage API and ingest it into the database in a fast manner. + + You learn how to fetch data from the Alpha Vantage API and ingest it into the database in a fast manner. 3. [Explore stock market data][explore] - - After all the plumbing work is done, you will see several ways to explore stock price points, lows and highs, price changes over time, symbols with the most daily gains, candlestick charts, and more! + + After all the plumbing work is done, you can see several ways to explore stock price points, lows and highs, price changes over time, symbols with the most daily gains, candlestick charts, and more! ## Prerequisites * Python 3 -* TimescaleDB (see [installation options][install-timescale]) +* TimescaleDB (see [installation options][install-timescale]) * Alpha Vantage API key ([get one for free][alpha-vantage-apikey]) * Virtualenv (installation: `pip install virtualenv`) * [Psql][psql-install] or any other PostgreSQL client (e.g. DBeaver) ## Get started: create a virtual environment -It's recommended to create a new Python virtual environment to isolate the packages used +It's recommended to create a new Python virtual environment to isolate the packages used throughout this tutorial. ```bash diff --git a/timescaledb/tutorials/analyze-nft-data/analyzing-nft-transactions.md b/timescaledb/tutorials/analyze-nft-data/analyzing-nft-transactions.md index 0fc143d5c461..b14706467b93 100644 --- a/timescaledb/tutorials/analyze-nft-data/analyzing-nft-transactions.md +++ b/timescaledb/tutorials/analyze-nft-data/analyzing-nft-transactions.md @@ -1,50 +1,50 @@ # Analyzing NFT transactions -When you have successfully collected and ingested the data, it's time to analyze -it. For this analysis, we use data collected with our ingestion script that -contains only successful sale transactions that happened between -1 January 2021 to 12 October 2021 on the OpenSea marketplace, as reported by the -OpenSea API. - -For simplicity, this tutorial analyzes only those transactions that used `ETH` -as their payment symbol, but you can modify the script to include more +When you have successfully collected and ingested the data, it's time to analyze +it. For this analysis, we use data collected with our ingestion script that +contains only successful sale transactions that happened between +1 January 2021 to 12 October 2021 on the OpenSea marketplace, as reported by the +OpenSea API. + +For simplicity, this tutorial analyzes only those transactions that used `ETH` +as their payment symbol, but you can modify the script to include more payment symbols in your analysis if you want to. -All the queries in this section, plus some additional ones, are in our +All the queries in this section, plus some additional ones, are in our [NFT Starter Kit on GitHub][nft-starter-kit] in the [`queries.sql` file][queries]. -We divide our analysis into two parts: simple queries and complex queries. But -first we will create something to speed up our queries: TimescaleDB continuous +We divide our analysis into two parts: simple queries and complex queries. But +first we create something to speed up our queries: TimescaleDB continuous aggregates. -All queries in this section only include data that's accessible from the +All queries in this section only include data that's accessible from the OpenSea API. 
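Before building anything on top of the data, it can help to confirm what you actually ingested. A minimal sanity check, assuming the `nft_sales` hypertable from the ingestion step is populated:

```sql
-- Confirm the time range and the number of ETH-denominated sales to analyze
SELECT min(time) AS first_sale,
       max(time) AS last_sale,
       count(*)  AS eth_sales
FROM nft_sales
WHERE payment_symbol = 'ETH';
```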
## Speeding up queries with continuous aggregates -TimescaleDB continuous aggregates speed up workloads that need to process large -amounts of data. They look like PostgreSQL materialized views, but have a -built-in refresh policy that makes sure that the data is up to date as new -data comes in. Additionally, the refresh procedure is careful to only refresh -data in the materialized view that actually needs to be changed, thereby -avoiding recomputation of data that did not change. This smart refresh procedure -massively improves the refresh performance of the materialized view and the -refresh policy ensures that the data is always up to date. - -[Continuous aggregates][cont-agg] are often used to speed up dashboards and -visualizations, summarizing data sampled at high frequency, and querying +TimescaleDB continuous aggregates speed up workloads that need to process large +amounts of data. They look like PostgreSQL materialized views, but have a +built-in refresh policy that makes sure that the data is up to date as new +data comes in. Additionally, the refresh procedure is careful to only refresh +data in the materialized view that actually needs to be changed, thereby +avoiding recomputation of data that did not change. This smart refresh procedure +massively improves the refresh performance of the materialized view and the +refresh policy ensures that the data is always up to date. + +[Continuous aggregates][cont-agg] are often used to speed up dashboards and +visualizations, summarizing data sampled at high frequency, and querying downsampled data over long time periods. -This tutorial creates two continuous aggregates to speed up queries on assets +This tutorial creates two continuous aggregates to speed up queries on assets and on collections. 
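The refresh policies added below keep these aggregates up to date automatically, but you can also refresh a specific window by hand, for example after backfilling older transactions. A minimal sketch, assuming the `assets_daily` aggregate created in the next section:

```sql
-- Manually refresh one month of the assets_daily continuous aggregate
CALL refresh_continuous_aggregate('assets_daily', '2021-01-01', '2021-02-01');
```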
### Assets continuous aggregates -Create a new continuous aggregate called `assets_daily` that computes and stores -the following information about all assets for each day: `asset_id`, the collection -it belongs to, `daily average price`, `median price`, `sale volume`, `ETH volume`, +Create a new continuous aggregate called `assets_daily` that computes and stores +the following information about all assets for each day: `asset_id`, the collection +it belongs to, `daily average price`, `median price`, `sale volume`, `ETH volume`, `open`, `high`, `low` and `close` prices: ```sql @@ -67,8 +67,8 @@ WHERE payment_symbol = 'ETH' GROUP BY bucket, asset_id, collection_id ``` -Add a refresh policy to update the continuous aggregate daily with the latest data, -so that you can save +Add a refresh policy to update the continuous aggregate daily with the latest data, +so that you can save computation at query time: ```sql SELECT add_continuous_aggregate_policy('assets_daily', @@ -78,9 +78,9 @@ SELECT add_continuous_aggregate_policy('assets_daily', ``` ### Collections continuous aggregates -Create another continuous aggregate called `collections_daily` that computes and -stores the following information about all collections for each day, -including `daily average price`, `median price`, `sale volume`, `ETH volume`, +Create another continuous aggregate called `collections_daily` that computes and +stores the following information about all collections for each day, +including `daily average price`, `median price`, `sale volume`, `ETH volume`, `the most expensive nft`, and `the highest price`: ```sql @@ -106,27 +106,27 @@ SELECT add_continuous_aggregate_policy('collections_daily', schedule_interval => INTERVAL '1 day'); ``` -When you are asking questions where daily aggregations can help with the answer, -you can query the continuous aggregate, rather than the raw data in the `nft_sales` +When you are asking questions where daily aggregations can help with the answer, +you can query the continuous aggregate, rather than the raw data in the `nft_sales` hypertable. This helps speed up the result. ## Simple queries -You can start your analysis by asking simple questions about NFT sales that -happened in 2021 and answering them using SQL queries. Use these queries -as a starting point for your own further analysis. You can modify each query +You can start your analysis by asking simple questions about NFT sales that +happened in 2021 and answering them using SQL queries. Use these queries +as a starting point for your own further analysis. You can modify each query to analyze the time-period, asset, collection, or account that you are curious about! -Where possible, we include dashboard examples from Superset to serve as -inspiration for creating your own dashboard which monitors and analyzes NFT -sales using free, open-source tools. You can find the code used to create each +Where possible, we include dashboard examples from Superset to serve as +inspiration for creating your own dashboard which monitors and analyzes NFT +sales using free, open-source tools. You can find the code used to create each graph in the [NFT Starter Kit Github repo][nft-starter-kit]. ### Collections with the highest sales volume -Which collections have the highest volume of sales? Answering this is a great -starting point for finding collections with assets that are frequently traded, -which is important for buyers thinking about the resale value of their NFTs. 
If -you buy an NFT in one of the collections below, there is a good chance you'll -be able to find a buyer. In this query, you order the collections by total volume +Which collections have the highest volume of sales? Answering this is a great +starting point for finding collections with assets that are frequently traded, +which is important for buyers thinking about the resale value of their NFTs. If +you buy an NFT in one of the collections below, there is a good chance you'll +be able to find a buyer. In this query, you order the collections by total volume of sales, but you could also order them by ETH volume instead: ```sql /* Collections with the highest volume? */ @@ -152,22 +152,22 @@ ORDER BY total_volume DESC; | 24px | 24872 | 3203.9084810874024 | | pudgypenguins | 24165 | 35949.81731415086 | -For this query, you take advantage of the pre-calculated data about collections -stored in the `collections_daily` continuous aggregate. You also perform an +For this query, you take advantage of the pre-calculated data about collections +stored in the `collections_daily` continuous aggregate. You also perform an `INNER JOIN` on the collections relational table to find the collection name in human readable form, represented by the `slug`. -Querying from continuous aggregates is faster and allows you to write shorter, -more readable queries. It is a pattern that you'll use again in this tutorial, +Querying from continuous aggregates is faster and allows you to write shorter, +more readable queries. It is a pattern that you'll use again in this tutorial, so look out for it! ### Daily sales of a collection -How many sales took place each day for a certain collection? This query looks -at the daily volume of sales for NFTs in the `cryptokitties` collection. This -can help you find which days the NFT traders have been more active, and help you +How many sales took place each day for a certain collection? This query looks +at the daily volume of sales for NFTs in the `cryptokitties` collection. This +can help you find which days the NFT traders have been more active, and help you spot patterns about which days of the week or month have higher or lower volume and why. -You can modify this query to look at your favorite NFT collection, such as +You can modify this query to look at your favorite NFT collection, such as `cryptopunks`, `lazy-lions`, or `afrodroids-by-owo`: ```sql SELECT bucket, slug, volume @@ -190,13 +190,13 @@ Here's what this query would look like as a time-series chart in Apache Superset ![daily number of nft transactions](https://assets.timescale.com/docs/images/tutorials/nft-tutorial/daily-number-of-nft-transactions.jpg) -As a reminder, charts like this are pre-built and ready for you to use and -modify as part of the pre-built dashboards +As a reminder, charts like this are pre-built and ready for you to use and +modify as part of the pre-built dashboards in our [NFT Starter Kit][nft-starter-kit]. ### Comparison of daily NFT sales for different collections -How do the daily sales of NFTs in one collection compare to that of another -collection? This query compares the daily sales of two popular NFT collections: +How do the daily sales of NFTs in one collection compare to that of another +collection? This query compares the daily sales of two popular NFT collections: CryptoKitties and Ape Gang, in the past three months: ```sql /* Daily number of NFT transactions, "CryptoKitties" vs Ape Gang from past 3 months? 
*/ @@ -219,24 +219,24 @@ bucket |slug |volume| ![comparison of different collections](https://assets.timescale.com/docs/images/tutorials/nft-tutorial/comparison-of-different-collections.jpg) -This sort of query is useful to track sales activity in collections you're -interested in or own assets in, so you can see the activity of other NFT holders. -Also, you can modify the time-period under consideration to look at larger +This sort of query is useful to track sales activity in collections you're +interested in or own assets in, so you can see the activity of other NFT holders. +Also, you can modify the time-period under consideration to look at larger (such as 9 months), or smaller (such as 14 days) periods of time. ### Snoop Dogg's NFT activity (or individual account activity) -How many NFTs did a particular person buy in a certain period of time? This -sort of query is useful to monitor the activity of popular NFT collectors, -like American rapper Snoop Dogg (or [Cozomo_de_Medici][snoop-dogg-opensea]) or -African NFT evangelist [Daliso Ngoma][daliso-opensea] or even compare trading -patterns of multiple collectors. Since NFT transactions are public on the Ethereum -blockchain and our database contains seller (`seller_account`) and -buyer (`winner_account`) columns as well, you can analyze the purchase -activity of a specific account. - -This query analyzes [Snoop Dogg’s](https://twitter.com/cozomomedici) address to -analyze his trades, but you can edit the query to add any address in the `WHERE` +How many NFTs did a particular person buy in a certain period of time? This +sort of query is useful to monitor the activity of popular NFT collectors, +like American rapper Snoop Dogg (or [Cozomo_de_Medici][snoop-dogg-opensea]) or +African NFT evangelist [Daliso Ngoma][daliso-opensea] or even compare trading +patterns of multiple collectors. Since NFT transactions are public on the Ethereum +blockchain and our database contains seller (`seller_account`) and +buyer (`winner_account`) columns as well, you can analyze the purchase +activity of a specific account. + +This query analyzes [Snoop Dogg’s](https://twitter.com/cozomomedici) address to +analyze his trades, but you can edit the query to add any address in the `WHERE` clause to see the specified account's transactions: ```sql /* Snoop Dogg's transactions in the past 3 months aggregated */ @@ -263,14 +263,14 @@ trade_count|nft_count|collection_count|sale_count|buy_count|total_volume_eth |a -----------|---------|----------------|----------|---------|------------------|------------------|---------|---------| 59| 57| 20| 1| 58|1835.5040000000006|31.110237288135604| 0.0| 1300.0| -From the result of the query, we can see that Snoop Dogg made 59 trades overall in the past 3 months (bought 58 times, -and sold only once). His trades included 57 individual NFTs and 23 collections, totaling 1835.504 ETH spent, with -minimum paid price of 0 and max of 1300 ETH. +From the result of the query, we can see that Snoop Dogg made 59 trades overall in the past 3 months (bought 58 times, +and sold only once). His trades included 57 individual NFTs and 23 collections, totaling 1835.504 ETH spent, with +minimum paid price of 0 and max of 1300 ETH. ### Most expensive asset in a collection -Whats the most expensive NFT in a certain collection? This query looks at a -specific collection (CryptoKitties) and finds the most expensive NFT sold from it. 
-This can help you find the rarest items in a collection and look at the properties +Whats the most expensive NFT in a certain collection? This query looks at a +specific collection (CryptoKitties) and finds the most expensive NFT sold from it. +This can help you find the rarest items in a collection and look at the properties that make it rare in order to help you buy items with similar properties from that collection: ```sql /* Top 5 most expensive NFTs in the CryptoKitties collection */ @@ -291,8 +291,8 @@ grey | 149.0|2021-09-03 02:32:26|https://opensea.io/assets/0x0601 Founder Cat #38| 148.0|2021-09-03 01:58:13|https://opensea.io/assets/0x06012c8cf97bead5deae237070f9587f8e7a266d/38| ### Daily ETH volume of assets in a collection -What is the daily volume of Ether (ETH) for a specific collection? Using the -example of CryptoKitties, this query calculates the daily total ETH spent in +What is the daily volume of Ether (ETH) for a specific collection? Using the +example of CryptoKitties, this query calculates the daily total ETH spent in sales of NFTs in a certain collection: ```sql @@ -321,10 +321,10 @@ This graph uses a logarithmic scale, which you can configure in the graph's sett
### Comparison of daily ETH volume of multiple collections -How does the daily volume of ETH spent on assets in one collection compare to -others? This query uses CryptoKitties and Ape Gang as examples, to find the daily -ETH spent on buying assets in those collections in the past three months. You -can extend this query to monitor and compare the daily volume spent on your +How does the daily volume of ETH spent on assets in one collection compare to +others? This query uses CryptoKitties and Ape Gang as examples, to find the daily +ETH spent on buying assets in those collections in the past three months. You +can extend this query to monitor and compare the daily volume spent on your favorite NFT collections and find patterns in sales: ```sql @@ -349,13 +349,13 @@ bucket |slug |volume_eth | ![comparison-daily-eth-volume-collections](https://assets.timescale.com/docs/images/tutorials/nft-tutorial/comparison-daily-eth-volume-collections.jpg) -The graph above uses a logarithmic scale, which we configured in the graph's +The graph above uses a logarithmic scale, which we configured in the graph's settings in Superset. ### Daily mean and median sale price of assets in a collection -When you are analyzing the daily price of assets in a specific collection, two -useful statistics to use are the mean price and the median price. This query +When you are analyzing the daily price of assets in a specific collection, two +useful statistics to use are the mean price and the median price. This query finds the daily mean and median sale prices of assets in the CryptoKitties collection: ```sql /* Mean vs median sale price of CryptoKitties? */ @@ -377,10 +377,10 @@ bucket |slug |mean_price |median_price | ![daily mean median](https://assets.timescale.com/docs/images/tutorials/nft-tutorial/daily-mean-median.jpg) -Since calculating the mean and median are computationally expensive for large -datasets, we use the [`percentile_agg` hyperfunction][percentile-agg], a SQL -function that is part of the Timescale Toolkit extension. It accurately -approximates both statistics, as shown in the definition of `mean_price` and +Since calculating the mean and median are computationally expensive for large +datasets, we use the [`percentile_agg` hyperfunction][percentile-agg], a SQL +function that is part of the Timescale Toolkit extension. It accurately +approximates both statistics, as shown in the definition of `mean_price` and `median_price` in the continuous aggregate we created earlier in the tutorial: ```sql @@ -400,11 +400,11 @@ GROUP BY bucket, collection_id; ``` ### Daily total volume of top buyers -What days do the most prolific accounts buy on? To answer that question, you -can analyze the top five NFT buyer accounts based on the number of NFT purchases, -and their total daily volume of NFT bought over time. This is a good starting -point to dig deeper into the analysis, as it can help you find days when something -happened that made these users buy a lot of NFTs. For example a dip in ETH prices, +What days do the most prolific accounts buy on? To answer that question, you +can analyze the top five NFT buyer accounts based on the number of NFT purchases, +and their total daily volume of NFT bought over time. This is a good starting +point to dig deeper into the analysis, as it can help you find days when something +happened that made these users buy a lot of NFTs. 
For example a dip in ETH prices, leading to lower gas fees, or drops of high anticipated collections: ```sql /* Daily total volume of the 5 top buyers */ @@ -423,12 +423,12 @@ ORDER BY bucket DESC ![volume top buyers](https://assets.timescale.com/docs/images/tutorials/nft-tutorial/volume-top-buyers.jpg) ## Complex queries -Let's take a look at some more complex questions you can ask about the NFT -dataset, as well as more complex queries to +Let's take a look at some more complex questions you can ask about the NFT +dataset, as well as more complex queries to retrieve interesting things. ### Calculating 30-min mean and median sale prices of highest trade count NFT from yesterday -What are the mean and median sales prices of the highest traded NFT from the +What are the mean and median sales prices of the highest traded NFT from the past day, in 30-minute intervals? ```sql @@ -456,21 +456,21 @@ bucket |nft |mean_price |median_price | 2021-10-17 22:00:00|Zero [Genesis]| 0.0775| 0.09995839119153871| 2021-10-17 21:30:00|Zero [Genesis]| 0.0555| 0.05801803032917102| -This is a more complex query which uses PostgreSQL Common Table Expressions (CTE) -to first create a sub-table of the data from the past day, called `one_day`. -Then you use the hyperfunction time_bucket to create 30-minute buckets of our data -and use the [percentile_agg hyperfunction][percentile-agg] to find the mean and -median prices for each interval period. Finally, you JOIN on the `assets` table -to get the name of the specific NFT in order to return it along with the mean and +This is a more complex query which uses PostgreSQL Common Table Expressions (CTE) +to first create a sub-table of the data from the past day, called `one_day`. +Then you use the hyperfunction time_bucket to create 30-minute buckets of our data +and use the [percentile_agg hyperfunction][percentile-agg] to find the mean and +median prices for each interval period. Finally, you JOIN on the `assets` table +to get the name of the specific NFT in order to return it along with the mean and median price for each time interval. ### Daily OHLCV data per asset -Open-high-low-close-volume (OHLCV) charts are most often used to illustrate the -price of a financial instrument, most commonly stocks, over time. You can create +Open-high-low-close-volume (OHLCV) charts are most often used to illustrate the +price of a financial instrument, most commonly stocks, over time. You can create OHLCV charts for a single NFT, or get the OHLCV values for a set of NFTs. -This query finds the OHLCV for NFTs with more than 100 sales in a day, as well +This query finds the OHLCV for NFTs with more than 100 sales in a day, as well as the day on which the trades occurred: ```sql @@ -495,26 +495,26 @@ bucket |asset_id|open_price|close_price|low_price |high_price|volum 2021-02-26 01:00:00|18198072| 0.1| 0.1| 0.1| 0.1| 154| 2021-02-26 01:00:00|18198081| 0.25| 0.25| 0.25| 0.25| 155| -In this query, you used the TimescaleDB hyperfunctions [`first()`][first-docs] and -[`last()`][last-docs] to find the open and close prices respectively. These -hyperfunctions allow you to find the value of one column as ordered by another, -by performing a sequential scan through their groups. In this case, you get the -first and last values of the `total_price` column, as ordered by +In this query, you used the TimescaleDB hyperfunctions [`first()`][first-docs] and +[`last()`][last-docs] to find the open and close prices respectively. 
These +hyperfunctions allow you to find the value of one column as ordered by another, +by performing a sequential scan through their groups. In this case, you get the +first and last values of the `total_price` column, as ordered by the `time` column. [See the docs for more information.][first-docs] -If you want to run this query regularly, you can create a continuous aggregate -for it, which greatly improves the query performance. Moreover, you can remove -the `LIMIT 5` and replace it with an additional WHERE clause filtering for a +If you want to run this query regularly, you can create a continuous aggregate +for it, which greatly improves the query performance. Moreover, you can remove +the `LIMIT 5` and replace it with an additional WHERE clause filtering for a specific time-period to make the query more useful. ### Assets with the biggest intraday price change -Which assets had the biggest intraday sale price change? You can identify -interesting behaviour such as an asset being bought and then sold again for a -much higher (or lower) amount within the same day. This can help you -identify good flips of NFTs, or perhaps owners whose brand elevated the -NFT price thanks to it being part of their collection. +Which assets had the biggest intraday sale price change? You can identify +interesting behaviour such as an asset being bought and then sold again for a +much higher (or lower) amount within the same day. This can help you +identify good flips of NFTs, or perhaps owners whose brand elevated the +NFT price thanks to it being part of their collection. -This query finds the assets with the biggest intraday sale price change in the +This query finds the assets with the biggest intraday sale price change in the last six months: ```sql /* Daily assets sorted by biggest intraday price change in the last 6 month*/ @@ -528,12 +528,12 @@ WITH top_assets AS ( ORDER BY intraday_max_change DESC LIMIT 5 ) -SELECT bucket, nft, url, +SELECT bucket, nft, url, open_price, close_price, intraday_max_change FROM top_assets ta INNER JOIN LATERAL ( - SELECT name AS nft, url FROM assets a + SELECT name AS nft, url FROM assets a WHERE a.id = ta.asset_id ) assets ON TRUE;``` ``` @@ -547,23 +547,23 @@ bucket |nft |url 2021-09-26 02:00:00|Page |https://opensea.io/assets/0xa7206d878c5c3871826dfdb42191c49b1d11f466/1 | 1.48| 4.341| 43.05| ## Resources and next steps -This section contains information about what to do when you've completed the +This section contains information about what to do when you've completed the tutorial, and some links to more resources. ### Claim your limited edition Time Travel Tigers NFT -The first 20 people to complete this tutorial can earn a limited edition NFT -from the -[Time Travel Tigers collection][eon-collection], for free! +The first 20 people to complete this tutorial can earn a limited edition NFT +from the +[Time Travel Tigers collection][eon-collection], for free! -Now that you’ve completed the tutorial, all you need to do is answer the questions -in [this form][nft-form] (including the challenge question), and we’ll send one +Now that you’ve completed the tutorial, all you need to do is answer the questions +in [this form][nft-form] (including the challenge question), and we’ll send one of the limited-edition Eon NFTs to your ETH address (at no cost to you!). You can see all NFTs in the Time Travel Tigers collection live on [OpenSea][eon-collection]. ### Build on the NFT Starter Kit -Congratulations! You’re now up and running with NFT data and TimescaleDB. 
Check out -our [NFT Starter Kit][nft-starter-kit] to use as your starting point to +Congratulations! You’re now up and running with NFT data and TimescaleDB. Check out +our [NFT Starter Kit][nft-starter-kit] to use as your starting point to build your own, more complex NFT analysis projects. The Starter Kit contains: @@ -574,7 +574,7 @@ The Starter Kit contains: * Pre-built dashboards and charts in [Apache Superset][superset] and [Grafana][grafana] for visualizing your data analysis * Queries to use as a starting point for your own analysis - + ### Learn more about how to use TimescaleDB to store and analyze crypto data Check out these resources for more about using TimescaleDB with crypto data: * [Analyze cryptocurrency market data][analyze-cryptocurrency] @@ -600,4 +600,4 @@ Check out these resources for more about using TimescaleDB with crypto data: [messari]: https://blog.timescale.com/blog/how-messari-uses-data-to-open-the-cryptoeconomy-to-everyone/ [trading-bot]: https://blog.timescale.com/blog/how-i-power-a-successful-crypto-trading-bot-with-timescaledb/ [eon-collection]: https://opensea.io/collection/time-travel-tigers-by-timescale -[nft-form]: https://docs.google.com/forms/d/e/1FAIpQLSdZMzES-vK8K_pJl1n7HWWe5-v6D9A03QV6rys18woGTZr0Yw/viewform?usp=sf_link \ No newline at end of file +[nft-form]: https://docs.google.com/forms/d/e/1FAIpQLSdZMzES-vK8K_pJl1n7HWWe5-v6D9A03QV6rys18woGTZr0Yw/viewform?usp=sf_link diff --git a/timescaledb/tutorials/analyze-nft-data/nft-schema-ingestion.md b/timescaledb/tutorials/analyze-nft-data/nft-schema-ingestion.md index a42a1b2c6857..50fafca00926 100644 --- a/timescaledb/tutorials/analyze-nft-data/nft-schema-ingestion.md +++ b/timescaledb/tutorials/analyze-nft-data/nft-schema-ingestion.md @@ -1,9 +1,9 @@ # NFT schema design and ingestion -A properly designed database schema is essential to efficiently store and -analyze data. This tutorial uses NFT time-series data with multiple supporting -relational tables. +A properly designed database schema is essential to efficiently store and +analyze data. This tutorial uses NFT time-series data with multiple supporting +relational tables. -To help you get familiar with NFT data, here are some of the questions that +To help you get familiar with NFT data, here are some of the questions that could be answered with this dataset: * Which collections have the highest trading volume? * What’s the number of daily transactions of a given collection or asset? @@ -11,12 +11,12 @@ could be answered with this dataset: * Which account made the most NFT trades? * How are the mean and median sale prices correlated? -One theme across all these questions is that most of the insights are about the -sale itself, or the aggregation of sales. So you need to create a schema which -focuses on the time-series aspect of the data. It's also important to make sure -that you can JOIN supporting tables, so you can more easily make queries that -touch both the time-series and the relational tables. TimescaleDB's PostgreSQL -foundation and full-SQL support allows you to easily combine time-series and +One theme across all these questions is that most of the insights are about the +sale itself, or the aggregation of sales. So you need to create a schema which +focuses on the time-series aspect of the data. It's also important to make sure +that you can JOIN supporting tables, so you can more easily make queries that +touch both the time-series and the relational tables. 
TimescaleDB's PostgreSQL +foundation and full-SQL support allows you to easily combine time-series and relational tables during your analysis. ## Tables and field descriptions @@ -31,18 +31,18 @@ Relational tables (regular PostgreSQL tables): * **accounts**: NFT trading accounts/users ### The nft_sales table -The `nft_sales` table contains information about successful sale transactions -in time-series form. One row represents one successful sale event on the +The `nft_sales` table contains information about successful sale transactions +in time-series form. One row represents one successful sale event on the OpenSea platform. * `id` field is a unique field provided by the OpenSea API. -* `total_price` field is the price paid for the NFTs in ETH (or other -cryptocurrency payment symbol available on OpenSea). -* `quantity` field indicates how many NFTs were sold in the transaction -(can be more than 1). -* `auction_type` field is NULL by default, unless the transaction happened +* `total_price` field is the price paid for the NFTs in ETH (or other +cryptocurrency payment symbol available on OpenSea). +* `quantity` field indicates how many NFTs were sold in the transaction +(can be more than 1). +* `auction_type` field is NULL by default, unless the transaction happened as part of an auction. -* `asset_id` and `collection_id` fields can be used to JOIN the supporting +* `asset_id` and `collection_id` fields can be used to JOIN the supporting relational tables. | Data field | Description | @@ -62,7 +62,7 @@ relational tables. | winner_account | Buyer's account, FK: accounts(id) | ### The assets table -The `assets` table contains information about the assets (NFTs) that are in the +The `assets` table contains information about the assets (NFTs) that are in the transactions. One row represents a unique NFT asset on the OpenSea platform. * `name` field is the name of the NFT, and is not unique. @@ -80,11 +80,11 @@ transactions. One row represents a unique NFT asset on the OpenSea platform. | details | Other extra data fields (JSONB) | ### The collections table -The `collections` table holds information about the NFT collections. One row -represents a unique NFT collection. +The `collections` table holds information about the NFT collections. One row +represents a unique NFT collection. One collection includes multiple unique NFTs (that are in the `assets` table). -* `slug` field is a unique identifier of the collection. +* `slug` field is a unique identifier of the collection. | Data field | Description | |---|---| @@ -96,8 +96,8 @@ One collection includes multiple unique NFTs (that are in the `assets` table). ### The accounts table -The `accounts` table includes the accounts that have participated in at least -one transaction from the nft_sales table. +The `accounts` table includes the accounts that have participated in at least +one transaction from the nft_sales table. One row represents one unique account on the OpenSea platform. * `address` is never NULL and it’s unique @@ -111,10 +111,10 @@ One row represents one unique account on the OpenSea platform. | details | Other extra data fields (JSONB) | ## Database schema -The data types used in the schema for this tutorial have been determined based -on our research and hands-on experience working with the OpenSea API and the -data pulled from OpenSea. Start by running these SQL commands to create the schema. 
-Alternatively, you can download and run the `schema.sql` +The data types used in the schema for this tutorial have been determined based +on our research and hands-on experience working with the OpenSea API and the +data pulled from OpenSea. Start by running these SQL commands to create the schema. +Alternatively, you can download and run the `schema.sql` file from our [NFT Starter Kit GitHub repository][nft-schema]. ```sql CREATE TABLE collections ( @@ -170,46 +170,46 @@ CREATE INDEX idx_payment_symbol ON nft_sales (payment_symbol); ``` ### Schema design -The `id` field in each table is `BIGINT` because its storage size is 8 bytes in -PostgreSQL (as opposed to `INT`’s 4 bytes) which is needed to make sure this +The `id` field in each table is `BIGINT` because its storage size is 8 bytes in +PostgreSQL (as opposed to `INT`’s 4 bytes) which is needed to make sure this value doesn’t overflow. -For the `quantity` field we suggest using numeric or decimal (which works the -same way in PostgreSQL) as the data type, because in some edge cases we +For the `quantity` field we suggest using numeric or decimal (which works the +same way in PostgreSQL) as the data type, because in some edge cases we experience transactions where the quantity was too big even for BIGINT. -`total_price` needs to be `double precision` because NFT prices often include -many decimals, especially in the case of Ether (ETH) and similar cryptocurrencies +`total_price` needs to be `double precision` because NFT prices often include +many decimals, especially in the case of Ether (ETH) and similar cryptocurrencies which are, functionally, infinitely divisible. -We created an `ENUM` for `auction_type` as this value can only be 'dutch', -'english', or 'min_price', representing the different types of auctions used +We created an `ENUM` for `auction_type` as this value can only be 'dutch', +'english', or 'min_price', representing the different types of auctions used to sell an NFT. -We decided to not store all the data fields that are available from the -OpenSea API, only those that we deem interesting or useful for future analysis. -But we still wanted to keep all of the unused data fields somewhere close, -so we added a `details` JSONB column to each relational table. This column -contains additional information about the record. For example, it includes a +We decided to not store all the data fields that are available from the +OpenSea API, only those that we deem interesting or useful for future analysis. +But we still wanted to keep all of the unused data fields somewhere close, +so we added a `details` JSONB column to each relational table. This column +contains additional information about the record. For example, it includes a `background_color` as a field for the assets. -Note: In our sample dataset, we chose not to include the JSONB data to keep the -size of the dataset easily managable. If you want a dataset with the full JSON -data included, you need to fetch the data directly from the OpenSea API -(see below for steps). +Note: In our sample dataset, we chose not to include the JSONB data to keep the +size of the dataset easily managable. If you want a dataset with the full JSON +data included, you need to fetch the data directly from the OpenSea API +(see below for steps). ## Ingest NFT data -When you have your database and schema created, you can ingest some data to play -with! 
You have two options to ingest NFT data for this tutorial: +When you have your database and schema created, you can ingest some data to play +with! You have two options to ingest NFT data for this tutorial: * Fetch data directly from the OpenSea API * Download sample data and import it ### Fetch data directly from the OpenSea API -To ingest data from the OpenSea API, you can use the `opensea_ingest.py` script included -in the starter kit repository on GitHub. The script connects to the OpenSea -API `/events` endpoint, and fetches data from the specified time period (no API +To ingest data from the OpenSea API, you can use the `opensea_ingest.py` script included +in the starter kit repository on GitHub. The script connects to the OpenSea +API `/events` endpoint, and fetches data from the specified time period (no API key required!). @@ -240,7 +240,7 @@ key required!). ```python python opensea_ingest.py ``` - This will start ingesting data in batches, 300 rows at a time: + This starts ingesting data in batches, 300 rows at a time: ```bash Start ingesting data between 2021-10-01 00:00:00+00:00 and 2021-10-06 23:59:59+00:00 --- @@ -250,13 +250,13 @@ key required!). Data has been backfilled until this time: 2021-10-06 23:51:31.140126+00:00 --- ``` - You can stop the ingesting process anytime (Ctrl+C), otherwise the script - will run until all the transactions have been ingested from the given time period. + You can stop the ingesting process anytime (Ctrl+C), otherwise the script + runs until all the transactions have been ingested from the given time period. ### Download sample NFT data -You can download and insert sample CSV files that contain NFT sales data from +You can download and insert sample CSV files that contain NFT sales data from 1 October 2021 to 7 October 2021. @@ -272,7 +272,7 @@ You can download and insert sample CSV files that contain NFT sales data from ```bash psql -x "postgres://host:port/tsdb?sslmode=require" ``` - If you're using Timescale Cloud, the instructions under `How to Connect` provide a + If you're using Timescale Cloud, the instructions under `How to Connect` provide a customized command to run to connect directly to your database. 1. 
Import the CSV files in this order (it can take a few minutes in total): ```bash @@ -286,10 +286,10 @@ You can download and insert sample CSV files that contain NFT sales data from After ingesting NFT data, you can try running some queries on your database: ```sql -SELECT count(*), MIN(time) AS min_date, MAX(time) AS max_date FROM nft_sales +SELECT count(*), MIN(time) AS min_date, MAX(time) AS max_date FROM nft_sales ``` [nft-starter-kit]: https://github.com/timescale/nft-starter-kit [ingest-script]: https://github.com/timescale/nft-starter-kit/blob/master/opensea_ingest.py [sample-data]: https://assets.timescale.com/docs/downloads/nft_sample.zip -[nft-schema]: https://github.com/timescale/nft-starter-kit/blob/master/schema.sql \ No newline at end of file +[nft-schema]: https://github.com/timescale/nft-starter-kit/blob/master/schema.sql diff --git a/timescaledb/tutorials/aws-lambda/3rd-party-api-ingest.md b/timescaledb/tutorials/aws-lambda/3rd-party-api-ingest.md index ea9a44c9b170..109544227544 100644 --- a/timescaledb/tutorials/aws-lambda/3rd-party-api-ingest.md +++ b/timescaledb/tutorials/aws-lambda/3rd-party-api-ingest.md @@ -60,7 +60,7 @@ def fetch_stock_data(symbol, month): Args: symbol (string): ticker symbol - month (int): month value as an integer 1-24 (for example month=4 will fetch data from the last 4 months) + month (int): month value as an integer 1-24 (for example month=4 fetches data from the last 4 months) Returns: list of tuples: intraday (candlestick) stock data @@ -216,7 +216,7 @@ an EventBridge trigger. This creates a rule using a [`cron` expression][cron-exa If you get an error saying `Parameter ScheduleExpression is not valid`, you -might have made a mistake in the cron expression. Check the [cron expression examples](https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-create-rule-schedule.html#eb-cron-expressions) +might have made a mistake in the cron expression. Check the [cron expression examples](https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-create-rule-schedule.html#eb-cron-expressions) documentation. diff --git a/timescaledb/tutorials/aws-lambda/continuous-deployment.md b/timescaledb/tutorials/aws-lambda/continuous-deployment.md index 641154ef9a4b..a1f423409585 100644 --- a/timescaledb/tutorials/aws-lambda/continuous-deployment.md +++ b/timescaledb/tutorials/aws-lambda/continuous-deployment.md @@ -99,7 +99,7 @@ Let's connect the Github repository AWS Lambda using Github actions. ### Adding your AWS credentials to the repository -You need to add your AWS credentials to the repository so it will have permission to connect to Lambda. +You need to add your AWS credentials to the repository so it has permission to connect to Lambda. You can do this by adding [GitHub secrets](https://docs.github.com/en/actions/reference/encrypted-secrets) using the GitHub CLI. @@ -178,7 +178,7 @@ to auto-deploy to AWS Lambda. function_name: lambda-cd source: function.py ``` - This configuration will make sure to deploy the code to Lambda when there's a new push to the main branch. + This configuration deploys the code to Lambda when there's a new push to the main branch. As you can also see in the YAML file, the AWS credentials are accessed using the `${{ secrets.AWS_ACCESS_KEY_ID }}` syntax. 
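Since the credentials are added as GitHub secrets using the GitHub CLI, a sketch of that step looks like the following. The `AWS_SECRET_ACCESS_KEY` name is an assumption based on the standard AWS credential pair; only `AWS_ACCESS_KEY_ID` appears in the workflow snippet above:

```bash
# Add the AWS credentials as repository secrets (values are placeholders);
# run these commands from inside the repository directory.
gh secret set AWS_ACCESS_KEY_ID --body "<your-access-key-id>"
gh secret set AWS_SECRET_ACCESS_KEY --body "<your-secret-access-key>"

# Confirm that the secrets were registered
gh secret list
```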
diff --git a/timescaledb/tutorials/grafana/create-dashboard-and-panel.md b/timescaledb/tutorials/grafana/create-dashboard-and-panel.md index 2d48bfd6bd8c..95f4d1a66ccb 100644 --- a/timescaledb/tutorials/grafana/create-dashboard-and-panel.md +++ b/timescaledb/tutorials/grafana/create-dashboard-and-panel.md @@ -9,8 +9,8 @@ data. ### Prerequisites -To complete this tutorial, you will need a cursory knowledge of the Structured Query -Language (SQL). The tutorial will walk you through each SQL command, but it will be +To complete this tutorial, you need a cursory knowledge of the Structured Query +Language (SQL). The tutorial walks you through each SQL command, but it is helpful if you've seen SQL before. * To start, [install TimescaleDB][install-timescale]. @@ -23,31 +23,31 @@ on how to use TimescaleDB. ### Build a new dashboard -We will start by creating a new dashboard. In the far left of the Grafana user +Start by creating a new dashboard. In the far left of the Grafana user interface, you'll see a '+' icon. If you hover over it, you'll see a 'Create' menu, within which is a 'Dashboard' option. Select that 'Dashboard' option. After creating a new dashboard, you'll see a 'New Panel' screen, with options for 'Add Query' and 'Choose Visualization'. In the future, if you already have a dashboard with panels, you can click on the '+' icon at the **top** of the Grafana user -interface, which will enable you to add a panel to an existing dashboard. +interface, which enables you to add a panel to an existing dashboard. To proceed with our tutorial, let's add a new visualization by clicking on the 'Choose Visualization' option. -At this point, you'll have several options for different Grafana visualizations. We will -choose the first option, the 'Graph' visualization. +At this point, you'll have several options for different Grafana visualizations. +Choose the first option, the 'Graph' visualization. Grafana visualizations to choose from -There are multiple ways to configure our panel, but we will accept all the defaults +There are multiple ways to configure our panel, but you can accept all the defaults and create a simple 'Lines' graph. In the far left section of the Grafana user interface, select the 'Queries' tab. How to create a new Grafana query -Instead of using the Grafana query builder, we will edit our query directly. In the +Instead of using the Grafana query builder, edit the query directly. In the view, click on the 'Edit SQL' button at the bottom. Edit custom SQL queries in Grafana @@ -72,15 +72,15 @@ GROUP BY day ORDER BY day; ``` -We will need to alter this query to support Grafana's unique query syntax. +We need to alter this query to support Grafana's unique query syntax. #### Modifying the SELECT statement -First, we will modify the `date_trunc` function to use the TimescaleDB `time_bucket` +First, modify the `date_trunc` function to use the TimescaleDB `time_bucket` function. You can consult the TimescaleDB [API Reference on time_bucket][time-bucket-reference] for more information on how to use it properly. -Let's examine the `SELECT` portion of this query. First, we will bucket our results into +Let's examine the `SELECT` portion of this query. First, bucket the results into one day groupings using the `time_bucket` function. 
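As a rough sketch of that substitution (assuming the NYC taxi tutorial's `rides` hypertable and its `pickup_datetime` column; the tutorial's own modified query in the following steps is the authoritative version), the bucketed `SELECT` looks something like this:

```sql
-- Hypothetical sketch: swap date_trunc('day', ...) for time_bucket.
-- The alias here is arbitrary; the note that follows explains why Grafana
-- ultimately needs the time column to be named "time".
SELECT time_bucket(INTERVAL '1 day', pickup_datetime) AS day,
       COUNT(*) AS num_rides
FROM rides
GROUP BY day
ORDER BY day;
```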
If you set the 'Format' of a Grafana panel to be 'Time series', for use in Graph panel for example, then the query must return a column named `time` that returns either a SQL `datetime` or any numeric datatype
@@ -101,11 +101,11 @@ FROM rides
 #### The Grafana \_\_timeFilter function
 
 Grafana time-series panels include a tool that enables the end-user to filter on a given
-time range. A "time filter", if you will. Not surprisingly, Grafana has a way to link the
+time range, like a "time filter". Not surprisingly, Grafana has a way to link the
 user interface construct in a Grafana panel with the query itself. In this case, the
 `$__timeFilter()` function.
 
-In the modified query below, we will use the `$__timefilter()` function
+In the modified query below, use the `$__timeFilter()` function
 to set the `pickup_datetime` column as the filtering range for our visualizations.
 
 ```sql
@@ -122,7 +122,7 @@ WHERE $__timeFilter(pickup_datetime)
 
 Finally, we want to group our visualization by the time buckets we've selected,
 and we want to order the results by the time buckets as well. So, our `GROUP BY`
-and `ORDER BY` statements will reference `time`.
+and `ORDER BY` statements reference `time`.
 
 With these changes, this is our final Grafana query:
 
@@ -144,7 +144,7 @@ When we visualize this query in Grafana, we see the following:
 
 Remember to set the time filter in the upper right corner of your Grafana dashboard.
- If you're using the pre-built sample dataset for this example, you will want to set
+ If you're using the pre-built sample dataset for this example, you can set
 your time filter around January 1st, 2016.
 
@@ -163,7 +163,7 @@ GROUP BY time
 ORDER BY time
 ```
 
-When we visualize this query, it will look like this:
+When you visualize this query, it looks like this:
 
 Visualizing time-series data in Grafana
 
diff --git a/timescaledb/tutorials/grafana/geospatial-dashboards.md b/timescaledb/tutorials/grafana/geospatial-dashboards.md
index 0b037d67fd27..0b6b4d960d59 100644
--- a/timescaledb/tutorials/grafana/geospatial-dashboards.md
+++ b/timescaledb/tutorials/grafana/geospatial-dashboards.md
@@ -1,13 +1,13 @@
 # Use Grafana to visualize geospatial data stored in TimescaleDB
 
-Grafana includes a WorldMap visualization that will help you see geospatial data overlaid
+Grafana includes a WorldMap visualization that helps you see geospatial data overlaid
 atop a map of the world. This can be helpful to understand how data changes based on its
 location.
 
 ### Prerequisites
 
-To complete this tutorial, you will need a cursory knowledge of the Structured Query
-Language (SQL). The tutorial will walk you through each SQL command, but it will be
+To complete this tutorial, you need a cursory knowledge of the Structured Query
+Language (SQL). The tutorial walks you through each SQL command, but it is
 helpful if you've seen SQL before.
 
 * To start, [install TimescaleDB][install-timescale].
@@ -30,10 +30,10 @@ The NYC Taxi Cab data also contains the location of each ride pickup. In the
 near Times Square. Let's build on that query and **visualize rides whose distance
 traveled was greater than five miles in Manhattan**.
 
-We can do this in Grafana using the 'Worldmap Panel'. We will start by creating a
+We can do this in Grafana using the 'Worldmap Panel'. Start by creating a
 new panel, selecting 'New Visualization', and selecting the 'Worldmap Panel'.
 
-Once again, we will edit our query directly. In the Query screen, be sure
+Once again, you can edit the query directly.
In the Query screen, be sure to select your NYC Taxicab Data as the data source. In the 'Format as' dropdown, select 'Table'. Click on 'Edit SQL' and enter the following query in the text window: @@ -56,8 +56,8 @@ LIMIT 500; ``` Let's dissect this query. First, we're looking to plot rides with visual markers that -denote the trip distance. Trips with longer distances will get different visual treatments -on our map. We will use the `trip_distance` as the value for our plot. We will store +denote the trip distance. Trips with longer distances get different visual treatments +on our map. Use the `trip_distance` as the value for our plot, and store this result in the `value` field. In the second and third lines of the `SELECT` statement, we are using the `pickup_longitude` @@ -82,16 +82,15 @@ left of the Grafana user interface. You'll see options for 'Map Visual Options', and more. First, make sure the 'Map Data Options' are set to 'table' and 'current'. Then in -the 'Field Mappings' section. We will set the 'Table Query Format' to be ‘Table'. +the 'Field Mappings' section. Set the 'Table Query Format' to be ‘Table'. We can map the 'Latitude Field' to our `latitude` variable, the 'Longitude Field' to our `longitude` variable, and the 'Metric' field to our `value` variable. In the 'Map Visual Options', set the 'Min Circle Size' to 1 and the 'Max Circle Size' to 5. -In the 'Threshold Options' set the 'Thresholds' to '2,5,10'. This will auto configure a set -of colors. Any plot whose `value` is below 2 will be a color, any `value` between 2 and 5 will -be another color, any `value` between 5 and 10 will be a third color, and any `value` over 10 -will be a fourth color. +In the 'Threshold Options' set the 'Thresholds' to '2,5,10'. This auto configures a set +of colors. Any plot whose `value` is below 2 is a color, any `value` between 2 and 5 is another color, any `value` between 5 and 10 is a third color, and any `value` over 10 +is a fourth color. Your configuration should look like this: diff --git a/timescaledb/tutorials/grafana/grafana-variables.md b/timescaledb/tutorials/grafana/grafana-variables.md index 8e08ee628501..06c59009ccd4 100644 --- a/timescaledb/tutorials/grafana/grafana-variables.md +++ b/timescaledb/tutorials/grafana/grafana-variables.md @@ -4,8 +4,8 @@ Grafana variables enable end-users of your dashboards to filter and customize vi ### Prerequisites -To complete this tutorial, you will need a cursory knowledge of the Structured Query -Language (SQL). The tutorial will walk you through each SQL command, but it will be +To complete this tutorial, you need a cursory knowledge of the Structured Query +Language (SQL). The tutorial walks you through each SQL command, but it is helpful if you've seen SQL before. * To start, [install TimescaleDB][install-timescale]. @@ -17,7 +17,7 @@ to that database. Be sure to follow the full tutorial if you're interested in ba on how to use TimescaleDB. ### Creating a variable -Our goal here will be to create a variable which controls the type of ride displayed in the +Our goal here is to create a variable which controls the type of ride displayed in the visual, based on the payment type used for the ride. There are several types of payments, which we can see in the `payment_types` table: @@ -42,19 +42,19 @@ in our queries. To create a new variable, go to your Grafana dashboard settings, navigate to the 'Variable' option in the side-menu, and then click the 'Add variable' button. 
-In this case, we use the 'Query' type, where our variable will be defined as the results +In this case, we use the 'Query' type, where your variable is defined as the results of SQL query. Under the 'General' section, we'll name our variable `payment_type` and give it a type of `Query`. -Then, we'll assign it the label of "Payment Type", which is how it will appear in a drop-down menu. +Then, we'll assign it the label of "Payment Type", which is how it appears in a drop-down menu. -We will select our data source and supply the query: +Select your data source and supply the query: ```sql SELECT payment_type FROM payment_types; ``` -Turn on 'Multi-value' and 'Include All option'. This will enable users of your dashboard to +Turn on 'Multi-value' and 'Include All option'. This enables users of your dashboard to select more than one payment type. Our configuration should look like this: Using a variable to filter the results in a Grafana visualization @@ -69,7 +69,7 @@ notice is that now that we've defined a variable for this dashboard, there's now for that variable in the upper left hand corner of the panel. We can use this variable to filter the results of our query using the `WHERE` clause in SQL. -We will check and see if `rides.payment_type` is in the array of the variable, which we've +Check and see if `rides.payment_type` is in the array of the variable, which we've named `$payment_type`. Let's modify our earlier query like so: @@ -106,7 +106,7 @@ automatically create a graph panel for **each** of the payment types selected so that we can see those queries side-by-side. Let's first create a new graph panel that uses the `$payment_type` variable. -This will be our query: +This is your query: ```sql SELECT @@ -132,7 +132,7 @@ change the 'Title' to the following: In the 'Repeating' section, select the variable you want to generate dynamic panels based on. In this case, `payment_type`. You can have your dynamic panels -generate vertically or horizontally. In our case we will opt for repeating +generate vertically or horizontally. In this case, opt for repeating panels, 2 per row, horizontally: Create a dynamic panel in Grafana @@ -181,7 +181,7 @@ automatically create a graph panel for **each** of the payment types selected so that we can see those queries side-by-side. Let's first create a new graph panel that uses the `$payment_type` variable. -This will be our query: +This is your query: ```sql SELECT @@ -207,7 +207,7 @@ change the 'Title' to the following: In the 'Repeating' section, select the variable you want to generate dynamic panels based on. In this case, `payment_type`. You can have your dynamic panels -generate vertically or horizontally. In our case we will opt for repeating +generate vertically or horizontally. In this case, opt for repeating panels, 2 per row, horizontally: Create a dynamic panel in Grafana diff --git a/timescaledb/tutorials/grafana/setup-alerts.md b/timescaledb/tutorials/grafana/setup-alerts.md index 36ca122803ce..b3ff33c178ef 100644 --- a/timescaledb/tutorials/grafana/setup-alerts.md +++ b/timescaledb/tutorials/grafana/setup-alerts.md @@ -14,8 +14,8 @@ use. ### Prerequisites [](prereqs) -To complete this tutorial, you will need a cursory knowledge of the Structured Query -Language (SQL). The tutorial will walk you through each SQL command, but it will be +To complete this tutorial, you need a cursory knowledge of the Structured Query +Language (SQL). The tutorial walks you through each SQL command, but it is helpful if you've seen SQL before. 
* To start, [install TimescaleDB][install-timescale]. @@ -26,7 +26,7 @@ Once your installation of TimescaleDB and Grafana are complete, follow the to that database. Be sure to follow the full tutorial if you're interested in background on how to use TimescaleDB. -For this tutorial, you will need to first create various Grafana visualizations before +For this tutorial, you need to first create various Grafana visualizations before setting up alerts. Use our [full set of Grafana tutorials][tutorial-grafana] to obtain the necessary background on Grafana. In this tutorial, we'll simply inform you of which Grafana visualization to create and the query to use. @@ -53,7 +53,7 @@ There are some downsides to using Grafana for alerts: - You can only set up alerts on graph visualizations with time-series output - Thus, you can't use table output or anything else that is not time-series data -Ultimately, for most cases, this will be okay because: +Ultimately, for most cases, this is okay because: - You're mainly dealing with time-series data for alerts - You can usually turn any other visualization (e.g., a Gauge or a Single Stat) into a time-series graph @@ -71,7 +71,7 @@ There are two parts of alerting in Grafana: **Alert Rules** and Alert Rules are the most important part of Grafana alerts. Rules are conditions that you define for when an alert gets triggered. Grafana -evaluates rules according to a scheduler and you will need to specify +evaluates rules according to a scheduler and you need to specify how often rules are evaluated. In plain language, examples of rules could be: @@ -83,7 +83,7 @@ In plain language, examples of rules could be: #### Notification channels Notification channels are where alerts get sent once alert rules are triggered. -If you have no notification channels, then your alerts will only show up on Grafana +If you have no notification channels, then your alerts only show up on Grafana. Examples of channels include tools your team may already use: @@ -108,7 +108,7 @@ NO DATA. ### Alert 1: Integrating TimescaleDB, Grafana, and Slack [](alert1) Our goal in this first alert is to proactively notify us in Slack when we -have sustained high memory usage over time. We will connect Grafana to +have sustained high memory usage over time. You can connect Grafana to Slack using webhooks. #### Step 0: Set up your Grafana visualization @@ -140,14 +140,14 @@ We'll define our alert so that we are notified when average memory consumption is greater than 90% for 5 consecutive minutes. Set the frequency for the rule to be evaluated at one minute. This means that the -graph will be polled every minute to determine whether or not an alert should +graph is polled every minute to determine whether or not an alert should be sent. Then set the evaluation period for five minutes. This configures Grafana to look at the alert in five minute windows. You won't be able to change the 'When' portion of the query, but you can -set the 'Is Above' threshold to 90. In other words, we will be alerted whenever +set the 'Is Above' threshold to 90. In other words, you receive an alert whenever the memory used is above 90%. Use the defaults for the remainder of the configuration. 
Your configuration should
@@ -157,27 +157,27 @@ look like this:
 
 #### Step 2: Configure Slack for Grafana alerts
 
-In most cases, you will want to build a tiered alert system where less critical
+In most cases, you want to build a tiered alert system where less critical
 alerts go to less intrusive channels (such as Slack), while more critical
 alerts go to high attention channels (such as calling or texting someone).
 
-Let's start by configuring Slack. To setup Slack, you will need your Slack
+Let's start by configuring Slack. To set up Slack, you need your Slack
 Administrator to give you the webhook URL to post to a channel. You can
 [follow these instructions][slack-webhook-instructions] to obtain this
 information.
 
 To configure a notification channel, go to the 'Bell' icon in your main
-dashboard. It will be on the far left of the screen. Click on the 'Notification
+dashboard. It is on the far left of the screen. Click on the 'Notification
 Channels' option. In the Notification Channels screen, click 'Add channel'.
 
-In the resulting form, set up the name of your Slack Channel. This will
-show up in drop-downs throughout your Grafana instance, so choose
-something descriptive that other users of your Grafana instance will
+In the resulting form, set up the name of your Slack Channel. This
+shows up in drop-downs throughout your Grafana instance, so choose
+something descriptive that other users of your Grafana instance can
 immediately identify with.
 
 Choose 'Slack' as the type and toggle 'Include image' and 'Send
 reminders' on. Enter the Webhook URL supplied by your Slack Admin and choose a
-Username that will be descriptive to users of your Slack instance. If you
+Username that is descriptive to users of your Slack instance. If you
 want to @-mention someone or a group with your alert post in Slack, you
 can do so in the 'Mention' field.
 
@@ -205,12 +205,12 @@ this within five minutes or so:
 
 PagerDuty is a popular choice for managing support and incident responses for
 medium-large teams. Many of the steps in this section are similar to the steps
-in the Slack section. With PagerDuty, we will need to set up alerts using direct
+in the Slack section. With PagerDuty, you need to set up alerts using direct
 integration with the PagerDuty API.
 
 #### Step 0: Set up your Grafana visualization
 
-In this section, we will monitor our database in case we run out of disk space
+In this section, you monitor the database in case you run out of disk space
 unexpectedly. This is the kind of alert where you'd want to notify someone
 immediately.
 
diff --git a/timescaledb/tutorials/grafana/visualize-missing-data.md b/timescaledb/tutorials/grafana/visualize-missing-data.md
index 8f77f92f509e..40abcb100bbd 100644
--- a/timescaledb/tutorials/grafana/visualize-missing-data.md
+++ b/timescaledb/tutorials/grafana/visualize-missing-data.md
@@ -15,11 +15,11 @@ handling missing time-series data (using the TimescaleDB/PostgreSQL data source
 natively available in Grafana).
 
 ### Prerequisites [](prereqs)
-To complete this tutorial, you will need a cursory knowledge of the Structured Query
-Language (SQL). The tutorial will walk you through each SQL command, but it will be
+To complete this tutorial, you need a cursory knowledge of the Structured Query
+Language (SQL). The tutorial walks you through each SQL command, but it is
 helpful if you've seen SQL before.
 
-You will also need:
+You also need:
 
 * Time-series dataset with missing data (Note: in case you don't have one handy, we
   include an optional step for creating one below.)
diff --git a/timescaledb/tutorials/index.md b/timescaledb/tutorials/index.md index 69b51d2b16cb..b6e6f781ef04 100644 --- a/timescaledb/tutorials/index.md +++ b/timescaledb/tutorials/index.md @@ -1,5 +1,5 @@ # Tutorials -We've created a host of code-focused tutorials that will help you get +We've created a host of code-focused tutorials that help you get started with *TimescaleDB*. Most of these tutorials require a working [installation of TimescaleDB][install-timescale]. diff --git a/timescaledb/tutorials/monitor-django-with-prometheus.md b/timescaledb/tutorials/monitor-django-with-prometheus.md index 73a286ae15f9..756166b09e85 100644 --- a/timescaledb/tutorials/monitor-django-with-prometheus.md +++ b/timescaledb/tutorials/monitor-django-with-prometheus.md @@ -16,7 +16,7 @@ A machine with the following installed: Since machines commonly have multiple versions of Python -installed, in this tutorial we will call pip using the `python -m pip [foo]` +installed, in this tutorial we call `pip` using the `python -m pip [foo]` syntax instead of the `pip [foo]` syntax. This is to ensure that pip installs new components for the version of Python that we are using. @@ -37,7 +37,7 @@ and run: django-admin startproject mysite ``` -This will create a mysite directory in your current directory, that looks +This creates a `mysite` directory in your current directory, that looks something like this ([more here][django-first-app]): ``` @@ -81,7 +81,7 @@ You should see a "Congratulations!" page, with a rocket taking off. ## Step 2 - Export prometheus-style monitoring metrics from your Django application -We will use the [django-prometheus][get-django-prometheus] package for +We use the [django-prometheus][get-django-prometheus] package for exporting prometheus-style monitoring metrics from our Django application. ### Install django-prometheus diff --git a/timescaledb/tutorials/monitor-mst-with-prometheus.md b/timescaledb/tutorials/monitor-mst-with-prometheus.md index 458a417888fa..69321ad4d0e3 100644 --- a/timescaledb/tutorials/monitor-mst-with-prometheus.md +++ b/timescaledb/tutorials/monitor-mst-with-prometheus.md @@ -2,16 +2,16 @@ You can get more insights into the performance of your managed TimescaleDB database by monitoring it using [Prometheus][get-prometheus], a popular -open-source metrics-based systems monitoring solution. This tutorial will -take you through setting up a Prometheus endpoint for a database running +open-source metrics-based systems monitoring solution. This tutorial +takes you through setting up a Prometheus endpoint for a database running in a [managed service for TimescaleDB][timescale-mst]. To create a monitoring system to ingest and analyze Prometheus metrics from your managed TimescaleDB instance, you can use [Promscale][promscale]! -This will expose metrics from the [node_exporter][node-exporter-metrics] as well +This exposes metrics from the [node_exporter][node-exporter-metrics] as well as [pg_stats][pg-stats-metrics] metrics. ### Prerequisites -In order to proceed with this tutorial, you will need a managed service for TimescaleDB database. +In order to proceed with this tutorial, you need a managed service for TimescaleDB database. To create one, see these instructions for how to [get started with managed service for TimescaleDB][timescale-mst-get-started] @@ -22,15 +22,15 @@ integrations, pictured below. Service Integrations Menu Option -This will present you with the option to add a Prometheus integration point. 
+This presents you with the option to add a Prometheus integration point. Select the plus icon to add a new endpoint and give it a name of your choice. We've named ours `endpoint_dev`. Create a Prometheus endpoint on Timescale Cloud Furthermore, notice that you are given basic authentication information and a port number -in order to access the service. This will be used when setting up your Prometheus -installation, in the `prometheus.yml` configuration file. This will enable you to make +in order to access the service. This is used when setting up your Prometheus +installation, in the `prometheus.yml` configuration file. This enables you to make this managed TimescaleDB endpoint a target for Prometheus to scrape. Here's a sample configuration file you can use when you setup your Prometheus diff --git a/timescaledb/tutorials/nfl-analytics/advanced-analysis.md b/timescaledb/tutorials/nfl-analytics/advanced-analysis.md index 65a3f14d4340..e73c1c5bea21 100644 --- a/timescaledb/tutorials/nfl-analytics/advanced-analysis.md +++ b/timescaledb/tutorials/nfl-analytics/advanced-analysis.md @@ -43,7 +43,7 @@ are null for the first row. This row represents the average yard data for the football. ### Average and median yards run per game by type of player -For this query, you will use another one of the TimescaleDB percentile functions +For this query, you use another one of the TimescaleDB percentile functions called `percentile_agg`. You can use the `percentile_agg` function to find the 50th percentile, which is the approximate median. diff --git a/timescaledb/tutorials/nfl-analytics/index.md b/timescaledb/tutorials/nfl-analytics/index.md index cb32f6563fbf..ffd1a4549e93 100644 --- a/timescaledb/tutorials/nfl-analytics/index.md +++ b/timescaledb/tutorials/nfl-analytics/index.md @@ -1,6 +1,6 @@ # Analyze data using TimescaleDB continuous aggregates and hyperfunctions -This tutorial is a step-by-step guide on how to use TimescaleDB for analyzing time-series data. We will show you how to utilize TimescaleDB's continuous aggregates and hyperfunctions for faster and more efficient queries. -We will also take advantage of a unique capability of TimescaleDB: the ability to +This tutorial is a step-by-step guide on how to use TimescaleDB for analyzing time-series data. We show you how to utilize TimescaleDB's continuous aggregates and hyperfunctions for faster and more efficient queries. +We also take advantage of a unique capability of TimescaleDB: the ability to join time-series data with relational data. The dataset that we're using is provided by the National Football League (NFL) diff --git a/timescaledb/tutorials/nfl-analytics/join-with-relational.md b/timescaledb/tutorials/nfl-analytics/join-with-relational.md index 86c468da960a..88c66ad91ec8 100644 --- a/timescaledb/tutorials/nfl-analytics/join-with-relational.md +++ b/timescaledb/tutorials/nfl-analytics/join-with-relational.md @@ -1,7 +1,7 @@ ## Join time-series data with relational data for deeper analysis TimescaleDB is packaged as a PostgreSQL extension. As such, TimescaleDB is -PostgreSQL with super-powers. You can do anything in TimescaleDB that you can +PostgreSQL with super-powers. You can do anything in TimescaleDB that you can in PostgreSQL, including joining tables and combining data for further analysis. ### The Mile-High advantage @@ -13,38 +13,43 @@ visit Denver's Mile-High stadium are at a disadvantage because unlike the home t Earlier we ingested stadium data. 
Now we can run a query to see the performance of players when they are playing at Mile High Stadium. -Like many of the queries in our analysis section, for this example you will utilize the relational nature of this data. You will join the `tracking`, `player`, and `game` tables to compare the average acceleration and yards run of individual players when they are performing in stadiums outside of Denver versus when they are playing within Denver. The columns `avg_acc_den` and `avg_yards_den` represent the acceleration and yard statistics while in Denver. +Like many of the queries in our analysis section, for this example you use the +relational nature of this data. You join the `tracking`, `player`, and `game` +tables to compare the average acceleration and yards run of individual players +when they are performing in stadiums outside of Denver versus when they are +playing within Denver. The columns `avg_acc_den` and `avg_yards_den` represent +the acceleration and yard statistics while in Denver. ```sql WITH stat_vals AS ( -- This table collects the summed yard and avg acceleration data of a player during one game - SELECT a.player_id, displayname, SUM(yards) AS yards, AVG(acc) AS acc, team, gameid + SELECT a.player_id, displayname, SUM(yards) AS yards, AVG(acc) AS acc, team, gameid FROM player_yards_by_game a - LEFT JOIN player p ON a.player_id = p.player_id + LEFT JOIN player p ON a.player_id = p.player_id GROUP BY a.player_id, displayname, gameid, team ), team_data AS ( -- This table gets us the team information so that we can filter on teams SELECT a.player_id, acc, yards, a.gameid, - CASE - WHEN a.team = 'away' THEN g.visitor_team - WHEN a.team = 'home' THEN g.home_team - ELSE NULL + CASE + WHEN a.team = 'away' THEN g.visitor_team + WHEN a.team = 'home' THEN g.home_team + ELSE NULL END AS team_name, g.home_team FROM stat_vals a - LEFT JOIN game g ON a.gameid = g.game_id + LEFT JOIN game g ON a.gameid = g.game_id ), avg_stats AS ( -- This table takes the avg acceleration and yards run for players when they are not in denver -- and then when they are in denver -SELECT p.player_id, p.displayname, - AVG(acc) FILTER (WHERE team_name != 'DEN' AND home_team !='DEN') AS avg_acc, +SELECT p.player_id, p.displayname, + AVG(acc) FILTER (WHERE team_name != 'DEN' AND home_team !='DEN') AS avg_acc, AVG(acc) FILTER (WHERE team_name != 'DEN' AND home_team = 'DEN') AS avg_acc_den, - AVG(yards) FILTER (WHERE team_name != 'DEN' AND home_team !='DEN') AS avg_yards, - AVG(yards) FILTER (WHERE team_name != 'DEN' AND home_team = 'DEN') AS avg_yards_den, + AVG(yards) FILTER (WHERE team_name != 'DEN' AND home_team !='DEN') AS avg_yards, + AVG(yards) FILTER (WHERE team_name != 'DEN' AND home_team = 'DEN') AS avg_yards_den, COUNT(gameid) FILTER (WHERE team_name != 'DEN' AND home_team !='DEN') AS games, COUNT(gameid) FILTER (WHERE team_name != 'DEN' AND home_team ='DEN') AS games_den FROM team_data t -LEFT JOIN player p ON t.player_id = p.player_id +LEFT JOIN player p ON t.player_id = p.player_id GROUP BY p.player_id, p.displayname ) SELECT * FROM avg_stats @@ -61,14 +66,18 @@ You should see this: |2558194|Josh Reynolds| 2.26] |2.40] |527.80]|529.16 |15|1| |2543498|Brandin Cooks| 2.26 |2.26 |975.61| 875.90 |15|1| -You can see that generally, it appears many players may have worse acceleration and average number of yards run per game while playing in Denver. However, it is good to note that you only have one sample point showing Denver averages which effects statistical significance. 
+You can see that generally, it appears many players may have worse acceleration and average number of yards run per game while playing in Denver. However, it is good to note that you only have one sample point showing Denver averages, which affects statistical significance.
 
 ### Grass vs. turf, the eternal (football) question
 
 Players often say they "feel" faster on artificial turf. How much faster are they in reality?
 
-Using this query you will join the `tracking`, `stadium_info`, `game`, and `player` tables, to extract the average acceleration that a player has while using turf verses grass. The column `avg_acc_turf` represents the players average acceleration while using artificial turf, and `avg_acc_grass` represents their average acceleration while on grass.
+Using this query you join the `tracking`, `stadium_info`, `game`, and `player`
+tables to extract the average acceleration that a player has while using turf
+versus grass. The column `avg_acc_turf` represents the player's average
+acceleration while using artificial turf, and `avg_acc_grass` represents their
+average acceleration while on grass.
 
 ```sql
 WITH acceleration AS (
@@ -78,18 +87,18 @@ WITH acceleration AS (
     GROUP BY a.player_id, a.gameid, a.team
 ), team_data AS (
 -- This table gets us the surface information so that we can filter on turf type
-    SELECT a.player_id, acc, g.game_id, si."location", si.surface 
+    SELECT a.player_id, acc, g.game_id, si."location", si.surface
     FROM acceleration a
-    LEFT JOIN game g ON a.gameid = g.game_id 
-    LEFT JOIN stadium_info si on g.home_team = si.team_abbreviation 
+    LEFT JOIN game g ON a.gameid = g.game_id
+    LEFT JOIN stadium_info si on g.home_team = si.team_abbreviation
 ), avg_stats AS (
 -- This table takes the avg acceleration for players when they are on artificial turf
 -- and then when they are on grass
-SELECT p.player_id, p.displayname, 
-    AVG(acc) FILTER (WHERE surface LIKE '%Turf%') AS avg_acc_turf, 
+SELECT p.player_id, p.displayname,
+    AVG(acc) FILTER (WHERE surface LIKE '%Turf%') AS avg_acc_turf,
     AVG(acc) FILTER (WHERE surface NOT LIKE '%Turf%') AS avg_acc_grass
 FROM team_data t
-LEFT JOIN player p ON t.player_id = p.player_id 
+LEFT JOIN player p ON t.player_id = p.player_id
 GROUP BY p.player_id, p.displayname
 )
 SELECT * FROM avg_stats
@@ -107,7 +116,7 @@ You should see this:
 
 |2552374 |Ameer Abdullah |2.76 |2.48|
 |2552408 |Darren Waller |2.69 |2.83|
 
-For many players, it appears that they are indeed faster on artificial turf. This 'feeling' of increased speed may in fact be grounded in reality. 
+For many players, it appears that they are indeed faster on artificial turf. This 'feeling' of increased speed may in fact be grounded in reality.
 
 ### We're going to overtime!
 
diff --git a/timescaledb/tutorials/nfl-analytics/play-visualization.md b/timescaledb/tutorials/nfl-analytics/play-visualization.md
index f3a423073dac..fab0c59772d8 100644
--- a/timescaledb/tutorials/nfl-analytics/play-visualization.md
+++ b/timescaledb/tutorials/nfl-analytics/play-visualization.md
@@ -1,7 +1,7 @@
 ## Visualize pre-snap positions and player movement
 
 Interestingly, the NFL data set includes data on player movement within each
 football play. Visualizing the changes in your time-series data can often provide
-even more insight. In this section, we will use `pandas` and `matplotlib` to
+even more insight. In this section, we use `pandas` and `matplotlib` to
 visually depict a play during the season.
## Install pandas and matplotlib
 
diff --git a/timescaledb/tutorials/nfl-fantasy-league.md b/timescaledb/tutorials/nfl-fantasy-league.md
index c343f29797c0..cf01f86a41ed 100644
--- a/timescaledb/tutorials/nfl-fantasy-league.md
+++ b/timescaledb/tutorials/nfl-fantasy-league.md
@@ -2,22 +2,22 @@
 
 This tutorial is a step-by-step guide on how to ingest and analyze American football data with TimescaleDB.
 
-The dataset that we're using is provided by the National Football League (NFL) and contains data about 
-all the passing plays of the 2018 NFL season. We're going to ingest this dataset with Python into TimescaleDB 
-and start exploring it to discover interesting things about players that could help you win your next fantasy season. 
+The dataset that we're using is provided by the National Football League (NFL) and contains data about
+all the passing plays of the 2018 NFL season. We're going to ingest this dataset with Python into TimescaleDB
+and start exploring it to discover interesting things about players that could help you win your next fantasy season.
 If you aren't an NFL fan, this tutorial can still help you
 get started with TimescaleDB and explore a real world dataset with SQL and Python.
 
 1. [Create tables](#create-tables)
-2. [Ingest data from CSV files](#ingest-data-from-csv-files) 
+2. [Ingest data from CSV files](#ingest-data-from-csv-files)
 3. [Analyze NFL data](#analyze-nfl-data)
 4. [Visualize pre-snap positions and player movement](#visualize-pre-snap-positions-and-player-movement)
- 
+
 ## Prerequisites
 
 * Python 3
-* TimescaleDB (see [installation options][install-timescale]) 
+* TimescaleDB (see [installation options][install-timescale])
 * [Psql][psql-install] or any other PostgreSQL client (e.g. DBeaver)
 
 ## Download the dataset
@@ -27,24 +27,24 @@ get started with TimescaleDB and explore a real world dataset with SQL and Pytho
 
 ## Create tables
 
-You will need to create six tables:
+You need to create six tables:
 * **game**
-  
+
   Information about each game, `game_id` is a primary key.
 * **player**
-  
-  Player information, `player_id` is a primary_key. 
+
+  Player information, `player_id` is a primary key.
 * **play**
-  
+
   Play information. To query a specific play, you need to use gameid and playid together.
 * **tracking**
-  
+
   Player tracking information from each play. This is going to be the biggest table (18M+ rows) in the database. Important fields are `x` and `y` as they indicate the physical positions of the players on the field.
 * **scores**
 
-  Final result of each game. This table can be joined with the tracking table using the `home_team_abb` and 
+  Final result of each game. This table can be joined with the tracking table using the `home_team_abb` and
   `visitor_team_abb` fields.
* **stadium_info**
@@ -175,10 +175,10 @@ import config
 import psycopg2
 
 # connect to the database
-conn = psycopg2.connect(database=config.DB_NAME, 
-                        host=config.HOST, 
-                        user=config.USER, 
-                        password=config.PASS, 
+conn = psycopg2.connect(database=config.DB_NAME,
+                        host=config.HOST,
+                        user=config.USER,
+                        password=config.PASS,
                         port=config.PORT)
 
 # insert CSV file into given table
@@ -209,7 +209,7 @@ print("Inserting scores.csv")
 insert("data/scores.csv", "scores")
 
 # iterate over each week's CSV file and insert them
-for i in range(1, 18): 
+for i in range(1, 18):
     print("Inserting week{i}".format(i=i))
     insert("data/week{i}.csv".format(i=i), "tracking")
 
@@ -221,17 +221,17 @@ conn.close()
 
 Now that you have all the data ingested, let's go over some ideas on how you can analyze the data using PostgreSQL and TimescaleDB to help you perfect your fantasy drafting strategy and win your fantasy season.
 
-Some of this analysis includes visualizations to help you see the potential uses for this data. These are created using the Matplotlib Python module, which is one of many great visualization tools. 
+Some of this analysis includes visualizations to help you see the potential uses for this data. These are created using the Matplotlib Python module, which is one of many great visualization tools.
 
-To optimize the analysis, you will need to create a continuous aggregate. Continuous aggregate's significantly cut down on query run time, running up to thirty times faster. This continuous aggregate sums all the players movement in yards over one day and groups them by the players ID and game ID. 
+To optimize the analysis, you need to create a continuous aggregate. Continuous aggregates significantly cut down on query run time, running up to thirty times faster. This continuous aggregate sums each player's movement in yards over one day and groups it by player ID and game ID.
 
 ```sql
 CREATE MATERIALIZED VIEW player_yards_by_game
 WITH (timescaledb.continuous)
 AS
-SELECT t.player_id, t.gameid, 
+SELECT t.player_id, t.gameid,
     time_bucket(INTERVAL '1 day', t."time") AS bucket,
     SUM(t.dis) AS yards
-FROM tracking t 
+FROM tracking t
 GROUP BY t.player_id, t.gameid, bucket;
 ```
 
@@ -247,10 +247,10 @@ GROUP BY t.player_id, t.gameid, bucket;
 
 Use this query to get the yard data from the continuous aggregate. You can then join that on the player table to get player details.
 
 ```sql
-SELECT a.player_id, display_name, SUM(yards) AS yards, gameid 
+SELECT a.player_id, display_name, SUM(yards) AS yards, gameid
 FROM player_yards_by_game a
-LEFT JOIN player p ON a.player_id = p.player_id 
-GROUP BY a.player_id, display_name, gameid 
+LEFT JOIN player p ON a.player_id = p.player_id
+GROUP BY a.player_id, display_name, gameid
 ORDER BY gameid ASC, display_name
 ```
 Your data should look like this:
@@ -265,33 +265,33 @@ This query can be the foundation of many other analysis questions. This section
 
 ### **Average yards run for a player over a game**
 
-This query uses one of the TimescaleDB percentile functions to find the mean yards run per game by a single player.
```sql
 WITH sum_yards AS (
-    SELECT a.player_id, display_name, SUM(yards) AS yards, gameid 
+    SELECT a.player_id, display_name, SUM(yards) AS yards, gameid
     FROM player_yards_by_game a
-    LEFT JOIN player p ON a.player_id = p.player_id 
-    GROUP BY a.player_id, display_name, gameid 
+    LEFT JOIN player p ON a.player_id = p.player_id
+    GROUP BY a.player_id, display_name, gameid
 )
 SELECT player_id, display_name, mean(percentile_agg(yards)) as yards
 FROM sum_yards
 GROUP BY player_id, display_name
 ORDER BY yards DESC
 ```
-When you run this query you might notice that the `player_id` and `display_name` are null for the first row. This row represents the avereage yard data for the football. 
+When you run this query, you might notice that the `player_id` and `display_name` are null for the first row. This row represents the average yard data for the football.
 
 ### **Average and median yards run per game by type of player (not taking avg of individual)**
 
- For this query, you will use another one of the TimescaleDB percentile functions called `percentile_agg`. You will set the `percentile_agg` function to find the 50th percentile which will return the approximate median. 
+ For this query, you use another one of the TimescaleDB percentile functions called `percentile_agg`. You set the `percentile_agg` function to find the 50th percentile, which returns the approximate median.
 
 ```sql
 WITH sum_yards AS (
 --Add position to the table to allow for grouping by it later
-    SELECT a.player_id, display_name, SUM(yards) AS yards, p.position, gameid 
+    SELECT a.player_id, display_name, SUM(yards) AS yards, p.position, gameid
     FROM player_yards_by_game a
-    LEFT JOIN player p ON a.player_id = p.player_id 
-    GROUP BY a.player_id, display_name, p.position, gameid 
+    LEFT JOIN player p ON a.player_id = p.player_id
+    GROUP BY a.player_id, display_name, p.position, gameid
 )
 --Find the mean and median for each position type
 SELECT position, mean(percentile_agg(yards)) AS mean_yards, approx_percentile(0.5, percentile_agg(yards)) AS median_yards
@@ -308,38 +308,38 @@ If you scroll to the bottom of your results you should see this:
 
 |FB| 100.37912844036691 | 67.0876116670915 |
 |DT| 19.692499999999992 | 17.796475991050432 |
 
-Notice how the Defensive End (DE) position has a large discrepency between its mean and median values. The median data implies that most DE players do not run very much during passing plays. However, the mean data implies that some of the DE players must be running a significant amount. You may want to find out who these high performing defensive players are. 
+Notice how the Defensive End (DE) position has a large discrepancy between its mean and median values. The median data implies that most DE players do not run very much during passing plays. However, the mean data implies that some of the DE players must be running a significant amount. You may want to find out who these high-performing defensive players are.
 
 ### **Number of snap plays by player where they were on the offense**
 
-In this query, you are counting the number of passing events a player was involved in while playing the offensive. You will notice how much slower this query runs than the ones above which use continuous aggregates. The speed you see here is comparable to what you would get in the other queries without using continuous aggregates.
+In this query, you are counting the number of passing events a player was involved in while playing on the offense.
You might notice how much slower this query runs than the ones above, which use continuous aggregates. The speed you see here is comparable to what you would get in the other queries without using continuous aggregates.
 
 ```sql
 WITH snap_events AS (
 -- Create a table that filters the play events to show only snap plays
 -- and display the players team information
    SELECT DISTINCT player_id, t.event, t.gameid, t.playid,
-    CASE 
-    WHEN t.team = 'away' THEN g.visitor_team 
-    WHEN t.team = 'home' THEN g.home_team 
-    ELSE NULL 
+    CASE
+    WHEN t.team = 'away' THEN g.visitor_team
+    WHEN t.team = 'home' THEN g.home_team
+    ELSE NULL
     END AS team_name
-   FROM tracking t 
-   LEFT JOIN game g ON t.gameid = g.game_id 
+   FROM tracking t
+   LEFT JOIN game g ON t.gameid = g.game_id
    WHERE t.event LIKE '%snap%'
 )
 -- Count these events and filter results to only display data when the player was
 -- on the offensive
 SELECT a.player_id, pl.display_name, COUNT(a.event) AS play_count, a.team_name
 FROM snap_events a
-LEFT JOIN play p ON a.gameid = p.gameid AND a.playid = p.playid 
-LEFT JOIN player pl ON a.player_id = pl.player_id 
-WHERE a.team_name = p.possessionteam 
+LEFT JOIN play p ON a.gameid = p.gameid AND a.playid = p.playid
+LEFT JOIN player pl ON a.player_id = pl.player_id
+WHERE a.team_name = p.possessionteam
 GROUP BY a.player_id, pl.display_name, a.team_name
 ORDER BY play_count DESC
 ```
 
-Notice that the two highest passing plays are for Ben Roethlisberger and JuJu Smith-Schuster, a Quarterback and Wide Receiver respectively for the Pittsburgh Steelers. These may be two great options to consider when drafting your fantasy football leauge. 
+Notice that the two highest passing plays are for Ben Roethlisberger and JuJu Smith-Schuster, a Quarterback and a Wide Receiver, respectively, for the Pittsburgh Steelers. These may be two great options to consider when drafting your fantasy football league.
 
 ### **Number of plays vs points scored**
 
 Use this query to get data on the number of plays and final score for each game.
 
 ```sql
 WITH play_count AS (
 -- Count distinct plays, join on the stadium and game tables for team names and game date
-SELECT gameid, COUNT(playdescription) AS plays, p.possessionteam as team_name, g.game_date 
-FROM play p 
-LEFT JOIN game g ON p.gameid = g.game_id 
+SELECT gameid, COUNT(playdescription) AS plays, p.possessionteam as team_name, g.game_date
+FROM play p
+LEFT JOIN game g ON p.gameid = g.game_id
 GROUP BY gameid, team_name, game_date
 ), visiting_games AS (
 -- Join on scores to grab only the visiting team's data
 SELECT gameid, plays, s.visitor_team as team_name, s.visitor_score AS team_score FROM play_count p
-INNER JOIN scores s ON p.team_name = s.visitor_team_abb 
+INNER JOIN scores s ON p.team_name = s.visitor_team_abb
 AND p.game_date = s."date"
 ), home_games AS (
 -- Join on scores to grab only the home team's data
 SELECT gameid, plays, s.home_team AS team_name , s.home_score AS team_score FROM play_count p
-INNER JOIN scores s ON p.team_name = s.home_team_abb 
+INNER JOIN scores s ON p.team_name = s.home_team_abb
 AND p.game_date = s."date"
 )
 -- union the two resulting tables together
 SELECT * FROM visiting_games
 UNION ALL
 SELECT * FROM home_games
 ORDER BY gameid ASC, team_score DESC
 ```
 
-The image below is an example of a visualization that you could create with the data collected from this query. The scatterplot is grouped, showing the winning team's plays and scores as gold, and the losing team's plays and scores as brown.
+The image below is an example of a visualization that you could create with the data collected from this query. The scatterplot is grouped, showing the winning team's plays and scores as gold, and the losing team's plays and scores as brown.
 
 Wins vs Plays
 
-The y-axis, or the number of plays for one team during a single game shows that more plays do not always imply a guaranteed win. In fact, the top three teams with the highest number of plays for a single game all appeared to have lost. There are many interesting facts which you could glean from this query, this scatterplot being just one possibility. 
+The y-axis, or the number of plays for one team during a single game, shows that more plays do not always imply a guaranteed win. In fact, the top three teams with the highest number of plays for a single game all appeared to have lost. There are many interesting facts which you could glean from this query, this scatterplot being just one possibility.
 
 ### **Average yards per game for top three players of each position**
 
-You can use this PostgreSQL query to extract the average yards run by an individual player over one game. This query will only include the top three highest player's average yard values per position type. The data is ordered by the average yards run across all players for each position. This becomes important later on. 
+You can use this PostgreSQL query to extract the average yards run by an individual player over one game. This query only includes the top three players' average yard values per position type. The data is ordered by the average yards run across all players for each position. This becomes important later on.
 
-Note: This query excludes some position types from the list due to such low average yard values, the excluded positions are Kicker, Punter, Nose Tackle, Long Snapper, and Defensive Tackle 
+Note: This query excludes some position types from the list due to such low average yard values. The excluded positions are Kicker, Punter, Nose Tackle, Long Snapper, and Defensive Tackle.
 
 ```sql
 WITH total_yards AS (
     GROUP BY t.player_id, t.gameid
 ), avg_yards AS (
 -- This table takes the average of the yards run by each player and calls out their position
-    SELECT p.player_id, p.display_name, AVG(yards) AS avg_yards, p."position" 
+    SELECT p.player_id, p.display_name, AVG(yards) AS avg_yards, p."position"
     FROM total_yards t
-    LEFT JOIN player p ON t.player_id = p.player_id 
+    LEFT JOIN player p ON t.player_id = p.player_id
     GROUP BY p.player_id, p.display_name, p."position"
 ), ranked_vals AS (
--- This table ranks each player by the average yards they run per game 
-SELECT a.*, RANK() OVER (PARTITION BY a."position" ORDER BY avg_yards DESC) 
+-- This table ranks each player by the average yards they run per game
+SELECT a.*, RANK() OVER (PARTITION BY a."position" ORDER BY avg_yards DESC)
 FROM avg_yards AS a
 ), ranked_positions AS (
 -- This table takes the average of the average yards run for each player so that we can order
@@ -414,7 +414,7 @@ This is one possible visualization that you could create with this data:
 
 Top Three Players by Position
 
-Notice that the average yards overall for Free Safety players is higher than that of Wide Receivers (this is because of how we ordered the data, noted above). However, individual Wide Receivers run more yards on average per game. Also, notice that Kyle Juszczyk runs far more on average than other Fullback players.
+Notice that the average yards overall for Free Safety players is higher than that of Wide Receivers (this is because of how we ordered the data, noted above). However, individual Wide Receivers run more yards on average per game. Also, notice that Kyle Juszczyk runs far more on average than other Fullback players. ## Visualize pre-snap positions and player movement @@ -450,7 +450,7 @@ def generate_field(): if x > 50: numb = 120-x plt.text(x, 5, str(numb - 10), horizontalalignment='center', fontsize=20, color='white') - plt.text(x-0.95, 53.3-5, str(numb-10), + plt.text(x-0.95, 53.3-5, str(numb-10), horizontalalignment='center', fontsize=20, color='white',rotation=180) # hash marks @@ -459,36 +459,36 @@ def generate_field(): ax.plot([x, x], [53.0, 52.5], color='white') ax.plot([x, x], [22.91, 23.57], color='white') ax.plot([x, x], [29.73, 30.39], color='white') - + # set limits and hide axis plt.xlim(0, 120) plt.ylim(-5, 58.3) plt.axis('off') - + return fig, ax ``` **Draw players' movement based on `game_id` and `play_id`** ```python -conn = psycopg2.connect(database="db", - host="host", - user="user", - password="pass", +conn = psycopg2.connect(database="db", + host="host", + user="user", + password="pass", port="111") def draw_play(game_id, play_id, home_label='position', away_label='position', movements=False): - """Generates a chart to visualize player pre-snap positions and + """Generates a chart to visualize player pre-snap positions and movements during the given play. Args: game_id (int) play_id (int) - home_label (str, optional): Default is 'position' but can be 'displayname' + home_label (str, optional): Default is 'position' but can be 'displayname' or other column name available in the table. - away_label (str, optional): Default is 'position' but can be 'displayname' + away_label (str, optional): Default is 'position' but can be 'displayname' or other column name available in the table. - movements (bool, optional): If False, only draws the pre-snap positions. + movements (bool, optional): If False, only draws the pre-snap positions. If True, draws the movements as well. 
""" # query all tracking data for the given play @@ -506,12 +506,12 @@ def draw_play(game_id, play_id, home_label='position', away_label='position', mo # query pre_snap player positions home_pre_snap = home_team.query('event == "ball_snap"') away_pre_snap = away_team.query('event == "ball_snap"') - + # visualize pre-snap positions with scatter plot home_pre_snap.plot.scatter(x='x', y='y', ax=ax, color='yellow', s=35, zorder=3) away_pre_snap.plot.scatter(x='x', y='y', ax=ax, color='blue', s=35, zorder=3) - - # annotate the figure with the players' position or name + + # annotate the figure with the players' position or name # (depending on the *label* parameter's value) home_positions = home_pre_snap[home_label].tolist() away_positions = away_pre_snap[away_label].tolist() @@ -519,7 +519,7 @@ def draw_play(game_id, play_id, home_label='position', away_label='position', mo ax.annotate(pos, (home_pre_snap['x'].tolist()[i], home_pre_snap['y'].tolist()[i])) for i, pos in enumerate(away_positions): ax.annotate(pos, (away_pre_snap['x'].tolist()[i], away_pre_snap['y'].tolist()[i])) - + if movements: # visualize player movements for home team home_players = home_team['player_id'].unique().tolist() @@ -534,10 +534,10 @@ def draw_play(game_id, play_id, home_label='position', away_label='position', mo df.plot(x='x', y='y', ax=ax, linewidth=4, legend=False) # query play description and possession team and add them in the title - sql = """SELECT gameid, playid, playdescription, possessionteam FROM play + sql = """SELECT gameid, playid, playdescription, possessionteam FROM play WHERE gameid = {game} AND playid = {play}""".format(game=game_id, play=play_id) - play_info = pd.read_sql(sql, conn).to_dict('records') - plt.title('Possession team: {team}\nPlay: {play}'.format(team=play_info[0]['possessionteam'], + play_info = pd.read_sql(sql, conn).to_dict('records') + plt.title('Possession team: {team}\nPlay: {play}'.format(team=play_info[0]['possessionteam'], play=play_info[0]['playdescription'])) # show chart plt.show() @@ -546,7 +546,7 @@ def draw_play(game_id, play_id, home_label='position', away_label='position', mo Then, you can run the `draw_play` function like this to visualize pre-snap player positions: ```python -draw_play(game_id=2018112900, +draw_play(game_id=2018112900, play_id=2826, movements=False) ``` @@ -556,8 +556,8 @@ draw_play(game_id=2018112900, You can also visualize player movement during the play if you set `movements` to `True`: ```python -draw_play(game_id=2018112900, - play_id=2826, +draw_play(game_id=2018112900, + play_id=2826, home_label='position', away_label='displayname', movements=True) @@ -573,4 +573,4 @@ draw_play(game_id=2018112900, [install-timescale]: /how-to-guides/install-timescaledb/ [psql-install]: /how-to-guides/connecting/psql [kaggle-download]: https://www.kaggle.com/c/nfl-big-data-bowl-2021/data -[extra-download]: https://assets.timescale.com/docs/downloads/nfl_2018.zip \ No newline at end of file +[extra-download]: https://assets.timescale.com/docs/downloads/nfl_2018.zip diff --git a/timescaledb/tutorials/nyc-taxi-cab.md b/timescaledb/tutorials/nyc-taxi-cab.md index 1718ae10ccfe..3e503f08cfe0 100644 --- a/timescaledb/tutorials/nyc-taxi-cab.md +++ b/timescaledb/tutorials/nyc-taxi-cab.md @@ -2,7 +2,7 @@ Use case: IoT Analysis and Monitoring -In this tutorial, you will learn: +In this tutorial, you learn: 1. How to get started with TimescaleDB 2. 
How to use TimescaleDB to analyze and monitor data from IoT sensors @@ -10,9 +10,9 @@ Dataset: [nyc_data.tar.gz](https://timescaledata.blob.core. Estimated time for completion: 25 minutes. ### Prerequisites -To complete this tutorial, you will need a cursory knowledge of the -Structured Query Language (SQL). The tutorial will walk you through each -SQL command, but it will be helpful if you've seen SQL before. +To complete this tutorial, you need a cursory knowledge of the +Structured Query Language (SQL). The tutorial walks you through each +SQL command, but it is helpful if you've seen SQL before. ### Accessing Timescale There are multiple options for using Timescale to follow along with this tutorial. **All connection information @@ -21,7 +21,7 @@ fully-managed database-as-a-service. [Sign up for a free, 30-day demo account][c required. Once you confirm the account and get logged in, proceed to the **Background** section below. If you would like to follow along with a local or on-prem install, you can follow the [install TimescaleDB][install-timescale] -instructions. Once your installation is complete, you will need to create a tutorial database and +instructions. Once your installation is complete, you need to create a tutorial database and install the **Timescale** extension. Using `psql` from the command line, create a database called `nyc_data` and install the extension: @@ -39,21 +39,21 @@ You're all set to follow along locally! NYC Taxis New York City is home to more than 8.3 million people. In this tutorial, -we will analyze and monitor data from New York's yellow cab taxis +we analyze and monitor data from New York's yellow cab taxis using TimescaleDB in order to identify ways to gain efficiency and reduce greenhouse gas emissions. The analysis we perform is similar to the kind of analysis data science organizations in many problem domains use to plan upgrades, set budgets, allocate resources, and more. -In this tutorial, you will complete three missions: +In this tutorial, you complete three missions: -- **Mission 1: Gear up [5-15 minutes]** You will learn how to setup and connect to a *TimescaleDB* instance and load data from a CSV file in your local terminal using *psql*. -- **Mission 2: Analysis [10 minutes]** You will learn how to analyze a time-series dataset using TimescaleDB and *PostgreSQL*. -- **Mission 3: Monitoring [10 minutes]** You will learn how to use TimescaleDB to monitor IoT devices. You'll also learn about using TimescaleDB in conjunction with other PostgreSQL extensions like *PostGIS*, for querying geospatial data. +- **Mission 1: Gear up [5-15 minutes]** You learn how to setup and connect to a *TimescaleDB* instance and load data from a CSV file in your local terminal using *psql*. +- **Mission 2: Analysis [10 minutes]** You learn how to analyze a time-series dataset using TimescaleDB and *PostgreSQL*. +- **Mission 3: Monitoring [10 minutes]** You learn how to use TimescaleDB to monitor IoT devices. You'll also learn about using TimescaleDB in conjunction with other PostgreSQL extensions like *PostGIS*, for querying geospatial data. ### Mission 1: Gear up -For this tutorial, we will use yellow taxi cab data from the +For this tutorial, we use yellow taxi cab data from the [New York City Taxi and Limousine Commission][NYCTLC] (NYC TLC). 
The NYC TLC is the agency responsible for licensing and regulating New York City's Yellow taxi cabs and other for-hire @@ -64,7 +64,7 @@ The NYC TLC has over 200,000 licensee vehicles completing about 1 million trips each day – that's a lot of trips! They've made their taxi utilization data publicly available. And, because nearly all of this data is time-series data, proper analysis requires a purpose-built -time-series database. We will use the unique functions +time-series database. We use the unique functions of TimescaleDB to complete our missions in this tutorial. #### Download and load data @@ -74,7 +74,7 @@ and space (on your machine), we'll only grab data for the month of January 2016, containing ~11 million records! This download contains two files: -1. `nyc_data.sql` - A SQL file that will set up the necessary tables +1. `nyc_data.sql` - A SQL file that sets up the necessary tables 1. `nyc_data_rides.csv` - A CSV file with the ride data You can download the files from the below link: @@ -144,7 +144,7 @@ They collect the following data about each ride: * Payment type (Cash, credit card, etc.) To efficiently store that data, we're going to need three tables: -1. A [hypertable][hypertables] called `rides`, which will store all of the above data for each ride taken. +1. A [hypertable][hypertables] called `rides`, which stores all of the above data for each ride taken. 2. A regular Postgres table called `payment_types`, which maps the payment types to their English description. 3. A regular Postgres table called `rates`, which maps the numeric rate codes to their English description. @@ -162,7 +162,7 @@ psql -x "postgres://tsdbadmin:{YOUR_PASSWORD_HERE}@{|YOUR_HOSTNAME_HERE}:{YOUR_P ``` Alternatively, you can run each script manually from the `psql` command -line. This first script will create a table called `rides`, which will store +line. This first script creates a table called `rides`, which stores trip data. Notice also that we are creating a few indexes to help with later queries in this tutorial: @@ -193,7 +193,7 @@ CREATE INDEX ON rides (rate_code, pickup_datetime DESC); CREATE INDEX ON rides (passenger_count, pickup_datetime DESC); ``` -This script will create table `payment_types` and preconfigure +This script creates table `payment_types` and preconfigure the types of payments taxis can accept: ```sql @@ -210,7 +210,7 @@ INSERT INTO payment_types(payment_type, description) VALUES (6, 'voided trip'); ``` -This script will create table `rates` and preconfigure +This script creates table `rates` and preconfigure the types of rates taxis can charge: ```sql @@ -243,7 +243,7 @@ in the `psql` command line. You should see the following: #### Load trip data into TimescaleDB Next, let's upload the taxi cab data into your TimescaleDB instance. -The data is in the file called `nyc_data_rides.csv` and we will load it +The data is in the file called `nyc_data_rides.csv` and we load it into the `rides` hypertable. To do this, we'll use the `psql` `\copy` command below. >:WARNING: The PostgreSQL `\COPY` command is single-threaded and doesn't support batching @@ -256,7 +256,7 @@ database like **Timescale Cloud**. ``` A faster alternative is the [Parallel COPY command][parallel-copy], written in GoLang, that Timescale makes -available to the community. Once installed, issuing the following command will import the CSV file +available to the community. 
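Note that because `rides` is created as a hypertable in the first script, both loading paths write straight into it and TimescaleDB handles the chunking behind the scenes. For reference, that conversion is a single call; a minimal sketch, assuming the `pickup_datetime` column from the schema above (the setup script's exact invocation may include extra options):

```sql
-- partition the rides table on the pickup timestamp
SELECT create_hypertable('rides', 'pickup_datetime');
```

Returning to the faster import path: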
Once installed, issuing the following command imports the CSV file in multiple threads, 5,000 rows at a time, significantly improving import speed. Set `--workers` <= CPUs (or CPUs x 2) if they support Hyperthreading. **Be sure to replace your connection string, database name, and file location appropriately.** @@ -264,7 +264,7 @@ if they support Hyperthreading. **Be sure to replace your connection string, dat timescaledb-parallel-copy --connection {CONNECTION STRING} --db-name {DATABASE NAME} --table rides --file {PATH TO `nyc_data_rides.csv`} --workers 4 --truncate --reporting-period 30s ``` -With this Parallel Copy command you will get updates every 30 seconds on the progress of your import. +With this Parallel Copy command you can get updates every 30 seconds on the progress of your import. Once the import is complete, you can validate your setup by running the following command: @@ -323,16 +323,16 @@ total_amount | 19.3 Let's say that the NYC Taxi and Limousine Commission has made it a key goal to mitigate the impact of global warming by reducing their greenhouse gas emissions by 20% by 2024. Given the number of taxi rides taken each -day, they believe studying past taxi rider history and behavior will enable +day, they believe studying past taxi rider history and behavior enables them to plan for the future. -In this tutorial, we will limit analysis of historical taxi ride data +In this tutorial, we limit analysis of historical taxi ride data to all NYC TLC taxi rides taken in January 2016. You can imagine that in a -more expansive scenario, you will want to examine rides taken over several years. +more expansive scenario, you would want to examine rides taken over several years. #### How many rides took place on each day? -The first question you will explore is simple: *How many rides took place on each day during January 2016?* +The first question to explore is simple: *How many rides took place on each day during January 2016?* Since TimescaleDB supports full SQL, all that's required is a simple SQL query to count the number of rides and group/order them by the day they took place, @@ -576,7 +576,7 @@ have a basic understanding of how to analyze time-series data using TimescaleDB! We can also use the time-series data from taxi rides to monitor a ride's current status. ->:WARNING: A more realistic setup would involve creating a data pipeline that streams sensor data directly from the cars into TimescaleDB. However, we will use the January 2016 data to illustrate the underlying principles that are applicable regardless of setup. +>:WARNING: A more realistic setup would involve creating a data pipeline that streams sensor data directly from the cars into TimescaleDB. However, we use the January 2016 data to illustrate the underlying principles that are applicable regardless of setup. #### How many rides took place every 5 minutes for the first day of 2016? It's January 1st 2016. NYC riders have celebrated New Year's Eve, and are using taxi @@ -588,7 +588,7 @@ completed on the first day of 2016, in 5 minute intervals. While it's easy to count how many rides took place, there is no easy way to segment data by 5 minute time intervals in PostgreSQL. 
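(For contrast, the per-day question from the previous section needs nothing beyond standard SQL. A sketch along these lines, using the `rides` hypertable loaded earlier, is close to what the tutorial runs, though the exact query may differ:

```sql
-- rides per day in January 2016
SELECT date_trunc('day', pickup_datetime) AS day,
       count(*) AS total_rides
FROM rides
GROUP BY day
ORDER BY day;
```

Five-minute buckets, however, are another matter.)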
As a result, we -will need to use a query similar to the query below: +need to use a query similar to the query below: ```sql -- Vanilla Postgres query for num rides every 5 minutes @@ -716,7 +716,7 @@ Next we'll need to convert the latitude and longitude points into geometry coord so that it plays well with PostGIS: >:WARNING: This next query may take several minutes. Updating both columns in one UPDATE statement -as shown will reduce the amount of time it takes to update all rows in the `rides` table. +as shown reduces the amount of time it takes to update all rows in the `rides` table. ```sql -- Generate the geometry points and write to table diff --git a/timescaledb/tutorials/promscale/index.md b/timescaledb/tutorials/promscale/index.md index a31ece94b717..fcb88da0184c 100644 --- a/timescaledb/tutorials/promscale/index.md +++ b/timescaledb/tutorials/promscale/index.md @@ -20,7 +20,7 @@ For an overview of Promscale, see this short introductory video: [Intro to Proms ## Roadmap -In this tutorial you will learn: +In this tutorial you learn: 1. [The benefits of using Promscale to store and analyze Prometheus metrics][promscale-benefits] 2. [How Promscale works][promscale-how-it-works] 3. [How to install Prometheus, Promscale and TimescaleDB][promscale-install] diff --git a/timescaledb/tutorials/promscale/promscale-how-it-works.md b/timescaledb/tutorials/promscale/promscale-how-it-works.md index 2d32baf901cc..2e9007701eb3 100644 --- a/timescaledb/tutorials/promscale/promscale-how-it-works.md +++ b/timescaledb/tutorials/promscale/promscale-how-it-works.md @@ -9,17 +9,17 @@ The diagram below explains the high level architecture of Promscale, including h ### Ingesting metrics * Once installed alongside Prometheus, Promscale automatically generates an optimized schema which allows you to efficiently store and query your metrics using SQL. -* Prometheus will write data to the Connector using the Prometheus`remote_write` interface. -* The Connector will then write data to TimescaleDB. +* Prometheus writes data to the Connector using the Prometheus`remote_write` interface. +* The Connector writes data to TimescaleDB. ### Querying metrics -* PromQL queries can be directed to the Connector, or to the Prometheus instance, which will read data from the Connector using the `remote_read` interface. The Connector will, in turn, fetch data from TimescaleDB. +* PromQL queries can be directed to the Connector, or to the Prometheus instance, which reads data from the Connector using the `remote_read` interface. The Connector, in turn, fetches data from TimescaleDB. * SQL queries are handled by TimescaleDB directly. As can be seen, this architecture has relatively few components, enabling simple operations. ### Promscale PostgreSQL extension -Promscale has a dependency on the [Promscale PostgreSQL extension][promscale-extension], which contains support functions to improve the performance of Promscale. While Promscale is able to run without the additional extension installed, adding this extension will get better performance from Promscale. +Promscale has a dependency on the [Promscale PostgreSQL extension][promscale-extension], which contains support functions to improve the performance of Promscale. While Promscale is able to run without the additional extension installed, adding this extension gets better performance from Promscale. ### Deploying Promscale Promscale can be deployed in any environment running Prometheus, alongside any Prometheus instance. 
We provide Helm charts for easier deployments to Kubernetes environments (see [Up and running with Promscale][promscale-install] for more on installation and deployment). @@ -40,10 +40,10 @@ Promscale automatically creates and manages database tables. So, while understan ### 1. Metrics storage schema Each metric is stored in a separate hypertable. -A hypertable is a TimescaleDB abstraction that represents a single logical SQL table that is automatically physically partitioned into chunks, which are physical tables that are stored in different files in the filesystem. Hypertables are partitioned into chunks by the value of certain columns. In this case, we will partition out tables by time (with a default chunk size of 8 hours). +A hypertable is a TimescaleDB abstraction that represents a single logical SQL table that is automatically physically partitioned into chunks, which are physical tables that are stored in different files in the filesystem. Hypertables are partitioned into chunks by the value of certain columns. In this case, we partition out tables by time (with a default chunk size of 8 hours). #### Compression -The latest chunk will be decompressed to serve as a high-speed query cache. Older chunks are stored as compressed chunks. We configure compression with the `segment_by` column set to the `series_id` and the `order_by` column set to time DESC. These settings control how data is split into blocks of compressed data. Each block can be accessed and decompressed independently. +The latest chunk is decompressed to serve as a high-speed query cache. Older chunks are stored as compressed chunks. We configure compression with the `segment_by` column set to the `series_id` and the `order_by` column set to time DESC. These settings control how data is split into blocks of compressed data. Each block can be accessed and decompressed independently. The settings we have chosen mean that a block of compressed data is always associated with a single series_id and that the data is sorted by time before being split into blocks; thus each block is associated with a fairly narrow time range. As a result, in compressed form, access by series_id and time range are optimized. @@ -96,7 +96,7 @@ CREATE TABLE _prom_catalog.label ( ); ``` -## Promscale views +## Promscale views The good news is that in order to use Promscale well, you do not need to understand the schema design. Users interact with Prometheus data in Promscale through views. These views are automatically created and are used to interact with metrics and labels. Each metric and label has its own view. You can see a list of all metrics by querying the view named `metric`. Similarly, you can see a list of all labels by querying the view named `label`. These views are found in the `prom_info` schema. diff --git a/timescaledb/tutorials/promscale/promscale-install.md b/timescaledb/tutorials/promscale/promscale-install.md index f3658c6e3d6d..690eeb3b3eed 100644 --- a/timescaledb/tutorials/promscale/promscale-install.md +++ b/timescaledb/tutorials/promscale/promscale-install.md @@ -10,7 +10,7 @@ We recommend four methods to setup and install Promscale: See the [Promscale github installation guide](https://github.com/timescale/promscale#-choose-your-own-installation-adventure) for more information on installation options. -For demonstration purposes, we will use Docker to get up and running with Promscale. +For demonstration purposes, we use Docker to get up and running with Promscale. 
The easiest way to get started is by using Docker images. Make sure you have Docker installed on your local machine ([Docker installation instructions][docker]). @@ -35,9 +35,9 @@ docker network create --driver bridge promscale-timescaledb ``` -Secondly, let's install and spin up an instance of TimescaleDB in a docker container. This is where Promscale will store all metrics data scraped from Prometheus targets. +Secondly, let's install and spin up an instance of TimescaleDB in a docker container. This is where Promscale stores all metrics data scraped from Prometheus targets. -We will use a docker image which has the`promscale` PostgreSQL extension already pre-installed: +We use a Docker image which has the`promscale` PostgreSQL extension already pre-installed: ```bash docker run --name timescaledb \ @@ -45,12 +45,12 @@ docker run --name timescaledb \ -e POSTGRES_PASSWORD= -d -p 5432:5432 \ timescaledev/timescaledb-ha:pg12-latest ``` -The above commands create a TimescaleDB instanced named `timescaledb` (via the `--name` flag), on the network named `promscale-timescale` (via the `--network` flag), whose container will run in the background with the container ID printed after created (via the `-d` flag), with port-forwarding it to port `5432` on your machine (via the `-p` flag). +The above commands create a TimescaleDB instanced named `timescaledb` (via the `--name` flag), on the network named `promscale-timescale` (via the `--network` flag), whose container runs in the background with the container ID printed after created (via the `-d` flag), with port-forwarding it to port `5432` on your machine (via the `-p` flag). We set the `POSTGRES_PASSWORD` environment variable (using the `-e` flag) in the command above. Please ensure to replace `[password]` with the password of your choice for the `postgres` superuser. -For production deployments, you will want to fix the docker tag to a particular version instead of `pg12-latest` +For production deployments, you want to fix the Docker tag to a particular version instead of `pg12-latest` ## Install Promscale [](install-promscale) @@ -77,7 +77,7 @@ The setting `ssl-mode=allow` is for testing purposes only. For production deploy `node_exporter` is a Prometheus exporter for hardware and OS metrics exposed by *NIX kernels, written in Go with pluggable metric collectors. To learn more about it, refer to the [Node Exporter Github][]. -For the purposes of this tutorial, we need a service that will expose metrics to Prometheus. We will use the `node_exporter` for this purpose. +For the purposes of this tutorial, we need a service that exposes metrics to Prometheus. We use the `node_exporter` for this purpose. Install the the `node_exporter` on your machine by running the docker command below: @@ -88,7 +88,7 @@ quay.io/prometheus/node-exporter ``` The command above creates a node exporter instanced named `node_exporter`, which port-forwards its output to port `9100` and runs on the `promscale-timescaledb` network created in Step 3.1. -Once the Node Exporter is running, you can verify that system metrics are being exported by visiting its `/metrics` endpoint at the following URL: `http://localhost:9100/metrics`. Prometheus will scrape this `/metrics` endpoint to get metrics. +Once the Node Exporter is running, you can verify that system metrics are being exported by visiting its `/metrics` endpoint at the following URL: `http://localhost:9100/metrics`. Prometheus scrapes this `/metrics` endpoint to get metrics. 
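Before moving on to Prometheus itself, you can also confirm from `psql` that the TimescaleDB container started above has both extensions available. A minimal sketch; the versions you see depend on the image tag you pulled:

```sql
-- run inside psql connected to the timescaledb container
SELECT name, default_version, installed_version
FROM pg_available_extensions
WHERE name IN ('timescaledb', 'promscale');
```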
## Install Prometheus [](install-prometheus) @@ -129,7 +129,7 @@ docker run \ ``` ## BONUS: Docker compose file [](promscale-docker-compose) -To save time spinning up and running each docker container separately, here is a sample`docker-compose.yml` file that will spin up docker containers for TimescaleDB, Promscale, node_exporter and Prometheus using the configurations mentioned in Steps 1-4 above. +To save time spinning up and running each docker container separately, here is a sample`docker-compose.yml` file that spins up docker containers for TimescaleDB, Promscale, node_exporter and Prometheus using the configurations mentioned in Steps 1-4 above. Ensure you have the Prometheus configuration file `prometheus.yml` in the same directory as `docker-compose.yml` diff --git a/timescaledb/tutorials/promscale/promscale-run-queries.md b/timescaledb/tutorials/promscale/promscale-run-queries.md index ba67fd063044..69e437d6fdbe 100644 --- a/timescaledb/tutorials/promscale/promscale-run-queries.md +++ b/timescaledb/tutorials/promscale/promscale-run-queries.md @@ -4,9 +4,9 @@ Promscale offers the combined power of PromQL and SQL, enabling you to ask any question, create any dashboard, and achieve greater visibility into the systems you monitor. -In the configuration used in [Installing Promscale][promscale-install], Prometheus will scrape the Node Exporter every 10s and metrics will be stored in both Prometheus and TimescaleDB, via Promscale. +In the configuration used in [Installing Promscale][promscale-install], Prometheus scrapes the Node Exporter every 10s and metrics are stored in both Prometheus and TimescaleDB, via Promscale. -This section will illustrate how to run simple and complex SQL queries against Promscale, as well as queries in PromQL. +This section illustrates how to run simple and complex SQL queries against Promscale, as well as queries in PromQL. ## SQL queries in Promscale [](sql-queries) @@ -24,14 +24,14 @@ Once inside, we can now run SQL queries and explore the metrics collected by Pro Queries on metrics are performed by querying the view named after the metric you're interested in. -In the example below, we will query a metric named `go_dc_duration` for its samples in the past 5 minutes. This metric is a measurement for how long garbage collection is taking in Golang applications: +In the example below, we query a metric named `go_dc_duration` for its samples in the past 5 minutes. This metric is a measurement for how long garbage collection is taking in Golang applications: ``` sql SELECT * from go_gc_duration_seconds WHERE time > now() - INTERVAL '5 minutes'; ``` -Here is a sample output for the query above (your output will differ): +Here is a sample output for the query above (your output might differ): ``` bash time | value | series_id | labels | instance_id | job_id | quantile_id @@ -259,7 +259,7 @@ To find the Promscale IP address, run the command `docker inspect promscale` (wh Alternatively, we can set the `URL` as `http://promscale:9201`, where `promscale` is the name of our container. This method works as we've created all of our containers in the same docker network (using the flag `-- network promscale-timescaledb` during our installs). -After configuring Promscale as a datasource in Grafana, all that's left is to create a sample panel using `Promscale` as the datasource. The query powering the panel will be written in PromQL. 
The sample query below shows the average rate of change in the past 5 minutes for the metric `go_memstats_alloc_bytes`, which measures the Go's memory allocation on the heap from the kernel: +After configuring Promscale as a datasource in Grafana, all that's left is to create a sample panel using `Promscale` as the datasource. The query powering the panel is written in PromQL. The sample query below shows the average rate of change in the past 5 minutes for the metric `go_memstats_alloc_bytes`, which measures the Go's memory allocation on the heap from the kernel: ``` rate(go_memstats_alloc_bytes{instance="localhost:9090"}[5m]) ``` diff --git a/timescaledb/tutorials/sample-datasets.md b/timescaledb/tutorials/sample-datasets.md index 661aa9ea4b7e..13ff49b7dc6e 100644 --- a/timescaledb/tutorials/sample-datasets.md +++ b/timescaledb/tutorials/sample-datasets.md @@ -77,7 +77,7 @@ psql -U postgres -h localhost -d devices_small ## In-depth: Device ops datasets [](in-depth-devices) After importing one of these datasets (`devices_small`, `devices_med`, `devices_big`), -you will find a plain PostgreSQL table called `device_info` +you have a plain PostgreSQL table called `device_info` and a hypertable called `readings`. The `device_info` table has (static) metadata about each device, such as the OS name and manufacturer. The `readings` hypertable tracks data sent from each device, e.g. CPU activity, @@ -194,7 +194,7 @@ hour | min_battery_level | max_battery_level ## In-depth: Weather datasets [](in-depth-weather) After importing one of these datasets (`weather_small`, `weather_med`, `weather_big`), -you will find a plain PostgreSQL table called `locations` and +you notice a plain PostgreSQL table called `locations` and a hypertable called `conditions`. The `locations` table has metadata about each of the locations, such as its name and environmental type. The `conditions` hypertable tracks readings of temperature and humidity from diff --git a/timescaledb/tutorials/setting-up-mst-for-prometheus.md b/timescaledb/tutorials/setting-up-mst-for-prometheus.md index 70102e5d85b6..db3a971abd02 100644 --- a/timescaledb/tutorials/setting-up-mst-for-prometheus.md +++ b/timescaledb/tutorials/setting-up-mst-for-prometheus.md @@ -2,12 +2,12 @@ You can get more insights into the performance of your managed service for TimescaleDB database by monitoring it using [Prometheus][get-prometheus], a popular -open-source metrics-based systems monitoring solution. This tutorial will -take you through setting up a Prometheus endpoint for a database running +open-source metrics-based systems monitoring solution. This tutorial +takes you through setting up a Prometheus endpoint for a database running in a [managed service for TimescaleDB][timescale-mst]. To create a monitoring system to ingest and analyze Prometheus metrics from your managed service for TimescaleDB instance, you can use [Promscale][promscale]! -This will expose metrics from the [node_exporter][node-exporter-metrics] as well +This exposes metrics from the [node_exporter][node-exporter-metrics] as well as [pg_stats][pg-stats-metrics] metrics. ## Prerequisites @@ -22,15 +22,15 @@ integrations, pictured below. Service Integrations Menu Option -This will present you with the option to add a Prometheus integration point. +This presents you with the option to add a Prometheus integration point. Select the plus icon to add a new endpoint and give it a name of your choice. We've named ours `endpoint_dev`. 
Create a Prometheus endpoint on Timescale Cloud Furthermore, notice that you are given basic authentication information and a port number -in order to access the service. This will be used when setting up your Prometheus -installation, in the `prometheus.yml` configuration file. This will enable you to make +in order to access the service. This is used when setting up your Prometheus +installation, in the `prometheus.yml` configuration file. This enables you to make this Managed Service for TimescaleDB endpoint a target for Prometheus to scrape. Here's a sample configuration file you can use when you setup your Prometheus diff --git a/timescaledb/tutorials/simulate-iot-sensor-data.md b/timescaledb/tutorials/simulate-iot-sensor-data.md index bad28b1a9173..a3580c155457 100644 --- a/timescaledb/tutorials/simulate-iot-sensor-data.md +++ b/timescaledb/tutorials/simulate-iot-sensor-data.md @@ -19,8 +19,8 @@ try the Time-series Benchmarking Suite (TSBS) ([Github][github-tsbs]).* ## Prerequisites [](prereqs) -To complete this tutorial, you will need a cursory knowledge of the Structured Query -Language (SQL). The tutorial will walk you through each SQL command, but it will be +To complete this tutorial, you need a cursory knowledge of the Structured Query +Language (SQL). The tutorial walks you through each SQL command, but it is helpful if you've seen SQL before. To start, [install TimescaleDB][install-timescale]. Once your installation is complete, @@ -90,7 +90,7 @@ this: *Note: for the following sections we'll share the results of our queries as an example, but since the tutorial generates random data every time -it is run, your results will look different (but will be structured the +it is run, your results look different (but are structured the same way).* First, generate a dataset for all of our four sensors and insert into diff --git a/timescaledb/tutorials/telegraf-output-plugin.md b/timescaledb/tutorials/telegraf-output-plugin.md index 3c25e472958a..e3910c1734bc 100644 --- a/timescaledb/tutorials/telegraf-output-plugin.md +++ b/timescaledb/tutorials/telegraf-output-plugin.md @@ -145,7 +145,7 @@ From the configuration, you can see a few important things: The commented out parameters also show their default values. In the first example you'll set the connection parameter to a proper connection string to establish a connection to an instance of TimescaleDB or PostgreSQL. -All the other parameters will have their default values. +All the other parameters have their default values. ### Creating hypertables The plugin allows you to configure several parameters. The `table_template` @@ -166,7 +166,7 @@ each chunk holding 1 week intervals. Nothing else is needed to use the plugin with TimescaleDB. ## Running Telegraf When you run Telegraf you only need to specify the configuration file to use. In this example, the output uses loaded inputs (`cpu`) and outputs (`postgresql`) along with global tags, and the intervals with which the agent collects the data from the inputs, and flushes to the outputs. You can stop Telegraf running after ~10-15 seconds: ```bash telegraf --config telegraf.conf 2019-05-23T13:48:09Z I!
Starting Telegraf 1.13.0-with-pg @@ -210,7 +210,7 @@ the configuration by opening the file in any text editor and updating the ``` This way all metrics collected with the instance of Telegraf running with this -config will be tagged with `location="New York"`. If you run Telegraf again, +config is tagged with `location="New York"`. If you run Telegraf again, collecting the metrics in TimescaleDB, using this command: ```bash telegraf --config telegraf.conf diff --git a/timescaledb/tutorials/time-series-forecast.md b/timescaledb/tutorials/time-series-forecast.md index c53662481192..cbb897714f87 100644 --- a/timescaledb/tutorials/time-series-forecast.md +++ b/timescaledb/tutorials/time-series-forecast.md @@ -21,11 +21,11 @@ any developer. TimescaleDB is PostgreSQL for time-series data and as such, time-series data stored in TimescaleDB can be easily joined with business data in another relational database in order to develop an even more insightful forecast into how your data -(and business) will change over time. +(and business) changes over time. -In this time-series forecasting example, we will demonstrate how to integrate +In this time-series forecasting example, we demonstrate how to integrate TimescaleDB with R, Apache MADlib, and Python to perform various time-series -forecasting methods. We will be using New York City taxicab data that is also +forecasting methods. We are using New York City taxicab data that is also used in our [Hello Timescale Tutorial][hello_timescale]. The dataset contains information about all yellow cab trips in New York City in January 2016, including pickup and dropoff times, GPS coordinates, and total price of a trip. @@ -116,7 +116,7 @@ goes up during the day and down over night every day. In contrast, the price of Bitcoin over time is (probably) non-seasonal since there is no clear observable pattern that recurs in fixed time periods. -We will be using R to analyze the seasonality of the number of taxicab pickups +We are using R to analyze the seasonality of the number of taxicab pickups at Times Square over a week. The table `rides_count` contains the data needed for this section of the tutorial. @@ -155,7 +155,7 @@ SELECT * FROM rides_count; ... ``` -We will create two PostgreSQL views, `rides_count_train` and `rides_count_test` for +Create two PostgreSQL views, `rides_count_train` and `rides_count_test` for the training and testing datasets. ```sql @@ -357,7 +357,7 @@ Although R offers a rich library of statistical models, we had to import the data into R before performing calculations. With a larger dataset, this can become a bottleneck to marshal and transfer all the data to the R process (which itself might run -out of memory and start swapping). So, we will now look into +out of memory and start swapping). So, let's look into an alternative method that allows us to move our computations to the database and improve this performance. @@ -389,7 +389,7 @@ host and database. Now we can make use of MADlib's library to analyze our taxicab -dataset. Here, we will train an ARIMA model to predict the price +dataset. Here, we can train an ARIMA model to predict the price of a ride from JFK to Times Square at a given time. Let's look at the `rides_price` table. The `trip_price` column is @@ -429,7 +429,7 @@ SELECT * FROM rides_price; 2016-01-01 23:00:00 | 57.9088888888889 ``` -We will also create two tables for the training and testing datasets. +We can also create two tables for the training and testing datasets. 
We create tables instead of views here because we need to add columns to these datasets later in our time-series forecast analysis. @@ -442,7 +442,7 @@ WHERE one_hour <= '2016-01-21 23:59:59'; SELECT * INTO rides_price_test FROM rides_price WHERE one_hour >= '2016-01-22 00:00:00'; ``` -Now we will use [MADlib's ARIMA][madlib_arima] library to make forecasts +Now we can use [MADlib's ARIMA][madlib_arima] library to make forecasts on our dataset. MADlib does not yet offer a method that automatically finds the best @@ -500,7 +500,7 @@ then train the model using MADlib. You can use a combination of the options outlined in this tutorial to take advantage of the strengths and work around weaknesses of the different tools. -Using the parameters ARIMA(2,1,3) found using R, we will use MADlib's +Using the parameters ARIMA(2,1,3) found using R, we can use MADlib's `arima_train` and `arima_forecast` functions. ```sql @@ -598,7 +598,7 @@ in time-series forecasting. It is advised to create both models for a particular dataset and compare the performance to find out which is more suitable. -We will use Python to analyze how long it takes from the Financial +We can use Python to analyze how long it takes from the Financial District to Times Square at different time periods during the day. We need to install various Python packages: @@ -637,7 +637,7 @@ SELECT * FROM rides_length; ... ``` -We will also create two PostgreSQL views for the training +We can also create two PostgreSQL views for the training and testing datasets. ```sql diff --git a/timescaledb/tutorials/visualize-with-tableau.md b/timescaledb/tutorials/visualize-with-tableau.md index c9ebd610e6be..25c15a28876d 100644 --- a/timescaledb/tutorials/visualize-with-tableau.md +++ b/timescaledb/tutorials/visualize-with-tableau.md @@ -4,7 +4,7 @@ greater intelligence about your business. It is an ideal tool for visualizing data stored in [TimescaleDB][timescale-products]. -In this tutorial, we will cover: +In this tutorial, we cover: - Setting up Tableau to work with TimescaleDB - Running queries on TimescaleDB from within Tableau @@ -12,8 +12,8 @@ In this tutorial, we will cover: ### Prerequisites -To complete this tutorial, you will need a cursory knowledge of the Structured Query -Language (SQL). The tutorial will walk you through each SQL command, but it will be +To complete this tutorial, you need a cursory knowledge of the Structured Query +Language (SQL). The tutorial walks you through each SQL command, but it is helpful if you've seen SQL before. To start, [install TimescaleDB][install-timescale]. Once your installation is complete, @@ -21,9 +21,9 @@ we can proceed to ingesting or creating sample data and finishing the tutorial. Also, [get a copy or license of Tableau][get-tableau]. -You will also want to [complete the Cryptocurrency tutorial][crypto-tutorial], as it will -setup and configure the data you need to complete the remainder of this -tutorial. We will visualize many of the queries found at the end of the Cryptocurrency +You also want to [complete the Cryptocurrency tutorial][crypto-tutorial], as it +sets up and configures the data you need to complete the remainder of this +tutorial. We visualize many of the queries found at the end of the Cryptocurrency tutorial. ### Step 1: Setup Tableau to connect to TimescaleDB @@ -41,7 +41,7 @@ Let's use the built-in SQL editor in Tableau. 
To run a query, add custom SQL to by dragging and dropping the “New Custom SQL” button (in the bottom left of the Tableau desktop user interface) to the place that says ‘Drag tables here'. -A window will pop up, in which we can place a query. In this case, we will use the first +A window pops up, in which we can place a query. In this case, use the first query from the [Cryptocurrency Tutorial][crypto-tutorial]: ```sql @@ -72,7 +72,7 @@ To do this, create a new worksheet (or dashboard) and then select your desired d New worksheet in Tableau to examine time-series data In the far left pane, you'll see a section Tableau calls 'Dimensions' and 'Measures'. -Whenever you use Tableau, it will classify your fields as either dimensions or +Whenever you use Tableau, it classifies your fields as either dimensions or measures. A measure is a field that is a dependent variable, meaning its value is a function of one or more dimensions. For example, the price of an item on a given day is a measure based on which day is in question. A dimension, therefore, is an