Hands-on Lab: Working with the latest developer capabilities of Postgres 16

In this lab you will explore the new developer and infrastructure features of PostgreSQL 16.

Prerequisites

  • Perform Lab 01 steps

Exercise 1: Setup and Configuration

In this exercise you will create some tables and use the COPY command to move data into them. The data is in JSON format rather than SQL format, so the jsonb data type will be required to import it into a temporary table. You will then run some queries to transform this initial data so that you can utilize the new JSON syntax in PostgreSQL 16.

Task 1: Configure Server Parameters

You will utilize the query store and logical replication in subsequent labs. Here you will modify the server parameters to support these exercises. You are going to enable query store now as it takes a few minutes for the queries to start to be recorded.

  1. Switch to the Azure Portal.
  2. Browse to your primary PREFIX-pg-flex-REGION-16 instance or writer endpoint.
  3. Under Settings, select Server parameters.
  4. Browse for the wal_level parameter.
  5. Set the value to logical.
  6. Select Save.
  7. Select Save & Restart.
  8. Repeat the same steps for any replicas and for the PREFIX-pg-flex-REGION-14 instance.

Task 2: Create tables and data

  1. In your Windows-based lab virtual machine, open a command prompt window: in the Windows search area, type cmd and select it.

    Open the windows command prompt

  2. Run the following command to connect to your database; be sure to replace PREFIX and REGION with your lab information (optionally, you can use pgAdmin to open a psql window). On Windows, the psql client is in the C:\Program Files\PostgreSQL\16\bin directory; on Ubuntu, you can install it using sudo apt-get install postgresql-client. When prompted, enter the password (Seattle123Seattle123):

    psql -h PREFIX-pg-flex-REGION-16.postgres.database.azure.com -U s2admin -d airbnb
  3. Run the following commands to create some temp tables and import the JSON data into the server. Notice the use of JSON files with the COPY command; once the data is in a temporary table, we then do some massaging:

    NOTE: These paths are Windows based and you may need to adjust based on your environment (WSL, Linux, etc).

    DROP TABLE IF EXISTS temp_calendar;
    DROP TABLE IF EXISTS temp_listings;
    DROP TABLE IF EXISTS temp_reviews;
    
    CREATE TABLE temp_calendar (data jsonb);
    CREATE TABLE temp_listings (data jsonb);
    CREATE TABLE temp_reviews (data jsonb);
  4. Now, use the COPY command to populate the tables with data from JSON files in a public storage account.

    \COPY temp_calendar (data) FROM PROGRAM 'curl https://solliancepublicdata.blob.core.windows.net/ms-postgresql-labs/calendar.json'
    \COPY temp_listings (data) FROM PROGRAM 'curl https://solliancepublicdata.blob.core.windows.net/ms-postgresql-labs/listings.json'
    \COPY temp_reviews (data) FROM PROGRAM 'curl https://solliancepublicdata.blob.core.windows.net/ms-postgresql-labs/reviews.json'

    Results of the copy commands

  5. Run the following command to create the main tables:

    DROP TABLE IF EXISTS listings;
    DROP TABLE IF EXISTS reviews;
    DROP TABLE IF EXISTS calendar;
    
    CREATE TABLE listings (
        listing_id int,
        name varchar(50),
        street varchar(50),
        city varchar(50),
        state varchar(50),
        country varchar(50),
        zipcode varchar(50),
        bathrooms int,
        bedrooms int,
        latitude decimal(10,5), 
        longitude decimal(10,5), 
        summary varchar(2000),
        description varchar(2000),
        host_id varchar(2000),
        host_url varchar(2000),
        listing_url varchar(2000),
        room_type varchar(2000),
        amenities jsonb,
        host_verifications jsonb,
        data jsonb
    );
    
    CREATE TABLE reviews (
        id int, 
        listing_id int, 
        reviewer_id int, 
        reviewer_name varchar(50), 
        date date,
        comments varchar(2000)
    );
    
    CREATE TABLE calendar (
        listing_id int, 
        date date,
        price decimal(10,2), 
        available boolean
    );
  6. Run the following to import the data from the temp tables to the main tables:

    INSERT INTO listings
    SELECT 
        data['id']::int, 
        replace(data['name']::varchar(50), '"', ''),
        replace(data['street']::varchar(50), '"', ''),
        replace(data['city']::varchar(50), '"', ''),
        replace(data['state']::varchar(50), '"', ''),
        replace(data['country']::varchar(50), '"', ''),
        replace(data['zipcode']::varchar(50), '"', ''),
        data['bathrooms']::int,
        data['bedrooms']::int,
        data['latitude']::decimal(10,5),
        data['longitude']::decimal(10,5),
        replace(data['summary']::varchar(2000), '"', ''),
        replace(data['description']::varchar(2000), '"', ''),
        replace(data['host_id']::varchar(50), '"', ''),
        replace(data['host_url']::varchar(50), '"', ''),
        replace(data['listing_url']::varchar(50), '"', ''),
        replace(data['room_type']::varchar(50), '"', ''),
        data['amenities']::jsonb,
        data['host_verifications']::jsonb,
        data::jsonb
    FROM temp_listings;
    
    INSERT INTO reviews
    SELECT 
        data['id']::int,
        data['listing_id']::int,
        data['reviewer_id']::int,
        replace(data['reviewer_name']::varchar(50), '"', ''), 
        to_date(replace(data['date']::varchar(50), '"', ''), 'YYYY-MM-DD'),
        replace(data['comments']::varchar(2000), '"', '')
    FROM temp_reviews;
    
    INSERT INTO calendar
    SELECT 
        data['listing_id']::int,
        to_date(replace(data['date']::varchar(50), '"', ''), 'YYYY-MM-DD'),
        data['price']::decimal(10,2),
        replace(data['available']::varchar(50), '"', '')::boolean
    FROM temp_calendar;

    Results of the temp table inserts.

    NOTE: We are storing data in the tables as JSONB for lab purposes. In the real world, you may not want to do this: for normal columns, PostgreSQL maintains statistics about the distribution of values in each column of the table (most common values (MCV), NULL entries, histogram of distribution), and based on this data the query planner makes smart decisions about the plan to use. At this point, PostgreSQL does not store any stats for JSONB columns or keys, which can sometimes result in poor choices such as using nested loop joins vs. hash joins.

  7. Switch to pgAdmin.

  8. Navigate to Databases->airbnb->Schemas->public->Tables.

  9. Right-click the Tables node, select Query Tool.

    Open the Query Tool

  10. Run each of the following commands to see the imported data after its transformation. Note that we did not fully expand the JSON into all possible columns so as to show the new JSON syntax later:

    select * from listings limit 10;
    select * from reviews limit 10;
    select * from calendar limit 10;

    Results from listings table

    Results from reviews table

    Results from calendar table

Exercise 2: Developer Features

There are several developer-focused changes in PostgreSQL 16 related to SQL syntax. In this exercise you will explore several of them, including the new SQL standard JSON functions.

Task 1: Add SQL/JSON object checks

  1. In pgAdmin, run the following pre-16 commands. The -> and ->> operators are the pre-Postgres 14 way to navigate a JSON hierarchy:

    SELECT
       listing_id,
       pg_typeof(data),
       pg_typeof(data ->> 'id')
    FROM
       listings LIMIT 1;

    Results from the query

  2. The same query can also be written in Postgres 14 and higher using the bracket (subscript) notation []; note that the result is slightly different:

    SELECT
       listing_id,
       pg_typeof(data),
       pg_typeof(data['id'])
    FROM
       listings LIMIT 1;

    Results from the query

  3. In Postgres 16, you can now use the SQL standard IS JSON syntax. The IS JSON checks include checks for values, arrays, objects, scalars, and unique keys:

    SELECT
       listing_id,
       data IS JSON,
       data['id'] IS JSON
    FROM
       listings LIMIT 1;

    Results from the query

  4. Additionally, you can get more granular about the type of JSON.

    SELECT
    	data -> 'amenities' IS JSON ARRAY as amenities,
    	data -> 'host_verifications' IS JSON OBJECT as host_verifications,
    	data IS JSON as datacolumn,
    	data -> 'id' IS JSON SCALAR as id
    FROM
    	listings;

    Results from the query

  5. When combining the above, you can create intricate CASE statements based on the target type (in the event that it could be multiple types):

    SELECT
       CASE
        WHEN
            data -> 'street' IS JSON ARRAY
        THEN
            (data -> 'street')[0]
        WHEN
            data -> 'street' IS JSON OBJECT
        THEN
            data -> 'street'
        WHEN
            data IS JSON SCALAR
        THEN
            data
        ELSE
            data -> 'street'
       END
       AS primary_address
    FROM
       listings;

    Results from the query

  6. Finally, much of the basic JSON functionality that existed before PG16 is still available and can also be used. In this example, you use the containment operator (which tests whether one JSON document is contained inside another) to select data, in addition to the backwards-compatible JSON syntax. Note the usage of the "?" operator, which tests for the existence of the top-level host_is_superhost key:

    SELECT listing_id, name as listing_name, city, listings.amenities
    FROM listings
    WHERE
    listings.amenities @> '["Washer","Pets Allowed"]'
    and data -> 'host_is_superhost' ? 't';

    Results from the query are displayed.

Task 2: Exploring JSON_ARRAY, JSON_ARRAYAGG and JSON_OBJECT

In this series of steps, you will review the new functions JSON_ARRAY(), JSON_ARRAYAGG(), and JSON_OBJECT(), which are part of the SQL standard and are now available in PostgreSQL 16.

  1. In pgAdmin, run the following PostgreSQL 16 commands:

    SELECT
       json_array(data['id'], name, bedrooms, city, state)
    FROM
       listings;

    Results from the query are displayed.

    SELECT
        json_arrayagg(data['id'])
    FROM
        listings;

    Results from the query are displayed.

  2. You can also convert regular types into JSON using the JSON_OBJECT function. The following will take several data types and create a JSON object from them:

    SELECT json_object(ARRAY[1, 'a', true, row(2, 'b', false)]::TEXT[]);

    Results from the query are displayed.

  3. Additionally, you can use json_agg combined with row_to_json to convert a series of columns in a select statement into JSON:

    select 
    	bedrooms,
    	json_agg(row_to_json((name, street))) as jData
    from 
        listings
    group by 
        bedrooms

    Results from the query are displayed.

There are many other types of functions and operators in PostgreSQL that you can utilize when working with JSON data. You can reference the latest information for PG16 in the 9.16. JSON Functions and Operators documentation.
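For example, here is a small sketch (using functions and operators that predate PG16) combining a SQL/JSON path lookup with the key-existence operator against the lab's listings table; the key names are drawn from the data loaded earlier:

    SELECT
        listing_id,
        jsonb_path_query_first(data, '$.host_url') AS host_url,   -- SQL/JSON path lookup
        data ? 'host_is_superhost' AS has_superhost_key           -- top-level key existence test
    FROM
        listings
    LIMIT 5;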

Task 3: Creating Indexes

Indexes help increase query performance.

  1. Run the following query; notice the use of a Seq Scan on the table, and record the costs and execution time:

        EXPLAIN ANALYZE select *
        from listings l, calendar c
        where l.city = 'seattle'
        and l.listing_id = c.listing_id
        and l.listing_id = 241032

    Results from the query are displayed.

  2. Create an index on the listing_id column:

    CREATE INDEX listings_listing_id ON listings (listing_id);
  3. Re-run the query to see that the Sequential Scan has been replaced by an Index Scan, improving the cost and execution times:

        EXPLAIN ANALYZE select *
        from listings l, calendar c
        where l.city = 'seattle'
        and l.listing_id = c.listing_id
        and l.listing_id = 241032

    Results from the query are displayed.

Task 4: Using Full Text + GIN indexes

Although indexes on JSON data are not new to PG16 (GIN indexes have been available since 8.2, with JSON support since 9.2), they are a valuable feature to be aware of when working with PostgreSQL and JSON. GIN indexes can be used to efficiently search for keys or key/value pairs occurring within a large number of jsonb documents (datums). Two GIN "operator classes" are provided, offering different performance and flexibility trade-offs.

For information on Full Text Search, reference Full Text Search. For information on GiST and GIN indexes, reference GiST and GIN Index Types.
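As a side note, the following sketch contrasts the two jsonb GIN operator classes mentioned above using the lab's amenities column (the index names are illustrative). The default jsonb_ops class supports more operators; jsonb_path_ops builds a smaller, typically faster index but only supports containment-style (@>) and JSON path searches:

    -- default operator class (jsonb_ops)
    CREATE INDEX listings_amenities_gin ON listings USING GIN (amenities);
    -- alternative operator class, containment/path searches only
    CREATE INDEX listings_amenities_gin_path ON listings USING GIN (amenities jsonb_path_ops);

    EXPLAIN ANALYZE
    SELECT listing_id, name
    FROM listings
    WHERE amenities @> '["Washer"]';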

  1. Run the following query:

    ALTER TABLE listings 
    ADD COLUMN ts_summary tsvector
    GENERATED ALWAYS AS (to_tsvector('english', summary)) STORED;
  2. Do a text search, note the use of a Seq Scan:

    EXPLAIN ANALYZE SELECT *
    FROM listings
    WHERE ts_summary @@ to_tsquery('amazing');

    Results from the query are displayed.

  3. In pgAdmin, run the following command:

    CREATE INDEX ts_idx ON listings USING GIN (ts_summary);
  4. Re-run the query; you should now see a Bitmap Heap Scan instead of a Seq Scan:

    EXPLAIN ANALYZE SELECT *
    FROM listings
    WHERE ts_summary @@ to_tsquery('amazing');

    Results from the query are displayed.

Task 5: Aggregate function ANY_VALUE()

The ANY_VALUE() function is a new aggregate function in PostgreSQL 16 that helps simplify queries that use GROUP BY clauses. The function returns an arbitrary non-null value from a given set of values, effectively informing PostgreSQL that any value from the group is acceptable, resolving the ambiguity and allowing the query to execute successfully.

Prior to PostgreSQL 16, every non-aggregated column in the SELECT list had to be included in the GROUP BY clause as well; otherwise, PostgreSQL would throw an error.

  1. The following is an example of the pre-16 problem (it will throw an error because l.zipcode is neither aggregated nor listed in the GROUP BY clause):

    SELECT 
        l.city,
        l.zipcode as SampleZipCode,
        count(*) as ListingCount
    FROM 
        listings l
    GROUP 
        BY l.city;

    Results from the query are displayed.

  2. Modify the query to utilize the new ANY_VALUE function:

    SELECT 
        l.city,
        ANY_VALUE(l.zipcode) as SampleZipCode,
        count(*) as ListingCount
    FROM 
        listings l
    GROUP 
        BY l.city;

    Results from the query are displayed.

  3. Keep in mind that ANY_VALUE selects an arbitrary non-null item from each group; it does not behave the same as including the column in the full GROUP BY clause:

    select
        l.city,
        l.zipcode
    from 
        listings l
    group 
        by l.city, l.zipcode;

    Results from the query are displayed.

Exercise 3: COPY Features

Task 1: Allow a COPY FROM value to map to a column's DEFAULT

The new COPY FROM ... DEFAULT syntax allows a designated token in the source data to be mapped to a column's DEFAULT value during import.

NOTE: These paths below are Windows based and you may need to adjust based on your environment (WSL, Linux, etc)

  1. Using a web browser, download and review the https://solliancepublicdata.blob.core.windows.net/ms-postgresql-labs/default.csv file

  2. Notice the usage of the \D in the source data:

    Sampling of the file contents.

  3. In pgAdmin, right-click the airbnb database, select PSQL Tool.

  4. In the psql window, run the following command to import the data:

    CREATE TABLE default_test(c1 INT PRIMARY KEY, c2 TEXT DEFAULT 'the_default_value') ;
    
    \COPY default_test FROM PROGRAM 'curl https://solliancepublicdata.blob.core.windows.net/ms-postgresql-labs/default.csv' WITH (format csv, default '\D', header);
  5. Run the following command to review the results of the COPY FROM command:

    SELECT
        *
    FROM
        default_test;

    Results from the query are displayed.

Notice that every entry in the source file containing the '\D' token was converted to the DEFAULT value from the column definition.

Exercise 4: Performance Features

Task 1: Allow parallelization of FULL and internal RIGHT OUTER hash joins

In general, the more work you can do in parallel, the faster you get results, and that is the case when performing FULL and internal RIGHT OUTER hash joins. Prior to PostgreSQL 16, these joins could not be executed in parallel, so they often cost more to perform than they would with a parallel plan.

With this change, many queries you were performing using these joins will now run drastically faster.

  1. Switch to pgAdmin.

  2. Run the following commands to set up some sample tables and data on the PG16 instance:

    DROP TABLE IF EXISTS left_table;
    DROP TABLE IF EXISTS right_table;
    
    create table left_table (x int, y int);
    create table right_table (x int, y int);
    
    insert into left_table
    select (case x % 4 when 1 then null else x end), x % 10
    from generate_series(1,3000000) x;
    
    insert into right_table
    select (case x % 4 when 1 then null else x end), x % 10
    from generate_series(1,3000000) x;
  3. Ensure that your instance is enabled and configured for parallel hash joins. This is the default, but it is always worth verifying. You should see the following values:

    • parallel_tuple_cost = 0.1
    • parallel_setup_cost = 1000
    • max_parallel_workers_per_gather = 2
    • enable_parallel_hash = on
    show parallel_tuple_cost;
    show parallel_setup_cost;
    show max_parallel_workers_per_gather;
    show enable_parallel_hash;

    NOTE: If the tables are very small, the overhead of a parallel operation may exceed the cost of a non-parallel execution. The tables and row counts above should be enough to generate a Parallel Hash Full Join plan.

  4. Run the following command to see the execution plan of the select statement. The COSTS OFF option only suppresses the cost estimates in the output to keep the plan concise; it does not change which plan is chosen:

    EXPLAIN (costs off)
    SELECT count(*)
    FROM left_table lt
    FULL OUTER JOIN right_table rt
    ON lt.x = rt.x;
  5. In the execution plan, you should notice the use of a Parallel Hash Full Join.

    Execution plan with a parallel hash full join

  6. In previous versions of PostgreSQL, you would see a regular Hash Full Join.

    Execution plan with a hash full join

Full JOINs are commonly used to find the differences between 2 tables. Prior to Postgres 16, parallelism was not implemented for full hash JOINs, which made them slower to execute. (link to commit)
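As an illustration of that pattern, here is a quick sketch against the lab's left_table and right_table; rows that exist on only one side show NULL on the other (because the two tables were generated identically, the only unmatched rows here are those with NULL join keys):

    -- rows present on only one side of the join
    SELECT lt.x AS left_x, lt.y AS left_y, rt.x AS right_x, rt.y AS right_y
    FROM left_table lt
    FULL OUTER JOIN right_table rt ON lt.x = rt.x
    WHERE lt.x IS NULL OR rt.x IS NULL
    LIMIT 20;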

Task 2: Allow aggregate functions string_agg() and array_agg() to be parallelized

Aggregate functions typically perform some kind of mathematical operation on a column or set of columns. If you were to calculate several aggregates at once, you could probably imagine that doing each one in a serialized manner would likely take much longer than doing it in a parallel manner.

Not all aggregate functions have supported this type of optimization; string_agg() and array_agg() were among those that did not. PostgreSQL 16 adds this support. Per the description on the code commit, it "adds combine, serial and deserial functions for the array_agg() and string_agg() aggregate functions, thus allowing these aggregates to partake in partial aggregations. This allows both parallel aggregation to take place when these aggregates are present and also allows additional partition-wise aggregation plan shapes to include plans that require additional aggregation once the partially aggregated results from the partitions have been combined."

The following is an example of a query that performs aggregates with the two functions included. If this were to run on a pre-16 version, the query would be much slower than in version 16.

  1. In pgAdmin, run the following:

    drop table if exists agg_test;
    
    create table agg_test (x int, y int);
    
    insert into agg_test
    select (case x % 4 when 1 then null else x end), x % 10
    from generate_series(1,500000) x;
  2. Run a select statement against it to review the data generated:

    SELECT
        y,
        string_agg(x::text, ',') AS t,
        string_agg(x::text::bytea, ',') AS b,
        array_agg(x) AS a,
        array_agg(ARRAY[x]) AS aa
    FROM
        agg_test
    GROUP BY
        y;

    Results from the query are displayed.

  3. Review the EXPLAIN plan details; notice the aggregate node used and the costs:

    EXPLAIN SELECT
        y,
        string_agg(x::text, ',') AS t,
        string_agg(x::text::bytea, ',') AS b,
        array_agg(x) AS a,
        array_agg(ARRAY[x]) AS aa
    FROM
        agg_test
    GROUP BY
        y;
  4. In 16+, you will see a Finalize GroupAggregate:

    Results from the query are displayed.

  5. In pre-16 instances, you would see a HashAggregate (feel free to test on the PG14 instance):

    Results from the query are displayed.

For a more in-depth look at the code change for this feature, reference here.

Task 3: Add EXPLAIN option GENERIC_PLAN to display the generic plan for a parameterized query

Previously, getting an execution plan for a parameterized query was fairly complicated: you had to prepare the statement and then run EXPLAIN on an EXECUTE with concrete parameter values, piecing the separate results together yourself (a sketch of that workaround appears at the end of this task). The new PG16 GENERIC_PLAN option eliminates those extra steps when investigating performance issues with parameterized queries.

  1. Run the following command to attempt to get an execution plan for a parameterized query using the pre-16 method:

    EXPLAIN SELECT * FROM listings WHERE listing_id = $1;
  2. You should get an error.

    An error is displayed from the query.

  3. To get an execution plan for a parameterized query, run the following:

    EXPLAIN (GENERIC_PLAN) SELECT * FROM listings WHERE listing_id = $1;

    Results from the query are displayed.

    Note the use of parentheses. The old syntax (shown above, without parentheses) is retained only for backwards compatibility; newer options such as GENERIC_PLAN work only with the new parenthesized syntax.

As you can see above, you can use parameter placeholders like $1 instead of an unknown or variable value. However, there are certain restrictions:

  • You can use parameters only with the statements SELECT, INSERT, UPDATE, DELETE and VALUES.
  • You can only use parameters instead of constants (literals). You can’t use parameters instead of identifiers (object names) or keywords, among other things.
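For comparison, the pre-16 workaround referenced at the start of this task looked roughly like the following sketch: prepare the statement, force the generic plan with plan_cache_mode (available since PostgreSQL 12), and EXPLAIN an EXECUTE with a concrete value. The statement name and value here are illustrative:

    PREPARE listing_lookup (int) AS
        SELECT * FROM listings WHERE listing_id = $1;

    SET plan_cache_mode = force_generic_plan;  -- make the prepared statement use the generic plan
    EXPLAIN EXECUTE listing_lookup (241032);

    RESET plan_cache_mode;
    DEALLOCATE listing_lookup;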

Task 4: Using pg_stat_io for enhanced IO monitoring

pg_stat_io is a new view in Postgres 16 that displays cluster-wide statistics about I/O activity, including reads, writes, and extend operations.

Per the PostgreSQL documentation: "The pg_stat_io view will contain one row for each combination of backend type, target I/O object, and I/O context, showing cluster-wide I/O statistics. Combinations which do not make sense are omitted.

Currently, I/O on relations (e.g. tables, indexes) is tracked. However, relation I/O which bypasses shared buffers (e.g. when moving a table from one tablespace to another) is currently not tracked."

  1. Run the following commands to clear the stats and see the information available; you should see all zeros:

    select pg_stat_reset_shared('io');
    
    select * from pg_stat_io order by writes desc;

    Query results showing no activity

  2. Using pgbench, you can generate some IO data (~750MB of data). In your Windows-based lab virtual machine, open a command prompt window: in the Windows search area, type cmd and select it.

  3. Run the following command. Be sure to replace the PREFIX and REGION tokens. On Windows you can find the pgbench tool in the C:\Program Files\PostgreSQL\16\bin directory; on Ubuntu, you can install it using sudo apt-get install postgresql-contrib. When prompted, enter the Seattle123Seattle123 password:

    pgbench -i -s 50 -h PREFIX-pg-flex-REGION-16.postgres.database.azure.com -p 5432 -U s2admin -d airbnb

    NOTE: In Azure Cloud Shell, you will need to check the version to ensure it is compatible with your target version (pgbench --version).

  4. Run the pg_stat_io query again to see the newly generated IO information:

    --see client backend / bulk write context after pgbench
    select * 
    from pg_stat_io 
    order by writes desc;
  5. You should see much higher values for the client_backend backend_type:

    Query results with IO activity displayed.

  6. pg_stat_io also breaks the operations into more granular statistics via the context column. The pgbench test above generated context values in the vacuum and bulkwrite categories. Ordinary DML commands will put values into different context categories.

  7. Run the following command to create some more test data using a basic INSERT:

    insert into agg_test
    select (case x % 4 when 1 then null else x end), x % 10
    from generate_series(1,200000) x;
    
    checkpoint;
  8. Run the pg_stat_io query again to see the newly generated IO information:

    select * from pg_stat_io 
    order by writes desc;
  9. Review the backend_type of client_backend, the object of relation, the context of normal, and the extends column value. Because you were adding data to an existing table, you performed extend operations.

Some common uses for this data include:

  • Review whether high evictions are occurring; if so, shared buffers should be increased.
  • A large number of fsyncs by client backends could indicate misconfiguration of shared buffers and/or the checkpointer (see the query sketch below).
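A sketch of such a check follows; the column names come from the pg_stat_io documentation, and what counts as "high" depends on your workload:

    -- evictions and fsyncs performed by regular client backends
    SELECT backend_type, object, context, evictions, fsyncs
    FROM pg_stat_io
    WHERE backend_type = 'client backend'
    ORDER BY evictions DESC NULLS LAST;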

Exercise 5: Logical Replication

Task 1 : Setup Publication

  1. You will need to assign the REPLICATION permission in order to set up replication. Run the following on the PREFIX-pg-flex-REGION-16 server:

    ALTER ROLE s2admin WITH REPLICATION;
  2. On the PREFIX-pg-flex-REGION-16 server for the airbnb database, run the following to create a publication and add tables to it (the replication slot will be created by the subscriber in the next task). An optional check query follows the commands:

    create publication my_pub;
    
    alter publication my_pub add table listings;
    alter publication my_pub add table calendar;
    alter publication my_pub add table reviews;
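    Optionally, you can confirm which tables the publication includes by querying the pg_publication_tables system view:

    SELECT * FROM pg_publication_tables WHERE pubname = 'my_pub';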

Task 2: Setup Subscriber

  1. On the PREFIX-pg-flex-REGION-14 server for the airbnb database, run the following to set up the subscription (you should already have the tables from the lab setup). Be sure to replace the PREFIX and REGION values. An optional check query follows the command:

    CREATE SUBSCRIPTION my_pub_subscription CONNECTION 'host=PREFIX-pg-flex-REGION-16.postgres.database.azure.com port=5432 dbname=airbnb user=s2admin password=Seattle123Seattle123' PUBLICATION my_pub WITH (copy_data=true, enabled=true, create_slot=true, slot_name='my_pub_slot');
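    Optionally, you can confirm the subscription and the state of the initial table synchronization on the subscriber using the standard catalog and statistics views (not specific to PG16):

    SELECT subname, subenabled FROM pg_subscription;
    SELECT * FROM pg_stat_subscription;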

Task 3: Sync Data

  1. On the PREFIX-pg-flex-REGION-16 server, run the following to add some rows to the calendar table:

    INSERT INTO CALENDAR values (241032, '2024-01-01', 85, 't');
    INSERT INTO CALENDAR values (241032, '2024-01-02', 85, 't');
    INSERT INTO CALENDAR values (241032, '2024-01-03', 85, 't');
    INSERT INTO CALENDAR values (241032, '2024-01-04', 85, 't');
    INSERT INTO CALENDAR values (241032, '2024-01-05', 85, 't');
    INSERT INTO CALENDAR values (241032, '2024-01-06', 85, 't');
    INSERT INTO CALENDAR values (241032, '2024-01-07', 85, 't');
  2. On the PREFIX-pg-flex-REGION-14 server, run the following and notice that the rows have replicated from the 16 instance to the 14 instance:

    SELECT * 
    FROM calendar
    ORDER BY date desc
    limit 50;

    Results showing the data is being replicated.

Exercise 6: PgBouncer (Optional)

PgBouncer is a well-known and supported third-party open-source, community-developed project. It is commonly used to reduce resource overhead by managing a pool of connections to PostgreSQL, making it ideal for environments with high concurrency and frequent short-lived connections. It reduces the load on the PostgreSQL server caused by too many connections.


Task 1: Enable PgBouncer and PgBouncer Metrics

You can use PgBouncer metrics to monitor the performance of the PgBouncer process, including details for active connections, idle connections, total pooled connections, and the number of connection pools. Each metric is emitted at a 1-minute interval and has up to 93 days of history. Customers can configure alerts on the metrics and also access the new metrics dimensions to split and filter metrics data by database name. PgBouncer metrics are disabled by default. For PgBouncer metrics to work, both the server parameters pgbouncer.enabled and metrics.pgbouncer_diagnostics must be enabled. These parameters are dynamic and don't require an instance restart.

  • Browse to the Azure Portal and your PREFIX-pg-flex-REGION-16 resource.

  • Under Settings, select Server parameters.

  • Search for the pgbouncer.enabled dynamic parameters.

  • Toggle the setting to TRUE.

    Server parameters setting are shown.

  • Search for the metrics.pgbouncer_diagnostics dynamic parameters.

  • Toggle the setting to ON.

  • Select Save.

Task 2: Performance without PgBouncer

  1. Switch to the Azure Portal.

  2. Browse to the PREFIX-pg-flex-REGION-16.postgres.database.azure.com instance.

  3. Under Monitoring select Metrics.

    Select the Metrics link

  4. For the Metric, under the TRAFFIC category, select Active connections.

    Select the Active Connection under TRAFFIC

  5. Select Add metric.

  6. Under the PGBOUNCER category, select Active client connections.

  7. In the top right, select the time to be Last 30 minutes then select Apply.

    Select the Active client connections under PGBOUNCER

  8. In your Windows-based lab virtual machine, open a command prompt window: in the Windows search area, type cmd and select it.

  9. Run the following command to execute a pgbench test directly against the database server; when prompted, enter the password Seattle123Seattle123. Notice the use of the -c parameter, which will create 100 different connections; be sure to replace PREFIX and REGION with your lab information. On Windows you can find the pgbench tool in the C:\Program Files\PostgreSQL\16\bin directory; on Ubuntu, you can install it using sudo apt-get install postgresql-contrib:

    pgbench -c 100 -T 180 -h PREFIX-pg-flex-REGION-16.postgres.database.azure.com -p 5432 -U s2admin -d airbnb
  10. Switch back to the Metrics window. After a minute, you should see the active connections increase.

    Graph of active connections increasing.

  11. Stop the test or wait for it to finish.

Task 3: Performance with PgBouncer

  1. Switch back to the windows command prompt.

  2. Run the following command to execute a pgbench test against the PgBouncer instance; when prompted, enter the password Seattle123Seattle123. Notice the change of the port to the PgBouncer port of 6432; be sure to replace PREFIX and REGION with your lab information:

    pgbench -c 100 -T 180 -h PREFIX-pg-flex-REGION-16.postgres.database.azure.com -p 6432 -U s2admin -d airbnb
  3. Switch back to the metrics window. After a minute, you should see that the server active connections will max out and the PgBouncer active client connections will increase to handle the load on behalf of the server.

    Graph of active connections and active connections increasing.

Exercise 7: Other Features (Optional)

Task 1: Use new VACUUM options to improve VACUUM performance

The PostgreSQL VACUUM command is used to garbage-collect and analyze databases. It works by removing dead tuples left over by large changes to a database (such as frequently updated tables). By removing the gaps between the data, you can speed up the performance of specific operations and reclaim disk space.

One of the new VACUUM features in Postgres 16 is the ability to update the cost limit on the fly. This helps people who run large production databases that are running out of disk space faster than VACUUM can reclaim it, a situation that could take down the production system. During a VACUUM, the operation may not run as fast as it needs to because of the cost limit.

By allowing the change during the operation, you can speed up the VACUUM operation without restarting it.

These server parameters are named vacuum_cost_* and autovacuum_vacuum_cost_*. The default for vacuum_cost_limit is 200, and autovacuum_vacuum_cost_limit defaults to -1, which indicates that the regular vacuum cost limit should be used.
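You can check the current values with SHOW; a quick sketch follows (the comments reflect stock PostgreSQL defaults):

    SHOW vacuum_cost_limit;            -- 200 by default
    SHOW autovacuum_vacuum_cost_limit; -- -1 by default (falls back to vacuum_cost_limit)
    SHOW autovacuum_vacuum_cost_delay;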

Perform the following steps to see how this could potentially work:

  1. Execute the following to start a vacuum operation:

    vacuum analyze;
  2. While the operation is executing, run the following command to increase the cost limit. Note that in pre-16 versions, this change would have no effect on currently running operations; in 16, it applies to the execution in progress:

    SET vacuum_cost_limit TO 400;

    NOTE: These can also be set in the Azure Portal.

  3. Use the following command to monitor the vacuum operations:

    select schemaname,
           relname,
           n_dead_tup,
           n_live_tup,
           round(n_dead_tup::float / n_live_tup::float * 100) dead_pct,
           autovacuum_count,
           last_vacuum,
           last_autovacuum,
           last_autoanalyze,
           last_analyze
    from pg_stat_all_tables
    where n_live_tup > 0;

For more information on Azure Database for PostgreSQL Flexible Server autovacuum features read Autovacuum Tuning in Azure Database for PostgreSQL - Flexible Server.

For a more in-depth look at the code change for this feature, reference here.