-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[YSQL] Respect deadlines for index backfill #5326
Comments
I found an issue where the query for building the index backfill stalls for over 12 hours on test size as small as 250 mb (7 million rows). The test ran after applying the OOM fix and failure handling improvement. The timeout flags were applied as follows: By using large timeouts, I saw that the overall CPU usage was very low (< 3%). Without large timeouts, CPU goes up very high to 100%. In either cases, the query takes too long to complete. |
Summary: To improve YSQL index backfill failure handling, switch over to a new design: - Make `pg_index` permissions `indislive`, `indisready`, `indisvalid` the authority, and don't really use the DocDB permissions (only use three of them for persisting state information) - For the backfill step, add a postgres-master RPC `BackfillIndex` Now, `CREATE INDEX` with backfill enabled looks like 1. postgres: send create index request to master 1. master: create index table 1. master: alter index table to have index info at `WRITE_AND_DELETE` perm - don't set fully applied schema - set schema to `WRITE_AND_DELETE` - set `ALTERING` state 1. tservers: apply the alter to the indexed table 1. master: _don't_ move on to the next permission - don't set fully applied schema - keep schema at `WRITE_AND_DELETE` 1. postgres: create postgres index at `indislive` permission 1. postgres: add `indisready` permission 1. postgres: send backfill request and wait for at least `READ_WRITE_AND_DELETE` permissions 1. master: move on to backfilling 1. master: get safe time for read 1. tservers: get safe time 1. master: send backfill requests to tservers 1. tservers: send backfill requests to postgres 1. master: alter to success (`READ_WRITE_AND_DELETE`) or abort (`WRITE_AND_DELETE_WHILE_REMOVING`) - clear fully applied schema - set schema to `READ_WRITE_AND_DELETE` or `WRITE_AND_DELETE_WHILE_REMOVING` - clear `ALTERING` state 1. postgres: finish waiting and, on success, add `indisvalid` permission If postgres dies before backfill, master isn't stuck, and a separate request can be made to drop the half-created index. If postgres dies during or after backfill, we can still drop the index, but you may need to kill any backfills in progress. If master fails to backfill, postgres will just stop at that point and not set the index public, so a separate request can be made to drop the index. Add some gflags: - master: - `ysql_backfill_is_create_table_done_delay_ms`: delay after finding index info in table for YSQL `IsCreateTableDone` (currently helpful, but not foolproof, for preventing backfill from happening while a tserver doesn't have the index info--issue #6234) - `ysql_index_backfill_rpc_timeout_ms`: deadline for master to tserver backfill tablet rpc calls (useful to handle large tables because we currently don't respect the deadline and try to backfill the entire tablet at once--issue #5326) - tserver: - `TEST_ysql_index_state_flags_update_delay_ms`: delay after committing `pg_index` state flag change (currently helpful, but not foolproof for consistency, when, in our current design, commits aren't guaranteed to have been seen by all tservers by the time the commit finishes--issue #6238) - `ysql_wait_until_index_permissions_timeout_ms`: timeout for `WaitUntilIndexPermissionsAtLeast` client function (again, useful for handling large tables--issue #5326) Add some helper functions: - `yb::ExternalMiniCluster::SetFlagOnMasters` for external minicluster tests (currently unused) - `yb::pgwrapper::GetBool` for pgwrapper tests Adjust some tests to the new backfill model: - `PgLibPqTest.BackfillPermissions` - `PgLibPqTest.BackfillReadTime` Fix `BackfillIndexesForYsql` to better handle libpq connections and results. Other issues: - `CREATE INDEX` can't be stopped with Ctrl+C, probably because of the long `YBCPgWaitUntilIndexPermissionsAtLeast` call - An in-progress master to tserver backfill RPC will not notice a DROP INDEX removes index tablets - master leader failover during backfill will not cause backfill to resume, unlike YCQL (issue #6218) Close: #5325 Test Plan: - `./yb_build.sh --cxx-test pgwrapper_pg_libpq-test --gtest_filter` - `PgLibPqTest.BackfillWaitBackfillTimeout` - `PgLibPqTest.BackfillDropAfterFail` - `PgLibPqTest.BackfillMasterLeaderStepdown` - `PgLibPqTest.BackfillDropWhileBackfilling` - `PgLibPqTest.BackfillAuth` Reviewers: amitanand, mihnea Reviewed By: mihnea Subscribers: yql, bogdan Differential Revision: https://phabricator.dev.yugabyte.com/D9682
Summary: To improve YSQL index backfill failure handling, switch over to a new design: - Make `pg_index` permissions `indislive`, `indisready`, `indisvalid` the authority, and don't really use the DocDB permissions (only use three of them for persisting state information) - For the backfill step, add a postgres-master RPC `BackfillIndex` Now, `CREATE INDEX` with backfill enabled looks like 1. postgres: send create index request to master 1. master: create index table 1. master: alter index table to have index info at `WRITE_AND_DELETE` perm - don't set fully applied schema - set schema to `WRITE_AND_DELETE` - set `ALTERING` state 1. tservers: apply the alter to the indexed table 1. master: _don't_ move on to the next permission - don't set fully applied schema - keep schema at `WRITE_AND_DELETE` 1. postgres: create postgres index at `indislive` permission 1. postgres: add `indisready` permission 1. postgres: send backfill request and wait for at least `READ_WRITE_AND_DELETE` permissions 1. master: move on to backfilling 1. master: get safe time for read 1. tservers: get safe time 1. master: send backfill requests to tservers 1. tservers: send backfill requests to postgres 1. master: alter to success (`READ_WRITE_AND_DELETE`) or abort (`WRITE_AND_DELETE_WHILE_REMOVING`) - clear fully applied schema - set schema to `READ_WRITE_AND_DELETE` or `WRITE_AND_DELETE_WHILE_REMOVING` - clear `ALTERING` state 1. postgres: finish waiting and, on success, add `indisvalid` permission If postgres dies before backfill, master isn't stuck, and a separate request can be made to drop the half-created index. If postgres dies during or after backfill, we can still drop the index, but you may need to kill any backfills in progress. If master fails to backfill, postgres will just stop at that point and not set the index public, so a separate request can be made to drop the index. Add some gflags: - master: - `ysql_backfill_is_create_table_done_delay_ms`: delay after finding index info in table for YSQL `IsCreateTableDone` (currently helpful, but not foolproof, for preventing backfill from happening while a tserver doesn't have the index info--issue #6234) - `ysql_index_backfill_rpc_timeout_ms`: deadline for master to tserver backfill tablet rpc calls (useful to handle large tables because we currently don't respect the deadline and try to backfill the entire tablet at once--issue #5326) - tserver: - `TEST_ysql_index_state_flags_update_delay_ms`: delay after committing `pg_index` state flag change (currently helpful, but not foolproof for consistency, when, in our current design, commits aren't guaranteed to have been seen by all tservers by the time the commit finishes--issue #6238) - `ysql_wait_until_index_permissions_timeout_ms`: timeout for `WaitUntilIndexPermissionsAtLeast` client function (again, useful for handling large tables--issue #5326) Add some helper functions: - `yb::ExternalMiniCluster::SetFlagOnMasters` for external minicluster tests (currently unused) - `yb::pgwrapper::GetBool` for pgwrapper tests Adjust some tests to the new backfill model: - `PgLibPqTest.BackfillPermissions` - `PgLibPqTest.BackfillReadTime` Fix `BackfillIndexesForYsql` to better handle libpq connections and results. Other issues: - `CREATE INDEX` can't be stopped with Ctrl+C, probably because of the long `YBCPgWaitUntilIndexPermissionsAtLeast` call - An in-progress master to tserver backfill RPC will not notice a DROP INDEX removes index tablets - master leader failover during backfill will not cause backfill to resume, unlike YCQL (issue #6218) Depends on D9803 Test Plan: Jenkins: rebase: 2.3 - `./yb_build.sh --cxx-test pgwrapper_pg_libpq-test --gtest_filter` - `PgLibPqTest.BackfillWaitBackfillTimeout` - `PgLibPqTest.BackfillDropAfterFail` - `PgLibPqTest.BackfillMasterLeaderStepdown` - `PgLibPqTest.BackfillDropWhileBackfilling` - `PgLibPqTest.BackfillAuth` Reviewers: mihnea Reviewed By: mihnea Subscribers: yql, bogdan Differential Revision: https://phabricator.dev.yugabyte.com/D9804
Why is this issue important?
|
Summary: This is basically @nocaway 's diff with minor updates, rebased to master, and tested against the follow-up tserver changes using this infrastructure to set up pagination and throttling support for YSQL index backfill. It enhances the BACKFILL command with a new clause containing a backfill instruction (encoded as an opaque string). The backfill instruction defines how many rows to process (backfill) and which row key to start from (i.e. where the previous backfill call left off). This way the tserver can configure both the per-backfill-request work (number of rows per backfill call), as well as the overall backfill rate (throttle the issuing of backfill calls). Note that the grammar change is safe w.r.t. rolling upgrades because BACKFILL command is only used internally by the tserver to a local Postgres instance. #### Detailed changes **(1) Backfill Process Description.** - A **//requesting//** tablet-server sends out a **BACKFILL** request with an `opaque string` for backfill instruction. No code-layer needs to read this string except tablet-server - Proxy process the **BACKFILL** statement by sending a scan request together with the `opaque string`. - A **//processing//** tablet-server will need to read the `opaque string` to know when to stop scanning. Postgres / Proxy will NOT process or care about this LIMIT. - When finishing and replying to scanning request, the **//processing//** tablet-server must send back an `opaque string` whose contents should have been updated for the followup backfill instruction. - Postgres / Proxy layer will forward the result `opaque string` to the //**requesting**// tablet-server by replying to the BACKFILL request - The **//requesting//** tablet-server reads the `opaque string` to decide whether or not to continue the backfill process and how to execute it. **(2) BACKFILL Statement Syntax** ``` BACKFILL <index oid list> [ WITH <opaque instruction string> ] READ TIME <time> PARTITION <bounds> ``` **(3) BACKFILL Processing and Execution** - Postgres parses `BACKFILL` statement. - Postgres sends query to scan the primary table. - Postgres insert the fetched data to index. - When scanning and inserting Postgres passes IN/OUT parameters for the BACKFILLing operation. **(4) BACKFILL Input / Output** - IN parameters are sent from Postgres to Yugabyte to provide further information on how to execute BACKFILL statement. This is already done. - This diff add an OUT parameter to send the execution result from Yugabyte to Postgres. BACKFILL spec from tablet server will be written to this OUT param, which will be returned-value of the BACKFILL statement. ``` typedef struct PgExecOutParamValue { const char *bfoutput = NULL; // The following parameters are not yet used. // Detailing execution status in yugabyte. const char *status = NULL; int64_t status_code = 0; } YBCPgExecOutParamValue; typedef struct PgExecParameters { ..... PgExecOutParam out_param = NULL; } ``` **(5) Implementation for backfill instruction.** - The aforementioned `opaque string` is a serialized string of a protobuf for the backfill instruction. - Example ``` // Backfill instruction. message PgsqlBackfillSpecPB { // Limit for each backfill batch. "-1" means no limit. optional int64 limit = 1 [default = -1]; // Accumulated counter is used to check if the number of scanned rows has reached limit. optional int64 count = 2 [default = 0]; // The next row key to read for backfill. Defined/used similar to how it is used in paging state. // i.e. The row key (SubDocKey = [DocKey + HybridTimestamp]) of the next row to read. optional bytes next_row_key = 3; } ``` **(6) Scanning Implementation** - Tablet server Serialize(PgsqlBackfillSpecPB) into hex string and send it together with the BACKFILL statement. See file `tablet.cc` - Backfill instruction string will be sent back & forth between tablet-server and proxy-server during scanning. The read request and response will have a new entry for the `opaque string`. Only tablet-server needs to read this string. ``` message PgsqlReadRequestPB { .... // Instruction for BACKFILL operation from master/table server. // This is the serialized-string of "message PgsqlBackfillSpecPB". optional bytes backfill_spec = 31; } message PgsqlResponsePB { .... optional bytes backfill_spec = 13; } ``` - The PgsqlBackfillSpecPB string is not the same as PgsqlPagingStatePB. Paging state is cursor that is advanced every 1024 rows by default and is to be read by all code-layers while backfill instruction string is created, interpreted, and processed by only tablet-server, and it represents the state of a backfill process and not the state of scanning. - At the end of scanning, either end-of-tablet or limit-is-reached, the server will need to send appropriate PgsqlBackfillSpecPB as a serialized-hex-string for the next batch of backfill. And postgres will forward this string back to the server who requests the BACKFILL. **(7) Writing Implementation ** This remains the same. Test Plan: Tested together with https://phabricator.dev.yugabyte.com/D12425 ([#7889, #5326] YBASE: Implement chunking/throttling in Tablet::BackfillIndexForYSQL) Reviewers: neil, amitanand, jason Reviewed By: jason Subscribers: yql Differential Revision: https://phabricator.dev.yugabyte.com/D12435
…llIndexForYSQL Summary: Depends on https://phabricator.dev.yugabyte.com/D12435 Implement chunking/throttling in Tablet::BackfillIndexForYSQL * At a high level, we loop around in Tablet::BackfillIndexForYSQL to call `BACKFILL INDEX` on postgres until the backfill is done, or we are approaching the deadline. * The loop also throttles/sleeps itself to maintain the specified rate of backfill. Notes regarding rolling-upgrades: * The diff assumes that the TServer issuing the new format BACKFILL INDEX command only issues it to a postgres process, which understands the new format. This holds true, because the postgres process that the TServer talks to is the local postgres process. * Postgres/yb-client will read from the "leader" of the tablet. If the leadership has moved, this may be a different tablet/TServer. In this case, it may read/write from/to a TServer with a different version. In this case, the backfill_spec field is ignored by the server running the older process. This will result in the "old" behavior where the BACKFILL INDEX command will run until the end of the tablet range. Perf notes: Connecting to postgres to run a BACKFILL INDEX command may have some fixed set up costs. So having a very small batch size could be inefficient. More detailed description of the changes: ------------ Docdb: # Implement support for backfill_spec. # Set it accordingly while calling BACKFILL INDEX # Parse/update backfill_spec accordingly while reading pgsql_operation for a backfill read. # Loop around BACKFILL INDEX to implement throttling/progress-in-batches Test Plan: Tests added. ybd --cxx-test pg_index_backfill-test ybd --cxx-test integration-tests_cassandra_cpp_driver-test Reviewers: mihnea, neil, jason Reviewed By: jason Subscribers: bogdan, yql, ybase Differential Revision: https://phabricator.dev.yugabyte.com/D12425
YCQL index backfill respects master to tserver RPC deadline by pausing backfill and returning state to master when deadline is near. YSQL disregards the deadline.
Make YSQL mindful of the deadline. This involves returning paging state from tserver to postgres to tserver to master.
The text was updated successfully, but these errors were encountered: