Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[YSQL] Respect deadlines for index backfill #5326

Closed
jaki opened this issue Aug 5, 2020 · 2 comments
Closed

[YSQL] Respect deadlines for index backfill #5326

jaki opened this issue Aug 5, 2020 · 2 comments
Assignees
Labels
area/ysql Yugabyte SQL (YSQL)
Milestone

Comments

@jaki
Copy link
Contributor

jaki commented Aug 5, 2020

YCQL index backfill respects master to tserver RPC deadline by pausing backfill and returning state to master when deadline is near. YSQL disregards the deadline.

Make YSQL mindful of the deadline. This involves returning paging state from tserver to postgres to tserver to master.

@jaki jaki changed the title [YSQL] Have postgres [YSQL] Respect deadlines for index backfill Aug 5, 2020
@jaki jaki self-assigned this Aug 5, 2020
@jaki jaki added the area/ysql Yugabyte SQL (YSQL) label Aug 5, 2020
@emhna
Copy link
Contributor

emhna commented Oct 29, 2020

I found an issue where the query for building the index backfill stalls for over 12 hours on test size as small as 250 mb (7 million rows).

The test ran after applying the OOM fix and failure handling improvement.

The timeout flags were applied as follows:
./bin/yb-ctl create --master_flags "ysql_disable_index_backfill=false,index_backfill_rpc_timeout_ms=12000000,\”vmodule=backfill_index=3\"" --tserver_flags "ysql_disable_index_backfill=false,yb_client_admin_operation_timeout_sec=1200000"

By using large timeouts, I saw that the overall CPU usage was very low (< 3%). Without large timeouts, CPU goes up very high to 100%. In either cases, the query takes too long to complete.

jaki added a commit that referenced this issue Oct 31, 2020
Summary:

To improve YSQL index backfill failure handling, switch over to a new
design:

- Make `pg_index` permissions `indislive`, `indisready`, `indisvalid`
  the authority, and don't really use the DocDB permissions (only use
  three of them for persisting state information)
- For the backfill step, add a postgres-master RPC `BackfillIndex`

Now, `CREATE INDEX` with backfill enabled looks like

1. postgres: send create index request to master
1. master: create index table
1. master: alter index table to have index info at
   `WRITE_AND_DELETE` perm
   - don't set fully applied schema
   - set schema to `WRITE_AND_DELETE`
   - set `ALTERING` state
1. tservers: apply the alter to the indexed table
1. master: _don't_ move on to the next permission
   - don't set fully applied schema
   - keep schema at `WRITE_AND_DELETE`
1. postgres: create postgres index at `indislive` permission
1. postgres: add `indisready` permission
1. postgres: send backfill request and wait for at least
   `READ_WRITE_AND_DELETE` permissions
1. master: move on to backfilling
1. master: get safe time for read
1. tservers: get safe time
1. master: send backfill requests to tservers
1. tservers: send backfill requests to postgres
1. master: alter to success (`READ_WRITE_AND_DELETE`) or abort
   (`WRITE_AND_DELETE_WHILE_REMOVING`)
   - clear fully applied schema
   - set schema to `READ_WRITE_AND_DELETE` or
     `WRITE_AND_DELETE_WHILE_REMOVING`
   - clear `ALTERING` state
1. postgres: finish waiting and, on success, add `indisvalid` permission

If postgres dies before backfill, master isn't stuck, and a separate
request can be made to drop the half-created index.  If postgres dies
during or after backfill, we can still drop the index, but you may need
to kill any backfills in progress.

If master fails to backfill, postgres will just stop at that point and
not set the index public, so a separate request can be made to drop the
index.

Add some gflags:

- master:
  - `ysql_backfill_is_create_table_done_delay_ms`: delay after finding
    index info in table for YSQL `IsCreateTableDone` (currently helpful,
    but not foolproof, for preventing backfill from happening while a
    tserver doesn't have the index info--issue #6234)
  - `ysql_index_backfill_rpc_timeout_ms`: deadline for master to tserver
    backfill tablet rpc calls (useful to handle large tables because we
    currently don't respect the deadline and try to backfill the entire
    tablet at once--issue #5326)
- tserver:
  - `TEST_ysql_index_state_flags_update_delay_ms`: delay after
    committing `pg_index` state flag change (currently helpful, but not
    foolproof for consistency, when, in our current design, commits
    aren't guaranteed to have been seen by all tservers by the time the
    commit finishes--issue #6238)
  - `ysql_wait_until_index_permissions_timeout_ms`: timeout for
    `WaitUntilIndexPermissionsAtLeast` client function (again, useful
    for handling large tables--issue #5326)

Add some helper functions:

- `yb::ExternalMiniCluster::SetFlagOnMasters` for external minicluster
  tests (currently unused)
- `yb::pgwrapper::GetBool` for pgwrapper tests

Adjust some tests to the new backfill model:

- `PgLibPqTest.BackfillPermissions`
- `PgLibPqTest.BackfillReadTime`

Fix `BackfillIndexesForYsql` to better handle libpq connections and
results.

Other issues:

- `CREATE INDEX` can't be stopped with Ctrl+C, probably because of the
  long `YBCPgWaitUntilIndexPermissionsAtLeast` call
- An in-progress master to tserver backfill RPC will not notice a DROP
  INDEX removes index tablets
- master leader failover during backfill will not cause backfill to
  resume, unlike YCQL (issue #6218)

Close: #5325

Test Plan:

- `./yb_build.sh --cxx-test pgwrapper_pg_libpq-test --gtest_filter`
  - `PgLibPqTest.BackfillWaitBackfillTimeout`
  - `PgLibPqTest.BackfillDropAfterFail`
  - `PgLibPqTest.BackfillMasterLeaderStepdown`
  - `PgLibPqTest.BackfillDropWhileBackfilling`
  - `PgLibPqTest.BackfillAuth`

Reviewers: amitanand, mihnea

Reviewed By: mihnea

Subscribers: yql, bogdan

Differential Revision: https://phabricator.dev.yugabyte.com/D9682
jaki added a commit that referenced this issue Oct 31, 2020
Summary:
To improve YSQL index backfill failure handling, switch over to a new
design:

- Make `pg_index` permissions `indislive`, `indisready`, `indisvalid`
  the authority, and don't really use the DocDB permissions (only use
  three of them for persisting state information)
- For the backfill step, add a postgres-master RPC `BackfillIndex`

Now, `CREATE INDEX` with backfill enabled looks like

1. postgres: send create index request to master
1. master: create index table
1. master: alter index table to have index info at
   `WRITE_AND_DELETE` perm
   - don't set fully applied schema
   - set schema to `WRITE_AND_DELETE`
   - set `ALTERING` state
1. tservers: apply the alter to the indexed table
1. master: _don't_ move on to the next permission
   - don't set fully applied schema
   - keep schema at `WRITE_AND_DELETE`
1. postgres: create postgres index at `indislive` permission
1. postgres: add `indisready` permission
1. postgres: send backfill request and wait for at least
   `READ_WRITE_AND_DELETE` permissions
1. master: move on to backfilling
1. master: get safe time for read
1. tservers: get safe time
1. master: send backfill requests to tservers
1. tservers: send backfill requests to postgres
1. master: alter to success (`READ_WRITE_AND_DELETE`) or abort
   (`WRITE_AND_DELETE_WHILE_REMOVING`)
   - clear fully applied schema
   - set schema to `READ_WRITE_AND_DELETE` or
     `WRITE_AND_DELETE_WHILE_REMOVING`
   - clear `ALTERING` state
1. postgres: finish waiting and, on success, add `indisvalid` permission

If postgres dies before backfill, master isn't stuck, and a separate
request can be made to drop the half-created index.  If postgres dies
during or after backfill, we can still drop the index, but you may need
to kill any backfills in progress.

If master fails to backfill, postgres will just stop at that point and
not set the index public, so a separate request can be made to drop the
index.

Add some gflags:

- master:
  - `ysql_backfill_is_create_table_done_delay_ms`: delay after finding
    index info in table for YSQL `IsCreateTableDone` (currently helpful,
    but not foolproof, for preventing backfill from happening while a
    tserver doesn't have the index info--issue #6234)
  - `ysql_index_backfill_rpc_timeout_ms`: deadline for master to tserver
    backfill tablet rpc calls (useful to handle large tables because we
    currently don't respect the deadline and try to backfill the entire
    tablet at once--issue #5326)
- tserver:
  - `TEST_ysql_index_state_flags_update_delay_ms`: delay after
    committing `pg_index` state flag change (currently helpful, but not
    foolproof for consistency, when, in our current design, commits
    aren't guaranteed to have been seen by all tservers by the time the
    commit finishes--issue #6238)
  - `ysql_wait_until_index_permissions_timeout_ms`: timeout for
    `WaitUntilIndexPermissionsAtLeast` client function (again, useful
    for handling large tables--issue #5326)

Add some helper functions:

- `yb::ExternalMiniCluster::SetFlagOnMasters` for external minicluster
  tests (currently unused)
- `yb::pgwrapper::GetBool` for pgwrapper tests

Adjust some tests to the new backfill model:

- `PgLibPqTest.BackfillPermissions`
- `PgLibPqTest.BackfillReadTime`

Fix `BackfillIndexesForYsql` to better handle libpq connections and
results.

Other issues:

- `CREATE INDEX` can't be stopped with Ctrl+C, probably because of the
  long `YBCPgWaitUntilIndexPermissionsAtLeast` call
- An in-progress master to tserver backfill RPC will not notice a DROP
  INDEX removes index tablets
- master leader failover during backfill will not cause backfill to
  resume, unlike YCQL (issue #6218)

Depends on D9803

Test Plan:
Jenkins: rebase: 2.3

- `./yb_build.sh --cxx-test pgwrapper_pg_libpq-test --gtest_filter`
  - `PgLibPqTest.BackfillWaitBackfillTimeout`
  - `PgLibPqTest.BackfillDropAfterFail`
  - `PgLibPqTest.BackfillMasterLeaderStepdown`
  - `PgLibPqTest.BackfillDropWhileBackfilling`
  - `PgLibPqTest.BackfillAuth`

Reviewers: mihnea

Reviewed By: mihnea

Subscribers: yql, bogdan

Differential Revision: https://phabricator.dev.yugabyte.com/D9804
@jaki
Copy link
Contributor Author

jaki commented Mar 31, 2021

Why is this issue important?

  • if the deadline is too short to backfill an entire tablet, it can get into a long retry loop (150 attempts by default: index_backfill_rpc_max_retries; 10 minutes between attempts by default: index_backfill_rpc_max_delay_ms)
  • with long deadline, a lost master-tserver RPC will stall progress for the entire deadline period (default 30m)
  • with long deadline, it will be harder to implement handling an interrupt like ctrl+c
  • without chunking, master RPC tasks don't show progress of backfilling each tablet if we only send a single RPC and expect the whole tablet to be backfilled within that single request
  • without respecting deadlines, backfill processes may keep running after CREATE INDEX postgres client times out

@m-iancu m-iancu added this to the 2.7.x milestone Apr 27, 2021
m-iancu added a commit that referenced this issue Aug 19, 2021
Summary:
This is basically @nocaway 's diff with minor updates, rebased to master, and tested against the follow-up tserver changes using this infrastructure to set up pagination and throttling support for YSQL index backfill.

It enhances the BACKFILL command with a new clause containing a backfill instruction (encoded as an opaque string).
The backfill instruction defines how many rows to process (backfill) and which row key to start from (i.e. where the previous backfill call left off).
This way the tserver can configure both the per-backfill-request work (number of rows per backfill call), as well as the overall backfill rate (throttle the issuing of backfill calls).

Note that the grammar change is safe w.r.t. rolling upgrades because BACKFILL command is only used internally by the tserver to a local Postgres instance.

#### Detailed changes

**(1) Backfill Process Description.**
- A **//requesting//** tablet-server sends out a **BACKFILL** request with an `opaque string` for backfill instruction.  No code-layer needs to read this string except tablet-server
- Proxy process the **BACKFILL** statement by sending a scan request together with the `opaque string`.
- A **//processing//**  tablet-server will need to read the `opaque string` to know when to stop scanning.  Postgres / Proxy will NOT process or care about this LIMIT.
- When finishing and replying to scanning request, the **//processing//** tablet-server must send back an `opaque string` whose contents should have been updated for the followup backfill instruction.
- Postgres / Proxy layer will forward the result `opaque string` to the //**requesting**// tablet-server by replying to the BACKFILL request
- The **//requesting//** tablet-server reads the `opaque string` to decide whether or not to continue the backfill process and how to execute it.

**(2) BACKFILL Statement Syntax**
```
BACKFILL <index oid list> [ WITH <opaque instruction string> ] READ TIME <time> PARTITION <bounds>
```

**(3) BACKFILL Processing and Execution**
- Postgres parses `BACKFILL` statement.
- Postgres sends query to scan the primary table.
- Postgres insert the fetched data to index.
- When scanning and inserting Postgres passes IN/OUT parameters for the BACKFILLing operation.

**(4) BACKFILL Input / Output**
- IN parameters are sent from Postgres to Yugabyte to provide further information on how to execute BACKFILL statement. This is already done.
- This diff add an OUT parameter to send the execution result from Yugabyte to Postgres.  BACKFILL spec from tablet server will be written to this OUT param, which will be returned-value of the BACKFILL statement.
```
typedef struct PgExecOutParamValue {
  const char *bfoutput = NULL;

  // The following parameters are not yet used.
  // Detailing execution status in yugabyte.
  const char *status = NULL;
  int64_t status_code = 0;
} YBCPgExecOutParamValue;

typedef struct PgExecParameters {
  .....
  PgExecOutParam out_param = NULL;
}
```

**(5) Implementation for backfill instruction.**
- The aforementioned `opaque string` is a serialized string of a protobuf for the backfill instruction.
- Example
```
// Backfill instruction.
message PgsqlBackfillSpecPB {
  // Limit for each backfill batch. "-1" means no limit.
  optional int64 limit = 1 [default = -1];

  // Accumulated counter is used to check if the number of scanned rows has reached limit.
  optional int64 count = 2 [default = 0];

  // The next row key to read for backfill. Defined/used similar to how it is used in paging state.
  // i.e. The row key (SubDocKey = [DocKey + HybridTimestamp]) of the next row to read.
  optional bytes next_row_key = 3;
}
```

**(6) Scanning Implementation**
- Tablet server Serialize(PgsqlBackfillSpecPB) into hex string and send it together with the BACKFILL statement. See file `tablet.cc`
- Backfill instruction string will be sent back & forth between tablet-server and proxy-server during scanning. The read request and response will have a new entry for the `opaque string`.  Only tablet-server needs to read this string.
```
message PgsqlReadRequestPB {
  ....
  // Instruction for BACKFILL operation from master/table server.
  // This is the serialized-string of "message PgsqlBackfillSpecPB".
  optional bytes backfill_spec = 31;
}
message PgsqlResponsePB {
  ....
  optional bytes backfill_spec = 13;
}
```
- The PgsqlBackfillSpecPB string is not the same as PgsqlPagingStatePB.  Paging state is cursor that is advanced every 1024 rows by default and is to be read by all code-layers while backfill instruction string is created, interpreted, and processed by only tablet-server, and it represents the state of a backfill process and not the state of scanning.
- At the end of scanning, either end-of-tablet or limit-is-reached, the server will need to send appropriate PgsqlBackfillSpecPB as a serialized-hex-string for the next batch of backfill.  And postgres will forward this string back to the server who requests the BACKFILL.

**(7) Writing Implementation **
This remains the same.

Test Plan: Tested together with https://phabricator.dev.yugabyte.com/D12425 ([#7889, #5326] YBASE: Implement chunking/throttling in Tablet::BackfillIndexForYSQL)

Reviewers: neil, amitanand, jason

Reviewed By: jason

Subscribers: yql

Differential Revision: https://phabricator.dev.yugabyte.com/D12435
amitanandaiyer added a commit that referenced this issue Aug 20, 2021
…llIndexForYSQL

Summary:
Depends on https://phabricator.dev.yugabyte.com/D12435

Implement chunking/throttling in Tablet::BackfillIndexForYSQL
* At a high level, we loop around in Tablet::BackfillIndexForYSQL to call `BACKFILL INDEX` on postgres until the backfill is done, or we are approaching the deadline.
* The loop also throttles/sleeps itself to maintain the specified rate of backfill.

Notes regarding rolling-upgrades:
  * The diff assumes that the TServer issuing the new format BACKFILL INDEX command only issues it to a postgres process, which understands the new format. This holds true, because the postgres process that the TServer talks to is the local postgres process.

  * Postgres/yb-client will read from the "leader" of the tablet. If the leadership has moved, this may be a different tablet/TServer. In this case, it may read/write from/to a TServer with a different version. In this case, the backfill_spec field is ignored by the server running the older process. This will result in the "old" behavior where the BACKFILL INDEX command will run until the end of the tablet range.

Perf notes:
  Connecting to postgres to run a BACKFILL INDEX command may have some fixed set up costs. So having a very small batch size could be inefficient.

More detailed description of the changes:

------------
Docdb:
# Implement support for backfill_spec.
# Set it accordingly while calling BACKFILL INDEX
# Parse/update backfill_spec accordingly while reading pgsql_operation for a backfill read.
# Loop around BACKFILL INDEX to implement throttling/progress-in-batches

Test Plan:
Tests added.

ybd --cxx-test pg_index_backfill-test
ybd --cxx-test integration-tests_cassandra_cpp_driver-test

Reviewers: mihnea, neil, jason

Reviewed By: jason

Subscribers: bogdan, yql, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D12425
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ysql Yugabyte SQL (YSQL)
Projects
None yet
Development

No branches or pull requests

5 participants