Merge #76012 #76215 #76358

76012: server, sql: add VIEWCLUSTERSETTING user privilege r=koorosh a=koorosh

Before, only users with the `admin` role or the `MODIFYCLUSTERSETTING` permission could view cluster settings. Now a new role option is added that gives users view-only access to cluster settings from the SQL shell and in the DB Console (Advanced debugging > Cluster settings).

This change does not alter the behavior of the `MODIFYCLUSTERSETTING` option, which still allows users to both view and modify cluster settings.

Release note (sql change): new user privileges are added: `VIEWCLUSTERSETTING` and `NOVIEWCLUSTERSETTING`. `VIEWCLUSTERSETTING` allows users to view, but not modify, cluster settings.

Resolves: #74692

76215: kvserver: loosely couple raft log truncation r=tbg a=sumeerbhola

In the ReplicasStorage design we stop making any assumptions regarding what is durable in the state machine when syncing a batch that commits changes to the raft log. This implies the need to make raft log truncation more loosely coupled than it is now, since we can truncate only when certain that the state machine is durable up to the truncation index.

Current raft log truncation flows through raft, and even though the RaftTruncatedStateKey is not a replicated key, it is coupled in the sense that the truncation is done below raft when processing the corresponding log entry (the one that asked for the truncation to be done).

The current setup also has correctness issues with respect to maintaining the raft log size when passing the delta bytes for a truncation. We compute the delta at proposal time (to avoid repeating iteration over the entries in all replicas), but we do not pass the first index corresponding to the truncation, so gaps or overlaps cannot be noticed at truncation time.

We do want to continue to have the raft leader guide the truncation, since we do not want either the leader or the followers to over-truncate, given our desire to serve snapshots from any replica.
In the loosely coupled approach implemented here, the truncation request that flows through raft serves as an upper bound on what can be truncated.

The truncation request includes an ExpectedFirstIndex. This is further propagated using ReplicatedEvalResult.RaftExpectedFirstIndex. The ExpectedFirstIndex makes it possible to notice gaps or overlaps when enacting a sequence of truncations; when one is noticed, Replica.raftLogSizeTrusted is set to false. The correctness issue with Replica.raftLogSize is not fully addressed, since there are existing consistency issues when evaluating a TruncateLogRequest (these are now noted in a code comment).

Below raft, the truncation requests are queued onto a Replica in pendingLogTruncations. The queueing and dequeuing is managed by a raftLogTruncator, which takes care of merging pending truncation requests and enacting the truncations when the durability of the state machine advances.

The pending truncation requests are taken into account in the raftLogQueue when deciding whether to do another truncation. Most of the behavior of the raftLogQueue is unchanged.

The new behavior is gated on a LooselyCoupledRaftLogTruncation cluster version. Additionally, the new behavior can be turned off using the kv.raft_log.loosely_coupled_truncation.enabled cluster setting, which is true by default. The latter is expected to be a safety switch for one release, after which we expect to remove it. That removal will also clean up some duplicated code (that was non-trivial to refactor and share) between the previous coupled and the new loosely coupled truncation.

Note: this PR is the first of two. Loosely coupled truncation is turned off via a constant in this PR. The next one will eliminate the constant and put it under the control of the cluster setting.

Informs #36262
Informs #16624

Release note (ops change): The cluster setting kv.raft_log.loosely_coupled_truncation.enabled can be used to disable loosely coupled truncation.
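The queueing and ExpectedFirstIndex check described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the code in this PR: the types `pendingTruncation` and `replicaLog` and the method names are hypothetical, merging of pending requests is omitted, and the real raftLogTruncator handles locking and storage that are ignored here.

```go
package main

import "fmt"

// pendingTruncation is a hypothetical, simplified stand-in for a queued
// truncation request. It is not the actual CockroachDB type.
type pendingTruncation struct {
	expectedFirstIndex uint64 // first log index the proposer assumed
	index              uint64 // truncate entries up to and including this index
	logDeltaBytes      int64  // size delta computed at proposal time (negative)
}

// replicaLog models only the state needed for this sketch.
type replicaLog struct {
	firstIndex  uint64 // first index still present in the raft log
	sizeBytes   int64
	sizeTrusted bool
	pending     []pendingTruncation
}

// enqueue records a truncation that came out of raft. Nothing is truncated
// yet: the request is only an upper bound on what may be truncated later.
func (r *replicaLog) enqueue(t pendingTruncation) {
	r.pending = append(r.pending, t)
}

// durabilityAdvanced enacts, in order, every queued truncation whose index
// is covered by the durable state machine index.
func (r *replicaLog) durabilityAdvanced(durableIndex uint64) {
	enacted := 0
	for _, t := range r.pending {
		if t.index > durableIndex {
			break // state machine is not yet durable up to this truncation
		}
		enacted++
		if t.index < r.firstIndex {
			continue // the log is already truncated past this point
		}
		// A mismatch means a gap or overlap relative to what the proposer
		// assumed, so the proposal-time delta cannot be trusted.
		if t.expectedFirstIndex != r.firstIndex {
			r.sizeTrusted = false
		}
		r.firstIndex = t.index + 1
		r.sizeBytes += t.logDeltaBytes
	}
	r.pending = r.pending[enacted:]
}

func main() {
	r := &replicaLog{firstIndex: 10, sizeBytes: 1000, sizeTrusted: true}
	r.enqueue(pendingTruncation{expectedFirstIndex: 10, index: 19, logDeltaBytes: -400})
	// This request assumed the log would start at 25, but it will start at 20.
	r.enqueue(pendingTruncation{expectedFirstIndex: 25, index: 29, logDeltaBytes: -200})

	r.durabilityAdvanced(15) // nothing is durable up to index 19 yet
	fmt.Println(r.firstIndex, r.sizeBytes, r.sizeTrusted, len(r.pending))

	r.durabilityAdvanced(30) // both truncations can now be enacted
	fmt.Println(r.firstIndex, r.sizeBytes, r.sizeTrusted, len(r.pending))
}
```

In this model the second request is still enacted, but because its expected first index does not match the actual first index, the tracked size is marked untrusted rather than silently accumulating an incorrect delta.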
76358: sql: support partitioned hash sharded index r=chengxiong-ruan a=chengxiong-ruan

Release note (sql change): Previously, crdb blocked users from creating hash sharded indexes in all kinds of partitioned tables, including implicitly partitioned tables using `PARTITION ALL BY` or `REGIONAL BY ROW`. Support for hash sharded indexes is now enabled in implicitly partitioned tables. Explicit partitioning is still not supported: a primary key cannot be hash sharded if the table is explicitly partitioned with `PARTITION BY`, and an index cannot be hash sharded if the index is explicitly partitioned with `PARTITION BY`. Partitioning columns also cannot be placed explicitly as key columns of a hash sharded index, including a regional-by-row table's `crdb_region` column. When a hash sharded index is partitioned, ranges are pre-split within every possible partition on shard boundaries. Each partition is split into at most 16 ranges; when the bucket count is smaller than 16, it is split into bucket-count ranges.

Co-authored-by: Andrii Vorobiov <and.vorobiov@gmail.com>
Co-authored-by: sumeerbhola <sumeer@cockroachlabs.com>
Co-authored-by: Chengxiong Ruan <chengxiongruan@gmail.com>
cockroachdb · Feb 22, 2022 · 67c8277
4 parents 96d102a + a12895f + f9dee66 + e7caa94
commit 67c8277
Showing 59 changed files with 6,697 additions and 273 deletions.
diff --git a/docs/generated/settings/settings-for-tenants.txt b/docs/generated/settings/settings-for-tenants.txt
@@ -181,4 +181,4 @@ trace.debug.enable	boolean	false	if set, traces for recent requests can be seen
 trace.jaeger.agent	string		the address of a Jaeger agent to receive traces using the Jaeger UDP Thrift protocol, as <host>:<port>. If no port is specified, 6381 will be used.
 trace.opentelemetry.collector	string		address of an OpenTelemetry trace collector to receive traces using the otel gRPC protocol, as <host>:<port>. If no port is specified, 4317 will be used.
 trace.zipkin.collector	string		the address of a Zipkin instance to receive traces, as <host>:<port>. If no port is specified, 9411 will be used.
-version	version	21.2-78	set the active cluster version in the format '<major>.<minor>'
+version	version	21.2-80	set the active cluster version in the format '<major>.<minor>'
diff --git a/docs/generated/settings/settings.html b/docs/generated/settings/settings.html
@@ -194,6 +194,6 @@
 <tr><td><code>trace.jaeger.agent</code></td><td>string</td><td><code></code></td><td>the address of a Jaeger agent to receive traces using the Jaeger UDP Thrift protocol, as <host>:<port>. If no port is specified, 6381 will be used.</td></tr>
 <tr><td><code>trace.opentelemetry.collector</code></td><td>string</td><td><code></code></td><td>address of an OpenTelemetry trace collector to receive traces using the otel gRPC protocol, as <host>:<port>. If no port is specified, 4317 will be used.</td></tr>
 <tr><td><code>trace.zipkin.collector</code></td><td>string</td><td><code></code></td><td>the address of a Zipkin instance to receive traces, as <host>:<port>. If no port is specified, 9411 will be used.</td></tr>
-<tr><td><code>version</code></td><td>version</td><td><code>21.2-78</code></td><td>set the active cluster version in the format '<major>.<minor>'</td></tr>
+<tr><td><code>version</code></td><td>version</td><td><code>21.2-80</code></td><td>set the active cluster version in the format '<major>.<minor>'</td></tr>
 </tbody>
 </table>
diff --git a/docs/generated/sql/bnf/stmt_block.bnf b/docs/generated/sql/bnf/stmt_block.bnf
@@ -1103,6 +1103,7 @@ unreserved_keyword ::=
 	| 'NOSQLLOGIN'
 	| 'NOVIEWACTIVITY'
 	| 'NOVIEWACTIVITYREDACTED'
+	| 'NOVIEWCLUSTERSETTING'
 	| 'NOWAIT'
 	| 'NULLS'
 	| 'IGNORE_FOREIGN_KEYS'
@@ -1270,6 +1271,7 @@ unreserved_keyword ::=
 	| 'VIEW'
 	| 'VIEWACTIVITY'
 	| 'VIEWACTIVITYREDACTED'
+	| 'VIEWCLUSTERSETTING'
 	| 'VISIBLE'
 	| 'VOTERS'
 	| 'WITHIN'
@@ -2497,6 +2499,8 @@ role_option ::=
 	| 'NOMODIFYCLUSTERSETTING'
 	| 'SQLLOGIN'
 	| 'NOSQLLOGIN'
+	| 'VIEWCLUSTERSETTING'
+	| 'NOVIEWCLUSTERSETTING'
 	| password_clause
 	| valid_until_clause
 

diff --git a/pkg/ccl/logictestccl/testdata/logic_test/partitioning_all_by_nothing b/pkg/ccl/logictestccl/testdata/logic_test/partitioning_all_by_nothing
@@ -3,7 +3,7 @@
 statement ok
 SET experimental_enable_implicit_column_partitioning = true
 
-statement error cannot define PARTITION BY on an unique constraint if the table has a PARTITION ALL BY definition
+statement error cannot define PARTITION BY on an index if the table is implicitly partitioned with PARTITION ALL BY or LOCALITY REGIONAL BY ROW definition
 CREATE TABLE partition_all_by_nothing_with_partition (
   pk INT PRIMARY KEY,
   a INT,

diff --git a/pkg/ccl/logictestccl/testdata/logic_test/partitioning_hash_sharded_index b/pkg/ccl/logictestccl/testdata/logic_test/partitioning_hash_sharded_index
@@ -0,0 +1,287 @@
+# LogicTest: 5node
+
+statement ok
+SET experimental_enable_hash_sharded_indexes = true;
+
+statement ok
+SET experimental_enable_implicit_column_partitioning = true;
+
+statement ok
+CREATE TABLE t_hashed (
+  a INT PRIMARY KEY,
+  b STRING,
+  c INT,
+  INDEX idx_t_hashed_b_c (b, c) USING HASH
+);
+
+statement error cannot set explicit partitioning with ALTER INDEX PARTITION BY on a hash sharded index
+ALTER INDEX idx_t_hashed_b_c PARTITION BY LIST (b) (
+  PARTITION us_west VALUES IN (('seattle')),
+  PARTITION us_east VALUES IN (('new york'))
+);
+
+statement ok
+CREATE TABLE t_pk_hashed (
+  a STRING,
+  b INT,
+  PRIMARY KEY (a, b) USING HASH
+);
+
+statement error cannot set explicit partitioning with PARTITION BY on hash sharded primary key
+ALTER TABLE t_pk_hashed PARTITION BY LIST (b) (
+  PARTITION us_west VALUES IN (('seattle')),
+  PARTITION us_east VALUES IN (('new york'))
+);
+
+statement ok
+CREATE TABLE t_partition_all (
+  a INT PRIMARY KEY,
+  b STRING NOT NULL,
+  c INT
+) PARTITION ALL BY LIST (b) (
+   PARTITION us_west VALUES IN (('seattle')),
+   PARTITION us_east VALUES IN (('new york'))
+);
+
+statement error hash sharded indexes cannot include implicit partitioning columns from "PARTITION ALL BY" or "LOCALITY REGIONAL BY ROW"
+CREATE INDEX ON t_partition_all (b, c) USING HASH;
+
+statement error hash sharded indexes cannot include implicit partitioning columns from "PARTITION ALL BY" or "LOCALITY REGIONAL BY ROW"
+CREATE UNIQUE INDEX ON t_partition_all (b, c) USING HASH;
+
+statement error hash sharded indexes cannot include implicit partitioning columns from "PARTITION ALL BY" or "LOCALITY REGIONAL BY ROW"
+ALTER TABLE t_partition_all ALTER PRIMARY KEY USING COLUMNS (b) USING HASH;
+
+statement error hash sharded indexes cannot be explicitly partitioned
+CREATE TABLE t_pk_hashed_bad (
+  a STRING PRIMARY KEY USING HASH,
+  b INT
+) PARTITION BY LIST (a) (
+   PARTITION us_west VALUES IN (('seattle')),
+   PARTITION us_east VALUES IN (('new york'))
+);
+
+statement error hash sharded indexes cannot be explicitly partitioned
+CREATE TABLE t_pk_hashed_bad (
+  a STRING,
+  b INT,
+  PRIMARY KEY (a) USING HASH
+) PARTITION BY LIST (a) (
+   PARTITION us_west VALUES IN (('seattle')),
+   PARTITION us_east VALUES IN (('new york'))
+);
+
+statement error hash sharded indexes cannot be explicitly partitioned
+CREATE TABLE t_idx_hashed_bad (
+  a INT PRIMARY KEY,
+  b STRING,
+  c INT,
+  INDEX (b, c) USING HASH PARTITION BY LIST (b) (
+    PARTITION us_west VALUES IN (('seattle')),
+    PARTITION us_east VALUES IN (('new york'))
+  )
+);
+
+statement error hash sharded indexes cannot include implicit partitioning columns from "PARTITION ALL BY" or "LOCALITY REGIONAL BY ROW"
+CREATE TABLE t_idx_hashed_bad (
+  a INT PRIMARY KEY,
+  b STRING,
+  c INT,
+  INDEX (b, c) USING HASH
+) PARTITION ALL BY LIST (b) (
+   PARTITION us_west VALUES IN (('seattle')),
+   PARTITION us_east VALUES IN (('new york'))
+);
+
+statement ok
+CREATE TABLE t_to_be_hashed (
+  a INT PRIMARY KEY,
+  b STRING NOT NULL,
+  c INT,
+  FAMILY fam_0_a_b_c (a, b, c)
+) PARTITION ALL BY LIST (b) (
+   PARTITION us_west VALUES IN (('seattle')),
+   PARTITION us_east VALUES IN (('new york'))
+);
+
+query T
+SELECT @2 FROM [SHOW CREATE TABLE t_to_be_hashed];
+----
+CREATE TABLE public.t_to_be_hashed (
+  a INT8 NOT NULL,
+  b STRING NOT NULL,
+  c INT8 NULL,
+  CONSTRAINT t_to_be_hashed_pkey PRIMARY KEY (a ASC),
+  FAMILY fam_0_a_b_c (a, b, c)
+) PARTITION ALL BY LIST (b) (
+  PARTITION us_west VALUES IN (('seattle')),
+  PARTITION us_east VALUES IN (('new york'))
+)
+-- Warning: Partitioned table with no zone configurations.
+
+statement ok
+CREATE INDEX ON t_to_be_hashed (c) USING HASH;
+
+query T
+SELECT @2 FROM [SHOW CREATE TABLE t_to_be_hashed];
+----
+CREATE TABLE public.t_to_be_hashed (
+  a INT8 NOT NULL,
+  b STRING NOT NULL,
+  c INT8 NULL,
+  crdb_internal_c_shard_16 INT4 NOT VISIBLE NOT NULL AS (mod(fnv32(crdb_internal.datums_to_bytes(c)), 16:::INT8)) VIRTUAL,
+  CONSTRAINT t_to_be_hashed_pkey PRIMARY KEY (a ASC),
+  INDEX t_to_be_hashed_c_idx (c ASC) USING HASH WITH (bucket_count=16),
+  FAMILY fam_0_a_b_c (a, b, c)
+) PARTITION ALL BY LIST (b) (
+  PARTITION us_west VALUES IN (('seattle')),
+  PARTITION us_east VALUES IN (('new york'))
+)
+-- Warning: Partitioned table with no zone configurations.
+
+statement ok
+CREATE UNIQUE INDEX ON t_to_be_hashed (c) USING HASH;
+
+query T
+SELECT @2 FROM [SHOW CREATE TABLE t_to_be_hashed];
+----
+CREATE TABLE public.t_to_be_hashed (
+  a INT8 NOT NULL,
+  b STRING NOT NULL,
+  c INT8 NULL,
+  crdb_internal_c_shard_16 INT4 NOT VISIBLE NOT NULL AS (mod(fnv32(crdb_internal.datums_to_bytes(c)), 16:::INT8)) VIRTUAL,
+  CONSTRAINT t_to_be_hashed_pkey PRIMARY KEY (a ASC),
+  INDEX t_to_be_hashed_c_idx (c ASC) USING HASH WITH (bucket_count=16),
+  UNIQUE INDEX t_to_be_hashed_c_key (c ASC) USING HASH WITH (bucket_count=16),
+  FAMILY fam_0_a_b_c (a, b, c)
+) PARTITION ALL BY LIST (b) (
+  PARTITION us_west VALUES IN (('seattle')),
+  PARTITION us_east VALUES IN (('new york'))
+)
+-- Warning: Partitioned table with no zone configurations.
+
+statement ok
+ALTER TABLE t_to_be_hashed ALTER PRIMARY KEY USING COLUMNS (a) USING HASH;
+
+query T
+SELECT @2 FROM [SHOW CREATE TABLE t_to_be_hashed];
+----
+CREATE TABLE public.t_to_be_hashed (
+  a INT8 NOT NULL,
+  b STRING NOT NULL,
+  c INT8 NULL,
+  crdb_internal_c_shard_16 INT4 NOT VISIBLE NOT NULL AS (mod(fnv32(crdb_internal.datums_to_bytes(c)), 16:::INT8)) VIRTUAL,
+  crdb_internal_a_shard_16 INT4 NOT VISIBLE NOT NULL AS (mod(fnv32(crdb_internal.datums_to_bytes(a)), 16:::INT8)) VIRTUAL,
+  CONSTRAINT t_to_be_hashed_pkey PRIMARY KEY (a ASC) USING HASH WITH (bucket_count=16),
+  INDEX t_to_be_hashed_c_idx (c ASC) USING HASH WITH (bucket_count=16),
+  UNIQUE INDEX t_to_be_hashed_c_key (c ASC) USING HASH WITH (bucket_count=16),
+  FAMILY fam_0_a_b_c (a, b, c)
+) PARTITION ALL BY LIST (b) (
+  PARTITION us_west VALUES IN (('seattle')),
+  PARTITION us_east VALUES IN (('new york'))
+)
+-- Warning: Partitioned table with no zone configurations.
+
+statement ok
+CREATE TABLE t_idx_pk_hashed_1 (
+  a INT PRIMARY KEY USING HASH,
+  b STRING,
+  c INT,
+  INDEX (c) USING HASH,
+  FAMILY fam_0_a_b_c (a, b, c)
+) PARTITION ALL BY LIST (b) (
+   PARTITION us_west VALUES IN (('seattle')),
+   PARTITION us_east VALUES IN (('new york'))
+);
+
+query T
+SELECT @2 FROM [SHOW CREATE TABLE t_idx_pk_hashed_1];
+----
+CREATE TABLE public.t_idx_pk_hashed_1 (
+  crdb_internal_a_shard_16 INT4 NOT VISIBLE NOT NULL AS (mod(fnv32(crdb_internal.datums_to_bytes(a)), 16:::INT8)) VIRTUAL,
+  a INT8 NOT NULL,
+  b STRING NOT NULL,
+  c INT8 NULL,
+  crdb_internal_c_shard_16 INT4 NOT VISIBLE NOT NULL AS (mod(fnv32(crdb_internal.datums_to_bytes(c)), 16:::INT8)) VIRTUAL,
+  CONSTRAINT t_idx_pk_hashed_1_pkey PRIMARY KEY (a ASC) USING HASH WITH (bucket_count=16),
+  INDEX t_idx_pk_hashed_1_c_idx (c ASC) USING HASH WITH (bucket_count=16),
+  FAMILY fam_0_a_b_c (a, b, c)
+) PARTITION ALL BY LIST (b) (
+  PARTITION us_west VALUES IN (('seattle')),
+  PARTITION us_east VALUES IN (('new york'))
+)
+-- Warning: Partitioned table with no zone configurations.
+
+statement ok
+CREATE TABLE t_idx_pk_hashed_2 (
+  a INT,
+  b STRING,
+  c INT,
+  INDEX (c) USING HASH,
+  PRIMARY KEY (a) USING HASH,
+  FAMILY fam_0_a_b_c (a, b, c)
+) PARTITION ALL BY LIST (b) (
+   PARTITION us_west VALUES IN (('seattle')),
+   PARTITION us_east VALUES IN (('new york'))
+);
+
+query T
+SELECT @2 FROM [SHOW CREATE TABLE t_idx_pk_hashed_2];
+----
+CREATE TABLE public.t_idx_pk_hashed_2 (
+  a INT8 NOT NULL,
+  b STRING NOT NULL,
+  c INT8 NULL,
+  crdb_internal_c_shard_16 INT4 NOT VISIBLE NOT NULL AS (mod(fnv32(crdb_internal.datums_to_bytes(c)), 16:::INT8)) VIRTUAL,
+  crdb_internal_a_shard_16 INT4 NOT VISIBLE NOT NULL AS (mod(fnv32(crdb_internal.datums_to_bytes(a)), 16:::INT8)) VIRTUAL,
+  CONSTRAINT t_idx_pk_hashed_2_pkey PRIMARY KEY (a ASC) USING HASH WITH (bucket_count=16),
+  INDEX t_idx_pk_hashed_2_c_idx (c ASC) USING HASH WITH (bucket_count=16),
+  FAMILY fam_0_a_b_c (a, b, c)
+) PARTITION ALL BY LIST (b) (
+  PARTITION us_west VALUES IN (('seattle')),
+  PARTITION us_east VALUES IN (('new york'))
+)
+-- Warning: Partitioned table with no zone configurations.
+
+subtest test_presplit_with_partitioning
+
+statement ok
+CREATE TABLE t_presplit (
+  user_id INT PRIMARY KEY,
+  city STRING NOT NULL CHECK (city IN ('seattle', 'new york')),
+  member_id INT
+) PARTITION ALL BY LIST (city) (
+    PARTITION us_west VALUES IN (('seattle')),
+    PARTITION us_east VALUES IN (('new york'))
+);
+
+statement ok
+CREATE INDEX t_presplit_idx_member_id ON t_presplit (member_id) USING HASH WITH (bucket_count=8);
+
+skipif config 3node-tenant
+query TITTT colnames,retry
+SELECT t.name, r.table_id, r.index_name, r.start_pretty, r.end_pretty
+FROM crdb_internal.tables t
+JOIN crdb_internal.ranges r ON t.table_id = r.table_id
+WHERE t.name = 't_presplit'
+AND t.state = 'PUBLIC'
+AND r.split_enforced_until IS NOT NULL;
+----
+name        table_id  index_name                start_pretty               end_pretty
+t_presplit  116       t_presplit_idx_member_id  /Table/116/2               /Table/116/2/"new york"/0
+t_presplit  116       t_presplit_idx_member_id  /Table/116/2/"new york"/0  /Table/116/2/"new york"/1
+t_presplit  116       t_presplit_idx_member_id  /Table/116/2/"new york"/1  /Table/116/2/"new york"/2
+t_presplit  116       t_presplit_idx_member_id  /Table/116/2/"new york"/2  /Table/116/2/"new york"/3
+t_presplit  116       t_presplit_idx_member_id  /Table/116/2/"new york"/3  /Table/116/2/"new york"/4
+t_presplit  116       t_presplit_idx_member_id  /Table/116/2/"new york"/4  /Table/116/2/"new york"/5
+t_presplit  116       t_presplit_idx_member_id  /Table/116/2/"new york"/5  /Table/116/2/"new york"/6
+t_presplit  116       t_presplit_idx_member_id  /Table/116/2/"new york"/6  /Table/116/2/"new york"/7
+t_presplit  116       t_presplit_idx_member_id  /Table/116/2/"new york"/7  /Table/116/2/"seattle"/0
+t_presplit  116       t_presplit_idx_member_id  /Table/116/2/"seattle"/0   /Table/116/2/"seattle"/1
+t_presplit  116       t_presplit_idx_member_id  /Table/116/2/"seattle"/1   /Table/116/2/"seattle"/2
+t_presplit  116       t_presplit_idx_member_id  /Table/116/2/"seattle"/2   /Table/116/2/"seattle"/3
+t_presplit  116       t_presplit_idx_member_id  /Table/116/2/"seattle"/3   /Table/116/2/"seattle"/4
+t_presplit  116       t_presplit_idx_member_id  /Table/116/2/"seattle"/4   /Table/116/2/"seattle"/5
+t_presplit  116       t_presplit_idx_member_id  /Table/116/2/"seattle"/5   /Table/116/2/"seattle"/6
+t_presplit  116       t_presplit_idx_member_id  /Table/116/2/"seattle"/6   /Table/116/2/"seattle"/7
+t_presplit  116       t_presplit_idx_member_id  /Table/116/2/"seattle"/7   /Max
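
The split boundaries expected by `test_presplit_with_partitioning` above (two partitions, `bucket_count=8`) can be reproduced with a short sketch. This is an illustration only, not the code added by this commit: the function `splitPoints` and the constant name are hypothetical, the keys are pretty-printed strings rather than encoded keys, and the even spacing used when the bucket count exceeds 16 is an assumption that the commit message does not spell out.

```go
package main

import "fmt"

// maxSplitsPerPartition mirrors the cap of 16 ranges per partition stated
// in the commit message. The constant name is hypothetical.
const maxSplitsPerPartition = 16

// splitPoints returns pretty-printed split keys for a partitioned hash
// sharded index: for every partition value, one key per chosen shard.
func splitPoints(indexPrefix string, partitions []string, bucketCount int) []string {
	n := bucketCount
	if n > maxSplitsPerPartition {
		n = maxSplitsPerPartition
	}
	var keys []string
	for _, p := range partitions {
		for i := 0; i < n; i++ {
			// Assumption: when bucketCount > 16 the boundaries are spread
			// evenly over the shards. For bucketCount <= 16 this is shard i.
			shard := i * bucketCount / n
			keys = append(keys, fmt.Sprintf("%s/%q/%d", indexPrefix, p, shard))
		}
	}
	return keys
}

func main() {
	// Same shape as the logic test: index 2 of table 116, two partitions,
	// bucket_count=8.
	for _, k := range splitPoints("/Table/116/2", []string{"new york", "seattle"}, 8) {
		fmt.Println(k)
	}
}
```

With these inputs the sketch yields 16 boundaries, from `/Table/116/2/"new york"/0` through `/Table/116/2/"seattle"/7`, which are the 16 range boundaries between `/Table/116/2` and `/Max` in the expected output.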