76012: server, sql: add VIEWCLUSTERSETTING user privilege r=koorosh a=koorosh

Before, only users with the `admin` role or the `MODIFYCLUSTERSETTING`
permission could view cluster settings.
Now, a new role option is added that gives users view-only permission
for cluster settings from the SQL shell and in the DB Console (under
Advanced debugging > Cluster settings).
This change does not alter the behavior of the `MODIFYCLUSTERSETTING`
option, which still allows both viewing and modifying cluster settings.
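
The resulting permission rules can be summarized with a minimal Go sketch. The `roleOptions` type and both function names are hypothetical and exist only for illustration; they are not the types or checks used in the server code.

```go
package main

// roleOptions is a hypothetical, simplified view of a user's role options.
type roleOptions struct {
	admin                bool
	modifyClusterSetting bool
	viewClusterSetting   bool
}

// canViewClusterSettings mirrors the rule described above: admins, users with
// MODIFYCLUSTERSETTING, and users with VIEWCLUSTERSETTING may all view.
func canViewClusterSettings(r roleOptions) bool {
	return r.admin || r.modifyClusterSetting || r.viewClusterSetting
}

// canModifyClusterSettings is unchanged by this commit: VIEWCLUSTERSETTING
// alone does not grant the ability to modify settings.
func canModifyClusterSettings(r roleOptions) bool {
	return r.admin || r.modifyClusterSetting
}
```

A user holding only `VIEWCLUSTERSETTING` passes the first check and fails the second.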

Release note (sql change): new user privileges `VIEWCLUSTERSETTING`
and `NOVIEWCLUSTERSETTING` are added. `VIEWCLUSTERSETTING` allows users
to view cluster settings without being able to modify them.

Resolves: #74692

76215: kvserver: loosely couple raft log truncation r=tbg a=sumeerbhola

In the ReplicasStorage design we stop making any assumptions
regarding what is durable in the state machine when syncing a batch
that commits changes to the raft log. This implies the need to
make raft log truncation more loosely coupled than it is now, since
we can truncate only when certain that the state machine is durable
up to the truncation index.

Current raft log truncation flows through raft and even though the
RaftTruncatedStateKey is not a replicated key, it is coupled in
the sense that the truncation is done below raft when processing
the corresponding log entry (that asked for truncation to be done).

The current setup also has correctness issues with respect to maintaining
the raft log size when passing the delta bytes for a truncation. We
compute the delta at proposal time (to avoid repeating iteration over
the entries in all replicas), but we do not pass the first index
corresponding to the truncation, so gaps or overlaps cannot be
noticed at truncation time.

We do want to continue to have the raft leader guide the truncation
since we do not want either leader or followers to over-truncate,
given our desire to serve snapshots from any replica. In the loosely
coupled approach implemented here, the truncation request that flows
through raft serves as an upper bound on what can be truncated.

The truncation request includes an ExpectedFirstIndex. This is
further propagated using ReplicatedEvalResult.RaftExpectedFirstIndex.
This ExpectedFirstIndex allows one to notice gaps or overlaps when
enacting a sequence of truncations, which results in setting the
Replica.raftLogSizeTrusted to false. The correctness issue with
Replica.raftLogSize is not fully addressed since there are existing
consistency issues when evaluating a TruncateLogRequest (these
are now noted in a code comment).
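
As a rough illustration of how an expected first index lets a replica notice gaps or overlaps, here is a minimal Go sketch. The `truncation` and `logState` types and their fields are hypothetical simplifications, not the actual CockroachDB structures, and the bookkeeping is reduced to the essentials.

```go
package main

// truncation is a hypothetical, simplified truncation request.
type truncation struct {
	expectedFirstIndex uint64 // first log index the proposer assumed when computing deltaBytes
	index              uint64 // highest log index removed by this truncation
	deltaBytes         int64  // size change computed at proposal time (zero or negative)
}

// logState is a hypothetical model of a replica's raft log bookkeeping.
type logState struct {
	firstIndex  uint64 // first index still present in the log
	sizeBytes   int64  // estimated log size
	sizeTrusted bool   // false once the estimate may be wrong
}

// apply enacts one truncation. If the proposer's expected first index does
// not match the log's actual first index, deltaBytes was computed over a
// different span (a gap or an overlap), so the size is marked untrusted.
func (s *logState) apply(t truncation) {
	if t.expectedFirstIndex != s.firstIndex {
		s.sizeTrusted = false
	}
	s.sizeBytes += t.deltaBytes
	if s.sizeBytes < 0 {
		s.sizeBytes = 0
	}
	s.firstIndex = t.index + 1
}
```

Starting from a first index of 10, a truncation that expects 10 keeps the size trusted, while a following truncation that expects 25 when the log actually starts at 20 marks it untrusted.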

Below raft, the truncation requests are queued onto a Replica
in pendingLogTruncations. The queueing and dequeuing is managed
by a raftLogTruncator that takes care of merging pending truncation
requests and enacting the truncations when the durability of the
state machine advances.
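
The queue, merge, and enact cycle can be sketched in Go as follows. This is a simplified model under stated assumptions: the `pendingTrunc` and `truncQueue` names are hypothetical, and the cap of two queued entries is an arbitrary choice for this sketch rather than a statement about the real raftLogTruncator.

```go
package main

// pendingTrunc is a hypothetical pending truncation: remove entries <= index.
type pendingTrunc struct {
	index uint64
}

// truncQueue is a hypothetical model of a per-replica queue of truncations
// waiting for the state machine to become durable.
type truncQueue struct {
	pending       []pendingTrunc // ascending by index, at most two entries
	truncatedUpTo uint64         // highest index truncated so far
}

// enqueue adds a request, dropping it if it is already covered and merging
// it into the last entry when the queue is full.
func (q *truncQueue) enqueue(p pendingTrunc) {
	n := len(q.pending)
	if p.index <= q.truncatedUpTo || (n > 0 && p.index <= q.pending[n-1].index) {
		return // subsumed by earlier work
	}
	if n == 2 {
		q.pending[1] = p // merge: the later request covers the earlier one
		return
	}
	q.pending = append(q.pending, p)
}

// durabilityAdvanced enacts every queued truncation whose index is at or
// below the durable index of the state machine.
func (q *truncQueue) durabilityAdvanced(durableIndex uint64) {
	i := 0
	for ; i < len(q.pending) && q.pending[i].index <= durableIndex; i++ {
		q.truncatedUpTo = q.pending[i].index
	}
	q.pending = q.pending[i:]
}
```

In this model, merging bounds memory at the cost of delaying a truncation that was already safe: a merged request is enacted only once its larger index is durable.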

The pending truncation requests are taken into account in the
raftLogQueue when deciding whether to do another truncation.
Most of the behavior of the raftLogQueue is unchanged.

The new behavior is gated on a LooselyCoupledRaftLogTruncation
cluster version. Additionally, the new behavior can be turned
off using the kv.raft_log.loosely_coupled_truncation.enabled
cluster setting, which is true by default. The latter is expected
to be a safety switch for one release, after which we expect to
remove it. That removal will also clean up some duplicated code
(that was non-trivial to refactor and share) between the previous
coupled and new loosely coupled truncation.

Note, this PR is the first of two -- loosely coupled truncation
is turned off via a constant in this PR. The next one will
eliminate the constant and put it under the control of the cluster
setting.
 
Informs #36262
Informs #16624

Release note (ops change): The cluster setting
kv.raft_log.loosely_coupled_truncation.enabled can be used
to disable loosely coupled truncation.



76358: sql: support partitioned hash sharded index r=chengxiong-ruan a=chengxiong-ruan

Release note (sql change): Previously, CockroachDB blocked users from creating
hash sharded indexes in all kinds of partitioned tables, including implicitly
partitioned tables using `PARTITION ALL BY` or `REGIONAL BY ROW`. Hash sharded
indexes are now supported in implicitly partitioned tables. Explicit
partitioning remains unsupported: a primary key cannot be hash sharded if
its table is explicitly partitioned with `PARTITION BY`, and an index cannot
be hash sharded if the index is explicitly partitioned with `PARTITION BY`.
Partitioning columns also cannot be listed explicitly as key columns of a
hash sharded index; this includes a regional-by-row table's `crdb_region`
column. When a hash sharded index is partitioned, ranges are pre-split
within every possible partition on shard boundaries. Each partition
is split into at most 16 ranges; when the bucket count is smaller than 16,
it is split into one range per bucket.
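
To make the pre-splitting concrete, the following Go sketch enumerates one split point per (partition value, shard bucket) pair. The `splitKeys` function and its pretty-printed key format are assumptions for illustration, modeled on the `start_pretty` values in the logic test of this commit; the sketch does not implement the 16-range cap.

```go
package main

import "fmt"

// splitKeys returns one pretty-printed split point per partition value and
// shard bucket, in partition order. It is a hypothetical helper, not the
// function CockroachDB uses to compute split points.
func splitKeys(indexPrefix string, partitions []string, bucketCount int) []string {
	keys := make([]string, 0, len(partitions)*bucketCount)
	for _, p := range partitions {
		for b := 0; b < bucketCount; b++ {
			// %q quotes the partition value, as in /Table/116/2/"seattle"/0.
			keys = append(keys, fmt.Sprintf("%s/%q/%d", indexPrefix, p, b))
		}
	}
	return keys
}
```

Calling `splitKeys("/Table/116/2", []string{"new york", "seattle"}, 8)` yields 16 split points, which bound the 17 ranges listed for `t_presplit_idx_member_id` in the test output.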

Co-authored-by: Andrii Vorobiov <and.vorobiov@gmail.com>
Co-authored-by: sumeerbhola <sumeer@cockroachlabs.com>
Co-authored-by: Chengxiong Ruan <chengxiongruan@gmail.com>
4 people committed Feb 22, 2022
4 parents 96d102a + a12895f + f9dee66 + e7caa94 commit 67c8277
Showing 59 changed files with 6,697 additions and 273 deletions.
2 changes: 1 addition & 1 deletion docs/generated/settings/settings-for-tenants.txt
@@ -181,4 +181,4 @@ trace.debug.enable boolean false if set, traces for recent requests can be seen
trace.jaeger.agent string the address of a Jaeger agent to receive traces using the Jaeger UDP Thrift protocol, as <host>:<port>. If no port is specified, 6381 will be used.
trace.opentelemetry.collector string address of an OpenTelemetry trace collector to receive traces using the otel gRPC protocol, as <host>:<port>. If no port is specified, 4317 will be used.
trace.zipkin.collector string the address of a Zipkin instance to receive traces, as <host>:<port>. If no port is specified, 9411 will be used.
-version version 21.2-78 set the active cluster version in the format '<major>.<minor>'
+version version 21.2-80 set the active cluster version in the format '<major>.<minor>'
2 changes: 1 addition & 1 deletion docs/generated/settings/settings.html
@@ -194,6 +194,6 @@
<tr><td><code>trace.jaeger.agent</code></td><td>string</td><td><code></code></td><td>the address of a Jaeger agent to receive traces using the Jaeger UDP Thrift protocol, as <host>:<port>. If no port is specified, 6381 will be used.</td></tr>
<tr><td><code>trace.opentelemetry.collector</code></td><td>string</td><td><code></code></td><td>address of an OpenTelemetry trace collector to receive traces using the otel gRPC protocol, as <host>:<port>. If no port is specified, 4317 will be used.</td></tr>
<tr><td><code>trace.zipkin.collector</code></td><td>string</td><td><code></code></td><td>the address of a Zipkin instance to receive traces, as <host>:<port>. If no port is specified, 9411 will be used.</td></tr>
-<tr><td><code>version</code></td><td>version</td><td><code>21.2-78</code></td><td>set the active cluster version in the format '<major>.<minor>'</td></tr>
+<tr><td><code>version</code></td><td>version</td><td><code>21.2-80</code></td><td>set the active cluster version in the format '<major>.<minor>'</td></tr>
</tbody>
</table>
4 changes: 4 additions & 0 deletions docs/generated/sql/bnf/stmt_block.bnf
@@ -1103,6 +1103,7 @@ unreserved_keyword ::=
| 'NOSQLLOGIN'
| 'NOVIEWACTIVITY'
| 'NOVIEWACTIVITYREDACTED'
| 'NOVIEWCLUSTERSETTING'
| 'NOWAIT'
| 'NULLS'
| 'IGNORE_FOREIGN_KEYS'
@@ -1270,6 +1271,7 @@ unreserved_keyword ::=
| 'VIEW'
| 'VIEWACTIVITY'
| 'VIEWACTIVITYREDACTED'
| 'VIEWCLUSTERSETTING'
| 'VISIBLE'
| 'VOTERS'
| 'WITHIN'
@@ -2497,6 +2499,8 @@ role_option ::=
| 'NOMODIFYCLUSTERSETTING'
| 'SQLLOGIN'
| 'NOSQLLOGIN'
| 'VIEWCLUSTERSETTING'
| 'NOVIEWCLUSTERSETTING'
| password_clause
| valid_until_clause

@@ -3,7 +3,7 @@
statement ok
SET experimental_enable_implicit_column_partitioning = true

-statement error cannot define PARTITION BY on an unique constraint if the table has a PARTITION ALL BY definition
+statement error cannot define PARTITION BY on an index if the table is implicitly partitioned with PARTITION ALL BY or LOCALITY REGIONAL BY ROW definition
CREATE TABLE partition_all_by_nothing_with_partition (
pk INT PRIMARY KEY,
a INT,
@@ -0,0 +1,287 @@
# LogicTest: 5node

statement ok
SET experimental_enable_hash_sharded_indexes = true;

statement ok
SET experimental_enable_implicit_column_partitioning = true;

statement ok
CREATE TABLE t_hashed (
a INT PRIMARY KEY,
b STRING,
c INT,
INDEX idx_t_hashed_b_c (b, c) USING HASH
);

statement error cannot set explicit partitioning with ALTER INDEX PARTITION BY on a hash sharded index
ALTER INDEX idx_t_hashed_b_c PARTITION BY LIST (b) (
PARTITION us_west VALUES IN (('seattle')),
PARTITION us_east VALUES IN (('new york'))
);

statement ok
CREATE TABLE t_pk_hashed (
a STRING,
b INT,
PRIMARY KEY (a, b) USING HASH
);

statement error cannot set explicit partitioning with PARTITION BY on hash sharded primary key
ALTER TABLE t_pk_hashed PARTITION BY LIST (b) (
PARTITION us_west VALUES IN (('seattle')),
PARTITION us_east VALUES IN (('new york'))
);

statement ok
CREATE TABLE t_partition_all (
a INT PRIMARY KEY,
b STRING NOT NULL,
c INT
) PARTITION ALL BY LIST (b) (
PARTITION us_west VALUES IN (('seattle')),
PARTITION us_east VALUES IN (('new york'))
);

statement error hash sharded indexes cannot include implicit partitioning columns from "PARTITION ALL BY" or "LOCALITY REGIONAL BY ROW"
CREATE INDEX ON t_partition_all (b, c) USING HASH;

statement error hash sharded indexes cannot include implicit partitioning columns from "PARTITION ALL BY" or "LOCALITY REGIONAL BY ROW"
CREATE UNIQUE INDEX ON t_partition_all (b, c) USING HASH;

statement error hash sharded indexes cannot include implicit partitioning columns from "PARTITION ALL BY" or "LOCALITY REGIONAL BY ROW"
ALTER TABLE t_partition_all ALTER PRIMARY KEY USING COLUMNS (b) USING HASH;

statement error hash sharded indexes cannot be explicitly partitioned
CREATE TABLE t_pk_hashed_bad (
a STRING PRIMARY KEY USING HASH,
b INT
) PARTITION BY LIST (a) (
PARTITION us_west VALUES IN (('seattle')),
PARTITION us_east VALUES IN (('new york'))
);

statement error hash sharded indexes cannot be explicitly partitioned
CREATE TABLE t_pk_hashed_bad (
a STRING,
b INT,
PRIMARY KEY (a) USING HASH
) PARTITION BY LIST (a) (
PARTITION us_west VALUES IN (('seattle')),
PARTITION us_east VALUES IN (('new york'))
);

statement error hash sharded indexes cannot be explicitly partitioned
CREATE TABLE t_idx_hashed_bad (
a INT PRIMARY KEY,
b STRING,
c INT,
INDEX (b, c) USING HASH PARTITION BY LIST (b) (
PARTITION us_west VALUES IN (('seattle')),
PARTITION us_east VALUES IN (('new york'))
)
);

statement error hash sharded indexes cannot include implicit partitioning columns from "PARTITION ALL BY" or "LOCALITY REGIONAL BY ROW"
CREATE TABLE t_idx_hashed_bad (
a INT PRIMARY KEY,
b STRING,
c INT,
INDEX (b, c) USING HASH
) PARTITION ALL BY LIST (b) (
PARTITION us_west VALUES IN (('seattle')),
PARTITION us_east VALUES IN (('new york'))
);

statement ok
CREATE TABLE t_to_be_hashed (
a INT PRIMARY KEY,
b STRING NOT NULL,
c INT,
FAMILY fam_0_a_b_c (a, b, c)
) PARTITION ALL BY LIST (b) (
PARTITION us_west VALUES IN (('seattle')),
PARTITION us_east VALUES IN (('new york'))
);

query T
SELECT @2 FROM [SHOW CREATE TABLE t_to_be_hashed];
----
CREATE TABLE public.t_to_be_hashed (
a INT8 NOT NULL,
b STRING NOT NULL,
c INT8 NULL,
CONSTRAINT t_to_be_hashed_pkey PRIMARY KEY (a ASC),
FAMILY fam_0_a_b_c (a, b, c)
) PARTITION ALL BY LIST (b) (
PARTITION us_west VALUES IN (('seattle')),
PARTITION us_east VALUES IN (('new york'))
)
-- Warning: Partitioned table with no zone configurations.

statement ok
CREATE INDEX ON t_to_be_hashed (c) USING HASH;

query T
SELECT @2 FROM [SHOW CREATE TABLE t_to_be_hashed];
----
CREATE TABLE public.t_to_be_hashed (
a INT8 NOT NULL,
b STRING NOT NULL,
c INT8 NULL,
crdb_internal_c_shard_16 INT4 NOT VISIBLE NOT NULL AS (mod(fnv32(crdb_internal.datums_to_bytes(c)), 16:::INT8)) VIRTUAL,
CONSTRAINT t_to_be_hashed_pkey PRIMARY KEY (a ASC),
INDEX t_to_be_hashed_c_idx (c ASC) USING HASH WITH (bucket_count=16),
FAMILY fam_0_a_b_c (a, b, c)
) PARTITION ALL BY LIST (b) (
PARTITION us_west VALUES IN (('seattle')),
PARTITION us_east VALUES IN (('new york'))
)
-- Warning: Partitioned table with no zone configurations.

statement ok
CREATE UNIQUE INDEX ON t_to_be_hashed (c) USING HASH;

query T
SELECT @2 FROM [SHOW CREATE TABLE t_to_be_hashed];
----
CREATE TABLE public.t_to_be_hashed (
a INT8 NOT NULL,
b STRING NOT NULL,
c INT8 NULL,
crdb_internal_c_shard_16 INT4 NOT VISIBLE NOT NULL AS (mod(fnv32(crdb_internal.datums_to_bytes(c)), 16:::INT8)) VIRTUAL,
CONSTRAINT t_to_be_hashed_pkey PRIMARY KEY (a ASC),
INDEX t_to_be_hashed_c_idx (c ASC) USING HASH WITH (bucket_count=16),
UNIQUE INDEX t_to_be_hashed_c_key (c ASC) USING HASH WITH (bucket_count=16),
FAMILY fam_0_a_b_c (a, b, c)
) PARTITION ALL BY LIST (b) (
PARTITION us_west VALUES IN (('seattle')),
PARTITION us_east VALUES IN (('new york'))
)
-- Warning: Partitioned table with no zone configurations.

statement ok
ALTER TABLE t_to_be_hashed ALTER PRIMARY KEY USING COLUMNS (a) USING HASH;

query T
SELECT @2 FROM [SHOW CREATE TABLE t_to_be_hashed];
----
CREATE TABLE public.t_to_be_hashed (
a INT8 NOT NULL,
b STRING NOT NULL,
c INT8 NULL,
crdb_internal_c_shard_16 INT4 NOT VISIBLE NOT NULL AS (mod(fnv32(crdb_internal.datums_to_bytes(c)), 16:::INT8)) VIRTUAL,
crdb_internal_a_shard_16 INT4 NOT VISIBLE NOT NULL AS (mod(fnv32(crdb_internal.datums_to_bytes(a)), 16:::INT8)) VIRTUAL,
CONSTRAINT t_to_be_hashed_pkey PRIMARY KEY (a ASC) USING HASH WITH (bucket_count=16),
INDEX t_to_be_hashed_c_idx (c ASC) USING HASH WITH (bucket_count=16),
UNIQUE INDEX t_to_be_hashed_c_key (c ASC) USING HASH WITH (bucket_count=16),
FAMILY fam_0_a_b_c (a, b, c)
) PARTITION ALL BY LIST (b) (
PARTITION us_west VALUES IN (('seattle')),
PARTITION us_east VALUES IN (('new york'))
)
-- Warning: Partitioned table with no zone configurations.

statement ok
CREATE TABLE t_idx_pk_hashed_1 (
a INT PRIMARY KEY USING HASH,
b STRING,
c INT,
INDEX (c) USING HASH,
FAMILY fam_0_a_b_c (a, b, c)
) PARTITION ALL BY LIST (b) (
PARTITION us_west VALUES IN (('seattle')),
PARTITION us_east VALUES IN (('new york'))
);

query T
SELECT @2 FROM [SHOW CREATE TABLE t_idx_pk_hashed_1];
----
CREATE TABLE public.t_idx_pk_hashed_1 (
crdb_internal_a_shard_16 INT4 NOT VISIBLE NOT NULL AS (mod(fnv32(crdb_internal.datums_to_bytes(a)), 16:::INT8)) VIRTUAL,
a INT8 NOT NULL,
b STRING NOT NULL,
c INT8 NULL,
crdb_internal_c_shard_16 INT4 NOT VISIBLE NOT NULL AS (mod(fnv32(crdb_internal.datums_to_bytes(c)), 16:::INT8)) VIRTUAL,
CONSTRAINT t_idx_pk_hashed_1_pkey PRIMARY KEY (a ASC) USING HASH WITH (bucket_count=16),
INDEX t_idx_pk_hashed_1_c_idx (c ASC) USING HASH WITH (bucket_count=16),
FAMILY fam_0_a_b_c (a, b, c)
) PARTITION ALL BY LIST (b) (
PARTITION us_west VALUES IN (('seattle')),
PARTITION us_east VALUES IN (('new york'))
)
-- Warning: Partitioned table with no zone configurations.

statement ok
CREATE TABLE t_idx_pk_hashed_2 (
a INT,
b STRING,
c INT,
INDEX (c) USING HASH,
PRIMARY KEY (a) USING HASH,
FAMILY fam_0_a_b_c (a, b, c)
) PARTITION ALL BY LIST (b) (
PARTITION us_west VALUES IN (('seattle')),
PARTITION us_east VALUES IN (('new york'))
);

query T
SELECT @2 FROM [SHOW CREATE TABLE t_idx_pk_hashed_2];
----
CREATE TABLE public.t_idx_pk_hashed_2 (
a INT8 NOT NULL,
b STRING NOT NULL,
c INT8 NULL,
crdb_internal_c_shard_16 INT4 NOT VISIBLE NOT NULL AS (mod(fnv32(crdb_internal.datums_to_bytes(c)), 16:::INT8)) VIRTUAL,
crdb_internal_a_shard_16 INT4 NOT VISIBLE NOT NULL AS (mod(fnv32(crdb_internal.datums_to_bytes(a)), 16:::INT8)) VIRTUAL,
CONSTRAINT t_idx_pk_hashed_2_pkey PRIMARY KEY (a ASC) USING HASH WITH (bucket_count=16),
INDEX t_idx_pk_hashed_2_c_idx (c ASC) USING HASH WITH (bucket_count=16),
FAMILY fam_0_a_b_c (a, b, c)
) PARTITION ALL BY LIST (b) (
PARTITION us_west VALUES IN (('seattle')),
PARTITION us_east VALUES IN (('new york'))
)
-- Warning: Partitioned table with no zone configurations.

subtest test_presplit_with_partitioning

statement ok
CREATE TABLE t_presplit (
user_id INT PRIMARY KEY,
city STRING NOT NULL CHECK (city IN ('seattle', 'new york')),
member_id INT
) PARTITION ALL BY LIST (city) (
PARTITION us_west VALUES IN (('seattle')),
PARTITION us_east VALUES IN (('new york'))
);

statement ok
CREATE INDEX t_presplit_idx_member_id ON t_presplit (member_id) USING HASH WITH (bucket_count=8);

skipif config 3node-tenant
query TITTT colnames,retry
SELECT t.name, r.table_id, r.index_name, r.start_pretty, r.end_pretty
FROM crdb_internal.tables t
JOIN crdb_internal.ranges r ON t.table_id = r.table_id
WHERE t.name = 't_presplit'
AND t.state = 'PUBLIC'
AND r.split_enforced_until IS NOT NULL;
----
name table_id index_name start_pretty end_pretty
t_presplit 116 t_presplit_idx_member_id /Table/116/2 /Table/116/2/"new york"/0
t_presplit 116 t_presplit_idx_member_id /Table/116/2/"new york"/0 /Table/116/2/"new york"/1
t_presplit 116 t_presplit_idx_member_id /Table/116/2/"new york"/1 /Table/116/2/"new york"/2
t_presplit 116 t_presplit_idx_member_id /Table/116/2/"new york"/2 /Table/116/2/"new york"/3
t_presplit 116 t_presplit_idx_member_id /Table/116/2/"new york"/3 /Table/116/2/"new york"/4
t_presplit 116 t_presplit_idx_member_id /Table/116/2/"new york"/4 /Table/116/2/"new york"/5
t_presplit 116 t_presplit_idx_member_id /Table/116/2/"new york"/5 /Table/116/2/"new york"/6
t_presplit 116 t_presplit_idx_member_id /Table/116/2/"new york"/6 /Table/116/2/"new york"/7
t_presplit 116 t_presplit_idx_member_id /Table/116/2/"new york"/7 /Table/116/2/"seattle"/0
t_presplit 116 t_presplit_idx_member_id /Table/116/2/"seattle"/0 /Table/116/2/"seattle"/1
t_presplit 116 t_presplit_idx_member_id /Table/116/2/"seattle"/1 /Table/116/2/"seattle"/2
t_presplit 116 t_presplit_idx_member_id /Table/116/2/"seattle"/2 /Table/116/2/"seattle"/3
t_presplit 116 t_presplit_idx_member_id /Table/116/2/"seattle"/3 /Table/116/2/"seattle"/4
t_presplit 116 t_presplit_idx_member_id /Table/116/2/"seattle"/4 /Table/116/2/"seattle"/5
t_presplit 116 t_presplit_idx_member_id /Table/116/2/"seattle"/5 /Table/116/2/"seattle"/6
t_presplit 116 t_presplit_idx_member_id /Table/116/2/"seattle"/6 /Table/116/2/"seattle"/7
t_presplit 116 t_presplit_idx_member_id /Table/116/2/"seattle"/7 /Max