Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rfcs: new RFC on fixing the halloween problem #42864

Merged
merged 1 commit into from
Dec 3, 2019

Conversation

knz
Copy link
Contributor

@knz knz commented Nov 29, 2019

I am splitting this RFC away from the RFC on savepoints because I know recognize it is a different problem with an orthogonal solution.

https://github.com/knz/cockroach/blob/20191129-rfc-halloween/docs/RFCS/20191014_halloween.md

There is no new idea in this text (I copy-pasted the text from the other RFC) and the implementation PRs are already out, so if the reviewers are satisfied I would gladly already merge the RFC as in-progress or completed.

@knz knz requested a review from a team as a code owner November 29, 2019 14:46
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@RaduBerinde
Copy link
Member

Well written, LGTM. @jordanlewis make sure you take a look, we'll need to figure out a clean way to integrate this with the new FK stuff.

@knz
Copy link
Contributor Author

knz commented Dec 2, 2019

TFYR!

bors r+

craig bot pushed a commit that referenced this pull request Dec 3, 2019
42668: sql: add precision support for TIME/TIMETZ r=otan a=otan

Refs: #32565
Refs: #26097 

Adding support for precision for both TIME and TIMETZ. This also
includes threading through some extra parsing syntax for TIMETZ.

ALTER TABLE between TIME and TIMETZ not supported as they have different
representations.

Release note (sql change): This PR adds new support for precision for
TIME types (e.g. TIME(3) will truncate to milliseconds). Previously this
would raise syntax errors.

42864: rfcs: new RFC on fixing the halloween problem r=knz a=knz

I am splitting this RFC away from the [RFC on savepoints](#41569) because I know recognize it is a different problem with an orthogonal solution.

https://github.com/knz/cockroach/blob/20191129-rfc-halloween/docs/RFCS/20191014_halloween.md

There is no new idea in this text (I copy-pasted the text from the other RFC) and the implementation PRs are already out, so if the reviewers are satisfied I would gladly already merge the RFC as in-progress or completed.

Co-authored-by: Oliver Tan <otan@cockroachlabs.com>
Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
@craig
Copy link
Contributor

craig bot commented Dec 3, 2019

Build succeeded

@craig craig bot merged commit 2f4446b into cockroachdb:master Dec 3, 2019
@nvanbenschoten
Copy link
Member

LGTM. Short and sweet.

knz added a commit to knz/cockroach that referenced this pull request Jan 6, 2020
Previously, all individual KV reads performed by a SQL statement were
able to observe the most recent KV writes that it performed itself.

This is in violation of PostgreSQL's dialect semantics, which mandate
that statements can only observe data as per a read snapshot taken at
the instant a statement begins execution.

Moreover, this invalid behavior causes a real observable bug: a
statement that reads and writes to the same table may never complete,
as the read part may become able to consume the rows that it itself
writes. Or worse, it could cause logical operations to be performed
multiple times: https://en.wikipedia.org/wiki/Halloween_Problem

This patch fixes it (partially) by exploiting the new KV `Step()` API
which decouples the read and write sequence numbers.

The fix is not complete however; additional sequence points must also
be introduced prior to FK existence checks and cascading actions. See
[cockroachdb#42864](cockroachdb#42864) and
[cockroachdb#33475](cockroachdb#33475) for
details.

For now, this patch excludes any mutation that 1) involves a foreign
key and 2) does not uyse the new CBO-driven FK logic, from the
new (fixed) semantics. When a mutation involves a FK without CBO
involvement, the previous (broken) semantics still apply.

Release note (bug fix): SQL mutation statements that target tables
with no foreign key relationships now correctly read data as per the
state of the database when the statement started execution. This is
required for compatibility with PostgreSQL and to ensure deterministic
behavior when certain operations are parallelized. Prior to this fix,
a statement [could incorrectly operate multiple
times](https://en.wikipedia.org/wiki/Halloween_Problem) on data that
itself was writing, and potentially never terminate. This fix is
limited to tables without FK relationships, and for certain operations
on tables with FK relationships; in other cases, the fix is not
active and the bug is still present. A full fix will be provided
in a later release.
craig bot pushed a commit that referenced this pull request Jan 6, 2020
42854: roachpb,kv: new TxnCoordSender API for step-wise execution r=knz a=knz

Supports #42864.

This patch introduces the following methods on `client.TxnSender`:

```go
	// Step creates a sequencing point in the current transaction. A
	// sequencing point establishes a snapshot baseline for subsequent
	// read operations: until the next sequencing point, read operations
	// observe the data at the time the snapshot was established and
	// ignore writes performed since.
	//
	// Before the first step is taken, the transaction operates as if
	// there was a step after every write: each read to a key is able to
	// see the latest write before it. This makes the step behavior
	// opt-in and backward-compatible with existing code which does not
	// need it.
	// The method is idempotent.
	Step() error

	// DisableStepping disables the sequencing point behavior and
	// ensures that every read can read the latest write. The
	// effect remains disabled until the next call to Step().
	// The method is idempotent.
	DisableStepping() error
```

Additionally it implements it in the `TxnCoordSender`.

`Step()` is the most important and will be used to introduce sequence
points between SQL statements.

`DisableStepping()` is a convenience function, so as to avoid
introducing many `Step()` calls in the code that performs schema
changes.

Release note: none

Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
knz added a commit to knz/cockroach that referenced this pull request Jan 15, 2020
Previously, all individual KV reads performed by a SQL statement were
able to observe the most recent KV writes that it performed itself.

This is in violation of PostgreSQL's dialect semantics, which mandate
that statements can only observe data as per a read snapshot taken at
the instant a statement begins execution.

Moreover, this invalid behavior causes a real observable bug: a
statement that reads and writes to the same table may never complete,
as the read part may become able to consume the rows that it itself
writes. Or worse, it could cause logical operations to be performed
multiple times: https://en.wikipedia.org/wiki/Halloween_Problem

This patch fixes it (partially) by exploiting the new KV `Step()` API
which decouples the read and write sequence numbers.

The fix is not complete however; additional sequence points must also
be introduced prior to FK existence checks and cascading actions. See
[cockroachdb#42864](cockroachdb#42864) and
[cockroachdb#33475](cockroachdb#33475) for
details.

For now, this patch excludes any mutation that 1) involves a foreign
key and 2) does not uyse the new CBO-driven FK logic, from the
new (fixed) semantics. When a mutation involves a FK without CBO
involvement, the previous (broken) semantics still apply.

Release note (bug fix): SQL mutation statements that target tables
with no foreign key relationships now correctly read data as per the
state of the database when the statement started execution. This is
required for compatibility with PostgreSQL and to ensure deterministic
behavior when certain operations are parallelized. Prior to this fix,
a statement [could incorrectly operate multiple
times](https://en.wikipedia.org/wiki/Halloween_Problem) on data that
itself was writing, and potentially never terminate. This fix is
limited to tables without FK relationships, and for certain operations
on tables with FK relationships; in other cases, the fix is not
active and the bug is still present. A full fix will be provided
in a later release.
knz added a commit to knz/cockroach that referenced this pull request Jan 20, 2020
Previously, all individual KV reads performed by a SQL statement were
able to observe the most recent KV writes that it performed itself.

This is in violation of PostgreSQL's dialect semantics, which mandate
that statements can only observe data as per a read snapshot taken at
the instant a statement begins execution.

Moreover, this invalid behavior causes a real observable bug: a
statement that reads and writes to the same table may never complete,
as the read part may become able to consume the rows that it itself
writes. Or worse, it could cause logical operations to be performed
multiple times: https://en.wikipedia.org/wiki/Halloween_Problem

This patch fixes it (partially) by exploiting the new KV `Step()` API
which decouples the read and write sequence numbers.

The fix is not complete however; additional sequence points must also
be introduced prior to FK existence checks and cascading actions. See
[cockroachdb#42864](cockroachdb#42864) and
[cockroachdb#33475](cockroachdb#33475) for
details.

For now, this patch excludes any mutation that 1) involves a foreign
key and 2) does not uyse the new CBO-driven FK logic, from the
new (fixed) semantics. When a mutation involves a FK without CBO
involvement, the previous (broken) semantics still apply.

Release note (bug fix): SQL mutation statements that target tables
with no foreign key relationships now correctly read data as per the
state of the database when the statement started execution. This is
required for compatibility with PostgreSQL and to ensure deterministic
behavior when certain operations are parallelized. Prior to this fix,
a statement [could incorrectly operate multiple
times](https://en.wikipedia.org/wiki/Halloween_Problem) on data that
itself was writing, and potentially never terminate. This fix is
limited to tables without FK relationships, and for certain operations
on tables with FK relationships; in other cases, the fix is not
active and the bug is still present. A full fix will be provided
in a later release.
knz added a commit to knz/cockroach that referenced this pull request Jan 20, 2020
Previously, all individual KV reads performed by a SQL statement were
able to observe the most recent KV writes that it performed itself.

This is in violation of PostgreSQL's dialect semantics, which mandate
that statements can only observe data as per a read snapshot taken at
the instant a statement begins execution.

Moreover, this invalid behavior causes a real observable bug: a
statement that reads and writes to the same table may never complete,
as the read part may become able to consume the rows that it itself
writes. Or worse, it could cause logical operations to be performed
multiple times: https://en.wikipedia.org/wiki/Halloween_Problem

This patch fixes it (partially) by exploiting the new KV `Step()` API
which decouples the read and write sequence numbers.

The fix is not complete however; additional sequence points must also
be introduced prior to FK existence checks and cascading actions. See
[cockroachdb#42864](cockroachdb#42864) and
[cockroachdb#33475](cockroachdb#33475) for
details.

For now, this patch excludes any mutation that 1) involves a foreign
key and 2) does not uyse the new CBO-driven FK logic, from the
new (fixed) semantics. When a mutation involves a FK without CBO
involvement, the previous (broken) semantics still apply.

Release note (bug fix): SQL mutation statements that target tables
with no foreign key relationships now correctly read data as per the
state of the database when the statement started execution. This is
required for compatibility with PostgreSQL and to ensure deterministic
behavior when certain operations are parallelized. Prior to this fix,
a statement [could incorrectly operate multiple
times](https://en.wikipedia.org/wiki/Halloween_Problem) on data that
itself was writing, and potentially never terminate. This fix is
limited to tables without FK relationships, and for certain operations
on tables with FK relationships; in other cases, the fix is not
active and the bug is still present. A full fix will be provided
in a later release.
craig bot pushed a commit that referenced this pull request Jan 20, 2020
42862: sql: make SQL statements operate on a read snapshot r=andreimatei a=knz

Fixes #33473.
Fixes #28842.
Informs #41569 and #42864.

Previously, all individual KV reads performed by a SQL statement were
able to observe the most recent KV writes that it performed itself.

This is in violation of PostgreSQL's dialect semantics, which mandate
that statements can only observe data as per a read snapshot taken at
the instant a statement begins execution.

Moreover, this invalid behavior causes a real observable bug: a
statement that reads and writes to the same table may never complete,
as the read part may become able to consume the rows that it itself
writes. Or worse, it could cause logical operations to be performed
multiple times: https://en.wikipedia.org/wiki/Halloween_Problem

This patch fixes it by exploiting the new KV `Step()` API which
decouples the read and write sequence numbers.

The fix is not complete however; additional sequence points must also
be introduced prior to FK existence checks and cascading actions. See
#42864 and #33475 for details.

For now, this patch excludes any mutation that involves a foreign key
from the new (fixed) semantics. When a mutation involves a FK, the
previous (broken) semantics still apply.

Release note (bug fix): SQL mutation statements that target tables
with no foreign key relationships now correctly read data as per the
state of the database when the statement started execution. This is
required for compatibility with PostgreSQL and to ensure deterministic
behavior when certain operations are parallelized. Prior to this fix,
a statement [could incorrectly operate multiple
times](https://en.wikipedia.org/wiki/Halloween_Problem) on data that
itself was writing, and potentially never terminate. This fix is
limited to tables without FK relationships, and for certain operations
on tables with FK relationships; in other cases, the fix is not
active and the bug is still present. A full fix will be provided
in a later release.


Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants