feat(batch): support system column _rw_timestamp for tables #19232

Open
wants to merge 27 commits into
base: main

Contributor

@chenzl25 chenzl25 commented Nov 1, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

  • Resolve Add system column rw_timestamp for any tables #11629
  • Support system column _rw_timestamp (datatype is timestamptz) for tables. We will add a hidden column _rw_timestamp to every table catalog when loading it from a proto, but we never persist this column info.
  • Only support selecting _rw_timestamp in a batch query. Using it in a streaming query will cause an error.
  • For batch queries, we add a field epoch_idx to the storage table, which indicates the position at which the epoch should be placed.
  • Since the state store get_row interface doesn't expose any epoch information, we use the iter interface to support point gets when _rw_timestamp is selected.
  • Lots of the changes are caused by planner tests; the core change is about 200+ LOC.
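
The epoch_idx idea above can be illustrated with a minimal sketch. It is not the code from this PR: the `Datum` alias and the `build_output_row` helper are hypothetical names, and the conversion from an epoch to a timestamptz value is left out, so the raw epoch is stored as-is.

```rust
/// A datum is simplified to an optional i64 for this sketch.
type Datum = Option<i64>;

/// Builds the output row for a scan (hypothetical helper).
/// `epoch_idx` is the position reserved for `_rw_timestamp`;
/// `None` means the query did not select that column.
fn build_output_row(values: Vec<Datum>, epoch: u64, epoch_idx: Option<usize>) -> Vec<Datum> {
    let mut row = values;
    if let Some(idx) = epoch_idx {
        // Place the epoch at the requested position, shifting any later
        // columns to the right. `Vec::insert` panics if `idx > row.len()`.
        row.insert(idx, Some(epoch as i64));
    }
    row
}
```

When `epoch_idx` is `None`, the row passes through unchanged, which mirrors the case where `_rw_timestamp` has been pruned from the scan.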

Example:

dev=> select *, _rw_timestamp from t;
 id | a |         _rw_timestamp
----+---+-------------------------------
  2 | 2 | 2024-11-05 07:05:24.488+00:00
  1 | 1 | 2024-11-05 07:05:19.487+00:00
(2 rows)

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

  • Support the _rw_timestamp system column for tables. Users can select this column in a batch query to check the internal epoch/timestamp of each row, which is useful for finding out when a row was last updated.

@chenzl25 chenzl25 added the user-facing-changes Contains changes that are visible to users label Nov 1, 2024
@graphite-app graphite-app bot requested a review from a team November 1, 2024 10:16
src/frontend/src/binder/update.rs (resolved)
Comment on lines +373 to +374
// `get_row` doesn't support select `_rw_timestamp` yet.
assert!(self.epoch_idx.is_none());
Member

Can we extend the get interface to also return the user key (thus epoch)?

Contributor Author

@chenzl25 chenzl25 Nov 4, 2024

Yes. We do have a plan to add an extended interface (get_with_epoch) to return the epoch from the storage. Let's leave it for a later PR.

Collaborator

See #19345

src/frontend/src/catalog/table_catalog.rs (outdated, resolved)
Comment on lines +405 to +421
    .batch_chunk_iter_with_pk_bounds(
        epoch.into(),
        &pk_prefix,
        range_bounds,
        false,
        1,
        PrefetchOptions::new(false, false),
    )
    .await?;
pin_mut!(iter);
let chunk = iter.next().await.transpose().map_err(BatchError::from)?;
if let Some(chunk) = chunk {
    let row = chunk.row_at(0).0.to_owned_row();
    Ok(Some(row))
} else {
    Ok(None)
}
Contributor

@kwannoel kwannoel Nov 4, 2024

Refactoring this into a separate method, such as get_row_with_rw_timestamp, and calling it here would be more readable.
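
One way to picture the suggested split is the sketch below. It assumes a toy in-memory store rather than the real state store: `MockStore` and its key layout are hypothetical, and only the method name get_row_with_rw_timestamp comes from the comment above. The point get is expressed as a range scan bounded to one primary key, so the epoch stored in the key is still available when the row is returned.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory stand-in for a versioned store:
/// key = (primary key, write epoch), value = row payload.
struct MockStore {
    data: BTreeMap<(u64, u64), String>,
}

impl MockStore {
    /// Plain point get: returns the newest visible value and drops the epoch.
    fn get_row(&self, pk: u64, read_epoch: u64) -> Option<String> {
        self.get_row_with_rw_timestamp(pk, read_epoch)
            .map(|(value, _epoch)| value)
    }

    /// Point get expressed as a range scan over a single primary key.
    /// Taking the last entry of the range yields the newest version at or
    /// below `read_epoch`, and the key carries the epoch with the value.
    fn get_row_with_rw_timestamp(&self, pk: u64, read_epoch: u64) -> Option<(String, u64)> {
        self.data
            .range((pk, 0)..=(pk, read_epoch))
            .next_back()
            .map(|(&(_, epoch), value)| (value.clone(), epoch))
    }
}
```

Keeping the iterator-based path in its own method leaves the call site with a single branch on whether `_rw_timestamp` was selected.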

Contributor

@st1page st1page left a comment

Lots of change is caused by planner tests, the core change is about 200+ LOC.

Why must _rw_timestamp appear in the LogicalScan? Can we prune it in the LogicalScan?

Contributor Author

chenzl25 commented Nov 4, 2024

Lots of change is caused by planner tests, the core change is about 200+ LOC.

Why _rw_timestamp must appear in the LogicalScan. Can we prune it in the LogicalScan?

After column pruning, this column will be pruned.

Contributor

st1page commented Nov 4, 2024

Lots of change is caused by planner tests, the core change is about 200+ LOC.

Why _rw_timestamp must appear in the LogicalScan. Can we prune it in the LogicalScan?

After column pruning, this column will be pruned.

Ohhh SORRY for misreading

Member

@xxchan xxchan left a comment

Does this work for MVs? (If not, do we have a related issue for that?)

Contributor Author

chenzl25 commented Nov 8, 2024

Does this work for MVs? (If not, do we have relate issue for that)

Yes, it works for MVs.

Comment on lines +526 to +527
"selecting `_rw_timestamp` in a streaming query is not allowed".to_string(),
"please run the sql in batch mode or remove the column `_rw_timestamp` from the streaming query".to_string(),
Member

It's theoretically feasible, right? For example, use the epoch of the persisted entry for historical data, and the current epoch for incremental data.

Do we plan to support it in the future? If so, can we open an issue for it?

Contributor Author

and the current epoch for incremental data.

Yes, it is theoretically possible. We would also need to fetch the old epoch for delete messages from the storage. Whether we plan to support it in the future depends on the requirements. If it is just for debugging purposes, I think batch queries are enough, and they won't be invasive to the streaming proto.

Contributor

I think the problem here is the same as when we try to specify the proctime in the streaming query, which generates inconsistent delete messages. We don't know what the previous time value was when processing the delete message

Member

We don't know what the previous time value was when processing the delete message

Makes sense to me. Some ideas:

  • For append-only tables / materialized views, there's no confusion and it sounds feasible.
  • For tables with on-conflict checks, since we already have to look up the storage for every delete, we can obtain the deleted row's timestamp at the same time.
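
The second idea can be sketched as follows. This is only an illustration under stated assumptions, not RisingWave's streaming implementation: `ConflictCheckedTable`, `Op`, and `upsert` are hypothetical names, and the table state is a plain in-memory map. The conflict lookup that an upsert already performs returns the old row together with its stored timestamp, so the delete message can carry the previous time value.

```rust
use std::collections::HashMap;

/// Change messages emitted downstream; `ts` stands in for `_rw_timestamp`.
#[derive(Debug, PartialEq)]
enum Op {
    Insert { pk: u64, value: i64, ts: u64 },
    Delete { pk: u64, value: i64, ts: u64 },
}

/// Hypothetical table with on-conflict checks: pk -> (value, write epoch).
#[derive(Default)]
struct ConflictCheckedTable {
    rows: HashMap<u64, (i64, u64)>,
}

impl ConflictCheckedTable {
    /// Upserts a row and returns the messages to emit.
    fn upsert(&mut self, pk: u64, value: i64, current_epoch: u64) -> Vec<Op> {
        let mut out = Vec::new();
        // `HashMap::insert` returns the previous entry, so the conflict
        // lookup also yields the timestamp of the row being replaced.
        if let Some((old_value, old_ts)) = self.rows.insert(pk, (value, current_epoch)) {
            out.push(Op::Delete { pk, value: old_value, ts: old_ts });
        }
        out.push(Op::Insert { pk, value, ts: current_epoch });
        out
    }
}
```

Because the delete carries the old timestamp and the insert carries the current epoch, a downstream operator sees a matching delete for every earlier insert.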

src/common/src/catalog/column.rs (outdated, resolved)
Labels
type/feature user-facing-changes Contains changes that are visible to users
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add system column rw_timestamp for any tables
6 participants