-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rfc: Revert Command #16294
rfc: Revert Command #16294
Conversation
a512ba8
to
25b4274
Compare
Review status: 0 of 2 files reviewed at latest revision, 17 unresolved discussions, all commit checks successful. docs/RFCS/fast_delete_range.md, line 19 at r1 (raw file):
s/older/newer/ docs/RFCS/fast_delete_range.md, line 23 at r1 (raw file):
We have generally switched to the word "span" for this; it may help to replace lowercase "range" with "span" in this doc. The This paragraph should probably go up in the summary or in a new "Background" section. docs/RFCS/fast_delete_range.md, line 32 at r1 (raw file):
This paragraph is unclear - it says these two (three?) statements are similar (to each other, apparently) then immediately starts listing differences without ever saying what they have in common. I think it's enough for the purposes of the motivation section to say that DROP and TRUNCATE use the DeleteRange command, which makes them extremely slow. Maybe also mention that DeleteRange has gotten much slower as a consequence of propEvalKV, since it must write all affected keys into the raft log. docs/RFCS/fast_delete_range.md, line 68 at r1 (raw file):
s/DATABASE/SYSTEM/ docs/RFCS/fast_delete_range.md, line 93 at r1 (raw file):
There are some potential edge cases here: we automatically split at table boundaries, but most sql range operations run over index spans. That is, we have a Range with StartKey docs/RFCS/fast_delete_range.md, line 111 at r1 (raw file):
What sort of overlaps are allowed between Tombstones? Full deletions (which always start with EndTimestamp zero, I assume?) must be allowed to overlap previous point-in-time recoveries. But you must not be allowed to recover into a timestamp range that was affected by another recovery or deletion. It was not obvious to me on my first read of this RFC that docs/RFCS/fast_delete_range.md, line 115 at r1 (raw file):
So it's a single versioned key, which unlike all other keys operates as if its value is the union of all its non-GC'd versions? docs/RFCS/fast_delete_range.md, line 119 at r1 (raw file):
We should probably require Timestamp to be the timestamp of the transaction that wrote this value. Otherwise I can imagine various weird things (a Timestamp in the future would leave a "black hole" that couldn't be written into until the time expires; a Timestamp in the past would leave an inconsistent mix of previously-written values in place. docs/RFCS/fast_delete_range.md, line 137 at r1 (raw file):
I'm afraid of the memory footprint here. If a user does a lot of DELETEs on a poorly-indexed table, we may end up with a growing list of tombstones to maintain. We should be able to remove these values from memory and reload them from disk when necessary, probably via a store-level cache instead of a range-level one (like the raftEntryCache and the timestampCache) docs/RFCS/fast_delete_range.md, line 181 at r1 (raw file):
Reads must also return WriteIntentError if there is a FullRangeTombstone intent, right? docs/RFCS/fast_delete_range.md, line 189 at r1 (raw file):
I don't think this can happen; a DeleteRange must block all other writes in the command queue so it can only run when it is going to be the most recent update to the range. docs/RFCS/fast_delete_range.md, line 204 at r1 (raw file):
So after point-in-time recovery has been used, writes will be slower by X% for the next day because of the loss of this optimization? That's unfortunate (but not critical) docs/RFCS/fast_delete_range.md, line 208 at r1 (raw file):
s/Range/MVCC/? "Range GC" sounds too much like "replica GC". docs/RFCS/fast_delete_range.md, line 228 at r1 (raw file):
It will also require some tricky synchronization to ensure that it doesn't race with any other writes. docs/RFCS/fast_delete_range.md, line 238 at r1 (raw file):
MVCCStats is serialized below raft, which means making changes to it in an upgrade-safe way is challenging. This will need a migration plan of its own, unless we can get rid of this field. It looks like it's used to prevent full scans in two places, one of which I think is redundant with the command queue. We might be able to come up with a way to prevent the scan in the GC pipeline without introducing this intrusive change. docs/RFCS/fast_delete_range.md, line 254 at r1 (raw file):
I think I'm OK with leaving these stats estimated for an indefinite period of time. We don't do anything important with the "live" stats. docs/RFCS/fast_delete_range.md, line 276 at r1 (raw file):
It's fine to leave merge-related work to the (distant) future. Comments from Reviewable |
Review status: 0 of 2 files reviewed at latest revision, 18 unresolved discussions, all commit checks successful. docs/RFCS/fast_delete_range.md, line 180 at r1 (raw file):
PropEvalKV rules require every operation to declare a read dependency on this key. We might be able to special-case it because this value is only written when there's a full-range write, but there are subtleties to work out. Range-local data (like the range descriptor) would not be covered by the full-range write span (but maybe that's OK because they shouldn't consult full-range tombstones anyway?) docs/RFCS/fast_delete_range.md, line 181 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Also, intents written by the current transaction must be processed instead of returning WriteIntentError. Since this key behaves differently from other keys in that new versions are supposed to be merged with older versions, some care must be taken when a transaction contains multiple DeleteRange or recovery commands. Comments from Reviewable |
Review status: 0 of 2 files reviewed at latest revision, 20 unresolved discussions, all commit checks successful. docs/RFCS/fast_delete_range.md, line 89 at r1 (raw file):
I recall seeing there were a number of conditions which prevented the use of the docs/RFCS/fast_delete_range.md, line 111 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
I found the timestamp definitions here confusing as well. I think this would be simpler if the tombstone always defined a closed span of time:
If you want to delete a range you add a tombstone from docs/RFCS/fast_delete_range.md, line 295 at r1 (raw file):
@dt or someone else said that we could possibly implement Independent of this RFC, it would be nice to figure out a way to unify Comments from Reviewable |
Reviewed 2 of 2 files at r1. docs/RFCS/fast_delete_range.md, line 111 at r1 (raw file): Previously, petermattis (Peter Mattis) wrote…
Same confusion here. docs/RFCS/fast_delete_range.md, line 184 at r1 (raw file):
Should point out that even if the docs/RFCS/fast_delete_range.md, line 189 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Most recent, yes, but not newest timestamp. docs/RFCS/fast_delete_range.md, line 233 at r1 (raw file):
But we have MVCCStats.LastUpdateNanos, doesn't that work (assuming the guarantees about the value being updated being strong enough). docs/RFCS/fast_delete_range.md, line 254 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
It's been a hassle to deal with out-of-date stats. Any operation that relies on them must make assumptions on how they're out-of-date. So, maybe ok, for the Live stats, but good to avoid where we can. docs/RFCS/fast_delete_range.md, line 280 at r1 (raw file):
What happens if you drop point-in-time-recovery? The driving issue here is dealing with large truncations. Point-in-time seems fashionable to add, but as far as I can tell hasn't been requested or prioritized by anyone, plus you can already jerry-rig it with Comments from Reviewable |
Review status: all files reviewed at latest revision, 23 unresolved discussions, all commit checks successful. docs/RFCS/fast_delete_range.md, line 280 at r1 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Point in time recovery has been requested from a number of folks: #16161. Comments from Reviewable |
Review status: all files reviewed at latest revision, 23 unresolved discussions, all commit checks successful. docs/RFCS/fast_delete_range.md, line 280 at r1 (raw file): Previously, petermattis (Peter Mattis) wrote…
Point in time recovery also addresses a HUGE hole in our current backup/restore story. I don't consider it optional Comments from Reviewable |
Review status: all files reviewed at latest revision, 26 unresolved discussions, all commit checks successful. docs/RFCS/fast_delete_range.md, line 8 at r1 (raw file):
Dunno if this is helpful, but here's the summary I wrote when I first started thinking about this rfc: "This RFC describes a new kv operation for transactional bulk deletes across docs/RFCS/fast_delete_range.md, line 60 at r1 (raw file):
Another important motivation is that if a user is DROPing or TRUNCATEing a table because they're running out of disk and want a quick win, we currently do that by writing more to disk (a rocksdb tombstone per key). This doesn't immediately solve that problem but certainly paves the way docs/RFCS/fast_delete_range.md, line 66 at r1 (raw file):
I've been trying to standardize on calling this piece ROLLBACK. A second piece we can build is the ability to RESTORE to system times in a backup other than the end time (subject to certain limitations), which I've been calling "point-in-time RESTORE". Our customers probably don't care about the distinctions and "point-in-time recovery" sounds to me like a fine name for the collective. Sorry if this is too much of a bikeshed docs/RFCS/fast_delete_range.md, line 89 at r1 (raw file): Previously, petermattis (Peter Mattis) wrote…
The code that makes the fast path decision is https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/tablewriter.go#L547-L568 docs/RFCS/fast_delete_range.md, line 93 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
For tables not involved in any interleave hierarchy, I'd imagine we'd always use tl;dr Not sure there's anything to do here, unless I'm missing your point docs/RFCS/fast_delete_range.md, line 111 at r1 (raw file):
Why is this? The semantics of ROLLBACK to me are "make the table look exactly like it did at this time". The later rollback or deletion should be irrelevant. I could see the following situation:
In which case the rollback at t4 should make the system ignore the tombstone at t3 docs/RFCS/fast_delete_range.md, line 115 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Seems like we should flesh out the alternative then? Specifically: only the latest tombstone is consulted, and it's a merging of all the tombstone timespans for that range. docs/RFCS/fast_delete_range.md, line 295 at r1 (raw file): Previously, petermattis (Peter Mattis) wrote…
Yeah, I've thought about this some and I don't see why we couldn't implement TRUNCATE by giving the table a new id Comments from Reviewable |
Review status: all files reviewed at latest revision, 26 unresolved discussions, all commit checks successful. docs/RFCS/fast_delete_range.md, line 66 at r1 (raw file): Previously, danhhz (Daniel Harrison) wrote…
docs/RFCS/fast_delete_range.md, line 93 at r1 (raw file): Previously, danhhz (Daniel Harrison) wrote…
As long as we use the table span (in contrast to docs/RFCS/fast_delete_range.md, line 111 at r1 (raw file): Previously, danhhz (Daniel Harrison) wrote…
I was thinking that it would be weird if deleted data suddenly got undeleted, but on reflection it should definitely be allowed. You have to be able to undo a bad point-in-time recovery. docs/RFCS/fast_delete_range.md, line 115 at r1 (raw file): Previously, danhhz (Daniel Harrison) wrote…
Yeah, let's think that through. docs/RFCS/fast_delete_range.md, line 189 at r1 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Hmm? Writes always block each other regardless of timestamp. There's no way a DeleteRange could execute when some other operation has occurred on the range at a higher timestamp. docs/RFCS/fast_delete_range.md, line 233 at r1 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
LastUpdateNanos is not the last update to the stats, it's the last update to the "age" fields. (and it's not precise enough; it's not a full HLC timestamp) docs/RFCS/fast_delete_range.md, line 280 at r1 (raw file): Previously, danhhz (Daniel Harrison) wrote…
The feature may be required, but we don't necessarily have to get it in this way. The comparable feature in many other databases is powered by a continuous backup of the WAL or another feed of changes (which we intend to build anyway at some point). Getting the feature this way is better than WAL mirroring, but the complexity this introduces on the critical path is still a concern. Comments from Reviewable |
Review status: all files reviewed at latest revision, 26 unresolved discussions, all commit checks successful. docs/RFCS/fast_delete_range.md, line 8 at r1 (raw file): Previously, danhhz (Daniel Harrison) wrote…
Incorporated. docs/RFCS/fast_delete_range.md, line 19 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Done. docs/RFCS/fast_delete_range.md, line 23 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Introduced use of "span" throughout and created a #Background section. docs/RFCS/fast_delete_range.md, line 32 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Done. docs/RFCS/fast_delete_range.md, line 60 at r1 (raw file): Previously, danhhz (Daniel Harrison) wrote…
That's an interesting point. I added it as an aside. docs/RFCS/fast_delete_range.md, line 66 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
I'm happy enough with calling the actual SQL command docs/RFCS/fast_delete_range.md, line 68 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Done. docs/RFCS/fast_delete_range.md, line 89 at r1 (raw file): Previously, danhhz (Daniel Harrison) wrote…
Done. docs/RFCS/fast_delete_range.md, line 93 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Will definitely be covering this in testing. docs/RFCS/fast_delete_range.md, line 111 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Done. docs/RFCS/fast_delete_range.md, line 115 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
No, it IS an MVCC key, as mentioned above ("an additional MVCC Range-local key..."). If a read comes in for time In other words, you read the "correct" tombstone that applies to your read the same way your read the underlying KV data: by looking up the closest MVCC version that is <= your transaction timestamp. The only reason it's materialized with all versions in memory is to avoid endlessly reading the same value. I adjusted the paragraph to make this clearer. docs/RFCS/fast_delete_range.md, line 119 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Yes, it's the transaction timestamp. I've made that more explicit. docs/RFCS/fast_delete_range.md, line 137 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Gotcha. Well after some thought about possibly just holding the most recent value, I think it'll be best to keep reading everything and put a TODO in the code to watch out for blowup here. The reason is what happens with reverts. Those will require that we have to read again. Some ideas that occurred to me to maybe read more than the first just made things complicated. Feels better to punt on the problem of too many TRUNCATEs for the time being. docs/RFCS/fast_delete_range.md, line 180 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Non-MVCC reads definitely don't consult the full-Range tombstones, which is what all other range-local data is. We need to be very careful about this though. docs/RFCS/fast_delete_range.md, line 181 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Yes, read must return Added a comment about handling an existing full-Range tombstone intent appropriately. docs/RFCS/fast_delete_range.md, line 184 at r1 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Yes, that was my assumption. Is there anything you want me to change in this RFC, or just a point you were reiterating? docs/RFCS/fast_delete_range.md, line 189 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
We do still need this high water mark. It's not so much about concurrency, but about whether we can guarantee that when a full-Range tombstone is applied at time docs/RFCS/fast_delete_range.md, line 204 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Yes that's correct. Rewriting a big table with batch writes after a docs/RFCS/fast_delete_range.md, line 208 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Done. docs/RFCS/fast_delete_range.md, line 228 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Yes; I'm considering this somewhat TBD in terms of design still. Added to the Unresolved questions section. docs/RFCS/fast_delete_range.md, line 233 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Right. We can't use it...We really need the high water hlc timestamp for data written to the range. docs/RFCS/fast_delete_range.md, line 238 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
I don't think we can get rid of it. I'll add this to the unresolved questions section. docs/RFCS/fast_delete_range.md, line 254 at r1 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
We should clean them up...shouldn't be too hard. docs/RFCS/fast_delete_range.md, line 276 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Done. docs/RFCS/fast_delete_range.md, line 280 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
I added this as a drawback, but I'm comfortable with the additional complexity for point-in-time recovery. If that's all we were getting from this mechanism, I might be less sanguine. But because we're already adding these full-Range tombstones for deletion / truncation, I think the extra bit for revert is easily worth the cost to do now and solve them all in one go. docs/RFCS/fast_delete_range.md, line 295 at r1 (raw file): Previously, danhhz (Daniel Harrison) wrote…
Doing a new ID for truncation would be great, because it would re-enable the blind put optimization. However, I don't understand the potential problems you'll experience when doing time travel queries. Comments from Reviewable |
25b4274
to
37c6f96
Compare
Reviewed 1 of 2 files at r1, 1 of 1 files at r2. docs/RFCS/fast_delete_range.md, line 115 at r2 (raw file):
For an active index, couldn't we just also send DeleteRange requests to the index ranges? docs/RFCS/fast_delete_range.md, line 116 at r2 (raw file):
You either need to start these two phrases with the word "no" or move the word "no" from the first bullet into the preceding sentence. Comments from Reviewable |
Review status: all files reviewed at latest revision, 19 unresolved discussions, all commit checks successful. docs/RFCS/fast_delete_range.md, line 66 at r1 (raw file): Previously, spencerkimball (Spencer Kimball) wrote…
+1 to "point-in-time recovery" as the feature name but docs/RFCS/fast_delete_range.md, line 115 at r1 (raw file): Previously, spencerkimball (Spencer Kimball) wrote…
Is it true that only closest MVCC version <= your transaction timestamp can affect your read? I don't think so:
docs/RFCS/fast_delete_range.md, line 180 at r1 (raw file): Previously, spencerkimball (Spencer Kimball) wrote…
The range descriptor is an MVCC range-local key. They're the only ones currently, but they're not categorically prohibited. docs/RFCS/fast_delete_range.md, line 189 at r1 (raw file): Previously, spencerkimball (Spencer Kimball) wrote…
OK, I get it now. docs/RFCS/fast_delete_range.md, line 295 at r1 (raw file): Previously, spencerkimball (Spencer Kimball) wrote…
Time travel queries do a time-travel read of the descriptors, so they should see the old ID. docs/RFCS/fast_delete_range.md, line 112 at r2 (raw file):
These are pretty severe restrictions. It's outside the scope of this RFC but we definitely want to be able to quickly drop tables with secondary indexes and entire subtrees of interleaved tables in the future. Comments from Reviewable |
37c6f96
to
8ef8879
Compare
Review status: 1 of 2 files reviewed at latest revision, 19 unresolved discussions. docs/RFCS/fast_delete_range.md, line 115 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
What I meant was that you use the closest MVCC version <= cmd/txn timestamp on the first read. BUT, each revert tombstone that is merged with the result read from MVCC requires that another read be done (for both MVCC and full-Range tombstone) at the revert tombstone's start time. So in your example, when reading at t6, we'd be affected by the tombstone to t5, then read at t5, which would be affected by the tombstone to t2, then read at t2. This is described earlier in the document. docs/RFCS/fast_delete_range.md, line 180 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Correct. It's going to require a comprehensive comment in the code. docs/RFCS/fast_delete_range.md, line 295 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Doc updated. docs/RFCS/fast_delete_range.md, line 112 at r2 (raw file): Previously, bdarnell (Ben Darnell) wrote…
In reality, for both For Oracle, dropping a table will automatically and without exception drop its secondary indexes. So I think in practice only global, non-interleaved tables with foreign key references will not follow the fast path, but those will probably be rare. docs/RFCS/fast_delete_range.md, line 115 at r2 (raw file): Previously, a-robinson (Alex Robinson) wrote…
Yes. docs/RFCS/fast_delete_range.md, line 116 at r2 (raw file): Previously, a-robinson (Alex Robinson) wrote…
Done. Comments from Reviewable |
Review status: 1 of 2 files reviewed at latest revision, 19 unresolved discussions, all commit checks successful. docs/RFCS/fast_delete_range.md, line 115 at r1 (raw file):
Yes, that's exactly what I mean about this key being treated differently from all other MVCC keys. I see now that this is described earlier, but it took several passes for me to connect the dots. Comments from Reviewable |
This PR is entering the final comments period. |
Review status: 1 of 2 files reviewed at latest revision, 21 unresolved discussions, all commit checks successful. docs/RFCS/fast_delete_range.md, line 111 at r1 (raw file): Previously, spencerkimball (Spencer Kimball) wrote…
Can you explicitly walk through some time travel queries in your main example? In particular, it seems that without proper precautions, it could happen that you run a time travel query that aggregates over two ranges. When it reads the first range, the fast DeleteRange hasn't run yet. Before time travel touches the second, the I also wonder how you're undoing a PITR as Ben pointed at above. You're dealing with tombstones only, but it seems that you'd also want an Un-Tombstone, though that's sure to make things more complex. What was your answer to him there? docs/RFCS/fast_delete_range.md, line 184 at r1 (raw file): Previously, spencerkimball (Spencer Kimball) wrote…
We should document this better at some point, but no specific action required here. docs/RFCS/fast_delete_range.md, line 233 at r1 (raw file): Previously, spencerkimball (Spencer Kimball) wrote…
I see. So you're only missing the logical component - I think docs/RFCS/fast_delete_range.md, line 280 at r1 (raw file): Previously, spencerkimball (Spencer Kimball) wrote…
I'm on board with point-in-time recovery being a useful addition to the system and I see the allure of two birds with one stone. I have no concerns about RFC'ing the complete design. But I'm worried about making this a one-man show trying to implement it all in a fell swoop for 1.1 as there is a lot of test code to write. If I picture the worst case, then I see you rushing this feature in just before 1.1 closes (perhaps with CEO persuasion), and later others will have to go in and fix any bugs that crop up in production after painfully long investigation and limited window for a fix. In my ideal case we make this two-stage, you loop someone in who's not a member of the old wise men core team and work with them to get a solid solution for large drops into 1.1, and then continue with PITR for 1.2 (or 1.1, if things are much faster than anticipated). Of course there's also the OK case in which you do all the work and it sorta works out OK, perhaps great. But even then I'd love for someone new to grok this, too (similar how Dan got thrown into the storage bowels via backup-restore through Ben and myself). Curious to hear what others think. docs/RFCS/fast_delete_range.md, line 51 at r3 (raw file):
Should make clear somewhere that the optimization only applies when there is no limit and no deleted keys are returned (mostly so that it's not forgotten about during implementation). Also, anything special about the first range? It contains a bunch of non-userspace keys. I assume we simply won't do this optimization for the first range, ever. (that should be OK since we split out the table anyway). docs/RFCS/fast_delete_range.md, line 227 at r3 (raw file):
This might tickle some new code paths because I don't think there's any other case in which you can get a docs/RFCS/fast_delete_range.md, line 280 at r3 (raw file):
Seems good to ping #7880. docs/RFCS/fast_delete_range.md, line 336 at r3 (raw file):
Spurious Comments from Reviewable |
Review status: 1 of 2 files reviewed at latest revision, 21 unresolved discussions, all commit checks successful. docs/RFCS/fast_delete_range.md, line 111 at r1 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Your time travel query question is not a problem. To see why, consider the following example: Let's say the time travel is at time t5. At your first range, the query to any key in that range will ensure that no full-Range write can possibly be written at time <= t5 (because of the timestamp cache). When the fast DeleteRange now runs before the second range is queried at t5, it must have a tombstone timestamp > t5. That means that when the second range is queried at t5, the fast DeleteRange cannot possibly affect the time travel read. Dealing with "undoing" a PITR was not contemplated in this document. However, it's fairly simple to see how they would work. If a revert is done from t6 to t4, doing a revert from t6,1 to t6,-1 would effectively undo the revert from t6. Consider the case of writes at t5:
docs/RFCS/fast_delete_range.md, line 184 at r1 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
I will add to the .proto comments. docs/RFCS/fast_delete_range.md, line 233 at r1 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Well yes, I suppose that's accurate. The LastUpdateNanos should be the high water mark wall clock time. Although I'm not sure because a quick survey of the code is making it very difficult to determine where that value is being set! In any case, I think it will be easier and less confusing to add another field and leave docs/RFCS/fast_delete_range.md, line 280 at r1 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
These are valid concerns, but outside the scope of this RFC PR. The plan is to get the fast delete path working first. Then the PITR will need to be added as an enterprise feature with new SQL syntax and probably another RFC. These concerns will be amply addressed at that later date. docs/RFCS/fast_delete_range.md, line 51 at r3 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Added explicit mention of limit and returned keys exceptions. I don't believe there is any reason to make an exception for the first range. The first range isn't the only one that contains non-userspace keys. All the time series data, all the meta2 records, node liveness, etc. They're all potentially located on ranges after the first. I don't think we should do anything different for system keyspace ranges. This faster version for full-Range docs/RFCS/fast_delete_range.md, line 227 at r3 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
It will return a WriteIntentError for the key you're trying to write. As you say, I don't think this is a problem. docs/RFCS/fast_delete_range.md, line 280 at r3 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Looks like this comment accomplished that! docs/RFCS/fast_delete_range.md, line 336 at r3 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done. Comments from Reviewable |
8ef8879
to
e840381
Compare
Reviewed 1 of 1 files at r4. docs/RFCS/fast_delete_range.md, line 111 at r1 (raw file): Previously, spencerkimball (Spencer Kimball) wrote…
Ah, gotcha. I didn't take into account that only the tombstones which are visible from your timestamp are active. docs/RFCS/fast_delete_range.md, line 280 at r1 (raw file): Previously, spencerkimball (Spencer Kimball) wrote…
SGTM docs/RFCS/fast_delete_range.md, line 51 at r3 (raw file): Previously, spencerkimball (Spencer Kimball) wrote…
For example, we are going through some trouble to make sure there aren't ever intents on the node liveness table (right?). An intent on the DeleteRange key would behave like an intent on any of the liveness table keys. Does seem that at the very least, it exposes us to a lot of new situations. docs/RFCS/fast_delete_range.md, line 163 at r4 (raw file):
I think this warrants an additional couple of words. Say you're anchored with your transaction far away from the range deletions, and when you've done those, you come back for the commit. How are you going make any guarantees about what the GCThreshold is on these ranges? It's the GC Queue's duty to make sure things work out in that case. So probably easiest to discuss it there and mention here only that at the time of writing the intent you fail immediately if you're already violating the GC Threshold (or TxnSpanGCThreshold, as usual). docs/RFCS/fast_delete_range.md, line 266 at r4 (raw file):
Is that really true? The tombstone layer makes it tricky to follow what happens. The GC Queue currently iterates over actual on-disk KV pairs but with tombstones, wouldn't you skip some? For example, if you have gcable kvs at t=1, 5, 10 and you have a tombstone for 3-6, wouldn't it look like only t=1,10 had kv pairs and you'd never delete t=5? Perhaps an example can clarify. Also worth discussing how a tombstone itself is GC'ed and how it influences what other values can be GC'ed. For example, a write at t=1, a deletion of that write at t=10, and a range tombstone 8-11. So Comments from Reviewable |
Review status: all files reviewed at latest revision, 17 unresolved discussions, all commit checks successful. docs/RFCS/fast_delete_range.md, line 233 at r1 (raw file): Previously, spencerkimball (Spencer Kimball) wrote…
It will be cleaner to add a new field, but much trickier due to the migration requirements. Do we docs/RFCS/fast_delete_range.md, line 51 at r3 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Yeah, but a DeleteRange that covers the node liveness keys would write intents to them with or without this optimization. Not doing a DeleteRange over the liveness keys is just part of what it means to "go through some trouble to make sure there aren't ever intents on the node liveness table". I don't think this needs a special case. docs/RFCS/fast_delete_range.md, line 163 at r4 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
This is kind of scary: a GC queue run during a REVERT transaction could lead to inconsistent data. I think this warrants a check in the GC queue, to not do any MVCC GC while an intent exists for a full-range tombstone (or potentially to look into the intent and incorporate it into the decisions about what is GC'able, but it's simpler and safer to just wait for the intent to resolve) docs/RFCS/fast_delete_range.md, line 362 at r4 (raw file):
I think the migration story belongs in the RFC; I wouldn't want to approve it without one. (I think it's aggressive to even send it to the final comment period without this). Comments from Reviewable |
Review status: all files reviewed at latest revision, 17 unresolved discussions, all commit checks successful. docs/RFCS/fast_delete_range.md, line 51 at r3 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Step 1: make cluster I just think it'd be nice to a) point out why it's a problem and b) make sure it's not becoming a problem. Sure, we own the KV store, but at the end who's to know what's allowed and what isn't? It's already anyone's guess today unless you're willing to think hard thoughts. Remember how we introduced node liveness under the assumption that there could never be intents on it, but then found than in practice intents could still be on it? This could be similar, I just want to make sure we make this thorough, especially since it should make sense to anyone not deep in the weeks of local/range-local/system keyspaces. (I find it confusing enough already). docs/RFCS/fast_delete_range.md, line 362 at r4 (raw file): Previously, bdarnell (Ben Darnell) wrote…
I second the concern about this already being in final comment period. That's a good hammer to get folks to actually look at it (sure worked for me), but at least I personally don't feel that I have grokked the suggestions here well enough to stamp them. Comments from Reviewable |
Review status: all files reviewed at latest revision, 17 unresolved discussions, all commit checks successful. docs/RFCS/fast_delete_range.md, line 51 at r3 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Comments from Reviewable |
Review status: all files reviewed at latest revision, 17 unresolved discussions, all commit checks successful. docs/RFCS/fast_delete_range.md, line 51 at r3 (raw file): Previously, bdarnell (Ben Darnell) wrote…
You don't have to convince me it's correct for that particular use case we have in mind right now, but we've done remarkably little for other team members to come around and to know what's acceptable and what's not (after all, other team members == set of people who use KV). What I'm trying to do here is to get a little more detail in somewhere. Perhaps this isn't the right place, but we should document internals whenever we can. For example, why not use this operation to later purge time series ranges? Because those use merges. I don't know how it works for you, but every time I have to interact with local keys in any way there is considerable effort in paging in the messy details. I would love to be able to come back to this RFC at least find some information or pointers on how fast DeleteRange works outside of user-land if that ever becomes relevant. Comments from Reviewable |
e840381
to
19cdbf0
Compare
Review status: 1 of 2 files reviewed at latest revision, 17 unresolved discussions. docs/RFCS/fast_delete_range.md, line 233 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
OK, I'll switch to using this suggestion. Will accompany it with a significant comment in the docs/RFCS/fast_delete_range.md, line 51 at r3 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
I agree in principal but this concern seems misplaced. Why would we ever expect to receive a docs/RFCS/fast_delete_range.md, line 163 at r4 (raw file): Previously, bdarnell (Ben Darnell) wrote…
The GC Queue cannot proceed with the GC until it has "pushed" the transaction responsible for any intent on the full-Range tombstone key. That means that the GCThreshold for the range cannot change until after the transaction which created the full-Range tombstone has completed. I've added a little more verbiage about an immediate failure if the addition of the revert tombstone would revert to an earlier timestamp than the docs/RFCS/fast_delete_range.md, line 266 at r4 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
You're right this is more complicated than just using MVCCIterate. I looked again at the code in docs/RFCS/fast_delete_range.md, line 362 at r4 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Removed. Comments from Reviewable |
Review status: all files reviewed at latest revision, 24 unresolved discussions, all commit checks successful. docs/RFCS/fast_delete_range.md, line 280 at r1 (raw file):
I'm not. We've uncovered a lot of subtle interactions (mainly with GC) while discussing this, and I don't think we're done yet. If concerns are about REVERT are outside the scope of this RFC, then let's strip the RFC down to just the DeleteRange portions. docs/RFCS/fast_delete_range.md, line 163 at r4 (raw file):
That's not how it works today. The GC queue considers intents and old MVCC versions separately; it can GC old versions and advance the threshold even when intents exist that are not yet resolved. It's the command queue interactions on the GC threshold key that make this safe. But in any case, my concern was misplaced above - I misremembered how the MVCC GC worked and was afraid we would allow the first version past the threshold to be GC'd incorrectly. I'm no longer sure that we need to block GC when a range tombstone intent exists. docs/RFCS/fast_delete_range.md, line 362 at r4 (raw file): Previously, spencerkimball (Spencer Kimball) wrote…
Um, removing the "unresolved question" about the migration story doesn't resolve it. We've avoided part of the problem by not adding a new field to MVCCStats, but since this is changing the behavior of all the existing KV commands this will still be tricky to roll out. During a rolling upgrade, a DeleteRange issued on a node running a new version will write tombstones that can't be interpreted by nodes running older versions. How will we gate this so that the new tombstones aren't used until the upgrade is complete (or optionally longer, if the admin wants to retain the ability to downgrade)? This is what I was talking about last week when I said a command like docs/RFCS/fast_delete_range.md, line 239 at r5 (raw file):
How do REVERTs interact with existing intents? (They can't fall back to a traditional DeleteRange) docs/RFCS/fast_delete_range.md, line 275 at r5 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Yeah, this seems insufficient since the GCQueue is neither transactional nor synchronized with raft. As I said above I'm not sure we need this, but if we do, we need more synchronization than is described here. docs/RFCS/fast_delete_range.md, line 277 at r5 (raw file):
We need to keep versions that are hidden by revert tombstones in addition to those that are kept visible by them (thanks to the ability to revert a revert, any covered versions may be uncovered, even if they are older than the GC threshold). docs/RFCS/fast_delete_range.md, line 279 at r5 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
We always keep one version beyond the GC threshold. We don't have to hold the GC threshold back because there are old range tombstones, but we do have to keep old tombstones around as long as they cover any extant versions. Comments from Reviewable |
Review status: all files reviewed at latest revision, 24 unresolved discussions, all commit checks successful. docs/RFCS/fast_delete_range.md, line 280 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
The point in time REVERT is scheduled for 1.1 (and fills a pretty embarrassing hole in our disaster recovery story). If we decide that it's outside the scope of this rfc, then we're likely cutting it from 1.1. Are we okay with doing that? At the very least, I propose that we include the details we need to support it in this rfc and implement them alongside the fast DROP and TRUNCATE. Then, if we're concerned about the ruggedization of REVERT when August/September rolls around, we can turn it off for the release Comments from Reviewable |
Review status: all files reviewed at latest revision, 24 unresolved discussions, all commit checks successful. docs/RFCS/fast_delete_range.md, line 280 at r1 (raw file): Previously, danhhz (Daniel Harrison) wrote…
I'm not sure I agree that this is a "pretty embarrassing hole" - many other databases don't have an MVCC-based rollback like this; they generally solve the point-in-time recovery problem by streaming transaction logs to a backup location, which requires an expensive restore process to actually use. Our Comments from Reviewable |
Review status: all files reviewed at latest revision, 24 unresolved discussions, all commit checks successful. docs/RFCS/fast_delete_range.md, line 280 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Except we don't have a streaming transaction log yet, so there's no way in a data loss scenario to restore to a point after the last backup. The data loss story is what I meant by embarrassing hole, not so much the lack of an mvcc rollback. We've encountered a decent amount of concern over this from potential customers Comments from Reviewable |
Review status: all files reviewed at latest revision, 24 unresolved discussions, all commit checks successful. docs/RFCS/fast_delete_range.md, line 280 at r1 (raw file): Previously, danhhz (Daniel Harrison) wrote…
Yes, but data loss from what failure mode? We have synchronous replication to protect against single-node failure and time travel queries to protect against application bugs or administrative mistakes. This greatly reduces the need for a streaming backup/point-in-time-recovery mechanism. There are still cases where you'd need this (multiple hardware failures or bugs in cockroachdb itself), but I'm not sure this is severe enough to make the proposed change worth the complexity and risk. Comments from Reviewable |
Review status: all files reviewed at latest revision, 24 unresolved discussions, all commit checks successful. docs/RFCS/fast_delete_range.md, line 280 at r1 (raw file):
The time it takes us to recover from these using a time travel query is pretty bad right now. And I've seen them happen often enough in the past that this matters.
Anyway, it sounds like you feel more strongly about this than I realized. Probably time to take a step back. I can't say I have a good enough sense of core to understand why this is so risky. But if we're going to bump point in time revert from 1.1, then we should do it sooner than later. From what I understand product+sales have been pitching this as a 1.1 feature (and I'll also schedule my time differently if this is not on my plate for 1.1). Not saying that decision has to be made in this comment thread, but the stakeholders should probably talk this over Comments from Reviewable |
19cdbf0
to
d163fe3
Compare
Review status: all files reviewed at latest revision, 24 unresolved discussions, all commit checks successful. docs/RFCS/fast_delete_range.md, line 280 at r1 (raw file): Previously, danhhz (Daniel Harrison) wrote…
I am starting to think of this feature as specifically about reversions, and have updated and renamed the RFC to reflect that. While we will be able to use it to dramatically improve our TRUNCATE and DROP performance for 1.1, I think a better solution long term for those will be to move such tables into a graveyard for eventual GC and start with a fresh table ID for case of TRUNCATE. This is attractive because it doesn't require per-range activity and because it gives us virgin keyspace for successive writes. This immediately reenables the docs/RFCS/fast_delete_range.md, line 163 at r4 (raw file): Previously, bdarnell (Ben Darnell) wrote…
This is explicitly spelled out now. We still have to push any intent on the full-Range tombstone key in order to ensure that changes to the docs/RFCS/fast_delete_range.md, line 362 at r4 (raw file): Previously, bdarnell (Ben Darnell) wrote…
This is no longer a problem since I've introduced a new docs/RFCS/fast_delete_range.md, line 133 at r5 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done. docs/RFCS/fast_delete_range.md, line 239 at r5 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Good point. On second thought, we should probably not revert to normal DeleteRange – better to instead do a scan to clear the intents retry the full-Range delete/revert. Aside from the fact that per-key revert tombstones would be an unacceptable complication, we expect these full-Range actions on quiescent or at the very least, mostly-quiescent, tables, so I think this approach is better than gumming the works up with per-key writes. I've changed the RFC to instead introduce a new KV command: docs/RFCS/fast_delete_range.md, line 267 at r5 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done. docs/RFCS/fast_delete_range.md, line 269 at r5 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
With batching, we still set the Could you give a more detailed example, or else maybe stop by my desk to discuss? docs/RFCS/fast_delete_range.md, line 275 at r5 (raw file): Previously, bdarnell (Ben Darnell) wrote…
The "versions of the tombstone" are the ones in the snapshot of the Range used in the GC queue's deliberations. But your point is valid. The reality is we can't GC anything if a revert might be added (between GC queue and docs/RFCS/fast_delete_range.md, line 277 at r5 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Yes good point. Added a section on the GC algorithm for clarity. docs/RFCS/fast_delete_range.md, line 298 at r5 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
OK, cleaned this up. Comments from Reviewable |
d163fe3
to
0277de1
Compare
Don't forget to change the PR title. (renaming the file mid-review makes the comments harder to follow; it would have been better to leave it and rename just before merging) I'm still unsatisfied with this RFC. The justification given is still focused on speeding up DeleteRange, even though you say that the long-term solution for DeleteRange should involve replacing the table ID. If that's the goal, let's have an RFC for that approach first, and then we can consider if this might be a more achievable short-term solution. (I think it might be if we limit the mechanism to delete/revert-to-zero. I think the full reversion functionality still has some edge cases to work out and is too complicated to justify by the need for faster deletion, especially if it's going to be replaced by a different way to speed up deletes). If the goal is fast point in time reversions, then the approach of this RFC is (probably) the right one. But in that case at least the "motivation" section needs to be rewritten to justify this feature on its own terms instead of dwelling on the limitations of DeleteRange. I am not convinced that this feature is high enough priority to warrant the high-risk work it involves, and the opportunity costs of the core team's time. Review status: 0 of 2 files reviewed at latest revision, 28 unresolved discussions, all commit checks successful. docs/RFCS/revert.md, line 13 at r6 (raw file):
Now that this is all about revert, what is left unspecified? Just the SQL syntax? docs/RFCS/revert.md, line 30 at r6 (raw file):
Is this still the primary goal? I think most of this background about DeleteRange is obsolete, and we need new motivation focused on the "revert" use case. docs/RFCS/revert.md, line 151 at r6 (raw file):
This will also revert anything else with the same timestamp. These timestamp collisions are possible thanks to HLC ratchets (they were the cause of one of our jepsen failures last fall). docs/RFCS/revert.md, line 163 at r6 (raw file):
I got confused again by "just like the underlying KV data". What's tripping me up is that I'm interpreting that to mean "just like the underlying KV data is read today" while you're meaning that it will be read by recursively applying the "reading with full-range reversions" below. docs/RFCS/fast_delete_range.md, line 362 at r4 (raw file): Previously, spencerkimball (Spencer Kimball) wrote…
DROP/TRUNCATE TABLE cannot issue Revert commands as long as there are nodes in the cluster that do not understand that command, so a migration strategy is still necessary. docs/RFCS/fast_delete_range.md, line 133 at r5 (raw file): Previously, spencerkimball (Spencer Kimball) wrote…
Tobi's suggestion was docs/RFCS/fast_delete_range.md, line 239 at r5 (raw file): Previously, spencerkimball (Spencer Kimball) wrote…
Where would that scan and retry occur? Comments from Reviewable |
Hmm, shame the rename made this basically a new review. I can't say much other than that we could probably make all of this work somehow, but like Ben I think this is a pretty drastic change and I can't quite see why we would embark on it at this stage. I second Ben's suggestion to flesh out the table ID RFC because that seems to address an urgent need and is perhaps more straightforward. Reviewed 4 of 4 files at r6. docs/RFCS/revert.md, line 128 at r6 (raw file):
I'm still confused by this terminology. "From timestamp1 to timestamp2" means typically that docs/RFCS/revert.md, line 265 at r6 (raw file):
Is this section aware of the fact that reversions can be reverted? docs/RFCS/revert.md, line 405 at r6 (raw file):
and if there's an intent? You don't want to duplicate that. Do you not allow splits while there's an intent? Comments from Reviewable |
0277de1
to
f0452b8
Compare
I've improved the motivation section to make it clear that the primary goal of this RFC is point in time recovery. The reason we're embarking on this is because it's a missing piece in our backup / restore story. It's been requested repeatedly by customers with enterprise production deployments in mind. PIT recovery is an important component of most disaster recovery plans. Review status: 1 of 2 files reviewed at latest revision, 28 unresolved discussions. docs/RFCS/revert.md, line 13 at r6 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Removed this verbiage in the cleanup effort. I added a small section in the detailed design for proposed syntax. docs/RFCS/revert.md, line 30 at r6 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Fixed. docs/RFCS/revert.md, line 128 at r6 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Yes, these are the other way around. It makes more sense to me that In any case, I've removed the docs/RFCS/revert.md, line 151 at r6 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Removed use of No, you cannot insert reversions in the past (where "past" is defined as anything older than the most recent write to a range). Explicitly including No value in the Range may have the same timestamp as the full-Range reversion. To see why this is true, consider: the full-Range reversion key is written using a span from start to end of the full Range in the command queue. It must choose a timestamp greater than the most recently written value (it uses the docs/RFCS/revert.md, line 163 at r6 (raw file): Previously, bdarnell (Ben Darnell) wrote…
OK, I've cleaned up the wording here. docs/RFCS/revert.md, line 265 at r6 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Yes, I've simplified the algorithm dramatically so it's easier to reason about correctness. docs/RFCS/revert.md, line 405 at r6 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Added a note to that effect. docs/RFCS/fast_delete_range.md, line 362 at r4 (raw file): Previously, bdarnell (Ben Darnell) wrote…
I see your point. Added a bullet in "Unresolved Questions" section. Will have to think more about the migration. Seems like we need either an upgrade command or something not yet dreamed up. docs/RFCS/fast_delete_range.md, line 133 at r5 (raw file): Previously, bdarnell (Ben Darnell) wrote…
docs/RFCS/fast_delete_range.md, line 239 at r5 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Actually I think it would be preferable to return an error indicating the range is busy. Updated to reflect this. Comments from Reviewable |
Review status: 1 of 2 files reviewed at latest revision, 29 unresolved discussions, some commit checks failed. docs/RFCS/revert.md, line 10 at r7 (raw file):
Both use cases in this RFC (restore and truncate) assume that you're restoring to a timestamp that's within in the data's TTL, right? Comments from Reviewable |
Some form of PIT recovery may be a necessary feature, but in many (most?) databases this is implemented by restoring from a backup and replaying logs, instead of some sort of O(1) MVCC magic (Oracle appears to be an exception with their Flashback feature). O(1) PIT recovery is a cool feature, but there's an alternative design that builds on change feeds and backup/restore (thereby overlapping with work that we're doing anyway). Review status: 1 of 2 files reviewed at latest revision, 29 unresolved discussions, some commit checks failed. docs/RFCS/revert.md, line 128 at r6 (raw file): Previously, spencerkimball (Spencer Kimball) wrote…
FWIW my linguistic intuition matches Tobi's. In the context of this protobuf the english usage "from x to y" implying "x < y" is stronger than the association with "revert from x to y". docs/RFCS/revert.md, line 49 at r7 (raw file):
It's not easy to implement a log that can be played back, but it's work that we're already planning to do for change data capture (which has value beyond PIT recovery) docs/RFCS/revert.md, line 302 at r7 (raw file):
So regular use of reversion could delay GC indefinitely? docs/RFCS/revert.md, line 439 at r7 (raw file):
Another question which needs to be resolved: how does this interact with schema changes? What happens if you revert a table into an earlier phase of the schema change process? docs/RFCS/revert.md, line 441 at r7 (raw file):
I think we need at least an outline of the migration process before this RFC can be approved. Comments from Reviewable |
4d346ed
to
f046610
Compare
As stated in the RFC, most PIT recovery solutions require a quiesced database, restore from backup, and replay of logs up to the desired timestamp. While this will be possible with the change feeds feature, not being able to do this in situ, and quickly, makes it a sub-optimal solution. This, I'd argue, should be the "core" solution to PIT recovery. What's presented in this RFC is an enterprise feature. Review status: 1 of 2 files reviewed at latest revision, 29 unresolved discussions, some commit checks failed. docs/RFCS/revert.md, line 128 at r6 (raw file): Previously, bdarnell (Ben Darnell) wrote…
http://sentence.yourdictionary.com/revert In the first five I bothered to read, the use of the verb revert indicates moving to an earlier state in time. I don't see how an English speaker, seeing this struct is a reversion, would associate "to" with the most recent timestamp which is being reverted. docs/RFCS/revert.md, line 10 at r7 (raw file): Previously, andreimatei (Andrei Matei) wrote…
We might set the default TTL lower than 1d, but I think the main reasons suggested in the forum thread you've linked here are better addressed via other means. For example, recommending that high-churn tables manually specify a lower TTL or even providing new GC options, such as a maximum number of versions allowed, so the cost of GC is amortized and proportional to write volume. docs/RFCS/revert.md, line 49 at r7 (raw file): Previously, bdarnell (Ben Darnell) wrote…
I just removed the sentence. It's not necessary to support the point that a fast, in-situ solution is possible via MVCC data we're already keeping. docs/RFCS/revert.md, line 302 at r7 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Yes. There are ways to get fancier, but this seemed like a reasonable starting point that is easy to reason about. docs/RFCS/revert.md, line 439 at r7 (raw file): Previously, bdarnell (Ben Darnell) wrote…
I added this as an unresolved question. docs/RFCS/revert.md, line 441 at r7 (raw file): Previously, bdarnell (Ben Darnell) wrote…
See #16977 for a migration process. I removed this from the Unresolved section into Detailed Design. Comments from Reviewable |
The fact that most PIT recovery solutions are very disruptive indicates that while some form of PIT recovery may be a requirement, fast PIT recovery is just nice-to-have. I still believe that this will be an expensive and destabilizing change that is not a good investment at this time. Review status: 1 of 2 files reviewed at latest revision, 27 unresolved discussions, some commit checks failed. docs/RFCS/revert.md, line 128 at r6 (raw file): Previously, spencerkimball (Spencer Kimball) wrote…
Now that it's docs/RFCS/revert.md, line 405 at r6 (raw file): Previously, spencerkimball (Spencer Kimball) wrote…
This should be added as a drawback. docs/RFCS/revert.md, line 302 at r7 (raw file): Previously, spencerkimball (Spencer Kimball) wrote…
This should be added as a drawback. docs/RFCS/revert.md, line 439 at r7 (raw file): Previously, spencerkimball (Spencer Kimball) wrote…
It's a big question to leave unresolved; I think we're at least going to have to disallow reverts "into" a schema change, which is a pretty big caveat. Comments from Reviewable |
I have to agree with Ben that the opportunity cost is very high, in particular since the interference with all existing and future features is tricky. I share his view that we should not implement this in the near future. Reviewed 1 of 1 files at r8. Comments from Reviewable |
What is the near future? Revert isn't happening for 1.1. The open question is whether it should be restarted for 1.2 or some future release or whether we should burden change feeds with supporting PIT recovery. Review status: all files reviewed at latest revision, 28 unresolved discussions, some commit checks failed. docs/RFCS/revert.md, line 86 at r8 (raw file):
This is inaccurate. I'd nix this whole section about docs/RFCS/revert.md, line 441 at r8 (raw file):
I believe the DROP and TRUNCATE TABLE work is now done. Comments from Reviewable |
I haven't looked at everything that's on the table for 1.2 but I doubt there's going to be room for this. Change feeds are definitely higher priority than this in my book (in their own right), so I'd be in favor of relying on them for point-in-time recovery in the 1.2 time frame. |
f046610
to
bd6328f
Compare
This doesn't feel like the right forum for a discussion about when this RFC should be implemented. Please limit comments to the technical aspects of the RFC. Review status: all files reviewed at latest revision, 28 unresolved discussions, some commit checks failed. docs/RFCS/revert.md, line 405 at r6 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Added a general drawback covering the interaction between revert and intents. docs/RFCS/revert.md, line 302 at r7 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Done. docs/RFCS/revert.md, line 439 at r7 (raw file): Previously, bdarnell (Ben Darnell) wrote…
I moved this into detailed design, but it still needs work. Really we need to copy the schema as it existed at the As a first step, the practical solution is to prevent docs/RFCS/revert.md, line 86 at r8 (raw file): Previously, petermattis (Peter Mattis) wrote…
Done. docs/RFCS/revert.md, line 441 at r8 (raw file): Previously, petermattis (Peter Mattis) wrote…
Done. Comments from Reviewable |
reminder: don't forget to rebase if you plan to continue work on this, and pick up the new RFC sections + naming convention. |
No description provided.