changefeed doesn't pick up IMPORT INTO #43784
Comments
This is real. @danhhz do you have thoughts about the right behavior here?
I suppose our two options are 1) emit the sstables created by the IMPORT INTO or 2) disconnect the changefeed with an error. Either way, I think RangeFeed will have to do something with AddSSTables.
Maybe eventually this is true but I don't think that'd be my first approach to fixing this issue. As of 20.1 we take tables offline when performing an IMPORT INTO. I think it'd be reasonable to instead perform a time-bound backfill when detecting a table coming back online. I think that'd be a smaller and less invasive change than emitting logical ops for AddSSTable requests. I worry that getting that to work with the buffering of rangefeeds might be quite hard with large requests. @danhhz thoughts on this approach? cc @mwang1026
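A minimal sketch of that time-bound backfill idea, assuming simplified stand-in types (`TableEvent`, `Backfiller`) rather than the real schema-feed or changefeed interfaces:

```go
package backfillsketch

import "time"

// TableEvent is a stand-in for a pair of adjacent table descriptor versions
// observed by the schema feed.
type TableEvent struct {
	TableID      int64
	WasOffline   bool      // previous descriptor version was offline (IMPORT INTO / RESTORE)
	NowPublic    bool      // new descriptor version is public again
	OfflineSince time.Time // timestamp of the offline descriptor version
	BackOnlineAt time.Time // timestamp of the public descriptor version
}

// Backfiller re-reads a table's keyspan over a bounded time interval,
// standing in for a time-bound scan of the table span.
type Backfiller interface {
	ScanSpan(tableID int64, from, to time.Time) error
}

// maybeBackfill triggers a time-bound scan only on offline -> public
// transitions, i.e. when a table comes back online after an IMPORT INTO,
// instead of relying on rangefeed events for the AddSSTable writes.
func maybeBackfill(ev TableEvent, b Backfiller) error {
	if ev.WasOffline && ev.NowPublic {
		return b.ScanSpan(ev.TableID, ev.OfflineSince, ev.BackOnlineAt)
	}
	return nil
}
```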
That's a nice idea, and I have the same concerns about rangefeed buffering. It definitely works for now; do we have any plans to make IMPORT INTO work without taking the table offline? That would require something different.
Only vague ideas as far as I know. If we have a replicated lock table that supports spans then we could use that to make AddSST work transactionally. @dt brought this up recently. Replicated span locks don't seem like a 20.1 thing at this point. Maybe 20.2. That's probably the cleanest solution, though it will complicate this story a bit. I wonder if even in that case the right thing to do is to detect the commit of the removal of the span lock as the event for the AddSSTable, and maybe also attach metadata to the logical op entry with that "committed" span lock and just go do a time-bound iteration of that span. For a given AddSSTable we should be writing into a span all at the same timestamp. If we really wanted to, I'm sure we could cobble together something worse using partial offline states in table descriptors or something else nasty that might actually make implementing this a little easier.

<ramble> The pattern I'd like here is for the lock holders to watch a changefeed for some sort of invalidating event, and then the writer of the event will try to grab an exclusive lock. When it successfully acquires the exclusive lock, it knows that all of the other leaseholders have dropped theirs. It could be done without shared locks if you can grab an exclusive lock over a span. Long-lived transactions are mostly expensive because of transaction heartbeating, which we can probably eliminate without too much work. I wonder how much of this would fall out cleanly if we implemented pg_advisory_lock and scoped changefeeds. I'm not sure what exactly is the right abstraction for distributed locking at the kv layer, but it seems like there's a lot of other state that would benefit from shared leasing and distributed coordination primitives. There are existing subsystems it would simplify (jobs?) and features it should enable (enums? types?). There's a bootstrapping problem to be worked around to use changefeeds in the implementation, but I'm going to assume we could work around it.
My short-term vote would probably be that active changefeeds should terminate with an error as soon as a table moves into a non-public state (offline, drop, etc.). That would mean that when you start an IMPORT INTO, any changefeed watching that table fails. I don't think it would actually be correct to emit contents of SSTs as they came in, since they're sorta "uncommitted" until the IMPORT finishes, but more importantly, none of this is expected to play nice with txns, respecting closed-ts, etc. -- by taking a table offline, we're assuming we can throw all that out the window, and I imagine we'll violate some assumptions if we let a changefeed keep running on keyspace where we've locked out other SQL and said all bets are off.
Well aren’t you in luck! #44265 (comment)
I just realized that #44663 makes this harder. I have a feeling that time-bound scans are not going to be a popular idea, but I could be wrong.
CDC Team, y'all should probably get a unit test to exercise this behavior.
Acked @ajwerner
I moved this back to triage, to ensure we get someone assigned. |
64323: changefeedccl: fail changefeeds when tables go offline r=miretskiy a=stevendanna

In 20.2.4, a changefeed would fail if IMPORT INTO was run against one of its target tables. The failure would look like:

```
I210428 10:45:57.982012 2015 jobs/registry.go:1131 ⋮ [n1] CHANGEFEED job 653840730248282113: stepping through state failed with error: ‹relation› ‹"test"› is offline: ‹importing›
(1) Wraps: (2) attached stack trace
  -- stack trace:
  | github.com/cockroachdb/cockroach/pkg/sql/catalog.FilterDescriptorState
  | 	/Users/ssd/go/src/github.com/cockroachdb/cockroach/pkg/sql/catalog/descriptor.go:387
  | github.com/cockroachdb/cockroach/pkg/sql/catalog/lease.storage.acquire.func1
  | 	/Users/ssd/go/src/github.com/cockroachdb/cockroach/pkg/sql/catalog/lease/lease.go:219
  | github.com/cockroachdb/cockroach/pkg/kv.(*DB).Txn.func1
  | 	/Users/ssd/go/src/github.com/cockroachdb/cockroach/pkg/kv/db.go:758
  | github.com/cockroachdb/cockroach/pkg/kv.(*Txn).exec
  | 	/Users/ssd/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn.go:808
  | github.com/cockroachdb/cockroach/pkg/kv.(*DB).Txn
  | 	/Users/ssd/go/src/github.com/cockroachdb/cockroach/pkg/kv/db.go:757
  | github.com/cockroachdb/cockroach/pkg/sql/catalog/lease.storage.acquire
  | 	/Users/ssd/go/src/github.com/cockroachdb/cockroach/pkg/sql/catalog/lease/lease.go:193
  | github.com/cockroachdb/cockroach/pkg/sql/catalog/lease.acquireNodeLease.func1
  | 	/Users/ssd/go/src/github.com/cockroachdb/cockroach/pkg/sql/catalog/lease/lease.go:859
  | github.com/cockroachdb/cockroach/pkg/util/syncutil/singleflight.(*Group).doCall
  | 	/Users/ssd/go/src/github.com/cockroachdb/cockroach/pkg/util/syncutil/singleflight/singleflight.go:128
  | runtime.goexit
  | 	/usr/local/Cellar/go/1.16.3/libexec/src/runtime/asm_amd64.s:1371
Wraps: (3) ‹relation› ‹"test"› is offline: ‹importing›
Error types: (1) *catalog.inactiveDescriptorError (2) *withstack.withStack (3) *errutil.leafError
```

We want the changefeed to fail when IMPORT INTO is run because changes made via the AddSSTable mechanism are not currently reflected in the changefeed, meaning we would fail to emit imported data.

The previous path that raised this failure depended on:

1) The descriptor being offline at the point we attempted to acquire a lease on it: https://github.com/cockroachdb/cockroach/blob/d1962910b58005096ce411bccbaddcd0c1d30cbd/pkg/ccl/changefeedccl/schemafeed/schema_feed.go#L514
2) The lease acquisition filtering out offline descriptors with an error: https://github.com/cockroachdb/cockroach/blob/eda2309728392593162e962a61182eab6ab003ff/pkg/sql/catalog/descriptor.go#L209
3) The failure from the lease acquisition in the schemafeed being treated as a fatal error.

I believe our behaviour here has changed a few times on both the 20.2 branch and master because of changes in each of these 3 behaviours.

In this change, rather than relying on the lease acquisition, we specifically check for offline tables in our ValidateTable function. This function is called for every descriptor version we get from the ExportRequest on the Descriptor table. Currently, I believe that checking for offline descriptors is correct since it appears that only RESTORE and IMPORT put tables into an offline state.

Release note (enterprise change): CHANGEFEEDs more reliably fail when IMPORT INTO is run against a targeted table.

Fixes #64276
See also #62585, #43784

Co-authored-by: Steven Danna <danna@cockroachlabs.com>
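A minimal sketch of the offline-descriptor check this change describes; the names and types below are simplified stand-ins, not the actual schemafeed or catalog code:

```go
package schemafeedsketch

import "fmt"

// TableState is a stand-in for the descriptor states relevant to the check.
type TableState int

const (
	StatePublic TableState = iota
	StateOffline
	StateDropped
)

// TableDescriptor is a stand-in for a single version of a table descriptor
// as returned by the ExportRequest over the Descriptor table.
type TableDescriptor struct {
	Name          string
	State         TableState
	OfflineReason string // e.g. "importing" or "restoring"
}

// validateTable rejects descriptor versions for tables that are not public.
// Returning an error here is what surfaces as a changefeed failure when
// IMPORT INTO (or RESTORE) takes a watched table offline.
func validateTable(desc TableDescriptor) error {
	switch desc.State {
	case StateOffline:
		return fmt.Errorf("relation %q is offline: %s", desc.Name, desc.OfflineReason)
	case StateDropped:
		return fmt.Errorf("relation %q has been dropped", desc.Name)
	default:
		return nil
	}
}
```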
Closing - Steven's PR (https://github.com/cockroachdb/cockroach/pull/64323/files) made sure we fail the changefeed when the table is taken offline for an IMPORT INTO and added unit tests. Docs PR is here: cockroachdb/docs#10452 (comment)
Describe the problem
I recently truncated a table, created a changefeed, and imported data with the IMPORT command, but the changefeed did not pick up the imported data.
To Reproduce
Create a changefeed on a table, then run IMPORT INTO against that table (see the sketch below).
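A hedged sketch of driving these reproduction steps from Go over the Postgres wire protocol. The connection string, Kafka sink address, and CSV location are placeholders, and CREATE CHANGEFEED into a sink additionally assumes an enterprise license and rangefeeds enabled:

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/lib/pq" // CockroachDB speaks the Postgres wire protocol.
)

func main() {
	// Placeholder connection string for a local insecure cluster.
	db, err := sql.Open("postgres",
		"postgresql://root@localhost:26257/defaultdb?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	mustExec := func(stmt string) {
		if _, err := db.Exec(stmt); err != nil {
			log.Fatalf("%s: %v", stmt, err)
		}
	}

	mustExec(`CREATE TABLE IF NOT EXISTS test (k INT PRIMARY KEY, v STRING)`)
	mustExec(`SET CLUSTER SETTING kv.rangefeed.enabled = true`)
	// Placeholder Kafka sink; any reachable changefeed sink works for the repro.
	mustExec(`CREATE CHANGEFEED FOR TABLE test INTO 'kafka://localhost:9092'`)
	// Placeholder CSV file served via nodelocal storage.
	mustExec(`IMPORT INTO test (k, v) CSV DATA ('nodelocal://1/rows.csv')`)

	// Before the fix, rows added by IMPORT INTO never showed up in the
	// changefeed; after PR #64323 the changefeed job instead fails while
	// the table is offline.
	fmt.Println("inspect the changefeed sink and SHOW JOBS for the outcome")
}
```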
Expected behavior
Imported data should be picked up by the changefeed.