changefeed doesn't pick up IMPORT INTO #43784
Comments
This is real. @danhhz do you have thoughts about the right behavior here?
I suppose our two options are 1) emit the sstables created by the IMPORT INTO or 2) disconnect the changefeed with an error. Either way, I think RangeFeed will have to do something with AddSSTables.
Maybe eventually this is true but I don't think that'd be my first approach to fixing this issue. As of 20.1 we take tables offline when performing an IMPORT INTO. I think it'd be reasonable to instead perform a time-bound backfill when detecting a table coming back online. I think that'd be a smaller and less invasive change than emitting logical ops for AddSSTable requests. I worry that getting that to work with the buffering of rangefeeds might be quite hard with large requests. @danhhz thoughts on this approach? cc @mwang1026
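A minimal sketch of that time-bound backfill idea, assuming simplified stand-in types (`TableEvent`, `Backfiller`) rather than the real schema-feed or changefeed interfaces:

```go
package backfillsketch

import "time"

// TableEvent is a stand-in for a pair of adjacent table descriptor versions
// observed by the schema feed.
type TableEvent struct {
	TableID      int64
	WasOffline   bool      // previous descriptor version was offline (IMPORT INTO / RESTORE)
	NowPublic    bool      // new descriptor version is public again
	OfflineSince time.Time // timestamp of the offline descriptor version
	BackOnlineAt time.Time // timestamp of the public descriptor version
}

// Backfiller re-reads a table's keyspan over a bounded time interval,
// standing in for a time-bound scan of the table span.
type Backfiller interface {
	ScanSpan(tableID int64, from, to time.Time) error
}

// maybeBackfill triggers a time-bound scan only on offline -> public
// transitions, i.e. when a table comes back online after an IMPORT INTO,
// instead of relying on rangefeed events for the AddSSTable writes.
func maybeBackfill(ev TableEvent, b Backfiller) error {
	if ev.WasOffline && ev.NowPublic {
		return b.ScanSpan(ev.TableID, ev.OfflineSince, ev.BackOnlineAt)
	}
	return nil
}
```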
That's a nice idea, and I have the same concerns about rangefeed buffering. It definitely works for now; do we have any plans to make IMPORT INTO work without taking the table offline? That would require something different.
Only vague ideas as far as I know. If we have a replicated lock table that supports spans then we could use that to make AddSST work transactionally. @dt brought this up recently. Replicated span locks don't seem like a 20.1 thing at this point. Maybe 20.2. That's probably the cleanest solution, though it will complicate this story a bit. I wonder if even in that case the right thing to do is to detect the commit of the removal of the span lock as the event for the AddSSTable, and maybe also attach metadata to the logical op entry with that "committed" span lock and just go do a time-bound iteration of that span. For a given AddSSTable we should be writing into a span all at the same timestamp. If we really wanted to, I'm sure we could cobble together something worse using partial offline states in table descriptors or something else nasty that might actually make implementing this a little easier.

<ramble> The pattern I'd like here is for the lock holders to watch a changefeed for some sort of invalidating event, and then the writer of the event will try to grab an exclusive lock. When it successfully acquires the exclusive lock, it knows that all of the other leaseholders have dropped theirs. It could be done without shared locks if you can grab an exclusive lock over a span. Long-lived transactions are mostly expensive because of transaction heartbeating, which we can probably eliminate without too much work. I wonder how much of this would fall out cleanly if we implemented pg_advisory_lock and scoped changefeeds. I'm not sure what exactly is the right abstraction for distributed locking at the kv layer, but it seems like there's a lot of other state that would benefit from shared leasing and distributed coordination primitives. There are existing subsystems it would simplify (jobs?) and features it should enable (enums? types?). There's a bootstrapping problem to be worked around to use changefeeds in the implementation, but I'm going to assume we could work around it.
My short-term vote would probably be that active changefeeds should terminate with an error as soon as a table moves into a non-public state (offline, drop, etc.). That would mean that when you start an IMPORT INTO, any changefeed watching that table fails. I don't think it would actually be correct to emit contents of SSTs as they came in, since they're sorta "uncommitted" until the IMPORT finishes, but more importantly, none of this is expected to play nice with txns, respecting closed-ts, etc. -- by taking a table offline, we're assuming we can throw all that out the window, and I imagine we'll violate some assumptions if we let a changefeed keep running on keyspace where we've locked out other SQL and said all bets are off.
Well aren’t you in luck! #44265 (comment)
I just realized that #44663 makes this harder. I have a feeling that time-bound scans are not going to be a popular idea, but I could be wrong.
CDC Team, y'all should probably get a unit test to exercise this behavior.
Acked @ajwerner
I moved this back to triage, to ensure we get someone assigned. |
64323: changefeedccl: fail changefeeds when tables go offline r=miretskiy a=stevendanna

In 20.2.4, a changefeed would fail if IMPORT INTO was run against one of its target tables. The failure would look like:

```
I210428 10:45:57.982012 2015 jobs/registry.go:1131 ⋮ [n1] CHANGEFEED job 653840730248282113: stepping through state failed with error: ‹relation› ‹"test"› is offline: ‹importing›
(1) Wraps: (2) attached stack trace
  -- stack trace:
  | github.com/cockroachdb/cockroach/pkg/sql/catalog.FilterDescriptorState
  | 	/Users/ssd/go/src/github.com/cockroachdb/cockroach/pkg/sql/catalog/descriptor.go:387
  | github.com/cockroachdb/cockroach/pkg/sql/catalog/lease.storage.acquire.func1
  | 	/Users/ssd/go/src/github.com/cockroachdb/cockroach/pkg/sql/catalog/lease/lease.go:219
  | github.com/cockroachdb/cockroach/pkg/kv.(*DB).Txn.func1
  | 	/Users/ssd/go/src/github.com/cockroachdb/cockroach/pkg/kv/db.go:758
  | github.com/cockroachdb/cockroach/pkg/kv.(*Txn).exec
  | 	/Users/ssd/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn.go:808
  | github.com/cockroachdb/cockroach/pkg/kv.(*DB).Txn
  | 	/Users/ssd/go/src/github.com/cockroachdb/cockroach/pkg/kv/db.go:757
  | github.com/cockroachdb/cockroach/pkg/sql/catalog/lease.storage.acquire
  | 	/Users/ssd/go/src/github.com/cockroachdb/cockroach/pkg/sql/catalog/lease/lease.go:193
  | github.com/cockroachdb/cockroach/pkg/sql/catalog/lease.acquireNodeLease.func1
  | 	/Users/ssd/go/src/github.com/cockroachdb/cockroach/pkg/sql/catalog/lease/lease.go:859
  | github.com/cockroachdb/cockroach/pkg/util/syncutil/singleflight.(*Group).doCall
  | 	/Users/ssd/go/src/github.com/cockroachdb/cockroach/pkg/util/syncutil/singleflight/singleflight.go:128
  | runtime.goexit
  | 	/usr/local/Cellar/go/1.16.3/libexec/src/runtime/asm_amd64.s:1371
Wraps: (3) ‹relation› ‹"test"› is offline: ‹importing›
Error types: (1) *catalog.inactiveDescriptorError (2) *withstack.withStack (3) *errutil.leafError
```

We want the changefeed to fail when IMPORT INTO is run because changes made via the AddSSTable mechanism are not currently reflected in the changefeed, meaning we would fail to emit imported data.

The previous path that raised this failure depended on:

1) The descriptor being offline at the point we attempted to acquire a lease on it: https://github.com/cockroachdb/cockroach/blob/d1962910b58005096ce411bccbaddcd0c1d30cbd/pkg/ccl/changefeedccl/schemafeed/schema_feed.go#L514
2) The lease acquisition filtering out offline descriptors with an error: https://github.com/cockroachdb/cockroach/blob/eda2309728392593162e962a61182eab6ab003ff/pkg/sql/catalog/descriptor.go#L209
3) The failure from the lease acquisition in the schemafeed being treated as a fatal error.

I believe our behaviour here has changed a few times on both the 20.2 branch and master because of changes in each of these 3 behaviours.

In this change, rather than relying on the lease acquisition, we specifically check for offline tables in our ValidateTable function. This function is called for every descriptor version we get from the ExportRequest on the Descriptor table. Currently, I believe that checking for offline descriptors is correct since it appears that only RESTORE and IMPORT put tables into an offline state.

Release note (enterprise change): CHANGEFEEDs more reliably fail when IMPORT INTO is run against a targeted table.

Fixes #64276
See also #62585, #43784

Co-authored-by: Steven Danna <danna@cockroachlabs.com>
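A minimal sketch of the offline-descriptor check this change describes; the names and types below are simplified stand-ins, not the actual schemafeed or catalog code:

```go
package schemafeedsketch

import "fmt"

// TableState is a stand-in for the descriptor states relevant to the check.
type TableState int

const (
	StatePublic TableState = iota
	StateOffline
	StateDropped
)

// TableDescriptor is a stand-in for a single version of a table descriptor
// as returned by the ExportRequest over the Descriptor table.
type TableDescriptor struct {
	Name          string
	State         TableState
	OfflineReason string // e.g. "importing" or "restoring"
}

// validateTable rejects descriptor versions for tables that are not public.
// Returning an error here is what surfaces as a changefeed failure when
// IMPORT INTO (or RESTORE) takes a watched table offline.
func validateTable(desc TableDescriptor) error {
	switch desc.State {
	case StateOffline:
		return fmt.Errorf("relation %q is offline: %s", desc.Name, desc.OfflineReason)
	case StateDropped:
		return fmt.Errorf("relation %q has been dropped", desc.Name)
	default:
		return nil
	}
}
```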
Closing - Steven's PR (https://github.com/cockroachdb/cockroach/pull/64323/files) made sure we fail the changefeed when the table is taken offline for an IMPORT INTO and added unit tests. Docs PR is here: cockroachdb/docs#10452 (comment)
Describe the problem
I recently truncated a table, created a changefeed, and imported data with the IMPORT command, but the changefeed did not pick up the imported data.
To Reproduce
Create a changefeed on a table, then run IMPORT INTO against that table (see the sketch below).
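A hedged sketch of driving these reproduction steps from Go over the Postgres wire protocol. The connection string, Kafka sink address, and CSV location are placeholders, and CREATE CHANGEFEED into a sink additionally assumes an enterprise license and rangefeeds enabled:

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/lib/pq" // CockroachDB speaks the Postgres wire protocol.
)

func main() {
	// Placeholder connection string for a local insecure cluster.
	db, err := sql.Open("postgres",
		"postgresql://root@localhost:26257/defaultdb?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	mustExec := func(stmt string) {
		if _, err := db.Exec(stmt); err != nil {
			log.Fatalf("%s: %v", stmt, err)
		}
	}

	mustExec(`CREATE TABLE IF NOT EXISTS test (k INT PRIMARY KEY, v STRING)`)
	mustExec(`SET CLUSTER SETTING kv.rangefeed.enabled = true`)
	// Placeholder Kafka sink; any reachable changefeed sink works for the repro.
	mustExec(`CREATE CHANGEFEED FOR TABLE test INTO 'kafka://localhost:9092'`)
	// Placeholder CSV file served via nodelocal storage.
	mustExec(`IMPORT INTO test (k, v) CSV DATA ('nodelocal://1/rows.csv')`)

	// Before the fix, rows added by IMPORT INTO never showed up in the
	// changefeed; after PR #64323 the changefeed job instead fails while
	// the table is offline.
	fmt.Println("inspect the changefeed sink and SHOW JOBS for the outcome")
}
```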
Expected behavior
Imported data should be picked up by the changefeed.