
receive of raw encrypted incremental backup caused stacktrace and D state #7180

Closed
prometheanfire opened this issue Feb 16, 2018 · 12 comments

@prometheanfire
Contributor

System information

Type Version/Name
Distribution Name Gentoo
Distribution Version latest
Linux Kernel 4.14.18-gentoo
Architecture x86_64
ZFS Version built from master on the 8th, somewhere around f54976d
SPL Version similarly built from master on the 8th

Describe the problem you're observing

Kernel backtrace and zfs receive hanging in D state.

Describe how to reproduce the problem

zfs send -Lwecp -I LOCAL_POOL@backup-201802091942 LOCAL_POOL@backup-201802161114 | ssh 1.2.3.4 zfs recv -duvs -o canmount=off REMOTE_POOL/remote-backups/LOCAL_POOL

Include any warning/errors/backtraces from the system logs

[400296.285606] VERIFY3(0 == zap_add(mos, dsobj, spa_feature_table[f].fi_guid, sizeof (zero), 1, &zero, tx)) failed (0 == 17)
[400296.285712] PANIC at dsl_dataset.c:802:dsl_dataset_activate_feature()
[400296.285803] Showing stack for process 9102
[400296.285807] CPU: 4 PID: 9102 Comm: txg_sync Not tainted 4.14.18-gentoo #1
[400296.285809] Hardware name: Supermicro X10SLH-F/X10SLM+-F/X10SLH-F/X10SLM+-F, BIOS 1.1 07/19/2013
[400296.285810] Call Trace:
[400296.285823]  dump_stack+0x46/0x65
[400296.285830]  spl_panic+0xc8/0x110
[400296.285841]  ? zap_add_impl+0x96/0x150
[400296.285845]  ? zap_add+0x78/0xa0
[400296.285853]  dsl_dataset_activate_feature+0x104/0x160
[400296.285858]  dsl_crypto_recv_key_sync+0x548/0x950
[400296.285864]  dsl_sync_task_sync+0xac/0x110
[400296.285868]  dsl_pool_sync+0x355/0x4c0
[400296.285874]  spa_sync+0x449/0xd80
[400296.285880]  txg_sync_thread+0x2de/0x540
[400296.285884]  ? txg_delay+0x1e0/0x1e0
[400296.285887]  ? __thread_exit+0x20/0x20
[400296.285890]  thread_generic_wrapper+0x6f/0x80
[400296.285897]  kthread+0x119/0x130
[400296.285901]  ? kthread_create_on_node+0x70/0x70
[400296.285905]  ? kthread_create_on_node+0x70/0x70
[400296.285910]  ret_from_fork+0x35/0x40
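For context on the panic above: the failed VERIFY3 means zap_add() returned 17 rather than 0 when dsl_dataset_activate_feature() tried to record the feature in the MOS. On Linux, error 17 is EEXIST, i.e. the feature-activation entry already exists, which lines up with the later comment that the code is confused about whether the dataset is encrypted. A quick sanity check of that errno value (a standalone Python sketch, not ZFS code):

```python
import errno
import os

# The VERIFY3 failure reads "(0 == 17)": zap_add() returned 17
# instead of 0. Decode that raw error number:
code = 17
print(errno.errorcode[code])  # symbolic name: EEXIST
print(os.strerror(code))      # human-readable: "File exists"
```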
@prometheanfire
Contributor Author

Additional info: the kernels on the sender and receiver are exactly the same (built once and reused).

@prometheanfire
Contributor Author

@tcaputi you may want in on this; am I using an unsupported option when sending?

@tcaputi
Contributor

tcaputi commented Feb 16, 2018

Can you tell me what datasets are encrypted on each system? Specifically is the sent dataset encrypted and are any parents of the received dataset encrypted?

@prometheanfire
Contributor Author

All datasets on the source system are encrypted, pool created with the following:

zpool create -O encryption=on -O keyformat=passphrase zfstest /dev/zvol/slaanesh-zp00/zfstest

The destination system has no encrypted datasets, just the raw receive from the first send to it (which it received correctly). ZFS commands are hanging, so I can't get the output of zfs list on the destination system, but the send/receive command was as follows.

zfs send -Lwecp LOCAL_POOL@backup-201802091942 | ssh 10.0.1.14 zfs recv -duvsF -o canmount=off REMOTE_POOL/remote-backups/LOCAL_POOL

@tcaputi
Contributor

tcaputi commented Feb 16, 2018

OK. I think this is probably related to #7117. This manifestation of it doesn't seem to be a very serious ASSERT and we could probably just ignore it, but I'd like to know why it's happening before I say so for sure.

@tcaputi
Contributor

tcaputi commented Feb 16, 2018

Did your original pool have any clones, by the way?

EDIT: nevermind. I see this is just a single dataset being sent, so this shouldn't matter.

@prometheanfire
Contributor Author

Any way to work around it (other than sending non-raw, unencrypted)?

@tcaputi
Contributor

tcaputi commented Feb 16, 2018

Looking into it now. I don't want to tell you something that causes permanent damage...

@tcaputi
Contributor

tcaputi commented Feb 18, 2018

So basically, this ASSERT is being caused because the code is confused about whether or not your dataset is encrypted. I have not been able to replicate your exact error, but I have replicated a couple of other small issues and I have a patch to fix them (which I will be making a PR for soon).

The 2 bugs I found have to do with zfs recv -F and the way it replaces existing datasets if necessary. Can I ask why you added that into your command? Was there an existing dataset before that was preventing the receive from working? If so, the patch might fix this as well.

@prometheanfire
Contributor Author

Honestly, my workflow for creating a new backup was to create the dataset and then receive -F over it. It's probably just paranoia or stupidity on my part, but I've never been able to create a dataset on initial receive. I'll watch for your PR though and let you know how it goes (on 4.14.20 with master as of a few hours ago).

@tcaputi
Contributor

tcaputi commented Mar 1, 2018

@prometheanfire Any thoughts for how we might go about getting this issue closed? I'm not sure if we were ever able to replicate the issue after these patches went through.

@prometheanfire
Contributor Author

I haven't been able to retest (I'm traveling), so I'm going to close it and will reopen (or open a new issue if I have to) if needed.
