Cannot send an encrypted dataset (key issue) #7117

Closed
Celmor opened this issue Feb 2, 2018 · 18 comments

Celmor commented Feb 2, 2018

System information

Type Version/Name
Distribution Name Linux NixOS
Linux Kernel 4.9.77
Architecture x86_64
ZFS Version 0.7.0-1 (commit 4c46b99)*
SPL Version 0.7.0-1 (commit 23602fd)

* added patches to ZFS (expression):

SPL expression/patches

Also attempted on Arch (4.14.15-1-ARCH) with ZFS version 0.7.0-255_g6bc4a237, resulting in the same error.

Describe the problem you're observing

Unable to send an encrypted dataset.

Describe how to reproduce the problem

Sending from the "DataBackup" pool to a new pool; the "enc" dataset is the encryption root.

# uname -a
Linux nixos 4.9.77 #1-NixOS SMP Wed Jan 17 08:39:00 UTC 2018 x86_64 GNU/Linux

# zpool import -l -o readonly=on DataBackup
Enter passphrase for 'DataBackup/enc':
1 / 1 keys successfully loaded
cannot mount '/DataBackup/enc': directory is not empty

# dmesg -wH
[Feb 2 04:25]  zd0: p1
[  +1.124851] VERIFY3(sa.sa_magic == 0x2F505A) failed (2 == 3100762)
[  +0.003484] PANIC at zfs_vfsops.c:583:zfs_space_delta_cb()
[  +0.001937] Showing stack for process 2083
[  +0.000003] CPU: 1 PID: 2083 Comm: z_upgrade Tainted: P           O    4.9.77 #1-NixOS
#...
^C

# zpool import DataNew -f
cannot mount '/DataBackup': directory is not empty
cannot mount '/DataBackup/enc': directory is not empty

# zfs send -cL DataBackup/enc/Backup/From-Various | zfs recv -Fuv DataNew/enc/Backup/From-Various
warning: cannot send 'DataBackup/enc/Backup/From-Various': dataset key must be loaded
cannot receive: failed to read from stream

# zfs load-key DataBackup/enc/Backup/From-Various
Key load error: Keys must be loaded for encryption root of 'DataBackup/enc/Backup/From-Various' (DataBackup/enc/Backup).

# zfs get -Ho property,value encryptionroot,keystatus DataBackup/enc/Backup/From-Various
encryptionroot	DataBackup/enc/Backup
keystatus	unavailable

# zfs get -Ho property,value encryptionroot,keystatus DataBackup/enc/Backup
encryptionroot	DataBackup/enc
keystatus	available

# zfs set mountpoint=/mnt DataBackup/enc/Backup/From-Various && zfs mount DataBackup/enc/Backup/From-Various
internal error: out of memory

# free -m
              total        used        free      shared  buff/cache   available
Mem:           7894        1180        6150          31         563        6447
Swap:             0           0           0

# zpool status
  pool: DataBackup
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
	invalid.  Sufficient replicas exist for the pool to continue
	functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: resilvered 1.21T in 0 days 03:50:19 with 0 errors on Sat Jan 27 06:02:23 2018
config:

	NAME                                                 STATE     READ WRITE CKSUM
	DataBackup                                           DEGRADED     0     0     0
	  mirror-0                                           DEGRADED     0     0     0
	    6204277785589605098                              UNAVAIL      0     0     0  was /dev/disk/by-id/ata-HGST_HDN724040ALE640_PK1338P4HVXXXX-part1
	    usb-Intenso_External_USB_3.0_20161009011156-0:0  ONLINE       0     0     0

errors: No known data errors

Another attempt:

# zpool import -o readonly=on -N DataBackup
# zpool create -f -m none -o feature@hole_birth=disabled -o ashift=12 DataNew /dev/disk/by-id/usb-WDC_WD15_EARS-00MVWB0_0000001245F4-0\:0
# zfs send -cewLR DataBackup@latest-d2018-01-26 | zfs recv -Fsuv DataNew
3584 bytes (3.6 kB, 3.5 KiB) copied, 7 s, 0.5 kB/s
receiving full stream of DataBackup@d2017-09-19-04-01 into DataNew@d2017-09-19-04-01
received 114G stream in 1082 seconds (107M/sec)
receiving incremental stream of DataBackup@latest-d2018-01-26 into DataNew@latest-d2018-01-26
received 410K stream in 1 seconds (410K/sec)
receiving full stream of DataBackup/Download@same into DataNew/Download@same
received 2.41G stream in 25 seconds (98.9M/sec)
receiving incremental stream of DataBackup/Download@d2017-09-19-04-01 into DataNew/Download@d2017-09-19-04-01
received 4.90M stream in 1 seconds (4.90M/sec)
receiving incremental stream of DataBackup/Download@latest-d2018-01-26 into DataNew/Download@latest-d2018-01-26
warning: cannot send 'DataBackup/enc@d2017-09-19-04-01': unsupported version or feature
warning: cannot send 'DataBackup/enc@latest-d2018-01-26': unsupported version or feature

Include any warning/errors/backtraces from the system logs

(full dmesg with highlighting)

[Feb 2 04:25]  zd0: p1
[  +1.124851] VERIFY3(sa.sa_magic == 0x2F505A) failed (2 == 3100762)
[  +0.003484] PANIC at zfs_vfsops.c:583:zfs_space_delta_cb()
[  +0.001937] Showing stack for process 2083
[  +0.000003] CPU: 1 PID: 2083 Comm: z_upgrade Tainted: P           O    4.9.77 #1-NixOS
[  +0.000001] Hardware name: Microsoft Corporation Surface Pro 3/Surface Pro 3, BIOS 3.11.2150 04/26/2017
[  +0.000001]  ffffb3a9c4e039a0 ffffffffa50f7cc2 ffffffffc087e214 0000000000000247
[  +0.000003]  ffffb3a9c4e039b0 ffffffffc04725d2 ffffb3a9c4e03b38 ffffffffc04726a5
[  +0.000002]  ffffffffa4eac8e4 ffff9f7100000030 ffffb3a9c4e03b48 ffffb3a9c4e03ae8
[  +0.000003] Call Trace:
[  +0.000006]  [<ffffffffa50f7cc2>] dump_stack+0x63/0x81
[  +0.000006]  [<ffffffffc04725d2>] spl_dumpstack+0x42/0x50 [spl]
[  +0.000005]  [<ffffffffc04726a5>] spl_panic+0xc5/0x100 [spl]
[  +0.000003]  [<ffffffffa4eac8e4>] ? __wake_up+0x44/0x50
[  +0.000004]  [<ffffffffc046f359>] ? spl_kmem_cache_free+0x139/0x1c0 [spl]
[  +0.000025]  [<ffffffffc07d4f98>] ? zio_buf_free+0x58/0x60 [zfs]
[  +0.000003]  [<ffffffffa53689a2>] ? mutex_lock+0x12/0x30
[  +0.000022]  [<ffffffffc072cc91>] ? dmu_zfetch+0x161/0x510 [zfs]
[  +0.000001]  [<ffffffffa53689a2>] ? mutex_lock+0x12/0x30
[  +0.000018]  [<ffffffffc07060fa>] ? arc_buf_access+0x14a/0x290 [zfs]
[  +0.000020]  [<ffffffffc070d06e>] ? dbuf_rele_and_unlock+0x27e/0x520 [zfs]
[  +0.000021]  [<ffffffffc07bddff>] zfs_space_delta_cb+0xdf/0x1d0 [zfs]
[  +0.000019]  [<ffffffffc071eb2b>] dmu_objset_userquota_get_ids+0x16b/0x3e0 [zfs]
[  +0.000003]  [<ffffffffc046d161>] ? spl_kmem_zalloc+0x91/0x160 [spl]
[  +0.000023]  [<ffffffffc072f518>] dnode_setdirty+0x38/0x100 [zfs]
[  +0.000019]  [<ffffffffc070f4ef>] dbuf_dirty+0x46f/0x8a0 [zfs]
[  +0.000019]  [<ffffffffc070fa40>] dmu_buf_will_dirty_impl+0x120/0x130 [zfs]
[  +0.000019]  [<ffffffffc070fa66>] dmu_buf_will_dirty+0x16/0x20 [zfs]
[  +0.000020]  [<ffffffffc071c343>] dmu_objset_space_upgrade+0x183/0x1e0 [zfs]
[  +0.000021]  [<ffffffffc071ef25>] dmu_objset_userobjspace_upgrade_cb+0xa5/0xf0 [zfs]
[  +0.000021]  [<ffffffffc071b410>] dmu_objset_upgrade_task_cb+0x80/0xe0 [zfs]
[  +0.000003]  [<ffffffffc04707f5>] taskq_thread+0x2b5/0x4e0 [spl]
[  +0.000002]  [<ffffffffa4e98730>] ? wake_up_q+0x80/0x80
[  +0.000004]  [<ffffffffc0470540>] ? task_done+0xb0/0xb0 [spl]
[  +0.000001]  [<ffffffffa4e8de67>] kthread+0xd7/0xf0
[  +0.000002]  [<ffffffffa4e8dd90>] ? kthread_park+0x60/0x60
[  +0.000001]  [<ffffffffa4e8dd90>] ? kthread_park+0x60/0x60
[  +0.000002]  [<ffffffffa536ba14>] ret_from_fork+0x54/0x60
[Feb 2 04:29] INFO: task z_upgrade:1619 blocked for more than 120 seconds.
[  +0.001755]       Tainted: P           O    4.9.77 #1-NixOS
[  +0.001418] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  +0.001548] z_upgrade       D    0  1619      2 0x00000000
[  +0.000002]  ffff9f71c2412800 0000000000000000 ffff9f71cfb18580 ffff9f71a2b127c0
[  +0.000002]  ffff9f71c6379a80 ffffb3a9c369bc58 ffffffffa5366842 0000000000000006
[  +0.000001]  0000000000000003 0000000000040388 ffff9f71a2000800 ffff9f71a2b127c0
[  +0.000000] Call Trace:
[  +0.000005]  [<ffffffffa5366842>] ? __schedule+0x192/0x660
[  +0.000001]  [<ffffffffa5366d46>] schedule+0x36/0x80
[  +0.000005]  [<ffffffffc04746f8>] cv_wait_common+0x128/0x140 [spl]
[  +0.000004]  [<ffffffffa4eace00>] ? wake_atomic_t_function+0x60/0x60
[  +0.000002]  [<ffffffffc0474725>] __cv_wait+0x15/0x20 [spl]
[  +0.000026]  [<ffffffffc0781365>] txg_wait_synced+0xe5/0x130 [zfs]
[  +0.000014]  [<ffffffffc071ef56>] dmu_objset_userobjspace_upgrade_cb+0xd6/0xf0 [zfs]
[  +0.000022]  [<ffffffffc071b410>] dmu_objset_upgrade_task_cb+0x80/0xe0 [zfs]
[  +0.000002]  [<ffffffffc04707f5>] taskq_thread+0x2b5/0x4e0 [spl]
[  +0.000002]  [<ffffffffa4e98730>] ? wake_up_q+0x80/0x80
[  +0.000003]  [<ffffffffc0470540>] ? task_done+0xb0/0xb0 [spl]
[  +0.000001]  [<ffffffffa4e8de67>] kthread+0xd7/0xf0
[  +0.000001]  [<ffffffffa4e8dd90>] ? kthread_park+0x60/0x60
[  +0.000002]  [<ffffffffa536ba14>] ret_from_fork+0x54/0x60
...
tcaputi (Contributor) commented Feb 9, 2018

zfs send -cewLR DataBackup@latest-d2018-01-26 | zfs recv -Fsuv DataNew

This looks like you had an existing pool from before the stability patch and are now trying to send to a new pool. You cannot use a raw send to do this, since the raw send would preserve the old broken format. This is why you are getting the errors you are seeing on those datasets. To un-break those datasets, you need to use a normal send. Unfortunately, the UI for this kind of sucks right now because zfs send -R will imply -w for any encrypted datasets because the receive side can't ask for new keys from the user. I will be making a PR in the coming weeks to correct that. For now you have to use sends without the -R, -p, and -w flags.
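
For concreteness, here is a hedged sketch of that workaround (the snapshot name is taken from this issue; it assumes DataNew/enc already exists as an encrypted dataset with its key loaded, so received children are re-encrypted under its keys):

# Per-dataset, non-raw sends: no -R, -p, or -w, so blocks are decrypted on
# read and re-encrypted under the destination's keys on write.
zfs load-key DataBackup/enc
zfs send -cL DataBackup/enc/Backup@latest-d2018-01-26 | zfs recv -Fuv DataNew/enc/Backup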

The kernel panic you got, however, is a bit interesting. Do you have steps to reproduce that?

tcaputi added this to the 0.8.0 milestone Feb 9, 2018

tcaputi (Contributor) commented Feb 11, 2018

Ok, I've just had a minute to look at this a little closer. Let me see if I understand everything correctly:

# zpool import -l -o readonly=on DataBackup
Enter passphrase for 'DataBackup/enc':
1 / 1 keys successfully loaded
cannot mount '/DataBackup/enc': directory is not empty

You performed a zpool import -l in readonly mode. The command seems to have succeeded, but it failed to mount the dataset, since you had some data at that mount point. I'm guessing you may have accidentally written some files there while the pool was exported or while that dataset was unmounted, populating the previously empty directory.
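
(If you want to confirm or work around that, hypothetically, something like this would do it; -O is the Linux-specific overlay-mount flag.)

ls -la /DataBackup/enc        # any stray files here block the normal mount
zfs mount -O DataBackup/enc   # mount over the non-empty directory anyway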

# dmesg -wH
[Feb 2 04:25]  zd0: p1
[  +1.124851] VERIFY3(sa.sa_magic == 0x2F505A) failed (2 == 3100762)
[  +0.003484] PANIC at zfs_vfsops.c:583:zfs_space_delta_cb()
[  +0.001937] Showing stack for process 2083
[  +0.000003] CPU: 1 PID: 2083 Comm: z_upgrade Tainted: P           O    4.9.77 #1-NixOS

(This is mostly for other zfs developers.) At this point, zfs hit an ASSERT and panicked. I think I see what happened here, and it seems to be a few bugs combined. I'm making another guess here that your datasets under DataBackup/enc were backed up via zfs send. Encrypted datasets do not perform userused accounting until the first time they are mounted. The first bug (and probably the most serious) is that even though your pool is in readonly mode, zfs noticed that the dataset didn't have any userused accounting data and went to calculate and write out this data. This seems like too serious a bug to actually exist, and I expect that I may be misunderstanding something about the readonly-mode code (@behlendorf, can you confirm this?). In any case, zfs kicked off a task to begin doing this calculation, but the dataset was immediately unmounted due to the non-empty directory. This unloaded the key mapping, which broke the userused accounting task. That's what I think happened, anyway, although the error message itself doesn't really make sense. This definitely needs to be fixed.
...

# zfs send -cL DataBackup/enc/Backup/From-Various | zfs recv -Fuv DataNew/enc/Backup/From-Various
warning: cannot send 'DataBackup/enc/Backup/From-Various': dataset key must be loaded
cannot receive: failed to read from stream

You attempted to send your data from one pool to another using a non-raw send. This requires the key to be loaded (so the error is correct).

# zfs load-key DataBackup/enc/Backup/From-Various
Key load error: Keys must be loaded for encryption root of 'DataBackup/enc/Backup/From-Various' (DataBackup/enc/Backup).

You saw this error and attempted to load the encryption key for the dataset itself. Unfortunately, this dataset is inheriting its key from DataBackup/enc/Backup, so zfs asked you to load the key for that dataset instead.
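
Normally, loading the key at the reported encryption root would then make the child sendable; a minimal example with the paths from this issue:

zfs load-key DataBackup/enc/Backup
zfs send -cL DataBackup/enc/Backup/From-Various | zfs recv -Fuv DataNew/enc/Backup/From-Various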

# zfs get -Ho property,value encryptionroot,keystatus DataBackup/enc/Backup/From-Various
encryptionroot	DataBackup/enc/Backup
keystatus	unavailable
...
# zfs get -Ho property,value encryptionroot,keystatus DataBackup/enc/Backup
encryptionroot	DataBackup/enc
keystatus	available

Your encryption root is DataBackup/enc/Backup, so the key needs to be loaded for this dataset (not DataBackup/enc).
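
(For reference, a quick way to inspect the whole key hierarchy at once:)

zfs get -r -t filesystem -Ho name,property,value encryptionroot,keystatus DataBackup/enc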

# zfs set mountpoint=/mnt DataBackup/enc/Backup/From-Various && zfs mount DataBackup/enc/Backup/From-Various
internal error: out of memory

# free -m
              total        used        free      shared  buff/cache   available
Mem:           7894        1180        6150          31         563        6447
Swap:             0           0           0

Honestly not sure what happened here, but it is almost certainly unrelated and best dealt with in a separate issue (such as #6650 which you opened, so thank you for that). I suspect it doesn't really have anything to do with memory, but that another error is simply printing the wrong message.
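
If you want to dig at what actually failed there, one hypothetical approach (assuming the failure surfaces in the mount(2) syscall) is:

strace -f -e trace=mount zfs mount DataBackup/enc/Backup/From-Various 2>&1 | tail -n 5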

I should be able to look into fixing the primary issue on Monday. I'll post an update when I have it.

behlendorf (Contributor) commented:

Encrypted datasets do not perform userused accounting until the first time they are mounted

@tcaputi that definitely looks possible. I don't see a read-only check in dmu_objset_userobjspace_upgradable() which would prevent this. Extending upgrade_userobj_001_pos.ksh to cover more cases like this would be good.
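
Something along these lines, as a sketch only (the predicate names mirror the existing dmu_objset helpers; this is not an actual patch):

/*
 * Hypothetical guard: never consider an objset upgradable when its pool
 * was imported readonly, so no z_upgrade task gets queued at mount time.
 */
boolean_t
dmu_objset_userobjspace_upgradable(objset_t *os)
{
        return (dmu_objset_type(os) == DMU_OST_ZFS &&
            !dmu_objset_is_snapshot(os) &&
            dmu_objset_userobjused_enabled(os) &&
            !dmu_objset_userobjspace_present(os) &&
            spa_writeable(dmu_objset_spa(os)));  /* the missing readonly check */
}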

tcaputi (Contributor) commented Feb 13, 2018

@Celmor Would you mind trying to reproduce the VERIFY3(sa.sa_magic == 0x2F505A) failed (2 == 3100762) error with #7163? That may have fixed that issue for you (although I'm not sure).

tcaputi (Contributor) commented Feb 13, 2018

It seems that this issue has been popping up randomly for quite some time, so it's unlikely that my patch will fix anything. Never mind.

Celmor (Author) commented Feb 14, 2018

Looking through my logs, I noticed that I used change-key to change the passphrase for DataBackup/enc and rebooted, after which I couldn't mount the child datasets of enc that I had created before that point; I can still mount datasets created afterwards, though. I'm not sure how the encryption key works in the code exactly, but it seems like it didn't actually change for those previously created datasets. Shouldn't they simply inherit it from their parent (enc)? I tried changing the encryption key for the enc dataset back to what I had before, which didn't help.
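
(For reference, my understanding of how change-key should behave: it rewraps the master key at the encryption root, and children inheriting from that root follow automatically, so something like this should leave the children's encryptionroot untouched:)

zfs change-key DataBackup/enc                            # prompts for the new wrapping passphrase
zfs get -r -Ho name,value encryptionroot DataBackup/enc  # children should still report DataBackup/enc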

tcaputi (Contributor) commented Feb 14, 2018

Have you tried loading the key for the child datasets? According to what you posted earlier, you should need to load the key for DataBackup/enc/Backup.

Celmor (Author) commented Feb 14, 2018

@tcaputi As I wanted to make clear in the OP, I tried loading the key for a child dataset (although only enc has a key and all the others inherit it), which obviously tells me to load the key for the encryption root instead, with "keystatus" being unavailable, e.g.:

# zfs load-key DataBackup/enc/Backup/From-Various
Key load error: Keys must be loaded for encryption root of 'DataBackup/enc/Backup/From-Various' (DataBackup/enc/Backup).
# zfs get -Ho property,value encryptionroot,keystatus DataBackup/enc/Backup/From-Various
encryptionroot	DataBackup/enc/Backup
keystatus	unavailable

I went through DataBackup/enc/Backup as well, with it again telling me to load the key for the encryption root and keystatus being unavailable. But looking at its parent's keystatus, it said it was available.

# zfs get -Ho property,value encryptionroot,keystatus DataBackup/enc/Backup
encryptionroot	DataBackup/enc
keystatus	available

Basically, I can go through all the child datasets, with each telling me to load the key for its parent, keystatus being unavailable, and mounting/sending not working, all the way up to the original encryption root enc, for which the keystatus is available. The exception is the datasets I created after I changed the key for enc.

tcaputi (Contributor) commented Feb 14, 2018

OK, now I understand. So the problem is that your encryption hierarchy is messed up, and I'm not sure how it got that way. DataBackup/enc/Backup should be its own encryption root. Could you tell me a little about how you received these datasets and the change-key operation you performed?
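
(For comparison, a dataset normally only becomes its own encryption root through an explicit operation, e.g. by giving it its own wrapping key, so it shouldn't happen as a side effect of send/recv or of change-key on the parent:)

zfs change-key -o keyformat=passphrase -o keylocation=prompt DataBackup/enc/Backup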

Celmor (Author) commented Feb 14, 2018

  • DataBackup received encrypted datasets from a temporary zpool (with the -cewLR send options set) because I had to rebuild DataBackup (I accidentally added a disk instead of attaching it).
  • Attached a disk to make the pool redundant; resilvered
  • Deleted the sending pool
  • Moved a bunch of datasets on DataBackup into DataBackup/enc to encrypt them
  • Rebooted, imported, and loaded keys
  • Ran change-key for the enc dataset
  • Moved an existing unencrypted dataset on that pool into the enc dataset to encrypt it
  • Created a new dataset as a child of the enc dataset
  • Made a bunch of recovery attempts on the original pool (unrelated to this issue; it had corrupted metadata)...
  • Booted an Arch VM with updates applied and ZFS re-compiled from master (about a month old now); tried to send enc child datasets, resulting in 'dataset key must be loaded' errors
  • Booted the NixOS image as described in the OP
  • Tried to send enc child datasets, resulting in 'dataset key must be loaded' errors
  • Tried to mount enc child datasets, resulting in 'encryption key not loaded' errors
  • Tried to change-key back to the original passphrase
  • More attempted mounts, which failed

I could go further back, to when DataTemp received zfs sends from DataBackup, or to when I created DataBackup 4 months ago with all datasets unencrypted and then moved each of them into the encrypted enc dataset, and then all of those into my main pool (which is now corrupted)...

Extensive log
Function used: zfs-send (with possibly slight adjustments for the sending/receiving pool/dataset, evident from the output)
Edit: Currently undoing the last operations (change-key) by overwriting the pool from backup; I don't have any older full backup from which I could recover the messed-up encryption keys.

tcaputi (Contributor) commented Feb 21, 2018

I think this might be fixed with #7200. We were unable to reproduce the issue with the commands you gave us, but that patch fixes some of the error handling in the receive code path. Specifically, there was an issue where a failed receive would still update the encryption keys, even though the rest of the state was rolled back. When that patch is merged, can you give it a shot and let us know if the problem happens again?

prometheanfire (Contributor) commented:

Sure, I'll get this on both sender and receiver. Do I need to recreate the dataset that already exists on the receive side (since I'm doing incrementals)?

tcaputi (Contributor) commented Feb 21, 2018

I would, just to be safe. I'm still not sure of the exact steps to get into your state.
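
Something like this, as a hedged sketch (the dataset name tank/data is a placeholder; the host and target paths follow your script below, and the destroy is irreversible, so double-check the target first):

ssh 10.0.1.14 zfs destroy -r storage-pool/remote-backups/tank/data
zfs send -Lwecp tank/data@backup-${DATE} | ssh 10.0.1.14 zfs recv -uvs storage-pool/remote-backups/tank/data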

tcaputi (Contributor) commented Feb 28, 2018

@prometheanfire any update here?

prometheanfire (Contributor) commented:

I ran into the issue that you fixed on Monday. I can't test the fix for a week, though, due to travel (stuck in Dublin for now). But I think we are fixed here (from what I've seen here and elsewhere).

Celmor (Author) commented Feb 28, 2018

Currently I can't load the key anymore, as zfs reports that the given passphrase (which I copied from my password manager) is wrong. Not sure what that's about.

tcaputi (Contributor) commented Mar 1, 2018

@prometheanfire If you're OK with it, would you mind closing this ticket? Unless you have other concerns.

@Celmor This sounds similar to a bug that should be fixed in master. Would you mind repeating your sends with that code and, if it's still a problem, opening a new ticket? I'm going on vacation for a week soon, so this might have to wait until I get back.

prometheanfire (Contributor) commented:

OK, I'm good with closing this; I sent raws and also incremental raws. (I don't have perms to close it myself, and I was not the initial reporter.)

# Iterate over every dataset that has snapshots (skip the `zfs list` header line).
for SNAPSHOT in $(zfs list -t snap | cut -d@ -f 1 | uniq | tail -n+2); do
  # Dry-run (-n) the raw incremental send to estimate the stream size for pv.
  SIZE=$(zfs send -LwecpnvP -I ${SNAPSHOT}@${PREVIOUS_BACKUP_DATE} ${SNAPSHOT}@backup-${DATE} | tail -n 1 | awk '{ print $2 }')
  if [[ "$(zfs get -H -o value type ${SNAPSHOT})" == 'volume' ]]; then
    # Volumes can't be mounted, so no canmount override is needed on the receive side.
    echo "zfs send -Lwecp -I ${SNAPSHOT}@${PREVIOUS_BACKUP_DATE} ${SNAPSHOT}@backup-${DATE} | pv -s ${SIZE} | ssh 10.0.1.14 zfs recv -uvs storage-pool/remote-backups/${SNAPSHOT}"
  else
    # Filesystems are received with canmount=off so the backups never mount on the target.
    echo "zfs send -Lwecp -I ${SNAPSHOT}@${PREVIOUS_BACKUP_DATE} ${SNAPSHOT}@backup-${DATE} | pv -s ${SIZE} | ssh 10.0.1.14 zfs recv -uvs -o canmount=off storage-pool/remote-backups/${SNAPSHOT}"
  fi
done
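
As written, the loop only echoes the transfer commands (a dry run); dropping the echo, or piping the output to sh, performs the actual sends.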
