Valentin Hilbig edited this page May 18, 2017 · 19 revisions

WARNING! BETA CODE

This is still under development. Have a look at TODO.

However, apart from some quirks and presentation bugs, it works as shown below.

Purpose

This tool was created after observing that some faulty hardware introduced data errors into the image taken by ddrescue. I don't think ddrescue was the culprit. Rather, the hardware problems prevented all data from being transferred correctly to the other side over sshfs.

Notes

  • Important: By default, this tool skips blocks of sectors which are smaller than 64 KB. If you also want to check smaller (successfully read) blocks, use option -s 4K or similar. -s0 disables skipping entirely. Note that any value below the block size of your drive probably has the same effect.

  • Option -m 1M is the default. This means that one checksum is generated for each 1 MiB block of data seen. This is usually a good value. If you have too little bandwidth, you can lower -m to reduce the amount of data to re-read, but this increases the size of the check data you have to transfer to the broken system. With 1 MiB you get around 6 MiB of verify checksums per 100 GiB of image. Note that it does not make much sense to reduce -m below the block size of your device.
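To get a feeling for the trade-off behind -m, the size of the check data can be estimated roughly. This is only a sketch: it assumes the check data grows linearly with the number of blocks and takes the figure above (about 6 MiB per 100 GiB at -m 1M) as its only data point. The function name estimate_check_kib is made up for this illustration and is not part of ddrescue-verify.

```shell
# estimate_check_kib IMAGE_GIB BLOCK_KIB
# Rough size of the check data in KiB (assumption: linear in the number of blocks).
estimate_check_kib() {
  image_gib=$1   # image size in GiB
  block_kib=$2   # value of -m, expressed in KiB (1M = 1024)
  # number of checksummed blocks in the image
  blocks=$(( image_gib * 1024 * 1024 / block_kib ))
  # 6 MiB (6144 KiB) per 102400 blocks, taken from "6 MiB per 100 GiB at 1 MiB"
  echo $(( blocks * 6144 / 102400 ))
}

estimate_check_kib 100 1024   # 100 GiB image with -m 1M
estimate_check_kib 100 256    # same image with -m 256K
```

Under this assumption, a four times smaller -m means roughly four times more check data to transfer.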

Example:

This example assumes:

  • You are on your own workstation, possibly behind a firewall.
  • The broken machine is booted into some rescue system and accessible as root@broken.example.com from your workstation.
  • You have some user@stable.example.com account on a server near broken.example.com and can access this account from your workstation.
  • broken.example.com and stable.example.com can talk to each other directly (read: they can use ssh to log in to each other), probably on a faster network connection than your workstation has.
  • If your workstation takes the role of stable.example.com, then ssh tunnels are probably your friend. This is beyond the scope of this document.

First, create the image

Transfer the data from broken to stable:

ssh root@broken.example.com 'cd /; umount /mnt; mountpoint /mnt ||
{ sshfs -C -o cache=no user@stable.example.com:/data/`hostname -f` /mnt && sleep 1 && cd /mnt &&
yes Q | ddrescue /dev/sda sda.img sda.log; }'

If it breaks, repeat this until the full image is taken. This can take hours, because the data is transferred over the network, the machine may lock up (due to faulty hardware), and so on.

Note that this is nothing special. It is the normal way to use ddrescue. ddrescue-verify was designed not to change anything with respect to ddrescue, so you can use ddrescue-verify even with old images, provided you still have the ddrescue logfile around.
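Since this step often has to be repeated by hand, a small retry wrapper may help. The following is a minimal sketch. The function name retry and its parameters are made up for this illustration and belong neither to ddrescue nor to ddrescue-verify. It treats exit status 0 as "done", which is only an assumption, so always check sda.log to see whether the image is really complete.

```shell
# retry MAX DELAY CMD...
# Rerun CMD until it exits with status 0 or MAX attempts are used up.
retry() {
  max=$1      # maximum number of attempts
  delay=$2    # seconds to wait between attempts
  shift 2
  attempt=1
  while :; do
    "$@" && return 0                        # success: stop retrying
    [ "$attempt" -ge "$max" ] && return 1   # give up after MAX attempts
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}
```

It could be used by putting, for example, retry 20 60 in front of the ssh command above. A machine that locks up completely still needs a manual reboot before the next attempt can succeed.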

Second, create the verification hashes

Create the hashes on the stable system, so that only the checks need to be transferred, which is much quicker:

ssh user@stable.example.com 'cd /data/broken.example.com &&
ddrescue-verify sda.img sda.log > sda.check'

This just needs the "original" ddrescue logfile and the original image taken by ddrescue. There is no need to remember any options. ddrescue-verify detects the mode of operation automatically, because it sees a ddrescue log and not its own output.
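The idea of one checksum per block can be illustrated with standard tools. This is not how ddrescue-verify works internally: its hash algorithm and output format are not described here, so the sketch below swaps in POSIX cksum and a made-up "offset checksum" line format. The function name block_sums is hypothetical, and the block size has to be given in plain bytes.

```shell
# block_sums FILE BLOCKSIZE
# Print "offset checksum" for each BLOCKSIZE-byte block of FILE (illustration only).
block_sums() {
  file=$1
  bs=$2                                   # block size in bytes, e.g. 1048576
  size=$(( $(wc -c < "$file") ))          # file size in bytes
  offset=0
  index=0
  while [ "$offset" -lt "$size" ]; do
    # read exactly one block and keep only the CRC printed by cksum
    sum=$(dd if="$file" bs="$bs" skip="$index" count=1 2>/dev/null | cksum | cut -d' ' -f1)
    printf '0x%08X %s\n' "$offset" "$sum"
    offset=$((offset + bs))
    index=$((index + 1))
  done
}
```

Comparing two such lists line by line shows which blocks differ, so only those blocks have to be read again.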

Third, verify the image

scp user@stable.example.com:/data/broken.example.com/sda.check root@broken.example.com:.
# Build output goes to stderr, so that only the output of ddrescue-verify ends up in sda.verify
ssh root@broken.example.com '{ apt-get install build-essential git;
git clone https://github.com/hilbix/ddrescue-verify;
cd ddrescue-verify && git pull && git submodule update --init && make; } >&2 &&
./ddrescue-verify -dui /dev/sda ~/sda.check' > sda.verify

sda.check contains checksums only for the parts which were listed as successfully read in sda.log. So no defective parts of the drive are touched.
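To see which parts count as successfully read, you can list the blocks marked + (finished) in the ddrescue logfile. The sketch below only illustrates the logfile format and is not taken from ddrescue-verify. The function name finished_blocks is made up, and the optional minimum size merely mimics the skipping of small blocks described in the notes above.

```shell
# finished_blocks MAPFILE [MINSIZE]
# Print "pos size" of all blocks marked '+' whose size is at least MINSIZE bytes.
finished_blocks() {
  min=${2:-0}
  seen_status_line=0
  while read -r pos size status rest; do
    case $pos in ''|'#'*) continue ;; esac    # skip blank lines and comments
    if [ "$seen_status_line" -eq 0 ]; then    # first data line holds the current position
      seen_status_line=1
      continue
    fi
    [ "$status" = "+" ] || continue           # keep only finished blocks
    [ $(( $size )) -ge "$min" ] && echo "$pos $size"
  done < "$1"
  return 0
}
```

For example, finished_blocks sda.log 65536 lists the finished blocks of at least 64 KiB.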

If this hangs (because the machine is broken) then you need to restart the process.

Please note that restarting is not yet implemented (sorry).

There is an option `-c 0xXXXXXXX' to restart the process from where it hung, but it does not really do what you want. So this option needs improvement and will be changed in the future.

If you are wondering where to find the value 0xXXXXXXX: it is taken from tail sda.verify. This is not very satisfying today, as the value shows the last difference, not the last working position. So you lose some effort.

Note that I will not improve this until I need to use ddrescue-verify myself again. If you like, you can update it and send me a pull request. But please drop your copyright on these changes, else I cannot merge them back.

Fourth, pull the changes

With the list of changes in sda.verify you can update the parts of the image which are different:

scp sda.verify user@stable.example.com:/data/broken.example.com/sda.verify
scp sda.verify root@broken.example.com:.
ssh root@broken.example.com 'cd /; umount /mnt; mountpoint /mnt ||
{ sshfs -C -o cache=no user@stable.example.com:/data/`hostname -f` /mnt && sleep 1 && cd /mnt &&
yes Q | ddrescue /dev/sda sda.img sda.verify; }'

This pulls the changes which are listed in sda.verify and updates the image. Note that sda.verify is of no further interest afterwards, as some information from sda.log may be missing from it. However, sda.log is still around, so continue to use that as the proper source!

As usual, you can repeat this step until it is complete. As sda.verify is based on sda.log, no part of the drive which is known to be defective is accessed.

Fifth, repeat

Now jump to the second step above:

  • Create a new verification file sda.check
  • Compute the differences again, in case the broken system was still lying to you.
  • Update the image

Do this until you are satisfied. This is probably when ddrescue-verify no longer detects any changes.