This repository has been archived by the owner on May 3, 2023. It is now read-only.

Documentation and remarks on duplicity

Useful links

Duplicity

To restore an unencrypted backup (one made without a passphrase) into a directory, use this command:

duplicity restore --no-encryption file://<absolute-path-of-backup> <path-to-restore>

To make an incremental backup of a source directory on top of an existing backup:

duplicity incremental --no-encryption <source-dir> file://<absolute-path-of-backup>

To list the files currently in the backup:

duplicity list-current-files --no-encryption file://<absolute-path-of-backup>

To list the files as they were at a given time:

duplicity list-current-files --time <time> --no-encryption file://<absolute-path-of-backup>

See the "Time formats" section for the format of the <time> string.

To list all the backup snapshots contained in a directory:

duplicity collection-status --no-encryption file://<absolute-path-of-backup>

If the /tmp directory does not have enough free space, bind-mount a directory from a larger drive over it:

sudo mount -o bind /path/to/hd /tmp

Time formats

duplicity uses time strings in two places. Firstly, many of the files duplicity creates will have the time in their filenames in the w3 datetime format as described in a w3 note. Basically they look like "2001-07-15T04:09:38-07:00", which means what it looks like. The "-07:00" section means the time zone is 7 hours behind UTC. Secondly, the -t, --time, and --restore-time options take a time string, which can be given in any of several formats:

  1. the string "now" (refers to the current time);
  2. a sequence of digits, like "123456890" (indicating the time in seconds after the epoch);
  3. a string like "2002-01-25T07:00:00+02:00" in datetime format;
  4. an interval, which is a number followed by one of the characters s, m, h, D, W, M, or Y (indicating seconds, minutes, hours, days, weeks, months, or years respectively), or a series of such pairs. In this case the string refers to the time that preceded the current time by the length of the interval. For instance, "1h78m" indicates the time that was one hour and 78 minutes ago. The calendar here is unsophisticated: a month is always 30 days, a year is always 365 days, and a day is always 86400 seconds;
  5. a date format of the form YYYY/MM/DD, YYYY-MM-DD, MM/DD/YYYY, or MM-DD-YYYY, which indicates midnight on the day in question, relative to the current time zone settings. For instance, "2002/3/5", "03-05-2002", and "2002-3-05" all mean March 5th, 2002.
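
The interval format (item 4) can be illustrated with a small sketch. This is not duplicity's own parser; the function name `interval_to_seconds` and the table `UNIT_SECONDS` are made up for this example, and only the simplified calendar rules stated above are assumed.

```python
import re

# Seconds per unit, following the simplified calendar described above:
# a month is always 30 days and a year is always 365 days.
UNIT_SECONDS = {
    "s": 1,
    "m": 60,
    "h": 3600,
    "D": 86400,
    "W": 7 * 86400,
    "M": 30 * 86400,
    "Y": 365 * 86400,
}

_PAIR = re.compile(r"(\d+)([smhDWMY])")
_INTERVAL = re.compile(r"(?:\d+[smhDWMY])+")


def interval_to_seconds(interval: str) -> int:
    """Convert an interval string such as "1h78m" into a number of seconds."""
    if not _INTERVAL.fullmatch(interval):
        raise ValueError(f"not a valid interval string: {interval!r}")
    return sum(int(n) * UNIT_SECONDS[u] for n, u in _PAIR.findall(interval))


print(interval_to_seconds("1h78m"))  # 3600 + 78 * 60 = 8280
```

The time the string refers to is then the current time minus this number of seconds.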

Duplicity file format

Filenames

File extensions are used to determine if the file is compressed or encrypted:

  • .gz is a compressed file;
  • .gpg is an encrypted file.

File names are used to define the type, the time, the volume and the relationships between snapshots. Those files must obey certain regular expressions to be considered part of a duplicity backup.

Signatures

A signature file is a tar file (compressed and/or encrypted) with a defined structure. Every file in the tar is in one of the following folders:

  • signature;
  • snapshot;
  • deleted.

Any other folder is ignored.

To determine which files belong to a given snapshot, take the signature files up to the desired date, in chronological order, and merge their sorted listings. The algorithm works as follows:

  1. open the tar files of all the signature files, and place an iterator on the first entry of each (entries are alphabetically ordered);
  2. sort the current filenames by name and by number of signature file; in this way, when the filenames are the same, the last signature file wins;
  3. yield the top element of this list;
  4. advance the iterators on the signature files; if a signature file comes to an end, remove it;
  5. collect the resulting filenames;
  6. go to point 2 if there are still filenames to consider.
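
The merge above can be sketched in a few lines of Python. This is an illustration rather than duplicity's own code. It assumes each signature file has already been reduced to a list of (path, kind) pairs sorted by path, where kind is the top-level folder name, and that "alphabetical" order matches Python's string ordering. The function name `merge_signature_listings` is hypothetical.

```python
import heapq


def merge_signature_listings(listings):
    """Merge sorted listings given oldest first; the newest entry for a path wins.

    Yields (path, listing_index, kind) tuples in path order.
    """
    def tagged(index, listing):
        for path, kind in listing:
            yield path, index, kind

    streams = [tagged(i, listing) for i, listing in enumerate(listings)]
    current = None
    # heapq.merge orders entries by (path, listing_index), so for equal paths
    # the entry from the most recent signature file arrives last.
    for entry in heapq.merge(*streams):
        if current is not None and current[0] != entry[0]:
            yield current
        current = entry
    if current is not None:
        yield current


full = [("a", "snapshot"), ("b", "snapshot"), ("c", "snapshot")]
incremental = [("b", "signature"), ("c", "deleted"), ("d", "snapshot")]
merged = list(merge_signature_listings([full, incremental]))

# Assumption: an entry whose winning kind is "deleted" is absent from the snapshot.
present = [path for path, _, kind in merged if kind != "deleted"]
print(present)  # ['a', 'b', 'd']
```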

Signature file format

Signatures are generated by librsync. See mksum.c.

Duplicity uses MD4-type signatures, as shown by the magic number at the start of the header: 0x72730136. The file format is the following:

  • Header
    • 32b BE magic number (0x72730136 for MD4);
    • 32b BE block length (duplicity uses different block lengths);
    • 32b BE strong sum length (16 for native MD4, 8 for duplicity);
  • Block *
    • 32b BE weak sum;
    • [strong sum length] bytes of strong sum (a raw byte string, so byte order should not apply);

BE stands for Big Endian and "32b" means 32 bits. The file contains a single header, followed by one block entry for each block of the original file.
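
As a sketch of how this layout can be read, the following Python function parses a signature from a binary stream using only the fields listed above. The name `parse_signature` is made up for this example, and the strong sum is assumed to be stored as raw bytes.

```python
import io
import struct

MD4_SIG_MAGIC = 0x72730136


def parse_signature(stream):
    """Return (block_len, strong_len, blocks) read from a binary file-like object.

    `blocks` is a list of (weak_sum, strong_sum_bytes) pairs.
    """
    header = stream.read(12)
    if len(header) != 12:
        raise ValueError("truncated signature header")
    # Three big-endian unsigned 32-bit integers.
    magic, block_len, strong_len = struct.unpack(">III", header)
    if magic != MD4_SIG_MAGIC:
        raise ValueError(f"unexpected magic number: {magic:#010x}")
    entry_len = 4 + strong_len
    blocks = []
    while True:
        entry = stream.read(entry_len)
        if not entry:
            break
        if len(entry) != entry_len:
            raise ValueError("truncated block entry")
        (weak_sum,) = struct.unpack(">I", entry[:4])
        blocks.append((weak_sum, entry[4:]))
    return block_len, strong_len, blocks


# Synthetic signature with one block entry, built only for this example.
sample = struct.pack(">III", MD4_SIG_MAGIC, 2048, 8)
sample += struct.pack(">I", 0x11223344) + bytes(range(8))
print(parse_signature(io.BytesIO(sample)))
```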

Manifest

Every backup snapshot contains a manifest file. The manifest is a text file containing information about which paths are contained in which volumes.

Manifest file format

The manifest is a text file with this specific structure:

Hostname <host>
Localdir <path>
Volume <volume-num>:
    StartingPath <path> [<block-num>]
    EndingPath <path> [<block-num>]
    Hash <hash-type> <hash-val>

The first two lines appear only once. There is one Volume directive for each volume in the snapshot, so it can appear multiple times. In the directives above:

  • <host> is the name of the host from which the backup has been performed;
  • <path> is a path. Paths are usually UTF-8 strings, but this is not always the case. To be safe, treat a path as a null-terminated sequence of bytes whose components are delimited by '/' characters. If the path contains spaces, they are encoded as '\x20' strings and the path is delimited with double quotes ("). If the path contains non-UTF-8 bytes, they are dumped as is, generating an invalid UTF-8 string;
  • the Localdir <path> directive specifies the local directory from which the backup has been performed. Usually this is the root directory '/';
  • <volume-num> is the number of the volume that the three following indented directives refer to. Volumes are sorted and start from 1;
  • <block-num> is the optional block number for multi-volume snapshots. If a file is split across multiple volumes, this number is reported. If it is present in the StartingPath directive, it refers to the first block of the file present in the volume; if it is in the EndingPath directive, it refers to the last (inclusive) block present in the volume;
  • <hash-type> refers to the hashing algorithm used (normally SHA1);
  • <hash-val> is the hash value for the volume, represented as a lowercase hexadecimal string.

For example:

Hostname dellxps
Localdir /
Volume 1:
    StartingPath   .
    EndingPath     "home/michele/Documents/My\x20Important\x20Docs/big_file.tar" 15
    Hash SHA1 012886c20a9670cff933ee1724104a8f24c09253
Volume 2:
    StartingPath   "home/michele/Documents/My\x20Important\x20Docs/big_file.tar" 16
    EndingPath     home/michele/Z
    Hash SHA1 e7feba722cb4309878e8f6731f8bcd7cb346a7c4
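
A minimal parser for this structure might look like the sketch below. It is not duplicity's own manifest code. It assumes a well-formed manifest, keeps paths as raw bytes (since they may not be valid UTF-8), and relies on spaces inside paths being encoded as \x20. The names `parse_manifest`, `unquote` and `EXAMPLE`, as well as the host name, paths and hash values in the sample, are made up for this example.

```python
import re

_ESCAPE = re.compile(rb"\\x([0-9a-fA-F]{2})")


def unquote(token: bytes) -> bytes:
    """Strip surrounding double quotes and decode \\xNN escapes, if any."""
    if len(token) >= 2 and token[:1] == b'"' and token[-1:] == b'"':
        return _ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), token[1:-1])
    return token


def parse_manifest(data: bytes) -> dict:
    """Parse manifest bytes into a dict; paths are kept as raw bytes."""
    manifest = {"hostname": None, "localdir": None, "volumes": {}}
    volume = None
    for raw_line in data.split(b"\n"):
        # Spaces inside paths are escaped, so whitespace splitting is enough.
        fields = raw_line.split()
        if not fields:
            continue
        keyword, args = fields[0], fields[1:]
        if keyword == b"Hostname":
            manifest["hostname"] = args[0].decode("utf-8", "replace")
        elif keyword == b"Localdir":
            manifest["localdir"] = unquote(args[0])
        elif keyword == b"Volume":
            volume = {}
            number = int(args[0].rstrip(b":").decode("ascii"))
            manifest["volumes"][number] = volume
        elif keyword in (b"StartingPath", b"EndingPath"):
            block = int(args[1].decode("ascii")) if len(args) > 1 else None
            volume[keyword.decode("ascii")] = (unquote(args[0]), block)
        elif keyword == b"Hash":
            volume["Hash"] = (args[0].decode("ascii"), args[1].decode("ascii"))
    return manifest


EXAMPLE = rb'''Hostname examplehost
Localdir /
Volume 1:
    StartingPath   .
    EndingPath     "home/user/My\x20Docs/big.tar" 15
    Hash SHA1 0123456789abcdef0123456789abcdef01234567
Volume 2:
    StartingPath   "home/user/My\x20Docs/big.tar" 16
    EndingPath     home/user/z
    Hash SHA1 89abcdef0123456789abcdef0123456789abcdef
'''

parsed = parse_manifest(EXAMPLE)
print(parsed["volumes"][1]["EndingPath"])  # (b'home/user/My Docs/big.tar', 15)
```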

Volumes

A volume file is a tar file (compressed and/or encrypted) with a defined structure. Every entry in the tar is in one of the following directories:

  • deleted;
  • diff;
  • multivol_diff;
  • snapshot;
  • multivol_snapshot.

Any other folder is ignored.

The directory of an entry determines how it must be interpreted:

  • deleted: the entry has been deleted in the current snapshot;
  • snapshot: the entry is stored "as-is", without the need to compute patches from previous versions;
  • multivol_snapshot: like snapshot, but the file is split into multiple numbered blocks. For example, the path my/file is stored in multivol_snapshot/my/file/1, multivol_snapshot/my/file/2, etc. These blocks can span multiple volumes. A block always has a fixed size of 65536 bytes (64 KiB);
  • diff: the entry is a librsync delta with respect to the previous version of the file;
  • multivol_diff: like diff, but the delta is split into multiple numbered blocks. Note that the delta file is valid only when all the blocks are available, and these can span multiple volumes.
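
Reassembling a file stored in multivol_snapshot can be sketched with the standard tarfile module. This sketch assumes that the volumes have already been decrypted and decompressed into plain tar data, that each block is stored as a regular tar entry, and that block numbers are contiguous. The function names `read_multivol_snapshot` and `make_tar` are hypothetical, and the demo uses tiny blocks instead of 65536-byte ones.

```python
import io
import tarfile


def read_multivol_snapshot(volumes, path):
    """Concatenate the numbered blocks of `path` found across `volumes`.

    `volumes` is an iterable of binary file-like objects with plain tar data.
    """
    prefix = f"multivol_snapshot/{path}/"
    blocks = {}
    for volume in volumes:
        with tarfile.open(fileobj=volume, mode="r:") as tar:
            for member in tar:
                suffix = member.name[len(prefix):]
                if member.name.startswith(prefix) and suffix.isdigit() and member.isfile():
                    blocks[int(suffix)] = tar.extractfile(member).read()
    if not blocks:
        raise KeyError(path)
    numbers = sorted(blocks)
    # The content is usable only if no block in the range is missing.
    if numbers != list(range(numbers[0], numbers[0] + len(numbers))):
        raise ValueError(f"missing blocks for {path!r}")
    return b"".join(blocks[n] for n in numbers)


def make_tar(entries):
    """Build an in-memory tar from a {name: bytes} mapping (demo helper)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name, data in entries.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buffer.seek(0)
    return buffer


volume_1 = make_tar({"multivol_snapshot/my/file/1": b"AAAA",
                     "multivol_snapshot/my/file/2": b"BBBB"})
volume_2 = make_tar({"multivol_snapshot/my/file/3": b"CC",
                     "snapshot/other": b"unrelated"})
content = read_multivol_snapshot([volume_1, volume_2], "my/file")
print(content)  # b'AAAABBBBCC'
```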

For diff-like files, the contents can be retrieved by applying an ordered sequence of patches.

To retrieve the patch sequence needed for a given delta from the complete ordered sequence (see patchdir.normalize_ps):

  • iterate backwards in time;
  • skip blank deltas;
  • add every diff until a snapshot (or multivol snapshot) is found;
  • add that snapshot as well and stop the iteration.
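
The steps above can be sketched as follows. This is an illustration of the idea, not the code in duplicity's patchdir module. It assumes the history of a path is given as (kind, payload) pairs, oldest first, where kind is "snapshot", "diff", or None for a blank delta. The function name `normalize_patch_sequence` is made up, and deleted entries are not handled.

```python
def normalize_patch_sequence(history):
    """Return the entries needed to rebuild the latest version, oldest first."""
    needed = []
    for kind, payload in reversed(history):
        if kind is None:
            # Blank delta: nothing to apply for this point in time.
            continue
        needed.append((kind, payload))
        if kind == "snapshot":
            # A full copy was found, so older entries are irrelevant.
            break
    else:
        raise ValueError("no snapshot found in the sequence")
    needed.reverse()
    return needed


history = [
    ("snapshot", "v1"),
    ("diff", "d2"),
    (None, None),
    ("snapshot", "v3"),
    ("diff", "d4"),
    (None, None),
    ("diff", "d6"),
]
print(normalize_patch_sequence(history))
# [('snapshot', 'v3'), ('diff', 'd4'), ('diff', 'd6')]
```

The result starts with a full copy of the file, followed by the deltas to apply to it in order.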

To obtain the patched file, apply the librsync patches in order (see librsync.PatchedFile).