spec: big file chunking

Jörn Friedrich Dreyer edited this page Jan 26, 2018 · 6 revisions

Deprecated in favor of https://dragotin.wordpress.com/2015/06/22/owncloud-chunking-ng/, which was added with https://github.com/owncloud/core/pull/20118

This page describes the custom ownCloud file chunking algorithm that is used to upload big files via WebDAV. It is implemented in ownCloud 4.5 and higher:

  • Someone wants to upload a big file, for example big.mpg
  • The client splits the file into several chunks
  • The size of the chunks is flexible and can be optimized for best performance in the future.
  • The individual chunks are uploaded via WebDAV to the server.
  • The order of the uploads is not important.
  • The upload can take a long time and can be interrupted without problems.
  • The client sends a custom HTTP header, OC-Chunked: 1, to enable chunked upload mode on the server, so that the protocol always stays backward compatible.
  • The files are uploaded to the final location with a special name: <path/filename>-chunking-<transferid>-<chunkcount>-<index>
  • Example: big.mpg-chunking-4711-20-3
  • The transferid is a random id that together with the filename signals the server that specific chunks belong together.
  • The index starts at 0 and counts to chunkcount-1
  • The server recognizes these uploads as chunk files and keeps them in a temp folder
  • The server moves the file to the final place with the final name when all parts are uploaded
  • The last WebDAV request ends only when the final file is in place (synchronous transfer).
  • The temp folder is cleaned once a day to remove old, failed uploads.
  • Additional headers may be sent to allow optimisations: OC-Total-Length is the size of the full file; OC-Chunk-Size is the size of every chunk except the last one.
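A minimal Python sketch of the naming scheme above, covering both the client and the server side. It assumes a numeric transfer ID (like the 4711 in the example); the function names (chunk_names, chunk_headers, parse_chunk_name, assemble_if_complete) are hypothetical, and a plain dict stands in for the server's temp folder. It only illustrates the protocol and is not the ownCloud server code.

```python
import re
import secrets

# <path/filename>-chunking-<transferid>-<chunkcount>-<index>
# Assumption: the transfer ID is numeric, as in big.mpg-chunking-4711-20-3.
CHUNK_RE = re.compile(
    r"^(?P<name>.+)-chunking-(?P<transferid>\d+)-(?P<count>\d+)-(?P<index>\d+)$"
)


def chunk_names(path, total_size, chunk_size, transferid=None):
    """Client side: yield (upload_name, offset, length) for every chunk."""
    if transferid is None:
        # Hypothetical choice: a random numeric transfer ID.
        transferid = secrets.randbelow(10**9)
    count = max(1, -(-total_size // chunk_size))  # ceiling division
    for index in range(count):  # index runs from 0 to count - 1
        offset = index * chunk_size
        length = min(chunk_size, total_size - offset)
        yield f"{path}-chunking-{transferid}-{count}-{index}", offset, length


def chunk_headers(total_size, chunk_size):
    """Headers sent with each chunk PUT; the last two are optional optimisations."""
    return {
        "OC-Chunked": "1",
        "OC-Total-Length": str(total_size),
        "OC-Chunk-Size": str(chunk_size),
    }


def parse_chunk_name(name):
    """Server side: return (filename, transferid, count, index), or None for a normal file."""
    m = CHUNK_RE.match(name)
    if m is None:
        return None
    return m["name"], m["transferid"], int(m["count"]), int(m["index"])


def assemble_if_complete(received, filename, transferid, count):
    """Server side: `received` maps (filename, transferid) -> {index: bytes}
    and stands in for the temp folder. Returns the full file once all
    chunks are present, otherwise None."""
    parts = received.get((filename, transferid), {})
    if not all(i in parts for i in range(count)):
        return None
    return b"".join(parts[i] for i in range(count))
```

For a 2500-byte file and 1000-byte chunks, `chunk_names("big.mpg", 2500, 1000, transferid=4711)` yields three chunks named `big.mpg-chunking-4711-3-0` through `big.mpg-chunking-4711-3-2`, the last one 500 bytes long. Because assembly only happens once every index from 0 to chunkcount-1 is present, the chunks can arrive in any order.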

In a second step this can be extended to support partial updating of files.

  • The client generates a parts list with MD5 hashes of the individual chunks.
  • The server provides an additional REST API to check the hashes.
  • The API is called with PUT method to http://.../remote.php/filesync/oc_chunked/path/to/file
  • The PUT data stream starts with a 2-byte chunk size,
  • followed by the binary MD5 hash of each chunk.
  • All values are big-endian.
  • The API returns the following information in JSON-encoded format:
      • transferid: the transfer ID to use when uploading the missing chunks
      • needed: list of the chunk numbers the server needs
      • count: number of hashes provided
  • The client sends the chunk size and the list of hashes to the server via this API.
  • The server responds with the list of chunks it needs and prepares the upload directory.
  • The client sends only the changed chunks.
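The hash-list request body and the server's reply could look roughly like this in Python. The spec does not state the unit of the 2-byte chunk size; this sketch assumes bytes, which caps chunks at 65535 bytes. The function names, the string transfer ID, and the rule that the server compares against MD5 hashes of its current copy split at the same chunk size are assumptions made for illustration.

```python
import hashlib
import json
import struct


def chunk_digests(data, chunk_size):
    """Binary (16-byte) MD5 digest of each chunk of `data`."""
    return [hashlib.md5(data[o:o + chunk_size]).digest()
            for o in range(0, len(data), chunk_size)]


def encode_hash_list(data, chunk_size):
    """Client side: 2-byte big-endian chunk size, then the binary MD5 of each chunk."""
    # Assumption: the chunk size is given in bytes and must fit in 2 bytes.
    if not 0 < chunk_size <= 0xFFFF:
        raise ValueError("chunk size must fit in 2 bytes")
    return struct.pack(">H", chunk_size) + b"".join(chunk_digests(data, chunk_size))


def decode_hash_list(body):
    """Server side: split the PUT body into (chunk_size, [digests])."""
    if len(body) < 2 or (len(body) - 2) % 16:
        raise ValueError("malformed hash list")
    (chunk_size,) = struct.unpack(">H", body[:2])
    return chunk_size, [body[i:i + 16] for i in range(2, len(body), 16)]


def needed_chunks(body, server_copy, transferid):
    """Server side (hypothetical): a chunk is needed when its hash differs
    from the server's current copy or the server copy is shorter."""
    chunk_size, digests = decode_hash_list(body)
    stored = chunk_digests(server_copy, chunk_size)
    needed = [i for i, d in enumerate(digests)
              if i >= len(stored) or stored[i] != d]
    return json.dumps({"transferid": transferid, "needed": needed,
                       "count": len(digests)})
```

If only the second and third chunks of a file changed, the reply would list `"needed": [1, 2]`, and the client then uploads just those two chunks using the returned transfer ID.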

Pitfalls:

  • Clients need to be aware that the server may not receive the OC-Chunked header, for example because a filtering proxy strips it.

    In that case: if there is more than one chunk and the server returns an ETag after the first chunk, this can mean one of two things:

    a) The server did not see the chunk header and created a file with the chunk name. This is the error case.

    b) The server already had all other parts and the chunk just transmitted was the last missing one. That is fine, and no file with the chunk filename is created.

    If that happens, we send a DELETE request for the chunk file name. If the DELETE succeeds, we are in case a) and cannot use chunking. If it fails, we are in case b) and all is fine.
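A sketch of the client-side check described above, in Python. The function names are hypothetical, `delete` is injectable so the decision logic can be tried without a server, and any 2xx status is treated as a successful DELETE (an assumption; the spec only says "succeeds").

```python
import urllib.error
import urllib.request


def http_delete(url):
    """Send a WebDAV DELETE and return the HTTP status code."""
    req = urllib.request.Request(url, method="DELETE")
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def chunking_supported(chunk_count, first_response_has_etag, chunk_url,
                       delete=http_delete):
    """Return False in case a) (the server stored a plain file under the
    chunk name), True otherwise."""
    if chunk_count <= 1 or not first_response_has_etag:
        # The ambiguity only arises for multi-chunk uploads that got an ETag back.
        return True
    status = delete(chunk_url)
    # Assumption: any 2xx counts as success, meaning a file with the chunk
    # name existed (case a). A failed DELETE means case b).
    return not (200 <= status < 300)
```

If this returns False, the client would fall back to a normal, non-chunked upload.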