canonicalization: Add recommendations for canonicalization
Spun off from [1], now that we seem to have reached a consensus for
"SHOULD canonical JSON" there.  I've set this up so we have space to
add canonicalization recommendations for other formats, although the
only other basic type discussed in this repository is a gzipped
tarball and that's more than I want to bite off at the moment ;).

[1]: #259
     Subject: manifest json fields order

Signed-off-by: W. Trevor King <wking@tremily.us>
wking committed Sep 23, 2016
1 parent eebd585 commit 0b0c8e6
Showing 4 changed files with 24 additions and 1 deletion.
3 changes: 2 additions & 1 deletion Makefile
@@ -29,7 +29,8 @@ DOC_FILES := \
image-layout.md \
layer.md \
config.md \
manifest.md
manifest.md \
canonicalization.md

FIGURE_FILES := \
img/media-types.png
1 change: 1 addition & 0 deletions README.md
@@ -13,6 +13,7 @@ The OCI Image Format project creates and maintains the software shipping contain
- [Filesystem Layers](layer.md)
- [Image Configuration](config.md)
- [Manifests and Manifest Lists](manifest.md)
- [Canonicalization](canonicalization.md)

## Overview

19 changes: 19 additions & 0 deletions canonicalization.md
@@ -0,0 +1,19 @@
# Canonicalization

One benefit of content-addressable storage is easy deduplication.
Many images might depend on a particular [layer](layer.md), but there will only be one blob in the [store](image-layout.md).
However, that same semantic layer serialized slightly differently would have a different hash, and if both versions of the layer are referenced there will be two blobs with the same semantic content.
To allow efficient storage, implementations serializing content for blobs SHOULD use a canonical serialization.
This increases the chance that different implementations can push the same semantic content to the store without creating redundant blobs.
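
As a rough illustration of the problem described above (not part of the specification), the following sketch hashes two byte strings that encode the same JSON object.
The `blob_digest` helper and the sample payloads are hypothetical, and SHA-256 is assumed as the digest algorithm.

```python
import hashlib


def blob_digest(data: bytes) -> str:
    """Return a content address for a blob (hypothetical helper, SHA-256 assumed)."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


# Two serializations of the same semantic content: key order and whitespace differ.
compact = b'{"a":1,"b":2}'
spaced = b'{"b": 2, "a": 1}'

# The bytes differ, so the digests differ and a store would keep two blobs.
print(blob_digest(compact) == blob_digest(spaced))  # prints False
```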

## JSON

[JSON][] content SHOULD be serialized as [canonical JSON][canonical-json].
Implementations:

* [Go][]: [github.com/docker/go][], which claims to implement [canonical JSON][canonical-json] except for Unicode normalization.
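
The effect of a shared serialization can be sketched with Python's standard `json` module.
This is only an approximation under stated assumptions: sorted keys and no insignificant whitespace cover part of what canonical JSON asks for, but the sketch does not enforce its other rules (for example its string-escaping and number restrictions), so it should not be read as a conforming implementation.
The `canonicalize` and `digest` names and the sample documents are hypothetical.

```python
import hashlib
import json


def canonicalize(obj) -> bytes:
    """Serialize with sorted keys and no insignificant whitespace, as UTF-8.

    Approximation only: not a conforming canonical JSON encoder.
    """
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def digest(data: bytes) -> str:
    """Hypothetical content-address helper, SHA-256 assumed."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


# The same semantic document, written with different key order and whitespace.
first = json.loads('{"schemaVersion": 2, "layers": []}')
second = json.loads('{"layers":[],\n "schemaVersion":2}')

# After re-serialization both produce identical bytes, hence one digest.
print(canonicalize(first).decode("utf-8"))  # prints {"layers":[],"schemaVersion":2}
print(digest(canonicalize(first)) == digest(canonicalize(second)))  # prints True
```

Two writers that agree on such a serialization would address the same blob; writers that each pick their own key order or indentation would not.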

[canonical-json]: http://wiki.laptop.org/go/Canonical_JSON
[github.com/docker/go]: https://github.com/docker/go/
[Go]: https://golang.org/
[JSON]: http://json.org/
2 changes: 2 additions & 0 deletions media-types.md
@@ -47,3 +47,5 @@ The following figure shows how the above media types reference each other:

[Descriptors](descriptor.md) are used for all references.
The manifest list being a "fat manifest" references one or more image manifests per target platform. An image manifest references exactly one target configuration and possibly many layers.

[JSON]: http://json.org/
