Author(s):
- NA
Abstract
UnixFS is a protocol-buffers based format for describing files, directories, and symlinks in IPFS. The current implementation of UnixFS has grown organically and does not have a clear specification document. See “implementations” below for reference implementations you can examine to understand the format.
Draft work and discussion on a specification for the upcoming version 2 of the UnixFS format is happening in the ipfs/unixfs-v2 repo. Please see the issues there for discussion and PRs for drafts. When the specification is completed there, it will be copied back to this repo and replace this document.
Implementations:

- JavaScript
  - Data Formats - unixfs
  - Importer - unixfs-importer
  - Exporter - unixfs-exporter
- Go
  - ipfs/go-ipfs/unixfs
  - Protocol Buffer Definitions - ipfs/go-ipfs/unixfs/pb
The UnixfsV1 data format is represented by this protobuf:
```protobuf
message Data {
	enum DataType {
		Raw = 0;
		Directory = 1;
		File = 2;
		Metadata = 3;
		Symlink = 4;
		HAMTShard = 5;
	}

	required DataType Type = 1;
	optional bytes Data = 2;
	optional uint64 filesize = 3;
	repeated uint64 blocksizes = 4;
	optional uint64 hashType = 5;
	optional uint64 fanout = 6;
	optional uint32 mode = 7;
	optional UnixTime mtime = 8;
}

message Metadata {
	optional string MimeType = 1;
}

message UnixTime {
	required int64 Seconds = 1;
	optional fixed32 FractionalNanoseconds = 2;
}
```
This `Data` object is used for all non-leaf nodes in Unixfs.
For files that are comprised of more than a single block, the 'Type' field will be set to 'File', the 'filesize' field will be set to the total number of bytes in the file (not the graph structure) represented by this node, and 'blocksizes' will contain a list of the filesizes of each child node.
This data is serialized and placed inside the 'Data' field of the outer merkledag protobuf, which also contains the actual links to the child nodes of this object.
For files comprised of a single block, the 'Type' field will be set to 'File', 'filesize' will be set to the total number of bytes in the file and the file data will be stored in the 'Data' field.
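The invariant between `filesize` and `blocksizes` on a non-leaf `File` node can be sketched as follows (Go; the struct and helper names are hypothetical, not generated protobuf code or part of any reference implementation):

```go
package main

import "fmt"

// fileData mirrors the fields of the UnixFS Data message that matter
// for a non-leaf File node (hypothetical struct, not generated protobuf).
type fileData struct {
	Filesize   uint64   // total bytes of file content under this node
	Blocksizes []uint64 // content bytes contributed by each child link
}

// validFileNode checks the invariant described above: filesize counts
// file content only (not graph overhead), so it must equal the sum of
// the per-child blocksizes.
func validFileNode(d fileData) bool {
	var sum uint64
	for _, bs := range d.Blocksizes {
		sum += bs
	}
	return sum == d.Filesize
}

func main() {
	// A 1 MiB file chunked into four 262144-byte raw leaves.
	node := fileData{Filesize: 1048576, Blocksizes: []uint64{262144, 262144, 262144, 262144}}
	fmt.Println(validFileNode(node)) // true
}
```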
UnixFS currently supports two optional metadata fields:

- `mode`
  - The `mode` is for persisting the file permissions in numeric notation [spec].
  - If unspecified this defaults to
    - `0755` for directories/HAMT shards
    - `0644` for all other types where applicable
  - The nine least significant bits represent `ugo-rwx`
  - The next three least significant bits represent `setuid`, `setgid` and the `sticky bit`
  - The remaining 20 bits are reserved for future use, and are subject to change. Spec implementations MUST handle bits they do not expect as follows:
    - For future-proofing the (de)serialization layer must preserve the entire uint32 value during clone/copy operations, modifying only bit values that have a well defined meaning: `clonedValue = ( modifiedBits & 07777 ) | ( originalValue & 0xFFFFF000 )`
    - Implementations of this spec must proactively mask off bits without a defined meaning in the implemented version of the spec: `interpretedValue = originalValue & 07777`
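The two masking rules above can be sketched in Go (the helper names are hypothetical):

```go
package main

import "fmt"

const (
	definedBits  = 0o7777     // ugo-rwx plus setuid/setgid/sticky
	reservedMask = 0xFFFFF000 // upper 20 bits, reserved for future use
)

// interpretMode masks off bits that have no defined meaning in this
// version of the spec, as required before acting on permissions.
func interpretMode(original uint32) uint32 {
	return original & definedBits
}

// cloneMode preserves the reserved upper 20 bits of the original value
// while taking the defined permission bits from the modified value.
func cloneMode(modifiedBits, originalValue uint32) uint32 {
	return (modifiedBits & definedBits) | (originalValue & reservedMask)
}

func main() {
	// Original value carries some (unknown) reserved bits alongside 0644.
	original := uint32(0xABCDE000) | 0o644
	fmt.Printf("%o\n", interpretMode(original)) // 644
	// Changing permissions to 0755 must not disturb the reserved bits.
	cloned := cloneMode(0o755, original)
	fmt.Printf("%#x\n", cloned&reservedMask) // 0xabcde000
}
```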
- `mtime`
  - A two-element structure (`Seconds`, `FractionalNanoseconds`) representing the modification time in seconds relative to the unix epoch `1970-01-01T00:00:00Z`.
  - The two fields are:
    - `Seconds` (always present, signed 64bit integer): represents the amount of seconds after or before the epoch.
    - `FractionalNanoseconds` (optional, 32bit unsigned integer): when specified represents the fractional part of the mtime as the amount of nanoseconds. The valid range for this value is the integers `[1, 999999999]`.
  - Implementations encoding or decoding wire-representations must observe the following:
    - An `mtime` structure with `FractionalNanoseconds` outside of the on-wire range `[1, 999999999]` is not valid. This includes a fractional value of `0`. Implementations encountering such values should consider the entire enclosing metadata block malformed and abort processing the corresponding DAG.
    - The `mtime` structure is optional - its absence implies `unspecified`, rather than `0`/`1970-01-01T00:00:00Z`.
    - For ergonomic reasons a surface API of an encoder must allow fractional 0 as input, while at the same time it must ensure the 0 is stripped from the final structure before encoding, satisfying the above constraints.
  - Implementations interpreting the mtime metadata in order to apply it within a non-IPFS target must observe the following:
    - If the target supports a distinction between `unspecified` and `0`/`1970-01-01T00:00:00Z`, the distinction must be preserved within the target. E.g. if no `mtime` structure is available, a web gateway must not render a `Last-Modified:` header.
    - If the target requires an mtime (e.g. a FUSE interface) and no `mtime` is supplied OR the supplied `mtime` falls outside of the target's accepted range:
      - When no `mtime` is specified or the resulting `UnixTime` is negative: implementations must assume `0`/`1970-01-01T00:00:00Z` (note that such values are not merely academic: e.g. the OpenVMS epoch is `1858-11-17T00:00:00Z`).
      - When the resulting `UnixTime` is larger than the target's range (e.g. 32bit vs 64bit mismatch), implementations must assume the highest possible value in the target's range (in most cases that would be `2038-01-19T03:14:07Z`).
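A sketch of the encoder-side rules in Go (hypothetical struct and helper names, mirroring the `UnixTime` message):

```go
package main

import (
	"errors"
	"fmt"
)

// unixTime mirrors the UnixTime message (hypothetical struct).
type unixTime struct {
	Seconds               int64
	FractionalNanoseconds *uint32 // nil when absent on the wire
}

// validateWire rejects structures whose fractional part is outside the
// on-wire range [1, 999999999]; a present value of 0 is also invalid.
func validateWire(t unixTime) error {
	if t.FractionalNanoseconds != nil {
		ns := *t.FractionalNanoseconds
		if ns < 1 || ns > 999999999 {
			return errors.New("malformed mtime: fractional part out of range")
		}
	}
	return nil
}

// normalizeForEncoding lets callers pass a fractional 0 but strips it
// before the structure is written, as the spec requires of encoders.
func normalizeForEncoding(seconds int64, nanos uint32) unixTime {
	if nanos == 0 {
		return unixTime{Seconds: seconds}
	}
	return unixTime{Seconds: seconds, FractionalNanoseconds: &nanos}
}

func main() {
	t := normalizeForEncoding(1600000000, 0)
	fmt.Println(t.FractionalNanoseconds == nil) // true: fractional 0 stripped
	bad := uint32(0)
	fmt.Println(validateWire(unixTime{Seconds: 1, FractionalNanoseconds: &bad}) != nil) // true
}
```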
Where the file data is small it would normally be stored in the `Data` field of the UnixFS `File` node.

To aid in deduplication of data even for small files, file data can be stored in a separate node linked to from the `File` node in order for the data to have a constant CID regardless of the metadata associated with it.

As a further optimization, if the `File` node's serialized size is small, it may be inlined into its v1 CID by using the `identity` multihash.
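As a sketch of that inlining, the bytes of such an identity-multihash CIDv1 can be assembled by hand (assuming the dag-pb codec; the single-byte length below only covers payloads under 128 bytes, which fits the small-node use case, since longer lengths need varint encoding):

```go
package main

import "fmt"

// identityCIDv1 builds the raw bytes of a CIDv1 whose multihash is the
// identity hash over the serialized node, i.e. the node is inlined into
// the CID itself. Byte codes: 0x01 = CIDv1, 0x70 = dag-pb, 0x00 = identity.
func identityCIDv1(serialized []byte) ([]byte, error) {
	if len(serialized) > 127 {
		return nil, fmt.Errorf("payload too large for this single-byte-length sketch")
	}
	cid := []byte{0x01, 0x70, 0x00, byte(len(serialized))}
	return append(cid, serialized...), nil
}

func main() {
	cid, _ := identityCIDv1([]byte("tiny node"))
	fmt.Println(len(cid)) // 4 header bytes + 9 payload bytes = 13
}
```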
Importing a file into unixfs is split up into two parts. The first is chunking, the second is layout.
Chunking has two main parameters, chunking strategy and leaf format.
Leaf format should always be set to 'raw', this is mainly configurable for backwards compatibility with earlier formats that used a Unixfs Data object with type 'Raw'. Raw leaves means that the nodes output from chunking will be just raw data from the file with a CID type of 'raw'.
Chunking strategy currently has two different options, 'fixed size' and 'rabin'. Fixed size chunking will chunk the input data into pieces of a given size. Rabin chunking will chunk the input data using rabin fingerprinting to determine the boundaries between chunks.
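A minimal sketch of the 'fixed size' strategy (the helper name is hypothetical, and real chunkers stream data rather than holding the whole file in memory):

```go
package main

import "fmt"

// chunkFixedSize splits data into consecutive pieces of at most size
// bytes; the last chunk may be shorter. This sketches the 'fixed size'
// chunking strategy (rabin fingerprinting is content-defined instead).
func chunkFixedSize(data []byte, size int) [][]byte {
	var chunks [][]byte
	for len(data) > 0 {
		n := size
		if len(data) < n {
			n = len(data)
		}
		chunks = append(chunks, data[:n])
		data = data[n:]
	}
	return chunks
}

func main() {
	chunks := chunkFixedSize(make([]byte, 1000), 256)
	fmt.Println(len(chunks), len(chunks[3])) // 4 232
}
```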
Layout defines the shape of the tree that gets built from the chunks of the input file.
There are currently two options for layout: balanced and trickle. Additionally, a 'max width' must be specified. The default max width is 174.
The balanced layout creates a balanced tree of width 'max width'. The tree is formed by taking up to 'max width' chunks from the chunk stream, and creating a unixfs file node that links to all of them. This is repeated until 'max width' unixfs file nodes are created, at which point a unixfs file node is created to hold all of those nodes, recursively. The root node of the resultant tree is returned as the handle to the newly imported file.
If there is only a single chunk, no intermediate unixfs file nodes are created, and the single chunk is returned as the handle to the file.
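The balanced layout described above can be sketched as follows (Go; `node`, `balancedLayout` and `depth` are hypothetical names, not from any implementation):

```go
package main

import "fmt"

// node sketches a unixfs file node: either a leaf chunk or links to children.
type node struct {
	children []*node
	leaf     bool
}

// balancedLayout builds a balanced tree of the given max width over the
// leaves. A single chunk is returned as-is, with no intermediate node.
func balancedLayout(leaves []*node, maxWidth int) *node {
	if len(leaves) == 1 {
		return leaves[0]
	}
	var level []*node
	for len(leaves) > 0 {
		n := maxWidth
		if len(leaves) < n {
			n = len(leaves)
		}
		level = append(level, &node{children: leaves[:n]})
		leaves = leaves[n:]
	}
	return balancedLayout(level, maxWidth)
}

// depth returns the number of links from this node down to a leaf.
func depth(n *node) int {
	if n.leaf || len(n.children) == 0 {
		return 0
	}
	return 1 + depth(n.children[0])
}

func main() {
	// 200 chunks with the default max width of 174 gives a two-level tree:
	// a root linking to two intermediate nodes of 174 and 26 leaves.
	leaves := make([]*node, 200)
	for i := range leaves {
		leaves[i] = &node{leaf: true}
	}
	root := balancedLayout(leaves, 174)
	fmt.Println(depth(root), len(root.children)) // 2 2
}
```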
To read the file data out of the unixfs graph, perform an in-order traversal, emitting the data contained in each of the leaves.
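A minimal sketch of that traversal, assuming a hypothetical `fileNode` type with leaf data and child links:

```go
package main

import "fmt"

// fileNode sketches a unixfs node: leaves carry data, internal nodes carry links.
type fileNode struct {
	data     []byte
	children []*fileNode
}

// readFile performs the in-order (depth-first, left-to-right) traversal
// described above, concatenating leaf data to reconstruct file contents.
func readFile(n *fileNode, out []byte) []byte {
	if len(n.children) == 0 {
		return append(out, n.data...)
	}
	for _, c := range n.children {
		out = readFile(c, out)
	}
	return out
}

func main() {
	root := &fileNode{children: []*fileNode{
		{data: []byte("hello ")},
		{children: []*fileNode{{data: []byte("uni")}, {data: []byte("xfs")}}},
	}}
	fmt.Println(string(readFile(root, nil))) // hello unixfs
}
```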
Metadata support in UnixFSv1.5 has been expanded to increase the number of possible use cases. These include rsync and filesystem based package managers.
Several metadata systems were evaluated:
In this scheme, the existing `Metadata` message is expanded to include additional metadata types (`mtime`, `mode`, etc). It then contains links to the actual file data but never the file data itself.
This was ultimately rejected for a number of reasons:
- You would always need to retrieve an additional node to access file data which limits the kind of optimizations that are possible. For example many files are under the 256KiB block size limit, so we tend to inline them into the describing UnixFS `File` node. This would not be possible with an intermediate `Metadata` node.
- The `File` node already contains some metadata (e.g. the file size) so metadata would be stored in multiple places, which complicates forwards compatibility with UnixFSv2, as mapping between metadata formats potentially requires multiple fetch operations.
Repeated `Metadata` messages are added to UnixFS `Directory` and `HAMTShard` nodes, the index of which indicates which entry they are to be applied to.

Where entries are `HAMTShard`s, an empty message is added.

One advantage of this method is that if we expand stored metadata to include entry types and sizes we can perform directory listings without needing to fetch further entry nodes (excepting `HAMTShard` nodes). However, without removing the storage of these datums elsewhere in the spec, we run the risk of having non-canonical data locations and perhaps conflicting data as we traverse trees containing both UnixFS v1 and v1.5 nodes.
This was rejected for the following reasons:
- When creating a UnixFS node there's no way to record metadata without wrapping it in a directory.
- If you access any UnixFS node directly by its CID, there is no way of recreating the metadata which limits flexibility.
- In order to list the contents of a directory including entry types and sizes, you have to fetch the root node of each entry anyway, so the performance benefit of including some metadata in the containing directory is negligible in this use case.
This adds new fields to the UnixFS `Data` message to represent the various metadata fields.

It has the advantage of being simple to implement. Metadata is maintained whether the file is accessed directly via its CID or via an IPFS path that includes a containing directory. And by keeping the metadata small enough, we can inline root UnixFS nodes into their CIDs, so we end up fetching the same number of nodes even if we decide to keep file data in a leaf node for deduplication reasons.
Downsides to this approach are:
- Two users adding the same file to IPFS at different times will have different CIDs due to the `mtime`s being different. If the content is stored in another node, its CID will be constant between the two users, but you can't navigate to it unless you have the parent node, which will be less available due to the proliferation of CIDs.
- Metadata is also impossible to remove without changing the CID, so metadata becomes part of the content.
- Performance may be impacted as well: if we don't inline UnixFS root nodes into CIDs, additional fetches will be required to load a given UnixFS entry.
With this approach we would maintain a separate data structure outside of the UnixFS tree to hold metadata.
This was rejected due to concerns about added complexity, recovery after system crashes while writing, and having to make extra requests to fetch metadata nodes when resolving CIDs from peers.
This scheme would see metadata stored in an external database.
The downsides to this are that metadata would not be transferred from one node to another when syncing as Bitswap is not aware of the database, and in-tree metadata
The integer portion of UnixTime is represented on the wire using a varint encoding. While this is inefficient for negative values, it avoids introducing zig-zag encoding. Values before the year 1970 will be exceedingly rare, and it would be handy having such cases stand out, while at the same time keeping the "usual" positive values easy to eyeball. The varint representing the time of writing this text is 5 bytes long. It will remain so until October 26, 3058 ( 34,359,738,367 ).
Fractional values are effectively a random number in the range 1 ~ 999,999,999. Such values will exceed 2^28 nanoseconds ( 268,435,456 ) in most cases. Therefore, the fractional part is represented as a 4-byte `fixed32`, as per Google's recommendation.
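Go's `encoding/binary` package implements the same unsigned varint format, so the byte lengths above can be double-checked (a sketch; `uvarintLen` is a hypothetical helper):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// uvarintLen returns the number of bytes the varint encoding of v occupies.
func uvarintLen(v uint64) int {
	buf := make([]byte, binary.MaxVarintLen64)
	return binary.PutUvarint(buf, v)
}

func main() {
	fmt.Println(uvarintLen(1600000000))  // 5: a recent unix timestamp
	fmt.Println(uvarintLen(34359738367)) // 5: 2^35-1, the last 5-byte value
	fmt.Println(uvarintLen(34359738368)) // 6: one second later needs 6 bytes
}
```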