feat: assign codes for MIME types #159
base: master
Conversation
Force-pushed from 2542896 to 669fea4.
The import script is not future-proof. If I understand it correctly, the script parses the XML file and assigns numbers to the codecs based on the order in the XML file. They are sorted alphabetically in the XML file, so if a new MIME type is added and you re-run the script, you would end up with different codes.

Some `<record>`s have a `date` attribute; it looks like it got introduced in 2014. Hence I propose sorting the records by the `date` attribute first (the ones without a date first) and, if they have the same date, alphabetically by `<file>`. This way we hopefully end up with a future-proof, reproducible assignment of codes.
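The proposed date-then-name ordering could be sketched roughly like this (the XML snippet and tag names are illustrative stand-ins for the IANA registry format, not the real file):

```python
import xml.etree.ElementTree as ET

# Illustrative stand-in for the IANA media-types registry XML.
SAMPLE = """<registry>
  <record><file>application/json</file></record>
  <record date="2014-06-03"><file>application/cbor</file></record>
  <record date="2014-06-03"><file>application/calendar+json</file></record>
</registry>"""

def sort_key(record):
    # Records without a date attribute sort first (empty string);
    # ties are broken alphabetically by the <file> element.
    return (record.get("date", ""), record.findtext("file", default=""))

records = sorted(ET.fromstring(SAMPLE).iter("record"), key=sort_key)
names = [r.findtext("file") for r in records]
```

Re-running this on a registry that gained new entries leaves the relative order of older records unchanged.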
I'd also like to see more explanatory text about why we include MIME types and why multicodecs are not MIME types. If I understand things correctly, in an ideal multicodec-only world all those `+json` MIME types would just be multicodec JSON `0x0200`.
I think that we should factor out consumption of the entire multiformat table in our multiformat clients before we drastically increase the size of the table. The parse time and table size are already noticeable in our bundle size, and this would make that much worse. My older HTTP clients include a full MIME database and it makes them unsuitable for browser bundles. Luckily, we already have a tentative plan to move to using integer references that won't require the full table in code.
Unless there's a bug, it first loads the already-assigned numbers. Then, for all new mime types, it assigns increasing (unique) codes.
Sounds like a good starting point. I'll keep the current "don't change things" logic, but having a stable conversion would be nice.

@mikeal Hm. Fair point. Even compressed, the table is going to grow from 3K to 22K, or 26K to 260K uncompressed. I'm fine leaving this in a PR for now. I submitted it because we kept getting requests to do something like this.
Oh, I missed that when I read the code. Though I'm still in favour of having a stable conversion that can be run at any time and results in the same output.
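The "don't change things" behavior described above (load already-assigned numbers, then hand out fresh codes only to new names) could look roughly like this; `assign_codes` and the starting offset are hypothetical names of mine, not taken from the actual script:

```python
def assign_codes(existing, names, start):
    # existing: name -> code pairs already in the table; never changed.
    codes = dict(existing)
    # Continue numbering after the highest code handed out so far.
    next_code = max(codes.values(), default=start - 1) + 1
    for name in sorted(names):
        if name not in codes:
            codes[name] = next_code
            next_code += 1
    return codes
```

Re-running it with the same inputs yields the same table; only genuinely new names receive fresh codes.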
table.csv (Outdated)

libp2p-peer-record, libp2p, 0x0301, libp2p peer record type
x11, multihash, 0x1100,
sm3-256, multihash, 0x00534d,
blake2b-8, multihash, 0x00b201, Blake2b consists of 64 output lengths that give different hashes
The hashes are prefixed with an additional 0x00. Is that intentional or a bug (the MIME types as well)?
Yes. I'm trying to make it clear how many bytes are in each code. Every two hex digits is a single byte.
Sorry, I don't get it. What do you mean by "how many bytes"? Do you mean how many bytes it takes when encoded as a varint?
Yes.
Shouldn't values over 0x80 then also have a 0x00 prefix?
Hm. Yes.
Fixed.
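If I follow the convention settled on above, the hex padding mirrors the varint length of the code; a rough sketch (function names are mine):

```python
def varint_len(n):
    # Bytes needed to encode n as an unsigned base-128 varint (7 bits per byte).
    length = 1
    while n >= 0x80:
        n >>= 7
        length += 1
    return length

def format_code(n):
    # Zero-pad the hex form so each two-digit group corresponds to one varint byte.
    return "0x" + n.to_bytes(varint_len(n), "big").hex()
```

So 0x534d (two raw bytes, but three varint bytes) renders as 0x00534d, and values in the 0x80–0xFF range get a 0x0080-style prefix.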
Force-pushed from 669fea4 to 0e370e5.
I've put making this stable on my todo list. I don't know when I will find the time to do it. When it becomes urgent for this to be merged, please let me know and I'll prioritize it.
A downside of a 4-byte range is that it makes a base32 sha256 CID 64 characters, which doesn't fit in a DNS segment. A 3-byte range would work, and with a single range instead of sub-ranges for each major type it wouldn't reserve too much of that space.
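The length arithmetic can be checked with a quick sketch, assuming CIDv1 with a sha2-256 multihash (1 version byte, the codec varint, a 2-byte multihash header, a 32-byte digest) and counting unpadded base32 characters without the multibase prefix:

```python
import base64

def cid_base32_len(codec_varint_bytes):
    # CIDv1 byte layout: version (1) + codec varint + multihash header (2) + digest (32).
    n = 1 + codec_varint_bytes + 2 + 32
    # Unpadded base32 length; a DNS label tops out at 63 characters.
    return len(base64.b32encode(b"\x00" * n).rstrip(b"="))
```

A 4-byte code can need a 5-byte varint, giving 64 characters; a code that fits in 3 varint bytes stays at 61.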
See GH/multiformats#158. Changes are analogous to those proposed in GH/multiformats#159.
What's the status of this? I'm hoping to use CIDs to refer to image types soon.
I have a suggestion, perhaps instead of one big …

Also I noticed that …

Another question: adding in MIME types overlaps with existing codecs in table.csv. For example, how do we compare …
Looks like this effort has been stalled for a while, mostly due to concerns around the drastic increase in table size? The README of the project describes a first-come, first-served policy for adding new codecs, and I wonder if we could apply that here as well with MIME types. I.e. maybe we can start with a small handful of the most commonly used MIME types on the internet today (say, this list), and then add more over time based on demand, instead of dumping in all known MIME types at once. Is there some particular need for all the MIME types to be in a contiguous block that I'm not aware of?
And add a script to automate this.

fixes #4

Questions:
- mime/ as in feat: adding MIME types as codecs #84.