What is HEIF/HEIC magic number? #83

Closed
Jehan opened this issue Sep 14, 2018 · 10 comments

Comments

@Jehan

Jehan commented Sep 14, 2018

Hi!

At GIMP we had a report about a file wrongly interpreted as HEIC because it had the wrong extension. Normally we can avoid this kind of error by adding magic number detection.

I couldn't find clear details about what can be considered the HEIC/HEIF magic number. Looking at a few sample images, I found that the string "ftyp" at the 5th byte seems to be a constant in every HEIC/HEIF file. Yet since I understand that HEIC/HEIF is a subformat of ISOBMFF, it looks like this magic number would also detect non-HEIC files which are ISOBMFF, right? At least it also matched a .mov file (unfortunately!).

In the various HEIC files I could find, the longer string was either ftypmif1 or ftypheic. Would both these strings qualify as good magic numbers? Can there be other variants?
All the files in my possession also had the string mif1heic at the 16th byte. Is that also a confirmed constant? Can I use it as a magic keyword instead?

Thanks! :-)

@farindk
Contributor

farindk commented Sep 14, 2018

Hi Jehan,

the following four-cc codes can appear after the 'ftyp':

  • 'heic': the usual HEIF images
  • 'heix': 10bit images, or anything that uses h265 with range extension
  • 'hevc', 'hevx': brands for image sequences
  • 'heim': multiview
  • 'heis': scalable
  • 'hevm': multiview sequence
  • 'hevs': scalable sequence

I have not yet seen any files other than 'heic', and, hence, other types are not supported by libheif yet. I may add support when there are example images available for them.

'mif1' should not be a valid major brand, I think. It should, however, be listed in the compatible brands, which is why you saw it at byte 16. Its position is not fixed, though: byte 16 is where the list of compatible brands starts, and I am not even 100% sure that the list always starts at byte 16.

I think your best bet would be to check for 'ftypheic' (and maybe also the other brands listed above). Note that this is not a string. It is not null-terminated. You have to check for these 8 bytes.
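
As a rough illustration of that 8-byte check, here is a minimal Python sketch. The function name `looks_like_heif` and the constant `HEIF_BRANDS` are made up for this example and are not part of libheif or GIMP. The brand set is simply the list given in this comment, and the code assumes the ftyp box header uses the ordinary 32-bit size, so that 'ftyp' sits at byte offset 4.

```python
# Brands listed in this comment; an assumption for the sketch, not a complete registry.
HEIF_BRANDS = {
    b"heic", b"heix", b"hevc", b"hevx",
    b"heim", b"heis", b"hevm", b"hevs",
}


def looks_like_heif(header: bytes) -> bool:
    """Return True if the header carries 'ftyp' followed by a listed major brand."""
    if len(header) < 12:
        return False
    # Bytes 0-3: box size, bytes 4-7: box type, bytes 8-11: major brand.
    # The comparison is on raw bytes; nothing here is null-terminated.
    return header[4:8] == b"ftyp" and header[8:12] in HEIF_BRANDS
```

Only the first 12 bytes of the file are needed, for example `looks_like_heif(open(path, "rb").read(12))`.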

Dirk

@Jehan
Author

Jehan commented Sep 14, 2018

> 'mif1' should not be a valid major brand

So the major brand is the 4-byte code right after 'ftyp'? Then 'ftypmif1' files should not be valid HEIC files, according to you?
I actually have a few samples on my disk with this sequence, and they open fine (as far as I can see) in GIMP, which uses libheif. I have no idea where I got them though (probably somewhere on the web, when searching for test samples).
For instance (I appended a .txt so that github accepts it; obviously remove this extension):
season_collection_1440x960.heic.txt

I would be a bit sad to reject these files if they work.

> Note that this is not a string. It is not null-terminated.

Yeah, I was using "string" in the sense of "sequence of bytes". Our code does not assume magic numbers to be null-terminated (or even readable text). ;-)

@farindk
Contributor

farindk commented Sep 16, 2018

I have attached the relevant parts of the MP4 and HEIF standards below.
The file starts with a box header, which is its size (including the box header) and its type ('ftyp').
The header can be tricky to read if one wants to do it 100% correctly (largesize), but I've never seen an ftyp box that uses 'largesize'. So you can probably assume a fixed byte offset for ftyp.

This is followed by the major brand and a version, then a list of compatible brands. The number of compatible brands in the list must be derived from the box size. As far as I understand, 'heic' should be the major brand, or at least appear in the compatible brands list. The Nokia example you attached (season collection) puts 'mif1' into the major brand, which seems unusual, but might still conform to the standard as 'heic' appears in the list of compatible brands.

Would it help if libheif provided a method for quickly checking the file-type? That might be useful for other projects too, I guess.


Syntax of box header:
screenshot from 2018-09-16 14-48-58

Syntax of ftyp box:
screenshot from 2018-09-16 14-49-18
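
For readers without the screenshots, the layout described in this comment can be sketched in Python as follows. This is an illustrative reading of the description (32-bit size, 4-byte type, optional 64-bit largesize when the size field is 1, then major brand, 32-bit version, and as many 4-byte compatible brands as fit in the box), not code taken from libheif. The names `FtypBox` and `parse_ftyp` are hypothetical, and a size of 0 is rejected here for simplicity.

```python
import struct
from typing import List, NamedTuple


class FtypBox(NamedTuple):
    major_brand: bytes
    minor_version: int
    compatible_brands: List[bytes]


def parse_ftyp(data: bytes) -> FtypBox:
    """Parse an ftyp box from the start of `data` (hypothetical helper)."""
    if len(data) < 16:
        raise ValueError("too short for an ftyp box")
    # Box header: big-endian 32-bit size (header included), then 4-byte type.
    size, box_type = struct.unpack(">I4s", data[:8])
    if box_type != b"ftyp":
        raise ValueError("first box is not 'ftyp'")
    offset = 8
    if size == 1:
        # A size of 1 means a 64-bit 'largesize' follows the type field.
        (size,) = struct.unpack(">Q", data[8:16])
        offset = 16
    elif size == 0:
        raise ValueError("size 0 is not handled in this sketch")
    if size < offset + 8 or size > len(data):
        raise ValueError("ftyp box is truncated or has an invalid size")
    major_brand, minor_version = struct.unpack(">4sI", data[offset:offset + 8])
    # The number of compatible brands is not stored; it follows from the box size.
    start = offset + 8
    brands = [data[i:i + 4] for i in range(start, size - 3, 4)]
    return FtypBox(major_brand, minor_version, brands)
```

With the common 32-bit size, the major brand lands at byte offset 8 and the compatible brands list at offset 16; with a largesize, both shift by 8 bytes.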

@Jehan
Author

Jehan commented Sep 16, 2018

> I have attached the relevant parts of the MP4 and HEIF standards below.

Thanks! Looking at this, it confirms that the compatible brands list starts at a fixed offset (unless the box uses a largesize, but as you note, that probably almost never happens), but it doesn't specify any order for the brands. So I imagine the fact that all the files in my possession had mif1heic in the list cannot be considered a constant, just chance.

> Would it help if libheif provided a method for quickly checking the file-type? That might be useful for other projects too, I guess.

Not really. Or to be accurate, we are already checking the file type before we even get to the file plug-in. Basically, every file plug-in registers itself with file name extensions and/or magic bytes (the magic takes precedence over the extension of course, as it is much more reliable).
This algorithm has the merit of being both simple and efficient.

Anyway I think I will just use ftyp(heic|heix|…|mif1) as magic (basically "ftyp" + the list you gave me including "mif1" since I actually encountered this one). I think this should be both generic and accurate enough to detect a HEIF file.
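
To make the "magic takes precedence over the extension" idea concrete, here is a small Python sketch of such a lookup. It is not GIMP's actual code. `Handler`, `HANDLERS` and `detect` are hypothetical names, and the second handler (matching the 'qt  ' brand at offset 4) is only there to show a mislabelled .mov-style file being routed away from HEIF.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Handler(NamedTuple):
    name: str
    extensions: Tuple[str, ...]
    magics: Tuple[Tuple[int, bytes], ...]  # (byte offset, expected bytes)


# "ftyp" plus each brand from the list above, including "mif1".
HEIF_BRANDS = (b"heic", b"heix", b"hevc", b"hevx",
               b"heim", b"heis", b"hevm", b"hevs", b"mif1")

HANDLERS = (
    Handler("heif", ("heic", "heif"),
            tuple((4, b"ftyp" + brand) for brand in HEIF_BRANDS)),
    # Illustrative second handler, an assumption made for this example.
    Handler("quicktime", ("mov",), ((4, b"ftypqt  "),)),
)


def detect(filename: str, header: bytes,
           handlers: Sequence[Handler] = HANDLERS) -> Optional[str]:
    """Return the name of the matching handler, trying magic bytes first."""
    for handler in handlers:
        for offset, magic in handler.magics:
            if header[offset:offset + len(magic)] == magic:
                return handler.name
    # No magic matched: fall back to the file name extension.
    lowered = filename.lower()
    for handler in handlers:
        if any(lowered.endswith("." + ext) for ext in handler.extensions):
            return handler.name
    return None
```

In this sketch a file named `photo.heic` whose header starts with the 'qt  ' brand is reported as "quicktime", which is the kind of mislabelled file from the original report.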

Thanks for all the help. Unless you want to leave this open (for a file type detection API or anything), I guess you can close.

gnomesysadmins pushed a commit to GNOME/gimp that referenced this issue Sep 16, 2018
Just looking for "ftyp" would also match other ISOBMFF files (.mov or
.mp4 files for instance). These are the possible 4-byte "brand" code
which can follow "ftyp", as listed by Dirk Farin from libheif.

I add the "mif1" brand, as I encountered some files using this magic
(even though this should normally not be valid apparently, yet the file
loaded fine in GIMP).

This is not perfect as the standard allows potentially very big box
headers, in which case 8 bytes (the "largesize" slot) may be inserted
between "ftyp" and the brand, as I understand it. But this is actually
unlikely enough to probably never happen (the compatible brands list
would have to be huuuge, as it looks like this is the only extendable
part in a ftyp box). So let's assume this just never happens.

See also: strukturag/libheif#83
@Jehan
Author

Jehan commented Sep 16, 2018

@farindk
Contributor

farindk commented Sep 16, 2018

I had another look into the specs. The brand 'mif1' seems to be actually allowed as the major brand. In that case, the file extension and MIME type defined in the file (requires deep parsing) should be considered.
Some parts of the spec seem to be much more complicated than necessary to me...

And btw. you might also check for 'msf1', which is the equivalent case for image sequences.
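
One way to handle the 'mif1'/'msf1' case without the deep parsing mentioned above is to fall back on the compatible brands list, following the earlier remark that 'heic' should be the major brand or at least appear among the compatible brands. The Python sketch below does exactly that; it deliberately substitutes the compatible-brands check for the file extension and MIME type lookup, and the function and constant names are hypothetical.

```python
from typing import Iterable

# Brand groupings as listed earlier in this thread.
HEVC_IMAGE_BRANDS = {b"heic", b"heix", b"heim", b"heis"}
HEVC_SEQUENCE_BRANDS = {b"hevc", b"hevx", b"hevm", b"hevs"}
STRUCTURAL_BRANDS = {b"mif1", b"msf1"}  # generic image / image sequence brands


def is_probably_hevc_heif(major: bytes, compatible: Iterable[bytes]) -> bool:
    """Guess whether an already-parsed ftyp box describes an HEVC-coded HEIF file."""
    hevc_brands = HEVC_IMAGE_BRANDS | HEVC_SEQUENCE_BRANDS
    if major in hevc_brands:
        return True
    if major in STRUCTURAL_BRANDS:
        # 'mif1'/'msf1' alone does not name a codec, so look for an HEVC
        # brand among the compatible brands instead.
        return any(brand in hevc_brands for brand in compatible)
    return False
```

For the Nokia sample discussed above (major brand 'mif1', with 'heic' among the compatible brands) this returns True.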

@Jehan
Author

Jehan commented Sep 16, 2018

Thanks. I'm updating the magic for HEIF.

gnomesysadmins pushed a commit to GNOME/gimp that referenced this issue Sep 16, 2018
(cherry picked from commit 4ad3993)
@fancycode
Member

@farindk maybe it would be a good idea to provide a function in libheif to check if some data looks like a heif file?

@farindk
Contributor

farindk commented Sep 17, 2018

Yes, that's what I proposed a few posts above. As Jehan said, it will not work for GIMP, because they'd like to check the format before even accessing the plugin, but we should still add such a function.

@farindk
Contributor

farindk commented Nov 21, 2018

merged the file-type detection into master
