Add proposal "Format API" #127

Merged
merged 7 commits into from
May 5, 2020

@mickael-menu mickael-menu commented Mar 23, 2020

This proposal introduces a dedicated API for determining a file's format.

While a Publication is independent of any particular format, knowing the format of a publication file is necessary to:

  • determine the publication parser to use,
  • group or search publications by file type in the user's bookshelf.

This API is not tied to Publication, so it can be used as a general purpose tool to guess a file format, e.g. during HTTP requests or in the LCP library.

You can read the formatted proposal here.

@mickael-menu mickael-menu changed the title Initial proposal for the File and Format API Add proposal "File and Format API" Mar 30, 2020
@mickael-menu mickael-menu changed the title Add proposal "File and Format API" Add proposal "Format API" Apr 1, 2020
@mickael-menu

@danielweck @jccr We would like to merge this proposal during the next Readium dev call, so any insights you may have for desktop and web are most welcome 🙏

mickael-menu commented Apr 24, 2020

I'd like to move forward, so I intend to merge this proposal after next week's meeting if there are no counter-arguments by then.

I made a few changes in the proposal after implementing it in Swift:

  • Renamed Format.guess to Format.of, because the caller expects an accurate Format to be returned.
  • Added sniffing of bitmap formats, because we need them in the CBZ parser.
  • Removed the inspectingContent parameter to simplify the sniffers. Instead, Format.of() iterates through the sniffers twice: first with a context containing only file extensions and media types, then with a context containing the content, if there is any.
  • Grouped the sniffing of several formats into a single sniffer when the logic is shared (e.g. all the Readium WebPub formats).
  • Added a few more APIs:
    • MediaType: encoding, structuredSyntaxSuffix, isZIP, isJSON, isRWPM, isPartOf()
    • Link: mediaType
    • Format.SnifferContext: encoding, contentAsRWPM and readFileSignature() to sniff magic numbers.
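
To illustrate the two-pass sniffing described above, here is a minimal Python sketch. It is not the Swift implementation, and it does not reproduce the proposal's exact signatures: `format_of`, `sniff_bitmap`, `has_media_type`, `has_file_extension` and the `SnifferContext` fields are hypothetical names chosen for this example, loosely mirroring `Format.of`, `Format.SnifferContext` and `readFileSignature()`. It assumes a sniffer is a function that receives a context and returns a format or nothing.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Format:
    name: str
    media_type: str
    file_extension: str


@dataclass
class SnifferContext:
    """Hints given to a sniffer (hypothetical shape). `content` is None in the light pass."""
    media_types: Sequence[str] = ()
    file_extensions: Sequence[str] = ()
    content: Optional[bytes] = None

    def has_media_type(self, *candidates: str) -> bool:
        return any(m.lower() in candidates for m in self.media_types)

    def has_file_extension(self, *candidates: str) -> bool:
        return any(e.lower() in candidates for e in self.file_extensions)

    def read_file_signature(self, length: int) -> Optional[bytes]:
        """Return the first `length` bytes (the magic number), if content is available."""
        if self.content is None or len(self.content) < length:
            return None
        return self.content[:length]


Sniffer = Callable[[SnifferContext], Optional[Format]]

PNG = Format("PNG", "image/png", "png")
JPEG = Format("JPEG", "image/jpeg", "jpg")


def sniff_bitmap(context: SnifferContext) -> Optional[Format]:
    """One shared sniffer covering several bitmap formats."""
    # Light checks: media types and file extensions only.
    if context.has_media_type("image/png") or context.has_file_extension("png"):
        return PNG
    if context.has_media_type("image/jpeg") or context.has_file_extension("jpg", "jpeg"):
        return JPEG
    # Heavy checks: these can only match when the context carries content.
    if context.read_file_signature(8) == b"\x89PNG\r\n\x1a\n":
        return PNG
    if context.read_file_signature(3) == b"\xff\xd8\xff":
        return JPEG
    return None


def format_of(
    content: Optional[bytes] = None,
    media_types: Sequence[str] = (),
    file_extensions: Sequence[str] = (),
    sniffers: Sequence[Sniffer] = (sniff_bitmap,),
) -> Optional[Format]:
    # First pass: the context holds only media types and file extensions.
    light = SnifferContext(media_types, file_extensions)
    for sniffer in sniffers:
        fmt = sniffer(light)
        if fmt is not None:
            return fmt
    # Second pass: the context also exposes the content, if there is any.
    if content is not None:
        heavy = SnifferContext(media_types, file_extensions, content)
        for sniffer in sniffers:
            fmt = sniffer(heavy)
            if fmt is not None:
                return fmt
    return None
```

In this sketch, `format_of(file_extensions=["png"])` resolves in the first pass without touching any content, while `format_of(content=data)` with no usable hints falls through to the second pass and matches on the file signature. Running the cheap pass across all sniffers first means content is only inspected when the hints were not enough.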

Change the ZAB media type
Fix EPUB heavy sniffing
Add the x. facet for ZAB and W3C WPUB
Renamed Format.guess into Format.of
Added bitmap formats, used for the CBZ parser
Removed `inspectingContent` parameter to simplify the sniffers
Grouped formats in shared sniffers
Add `Link.mediaType` helper
Add `Format.SnifferContext` `encoding`, `contentAsRWPM` and `readFileSignature()`
Add MediaType's isAudio and isLCPProtected
Add OPDS Authentication Document media type and format
Sniff an Audiobook using the reading order types
@mickael-menu mickael-menu merged commit 7b8a1b6 into readium:master May 5, 2020