Rewrite Epub parsing with the new XmlParser implementation and the new Publication model #89
Elements and attributes are always read from the correct namespace. Epub 3.x Property Data Type default vocabularies and the prefix mechanism are carefully implemented. Because data are not organized the same way in an Epub Package Document as in a Readium Publication, an Epub is first parsed completely into a dedicated intermediate model, which is then converted into a Publication. This makes the code clearer and should make further feature additions easier.
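As a rough illustration of the prefix mechanism (not the PR's actual code): a property value such as `dcterms:modified` is expanded using either an author-declared prefix from the package's `prefix` attribute or a reserved prefix, while an unprefixed value is resolved against the default vocabulary of the attribute it appears in. The function names, the partial reserved-prefix table, and the choice to let declared prefixes take precedence are assumptions made for this sketch.

```kotlin
// Sketch: resolving EPUB 3 property values of the form "[prefix:]reference" to full IRIs.
// Assumption: this reserved-prefix table is partial; check it against the EPUB 3 spec.
val RESERVED_PREFIXES: Map<String, String> = mapOf(
    "dcterms" to "http://purl.org/dc/terms/",
    "media" to "http://www.idpf.org/epub/vocab/overlays/#",
    "rendition" to "http://www.idpf.org/vocab/rendition/#",
    "schema" to "http://schema.org/",
    "xsd" to "http://www.w3.org/2001/XMLSchema#"
)

/** Parses a `prefix` attribute such as "foaf: http://xmlns.com/foaf/spec/ dbp: http://dbpedia.org/ontology/". */
fun parsePrefixAttribute(attr: String): Map<String, String> =
    Regex("""(\S+):\s+(\S+)""").findAll(attr)
        .associate { it.groupValues[1] to it.groupValues[2] }

/**
 * Resolves a property value to an IRI (hypothetical helper):
 * - "prefix:reference" -> prefix IRI + reference (declared prefixes are checked before reserved ones)
 * - "reference"        -> defaultVocab + reference (the default vocabulary depends on the attribute)
 * Returns null when the prefix is unknown.
 */
fun resolveProperty(
    property: String,
    declaredPrefixes: Map<String, String>,
    defaultVocab: String
): String? {
    val colon = property.indexOf(':')
    if (colon < 0) return defaultVocab + property
    val prefix = property.substring(0, colon)
    val reference = property.substring(colon + 1)
    val base = declaredPrefixes[prefix] ?: RESERVED_PREFIXES[prefix] ?: return null
    return base + reference
}
```

Passing the default vocabulary in, rather than hard-coding one, reflects that `meta/@property`, `link/@rel`, `item/@properties` and `itemref/@properties` each have their own default vocabulary.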
After SMIL parsing, media overlays are stored in the mediaOverlays property of a dedicated link.
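For context: each SMIL `<par>` pairs a text fragment with an audio clip whose `clipBegin`/`clipEnd` attributes are SMIL clock values. The sketch below, which is not the PR's code, parses those clock values and collects the pairs into a simplified `MediaOverlays` object held by a link's `mediaOverlays` property. `MediaOverlayNode`, `MediaOverlays`, `Link`, `parseClockValue` and `parNode` are hypothetical stand-ins, and encoding the clip as a `#t=begin,end` media fragment is an assumption.

```kotlin
// Hypothetical, simplified stand-ins for the real models.
data class MediaOverlayNode(val text: String, val audio: String)
data class MediaOverlays(val nodes: List<MediaOverlayNode>)
data class Link(val href: String, val mediaOverlays: MediaOverlays? = null)

/** Parses a SMIL clock value ("1:02:03.5", "02:03.5", "3.5s", "1500ms", "2min", "1h", "4") into seconds. */
fun parseClockValue(value: String): Double? {
    val v = value.trim()
    if (':' in v) {
        // Full (hh:mm:ss) or partial (mm:ss) clock value.
        val parts = v.split(':')
        if (parts.size !in 2..3) return null
        val nums = parts.map { it.toDoubleOrNull() ?: return null }
        return nums.fold(0.0) { acc, n -> acc * 60 + n }
    }
    // Timecount value: a number with an optional metric (seconds by default).
    val match = Regex("""(\d+(?:\.\d+)?)(h|min|s|ms)?""").matchEntire(v) ?: return null
    val n = match.groupValues[1].toDouble()
    return when (match.groupValues[2]) {
        "h" -> n * 3600
        "min" -> n * 60
        "ms" -> n / 1000
        else -> n // "s" or no metric
    }
}

/** Builds a node from a SMIL <par>: text src plus audio src/clipBegin/clipEnd. */
fun parNode(textSrc: String, audioSrc: String, clipBegin: String, clipEnd: String): MediaOverlayNode? {
    val begin = parseClockValue(clipBegin) ?: return null
    val end = parseClockValue(clipEnd) ?: return null
    // Assumption: the clip is encoded as a Media Fragments "#t=begin,end" suffix.
    return MediaOverlayNode(text = textSrc, audio = "$audioSrc#t=$begin,$end")
}
```

For example, `Link(href = "chapter1.smil", mediaOverlays = MediaOverlays(listOfNotNull(parNode("chapter1.xhtml#p1", "chapter1.mp3", "0:00:01.5", "3s"))))` produces the kind of dedicated link described above (the file names are placeholders). The clock parser is deliberately lenient and does not enforce two-digit minute and second fields.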
Hi @qnga, very interesting proposal indeed, but it is quite a large PR (or set of PRs), which will take some time to review. It would be good if you could join the weekly Readium call to explain the rationale for this evolution and give details about its risks. Please contact me by PM if you need info on how to join.
Great job on the namespaces and vocabularies! Since it's quite a large PR, maybe it would be worth adding a unit test suite for the parser? We can reuse the test cases from Swift: https://github.com/readium/r2-streamer-swift/tree/develop/r2-streamer-swiftTests/Parser/EPUB
…ck metadata language
…n readingOrder and resources
…bout MultilanguageString
… result is percent-decoded
…ors to a dedicated function
Move Metadata mapping to a separate file
Map all unused MetaItems to otherMetadata
…s into otherMetadata
MetadataParser now outputs only MetaItem elements, which are used in MetadataMapping
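The two-step design (MetadataParser emits flat MetaItem elements, MetadataMapping turns them into Publication metadata, and whatever is left over goes to otherMetadata) can be illustrated with the simplified sketch below. The `MetaItem` fields, `SimpleMetadata`, `mapMetadata` and the choice of mapped properties are assumptions for illustration only; the PR's actual classes are richer and also handle refinements, which this sketch ignores.

```kotlin
// Hypothetical, simplified shapes; the PR's real MetaItem and Metadata classes differ.
data class MetaItem(
    val property: String, // assumed already resolved to a full IRI
    val value: String,
    val id: String? = null,
    val refines: String? = null
)

data class SimpleMetadata(
    val title: String?,
    val languages: List<String>,
    val otherMetadata: Map<String, List<String>>
)

val DCTERMS = "http://purl.org/dc/terms/"

fun mapMetadata(items: List<MetaItem>): SimpleMetadata {
    // Refinements (items with `refines`) are dropped in this sketch; only top-level items are mapped.
    val topLevel = items.filter { it.refines == null }
    val consumed = mutableSetOf<MetaItem>()

    // Returns the values of one property and marks those items as used.
    fun take(property: String): List<String> =
        topLevel.filter { it.property == property }.also { consumed += it }.map { it.value }

    val title = take(DCTERMS + "title").firstOrNull()
    val languages = take(DCTERMS + "language")

    // Every top-level item not mapped above ends up in otherMetadata, keyed by property IRI.
    val other = topLevel.filterNot { it in consumed }
        .groupBy({ it.property }, { it.value })

    return SimpleMetadata(title, languages, other)
}
```

Keeping parsing and mapping separate means the parser never has to know which properties the Publication model supports; adding a new mapped property only touches the mapping step.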
@qnga Great job, it's well-structured 👍
I made a few cosmetic changes that you can review and merge (branch review/pull/89), and left some comments on the PR that we can tackle together.
r2-streamer/src/main/java/org/readium/r2/streamer/fetcher/ContentFilter.kt
r2-streamer/src/main/java/org/readium/r2/streamer/parser/epub/Constants.kt
r2-streamer/src/main/java/org/readium/r2/streamer/parser/epub/MetadataModel.kt
r2-streamer/src/main/java/org/readium/r2/streamer/parser/epub/EpubModel.kt
r2-streamer/src/main/java/org/readium/r2/streamer/parser/epub/EpubAdapter.kt
r2-streamer/src/main/java/org/readium/r2/streamer/parser/epub/Metadata.kt
… and use it in parsing of Epubs
Fixes readium/kotlin-toolkit#203 Fixes readium/kotlin-toolkit#202 Fixes readium/r2-lcp-kotlin#50
UPDATE: This PR now includes the adaptation of the streamer to the new Publication model introduced by @mickael-menu in Shared PR 88.