diff --git a/design/history/exploration-reports/2020.09-learning-ipld.md b/design/history/exploration-reports/2020.09-learning-ipld.md new file mode 100644 index 00000000..0bf4e9e4 --- /dev/null +++ b/design/history/exploration-reports/2020.09-learning-ipld.md @@ -0,0 +1,98 @@ +# Journey learning the IPLD stack + +Author: Daniel Martí (@mvdan) + +It was suggested that I should capture my perspective as I get up to speed on +the IPLD stack, so that we can possibly identify shortcomings with the current +material, or topics which can cause problems. + +### Previous knowledge + +I should note that I was already familiar with hashing, Git, type systems and +data structures, encodings like JSON and Protobuf, and compatibility between +different programming languages. So the "block" and "data model" layers of IPLD +were relatively easy to understand. + +### First impression + +The docs are somewhat scattered and unfinished, which does make it a little +extra confusing to get started. In chronological order, I read: + +1) https://hackmd.io/LHTTmGSWSvem4Wz2h_a39g?both, Eric's "terse primer". Seems + to try to cover everything, though it does seem like quite a lot of + information to take in all at once. + +2) https://ipld.github.io/docs/, sourced at https://github.com/ipld/docs. Seems + aimed at getting started with tutorials in JS. I should note that I first + read this before Mikeal's "new intro" added on September 2nd 2020. + +3) https://github.com/ipld/specs, which seems to contain all the formal specs, + but also includes a pretty decent README. + +I think all three should probably be unified into two halves: + +* A high-level introduction to IPLD, max 3-4 pages. Probably extending Mikeal's + new intro with some material from Eric's primer? + +* The set of spec documents, with a README to classify and introduce each of the + groups or layers. I think the specs repo already does a decent job at this. + +### Concepts that confused me + +I already raised some of these as Slack threads or HackMD comments, but for the +sake of keeping record, I'm listing the most basic or important ones here. + +* It is said that a link is unlike a URL, since it is merely a hash of data + that doesn't statically say where to fetch the data from. So... how would one + ever actually fetch data via a link? + +* Out of the three layers (blocks, data model, schemas), Schemas have been by + far the hardest to wrap my head around. I think an introduction should contain + a very brief example, including how it actually looks like when mapped to the + data model and encoded into a block. + +The following are more such points, but focused around schemas, once I got to +that part of the spec: + +* My first read about ADLs left me very confused, in particular how they're + different than Schemas. I found the "Mapped to the Data Model" introduction to + ADLs much easier to understand, as it shows reasonable examples. + +* Why are multi-block data structures a separate definition in the spec, and not + just part of Schemas? + +* Are blocks generally filled with data nearly completely, or is it normal to + have them relatively empty? + +* Wouldn't removing the first byte from a very long multi-block List mean that + every block would need to be modified to shift all bytes forward by one? I + assume and hope not, but the spec doesn't really give pointers. + +* Since data in blocks is encoded from the data model, how would Iknow if a + particular data model value fits in a single block? What about a shema value? + +* Would IPLD be much different if the data model was an internal detail, and not + exposed to the user? I imagine that, most of the time, one would interact with + schemas and not the data model. + +* When docs say "the IPLD type system", is that in terms of Schemas, or the Data + Model, or both? Answer: The schema intro later says that "types" are for + schemas, and "kinds" for the data model. That should probably be sooner in the + schema spec. + +* For new IPLD team members in the future, it's probably best if their first + week is focused on the basics alone - blocks, hashing, linking, encodings, and + the data model. That's enough for some realistic demos using IPLD, and can be + learned in half a day, allowing the person to start contributing without + multiple full days of reading. I should clarify that Eric did give me a data + model issue to work on during my first week, but I never picked up on the "you + don't need to read about schemas for now" nudge. + +* The specs README hierarchy presents these three concepts in order: multi-block + collections, schemas, and ADLs. The docs do explain why schemas and ADLs are + different, but not why multi-block collections are also a separate thing. Eric + mentioned that multi-block collections are pretty much ADLs; so why are they + introduced before schemas? + +* The schema authoring guide talks about "component specifiers" and "component + specifications", but never seems to define them.