design: add an exploration report about learning IPLD (#294)

ipld · Sep 26, 2020 · ddc331d · ddc331d
1 parent d81caf2
commit ddc331d
Showing 1 changed file with 98 additions and 0 deletions.
diff --git a/design/history/exploration-reports/2020.09-learning-ipld.md b/design/history/exploration-reports/2020.09-learning-ipld.md
@@ -0,0 +1,98 @@
+# Journey learning the IPLD stack
+
+Author: Daniel Martí (@mvdan)
+
+It was suggested that I should capture my perspective as I get up to speed on
+the IPLD stack, so that we can possibly identify shortcomings with the current
+material, or topics which can cause problems.
+
+### Previous knowledge
+
+I should note that I was already familiar with hashing, Git, type systems and
+data structures, encodings like JSON and Protobuf, and compatibility between
+different programming languages. So the "block" and "data model" layers of IPLD
+were relatively easy to understand.
+
+### First impression
+
+The docs are somewhat scattered and unfinished, which does make it a little
+extra confusing to get started. In chronological order, I read:
+
+1) https://hackmd.io/LHTTmGSWSvem4Wz2h_a39g?both, Eric's "terse primer". Seems
+   to try to cover everything, though it does seem like quite a lot of
+   information to take in all at once.
+
+2) https://ipld.github.io/docs/, sourced at https://github.com/ipld/docs. Seems
+   aimed at getting started with tutorials in JS. I should note that I first
+   read this before Mikeal's "new intro" added on September 2nd 2020.
+
+3) https://github.com/ipld/specs, which seems to contain all the formal specs,
+   but also includes a pretty decent README.
+
+I think all three should probably be unified into two halves:
+
+* A high-level introduction to IPLD, max 3-4 pages. Probably extending Mikeal's
+  new intro with some material from Eric's primer?
+
+* The set of spec documents, with a README to classify and introduce each of the
+  groups or layers. I think the specs repo already does a decent job at this.
+
+### Concepts that confused me
+
+I already raised some of these as Slack threads or HackMD comments, but for the
+sake of keeping record, I'm listing the most basic or important ones here.
+
+* It is said that a link is unlike a URL, since it is merely a hash of data
+  that doesn't statically say where to fetch the data from. So... how would one
+  ever actually fetch data via a link?
+
+* Out of the three layers (blocks, data model, schemas), Schemas have been by
+  far the hardest to wrap my head around. I think an introduction should contain
+  a very brief example, including how it actually looks like when mapped to the
+  data model and encoded into a block.
+
+The following are more such points, but focused around schemas, once I got to
+that part of the spec:
+
+* My first read about ADLs left me very confused, in particular how they're
+  different than Schemas. I found the "Mapped to the Data Model" introduction to
+  ADLs much easier to understand, as it shows reasonable examples.
+
+* Why are multi-block data structures a separate definition in the spec, and not
+  just part of Schemas?
+
+* Are blocks generally filled with data nearly completely, or is it normal to
+  have them relatively empty?
+
+* Wouldn't removing the first byte from a very long multi-block List mean that
+  every block would need to be modified to shift all bytes forward by one? I
+  assume and hope not, but the spec doesn't really give pointers.
+
+* Since data in blocks is encoded from the data model, how would Iknow if a
+  particular data model value fits in a single block? What about a shema value?
+
+* Would IPLD be much different if the data model was an internal detail, and not
+  exposed to the user? I imagine that, most of the time, one would interact with
+  schemas and not the data model.
+
+* When docs say "the IPLD type system", is that in terms of Schemas, or the Data
+  Model, or both? Answer: The schema intro later says that "types" are for
+  schemas, and "kinds" for the data model. That should probably be sooner in the
+  schema spec.
+
+* For new IPLD team members in the future, it's probably best if their first
+  week is focused on the basics alone - blocks, hashing, linking, encodings, and
+  the data model. That's enough for some realistic demos using IPLD, and can be
+  learned in half a day, allowing the person to start contributing without
+  multiple full days of reading. I should clarify that Eric did give me a data
+  model issue to work on during my first week, but I never picked up on the "you
+  don't need to read about schemas for now" nudge.
+
+* The specs README hierarchy presents these three concepts in order: multi-block
+  collections, schemas, and ADLs. The docs do explain why schemas and ADLs are
+  different, but not why multi-block collections are also a separate thing. Eric
+  mentioned that multi-block collections are pretty much ADLs; so why are they
+  introduced before schemas?
+
+* The schema authoring guide talks about "component specifiers" and "component
+  specifications", but never seems to define them.