This repository has been archived by the owner on Jun 29, 2022. It is now read-only.

design: add an exploration report about learning IPLD (#294)
mvdan authored Sep 26, 2020
1 parent d81caf2 commit ddc331d
Showing 1 changed file with 98 additions and 0 deletions.
98 changes: 98 additions & 0 deletions design/history/exploration-reports/2020.09-learning-ipld.md
@@ -0,0 +1,98 @@
# Journey learning the IPLD stack

Author: Daniel Martí (@mvdan)

It was suggested that I should capture my perspective as I get up to speed on
the IPLD stack, so that we can possibly identify shortcomings with the current
material, or topics which can cause problems.

### Previous knowledge

I should note that I was already familiar with hashing, Git, type systems and
data structures, encodings like JSON and Protobuf, and compatibility between
different programming languages. So the "block" and "data model" layers of IPLD
were relatively easy to understand.

### First impression

The docs are somewhat scattered and unfinished, which makes getting started a
little more confusing. In chronological order, I read:

1) https://hackmd.io/LHTTmGSWSvem4Wz2h_a39g?both, Eric's "terse primer". Seems
to try to cover everything, though it does seem like quite a lot of
information to take in all at once.

2) https://ipld.github.io/docs/, sourced at https://github.com/ipld/docs. Seems
aimed at getting started with tutorials in JS. I should note that I first
read this before Mikeal's "new intro" added on September 2nd 2020.

3) https://github.com/ipld/specs, which seems to contain all the formal specs,
but also includes a pretty decent README.

I think all three should probably be unified into two halves:

* A high-level introduction to IPLD, max 3-4 pages. Probably extending Mikeal's
new intro with some material from Eric's primer?

* The set of spec documents, with a README to classify and introduce each of the
groups or layers. I think the specs repo already does a decent job at this.

### Concepts that confused me

I already raised some of these as Slack threads or HackMD comments, but for the
sake of keeping a record, I'm listing the most basic or important ones here.

* It is said that a link is unlike a URL, since it is merely a hash of data
that doesn't statically say where to fetch the data from. So... how would one
ever actually fetch data via a link?

* Out of the three layers (blocks, data model, schemas), schemas have been by
far the hardest to wrap my head around. I think an introduction should contain
a very brief example, including what it actually looks like when mapped to the
data model and encoded into a block.
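
To make the question about fetching via a link more concrete, here is a minimal sketch of content addressing. It is not the real IPLD API: `BlockStore`, `encode_block` and `link_for` are hypothetical names, sorted-key JSON stands in for a real codec, and a bare SHA-256 hex digest stands in for a CID. The `{"/": ...}` shape for a link is borrowed from DAG-JSON purely for illustration.

```python
import hashlib
import json


def encode_block(value):
    # Deterministic JSON stands in for a real IPLD codec (assumption).
    return json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")


def link_for(block):
    # A real link is a CID; a plain SHA-256 hex digest is a simplification.
    return hashlib.sha256(block).hexdigest()


class BlockStore:
    """Hypothetical in-memory store that maps links to encoded blocks."""

    def __init__(self):
        self._blocks = {}

    def put(self, value):
        block = encode_block(value)
        link = link_for(block)
        self._blocks[link] = block
        return link

    def get(self, link):
        block = self._blocks[link]
        # The link says nothing about location, but it lets us verify
        # whatever bytes a store hands back.
        if link_for(block) != link:
            raise ValueError("block does not match link")
        return json.loads(block)


store = BlockStore()
leaf = store.put({"name": "hello.txt", "size": 5})
root = store.put({"files": [{"/": leaf}]})

fetched_root = store.get(root)
fetched_leaf = store.get(fetched_root["files"][0]["/"])
```

In this sketch the "where" comes entirely from the store you happen to ask; any store holding the block can answer, and the hash check is what makes an untrusted answer safe to use.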

The following are more such points, but focused around schemas, once I got to
that part of the spec:

* My first read about ADLs left me very confused, in particular about how
they're different from Schemas. I found the "Mapped to the Data Model"
introduction to ADLs much easier to understand, as it shows reasonable examples.

* Why are multi-block data structures a separate definition in the spec, and not
just part of Schemas?

* Are blocks generally filled with data nearly completely, or is it normal to
have them relatively empty?

* Wouldn't removing the first byte from a very long multi-block List mean that
every block would need to be modified to shift all bytes forward by one? I
assume and hope not, but the spec doesn't really give pointers.

* Since data in blocks is encoded from the data model, how would I know if a
particular data model value fits in a single block? What about a schema value?

* Would IPLD be much different if the data model was an internal detail, and not
exposed to the user? I imagine that, most of the time, one would interact with
schemas and not the data model.

* When docs say "the IPLD type system", is that in terms of Schemas, or the Data
Model, or both? Answer: The schema intro later says that "types" are for
schemas, and "kinds" for the data model. That should probably come sooner in
the schema spec.

* For new IPLD team members in the future, it's probably best if their first
week is focused on the basics alone - blocks, hashing, linking, encodings, and
the data model. That's enough for some realistic demos using IPLD, and can be
learned in half a day, allowing the person to start contributing without
multiple full days of reading. I should clarify that Eric did give me a data
model issue to work on during my first week, but I never picked up on the "you
don't need to read about schemas for now" nudge.

* The specs README hierarchy presents these three concepts in order: multi-block
collections, schemas, and ADLs. The docs do explain why schemas and ADLs are
different, but not why multi-block collections are also a separate thing. Eric
mentioned that multi-block collections are pretty much ADLs; so why are they
introduced before schemas?

* The schema authoring guide talks about "component specifiers" and "component
specifications", but never seems to define them.
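
The worry above about removing the first byte of a long multi-block List can be illustrated with a small sketch. It assumes naive fixed-size chunking, which is my own simplification and not something the spec prescribes; `chunk` and `block_hashes` are hypothetical helpers. It only shows why the question matters, not how IPLD's multi-block collections actually behave.

```python
import hashlib


def chunk(data, size):
    # Split the bytes into consecutive fixed-size blocks (naive scheme).
    return [data[i:i + size] for i in range(0, len(data), size)]


def block_hashes(data, size):
    # Hash each block, as a stand-in for computing its link.
    return [hashlib.sha256(b).hexdigest() for b in chunk(data, size)]


data = bytes(range(256)) * 4          # 1024 bytes, so 16 blocks of 64 bytes
before = block_hashes(data, 64)
after = block_hashes(data[1:], 64)    # drop the first byte

# Count how many block positions now hold different content.
changed = sum(1 for a, b in zip(before, after) if a != b)
```

With this naive layout every block shifts, so every link changes. A structure that avoids this would need block boundaries that do not depend on absolute byte offsets, and it would help if the spec said whether that is expected.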
