Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat!: CID as an interface #211

Closed
wants to merge 2 commits into from
Closed

Conversation

achingbrain
Copy link
Member

@achingbrain achingbrain commented Oct 15, 2022

This is an attempt at implementing #203, converting the CID class into a minimal interface and having factory functions to return objects that conform to the CID interface.

I've preserved all the generics the Link interface introduced and there is a function in link.js that add the extra methods to turn something that conforms to CID into something that can be used as a Link so existing code using the Link API should not have to change.

Notably there was no need to update any of the Link tests, though I did move some across from the CID tests and add some extra ones.

The static methods on CID have been exported as individual functions - the names remain the same (decode, parse, asCID etc) in an attempt to be less disruptive.

Code using these methods should mostly just need to change:

import { CID } from 'multiformats/cid'

to:

import * as CID from 'multiformats/cid

Types can be imported as:

import type { CID } from 'multiformats/interface'

or as before:

import type { CID } from 'multiformats/cid'

Fixes #208

BREAKING CHANGE: the CID class is now an interface

@achingbrain achingbrain mentioned this pull request Oct 15, 2022
@@ -82,13 +82,6 @@ describe('CID', () => {
assert.throws(() => CID.create(0, 113, hash), msg)
})

it('throws on trying to base encode CIDv0 in other base than base58btc', async () => {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test has moved to test-link.spec.js

assert.deepStrictEqual(newCid.toString(), cidStr)
})

it('.link() should return this CID', () => {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test has moved to test-link.spec.js

This is an attempt at implementing #203, converting the CID class
into a minimal interface and having factory functions to return
objects that conform to the CID interface.

I've preserved all the generics the `Link` interface introduced and
there are functions in `link.js` that add the extra methods to turn
something that conforms to `CID` into something that can be used as
a `Link` so existing code using the `Link` API should not have to
change.

Notably there was no need to update any of the `Link` tests.

The static methods on CID have been exported as individual functions
- the names remain the same (`decode`, `parse`, etc) in an attempt
to be less disruptive.

Code using these methods should mostly just need to change:

```js
import { CID } from 'multiformats/cid'
```

to:

```js
import * as CID from 'multiformats/cid
```

Types can be imported as:

```ts
import type { CID } from 'multiformats/interface'
```

or as before:

```ts
import type { CID } from 'multiformats/cid'
```

BREAKING CHANGE: the CID class is now an interface

it('toJSON()', async () => {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test has moved to test-link.spec.js

@Gozala
Copy link
Contributor

Gozala commented Oct 15, 2022

@achingbrain it seems to me that CID and Link interface are almost compatible at this point, and I'm not sure we need both. Only difference between two seems to be a following set of methods toString, toJSON, link and toV1, if those often seem unnecessary we should probably consider removing them instead.

That said changes here seem to exhibit robustness principle design which in our code often manifests in specific pattern that I think is worth sharing as it might provide useful food for thoughts when evaluating this.

Pattern

We often find yourself defining two distinct interfaces for the same data type. I tend to call the first one model of the data type (which is usually just a data needed to represent a type) and the second a view of the data type (naming isn't ideal because it may get you thinking about MVC, but I'm open to better terms).

model

Model in most cases is just a struct with fields to represent a data. Only exceptions to this rule are:

  1. When internal structure is generic, exposing internals would require exposing new type parameters that aren't useful otherwise. Often you could remove need for that by hiding some internal structure behind methods.
  2. When data structure has variable representation and exposing them as type union exposes incidental complexity. E.g. if the thing is logically a key value map or a collection of sorts you don't necessarily want to expose underlying trees, instead 2 or 3 methods might be more adequate.

All or our code that produces instances of this type takes model in arguments, which keeps requirements minimal for a caller. Since module can produce instances of the type it already has a dependency on a module that defines a type and there for it has access to all the functions (in static forms) to operate on that data.

view

Most our types exposed to the end user have view interface that extends it's model interface. This allows us to expose all of the library functions using methods, which happens to be an idiomatic in JS. I do want to emphasize however that this not just to pleasing folks that want method chaining it has real value:

By hanging relevant lib functionality on the instances we free a user that consumes or incorporates them from dependence on our library. In other words if library Foo used library Bar that produces type Beep from our library beep we don't want Foo to have to declare dependency on Beep just to perform some operations on values it got from Bar. By hanging most functionality on the instance (which simply delegates from prototype method to a static functions) we provide all of that functionality allowing:

  1. Foo to operate on received value without having to import Beep.
  2. Make Bar able to swap Beep for other library in the future without having it's users swap Beep import to something else.

generics

Code that incorporates data type into a larger structure as opposed to producing or consuming would ideally take module but return view without importing library defining a data type. In most cases solution is to use constrained generics like in the example below:

const box<T extends Lib.Model> (value:T): { box: T } = ({ box: value })

That allows box to retain type of the value of the input, e.g. if it was Lib.View it would come out as such without downcasting it to Lib.Model.

Now there are cases where above does not work e.g. because you have to use some functionality of the value. In that case it's usually up to the author to compare pros and cons of introducing dependence on Lib vs requiring Lib.View and deciding on what's optimal.


How does this apply here ?

It looks to me that with changes here CID is almost becoming what we call model in above pattern while Link acts like a view. It's bit ironic because my intention was to do reverse of this :P That is Link acting as model while CID remained to act as a view, although I did end up compromising and adding couple of methods that I thought would make it unusable in most places and consequently reaffirm use of CIDs instead.

@achingbrain it seems that you have came to a different conclusion thinking that CID as defined in PR would be useful enough and other methods while sometimes useful aren't that necessary in practice. I think I would like to better understand why we came to different conclusions on this.

What I do fail to understand however is motivation for removing those methods from the class. Specifically sticking methods on class prototype has no downsides that I can tell and even in the face of model and view interfaces we tend to have a same instance implementing both, here you have chosen to have one wrap the other which comes with performance implications (JS engines can do aggressive optimizations on classes but not as much on structs like you do in asLink function not to mention additional allocations and the fact that link.asCID === link will no longer hold.

I think it would really help to understand what do you think about pattern described above and then how do you think this relates to it or whether it should at all. If you'll find described pattern to be useful here, I'd be happy to work with you on aligning this PR with it (regardless of which one will act as model and which one as view). On the other hand if you do think described pattern is irrelevant to this, it would help to better understand motivation because it's lost on me.

If this needs to go forward I would very much prefer to not do what asLink does in terms of wrapping class with more methods, but other way round that is having all the methods on prototype and maybe just extracting relevant fields in CID.create.

@achingbrain
Copy link
Member Author

it would help to better understand motivation because it's lost on me

I guess this is a shift in thinking away from the current implementation.

The primary motivation is to have CID be a minimal interface that is as lightweight as possible.

If the CID interface is very minimal, it can be extended to add additional functionality (such as having byteOffset and byteLength fields for the CID-as-ArrayBuffer idea) instead of having all these extra fields* that from a practical perspective sometimes cause problems, which historically we've worked around, but the workarounds themselves can inadvertently cause other problems.

It's harder to go in the other direction and remove functionality while expecting it to behave the same. By which I mean, you could declare extra fields in the DefaultCID implementation in order for it to be usable as something else (Link for example), code to the CID interface, and then become surprised when deep equality between two objects with enough overlap to satisfy the CID interface doesn't work since at runtime interfaces don't exist.

* By extra fields, I mean, what is the minimum set of fields a thing needs to be used as a CID? I think it's a .version, a .codec, and a .multihash. .bytes is nice too, but anything outside of that should probably be viewed with suspicion for a lightweight, minimal interface, which is the goal here.

The fewer fields we require, the more likely we are to be backwards compatible too so there's less need to convert between older/newer implementations.

It looks to me that with changes here CID is almost becoming what we call model in above pattern while Link acts like a view

In terms of patterns, the change to CID here makes it more of a value object.

As an aside, we could double down on that and remove equals from the interface too?

I take your point about not wanting to have to depend on multiformats to compare two CIDs, but tangentially related there's nothing in the Uint8Array class that lets you do a value-comparison to another Uint8Array and people seem to have accepted that API.

it seems to me that CID and Link interface are almost compatible at this point, and I'm not sure we need both.

They are, except there are a bunch of extra fields and methods on Link - I didn't want to mess with that interface because it would be (more) disruptive and since it's not used anywhere in the libp2p or ipfs stacks I don't have a good view on which fields are important and which are not.

Link seems more relevant to IPLD Schemas perhaps? Maybe a Link has more to say about the entity behind the link via destination type hinting? I'm not sure - there's not a lot of IPLD Schemas in libp2p/ipfs so again I don't really have a good view on it.

@Gozala
Copy link
Contributor

Gozala commented Oct 16, 2022

If the CID interface is very minimal, it can be extended to add additional functionality (such as having byteOffset and byteLength fields for the CID-as-ArrayBuffer idea) instead of having all these extra fields* that from a practical perspective #212 cause #208, which historically we've worked around, but the workarounds themselves can inadvertently cause #200.

I understand the motivation, yet I don't think trimming bunch of methods going to help with this. The problem in my experience is never stuff on prototype, it's always own properties.

If we do want to drop asCID and accept that CID no longer comes out as CID in new realm we could do that and that would address the problem you seem to want to solve. Alternatively we could employ a different signaling mechanism that would introduce above problems.

For the background cid.asCID === this was chosen because it seemed like a pretty strong signal with very little chance of collision with user code, maybe we should revisit this and choose a less problematic signal which may be more prone to collision but overall be a better compromise.

It's harder to go in the other direction and remove functionality while expecting it to behave the same. By which I mean, you could declare extra fields in the DefaultCID implementation in order for it to be usable as something else (Link for example), code to the CID interface, and then become surprised when deep equality between two objects with enough overlap to satisfy the CID interface doesn't work since at runtime interfaces don't exist.

I think your are conflating problems here. Equality problem comes from own properties and is unrelated to methods and getters on the prototype. If equality check considers prototypes it would also fail if I have CID from different version of library and therefor this would not solve that problem.

They are, except there are a bunch of extra fields and methods on Link - I didn't want to mess with that interface because it would be (more) disruptive and since it's not used anywhere in the libp2p or ipfs stacks I don't have a good view on which fields are important and which are not.

I would much rather trim Link interface and happy to work with you on that. Non of the methods are really MUST for us, it was just a set that made sense to me. I'm happy to revisit that

@rvagg
Copy link
Member

rvagg commented Oct 18, 2022

I would much rather trim Link interface

fwiw this is my preference too, Link should be that lightweight thing that we get to refer to in our libraries that don't need to create CIDs; that was the original intention so we should work on that (and I'd also like to see it trimmed, it turned out a bit heavier than I expected).

@rvagg
Copy link
Member

rvagg commented Dec 6, 2022

closing because I think this is redundant now? if not, maybe re-attempt it on master and we can have a discussion on the changes cause they may still be worth it

@rvagg rvagg closed this Dec 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CID equality
3 participants