Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(object): add support for raw leaves in object diff #5472

Merged
merged 4 commits into from
Oct 4, 2018

Conversation

overbool
Copy link
Contributor

@overbool overbool commented Sep 15, 2018

License: MIT
Signed-off-by: Overbool overbool.xu@gmail.com

Fixes #5470

License: MIT
Signed-off-by: Overbool <overbool.xu@gmail.com>
@overbool
Copy link
Contributor Author

@Stebalien, something seems to be incorrect in ci

@schomatis
Copy link
Contributor

Yes, this definitely looks like the right approach, treat the nodes as IPLD nodes as long as possible and only convert them to ProtoNode when necessary: when we need to remove links (write to the node, something that can't be done with the IPLD interface), and at that point (if they have links to remove) we know they are not raw nodes. Am I interpreting this patch correctly?

Could you add a test for the raw nodes case?

(And also, if it's not too much trouble, could you add some comment lines explaining that this now supports raw nodes and why.)

@schomatis
Copy link
Contributor

And great job!! (I should have started with that line :)

@overbool
Copy link
Contributor Author

overbool commented Sep 17, 2018

@schomatis Yes, thx for your reply. I add the comment and test cases. Could you help me review this again?

But something seems to be incorrect about our ci.

@overbool overbool force-pushed the fix/diff-raw-leaves branch 2 times, most recently from 4254f14 to 1ce6e04 Compare September 17, 2018 11:02
Copy link
Contributor

@schomatis schomatis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great @overbool!! This LGTM (minus the two code nits), I would like to get someone else to check the tests structure, overloading test cases (with more than one test_cmp call per case) seems sensible to me but we may be breaking some unwritten rule.

coreiface "github.com/ipfs/go-ipfs/core/coreapi/interface"
"github.com/ipfs/go-ipfs/dagutils"

cmdkit "gx/ipfs/QmSP88ryZkHSRn1fnngAaV2Vcn63WUJzAavnRM9CVdU1Ky/go-ipfs-cmdkit"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sure this was some IDE automatic feature (it happens to me also) but could you restore the e and cmdkit names?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

"gx/ipfs/QmSP88ryZkHSRn1fnngAaV2Vcn63WUJzAavnRM9CVdU1Ky/go-ipfs-cmdkit"
)

const (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm usually all for declaring constants and avoiding hard-coding inside functions but in this particular case of strings it doesn't seem to be worth it, those strings won't be repeated elsewhere and they make more sense inside the StringArg call instead of isolated here where we wouldn't now what they are used for, it will be just repeating the "obj_a" string as just the name of the string constant with the same value.

@schomatis
Copy link
Contributor

@magik6k Could you take a look at the tests please?

@schomatis
Copy link
Contributor

But something seems to be incorrect about our ci.

Yeah, I wouldn't worry too much about it, they tend to fail for no (apparent) reason fairly regularly, my main rule of thumb is that if another CI that is doing the same test is passing then I'm good.

License: MIT
Signed-off-by: Overbool <overbool.xu@gmail.com>
@magik6k
Copy link
Member

magik6k commented Sep 17, 2018

@magik6k Could you take a look at the tests please?

It was a random fail, most likely caused by #5481

@overbool
Copy link
Contributor Author

@schomatis Thx for your suggestion. I had fixed them.

@overbool
Copy link
Contributor Author

@schomatis Should I split the test cases to keep the rule (one test_cmp call per case)?

@schomatis
Copy link
Contributor

schomatis commented Sep 17, 2018

It was a random fail, most likely caused by #5481

@magik6k Sorry, I wasn't very clear. I meant if you could take a look at the sharness tests in this PR: https://github.com/ipfs/go-ipfs/pull/5472/files#diff-c85948db06f72554f9c3153c8cfac4c9, particularly if it's ok to call test_cmp twice in the same test case (which seems reasonable to me but just to be sure).

Copy link
Member

@magik6k magik6k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer the test cases to be separate, makes it more obvious when something fails (slightly easier to read too).

dagutils/diff.go Outdated
@@ -94,7 +94,11 @@ func ApplyChange(ctx context.Context, ds ipld.DAGService, nd *dag.ProtoNode, cs
return e.Finalize(ctx, ds)
}

// Diff returns a set of changes that transform node 'a' into node 'b'
// Diff returns a set of changes that transform node 'a' into node 'b'.
// It supports two nodes forms: ProtoNode and RawNode. Because we treats
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/treats/treat

License: MIT
Signed-off-by: Overbool <overbool.xu@gmail.com>
@overbool
Copy link
Contributor Author

@magik6k I had separated the test cases and modified the spelling mistake,could you help me review it again?

@schomatis
Copy link
Contributor

Nice work, thanks @overbool !

@schomatis schomatis added the RFM label Sep 18, 2018
@overbool
Copy link
Contributor Author

@schomatis thx for your help in this issue

@Stebalien
Copy link
Member

How will this behave when passed, e.g., DagCBOR nodes? I think this'll just continue as if nothing is wrong and print out nonsensical results. We should fail when we encounter non-protobuf/non-raw nodes.

@schomatis
Copy link
Contributor

I think this'll just continue as if nothing is wrong and print out nonsensical results.

Why?

We should fail when we encounter non-protobuf/non-raw nodes.

Where?

I don't have any experience with CBOR so I fail to see the relation between that type of node and this particular PR.

The title of the PR may be misleading, it emphasizes why we are doing this but not what we are changing here. This just doesn't force the ProtoNode type assertion unless it needs to: when modifying the links of placeholder nodes (cleanA/cleanB) which shouldn't even exist in the first place (we could just be using another container for the Add/Remove links). There is no especial support for proto/raw nodes, this function operates at the IPLD level comparing CIDs.

@Stebalien
Copy link
Member

when modifying the links of placeholder nodes (cleanA/cleanB) which shouldn't even exist in the first place (we could just be using another container for the Add/Remove links)

This is how we're calculating the diff. We're removing all the links present in both nodes A and B from both A and B, leaving only the links present in only one or the other. However, this will fail for any format that's not ProtoNode as we won't be able to remove the links.

This is fine for raw nodes only because they don't have links.


One solution is to just treat all non-ProtoNodes as opaque "things" (i.e., stop diffing). Diff would, at the top, check to see if we have two protonodes. If not, it would compare the CIDs and emit a Mod change object instead of actually diffing.


Is this making any sense or am I still misunderstanding what this patch does.

@overbool
Copy link
Contributor Author

However, this will fail for any format that's not ProtoNode as we won't be able to remove the links.

If other format Nodes have links and implement RemoveNodeLink function, maybe Diff() can support other format Nodes not just ProtoNode and RawNode.

Currently, Diff() just support ProtoNode and RawNode.

@schomatis
Copy link
Contributor

This is how we're calculating the diff. We're removing all the links present in both nodes A and B from both A and B, leaving only the links present in only one or the other. However, this will fail for any format that's not ProtoNode as we won't be able to remove the links.

Yes, it seems like a reasonable compromise. @overbool Are you ok with making that change? If not, we can merge this PR as-is, since it's already solving #5470 (and the fact that this command is deeply broken is beyond responsibility of this PR).

@overbool
Copy link
Contributor Author

Are you ok with making that change?

@schomatis What do you mean about change?

One solution is to just treat all non-ProtoNodes as opaque "things" (i.e., stop diffing). Diff would, at the top, check to see if we have two protonodes. If not, it would compare the CIDs and emit a Mod change object instead of actually diffing.

about this?

@schomatis
Copy link
Contributor

Yes, instead of returning an error if the conversion to a ProtoNode fails generate a Mod entry instead (without diffing inside of those node's links).

@overbool
Copy link
Contributor Author

@schomatis, i got it.
However, using new solution means that previous work and tests were wasted.

@@ -112,8 +116,8 @@ func Diff(ctx context.Context, ds ipld.DAGService, a, b ipld.Node) ([]*Change, e
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that much (I think), we could add a check here to verify that, if they have links, i.e., they are not raw nodes because they passed the first check (hence we would still be supporting them), they must be proto-nodes; if not, you could also return a Mod as before. Actually you could even merge this proto-node check in the previous if, WDYT?

@overbool
Copy link
Contributor Author

overbool commented Sep 22, 2018

@Stebalien @schomatis According to your suggestion, I made some changes in Diff().

It only traverses links in the following cases:

  1. two node's links number are greater than 0.
  2. both of two nodes are ProtoNode.

Otherwise, it compares the cid and emits a Mod change object.

@@ -209,3 +197,16 @@ func MergeDiffs(a, b []*Change) ([]*Change, []Conflict) {
}
return out, conflicts
}

func getChange(a, b ipld.Node) ([]*Change, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicely encapsulated!

dagutils/diff.go Outdated
// Diff returns a set of changes that transform node 'a' into node 'b'
// Diff returns a set of changes that transform node 'a' into node 'b'.
// It only traverses links in the following cases:
// 1. two node's links number are greater then 0.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "then" -> "than"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OMG, forgive my frequent spelling mistakes. 😨

Copy link
Contributor

@schomatis schomatis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Stebalien Was that what you had in mind?

@schomatis schomatis added need/review Needs a review and removed RFM labels Sep 23, 2018
Copy link
Member

@magik6k magik6k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 small things in sharness, then LGTM

printf "" > diff_exp &&
test_cmp diff_exp diff_out
'

test_expect_success "diff (raw-leaves) against self (single file) is empty" '
ipfs object diff $SINGLE_FILE_RAW $SINGLE_FILE_RAW > diff_raw_out
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing &&

test_expect_success "diff against self (empty dir) is empty" '
ipfs object diff $EMPTY_DIR $EMPTY_DIR > diff_out
printf "" > diff_exp &&
test_cmp diff_exp diff_out
'

test_expect_success "diff (raw-leaves) against self (empty dir) is empty" '
ipfs object diff $EMPTY_DIR_RAW $EMPTY_DIR_RAW > diff_raw_out
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also missing &&

License: MIT
Signed-off-by: Overbool <overbool.xu@gmail.com>
@overbool
Copy link
Contributor Author

@magik6k thx for your reply, I had fixed it.

@overbool
Copy link
Contributor Author

@Stebalien Was this pr what you had in mind?

@eingenito eingenito added the status/in-progress In progress label Oct 2, 2018
@Stebalien Stebalien merged commit 181c399 into ipfs:master Oct 4, 2018
@ghost ghost removed the status/in-progress In progress label Oct 4, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
need/review Needs a review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants