Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(cli): ipfs add with multiple files of same name #8493

Merged
merged 2 commits into from
Apr 3, 2022

Conversation

schomatis
Copy link
Contributor

@schomatis schomatis commented Oct 6, 2021

This PR incorporates the go-ipfs-cmds fix that makes the ipfs add command not ignore repeated entries. This fix conflicts with another bug in the ipfs add -w variant of the command which can't correctly process files with the same name (I think this was uncovered here and we don't have a specific issue for it). This is shown by the shanress test failing (see more details below). We need to first fix the ipfs add -w issue before landing this so we can be confident this is the correct fix.


Detail of the ipfs add -w bug and the test that fails (uncovering it)

The ipfs add -w (repeats) test now fails. This is expected because:

  • we repeatedly add the m/4r93 file
  • ignored before, these extra/repeated files are now sent to the core UnixFS API Add()
  • this API uses a fake (disposable) MFS root directory as a way to interact with the underlying DAG service (repo)
  • with the "wrapped" option -w we group all the entries being added in a single directory
  • the MFS directory, not intended for this ephemeral abuse, correctly refuses to add two entries with the same name (with the resulting error in sharness of a failed command: Error: directory already has entry by that name)

We are abusing the MFS API, see related and broader issue for more details on this general topic: Decouple MFS from UnixFS.

Also note that this issue might not happen with any kind of repeated entry in the ipfs add -w command as empty directories seem to be ignored when constructing the path and only files within those repeated directories will trigger the error, like ipfs add -w -r dir/file dir/file. There is some undocumented logic around path manipulation related to this:

https://github.com/ipfs/go-ipfs/blob/52a747763f6c4e85b33ca051cda9cc4b75c815f9/core/coreunix/add.go#L411-L424


Fixes #8264.

cc @guseggert

go.mod Outdated
@@ -29,7 +29,7 @@ require (
github.com/ipfs/go-graphsync v0.8.0
github.com/ipfs/go-ipfs-blockstore v0.1.6
github.com/ipfs/go-ipfs-chunker v0.0.5
github.com/ipfs/go-ipfs-cmds v0.6.0
github.com/ipfs/go-ipfs-cmds v0.6.1-0.20211005151237-0e6ec5a05153
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to create a release here after tests pass?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once this PR is approved we should merge + release go-ipfs-cmds and bubble into this PR before merging it into master

@aschmahmann aschmahmann added the need/author-input Needs input from the original author label Oct 15, 2021
@schomatis schomatis self-assigned this Oct 15, 2021
@schomatis schomatis marked this pull request as draft October 15, 2021 17:11
@schomatis schomatis force-pushed the schomatis/fix/cli/multiple-files-same-name branch from 50a62cf to 4d916b1 Compare November 2, 2021 19:56
@schomatis

This comment has been minimized.

@schomatis

This comment has been minimized.

@schomatis

This comment has been minimized.

@schomatis schomatis force-pushed the schomatis/fix/cli/multiple-files-same-name branch 2 times, most recently from 133f0a3 to a27afcf Compare January 6, 2022 20:55
@guseggert
Copy link
Contributor

What's your estimate on how long it would take to fix #8541 ? If we fix that, then all the weird edge cases here would go away, right?

@schomatis
Copy link
Contributor Author

@guseggert

What's your estimate on how long it would take to fix #8541 ?

This largely depends on review bandwidth and (optionally) being able to finally consolidate UnixFS/MFS back here (as we would likely be breaking a bunch of APIs).

If we fix that, then all the weird edge cases here would go away, right?

The general scope of that issue is too broad and vague at the moment, but achieving the specific use case of removing MFS from ipfs add is not hard and I think doing that would, if not solve every edge case here, largely simplify this issue.

@aschmahmann
Copy link
Contributor

aschmahmann commented Feb 11, 2022

We are abusing the MFS API, see related and broader issue for more details on this general topic: #8541.

It doesn't seem like this is really about MFS as much as it is about some unintuitive behavior around what -w does with repeat parameters that leads to the behavior being undefined in the case of multiple files with the same name.

❯ ipfs add .\foo .\bar
added QmaZAaCrugAkinkaSGBP2uD8pchyKXY2rmPf8WQ2CzE2pN bar
added QmQ7qCwMvm3mb5PRaAJd1ytaAE4qULHzoJCunikZfT1oAp foo

This seems to indicate that ipfs add <file1> <file2> is just a shortcut for calling ipfs add file1 followed by ipfs add file2.

❯ ipfs add -w .\foo .\bar
added QmaZAaCrugAkinkaSGBP2uD8pchyKXY2rmPf8WQ2CzE2pN bar
added QmQ7qCwMvm3mb5PRaAJd1ytaAE4qULHzoJCunikZfT1oAp foo
added QmSQUBkpW1ApwzXK8Z23P4uL8jJkss31rd49bWG2SR6c2g

This seems to indicate that ipfs add -w <file1> <file2> adds file1 and file2 into a shared directory keyed by their file names. While this is an interesting and perhaps useful behavior, it's not the same as the repeated adds being a shortcut for ipfs add -w <file1> followed by ipfs add -w file2>.

This behavior seems to have been introduced intentionally ages ago #1536 and while likely not that common (due to bugs like this one with multiple files with the same name not being repeated flagged + upvoted by users) it's definitely in use (e.g. a cursory look on grep.app shows https://github.com/ipfs/public-gateway-checker/blob/16c99857e4f4cb67ff5a8bc617ec9fa44bdc8a6d/publish-to-ipfs.sh#L3)

In this case what happens when you do ipfs add -w .\test1\foo .\test2\foo is undefined and there could be any number of potential options:

  1. Only one file gets added and that one is wrapped in a directory
  2. Both files are added and only one gets wrapped in a directory
  3. Both files are added and are put in the same directory but one of them has an autogenerated name (e.g. foo(1))
  4. There's an error

It seems to me like the only sensible thing to do is to error if ipfs add -w contains entries with the same name. I'm not sure what the intent behind this test https://github.com/ipfs/go-ipfs/blob/2a871ef0184da3b021209881dfa48e3cfdaa9d26/test/sharness/t0043-add-w.sh#L131-L142 was, but if we want we can maintain compatibility here by only erroring if the entry paths are different (e.g. allowing ipfs add -w foo foo to be deduplicated to remove the last foo). We should also document the interesting behavior of ipfs add -w <file1> <file2> vs other ipfs add <file1> <file2> since at the moment the docs in ipfs add --help leave it ambiguous.

@schomatis
Copy link
Contributor Author

@aschmahmann Thanks for the detailed explanation. Could you expand a bit more on this suggestion, please?

we can maintain compatibility here by only erroring if the entry paths are different

Maybe an example of when would we error?

@aschmahmann
Copy link
Contributor

aschmahmann commented Feb 16, 2022

Maybe an example of when would we error?

The undefined behavior of when we should error is ipfs add -w .\test1\foo .\test2\foo (i.e. any time we would have two entries being added with the same name, which would necessitate an override or an error we should error)

The sharness test you linked https://github.com/ipfs/go-ipfs/blob/2a871ef0184da3b021209881dfa48e3cfdaa9d26/test/sharness/t0043-add-w.sh#L131-L142 is pretty weird in that it does intentionally do the equivalent of ipfs add -w .\test1\foo .\test1\foo and not expect an error.

At a high level if we want to continue to support that case (instead of just erroring from two files called foo), we can do so by either:

  1. Checking if the import paths are the same .\test1\foo .\test1\foo and just not parsing the data a second time (does enable the backwards compatibility with the linked test)
  2. Checking if the data in .\test1\foo and .\test2\foo are the same (and therefore will result in the same CID) and then ignoring the second entry (a more generous way of enabling backwards compatibility in more situations where it's safe)

Aside from the fact that technically this would be an unforced breaking change, if 1 or 2 are a pain to implement we'd probably be fine to modify the sharness test and just not support entries with the same name whether they have the same path or content. We'll be generating errors so in the unlikely event this breaks someone's workflow it'll be detectable and we could deal with it later. I'd rather not have breaking changes if we can avoid it, but we can evaluate depending on how painful you think it'll be to implement.

@schomatis
Copy link
Contributor Author

Got it, thanks. I think we can implement this by extending the -cmds lib to retain the path and decide what to do in ipfs add here.

@schomatis
Copy link
Contributor Author

According to Discord discussion we're going with:

  1. Error if there would be a conflicting path name, but don't if the DAGs from the conflicting names would be identical - shouldn't require any API changes to go-ipfs-cmds or go-ipfs-files and breaks the least backwards compatibility. Might be a bit more work to deal with in the coreunix.Adder and we'll likely end up putting some blocks in the repo that will only get removed with GC, but that's fine.

to minimize messing around with go-ipfs-cmds or go-ipfs-files.

@schomatis
Copy link
Contributor Author

@aschmahmann

I feel we have been going back and forth on the issue of which error to display to the user in go-ipfs which is not a blocker for my work (nor do I have any strong opinion on it).

My blocker is the following: AFAICT any of the proposed errors need go-ipfs to be aware of the import path, which is not available at the moment. I have asked for a simple way to get it in Discord and didn't get an answer. I have gotten pushback to modifying upstream dependencies but don't see any other way, so I need a decision on which dependency to modify. I'll repeat my original Discord question here:

I'm thinking we can satisfy #8493 (comment) with either:

@guseggert
Copy link
Contributor

Let's roll with doing the preprocessing in go-ipfs-cmds. Does that unblock you?

@schomatis
Copy link
Contributor Author

Yes, thanks.

@schomatis schomatis force-pushed the schomatis/fix/cli/multiple-files-same-name branch from a27afcf to 407eae9 Compare March 18, 2022 23:00
@schomatis schomatis marked this pull request as ready for review March 21, 2022 21:46
@schomatis
Copy link
Contributor Author

@guseggert Ready for review.

go.mod Outdated Show resolved Hide resolved
@schomatis schomatis force-pushed the schomatis/fix/cli/multiple-files-same-name branch from 9a27cbe to 02380b2 Compare March 29, 2022 20:40
@schomatis schomatis enabled auto-merge (squash) March 29, 2022 20:41
@schomatis
Copy link
Contributor Author

@guseggert Updated dependency, this should be ready now.

@schomatis schomatis removed the need/author-input Needs input from the original author label Mar 29, 2022
@BigLep BigLep removed the request for review from aschmahmann April 1, 2022 16:27
@schomatis schomatis merged commit 282ac7f into master Apr 3, 2022
@schomatis schomatis deleted the schomatis/fix/cli/multiple-files-same-name branch April 3, 2022 14:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Archived in project
Development

Successfully merging this pull request may close these issues.

CLI cannot process multiple passed files with the same name
6 participants