Skip to content

Commit

Permalink
cmd/files: flush parent folders (#10630)
Browse files Browse the repository at this point in the history
* cmd/files: flush parent folders

This is a mitigation to increased MFS memory usage in the course of many writes operations.

The underlying issue is the unbounded growth of the mfs directory cache in
boxo. In the latest boxo version, this cache can be cleared by calling Flush()
on the folder. In order to trigger that, we call Flush() on the parent folder
of the file/folder where the write-operations are happening.

To flushing the parent folder allows it to grow unbounded. Then, any read
operation to that folder or parents (i.e. stat), will trigger a sync-operation to match
the cache to the underlying unixfs structure (and obtain the correct node-cid).

This sync operation must visit every item in the cache. When the cache has grown too much,
and the underlying unixfs-folder has switched into a HAMT, the operation can take minutes.

Thus, we should clear the cache often and the Flush flag is a good indicator
that we can let it go. Users can always run with --flush=false and flush at
regular intervals during their MFS writes if they want to extract some performance.

Fixes #8694, #10588.

* cmd/files: docs and changelog for --flush changes
  • Loading branch information
hsanjuan authored Dec 19, 2024
1 parent 7c49860 commit ecb2558
Show file tree
Hide file tree
Showing 2 changed files with 68 additions and 18 deletions.
82 changes: 64 additions & 18 deletions core/commands/files.go
Original file line number Diff line number Diff line change
Expand Up @@ -59,16 +59,18 @@ Content added with "ipfs add" (which by default also becomes pinned), is not
added to MFS. Any content can be lazily referenced from MFS with the command
"ipfs files cp /ipfs/<cid> /some/path/" (see ipfs files cp --help).
NOTE:
Most of the subcommands of 'ipfs files' accept the '--flush' flag. It defaults
to true. Use caution when setting this flag to false. It will improve
NOTE: Most of the subcommands of 'ipfs files' accept the '--flush' flag. It
defaults to true and ensures two things: 1) that the changes are reflected in
the full MFS structure (updated CIDs) 2) that the parent-folder's cache is
cleared. Use caution when setting this flag to false. It will improve
performance for large numbers of file operations, but it does so at the cost
of consistency guarantees. If the daemon is unexpectedly killed before running
'ipfs files flush' on the files in question, then data may be lost. This also
applies to run 'ipfs repo gc' concurrently with '--flush=false'
operations.
`,
of consistency guarantees and unbound growth of the directories' in-memory
caches. If the daemon is unexpectedly killed before running 'ipfs files
flush' on the files in question, then data may be lost. This also applies to
run 'ipfs repo gc' concurrently with '--flush=false' operations. We recommend
flushing paths reguarly with 'ipfs files flush', specially the folders on
which many write operations are happening, as a way to clear the directory
cache, free memory and speed up read operations.`,
},
Options: []cmds.Option{
cmds.BoolOption(filesFlushOptionName, "f", "Flush target and ancestors after write.").WithDefault(true),
Expand Down Expand Up @@ -491,10 +493,14 @@ being GC'ed.
}

if flush {
_, err := mfs.FlushPath(req.Context, nd.FilesRoot, dst)
if err != nil {
if _, err := mfs.FlushPath(req.Context, nd.FilesRoot, dst); err != nil {
return fmt.Errorf("cp: cannot flush the created file %s: %s", dst, err)
}
// Flush parent to clear directory cache and free memory.
parent := gopath.Dir(dst)
if _, err = mfs.FlushPath(req.Context, nd.FilesRoot, parent); err != nil {
return fmt.Errorf("cp: cannot flush the created file's parent folder %s: %s", dst, err)
}
}

return nil
Expand Down Expand Up @@ -792,10 +798,30 @@ Example:
}

err = mfs.Mv(nd.FilesRoot, src, dst)
if err == nil && flush {
_, err = mfs.FlushPath(req.Context, nd.FilesRoot, "/")
if err != nil {
return err
}
return err
if flush {
parentSrc := gopath.Dir(src)
parentDst := gopath.Dir(dst)
// Flush parent to clear directory cache and free memory.
if _, err = mfs.FlushPath(req.Context, nd.FilesRoot, parentDst); err != nil {
return fmt.Errorf("cp: cannot flush the destination file's parent folder %s: %s", dst, err)
}

// Avoid re-flushing when moving within the same folder.
if parentSrc != parentDst {
if _, err = mfs.FlushPath(req.Context, nd.FilesRoot, parentSrc); err != nil {
return fmt.Errorf("cp: cannot flush the source's file's parent folder %s: %s", dst, err)
}
}

if _, err = mfs.FlushPath(req.Context, nd.FilesRoot, "/"); err != nil {
return err
}
}

return nil
},
}

Expand Down Expand Up @@ -943,6 +969,17 @@ See '--to-files' in 'ipfs add --help' for more information.
flog.Error("files: error closing file mfs file descriptor", err)
}
}
if flush {
// Flush parent to clear directory cache and free memory.
parent := gopath.Dir(path)
if _, err := mfs.FlushPath(req.Context, nd.FilesRoot, parent); err != nil {
if retErr == nil {
retErr = err
} else {
flog.Error("files: flushing the parent folder", err)
}
}
}
}()

if trunc {
Expand Down Expand Up @@ -1105,11 +1142,20 @@ Change the CID version or hash function of the root node of a given path.
return err
}

err = updatePath(nd.FilesRoot, path, prefix)
if err == nil && flush {
_, err = mfs.FlushPath(req.Context, nd.FilesRoot, path)
if err := updatePath(nd.FilesRoot, path, prefix); err != nil {
return err
}
return err
if flush {
if _, err = mfs.FlushPath(req.Context, nd.FilesRoot, path); err != nil {
return err
}
// Flush parent to clear directory cache and free memory.
parent := gopath.Dir(path)
if _, err = mfs.FlushPath(req.Context, nd.FilesRoot, parent); err != nil {
return err
}
}
return nil
},
}

Expand Down
4 changes: 4 additions & 0 deletions docs/changelogs/v0.33.md
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,10 @@ If you depended on removed ones, please fill an issue to add them to the upstrea

Onboarding files and directories with `ipfs add --to-files` now requires non-empty names. due to this, The `--to-files` and `--wrap` options are now mutually exclusive ([#10612](https://github.com/ipfs/kubo/issues/10612)).

#### MFS stability with large number of writes

We have fixed a number of issues that were triggered by writing or copying many files onto an MFS folder: increased memory usage first, then CPU, disk usage, and eventually a deadlock on write operations. The details of the fixes can be read at [#10630](https://github.com/ipfs/kubo/pull/10630) and [#10623](https://github.com/ipfs/kubo/pull/10623). The result is that writing large amounts of files to an MFS folder should now be possible without major issues. It is possible, as before, to speed up the operations using the `ipfs files --flush=false <op> ...` flag, but it is recommended to switch to `ipfs files --flush=true <op> ...` regularly, or call `ipfs files flush` on the working directory regularly, as this will flush, clear the directory cache and speed up reads.

#### 📦️ Dependency updates

- update `boxo` to [v0.26.0](https://github.com/ipfs/boxo/releases/tag/v0.26.0)
Expand Down

0 comments on commit ecb2558

Please sign in to comment.