Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add fixed-period and conditional repo GC #1495

Merged
merged 1 commit into from
Nov 10, 2015
Merged

Add fixed-period and conditional repo GC #1495

merged 1 commit into from
Nov 10, 2015

Conversation

rht
Copy link
Contributor

@rht rht commented Jul 18, 2015

WIP sketch.

  1. Ok with configurable GC frequency?
  2. go-datastore doesn't have size check, so a local system call is used instead

Address #972

@Luzifer
Copy link
Member

Luzifer commented Jul 18, 2015

I like this. 💚

Suggestions:

  1. Let the user decide how much space it may consume
  2. Let the user decide how often to check (optional? not sure if really used)
  3. Maybe add an option to run GC even when size is not exceeded?

@rht rht force-pushed the gc branch 2 times, most recently from f6028f6 to 0d079dc Compare July 18, 2015 12:07
@rht
Copy link
Contributor Author

rht commented Jul 18, 2015

  1. I will put this into config once there is consensus on how the parameters are defined. This could even default to e.g. 50% of remaining free space.
  2. That's too low level. Frequency is configurable only when there isn't an algorithm to adapt the freq to the system resource level.
  3. For now ipfs repo gc?

@rht
Copy link
Contributor Author

rht commented Jul 18, 2015

2.. Although not necessarily for nix https://nixos.org/nixos/manual/sec-nix-gc.html

@rht rht force-pushed the gc branch 2 times, most recently from 4431faf to 7ed7f35 Compare July 19, 2015 14:30
fmt.Println("Repo GC done. Released %d", newDiskUsage-diskUsage)
}
}
}()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this file/function is already a huge mess. we'll want to split this out.

  • this stuff belongs in the fs-repo package. I think maybe we can pass an option all the way in as to whether to initialize with GC or not. (or even no option, assume always so.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cyclomatic complexity check:

$ gocyclo -over 15 . | grep -v Godeps | grep -v _test
32 main daemonFunc cmd/ipfs/daemon.go:115:1
30 cli parseArgs commands/cli/parse.go:199:1
27 merkledag_pb (*PBLink).Unmarshal merkledag/pb/merkledag.pb.go:101:1
27 cli parseOpts commands/cli/parse.go:62:1
25 http Parse commands/http/parse.go:15:1
25 secio (*secureSession).runHandshake p2p/crypto/secio/protocol.go:112:1
23 corehttp (*gatewayHandler).getOrHeadHandler core/corehttp/gateway_handler.go:85:1
22 merkledag_pb (*PBNode).Unmarshal merkledag/pb/merkledag.pb.go:205:1
21 main run cmd/ipfswatch/main.go:49:1
20 http getResponse commands/http/client.go:156:1
19 fsrepo (*FSRepo).SetConfigKey repo/fsrepo/fsrepo.go:487:1
19 merkledag_pb (*PBLink).VerboseEqual merkledag/pb/merkledag.pb.go:638:1
19 merkledag_pb (*PBLink).Equal merkledag/pb/merkledag.pb.go:684:1
18 http (internalHandler).ServeHTTP commands/http/handler.go:74:1
16 main main cmd/ipfs/main.go:58:1
16 corehttp (*gatewayHandler).putHandler core/corehttp/gateway_handler.go:249:1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OTOH, ticker and ctx timeout stuff shouldn't exist in fs-repo. This is daemon stuff.
I will put the func runGC() in daemon.go for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rht cyclomatic is nice-- i hadn't seen it :)

@jbenet
Copy link
Member

jbenet commented Jul 20, 2015

@rht solid first pass! 👍 this is headed in the right direction. some comments above o/

@rht rht force-pushed the gc branch 7 times, most recently from a4704b0 to 6ca010a Compare July 22, 2015 08:04
@rht
Copy link
Contributor Author

rht commented Jul 22, 2015

@jbenet RFCR, things have been ironed out.

@@ -461,3 +478,80 @@ func merge(cs ...<-chan error) <-chan error {
}()
return out
}

func runGC(req cmds.Request, repo repo.Repo, node *core.IpfsNode) error {
disableGC, _, err := req.Option(disableGCKwd).Bool()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is outside of daemonFunc because it is cleaner this way. But it is the other way around for mountFuse, initWithDefaults. Which one is preferred? With runGC's, there is less res.SetError(err, cmds.ErrNormal) and less complexity in daemonFunc.

@rht rht force-pushed the gc branch 2 times, most recently from 3e54b16 to 05c2554 Compare July 22, 2015 13:55

if diskUsageMB > cfg.Datastore.StorageGCWatermark*cfg.Datastore.StorageMax/100 {
// Do GC here
fmt.Println("Starting repo GC...")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be a log.Event, not fmt.Println

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are several fmt.Print* being used (that's why I use it). Should they be replaced with event log as well?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's a few fmt.Println, sorry, those are very few and mostly just the addresses users see when starting up the daemon.

we've considered maybe printing out some of the events, kind of like an http server, but it will take some work to figure out what to show and what not to.

@whyrusleeping
Copy link
Member

This looks good to me, i'm okay with merging.

@rht
Copy link
Contributor Author

rht commented Nov 9, 2015

Rebased. Not sure if this is going to be used.

@ghost ghost mentioned this pull request Nov 9, 2015
@jbenet
Copy link
Member

jbenet commented Nov 10, 2015

@rht very much yes! thanks for rebasing.

@whyrusleeping last time i asked you said you'd be ok with this merging before dev0.4.0. ok to merge still?

@jbenet
Copy link
Member

jbenet commented Nov 10, 2015

@rht tests fail now :(

@@ -166,7 +165,7 @@ func open(repoPath string) (repo.Repo, error) {
}

func newFSRepo(rpath string) (*FSRepo, error) {
expPath, err := u.TildeExpansion(filepath.Clean(rpath))
expPath, err := util.TildeExpansion(filepath.Clean(rpath))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I removed a duplicate import here. Fixed.

License: MIT
Signed-off-by: rht <rhtbot@gmail.com>
jbenet added a commit that referenced this pull request Nov 10, 2015
Add fixed-period and conditional repo GC
@jbenet jbenet merged commit 76dedd8 into ipfs:master Nov 10, 2015
@jbenet
Copy link
Member

jbenet commented Nov 10, 2015

Fixed-period GC landed! Thanks @rht

@ghost
Copy link

ghost commented Nov 26, 2015

@rht I'm testing this on venus and can't get it to trigger GC, neither periodic, nor via ipfs cat

  "Datastore": {
    "GCPeriod": "1m",
    "Path": "/ipfs/repo/datastore",
    "StorageGCWatermark": 90,
    "StorageMax": "20G",
    "Type": "leveldb",
    "NoSync": true
  },
root@venus:~# du -sh /ipfs/ipfs_master
27G /ipfs/ipfs_master

root@venus:~# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1        40G   36G  1.9G  96% /
none            4.0K     0  4.0K   0% /sys/fs/cgroup
udev            991M  4.0K  991M   1% /dev
tmpfs           201M  372K  200M   1% /run
none            5.0M     0  5.0M   0% /run/lock
none           1001M  240K 1001M   1% /run/shm
none            100M     0  100M   0% /run/user

@daviddias daviddias mentioned this pull request Nov 26, 2015
42 tasks
@rht
Copy link
Contributor Author

rht commented Nov 26, 2015

In the log, there is no gc trigger at all? If all the added files are pinned, they won't be gc-ed. And there should be continued periodic err message Maximum storage limit exceeded. Maybe unpin some files?.

@rht
Copy link
Contributor Author

rht commented Nov 26, 2015

It works in my case; I just reproduced the err message through ipfs cat when $IPFS_PATH storage exceed Datastore.StorageMax.

@ghost ghost mentioned this pull request Nov 26, 2015
@ghost
Copy link

ghost commented Nov 27, 2015

I got ahold of logs and I even see the periodic one twice :)

03:59:58.429 ERROR   corerepo: Maximum storage limit exceeded. Maybe unpin some files? gc.go:179
03:59:58.429 ERROR   cmd/ipfs: Maximum storage limit exceeded. Maybe unpin some files? daemon.go:295

But isn't it supposed to start GC'ing?

@rht
Copy link
Contributor Author

rht commented Nov 27, 2015

Pinned files don't get gc-ed.

@ghost
Copy link

ghost commented Nov 27, 2015

Maybe I'm just totally confused -- I thought this would periodically check for the watermark, and do the equivalent to ipfs repo gc when the watermark gets hit. I only get the exceeded message though, both periodic and on ipfs cat.

@rht
Copy link
Contributor Author

rht commented Nov 27, 2015

ic, just discovered the err: maybeGC only does gc when the storage exceeds watermark, but is below storageMax. When the storage starts beyond storageMax, it doesn't trigger.

@rht
Copy link
Contributor Author

rht commented Nov 27, 2015

I think I set it to err and not do gc because if pinned files are larger than storagemax, the gc will be repeated wastefully.

@ghost
Copy link

ghost commented Nov 27, 2015

Do you mean these lines? https://github.com/ipfs/go-ipfs/blob/20b06a4cbce8884f5b194da6e98cb11f2c77f166/core/corerepo/gc.go#L126-L134

gc.Repo.GetStorageUsage() counts all blocks though, not just the ones pinned.

The GC run won't be wasted if at least a few blocks aren't pinned. On the ipfs.io gateways for example, there are a few GB pinned blocks, and the rest just fills up over time while people request stuff.

@rht
Copy link
Contributor Author

rht commented Nov 27, 2015

No, that refers to GCPeriod?
This one https://github.com/ipfs/go-ipfs/blob/20b06a4cbce8884f5b194da6e98cb11f2c77f166/core/corerepo/gc.go#L177-L181.
No gc is run since it returns err immediately.

Then #2010 should fix this.

@ghost
Copy link

ghost commented Nov 27, 2015

Eh yes sorry I marked the wrong lines - I meant the ones in maybeGC that you linked. Let's carry on in #2010

@rht rht mentioned this pull request Dec 17, 2015
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants