Add some way to get the actual storage size of an object #5910
Comments
ipfs object stat
@da2x I believe I found a way to replicate this using already existing stats. I took your example hash and added up the block sizes of all unique recursive references, which resulted in:
Adding up the data sizes of the unique recursive references landed at:
The "expected" ref size is the size you indicate in this issue. Here's my somewhat less hacky solution, which I'm using in our package that interfaces with IPFS; unfortunately I couldn't get the built-in
// DedupAndCalculatePinSize sums the data sizes of an object's unique references for a more accurate pin size cost
func (im *IpfsManager) DedupAndCalculatePinSize(hash string) (int64, error) {
	// format a multiaddr API address to connect to (nodeAPIAddr is expected as "ip:port")
	parsedIP := strings.Split(im.nodeAPIAddr, ":")
	if len(parsedIP) != 2 {
		return 0, fmt.Errorf("unexpected node API address %q", im.nodeAPIAddr)
	}
	multiAddrIP := fmt.Sprintf("/ip4/%s/tcp/%s", parsedIP[0], parsedIP[1])
	// list every unique block referenced (recursively) by the object
	outBytes, err := exec.Command("ipfs", fmt.Sprintf("--api=%s", multiAddrIP), "refs", "--recursive", "--unique", hash).Output()
	if err != nil {
		return 0, err
	}
	// the command prints one reference per line
	scanner := bufio.NewScanner(strings.NewReader(string(outBytes)))
	var refsArray []string
	for scanner.Scan() {
		refsArray = append(refsArray, scanner.Text())
	}
	if err := scanner.Err(); err != nil {
		return 0, err
	}
	// add up the data size of each unique reference
	var calculatedRefSize int64
	for _, ref := range refsArray {
		refStats, err := im.Stat(ref)
		if err != nil {
			return 0, err
		}
		calculatedRefSize += int64(refStats.DataSize)
	}
	return calculatedRefSize, nil
}
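For anyone who wants to try the summation step without a running node, here is a minimal sketch of the same idea with the size lookup injected as a function. SizeFunc and SumUniqueRefSizes are hypothetical names for illustration, not part of go-ipfs or the package above, and the refs and sizes in main are made up. It assumes the input is newline-separated refs, one per line, as printed by ipfs refs --recursive --unique.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// SizeFunc returns the size in bytes of the block identified by ref.
// Hypothetical hook: in practice it could wrap a stat call against a node.
type SizeFunc func(ref string) (int64, error)

// SumUniqueRefSizes parses newline-separated refs and sums sizeOf over
// each distinct ref, skipping blank lines and repeated refs.
func SumUniqueRefSizes(refsOutput string, sizeOf SizeFunc) (int64, error) {
	seen := make(map[string]struct{})
	var total int64
	scanner := bufio.NewScanner(strings.NewReader(refsOutput))
	for scanner.Scan() {
		ref := strings.TrimSpace(scanner.Text())
		if ref == "" {
			continue
		}
		if _, dup := seen[ref]; dup {
			continue
		}
		seen[ref] = struct{}{}
		size, err := sizeOf(ref)
		if err != nil {
			return 0, fmt.Errorf("stat %s: %w", ref, err)
		}
		total += size
	}
	if err := scanner.Err(); err != nil {
		return 0, err
	}
	return total, nil
}

func main() {
	// Made-up refs and sizes, for illustration only.
	sizes := map[string]int64{"QmA": 100, "QmB": 250}
	sizeOf := func(ref string) (int64, error) {
		size, ok := sizes[ref]
		if !ok {
			return 0, fmt.Errorf("unknown ref")
		}
		return size, nil
	}
	total, err := SumUniqueRefSizes("QmA\nQmB\nQmA\n", sizeOf)
	fmt.Println(total, err) // 350 <nil>
}
```

Keeping the lookup behind a function makes the summation testable with fake data, and the seen set guards against repeated refs even if the input was not produced with --unique.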
My example command was slightly off. I updated it above and its behaviour now matches your Go implementation.
Woot! I believe my Go implementation will work regardless of chunk size.
So, "CumulativeSize" is calculated based on metadata recorded in the object itself. We'd probably want something like:
I'd vote for the latter, actually. See: #3955.
There is an
Version information:
go-ipfs version: 0.4.18-
Repo version: 7
System version: amd64/linux
Golang version: go1.11.1
Type:
enhancement
Description:
ipfs object stat QmPS6VssQGyBYjGQSK8ordvXaU1yUoaUmTfmrV7daLeRPH outputs CumulativeSize: 36709305. However, when an object consists mostly of duplicated data (as the example hash does), the actual block size required to store it in a repository is only 15729480 bytes (42.8% of the CumulativeSize).
It would be extremely useful to have another field in the output of ipfs object stat for the actual recursive deduplicated storage block size of an object (the amount of space required in the repository to store the object).
The following command is the only way I’ve found to get this number:
(The above method may not be accurate, although it appears to work at least for synthetic test objects.)
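To illustrate why the two numbers can diverge, here is a toy model in Go. It is not go-ipfs's actual data structure or accounting code; Node, CumulativeSize, and DedupSize are hypothetical names, and the sizes are made up. The cumulative figure counts a child once for every link that points to it, while the deduplicated figure counts each distinct node once.

```go
package main

import "fmt"

// Node is a toy stand-in for a DAG object: its own size plus links to children.
type Node struct {
	ID    string
	Size  int64
	Links []*Node
}

// CumulativeSize adds a child's full size once per link,
// so a child linked three times is counted three times.
func CumulativeSize(n *Node) int64 {
	total := n.Size
	for _, child := range n.Links {
		total += CumulativeSize(child)
	}
	return total
}

// DedupSize counts each distinct node (by ID) exactly once.
func DedupSize(root *Node) int64 {
	seen := make(map[string]bool)
	var walk func(n *Node) int64
	walk = func(n *Node) int64 {
		if seen[n.ID] {
			return 0
		}
		seen[n.ID] = true
		total := n.Size
		for _, child := range n.Links {
			total += walk(child)
		}
		return total
	}
	return walk(root)
}

func main() {
	// One 1000-byte chunk linked three times from a 50-byte root.
	chunk := &Node{ID: "chunk", Size: 1000}
	root := &Node{ID: "root", Size: 50, Links: []*Node{chunk, chunk, chunk}}
	fmt.Println(CumulativeSize(root)) // 3050
	fmt.Println(DedupSize(root))      // 1050
}
```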
This would be very useful to pinning services, all of which currently charge their customers for the CumulativeSize of pinned objects rather than the actual storage space used, and so overcharge them. The IPFS community would benefit from more accurate storage accounting and cheaper pin storage services. Ideally, pinning services would create per-customer objects and charge each customer for the deduplicated storage space across all their pins.
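As a rough sketch of what per-customer accounting could look like, the function below sums block sizes over the union of blocks reachable from all of a customer's pins, so a block shared by several pins is billed once. BilledSize and its inputs are hypothetical; it assumes the service has already collected, for each pin, the list of block refs it covers and the size of each block.

```go
package main

import "fmt"

// BilledSize returns the total bytes for the union of blocks across all pins.
// pins maps a pin root to the block refs reachable from it (root included);
// sizes maps a block ref to its size in bytes. Both are assumed inputs.
func BilledSize(pins map[string][]string, sizes map[string]int64) (int64, error) {
	seen := make(map[string]struct{})
	var total int64
	for root, refs := range pins {
		for _, ref := range refs {
			if _, dup := seen[ref]; dup {
				continue // already billed via another pin
			}
			size, ok := sizes[ref]
			if !ok {
				return 0, fmt.Errorf("pin %s: no size recorded for block %s", root, ref)
			}
			seen[ref] = struct{}{}
			total += size
		}
	}
	return total, nil
}

func main() {
	// Made-up data: two pins sharing block b2.
	pins := map[string][]string{
		"pinA": {"b1", "b2"},
		"pinB": {"b2", "b3"},
	}
	sizes := map[string]int64{"b1": 10, "b2": 20, "b3": 30}
	total, err := BilledSize(pins, sizes)
	fmt.Println(total, err) // 60 <nil>
}
```

Billing each pin separately would charge 30 + 50 = 80 bytes for the same data in this example.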
What follows are instructions for recreating a test object in case it gets garbage collected or otherwise lost in time. It’s not really relevant for the issue itself except explaining the test object.