feat: dag import --stats (#8237) #8237
@rvagg is the idea here basically to implement |
No. This implements a single "end-summary" output as the very very last printout, no intermediate progress updates.
Old response looks like this:
> ipfs dag import foo.car
Pinned root QmbWqxBEKC3P8tqsKc98xmWNzrzDtRLMiMPL8wBuTGsMnR success
Behind the scenes, JSON (HTTP API at /api/v0/dag/import) looks like this:
> ipfs dag import foo.car --enc=json | jq
{
"Root": {
"Cid": {
"/": "QmbWqxBEKC3P8tqsKc98xmWNzrzDtRLMiMPL8wBuTGsMnR"
},
"PinErrorMsg": ""
}
}
While it is pretty safe to add new fields to JSON, changing the very first line printed as the text output could break people's scripts 🙈 👀
TODO before merge
We discussed this during triage today and it looks good, but we need to tweak it a bit:
- we want a Stats struct with Stats.BlockCount next to the Root so we can add other metrics in the future, e.g.:
  - payloadBytesCount, suggested in feat: dag import --stats (#8237) #8237 (comment)
  - time import took
- print Stats only when --stats is passed (making this a backward-compatible opt-in feature)
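A minimal sketch of the shape this TODO asks for, assuming one JSON event per root followed by an optional stats event. The names `CarImportOutput`, `CidLink` and `encodeEvents` are hypothetical and used only for illustration; `RootMeta`, `Stats` and `BlockCount` are taken from this thread, but the JSON tags and field layout are assumptions rather than the merged implementation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CidLink is a hypothetical stand-in that renders a CID as {"/": "..."}.
type CidLink struct {
	Link string `json:"/"`
}

// RootMeta mirrors the existing per-root output (Cid plus PinErrorMsg).
type RootMeta struct {
	Cid         CidLink
	PinErrorMsg string
}

// CarImportStats holds import metrics; more fields can be added later.
type CarImportStats struct {
	BlockCount uint64
}

// CarImportOutput is a hypothetical event: either Root or Stats is set.
// omitempty keeps the default (no --stats) JSON identical to the old output.
type CarImportOutput struct {
	Root  *RootMeta       `json:",omitempty"`
	Stats *CarImportStats `json:",omitempty"`
}

// encodeEvents emits root events first and the stats event last, and only
// when withStats is true, so the opt-in never changes the leading output.
func encodeEvents(roots []RootMeta, blockCount uint64, withStats bool) ([]string, error) {
	var out []string
	for i := range roots {
		b, err := json.Marshal(CarImportOutput{Root: &roots[i]})
		if err != nil {
			return nil, err
		}
		out = append(out, string(b))
	}
	if withStats {
		b, err := json.Marshal(CarImportOutput{Stats: &CarImportStats{BlockCount: blockCount}})
		if err != nil {
			return nil, err
		}
		out = append(out, string(b))
	}
	return out, nil
}

func main() {
	roots := []RootMeta{{Cid: CidLink{Link: "QmbWqxBEKC3P8tqsKc98xmWNzrzDtRLMiMPL8wBuTGsMnR"}}}
	lines, err := encodeEvents(roots, 383, true)
	if err != nil {
		panic(err)
	}
	for _, l := range lines {
		fmt.Println(l)
	}
}
```

With `withStats` set to false the function returns only the root events, which is what keeps existing scripts and JSON consumers working.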
@rvagg : are you able to do the open TODOs?
@BigLep I'll put it on my personal backlog. Having it behind a flag makes me less enthusiastic, though I understand the reasoning.
@gammazero : given your tribute next week, can you please incorporate @lidel's comments and get this over the line?
This applies to both text and json output encodings.
- Stats data is now contained within a Stats datastructure
- Stats are printed after root so that first line of output is the same as previously, even when stats are output using --stats
Done:
|
would Approve if this wasn't my PR, but 👌 this is sweet, thanks so much @gammazero
I added I did not add an elapsed time stat because I do not know what is the desired format for that duration value (seconds, milliseconds, microseconds, |
Sounds good, perfectly fine to keep stats limited to info around imported blocks.
See comment below.
core/commands/dag/dag.go
@@ -53,9 +54,15 @@ type ResolveOutput struct {
	RemPath string
}

type CarImportStats struct {
	BlockCount        uint64
	PayloadBytesCount uint64
Payload can be confusing here: is this raw blocks (CAR-metadata) or actual data in each block (CAR-metadata-dagmetadata)?
Perhaps renaming it to BlockBytesCount and setting this to sum of nd.Stat().BlockSize() is a way to remove confusion while keeping this useful no matter what codecs are used inside of blocks?
Renamed PayloadBytesCount to BlockBytesCount.
It happens that nd.Size() and nd.Stat().BlockSize return the same value, so I think it is better to use nd.Size(), given the comment here.
Hm.. if they are the same, it makes sense, but something feels off when I compare Size imported from CAR with value reported by ipfs dag stat:
$ ipfs dag stat bafybeihcyruaeza7uyjd6ugicbcrqumejf6uf353e5etdkhotqffwtguva
Size: 27676801, NumBlocks: 383
$ ipfs dag export bafybeihcyruaeza7uyjd6ugicbcrqumejf6uf353e5etdkhotqffwtguva > test.car
0s 26.41 MiB / ? [--------------------------------------------------------------------------------=-----------------------] 390.25 MiB/s 0s
$ ipfs dag import --stats test.car
Pinned root bafybeihcyruaeza7uyjd6ugicbcrqumejf6uf353e5etdkhotqffwtguva success
Imported 383 blocks (125832269 bytes)
125832269 bytes is ~125 MB which is way more than 26MB
@lidel What I did previously (using nd.Size()) worked for the car files in sharness/t0054-dag-car-import-export-data/ but does not work for your example. Using nd.Stat().DataSize + nd.Stat().LinksSize works for your example, but not for the test cars/dags. The test cars/dags have stats with all zeros, for almost all blocks.
It appears that nd.Size() returns nd.Stat().CumulativeSize if a block has stat values. Otherwise, nd.Size() is set to len(nd.RawData()). This makes both nd.Size() and nd.Stat() completely unreliable across different dags.
Apparently, the way to get a reliable size is to always use len(nd.RawData()). So, that is what the latest change does.
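The counting approach described above can be sketched as follows. This is a self-contained illustration, not the code from the PR: `rawBlock`, `memBlock`, `importStats` and `tally` are hypothetical names, and `rawBlock` is a local stand-in for whatever block or node type exposes `RawData()`.

```go
package main

import "fmt"

// rawBlock is a stand-in interface (an assumption for this sketch) for any
// block or node type that exposes its encoded bytes via RawData().
type rawBlock interface {
	RawData() []byte
}

// memBlock is a trivial in-memory implementation used only for the demo.
type memBlock struct{ data []byte }

func (b memBlock) RawData() []byte { return b.data }

// importStats uses the field names discussed in this thread.
type importStats struct {
	BlockCount      uint64
	BlockBytesCount uint64
}

// tally counts every processed block and sums len(RawData()), which does not
// depend on codec-specific stat fields such as CumulativeSize.
func tally(blocks []rawBlock) importStats {
	var s importStats
	for _, b := range blocks {
		s.BlockCount++
		s.BlockBytesCount += uint64(len(b.RawData()))
	}
	return s
}

func main() {
	blocks := []rawBlock{
		memBlock{[]byte("hello")},
		memBlock{[]byte("ipld")},
		memBlock{nil}, // an empty block still counts as one block
	}
	s := tally(blocks)
	fmt.Printf("Imported %d blocks (%d bytes)\n", s.BlockCount, s.BlockBytesCount)
}
```

Because the sum is taken over raw encoded bytes, the result is the same whether or not a codec populates per-node stat values.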
DataSize and LinksSize sound like a dag-pb thing, which probably won't be represented in the fixture files.
What stat are we actually after here? @ribasushi, what's your expectation of what this sizing is going to report? I would think that it's the size of the output, which includes CAR header, CID lengths and even varint section size prefixes. But I could find that by measuring the size of the output myself, so the utility doesn't seem great. But what is the utility of reporting just the block sizes? What is that useful for?
I would expect the ipfs dag stat number. It is useful in terms of "this is the amount of IPLD-data these blocks hold"
thanks @ribasushi, len(nd.RawData()) is probably the one then, but in this case maybe just use len(block.RawData()), then you get to use the block as it comes out of the CAR rather than whatever happens to it through a Decode cycle (probably the same, but seems safer to use the one closer to the original)
Remaining thing: #8237 (comment)
I hope you don't mind @gammazero but pushed the
|
basic regression tests for the default output (text and json)
LGTM.
We tested everything with --stats so I've added basic regression tests for dag import without --stats (default output).
This should be ready for merge.
dag import
* feat: report block count on `dag import`
* fix: clean-up dag import message format
* Only print stats when --stats flag is passed
  This applies to both text and json output encoding.
  - Stats data is now contained within a Stats datastructure
  - Stats are printed after root so that first line of output is the same as previously, even when stats are output using --stats
* fix sharness test
* Add PayloadBytesCount to stats
* Attempt to stabilize flaky tests
* Rename PayloadBytesCount to BlockBytesCount
* Correctly calculate size or imported dag
* Use RawSize of original block for import bytes calc
* test: dag import without --stats
  basic regression tests for the default output (text and json)
Co-authored-by: gammazero <gammazero@users.noreply.github.com>
Co-authored-by: Marcin Rataj <lidel@lidel.org>
(cherry picked from commit 0057199)
I'm just taking a stab at this and am unsure if I'm following the right pattern here so feedback would be appreciated.
Currently dag import only reports pinned CIDs and an error on pinning. If you set pin-roots=false then you get nothing. I'm working on the JS version of this, including the HTTP client, and I'd really like to get a bit more information out of this process. The minimum I think is simply a report of how many blocks were imported (although in the current form it's not counting unique blocks, just total processed blocks).
So we end up with this slightly awkward event: {BlockCount:uint64, Root:*RootMeta}, where the first one is expected to have Root==nil and BlockCount telling you how many blocks, and any remaining should have Root!=nil and report pin status if you asked for pinning.
Is there a better pattern to follow than this? Can I make two separate event types? Can I clean up the struct a bit somehow or maybe use two separate structs rather than overloading one?