
Add streaming JSON parser and writer #2593

Merged: 2 commits, merged Jun 29, 2017
Conversation

@paulirish (Member) commented Jun 24, 2017

Empty your mind

be formless,

shapeless

like water

🚰 🚰 🚰


fixes #1798, #1685

Combo effort with bckenz: @brendankenny did the streaming writer a bit ago, and we got the parser up and running together. Good stuff.

@brendankenny (Member) left a review:

so good

    const delegateMock = {
      loadingProgress: _ => {},
      loadingStarted: _ => {},
      loadingComplete: bool => {
@brendankenny (Member):

maybe success instead of bool?


    const WebInspector = require('../web-inspector');

    class TraceParser {
@brendankenny (Member) commented Jun 24, 2017:

add a top-level description that mentions it uses TimelineLoader to parse trace JSON streaming in as chunks?
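
A rough sketch of the kind of header being asked for (wording is illustrative; the description that actually landed is quoted further down in this review):

    /**
     * Parses trace JSON as it streams in, using DevTools' TimelineLoader to
     * handle each chunk as it arrives rather than buffering the whole string.
     */
    class TraceParser {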

    }

    /**
     * Returns entire trace
@brendankenny (Member):

    @return {{traceEvents: !Array<!TraceEvent>}}


    it('correctly streams a trace to disk', () => {
      traceFilename = 'test-trace-0.json';
      return assetSaver.saveTrace({traceEvents}, traceFilename)
@brendankenny (Member):

this test could just be absorbed into the test above, since that one already saves and verifies this same file

    function saveTrace(traceData, traceFilename) {
      return new Promise((resolve, reject) => {
        const traceIter = traceJsonGenerator(traceData);
        const traceStream = new stream.Readable({
@brendankenny (Member):

add a comment to describe what's happening
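
For context, a minimal sketch of the pattern the comment would describe, assuming traceIter is the iterator returned by traceJsonGenerator(traceData):

    const stream = require('stream');

    // Wrap the string-chunk generator in a Readable stream: each read()
    // pulls the next serialized chunk from the iterator, and pushing
    // null once the iterator is exhausted signals end-of-stream.
    const traceStream = new stream.Readable({
      read() {
        const next = traceIter.next();
        this.push(next.done ? null : next.value);
      }
    });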

    @@ -0,0 +1,64 @@
    /**
@brendankenny (Member):

since saveTrace is going to stay in asset-saver.js to keep it out of the lighthouse-background bundle, should this file just be trace-parser.js?

    const stripOuterBrackets = str => str.replace(/^\[/, '').replace(/\]$/, '');
    const partialEventsStr = events => stripOuterBrackets(JSON.stringify(events));

    const traceEventsStr = partialEventsStr(events.slice(0, events.length - 2)) + ',';
@brendankenny (Member):

maybe add a comment explaining here that this is pulling out a string of trace events to repeatedly feed in? That way you don't have to read down to the "just keep reading" comment below to understand why this is here.
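
For illustration (made-up values), the helpers above turn an array of events into a bare comma-separated run that the test can concatenate into one giant synthetic trace:

    // JSON.stringify([{a: 1}, {b: 2}]) === '[{"a":1},{"b":2}]'
    // partialEventsStr([{a: 1}, {b: 2}]) === '{"a":1},{"b":2}'
    // Appending ',' lets the test feed that run into the parser repeatedly.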

      assert.ok(streamedTrace.traceEvents.length > 300 * 1000, 'not >300,000 trace events');
      // big trace is ~30X larger than the original
      assert.ok(streamedTrace.traceEvents.length > events.length * 5, 'way more trace events');
    });
@brendankenny (Member):

maybe check that streamedTrace.traceEvents[events.length-2] is the same as events[0] (I believe those indices are right :). That checks that the test is feeding in the events correctly and that the parser is correctly parsing into at least the second chunk fed in.
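
i.e., something along the lines of:

    // the first event of the second chunk fed in should equal the
    // first original event (indices per the comment above)
    assert.deepStrictEqual(streamedTrace.traceEvents[events.length - 2], events[0]);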

@brendankenny (Member) commented:

oh, we should probably document that this (currently) doesn't load the metadata property when a trace streams in. We might want that at some point, but it sounds like TimelineLoader might support it by the time we do.

@paulirish (Member, Author) commented:

ptal

@paulirish (Member, Author) commented:

lgtm on the writer.

@brendankenny (Member) left a review:

two nits and parser LGTM!


    /**
     * Traces > 256MB hit limits in V8, so TraceParser will parse the trace events stream as its
     * received. We use DevTools' TimelineLoader for the heavy lifting, as it has a fast trace-specific
@brendankenny (Member):

"it's received"

    const readTrace = require(filename);

    assert.equal(streamedTrace.traceEvents.length, readTrace.traceEvents.length);
    assert.deepStrictEqual(streamedTrace.traceEvents[0], readTrace.traceEvents[0]);
@brendankenny (Member):

don't need the traceEvents[0] check now that the whole array is checked below

@patrickhulce (Collaborator) left a review:

sweet!

yield '"traceEvents": [\n';
if (traceData.traceEvents.length > 0) {
const eventsIterator = traceData.traceEvents[Symbol.iterator]();
// Emit first item manually to avoid a trailing comma.
@patrickhulce (Collaborator):

such a shame, when will we get json2017?
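
For anyone following along, the comma dance being lamented looks roughly like this (a sketch of the emit-first-then-prefix approach, not the exact PR code):

    function* traceJsonGenerator(traceData) {
      yield '{\n"traceEvents": [\n';
      const events = traceData.traceEvents[Symbol.iterator]();
      const first = events.next();
      if (!first.done) {
        // Emit the first event with no leading comma...
        yield JSON.stringify(first.value);
        // ...then prefix every later event with ',\n' so the array
        // never ends with a dangling comma.
        for (const event of events) {
          yield ',\n' + JSON.stringify(event);
        }
      }
      yield ']\n}\n';
    }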

      }
    });

    const writeStream = fs.createWriteStream(traceFilename);
@patrickhulce (Collaborator):

it's not that big a deal, but we could also just build a buffer to keep this all sync and have saveAssets call fs.writeFileSync

@brendankenny (Member) replied Jun 26, 2017:

interesting idea, through Buffer.concat? We could check it out, though assembling an entire giant trace in memory makes me nervous since of course the original copy of the giant trace is also taking up memory :)

We could also do a fs.writeSync loop, but it didn't seem to really matter since saveAssets is already async due to screenshots.

(and in theory the stream from fs.createWriteStream is optimized by having an internal buffer to write to disk in decent-sized chunks, though in practice it seems to flush after every write)
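
For comparison, the fully synchronous alternative being weighed would look something like this sketch; the catch, as noted, is that it builds a second full copy of the serialized trace in memory:

    const fs = require('fs');

    const chunks = [];
    for (const chunk of traceJsonGenerator(traceData)) {
      chunks.push(Buffer.from(chunk));
    }
    // Buffer.concat allocates the entire serialized trace at once,
    // which is risky when the trace object itself is already huge.
    fs.writeFileSync(traceFilename, Buffer.concat(chunks));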

    // assert.ok(streamedTrace.traceEvents.length > 400 * 1000);
    assert.ok(bytesRead > 256 * 1024 * 1024, `${bytesRead} bytes read`);
    // if > 256 MB are read we should have ~480,000 trace events
    assert.ok(streamedTrace.traceEvents.length > 300 * 1000, 'not >300,000 trace events');
@patrickhulce (Collaborator):

feels like we could nail this down to a particular number to be aware of any unexpected changes?

@paulirish (Member, Author) replied:

sure. done.
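
i.e., pinning the assertion to an exact count rather than a lower bound (expectedEventCount is a placeholder for whatever the deterministic feed produces):

    assert.equal(streamedTrace.traceEvents.length, expectedEventCount);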

@brendankenny (Member) commented:

@patrickhulce you're probably the best to take a last look so we're not just approving each other's work :)

@patrickhulce (Collaborator) left a review:

lgtm!

    });

    it('correctly saves a trace with multiple extra properties to disk', () => {
      const trace = {
@patrickhulce (Collaborator):

maybe add a test to make sure we can save a trace that's over 256MB?

@brendankenny (Member) replied:

I was nervous about saving a 256MB file on travis/appveyor, but I can try it out

@brendankenny (Member) commented:

trying it out :)
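
The shape of such a test, sketched (hugeTrace stands in for a synthetic trace whose serialized form exceeds 256MB; assetSaver and traceFilename come from the test's existing setup; not the exact code that landed):

    it('streams a trace over 256MB to disk', () => {
      return assetSaver.saveTrace(hugeTrace, traceFilename).then(() => {
        const {size} = fs.statSync(traceFilename);
        assert.ok(size > 256 * 1024 * 1024, `only ${size} bytes written`);
      });
    });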

@brendankenny (Member) commented:

all the CIs are happy, so let's keep an eye on the 256MB write test, but seems fine for now
