Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow analysis of generated bundle and dependencies #1858

Closed
valscion opened this issue Mar 19, 2017 · 29 comments
Closed

Allow analysis of generated bundle and dependencies #1858

valscion opened this issue Mar 19, 2017 · 29 comments
Milestone

Comments

@valscion
Copy link
Contributor

valscion commented Mar 19, 2017

This issue is a continuation from the Twitter thread discussing usage of webpack-bundle-analyzer with CRA.

webpack-bundle-analyzer can be used either as a CLI, providing it the stats.json webpack JSON output and the built files. With these, we are able to:

  1. Show the sizes of each module as they are on disk
  2. Show the sizes of each module after minifcation
  3. Show the sizes of each module after minification and gzip

Here's an example I built from a project using create-react-app after first having to eject.

⏩ report.html.zip ⏪

The changes I had to do after ejecting are in this commit: valscion/arvontakone@fbbc306

EDIT: Here's a GIF of that visualization:

webpack-bundle-analyzer-example

@valscion
Copy link
Contributor Author

Ideally this wouldn't require any changes to the CRA code, but I don't think that is possible. We'd need to at least get our hands on the stats.json that webpack can output if ran with the --json flag.

Are we able to come up with some way of gaining access to the module stats from the outside, or would it fall under the category of providing configuration and being excluded from CRA?

@franklinjavier
Copy link

I highly recommend the Bundle Buddy tool from @samccone

https://twitter.com/franklinjavier/status/931324987654443008?s=17

@gaearon
Copy link
Contributor

gaearon commented Jan 8, 2018

What's the latest thinking on this? Which tools work best?

@valscion
Copy link
Contributor Author

valscion commented Jan 9, 2018

I'm biased, naturally, but still think webpack-bundle-analyzer would be the best. I'm still contemplating about the best way to handle outputting stats.json from the build -- this issue webpack-contrib/webpack-bundle-analyzer#119 makes me a bit unsure how much work we'd need to do in CRA to get the stats.json out without issues.

I haven't forgotten about this issue here, though. When I've had time with webpack-bundle-amalyzer, I've worked on the next major version PR webpack-contrib/webpack-bundle-analyzer#97

@gaearon
Copy link
Contributor

gaearon commented Jan 9, 2018

OK, let me know if you need something on our side (as long as it doesn't negatively affect build perf).

@valscion
Copy link
Contributor Author

valscion commented Jan 9, 2018

Sure, will do ☺️

@joshwcomeau
Copy link
Contributor

A "lighter touch" could be to just have an option to output JSON, and then that JSON can be handled however the user sees fit (I prefer to use Chris Bateman's webpack visualizer).

I imagine an addition to package.json scripts like this:

  "scripts": {
    "start": "react-scripts start",
    "build": "react-scripts build",
    "build-stats": "react-scripts buildStats",
    "test": "react-scripts test --env=jsdom",
    "eject": "react-scripts eject"
  }

build-stats will just create a stats.json in the root dir, which they can then use however they see fit. For the recommended webpack-bundle-analyzer plugin, this can be done through their CLI utility, by adding another script (in userland) to the package.json:

  "scripts": {
    "start": "react-scripts start",
    "build": "react-scripts build",
    "build-stats": "react-scripts buildStats",
    "test": "react-scripts test --env=jsdom",
    "eject": "react-scripts eject",
+    "analyze": "webpack-bundle-analyzer stats.json"
  }

I'm happy to work on this as a feature if folks agree this is the direction to go!

@valscion
Copy link
Contributor Author

Yeah the stats.json is indeed what webpack-bundle-analyzer would also need. I think we've agreed that is a good direction. This point — "as long as it doesn't negatively affect build perf" — is the crucial one that might need some manual fiddling.

Feel free to work on this @joshwcomeau, it would be amazing ☺️

@valscion
Copy link
Contributor Author

@joshwcomeau you probably want to be aware of this: webpack-contrib/webpack-bundle-analyzer#119

Basically, if a bundle is large enough, JSON.stringify on the stats object coming from webpack can crash the build for an out-of-memory error. Most likely this would also affect webpack CLI itself when ran with --json flag, as they seem to be using plain JSON.stringify, too: https://github.com/webpack/webpack/blob/30329fb3648537ddc801b71f14904a3e762d4ad4/bin/webpack.js#L370-L371

I haven't reproduced this error myself, but if you find something, I'd love to hear about it.

@joshwcomeau
Copy link
Contributor

This point — "as long as it doesn't negatively affect build perf" — is the crucial one that might need some manual fiddling.

Right, so I was proposing we don't do it as part of the main build task; I have a hard time imagining it not affecting build perf, since it's non-trivial extra work being done. By having it as a separate task (eg. build-stats or something), you can run that task as-needed. In my experience, this isn't something that needs to be done with every build, it's more something you do occasionally when you notice your bundle getting bloated. As a bonus, that separate task might be even faster? Although for that, I'm really not sure, don't know enough (or, really, anything) about Webpack internals.

@joshwcomeau you probably want to be aware of this: webpack-contrib/webpack-bundle-analyzer#119

Ah, interesting! I've never run into that myself. Do you think it's something we need to be concerned about? It seems like a pretty edge-case thing (otherwise I'm sure I would have run into it by now). My instinct is that we proceed anyway, and hope that a fix is implemented within webpack shortly.

I'll spend some time tomorrow looking at this.

@trevorwhealy
Copy link

trevorwhealy commented Jan 14, 2018

I'm in favor of the separate entry point npm script proposal by @joshwcomeau, #1858 (comment) - the webpack visualizer they're referring to and webpack-bundle-analyzer alike are both great because of their interactivity. I actually made a workaround with a similar approach #3518 that can be used without ejecting.

I don't think we need to be that concerned about build performance here because, in my experience, this is not information most users will find very useful on every recompile. Usually, I check these tools every few days to make sure that the bundle size hasn't grown much larger than expected, due to a recently added dependency.

@valscion
Copy link
Contributor Author

Ah, interesting! I've never run into that myself. Do you think it's something we need to be concerned about? It seems like a pretty edge-case thing (otherwise I'm sure I would have run into it by now).

We should care, one way or the other. If we go the direction of adding a new react-scripts task (like the buildStats you proposed), then this is not crucial — building the project normally would still work. We could tell about this edge case in the documentation.

However, if we were to create the stats.json file on react-scripts build, that would change things. The error coming from JSON.stringify is not rescuable, so we couldn't just slap a try { } rescue block around the call, as the JSON.stringify call will crash the node process itself. It would be awful if some users of create-react-app with huge applications would no longer be able to build their app at all.

Nevertheless, both approaches could benefit if we'd change JSON.stringify to some other code supporting streaming JSON stringification. Mainly, it could be much faster if the stringification of a huge object would be chunked. https://github.com/philbooth/bfj package seems like a nice solution.

My instinct is that we proceed anyway, and hope that a fix is implemented within webpack shortly.

Hmm I'm not so sure about whether webpack would fix it in any short time period. If the stats.toJson method would start crashing, then I'd bet they would do something. But if it is only the stringifying part, then it seems like more of a power user feature to be able generate a big JSON file.

@joshwcomeau
Copy link
Contributor

joshwcomeau commented Jan 15, 2018

Did some digging, I have some questions.

It turns out the Node API for webpack supplies the stats object as part of the build process.

I still need to figure out exactly how to get the stats object to match the options when using the CLI, but one thing's for sure: we will need to JSON.stringify it so we can write it to file, which means that performance irrespective, I think it should be a flag to the build script.

For the bug being discussed about stringifying blowing up on huge JSON blobs, I imagine there's some kind of way to stream it? But

(I did do some perf checks and it seems like it'll take around 20ms to stringify and write to file, on a 2014 iMac, with a small project. I imagine it could be quite a bit longer on a larger project, but given this is an opt-in thing that won't run automatically, I think that's ok?)

So yeah, I think the best approach would be to have a CLI flag for the build script, since it's essentially the same script. For an ejected project:

"scripts": {
    "start": "node scripts/start.js",
    "build": "node scripts/build.js",
    "build:with-stats": "node scripts/build.js --stats",
    "test": "node scripts/test.js --env=jsdom"
}

I can use the convention established in the test script, and just use process.argv to get whether or not the flag was provided. And then I can use that to optionally do something like this:

const resolveArgs = {
  stats,
  previousFileSizes,
  warnings: messages.warnings,
};

if (buildStats) {
  fs.writeFile('bundle-stats.json', JSON.stringify(stats.toJson()), err => {
    if (err) {
      reject(err);
    }

    return resolve(resolveArgs);
  });
} else {
  return resolve(resolveArgs);
}

Lemme know if this sounds good (or what ought to be changed) and I'll get a PR submitted!

EDIT: Ah, just saw the comment about bfj. Yeah, that works as well! I misunderstood originally and thought the stringification happened within Webpack. Since we have control over it, we may as well use something like bjf.

@gaearon
Copy link
Contributor

gaearon commented Jan 15, 2018

Something with streaming probably sounds good to me

@valscion
Copy link
Contributor Author

As far as I'm concerned, your thoughts seem good to me, @joshwcomeau 👍

It might be useful to write the bundle-stats.json to the same directory as the built files (was it dist/? It's been a while I ran CRA.), as that will let you only use one argument to the webpack-bundle-analyzer CLI.

The CLI will by default read the bundled files from the same folder as the stats JSON file is in — if the stats JSON file is in a different directory than the bundle contents, CLI will need you to supply the bundle directory as the second argument.

@joshwcomeau
Copy link
Contributor

Sorry for the delay in this. It's a particularly busy week for me, traveling for work. Haven't had the time I expected to work on this! Will try and pick it up over the coming weekend.

It might be useful to write the bundle-stats.json to the same directory as the built files (was it dist/? It's been a while I ran CRA.), as that will let you only use one argument to the webpack-bundle-analyzer CLI.

Hmm... so my concern with that is that I think of the /build directory as the shippable production assets. My deploy script, for example, just copies that directory to the server. It doesn't fit, semantically, with the rest of the directory's contents, IMO.

I suspect it would cause more confusion as folks try and figure out where the stats file is written. Because of how I view the /build directory, I don't think it'd occur to me to check it for the stats file. Feels surprising to me, and I'm not sure if the benefit of being able to omit an arg for webpack-bundle-analyzer is worth the trade-off.

Maybe that's just me, though? Curious to get someone else's take on it, if /build is an intuitive place for the stats JSON to be written.

@valscion
Copy link
Contributor Author

I think you have valid points. We should then figure out how we can help users get the stats reported correctly. I think documenting this will be key :). Users will have to remember to use both CLI arguments with CRA.

webpack-bundle-analyzer ./build-stats.json ./build

@trevorwhealy
Copy link

trevorwhealy commented Jan 17, 2018

Curious to get someone else's take on it, if /build is an intuitive place for the stats JSON to be written.

@joshwcomeau I understand not wanting to include the stats file in the build folder because it's not a production asset, but the /build folder is the first place I would look for stats.json, considering it's an output of the build process.

@samccone
Copy link
Contributor

samccone commented Jan 18, 2018

My 2 cents when it comes to exporting these types of perf/stats from build pipelines

TL;DR

Add the ability for users to output full source included sourcemaps – making it trivial to use any sourcemap compatible bundle analyzer.


less tl;dr

My current thinking is that the webpack stats object only gives a small subset of the entire picture and is coupled to the a specific build pipeline :/ which encourages hyper specific tooling that is not portable across other bundle analysis pipelines.

What is the most common denominator when it comes to looking and understanding bundles then? I would say it is source-maps.

The nice thing about sourcemaps with embedded source (debug sourcemaps in webpack) is that the sourcemaps are a portable bundle representation that do a pretty excellent job of capturing the entire picture of what is in a bundle or bundles for a given application.

From a user's perspective what kind of questions arise when they are looking at a bundle or sets of bundles?

Based on my user research these are the questions that continually pop up.

  • What is in my bundle?
  • What is the largest thing in my bundle?
  • Is there duplication in my bundle?
  • Is there duplication across multiple bundles?
  • Within lib X that is a dep of my bundle what is eating up all the space?
  • Why is lib X making it into my bundle.

Luckily of all these questions can be answered via source maps introspection except for (why is lib X making it into my bundle).

For this reason I think that if CRA wants to make it easy for users to understand their bundle without having to eject their webpack config, outputting debug source maps in dev mode by default and optionally outputting debug source maps for a prod build is a solid approach that will be maximally is portable across tools.

As always thanks for your time and everyone's thoughts.

@andriijas
Copy link
Contributor

I think its really valuable to get a better understanding of bundling by looking at the output. If #3145 makes it into 2.0 it will be even more useful.

Generating the stats.json seems to be a fair way so that the user can pick the tool of flavour for them self.

@valscion are you up for a PR? I can help with docs.

@valscion
Copy link
Contributor Author

@valscion are you up for a PR? I can help with docs.

Which PR are you talking about? I think @joshwcomeau might be the one you should be asking here ☺️

@andriijas
Copy link
Contributor

@valscion Even i could manage to get the json file out of webpack going but you seemed to have knowledge about some concerns regarding it. Anyway I will appreciate and be thankful for anyone who wants to make an effort!

@joshwcomeau
Copy link
Contributor

joshwcomeau commented Jan 23, 2018

Interesting, @samccone! I've never heard of sourcemaps being used for anything other than understanding stack traces in dev.

Of the points you listed, are there any that stats.json can't answer? Of the webpack-specific analyzers/visualizers I've seen, none have tackled finding duplication across bundles, although it feels like it really ought to be possible.

I suppose from my perspective, it's about balancing the benefit of a tool-agnostic method vs. supporting what users actually use. I've been meaning to try Bundle Buddy for a while, but I'd also hate to lose the tool I already use, Webpack Visualizer. I'd wager that Webpack Bundle Analyzer is the most popular tool used by folks, and it feels like an incomplete solution if we don't support that. Unless we think tool-agnosticism is important enough to override that?

My personal sentiment is that we should support the most common pattern, since ejecting can handle more advanced/niche usecases. Just based on NPM downloads it looks like webpack-bundle-analyzer is leading the pack atm. Can always revisit in the future if that changes (or support both?)


For the location of stats.json in /build, my preference is weakly-held. If folks think it makes more sense to output it in /build, I'm alright with that!


My life continues to be busier than I expect. I spent some time this morning playing with this; I still need to test everything, but I think I should have a PR up in the next day or two. That said, I don't want to make any promises I can't keep! If anyone's getting impatient, don't let me be a blocker on this.

@samccone
Copy link
Contributor

Based on my comment above I put a design document for understanding bundles via sourcemaps
https://docs.google.com/document/d/1ycGVBJmwIVs34yhC0oTqv_WH5f0fs2SAFmzyTiBK99k/edit

your comments are appreciated!

--

, none have tackled finding duplication across bundles, although it feels like it really ought to be possible.

It is possible, bundle-buddy does just that https://bundle-buddy.firebaseapp.com/

but I'd also hate to lose the tool I already use, Webpack Visualizer

There is actually no reason that I know of why webpack vis can not gather its info from the sourcemaps vs stats.json :)

@gaearon
Copy link
Contributor

gaearon commented Jan 23, 2018

Add the ability for users to output full source included sourcemaps – making it trivial to use any sourcemap compatible bundle analyzer.

Don't we already do that?

@joshwcomeau
Copy link
Contributor

joshwcomeau commented Jan 27, 2018

Cool doc, @samccone!

It is possible, bundle-buddy does just that https://bundle-buddy.firebaseapp.com/

I know :) I mentioned it earlier. I meant specifically tools that use Webpack's stats.json

There is actually no reason that I know of why webpack vis can not gather its info from the sourcemaps vs stats.json :)

For sure, but "it's possible" and "it exists" are two very different things. I'd like to support the tools as they are now, not as they could be. Unless I've misunderstood, and you're saying that these tools already support it?


I think I'm gonna submit my PR as-is, with support for stats.json. Important to stress that I'm not saying that it's a better solution than sourcemaps! It's just the one that I believe solves the greatest user need as it is now (based on npm download numbers).

If someone else wants to add support for sourcemaps, I think it'd be pretty straightforward! I'd imagine we just need to find how they're being generated in dev, hook into it in prod, and write them to the build dir. But yeah, I do think we want both, and I don't have the time to tack on another task. Happy to coordinate with someone else on that though.

@joshwcomeau
Copy link
Contributor

Oh, there's totally already support for sourcemaps, at least according to the README template. Yay solving for both usecases!

joshwcomeau added a commit to joshwcomeau/create-react-app that referenced this issue Jan 28, 2018
As discussed in facebook#1858, we'd like to make it easy for users to view
bundle stats without ejecting.

This PR adds a new script that, along with building the site for
production, creates a `build-stats.json` file. This file can be
used with tools like Webpack Bundle Analyzer or Webpack Visualizer,
to help the user optimize their bundles.

Conflicts:
	package.json
	packages/react-scripts/package.json
@joshwcomeau
Copy link
Contributor

If anyone from earlier in the discussion wants to take a gander at the code, PR is here: #3945

@valscion
Copy link
Contributor Author

valscion commented Jun 4, 2018

Yassss #3945 was merged so this issue can be closed, I think! 🎉

💯 👍 work @joshwcomeau

@valscion valscion closed this as completed Jun 4, 2018
@Timer Timer modified the milestones: 2.x, 2.0.0 Oct 1, 2018
@lock lock bot locked and limited conversation to collaborators Jan 11, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

8 participants