docs: add a guide to running Lighthouse at scale #10511
# Running Lighthouse at Scale

Many Lighthouse users want to collect Lighthouse data for hundreds or thousands of URLs daily. First, anyone interested should understand [how variability plays into web performance measurement](./variability.md) in the lab.

There are three primary options for gathering Lighthouse data at scale.
## Option 1: Using the PSI API

The default quota of the [PageSpeed Insights API](https://developers.google.com/speed/docs/insights/v5/get-started) is 25,000 requests per day. Of course, you can't test localhost or firewalled URLs with the PSI API, unless you expose them to the web with a tunneling solution like [ngrok](https://ngrok.com/), which raises its own security concerns.
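To get a feel for it, a single HTTP request returns the full Lighthouse result. Here's a minimal sketch in Node (assuming Node 18+ for the global `fetch`; `https://example.com` is a placeholder, and in practice you'd pass your API key via the `key` query parameter):

```js
// A minimal sketch of fetching a Lighthouse result from the PSI v5 API.
// `https://example.com` is a placeholder; set `key=<YOUR_API_KEY>`
// to run against your own quota.
const endpoint = new URL('https://www.googleapis.com/pagespeedonline/v5/runPagespeed');
endpoint.searchParams.set('url', 'https://example.com');
endpoint.searchParams.set('category', 'performance');
endpoint.searchParams.set('strategy', 'mobile');

async function main() {
  const res = await fetch(endpoint);
  const data = await res.json();
  // The complete Lighthouse result lives under `lighthouseResult`.
  console.log(data.lighthouseResult.categories.performance.score);
}

main();
```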
A huge benefit of using the PSI API is that you don't need to create and maintain [a stable testing environment](./variability.md#run-on-adequate-hardware) for Lighthouse to run in. The PSI API runs Lighthouse on Google infrastructure, which offers good reproducibility.
* PRO: You don't need to maintain testing hardware.
* PRO: A simple network request returns complete Lighthouse results.
* CON: The URLs must be web-accessible.

> **Review comment:** I think the other big con here is limited control over configuration. Even if the URL is web-accessible, testing behind auth, different form factors, or throttling isn't really possible.
Approx eng effort: ~5 minutes for the first result. ~30 minutes for a script that evaluates and saves the results for hundreds of URLs.

> **Review comment:** As someone who has written multiple systems that save Lighthouse results to some sort of database, I think ~30 minutes is a massive underestimate of saving and querying LH results in any way that isn't "dump these reports to a local filesystem". A big con of options 1 & 2, and a pro of option 3, is that you don't need to do a bunch of work to consume the data. There's a great emphasis here already on creating and maintaining a test environment, which is definitely a big stumbling block, but the storage of historical data is also pretty complex and annoying to build. Being upfront about that might help folks choose the right solution for them. E.g., if you've got some big bespoke storage plan for how you consume the data on your platform, then LHCI isn't really bringing much to the table for you; but if it's something you haven't thought about at all, then you're going to be in for a lot of frustration pretty quickly with option 1 or 2 when you realize you can't do anything useful with this hunk of reports on your filesystem. WDYT about breaking the eng effort into 3 components instead of 2: "first result", "setup of larger system to collect", and "setup of larger system to query/consume"?
## Option 2: Using the Lighthouse CLI on cloud hardware

The [Lighthouse CLI](https://github.com/GoogleChrome/lighthouse#using-the-node-cli) is the foundation of most advanced uses of Lighthouse and provides considerable configuration possibilities. For example, you could launch a fresh Chrome in a debuggable state (`chrome-debug --port=9222`) and then have Lighthouse repeatedly reuse that same Chrome (`lighthouse <url> --port=9222`). That said, we wouldn't recommend this beyond a hundred or so page loads, as state can accrue in a Chrome profile. Using a fresh profile for each Lighthouse run is the best approach for reproducible results.

> **Review comment:** Should we use an example that we would endorse? :) Maybe custom headers/throttling options/puppeteer/etc?
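In that spirit, here's a minimal sketch of the fresh-profile approach, using the `lighthouse` and `chrome-launcher` npm modules rather than a shared Chrome (`https://example.com` is a placeholder):

```js
// A minimal sketch: one Lighthouse run per fresh Chrome profile.
// chrome-launcher creates a new temporary profile for every launch,
// so no state accrues between runs.
const lighthouse = require('lighthouse');
const chromeLauncher = require('chrome-launcher');

async function audit(url) {
  const chrome = await chromeLauncher.launch({chromeFlags: ['--headless']});
  try {
    const {lhr} = await lighthouse(url, {port: chrome.port, output: 'json'});
    return lhr; // the Lighthouse result object
  } finally {
    await chrome.kill();
  }
}

audit('https://example.com').then(lhr => {
  console.log(`Performance score: ${lhr.categories.performance.score}`);
});
```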
Many teams have wrapped the Lighthouse CLI with bash, Python, or Node scripts. The npm modules [multihouse](https://github.com/samdutton/multihouse) and [lighthouse-batch](https://www.npmjs.com/package/lighthouse-batch) both leverage this pattern.
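A hand-rolled wrapper can be as simple as invoking the CLI once per URL. A rough sketch in Node, assuming a globally installed `lighthouse` on your `PATH` (the URL list and output paths are placeholders):

```js
// A rough sketch: run the Lighthouse CLI serially over a list of URLs,
// writing one JSON report per URL.
const {execFileSync} = require('child_process');

const urls = ['https://example.com', 'https://example.org']; // placeholders

for (const url of urls) {
  const outPath = `${new URL(url).hostname}.report.json`;
  execFileSync('lighthouse', [
    url,
    '--output=json',
    `--output-path=${outPath}`,
    '--chrome-flags=--headless',
  ]);
  console.log(`Saved ${outPath}`);
}
```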
You'll be running the Lighthouse CLI on your own machines, and we have guidance on the [specs of machines suitable](./variability.md#run-on-adequate-hardware) for running Lighthouse without skewing performance results. The environment must also be able to run either headful or headless Chrome.
* PRO: Ultimate configurability.
* CON: Must create and maintain a testing environment.

> **Review comment:** PRO: supports on-premise and private URLs.

> **Review comment:** It would be good to have some recommendations if you want to run Lighthouse inside a Docker container.

> **Review comment:** Our general recommendation on Docker is: don't, if you can help it. Probably worth adding to this doc, @paulirish: the various issues we've seen with Docker and why it should be avoided (the shared memory issue and flags to work around it, etc.) :)

> **Review comment:** I found this on an old issue: #6162 (comment). I've been looking for documentation about "Device Class" or whether it is still being used. In my case, I will create a bucket for devices (similar to the device class) and group the results by ranges of `benchmarkIndex`.

> **Review comment:** We shelved the device class targeting because the absolute value of …
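For what it's worth, each Lighthouse result already records the host machine's `benchmarkIndex` under `lhr.environment`, so a bucketing scheme like the one described in the comment above only takes a few lines (the bucket boundaries here are arbitrary placeholders, not recommended values):

```js
// A rough sketch of grouping Lighthouse results (LHRs) by benchmarkIndex
// range. The boundaries are arbitrary placeholders, not recommended values.
function deviceBucket(lhr) {
  const bi = lhr.environment.benchmarkIndex;
  if (bi < 500) return 'slow-device';
  if (bi < 1500) return 'average-device';
  return 'fast-device';
}
```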
Approx eng effort: 1 day for the first result, after provisioning and setup. Another 2-5 days for calibrating, troubleshooting, and handling interaction with the cloud machines.
## Option 3: Gather data via a service that integrates Lighthouse

Many are listed in our readme: https://github.com/GoogleChrome/lighthouse#lighthouse-integrations-in-web-perf-services
> **Review comment:** Love that this is a prereq :)