puppeteer-prerender is a library that uses Puppeteer to fetch the pre-rendered HTML, meta tags, links, and Open Graph data of a web page, especially a single-page application (SPA).
const Prerenderer = require('puppeteer-prerender')

async function main() {
  const prerender = new Prerenderer()

  try {
    const {
      status,
      redirect,
      meta,
      openGraph,
      links,
      html,
      staticHTML
    } = await prerender.render('https://www.example.com/')
  } catch (e) {
    console.error(e)
  }

  await prerender.close()
}

main()
Creates a prerenderer instance.
Default options:
{
  // Boolean | Function. Whether to print debug logs.
  // You can provide a custom log function; it should accept the same arguments as console.log().
  debug: false,

  // Object. Options for puppeteer.launch().
  // See https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#puppeteerlaunchoptions
  puppeteerLaunchOptions: undefined,

  // Number. Maximum navigation time in milliseconds.
  timeout: 30000,

  // String. User agent to use for the page. The default value is set by the underlying Chromium.
  userAgent: undefined,

  // Boolean. Whether to follow 301/302 redirects.
  followRedirect: false,

  // Object. Extra meta tags to parse.
  extraMeta: undefined,

  // Object. Options for parse-open-graph.
  // See https://github.com/kasha-io/parse-open-graph#parsemeta-options
  parseOpenGraphOptions: undefined,

  // Array. Rules that rewrite a URL to another location.
  rewrites: undefined
}
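As a rough sketch of the Function form of debug: the logger below takes the same variadic arguments as console.log() and adds a prefix to each line. The names makeLogger and sink and the option values are illustrative assumptions, not part of the library, and the options object is presumably what gets passed to the constructor.

```javascript
// Hypothetical helper: builds a log function with the same call shape as console.log().
// `sink` is where the formatted arguments end up (console.log by default).
function makeLogger(prefix, sink = console.log) {
  return (...args) => sink(`[${prefix}]`, ...args)
}

// Options in the shape documented above; the values are examples only.
const options = {
  debug: makeLogger('prerender'),
  timeout: 10000,
  followRedirect: true
}

// Presumably used as: new Prerenderer(options)
```

Options left out of the object keep the defaults listed above.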
Extra meta tags to parse. e.g.:
{
  status: { selector: 'meta[http-equiv="Status" i]', property: 'content' },
  icon: { selector: 'link[rel~="icon"]', property: 'href' }
}
Each property name is the name of the property that will be set in the result.meta object. selector is the argument passed to document.querySelector() to select the element. property is the property of the selected element that contains the value.
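The mapping from extraMeta entries to result.meta values can be pictured with the sketch below. It only mirrors the semantics described here and is not the library's source; collectExtraMeta and fakeDoc are hypothetical names, and skipping elements that are not found is an assumption.

```javascript
// Illustrative only: `doc` is anything with a querySelector() method,
// such as the browser's document.
function collectExtraMeta(doc, extraMeta) {
  const meta = {}
  for (const [name, { selector, property }] of Object.entries(extraMeta)) {
    const el = doc.querySelector(selector)
    // Assumption for this sketch: entries whose element is missing are skipped.
    if (el) meta[name] = el[property]
  }
  return meta
}

// A stub document that only knows about the icon link.
const fakeDoc = {
  querySelector: selector =>
    selector === 'link[rel~="icon"]'
      ? { href: 'https://www.example.com/favicon.ico' }
      : null
}

const extra = collectExtraMeta(fakeDoc, {
  status: { selector: 'meta[http-equiv="Status" i]', property: 'content' },
  icon: { selector: 'link[rel~="icon"]', property: 'href' }
})
// extra is { icon: 'https://www.example.com/favicon.ico' }
```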
const result = await prerender.render('https://www.google.com/foo', {
  rewrites: [
    ['https://www.google.com/:path(.*)', 'https://www.example.com/:path'],
    ['https://www.googletagmanager.com/(.*)', ''] // block
  ]
})
The page will load from https://www.example.com/foo instead of https://www.google.com/foo, and requests to https://www.googletagmanager.com/* will be blocked.
It uses the url-rewrite module under the hood.
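To make the rewrite-or-block behaviour concrete, here is a simplified stand-in that uses plain regular expressions. It does not implement url-rewrite's ':path(.*)' pattern syntax, and applyRewrites, the rule shape, and the first-match-wins order are assumptions made for illustration.

```javascript
// Simplified rules: [RegExp, replacement] pairs instead of url-rewrite patterns.
const rules = [
  [/^https:\/\/www\.google\.com\/(.*)$/, 'https://www.example.com/$1'],
  [/^https:\/\/www\.googletagmanager\.com\/.*$/, ''] // empty target: block
]

// Returns the rewritten URL, '' when the request should be blocked,
// or the original URL when no rule matches.
// Assumption: the first matching rule wins.
function applyRewrites(url, rules) {
  for (const [pattern, target] of rules) {
    if (pattern.test(url)) return url.replace(pattern, target)
  }
  return url
}

const rewritten = applyRewrites('https://www.google.com/foo', rules)
const blocked = applyRewrites('https://www.googletagmanager.com/gtm.js', rules)
```

Here rewritten points at the example.com location, and blocked is the empty string, which a request interceptor could treat as "abort this request".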
Prerenders the page at the given url.
Returns: Promise.
These options can be overridden:
{
  timeout,
  userAgent,
  followRedirect,
  extraMeta,
  parseOpenGraphOptions,
  rewrites
}
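The expected precedence, with per-call values winning over instance defaults for the listed keys only, might look like the sketch below. resolveRenderOptions and OVERRIDABLE are hypothetical names, and the library's actual merge logic may differ.

```javascript
// Keys that the list above says can be overridden per render() call.
const OVERRIDABLE = [
  'timeout',
  'userAgent',
  'followRedirect',
  'extraMeta',
  'parseOpenGraphOptions',
  'rewrites'
]

// Illustrative merge: copies the defaults, then applies only overridable keys
// that were actually provided. Other keys (such as debug) are ignored.
function resolveRenderOptions(defaults, overrides = {}) {
  const resolved = { ...defaults }
  for (const key of OVERRIDABLE) {
    if (overrides[key] !== undefined) resolved[key] = overrides[key]
  }
  return resolved
}

const resolved = resolveRenderOptions(
  { debug: false, timeout: 30000, followRedirect: false },
  { timeout: 5000, debug: true }
)
// resolved is { debug: false, timeout: 5000, followRedirect: false }
```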
Return format:
{
  status, // HTTP status code
  redirect, // the redirect location if status is 301/302

  meta: {
    title,
    description, // <meta property="og:description"> || <meta name="description">
    image, // <meta property="og:image"> or the first <img> whose width and height are >= 300
    canonicalURL, // <link rel="canonical"> || <meta property="og:url">

    // <link rel="alternate" hreflang="de" href="https://m.example.com/?locale=de">
    locales: [
      { lang: 'de', href: 'https://m.example.com/?locale=de' },
      // ...
    ],

    // <link rel="alternate" media="only screen and (max-width: 640px)" href="https://m.example.com/">
    media: [
      { media: 'only screen and (max-width: 640px)', href: 'https://m.example.com/' },
      // ...
    ],

    author, // <meta name="author">

    // <meta property="article:tag"> || <meta name="keywords"> (split by comma)
    keywords: [
      'keyword1',
      // ...
    ]

    /*
      extraMeta values will also be set here
    */
  },

  openGraph, // Open Graph object

  // The absolute URLs of <a> tags.
  // Useful for crawling the next pages.
  links: [
    'https://www.example.com/foo?bar=1',
    // ...
  ],

  html, // page HTML
  staticHTML // static HTML (scripts removed)
}
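A crawler could consume this result roughly as follows. This is a sketch under assumptions: nextUrls is a hypothetical helper, and treating only status 200 as crawlable is a choice made for the example, not library behaviour.

```javascript
// Hypothetical helper: decides which URLs to visit next from a render() result.
function nextUrls(result, origin) {
  // A 301/302 result carries its target in `redirect`.
  if ((result.status === 301 || result.status === 302) && result.redirect) {
    return [result.redirect]
  }
  if (result.status !== 200) return []

  // `links` holds absolute URLs, so new URL() can parse them directly.
  // Keep same-origin links only and drop duplicates.
  const sameOrigin = (result.links || []).filter(
    href => new URL(href).origin === origin
  )
  return [...new Set(sameOrigin)]
}

const queue = nextUrls(
  {
    status: 200,
    links: [
      'https://www.example.com/foo?bar=1',
      'https://other.example.org/',
      'https://www.example.com/foo?bar=1'
    ]
  },
  'https://www.example.com'
)
// queue is ['https://www.example.com/foo?bar=1']
```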
The openGraph object format:
{
  og: {
    title: 'Open Graph protocol',
    type: 'website',
    url: 'http://ogp.me/',
    image: [
      {
        url: 'http://ogp.me/logo.png',
        type: 'image/png',
        width: '300',
        height: '300',
        alt: 'The Open Graph logo'
      }
    ],
    description: 'The Open Graph protocol enables any web page to become a rich object in a social graph.'
  },
  fb: {
    app_id: '115190258555800'
  }
}
See parse-open-graph for details.
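Reading a value out of this structure could look like the sketch below. pickOgImage is a hypothetical helper; it assumes og.image is an array as in the sample above, and it converts width and height with Number() because the sample stores them as strings.

```javascript
// Hypothetical helper: returns the URL of the first og:image whose width and
// height are both at least minSize, or undefined if none qualifies.
function pickOgImage(openGraph, minSize = 0) {
  const images = (openGraph && openGraph.og && openGraph.og.image) || []
  const match = images.find(
    img =>
      Number(img.width || 0) >= minSize && Number(img.height || 0) >= minSize
  )
  return match ? match.url : undefined
}

const sample = {
  og: {
    title: 'Open Graph protocol',
    image: [
      { url: 'http://ogp.me/logo.png', type: 'image/png', width: '300', height: '300' }
    ]
  }
}

const imageUrl = pickOgImage(sample, 300)
// imageUrl is 'http://ogp.me/logo.png'
```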
Closes the underlying browser.
Enables or disables debug mode.
Sets the default timeout value.
Sets the default user agent.
Sets the default value of followRedirect.
Sets the default value of extraMeta.
Sets the default value of parseOpenGraphOptions.