Fetch the pre-rendered content, meta and Open Graph of a SPA

jiangfengming/puppeteer-prerender


puppeteer-prerender

puppeteer-prerender is a library that uses Puppeteer to fetch the pre-rendered HTML, meta, links, and Open Graph data of a webpage, especially a Single-Page Application (SPA).

Usage

const Prerenderer = require('puppeteer-prerender')

async function main() {
  const prerender = new Prerenderer()

  try {
    const {
      status,
      redirect,
      meta,
      openGraph,
      links,
      html,
      staticHTML
    } = await prerender.render('https://www.example.com/')
  } catch (e) {
    console.error(e)
  }

  await prerender.close()
}

main()

APIs

new Prerenderer(options)

Creates a prerenderer instance.

Default options:

{
  // Boolean | Function. Whether to print debug logs.
  // You can provide a custom log function. It should accept the same arguments as console.log().
  debug: false,

  // Object. Options for puppeteer.launch().
  // see https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#puppeteerlaunchoptions
  puppeteerLaunchOptions: undefined,

  // Number. Maximum navigation time in milliseconds.
  timeout: 30000,

  // String. Specific user agent to use in this page. The default value is set by the underlying Chromium.
  userAgent: undefined,

  // Boolean. Whether to follow 301/302 redirects.
  followRedirect: false,

  // Object. Extra meta tags to parse.
  extraMeta: undefined,
  
  // Object. Options for parse-open-graph.
  // see https://github.com/kasha-io/parse-open-graph#parsemeta-options
  parseOpenGraphOptions: undefined,
  
  // Array. Rewrite URL to another location.
  rewrites: undefined
}
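
The debug option accepts a function in place of a boolean. As a minimal sketch, assuming the function receives the same arguments as console.log() (as the comment above states), a custom logger might prefix each message before forwarding it. The createDebugLogger helper and its sink parameter are hypothetical names, not part of the library:

```javascript
// Hypothetical helper: builds a console.log-compatible function that
// prefixes every message. `sink` defaults to console.log.
function createDebugLogger(prefix, sink = console.log) {
  return (...args) => sink(`[${prefix}]`, ...args)
}

// Demonstration with an in-memory sink so the output can be inspected.
const lines = []
const log = createDebugLogger('prerender', (...args) => lines.push(args.join(' ')))
log('navigating to', 'https://www.example.com/')
```

The resulting function could then be passed as new Prerenderer({ debug: createDebugLogger('prerender') }).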

extraMeta

Extra meta tags to parse. e.g.:

{
  status: { selector: 'meta[http-equiv="Status" i]', property: 'content' },
  icon: { selector: 'link[rel~="icon"]', property: 'href' }
}

Each key is the name of the property that will be set on the result.meta object. selector is passed to document.querySelector() to select the element, and property is the name of the property on the selected element that holds the value.
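
To make the selector/property pairing concrete, here is a rough sketch of how such a descriptor map could be resolved against a document. It is an illustration only, not the library's actual implementation; readExtraMeta and fakeDocument are hypothetical names, and fakeDocument is a stand-in for the browser's document so the sketch runs in plain Node.js:

```javascript
// Sketch only: resolves each { selector, property } descriptor against a
// document-like object and collects the values under the descriptor's key.
function readExtraMeta(doc, extraMeta) {
  const result = {}
  for (const [name, { selector, property }] of Object.entries(extraMeta)) {
    const el = doc.querySelector(selector)
    // Skip descriptors whose selector matches nothing.
    if (el) result[name] = el[property]
  }
  return result
}

// A tiny stand-in for `document`: it maps selector strings to fake elements.
const fakeDocument = {
  elements: {
    'meta[http-equiv="Status" i]': { content: '404' },
    'link[rel~="icon"]': { href: 'https://www.example.com/favicon.ico' }
  },
  querySelector(selector) {
    return this.elements[selector] || null
  }
}

const extraMeta = {
  status: { selector: 'meta[http-equiv="Status" i]', property: 'content' },
  icon: { selector: 'link[rel~="icon"]', property: 'href' }
}

const meta = readExtraMeta(fakeDocument, extraMeta)
```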

rewrites

const result = await prerender.render('https://www.google.com/foo', {
  rewrites: [
    ['https://www.google.com/:path(.*)', 'https://www.example.com/:path'],
    ['https://www.googletagmanager.com/(.*)', ''] // block
  ]
})

The page will load from https://www.example.com/foo instead of https://www.google.com/foo, and requests to https://www.googletagmanager.com/* will be blocked.

Rewriting is handled by the url-rewrite module.
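
When several third-party hosts need to be blocked, the rule list can be generated rather than written by hand. The sketch below assumes, based on the example above, that a rule with an empty target blocks matching requests. blockHosts is a hypothetical helper, and the second host name is only an illustration:

```javascript
// Hypothetical helper: builds [pattern, ''] rules in the same shape as the
// blocking rule shown above, one per host.
function blockHosts(hosts) {
  return hosts.map(host => [`https://${host}/(.*)`, ''])
}

const rewrites = [
  ['https://www.google.com/:path(.*)', 'https://www.example.com/:path'],
  ...blockHosts(['www.googletagmanager.com', 'www.google-analytics.com'])
]
```

The resulting array can be passed as the rewrites option.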

prerenderer.render(url, options)

Prerenders the page of the given url.

Returns: Promise.

These options can be overridden per call:

{
  timeout,
  userAgent,
  followRedirect,
  extraMeta,
  parseOpenGraphOptions,
  rewrites
}
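
The override semantics can be pictured as a per-key merge of the instance defaults with the options passed to render(). This is a sketch under the assumption that an omitted (undefined) option falls back to the instance default; resolveRenderOptions is a hypothetical name, and the library's internal logic may differ:

```javascript
// Keys that render() accepts as per-call overrides, per the list above.
const OVERRIDABLE = [
  'timeout',
  'userAgent',
  'followRedirect',
  'extraMeta',
  'parseOpenGraphOptions',
  'rewrites'
]

// Sketch only: per-call values win; undefined falls back to the default.
// Keys outside OVERRIDABLE (such as debug) are ignored.
function resolveRenderOptions(defaults, overrides = {}) {
  const resolved = {}
  for (const key of OVERRIDABLE) {
    resolved[key] = overrides[key] !== undefined ? overrides[key] : defaults[key]
  }
  return resolved
}

const resolved = resolveRenderOptions(
  { timeout: 30000, followRedirect: false },
  { timeout: 5000, debug: true }
)
```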

Return format:

{
  status, // HTTP status code
  redirect, // the redirect location if status is 301/302

  meta: {
    title,
    description, // <meta property="og:description"> || <meta name="description">
    image, // <meta property="og:image"> or the first <img> whose width and height are both >= 300
    canonicalURL, // <link rel="canonical"> || <meta property="og:url">

    // <link rel="alternate" hreflang="de" href="https://m.example.com/?locale=de">
    locales: [
      { lang: 'de', href: 'https://m.example.com/?locale=de' },
      // ...
    ],

    // <link rel="alternate" media="only screen and (max-width: 640px)" href="https://m.example.com/">
    media: [
      { media: 'only screen and (max-width: 640px)', href: 'https://m.example.com/' },
      // ...
    ],

    author, // <meta name="author">

    // <meta property="article:tag"> || <meta name="keywords"> (split by comma)
    keywords: [
      'keyword1',
      // ...
    ]

    /*
      Values from extraMeta are also set here.
    */
  },

  openGraph, // Open Graph object

  // The absolute URLs of <a> tags.
  // Useful for crawling the next pages.
  links: [
    'https://www.example.com/foo?bar=1',
    // ...
  ],

  html, // page HTML
  staticHTML // static HTML (scripts removed)
}
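
As an illustration of consuming this result in a crawler, the sketch below picks the next URLs to visit: the redirect target on a 301/302, otherwise the de-duplicated same-origin links. nextUrls and the sample object are hypothetical; only the status, redirect, and links fields come from the format above:

```javascript
// Sketch: decide what to crawl next from a render() result.
function nextUrls(result, origin) {
  // On a 301/302, the only thing worth following is the redirect target.
  if ((result.status === 301 || result.status === 302) && result.redirect) {
    return [result.redirect]
  }
  // Otherwise keep same-origin links, without duplicates.
  const sameOrigin = result.links.filter(link => new URL(link).origin === origin)
  return [...new Set(sameOrigin)]
}

// A hand-written result object standing in for a real render() result.
const sample = {
  status: 200,
  links: [
    'https://www.example.com/foo?bar=1',
    'https://www.example.com/foo?bar=1',
    'https://other.example.org/'
  ]
}

const queue = nextUrls(sample, 'https://www.example.com')
```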

The openGraph object format:

{
  og: {
    title: 'Open Graph protocol',
    type: 'website',
    url: 'http://ogp.me/',
    image: [
      {
        url: 'http://ogp.me/logo.png',
        type: 'image/png',
        width: '300',
        height: '300',
        alt: 'The Open Graph logo'
      },
    ],
    description: 'The Open Graph protocol enables any web page to become a rich object in a social graph.'
  },
  fb: {
    app_id: '115190258555800'
  }
}

See parse-open-graph for details.
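
A small sketch of reading from this structure, assuming og.image is an array of objects with a url field as in the example above. firstOgImage is a hypothetical helper that returns undefined when no image is present:

```javascript
// Sketch: return the URL of the first Open Graph image, if any.
function firstOgImage(openGraph) {
  const images = openGraph && openGraph.og && openGraph.og.image
  return Array.isArray(images) && images.length > 0 ? images[0].url : undefined
}

const imageUrl = firstOgImage({
  og: {
    title: 'Open Graph protocol',
    image: [{ url: 'http://ogp.me/logo.png', type: 'image/png' }]
  }
})
```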

prerenderer.close()

Closes the underlying browser.
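
Because an unclosed browser keeps the Node.js process alive, it can help to guarantee that close() runs even when rendering throws. The wrapper below is a sketch with a hypothetical name (withPrerenderer); it relies only on the instance having a close() method:

```javascript
// Hypothetical wrapper: runs `task` with the prerenderer and always closes
// it afterwards, whether the task resolves or throws.
async function withPrerenderer(prerenderer, task) {
  try {
    return await task(prerenderer)
  } finally {
    await prerenderer.close()
  }
}
```

Assumed usage: await withPrerenderer(new Prerenderer(), p => p.render('https://www.example.com/')).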

prerenderer.debug

Enables or disables debug mode.

prerenderer.timeout

Sets the default timeout value.

prerenderer.userAgent

Sets the default user agent.

prerenderer.followRedirect

Sets the default value of followRedirect.

prerenderer.extraMeta

Sets the default value of extraMeta.

prerenderer.parseOpenGraphOptions

Sets the default value of parseOpenGraphOptions.

License

MIT
