A printer that can print multiple web pages as one pretty PDF,
with outlines and without distractions,
so you can learn in depth
Warning
Respect the copyright please! Do not share non-public content on the Internet, especially paid content!
Playwright is used to print PDFs, similar to printing in Chrome, but with the added ability to print multiple web pages into one seamless PDF automatically.
- Fully customizable as it is a Node.js library.
- Universal compatibility with any website through plugins.
- Unique feature to replace internal website links with internal PDF links, supporting hash positioning.
- Automatically generates PDF outlines, with support for different levels and collapsed statuses.
- Easy to remove distracting elements, leaving only pure knowledge.
Warning
Web Printer is a Node.js library, not an application. If you're new to Node.js/TypeScript/JavaScript, Web Printer might be challenging to use. An app is currently being developed for general use. Please follow @pbkapp for updates.
If you're not a beginner, feel free to proceed as you would with any npm package installation.
pnpm i playwright @web-printer/core
# Web Printer uses Chrome by default. See PrinterOption.channel for other supported browsers.
# If Chrome is already installed, you can skip this step.
pnpm exec playwright install chrome
# Install the plugins you need
pnpm i @web-printer/vitepress
Then create a .ts file with the following content:
import { Printer } from "@web-printer/core"
// import plugin you have installed
import vitepress from "@web-printer/vitepress"
// Will open a browser to login if you need.
// new Printer().login(url)
new Printer()
.use(
vitepress({
url: {
Guide: "https://vuejs.org/guide/introduction.html",
API: "https://vuejs.org/api/application.html"
}
})
)
.print("Vue 3.2 Documentation")
Run it with tsx. Other runners may throw errors, and this has not been fixed yet.
If you are a novice, the following steps may be easier.
First, install pnpm (with Node.js) and VS Code (which supports TypeScript).
pnpm create printer@latest
# or complete in one step. https://github.com/ourongxing/web-printer/tree/main/packages/create-printer
pnpm create printer@latest web-printer -p vitepress -c chrome
Then follow the prompts. After customizing, run pnpm print to print. A pretty PDF will appear in ./output.
The @web-printer/core package provides a Printer object, some types, and some utilities.
import { Printer } from "@web-printer/core"
import type { Plugin, PrinterOption, PrinterPrintOption } from "@web-printer/core"
// Will open a browser to login if you need.
// new Printer().login(url)
new Printer({} as PrinterOption)
.use({} as Plugin)
.print("PDF name", {} as PrinterPrintOption)
PrinterOption extends the options of Playwright's browserType.launchPersistentContext.
{
/**
* Chromium distribution channel. Choose one that you have installed.
* @default "chrome"
* */
channel?: "chromium" | "chrome" | "chrome-beta" | "chrome-dev" | "chrome-canary" | "msedge" | "msedge-beta" | "msedge-dev" | "msedge-canary"
/**
* Directory for Chrome user data. Using your system Chrome user data directory is not recommended.
* @default "./userData"
*/
userDataDir?: string
/**
* Output directory for PDFs.
* @default "./output"
*/
outputDir?: string
/**
* Number of threads used for printing. More threads speed up printing.
* @default 1
*/
threads?: number
}
PrinterPrintOption extends the options of Playwright's page.pdf().
{
/**
* Used for the outline. If set, Printer can fetch the titles within a page and add them to the outline.
* @default 0, which means subtitles are not added to the outline.
*/
subTitleOutline?: number
/**
* Make a test print: only two pages are printed, and "test: " is added to the name.
* @default false
*/
test?: boolean
/**
* Filter the pages you want to print.
*/
filter?: PageFilter
/**
* Reverse the printing order.
* If the outline has multiple levels, the outline may get mixed up.
*/
reverse?: boolean
/**
* A local cover pdf path.
* You may also use it to merge an existing PDF, but outlines cannot be merged.
*/
coverPath?: string
/**
* Inject additional CSS.
*/
style?: string | (false | undefined | string)[]
/**
* Set the top and bottom margins of all pages except the first page of each article to zero.
* @default false
*/
continuous?: boolean
/**
* Replace website links with PDF links.
* @default false
*/
replaceLink?: boolean
/**
* Add page numbers to the bottom center of the page.
* @default false
* @requires PrinterPrintOption.continuous = false
*/
addPageNumber?: boolean
/**
* Margins of each page
* @default
* {
* top: 60,
* right: 55,
* bottom: 60,
* left: 55,
* }
*/
margin?: {
/**
* @default 60
*/
top?: string | number
/**
* @default 55
*/
right?: string | number
/**
* @default 60
*/
bottom?: string | number
/**
* @default 55
*/
left?: string | number
}
/**
* Paper format. If set, takes priority over `width` or `height` options.
* @default "A4"
*/
format?: "A0" | "A1" | "A2" | "A3" | "A4" | "A5" | "Legal" | "Letter" | "Tabloid"
}
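The style option accepts either a string or an array that may contain false and undefined, which makes conditional entries easy to write inline. The sketch below shows one way such a value could be collapsed into a single CSS string. normalizeStyle, hideImages and printStyle are hypothetical names used for illustration; this is not Web Printer's internal code.

```typescript
// Mirrors the `style` type from PrinterPrintOption above.
type StyleOption = string | (false | undefined | string)[]

// Hypothetical helper (not part of @web-printer/core):
// joins the usable entries into one CSS string.
function normalizeStyle(style?: StyleOption): string {
  if (style === undefined) return ""
  const parts = typeof style === "string" ? [style] : style
  // Drop `false` and `undefined`, which come from conditions that did not match.
  return parts.filter((part): part is string => typeof part === "string").join("\n")
}

// Hypothetical flag: when it is false, the second entry evaluates to `false`
// and is skipped.
const hideImages: boolean = false
const printStyle: StyleOption = [
  "body { font-size: 14px; }",
  hideImages && "img { display: none; }"
]

console.log(normalizeStyle(printStyle))
```

An array like printStyle can be passed as the style field of the second argument to print().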
Plugins in Web Printer are only used to adapt to different websites.
A plugin has five methods:
- fetchPagesInfo: Fetches a list of page URLs and titles, and must return that list.
- injectStyle: Removes distracting elements and makes web pages more PDF-friendly.
- onPageLoaded: Runs after the page has loaded.
- onPageWillPrint: Runs just before the page is printed.
- otherParams: Holds other useful params.
- Content Site
- Amazing Blog
- Documentation Site Generator
In fact, a plugin just uses Playwright to inject JS and CSS into the page. You can read the code of the official plugins to learn how to write one. It's pretty simple most of the time.
Let's make some rules
- Use a function to return a plugin.
- The function parameter is an options object.
- If there are many pages to fetch and fetching is slow, provide a maxPages option, especially for sites with endless loading.
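These rules can be sketched as follows. The types are simplified local stand-ins for those exported by @web-printer/core, and mySite, MySiteOptions, the "article" selector and the hard-coded page list are all hypothetical. A real plugin would use the Playwright context to read the site's sidebar.

```typescript
// Simplified local stand-ins for the types exported by @web-printer/core,
// declared here only so the sketch is self-contained.
type MaybePromise<T> = T | Promise<T>
interface PageInfoWithoutIndex {
  url: string
  title: string
}
interface SketchPlugin {
  fetchPagesInfo: (params: { context: unknown }) => MaybePromise<PageInfoWithoutIndex[]>
  injectStyle?: (params: { url: string }) => MaybePromise<{ contentSelector?: string }>
}

// Rule 2: the function parameter is an options object (hypothetical fields).
interface MySiteOptions {
  url: string
  // Rule 3: lets the caller cap how many pages are fetched.
  maxPages?: number
}

// Rule 1: a function returns the plugin.
function mySite(options: MySiteOptions): SketchPlugin {
  const { url, maxPages = Infinity } = options
  return {
    fetchPagesInfo() {
      // A fixed list stands in for parsing the sidebar with Playwright.
      const pages = [1, 2, 3, 4, 5].map(n => ({
        url: `${url}/page-${n}`,
        title: `Page ${n}`
      }))
      // Keep at most maxPages entries.
      return pages.slice(0, maxPages)
    },
    injectStyle() {
      // Hypothetical selector for the main content of the site.
      return { contentSelector: "article" }
    }
  }
}
```

With the real types, such a factory would be passed to Printer's use() method in the same way as the vitepress plugin shown earlier.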
fetchPagesInfo fetches a list of page URLs and titles and must return that list. It usually needs to parse the sidebar outline. Web Printer can restore the hierarchy and collapsed state of the original outline.
type fetchPagesInfo = (params: {context: BrowserContext}) => MaybePromise<PageInfoWithoutIndex[]>
interface PageInfoWithoutIndex {
url: string
title: string
/**
* Outer ... Inner
*/
groups?: (
| {
name: string
collapsed?: boolean
}
| string
)[]
/**
* When this item is a group but have a link and content.
*/
selfGroup?: boolean
collapsed?: boolean
}
The returned page info should look like this:
// https://javascript.info/
[
{
title: "Manuals and specifications",
url: "https://javascript.info/manuals-specifications",
groups: [
{
name: "The JavaScript language"
},
{
name: "An introduction"
}
]
},
...
]
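As a rough illustration of producing this shape, the sketch below flattens a nested sidebar into page info entries whose groups run from outer to inner. SidebarNode and flattenSidebar are hypothetical names, and the sketch assumes group nodes carry no link of their own (the selfGroup case is not handled).

```typescript
// Subset of the PageInfoWithoutIndex interface shown above.
interface PageInfoWithoutIndex {
  url: string
  title: string
  groups?: ({ name: string; collapsed?: boolean } | string)[]
}

// Hypothetical shape of a sidebar after it has been parsed from the page.
interface SidebarNode {
  title: string
  url?: string
  children?: SidebarNode[]
}

// Walk the tree depth-first, carrying the names of the enclosing groups.
function flattenSidebar(nodes: SidebarNode[], parents: string[] = []): PageInfoWithoutIndex[] {
  const pages: PageInfoWithoutIndex[] = []
  for (const node of nodes) {
    if (node.children) {
      // A group: recurse with this title appended as the innermost group.
      pages.push(...flattenSidebar(node.children, [...parents, node.title]))
    } else if (node.url) {
      // A leaf page: record its groups from outer to inner.
      pages.push({
        title: node.title,
        url: node.url,
        groups: parents.map(name => ({ name }))
      })
    }
  }
  return pages
}

const sidebar: SidebarNode[] = [
  {
    title: "The JavaScript language",
    children: [
      {
        title: "An introduction",
        children: [
          {
            title: "Manuals and specifications",
            url: "https://javascript.info/manuals-specifications"
          }
        ]
      }
    ]
  }
]

console.log(JSON.stringify(flattenSidebar(sidebar), null, 2))
```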
Examples
- simple outline: javascript-info/src/index.ts
- complex outline: mdbook/src/index.ts
- scroll loading: juejin/src/index.ts
- pagination: zhihu/src/index.ts
injectStyle removes distracting elements and makes web pages more PDF-friendly.
type injectStyle = (params: { url: string; printOption: PrinterPrintOption }) => MaybePromise<{
style?: string
contentSelector?: string
titleSelector?: string
avoidBreakSelector?: string
}>
Let's make some rules:
- Hide all elements but content.
- Set the margin of the content element and its ancestor elements to zero.
Therefore, everyone can set the same margin for any website.
Don't worry, it's easy. You only need to provide a contentSelector, which may be a selector list. Web Printer can then automatically hide every other element and set the margins of the content element and its ancestors to zero.
Not every website works this way, though. Sometimes you still need to write CSS yourself; in that case, return it in the style property.
When you set PrinterPrintOption.continuous to true, Web Printer sets the top and bottom margins of all pages to zero.
The titleSelector marks the title element, which is the only element that gets a top margin. It defaults to contentSelector if contentSelector is not empty; if contentSelector contains a comma, Printer uses the first selector. If titleSelector and contentSelector are both empty, the default is body, but setting a top margin on body sometimes results in extra white space.
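The default for titleSelector can be written out as a small function. This is only a reading of the rules in this section, not Web Printer's actual code. resolveTitleSelector is a hypothetical name, and splitting on a comma is a simplification that would mishandle commas nested inside selectors such as :is(a, b).

```typescript
// Hypothetical helper that follows the fallback order described in the text.
function resolveTitleSelector(titleSelector?: string, contentSelector?: string): string {
  // 1. An explicit titleSelector wins.
  if (titleSelector && titleSelector.trim()) return titleSelector.trim()
  // 2. Otherwise fall back to contentSelector, using only its first selector.
  if (contentSelector && contentSelector.trim()) {
    // Naive split: does not account for commas nested inside a selector.
    return contentSelector.split(",")[0].trim()
  }
  // 3. With neither given, the title element is the body.
  return "body"
}

console.log(resolveTitleSelector("h1", "article"))
console.log(resolveTitleSelector(undefined, "article, .content"))
console.log(resolveTitleSelector())
```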
The avoidBreakSelector is used to avoid page breaks inside certain elements. The default value is pre,blockquote,tbody tr.
onPageLoaded runs after the page has loaded. It is usually used to wait for images to load, especially lazy-loaded images.
type onPageLoaded = (params: { page: Page; pageInfo: PageInfo; printOption: PrinterPrintOption }) => MaybePromise<void>
Web Printer provides two functions to handle image loading:
- evaluateWaitForImgLoad(page: Page, imgSelector = "img"): Promise<void>
- evaluateWaitForImgLoadLazy(page: Page, imgSelector = "img", waitingTime = 200): Promise<void>
onPageWillPrint runs just before the page is printed.
type onPageWillPrint = (params: { page: Page; pageInfo: PageInfo; printOption: PrinterPrintOption }) => MaybePromise<void>
otherParams holds other useful params.
type otherParams = (params: { page: Page; pageInfo: PageInfo; printOption: PrinterPrintOption }) => MaybePromise<{
hashIDSelector: string
}>
Some sites, such as Wikipedia, use a hash id to jump to a specific element. If you provide hashIDSelector and PrinterPrintOption.replaceLink is true, Printer can replace the hash of a URL with the corresponding PDF position. The default value is h2[id],h3[id],h4[id],h5[id].
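To make the idea concrete, here is a minimal sketch of what a first step of link replacement might look like: splitting a link into its page URL and hash id, and checking whether the page is among those being printed. It is an assumption about how this could work, not Web Printer's implementation. resolveInternalLink, PdfLinkTarget and printedUrls are hypothetical names.

```typescript
// Hypothetical description of where a link should point inside the PDF.
interface PdfLinkTarget {
  pageUrl: string
  // Id of the element to jump to, e.g. one matched by h2[id].
  hashId?: string
}

// Returns a target for links to printed pages, or undefined for external links.
function resolveInternalLink(href: string, printedUrls: Set<string>): PdfLinkTarget | undefined {
  const hashIndex = href.indexOf("#")
  // Everything before "#" identifies the page; everything after is the id.
  const pageUrl = hashIndex === -1 ? href : href.slice(0, hashIndex)
  const hashId = hashIndex === -1 ? "" : href.slice(hashIndex + 1)
  // Links to pages that are not being printed stay ordinary web links.
  if (!printedUrls.has(pageUrl)) return undefined
  return hashId ? { pageUrl, hashId } : { pageUrl }
}

const printedUrls = new Set(["https://en.wikipedia.org/wiki/PDF"])
console.log(resolveInternalLink("https://en.wikipedia.org/wiki/PDF#History", printedUrls))
console.log(resolveInternalLink("https://example.com/", printedUrls))
```

Mapping the resulting hashId to an actual position in the PDF is the part that depends on the elements matched by hashIDSelector, and is left out of this sketch.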
PDFs generated by Web Printer may need to be compressed manually to reduce their size.
MIT ©