Command line tool for crawling web pages with PageGraph.
To build the tool, you need the TypeScript compiler (tsc) installed.
npm install
npm run build
npm run test
The tests are defined in test/test.js. Test parameters are defined in test/config.js and can be overridden via environment variables. You need to specify a PageGraph binary path.
Since PageGraph is built as part of Brave Nightly, you can point the binary path at your local Brave Nightly installation.
npm run crawl -- \
-b /Applications/Brave\ Browser\ Nightly.app/Contents/MacOS/Brave\ Browser\ Nightly \
-u https://brave.com \
-t 5 \
-o output/ \
--debug debug
The -t option specifies how many seconds to crawl the URL provided in -u, using the PageGraph binary in -b.
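If you want to drive the crawler from a script, for instance to crawl several URLs in a row, the invocation above can be assembled programmatically. The following is a minimal sketch, not part of this tool: the names CrawlOptions, buildCrawlArgs, and buildCrawlArgsForUrls are hypothetical helpers introduced for illustration, and the only flags used are the -b, -u, -t, and -o options shown above.

```typescript
// Hypothetical option bag; each field maps to one flag from the example above.
interface CrawlOptions {
  binaryPath: string; // value passed to -b (PageGraph-enabled browser binary)
  url: string;        // value passed to -u
  seconds: number;    // value passed to -t
  outputDir: string;  // value passed to -o
}

// Builds the argument list for `npm run crawl -- ...`.
// Returning an array (rather than one command string) means paths containing
// spaces do not need backslash escaping when no shell is involved.
function buildCrawlArgs(opts: CrawlOptions): string[] {
  return [
    "run", "crawl", "--",
    "-b", opts.binaryPath,
    "-u", opts.url,
    "-t", String(opts.seconds),
    "-o", opts.outputDir,
  ];
}

// Produces one argument list per URL, sharing the remaining settings.
function buildCrawlArgsForUrls(
  urls: string[],
  shared: Omit<CrawlOptions, "url">,
): string[][] {
  return urls.map((url) => buildCrawlArgs({ ...shared, url }));
}
```

Each resulting array can be handed to a process launcher with npm as the command, for example Node's child_process.spawn, running the crawls one after another.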
You can see all supported options:
npm run crawl -- -h
NOTE: PageGraph currently does not track puppeteer / automation scripts, and so modifying or interacting with the document through devtools/puppeteer while recording a PageGraph file will fail.