Puppeteer Scraper with Node.js, Express, and TypeScript server

Crawl any website concurrently with Puppeteer and serve the collected data from a server app built with Node.js, Express, and TypeScript.

What can it do?

Crawl a whole website or a specific page
Save the data in a JSON file
Save screenshots of the website
Serve the data using a server app with Node.js, Express, and TypeScript
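
To illustrate how a concurrent, depth-limited crawl of this kind can work, here is a minimal sketch. It is not the repository's code: the names `crawl`, `Visit`, and `PageResult`, the `concurrency` parameter, and the convention that the start page is depth 0 are all assumptions made for the example. The page loader is passed in as a function so the sketch runs without a browser; in a real crawler that function would wrap Puppeteer.

```typescript
// Hypothetical shape of one crawled page; not taken from the repository.
interface PageResult {
  url: string;
  depth: number;
  links: string[];
}

// A visitor loads one URL and resolves to the links found on that page.
// In a real crawler this would wrap Puppeteer (open a page, collect hrefs).
type Visit = (url: string) => Promise<string[]>;

// Breadth-first crawl: handle one depth level at a time, with at most
// `concurrency` pages in flight, and never visit the same URL twice.
// Assumption: the start page counts as depth 0.
async function crawl(
  startUrl: string,
  maxDepth: number,
  concurrency: number,
  visit: Visit,
): Promise<PageResult[]> {
  const seen = new Set<string>([startUrl]);
  const results: PageResult[] = [];
  let frontier: string[] = [startUrl];

  for (let depth = 0; depth <= maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    // Split the current level into batches of `concurrency` URLs.
    for (let i = 0; i < frontier.length; i += concurrency) {
      const batch = frontier.slice(i, i + concurrency);
      const linkLists = await Promise.all(batch.map((url) => visit(url)));
      batch.forEach((url, j) => {
        const links = linkLists[j] ?? [];
        results.push({ url, depth, links });
        // Queue unseen links for the next depth level.
        for (const link of links) {
          if (!seen.has(link)) {
            seen.add(link);
            next.push(link);
          }
        }
      });
    }
    frontier = next;
  }
  return results;
}
```

The returned array is plain data, so it could be written to a JSON file with `JSON.stringify(results, null, 2)` and Node's `fs` module.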

How to use?

Clone the repository
Run npm install to install the dependencies
Run npm start to start the server
Edit src/crawler/crawler.ts to change the crawler settings to your needs
Make a POST request to http://localhost:3000/start-crawl. Example request with curl:

curl -X POST -H "Content-Type: application/json" -d '{
  "userId": "yourUserId",
  "serviceId": "yourServiceId",
  "startUrl": "https://typesense.org/docs/",
  "maxDepth": 3,
  "takeScreenshot": true,
  "colorScheme": "light"
}' http://localhost:3000/start-crawl

Profit!
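
The JSON body in the curl example above can be described with a type and checked before a crawl starts. The sketch below is an illustration only: the field names come from that example, but the `CrawlRequest` type, the `parseCrawlRequest` function, the validation rules, and the assumption that `colorScheme` accepts `"light"` or `"dark"` are hypothetical and not taken from the repository.

```typescript
// Field names come from the curl example; the types and allowed values
// are assumptions inferred from that single example.
interface CrawlRequest {
  userId: string;
  serviceId: string;
  startUrl: string;
  maxDepth: number;
  takeScreenshot: boolean;
  colorScheme: "light" | "dark";
}

// Validate an already-parsed JSON body and return it as a typed object.
// Throws an Error describing the first invalid field it finds.
function parseCrawlRequest(body: unknown): CrawlRequest {
  if (typeof body !== "object" || body === null) {
    throw new Error("body must be a JSON object");
  }
  const { userId, serviceId, startUrl, maxDepth, takeScreenshot, colorScheme } =
    body as Record<string, unknown>;

  if (typeof userId !== "string" || userId === "") {
    throw new Error("userId must be a non-empty string");
  }
  if (typeof serviceId !== "string" || serviceId === "") {
    throw new Error("serviceId must be a non-empty string");
  }
  if (typeof startUrl !== "string") {
    throw new Error("startUrl must be a string");
  }
  let parsed: URL;
  try {
    parsed = new URL(startUrl);
  } catch {
    throw new Error("startUrl must be a valid URL");
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error("startUrl must use http or https");
  }
  if (typeof maxDepth !== "number" || !Number.isInteger(maxDepth) || maxDepth < 0) {
    throw new Error("maxDepth must be a non-negative integer");
  }
  if (typeof takeScreenshot !== "boolean") {
    throw new Error("takeScreenshot must be a boolean");
  }
  if (colorScheme !== "light" && colorScheme !== "dark") {
    throw new Error('colorScheme must be "light" or "dark"');
  }
  return { userId, serviceId, startUrl, maxDepth, takeScreenshot, colorScheme };
}
```

In an Express handler, a function like this could be called on `req.body` (with JSON body parsing enabled), returning a 400 response when it throws.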

Repository: onurusluca/puppeteer-scraper
