Playwright content fetcher
You can configure changedetection.io to fetch pages using the excellent and very fast Playwright backend https://docs.browserless.io/docker/docker-quickstart (otherwise it fetches pages with the plain built-in fetcher, which does not run JavaScript).
The official hosted version also comes with one preconfigured Chrome browser (and you can add more!), see https://changedetection.io
See docker-compose.yml for more examples.
In docker-compose.yml, uncomment PLAYWRIGHT_DRIVER_URL under environment, and the playwright-chrome section under services.
docker run -d --name browserless \
-e "DEFAULT_LAUNCH_ARGS=[\"--window-size=1920,1080\"]" \
--rm -p 3000:3000 \
--shm-size="2g" \
dgtlmoon/sockpuppetbrowser:latest
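Before pointing changedetection.io at the browser, it can be useful to confirm that something accepts connections on port 3000. The following is a minimal bash sketch, assuming bash was built with its /dev/tcp redirection (it is on Debian) and that coreutils `timeout` is available; `port_open` and `wait_for_port` are hypothetical helper names, not part of changedetection.io or browserless. Run from the changedetection.io machine, the same check also shows whether the firewall lets the connection through when the browser lives on another server.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: check whether a TCP port accepts connections.
# Relies on bash's /dev/tcp pseudo-device and coreutils `timeout`.

# port_open HOST PORT [SECONDS] -> exit status 0 if a TCP connection succeeds
port_open() {
  local host="$1" port="$2" secs="${3:-2}"
  # The child bash exits non-zero if the redirection (the connect) fails
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# wait_for_port HOST PORT [TRIES] -> retry about once per second, fail after TRIES attempts
wait_for_port() {
  local host="$1" port="$2" tries="${3:-30}" i
  for ((i = 1; i <= tries; i++)); do
    port_open "$host" "$port" && return 0
    # Only sleep if another attempt follows
    ((i < tries)) && sleep 1
  done
  return 1
}
```

For example, `wait_for_port 127.0.0.1 3000 && echo "browser is up"` retries up to 30 times while the container started above comes up.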
This assumes Playwright (browserless) is installed and run on the same server as changedetection.io. If it runs on a different server, adjust the changedetection.io variables (such as PLAYWRIGHT_DRIVER_URL) accordingly and make sure the firewall ports are open. The process below was tested and works on Debian 11.
Install the NodeSource Node.js LTS repository:
curl -fsSL https://deb.nodesource.com/setup_lts.x | sudo -E bash -
Install the dependencies
sudo apt install python3-dev python3-pip nodejs build-essential ca-certificates curl dumb-init ffmpeg fontconfig fonts-freefont-ttf fonts-gfs-neohellenic fonts-indic fonts-ipafont-gothic fonts-kacst fonts-liberation fonts-noto-cjk fonts-noto-color-emoji fonts-roboto fonts-thai-tlwg fonts-ubuntu fonts-wqy-zenhei gconf-service git libappindicator1 libappindicator3-1 libasound2 libatk-bridge2.0-0 libatk1.0-0 libc6 libcairo2 libcups2 libdbus-1-3 libexpat1 libfontconfig1 libgbm-dev libgbm1 libgcc1 libgconf-2-4 libgdk-pixbuf2.0-0 libglib2.0-0 libgtk-3-0 libnspr4 libnss3 libpango-1.0-0 libpangocairo-1.0-0 libstdc++6 libx11-6 libx11-xcb1 libxcb1 libxcomposite1 libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 libxrandr2 libxrender1 libxss1 libxtst6 locales lsb-release msttcorefonts pdftk unzip wget xdg-utils xvfb
Install Playwright via pip (especially if you get the error "No module named 'playwright'"):
python3 -m pip install playwright
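To check whether the interpreter that runs changedetection.io can actually see the module, you can ask it to locate it. A minimal sketch, assuming changedetection.io runs under the `python3` found on your PATH; `has_py_module` is a hypothetical helper built on the standard `importlib.util.find_spec` function.

```shell
#!/usr/bin/env bash
# has_py_module NAME -> exit status 0 if python3 can locate the top-level module NAME
has_py_module() {
  python3 -c 'import importlib.util, sys; sys.exit(0 if importlib.util.find_spec(sys.argv[1]) else 1)' "$1" 2>/dev/null
}

if has_py_module playwright; then
  echo "playwright is importable by $(command -v python3)"
else
  echo "playwright missing: run python3 -m pip install playwright" >&2
fi
```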
Clone the browserless git repo to a folder of your choice (e.g. /opt/):
git clone https://github.com/browserless/chrome /opt/browserless
cd into the folder you cloned into, then run:
npm install
npm run build
npm prune --production
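The clone and build steps can be wrapped in a small script that skips the clone when a checkout already exists and stops at the first failing step. A minimal bash sketch, assuming git and npm are installed as above; `install_browserless`, `run` and the `DRY_RUN` switch are hypothetical names used only for illustration.

```shell
#!/usr/bin/env bash
# run CMD...: print the command, then execute it unless DRY_RUN=1
run() {
  echo "+ $*"
  if [[ "${DRY_RUN:-0}" != 1 ]]; then
    "$@"
  fi
}

# install_browserless [DIR]: clone (if needed) and build browserless in DIR.
# The body runs in a subshell, so `set -e` and `cd` do not affect the caller.
install_browserless() (
  set -e
  dir="${1:-/opt/browserless}"
  if [[ ! -d "$dir/.git" ]]; then
    run git clone https://github.com/browserless/chrome "$dir"
  fi
  run cd "$dir"
  run npm install
  run npm run build
  run npm prune --production   # drop devDependencies after building
)
```

`DRY_RUN=1 install_browserless /opt/browserless` prints the commands without running them; run it again without `DRY_RUN` (as a user that can write to /opt) to do the actual install.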
Systemd service configs (/etc/systemd/system/)
Example browserless.service:
[Unit]
Description=browserless service
After=network.target
[Service]
Environment=APP_DIR=/opt/browserless
Environment=PLAYWRIGHT_BROWSERS_PATH=/opt/browserless
Environment=CONNECTION_TIMEOUT=60000
Environment=HOST=127.0.0.1
Environment=LANG="C.UTF-8"
Environment=NODE_ENV=production
Environment=PORT=3000
Environment=WORKSPACE_DIR=/opt/browserless/workspace
WorkingDirectory=/opt/browserless
ExecStart=/opt/browserless/start.sh
SyslogIdentifier=browserless
[Install]
WantedBy=default.target
Example changedetection.service:
[Unit]
Description=changedetection.io service
After=network.target browserless.service
Wants=browserless.service
[Service]
Environment=PLAYWRIGHT_DRIVER_URL=ws://127.0.0.1:3000/?stealth=1&--disable-web-security=true
ExecStart=/usr/local/bin/changedetection.io -d /opt/change-detection -p 80
SyslogIdentifier=change-detection
[Install]
WantedBy=default.target
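When browserless runs on a different server, PLAYWRIGHT_DRIVER_URL is the variable to adjust. One option, rather than editing the unit itself, is a systemd drop-in file: drop-ins are read after the main unit, and a later Environment= assignment of the same variable overrides the earlier one. A minimal sketch; `write_driver_override` and the `override.conf` file name are assumptions, and the query string is copied from the unit above.

```shell
#!/usr/bin/env bash
# write_driver_override HOST [PORT] [UNIT_DIR]
# Writes a drop-in that overrides PLAYWRIGHT_DRIVER_URL for changedetection.service.
write_driver_override() {
  local host="$1" port="${2:-3000}" unit_dir="${3:-/etc/systemd/system}"
  local dropin="$unit_dir/changedetection.service.d"
  mkdir -p "$dropin"
  printf '[Service]\nEnvironment=PLAYWRIGHT_DRIVER_URL=ws://%s:%s/?stealth=1&--disable-web-security=true\n' \
    "$host" "$port" > "$dropin/override.conf"
}
```

For example, run `write_driver_override 192.0.2.10` as root (192.0.2.10 is a placeholder address), then `systemctl daemon-reload` and `systemctl restart changedetection.service`. The browserless unit above sets HOST=127.0.0.1; if that controls the listen address, as the name suggests, it would also need to change on the browser server.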
Enable services:
systemctl enable browserless.service
systemctl enable changedetection.service
Manual control:
systemctl start [service]
systemctl stop [service]
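systemctl accepts several units in one call and orders the resulting jobs by their After= dependencies, so a small wrapper can act on both services at once. A sketch; `svc_ctl` and the `SYSTEMCTL` override are hypothetical, and the real commands need root. The install action runs `systemctl daemon-reload` so systemd picks up newly created unit files, then `enable --now`, which enables the units at boot and starts them immediately.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around systemctl for both units.
# Set SYSTEMCTL=echo to print the commands instead of running them.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"
UNITS=(browserless.service changedetection.service)

svc_ctl() {
  case "$1" in
    install)  # after creating or editing unit files: reload, enable at boot, start now
      $SYSTEMCTL daemon-reload && $SYSTEMCTL enable --now "${UNITS[@]}" ;;
    start|stop|restart|status)
      $SYSTEMCTL "$1" "${UNITS[@]}" ;;
    *)
      echo "usage: svc_ctl {install|start|stop|restart|status}" >&2
      return 2 ;;
  esac
}
```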
There appears to be a memory leak in Playwright (https://github.com/microsoft/playwright/issues/6319) and as yet there does not seem to be a fix. Memory use can easily grow from 200 MB to several gigabytes. Restarting the service is very fast and so far seems to be the best way to mitigate this.
Run a script like this from crontab every x minutes:
#!/bin/bash
# Find changedetection.py processes whose RSS exceeds 240000 KiB (~240 MB) and kill them
# ps -C matches only the process name, so match the full command line with awk instead
# @todo - you need to find a way to restart :)
ps -eo pid=,rss=,args= | awk '/changedetection\.py/ && $2 > 240000 {print $1}' | while read -r pid
do
  kill -9 "$pid"
  # add your restart line here
  # or use docker restart changedetection.io
done
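Both scripts compare the RSS value from ps, which is reported in kibibytes, against 240000 (roughly 234 MiB). A sketch of that filter as a reusable function; `pids_over_limit` is a hypothetical name, and it reads `PID RSS` pairs such as those printed by `ps -C changedetection -o pid=,rss=`.

```shell
#!/usr/bin/env bash
# pids_over_limit LIMIT_MB
# Reads "PID RSS_KiB" lines on stdin and prints the PIDs whose RSS exceeds LIMIT_MB.
pids_over_limit() {
  local limit_kib=$(( $1 * 1024 ))
  awk -v limit="$limit_kib" '$2 > limit { print $1 }'
}
```

For example, `ps -C changedetection -o pid=,rss= | pids_over_limit 240` lists matching PIDs over 240 MiB. Note that 240 MiB is 245760 KiB, slightly above the 240000 KiB used in the scripts.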
Create a file named restart-changedetection.sh with your favorite text editor, copy/paste the script below (editing it if you need to), save it to any folder you want (e.g. /opt) and chmod it to 755:
#!/bin/bash
# Check if RSS > 240000 KiB (~240 MB), kill and restart the service
ps -C changedetection -o pid=,rss= | awk '$2 > 240000 {print $1}' | while read -r pid
do
  kill -9 "$pid"
  systemctl restart changedetection.service
done
Use crontab to run it every few minutes: as root (the script calls systemctl), run crontab -e, add something like the following line at the bottom and save:
*/5 * * * * /opt/restart-changedetection.sh >/dev/null 2>&1
This runs the script every 5 minutes (e.g. 02:10, 02:15, 02:20, ...) and discards its output.
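Instead of editing interactively, the entry can be added from a script without duplicating it on re-runs. A minimal sketch; `add_cron_line` is a hypothetical helper that reads the current crontab on stdin and prints it with the entry appended only if an identical line is not already present.

```shell
#!/usr/bin/env bash
# add_cron_line ENTRY: copy stdin to stdout, appending ENTRY unless an identical line exists
add_cron_line() {
  local entry="$1" current
  current=$(cat)
  if [[ -n "$current" ]]; then
    printf '%s\n' "$current"
  fi
  if ! grep -qxF -- "$entry" <<<"$current"; then
    printf '%s\n' "$entry"
  fi
}
```

Usage, as root: `crontab -l 2>/dev/null | add_cron_line '*/5 * * * * /opt/restart-changedetection.sh >/dev/null 2>&1' | crontab -`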