Danny Lin edited this page Oct 28, 2024 · 12 revisions

Sidebar and Backend Server

Because of browser security restrictions, several advanced features of WebScrapBook require a collaborating backend server to be running. The backend server can be set up with PyWebScrapBook; see Basic for basic instructions.

PyWebScrapBook supports many configuration options, which can be adjusted by editing config.ini. Run wsb help config from the CLI to see the available options. Below are some useful configurations:

Host a scrapbook with default PyWebScrapBook directory structure

Legacy ScrapBook X stores captured pages under data/ and metadata under tree/. By default, PyWebScrapBook instead stores captured pages under the root directory and metadata under .wsb/tree/, so that all metadata lives under .wsb/ and is easier to manage.

To use the default data structure of PyWebScrapBook, simply skip wsb config -ba before running wsb serve for a directory. If it has already been run, edit .wsb/config.ini and comment out the [book ""] section like this:

; [book ""]
; name = scrapbook
; top_dir = 
; data_dir = data
; tree_dir = tree
; index = tree/map.html
; no_tree = false

Or reassign values for them, like:

[book ""]
name = scrapbook
top_dir = 
data_dir = 
tree_dir = .wsb/tree
index = .wsb/tree/map.html
no_tree = false
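
To see where such a configuration places captured pages and metadata, the following is a minimal Python sketch that reads a [book ""] section with the standard configparser module. It is not PyWebScrapBook's own code: the helper name resolve_book_layout is hypothetical, and the sketch assumes that data_dir, tree_dir, and index are resolved relative to top_dir.

```python
import configparser
import posixpath

# Sample text mirroring the reassigned values shown above.
SAMPLE = '''
[book ""]
name = scrapbook
top_dir =
data_dir =
tree_dir = .wsb/tree
index = .wsb/tree/map.html
no_tree = false
'''


def resolve_book_layout(ini_text, section='book ""'):
    """Return the book's directories relative to the server root.

    Hypothetical helper; assumes data_dir, tree_dir, and index are
    relative to top_dir.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    book = parser[section]
    top = book.get("top_dir", "")

    def under_top(value):
        # Join with top_dir and normalise; "" means the root directory itself.
        joined = posixpath.normpath(posixpath.join(top, value))
        return "" if joined == "." else joined

    return {
        "name": book.get("name", ""),
        "data": under_top(book.get("data_dir", "")),
        "tree": under_top(book.get("tree_dir", "")),
        "index": under_top(book.get("index", "")),
    }


layout = resolve_book_layout(SAMPLE)
print(layout)
```

An empty "data" entry means captured pages sit directly in the root directory, while "tree" points at .wsb/tree, which matches the default structure described above.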

Host a chrooted scrapbook

By default the backend server is allowed to access the special .wsb directory. For better security and error-proofing, the root directory of the backend server can be configured to not include .wsb.

For example, for the following directory structure:

C:\Users\MyUserName\ScrapBooks
C:\Users\MyUserName\ScrapBooks\.wsb
C:\Users\MyUserName\ScrapBooks\public
C:\Users\MyUserName\ScrapBooks\public\tree
C:\Users\MyUserName\ScrapBooks\public\data

The configuration below uses C:\Users\MyUserName\ScrapBooks\public as the root directory of the backend server, so C:\Users\MyUserName\ScrapBooks\.wsb is never accessible:

[app]
root = public

[book ""]
name = scrapbook
top_dir = 
data_dir = data
tree_dir = tree
index = tree/map.html
no_tree = false

This makes the backup directory (default path: .wsb/backup) inaccessible from the web interface of the backend server. If this is not desired, set app.backup_dir to a path under app.root, such as public/backup.
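
As a rough illustration of why the default backup path falls outside a chrooted root, here is a small Python sketch. The function is_served is a hypothetical name, and the check is purely lexical (it does not resolve symlinks or ".." segments); it is not how PyWebScrapBook itself enforces access.

```python
from pathlib import PurePosixPath


def is_served(path, root):
    """Return True if `path` lies inside `root`.

    Both paths are relative to the directory that holds .wsb.
    Lexical comparison only; hypothetical helper for illustration.
    """
    try:
        PurePosixPath(path).relative_to(PurePosixPath(root))
        return True
    except ValueError:
        return False


root = "public"
for candidate in (".wsb/backup", "public/backup", "public/tree/map.html"):
    print(candidate, is_served(candidate, root))
```

With root set to public, .wsb/backup is reported as outside the root, while public/backup is inside it.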

Host multiple scrapbooks under the backend server

For example, to host C:\Users\MyUserName\ScrapBooks as the root directory of the backend server with three scrapbooks, use the following directory structure:

C:\Users\MyUserName\ScrapBooks
C:\Users\MyUserName\ScrapBooks\scrapbook1
C:\Users\MyUserName\ScrapBooks\scrapbook2
C:\Users\MyUserName\ScrapBooks\scrapbook3

Open a CLI, change the working directory to C:\Users\MyUserName\ScrapBooks, run wsb config -ba to generate the config files, and open C:\Users\MyUserName\ScrapBooks\.wsb\config.ini in a text editor. These steps can be simplified to running wsb --root "C:\Users\MyUserName\ScrapBooks" config -bae.

Then modify [book ""] and add the other sections as follows:

[book ""]
name = Book1
top_dir = scrapbook1
data_dir = 
tree_dir = .wsb/tree
index = .wsb/tree/map.html
no_tree = false

[book "scrapbook2"]
name = Book2
top_dir = scrapbook2
data_dir = 
tree_dir = .wsb/tree
index = .wsb/tree/map.html
no_tree = false

[book "scrapbook3"]
name = Book3
top_dir = scrapbook3
data_dir = 
tree_dir = .wsb/tree
index = .wsb/tree/map.html
no_tree = false

Then run C:\Users\MyUserName\ScrapBooks\.wsb\serve.py to start the backend server. You can switch between scrapbooks from the sidebar; their display names are determined by the name values above.
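
Since the book sections in this example all follow one pattern, they can be generated rather than typed by hand. The following Python sketch is only a convenience under that assumption; book_section is a hypothetical helper, not part of PyWebScrapBook, and the output still has to be pasted into config.ini manually.

```python
def book_section(book_id, name, top_dir):
    """Render one [book "<id>"] section with the default PyWebScrapBook layout.

    Hypothetical helper; the fixed values are copied from the example above.
    """
    lines = [
        f'[book "{book_id}"]',
        f"name = {name}",
        f"top_dir = {top_dir}",
        "data_dir = ",
        "tree_dir = .wsb/tree",
        "index = .wsb/tree/map.html",
        "no_tree = false",
    ]
    return "\n".join(lines)


# (book id, display name, directory under the server root)
books = [
    ("", "Book1", "scrapbook1"),
    ("scrapbook2", "Book2", "scrapbook2"),
    ("scrapbook3", "Book3", "scrapbook3"),
]
config_text = "\n\n".join(book_section(*book) for book in books)
print(config_text)
```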

Transfer items between scrapbooks

Items can be transferred between scrapbooks. Select the items to transfer and use the Copy to... command to copy them to another scrapbook. Alternatively, open a manage window with the Manage command, select the items to transfer, and drag them into the other window.

Alternatively, items can be exported into an archive file, and then imported into another scrapbook.

Host behind a reverse proxy

When the server runs behind a reverse proxy, it may need information from the X-Forwarded-For, X-Forwarded-Host, and X-Forwarded-Prefix headers set by the reverse proxy to work correctly.

These headers are ignored by default, and the corresponding configs must be set for the server to consume them. Note that not all proxies set all of these headers. Do not enable a config unless a proxy actually sets the corresponding header; otherwise a client could supply a forged header, which is a security issue.
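
The risk of trusting a forged header can be sketched in a few lines of Python. This assumes the numeric value of a config such as allowed_x_for is the number of trusted proxy hops, which is the convention used by Werkzeug's ProxyFix middleware; this page does not confirm that PyWebScrapBook behaves exactly this way, and trusted_value is a hypothetical name.

```python
def trusted_value(header_value, trusted_hops):
    """Pick the entry added by the outermost trusted proxy, or None.

    Assumption: each proxy appends one comma-separated entry, so the
    entry `trusted_hops` positions from the right is the last one that
    a trusted proxy wrote. Hypothetical helper for illustration.
    """
    if not trusted_hops or not header_value:
        return None
    values = [v.strip() for v in header_value.split(",") if v.strip()]
    if len(values) >= trusted_hops:
        return values[-trusted_hops]
    return None


# A client sends a forged "X-Forwarded-For: 10.0.0.1"; the proxy then
# appends the address it actually saw (203.0.113.7).
header = "10.0.0.1, 203.0.113.7"
print(trusted_value(header, 0))  # header ignored
print(trusted_value(header, 1))  # entry written by the one real proxy
print(trusted_value(header, 2))  # too many hops trusted: forged entry wins
```

With one real proxy and a value of 1, the forged entry is skipped; trusting more hops than there are proxies lets the client-supplied entry through.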

Example 1

If https://example.com/ is served by an nginx server that passes https://scrapbooks.example.com/ to http://127.0.0.1:8000/ served by PyWebScrapBook, the nginx server can set the following headers:

location / {
  proxy_pass http://127.0.0.1:8000;
  proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
  proxy_set_header X-Forwarded-Proto $scheme;
  proxy_set_header X-Forwarded-Host $host;
  proxy_set_header X-Forwarded-Port $server_port;
}

And the PyWebScrapBook application can be configured like this:

[app]
...
allowed_x_for = 1
allowed_x_proto = 1
allowed_x_host = 1
allowed_x_port = 1
allowed_x_prefix = 0

Example 2

If https://example.com/ is served by an nginx server that passes https://example.com/scrapbooks/ to http://127.0.0.1:8000/ served by PyWebScrapBook, the nginx server can set the following headers:

location /scrapbooks/ {
  rewrite ^/scrapbooks/(.*)$ /$1 break;
  proxy_pass http://127.0.0.1:8000;
  proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
  proxy_set_header X-Forwarded-Proto $scheme;
  proxy_set_header X-Forwarded-Host $host;
  proxy_set_header X-Forwarded-Port $server_port;
  proxy_set_header X-Forwarded-Prefix /scrapbooks;
}

And the PyWebScrapBook application can be configured like this:

[app]
...
allowed_x_for = 1
allowed_x_proto = 1
allowed_x_host = 1
allowed_x_port = 1
allowed_x_prefix = 1
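
The difference between the two examples is the path prefix: in Example 2 the proxy strips /scrapbooks before forwarding, so the backend needs X-Forwarded-Prefix to rebuild the address the browser sees. The Python sketch below illustrates that reconstruction under simplifying assumptions (the port header is ignored, and external_url is a hypothetical helper, not PyWebScrapBook's implementation).

```python
def external_url(path, headers, default_scheme="http",
                 default_host="127.0.0.1:8000"):
    """Rebuild the public URL for a backend path from forwarded headers.

    Hypothetical helper; ignores X-Forwarded-Port for brevity and
    assumes every header passed in is trusted.
    """
    scheme = headers.get("X-Forwarded-Proto", default_scheme)
    host = headers.get("X-Forwarded-Host", default_host)
    prefix = headers.get("X-Forwarded-Prefix", "").rstrip("/")
    return f"{scheme}://{host}{prefix}{path}"


# Headers as the nginx block in Example 2 would set them.
example2 = {
    "X-Forwarded-Proto": "https",
    "X-Forwarded-Host": "example.com",
    "X-Forwarded-Prefix": "/scrapbooks",
}
print(external_url("/tree/map.html", example2))

# Without any forwarded headers, only the backend's own address is known.
print(external_url("/tree/map.html", {}))
```

If the prefix header were not consumed, generated links would omit /scrapbooks and point at paths the proxy does not route to the backend.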