Save full Firefox profile
Save the whole Firefox profile directory instead of only saving a few of
its subcomponents. Remove an unused import of shutil from
profile_commands.py.
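
As a rough illustration of the approach described above, the sketch below archives an entire profile directory and then verifies that a few key files ended up in the archive. It is not the code from this commit: `archive_profile` and `REQUIRED_ITEMS` are hypothetical names chosen for the example, and only standard-library `tarfile` and `pathlib` calls are used.

```python
import tarfile
from pathlib import Path

# Hypothetical constant: files whose absence suggests a broken profile dump.
REQUIRED_ITEMS = ["cookies.sqlite", "places.sqlite", "webappsstore.sqlite"]


def archive_profile(profile_dir: Path, tar_path: Path) -> list:
    """Archive the whole profile directory and return the archived names.

    Raises RuntimeError if any entry of REQUIRED_ITEMS is missing.
    """
    with tarfile.open(tar_path, "w:gz") as tar:
        # arcname="" stores members relative to the profile root,
        # e.g. "cookies.sqlite" rather than "<profile_dir>/cookies.sqlite".
        tar.add(profile_dir, arcname="")
        archived = tar.getnames()

    missing = [item for item in REQUIRED_ITEMS if item not in archived]
    if missing:
        raise RuntimeError("Profile dump not successful, missing: %s" % missing)
    return archived
```

Archiving the directory in one call avoids maintaining a list of storage files and directories, at the cost of a larger archive.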

Additionally, remove the `extension_port.txt` file after reading the
port from it, to prevent reading stale port information when a browser
is restarted after a crash.
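
The read-then-delete pattern described above could look roughly like the following. This is a minimal sketch, assuming the port file contains a single integer; `read_and_remove_port` is a hypothetical helper name, not a function in OpenWPM.

```python
from pathlib import Path


def read_and_remove_port(port_file: Path) -> int:
    """Read the port number from port_file, then delete the file."""
    with open(port_file, "rt") as f:
        port = int(f.read().strip())
    # Deleting the file means a restarted browser has to write a fresh one,
    # so a leftover value from a crashed instance cannot be read by mistake.
    port_file.unlink()
    return port
```

A caller that waits for the file to appear before reading it would then never pick up a value written by a previous browser instance.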

Finally, remove a part of the documentation that references the old way
of dumping the profile and update a leftover reference to the
`log_directory` config option.

Closes #62.
boolean5 committed May 6, 2021
1 parent dafc26c commit 5dff51c
Showing 3 changed files with 13 additions and 48 deletions.
14 changes: 2 additions & 12 deletions docs/Configuration.md
@@ -46,7 +46,7 @@ of configurations of `class<BrowserParams>`.
 - `data_directory`
   - The directory into which screenshots and page dumps will be saved
   - [Intended to be removed by #232](https://github.com/mozilla/OpenWPM/issues/232)
-- `log_directory` -> supported file extensions are `.log`
+- `log_path` -> supported file extensions are `.log`
   - The path to the file in which OpenWPM will log. The
     directory given will be created if it does not exist.
 - `failure_limit` -> has to be either of type `int` or `None`
@@ -287,17 +287,7 @@ browser before visiting the next `site` in `sites`.

 ### Loading and saving a browser profile

-It's possible to load and save profiles during stateful crawls. Profile dumps
-currently consist of the following browser storage items:
-
-- cookies
-- localStorage
-- IndexedDB
-- browser history
-
-Other browser state, such as the browser cache, is not saved. In
-[Issue #62](https://github.com/citp/OpenWPM/issues/62) we plan to expand
-profiles to include all browser storage.
+It's possible to load and save profiles during stateful crawls.

 #### Save a profile

1 change: 1 addition & 0 deletions openwpm/browser_manager.py
@@ -674,6 +674,7 @@ def _start_extension(self, browser_profile_path: Path) -> ClientSocket:
         with open(ep_filename, "rt") as f:
             port = int(f.read().strip())

+        ep_filename.unlink()
         self.logger.debug(
             "BROWSER %i: Connecting to extension on port %i"
             % (self.browser_params.browser_id, port)
46 changes: 10 additions & 36 deletions openwpm/commands/profile_commands.py
@@ -1,5 +1,4 @@
 import logging
-import shutil
 import tarfile
 from pathlib import Path

@@ -47,47 +46,22 @@ def dump_profile(
         % (browser_params.browser_id, browser_profile_path, tar_path)
     )

-    storage_vector_files = [
+    tar.add(browser_profile_path, arcname="")
+    archived_items = tar.getnames()
+    tar.close()
+
+    required_items = [
         "cookies.sqlite",  # cookies
-        "cookies.sqlite-shm",
-        "cookies.sqlite-wal",
         "places.sqlite",  # history
-        "places.sqlite-shm",
-        "places.sqlite-wal",
         "webappsstore.sqlite",  # localStorage
-        "webappsstore.sqlite-shm",
-        "webappsstore.sqlite-wal",
     ]
-    storage_vector_dirs = [
-        "webapps",  # related to localStorage?
-        "storage",  # directory for IndexedDB
-    ]
-    for item in storage_vector_files:
-        full_path = browser_profile_path / item
-        if (
-            not full_path.is_file()
-            and not full_path.name.endswith("shm")
-            and not full_path.name.endswith("wal")
-        ):
+    for item in required_items:
+        if item not in archived_items:
             logger.critical(
-                "BROWSER %i: %s NOT FOUND IN profile folder, skipping."
-                % (browser_params.browser_id, full_path)
+                "BROWSER %i: %s NOT FOUND IN profile folder"
+                % (browser_params.browser_id, item)
             )
-        elif not full_path.is_file() and (
-            full_path.name.endswith("shm") or full_path.name.endswith("wal")
-        ):
-            continue  # These are just checkpoint files
-        tar.add(full_path, arcname=item)
-    for item in storage_vector_dirs:
-        full_path = browser_profile_path / item
-        if not full_path.is_dir():
-            logger.warning(
-                "BROWSER %i: %s NOT FOUND IN profile folder, skipping."
-                % (browser_params.browser_id, full_path)
-            )
-            continue
-        tar.add(full_path, arcname=item)
-    tar.close()
+            raise RuntimeError("Profile dump not successful")


 class DumpProfileCommand(BaseCommand):
