Releases: openzim/zimit

2.1.2

09 Sep 14:39
6d5fc0b

Changed

  • Upgrade to browsertrix crawler 1.3.0-beta.1 (#387) (fixes "Ziming a website with huge assets (e.g. PDFs) is failing to proceed" - #380)

2.1.1

05 Sep 07:46
501520d

Added

  • Add support for uncompressed tar archive in --warcs (#369)

Changed

  • Upgrade to browsertrix crawler 1.3.0-beta.0 (#379), including upgrade to Ubuntu Noble (#307)

Fixed

  • Stream file downloads to avoid exhausting memory (#373)
  • Fix documentation on --diskUtilization setting (#375)
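
The idea behind the streaming fix (#373) can be illustrated with a minimal sketch. This is not zimit's actual code: the function name `stream_copy` and the 1 MiB default chunk size are assumptions chosen for illustration. The point is that data moves from source to destination in bounded chunks, so memory use stays roughly constant regardless of file size.

```python
import io
from typing import BinaryIO


def stream_copy(src: BinaryIO, dst: BinaryIO, chunk_size: int = 1024 * 1024) -> int:
    """Copy src to dst in fixed-size chunks; return the number of bytes written.

    Hypothetical helper for illustration only: at most chunk_size bytes
    are held in memory at any time.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # empty bytes object signals end of stream
            break
        dst.write(chunk)
        total += len(chunk)
    return total
```

The standard library's `shutil.copyfileobj` does the same chunked copy and would usually be preferred; the explicit loop is shown only to make the mechanism visible.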

2.1.0

09 Aug 08:04
2e082c4

Added

  • Add --custom-behaviors argument to support path/HTTP(S) URL custom behaviors to pass to the crawler (#313)
  • Add daily automated end-to-end tests of a page with YouTube player (#330)
  • Add --warcs option to directly process WARC files (#301)

Changed

  • Make it clear that --profile argument can be an HTTP(S) URL (and not only a path) (#288)
  • Fix README imprecisions + add back warc2zim availability in docker image (#314)
  • Enhance integration test to assert final content of the ZIM (#287)
  • Stop fetching and passing browsertrix crawler version as scraperSuffix to warc2zim (#354)
  • Do not log number of WARC files found (#357)
  • Upgrade dependencies (warc2zim 2.1.0)

Fixed

  • Sort WARC directories found by modification time (#366)
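
As a rough illustration of the ordering fix (#366), the sketch below sorts the subdirectories of a folder by modification time. It is not taken from zimit: the function name `sort_dirs_by_mtime` is hypothetical, and the oldest-first order is an assumption, since the changelog entry does not state the direction.

```python
from pathlib import Path


def sort_dirs_by_mtime(root: Path) -> list[Path]:
    """Return the immediate subdirectories of root, oldest modification first.

    Hypothetical helper for illustration only; the sort direction is assumed.
    """
    dirs = [p for p in root.iterdir() if p.is_dir()]
    # st_mtime is the last-modification timestamp in seconds since the epoch
    return sorted(dirs, key=lambda p: p.stat().st_mtime)
```

Sorting on modification time gives a deterministic order, whereas the order returned by a plain directory listing depends on the filesystem.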

2.0.6

02 Aug 08:39
2452e60

Changed

  • Upgraded Browsertrix Crawler to 1.2.6

2.0.5

24 Jul 06:38
021654e

Changed

  • Upgraded Browsertrix Crawler to 1.2.5
  • Upgraded warc2zim to 2.0.3

2.0.4

15 Jul 08:55
fbd01a7

Changed

  • Upgraded Browsertrix Crawler to 1.2.4 (fixes "retrieve automatically the assets present in a data-xxx tag" - #316)

2.0.3

24 Jun 07:51
e8995a9

Changed

  • Upgraded Browsertrix Crawler to 1.2.0 (fixes YouTube videos issue #323)

2.0.2

18 Jun 14:00
b73a3e0

Changed

  • Upgrade dependencies (mainly warc2zim 2.0.2)

2.0.1

13 Jun 11:33
2835c7b

Changed

  • Upgrade dependencies (especially warc2zim 2.0.1 and browsertrix crawler 1.2.0-beta.0) (#318)

Fixed

  • Crawler is not correctly checking disk size / usage (#305)

2.0.0

04 Jun 07:35
d8e6d55

Added

  • New --version flag to display Zimit version (#234)
  • New --logging flag to adjust Browsertrix Crawler logging (#273)
  • Use new --scraper-suffix flag of warc2zim to enhance ZIM "Scraper" metadata (#275)
  • New --noMobileDevice CLI argument
  • Publish Docker image for linux/arm64 (in addition to linux/amd64) (#178)

Changed

  • Use warc2zim version 2, which no longer requires a Service Worker (#193)
  • Upgraded Browsertrix Crawler to 1.1.3
  • Adopt Python bootstrap conventions
  • Upgrade to Python 3.12 + upgrade dependencies
  • Removed handling of redirects by zimit; they are handled by browsertrix crawler and detected properly by warc2zim (#284)
  • Drop initial check of URL in Python (#256)
  • --userAgent CLI argument once again overrides the --userAgentSuffix and --adminEmail values
  • --userAgent CLI argument is no longer mandatory