Releases: openzim/zimit

1.6.3

18 Jan 08:14
19b4898

Changed

  • Adapt to new warc2zim code structure
  • Using browsertrix-crawler 0.12.4
  • Using warc2zim 1.5.5

Added

  • New optional --build parameter to specify the directory holding Browsertrix files. If not set, the --output
    directory is used. zimit creates one subdirectory of this folder per invocation to isolate datasets; the
    subdirectory is kept only if --keep is set.
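
The directory handling described for --build can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not zimit's actual code: the function name `run_in_build_dir`, its parameters, and the placeholder file are hypothetical, and only the fallback, per-invocation subdirectory, and cleanup logic are modelled.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def run_in_build_dir(output: Path, build: Optional[Path] = None, keep: bool = False) -> Path:
    """Hypothetical sketch: work in a per-invocation subdir, remove it unless keep is set."""
    # Fall back to the output directory when no build directory is given
    parent = build if build is not None else output
    parent.mkdir(parents=True, exist_ok=True)

    # One fresh subdirectory per invocation isolates each run's dataset
    workdir = Path(tempfile.mkdtemp(dir=parent, prefix=".tmp"))
    try:
        # Stand-in for the crawl and conversion steps writing their files
        (workdir / "placeholder.txt").write_text("crawl data")
    finally:
        # The subdirectory survives only when keep is requested
        if not keep:
            shutil.rmtree(workdir, ignore_errors=True)
    return workdir
```

Calling `run_in_build_dir(Path("out"))` would leave no subdirectory behind, while passing `keep=True` would leave it in place for inspection.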

Fixed

  • --collection parameter was not working (#252)

1.6.2

17 Nov 10:25
6e6c0e8

Changed

  • Using browsertrix-crawler 0.12.3

Fixed

  • Fix logic passing args to crawler to support value '0' (#245)
  • Fix documentation about Chrome and headless (#248)
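
The class of bug behind the '0' fix (#245) can be illustrated with a short sketch. It assumes the arguments were being filtered with a truthiness test, which the release note does not confirm; the function `build_crawler_args` and the option dictionary are hypothetical and do not reproduce zimit's code.

```python
from typing import Dict, List, Optional


def build_crawler_args(options: Dict[str, Optional[object]]) -> List[str]:
    """Hypothetical sketch: forward every option that was set, including 0."""
    args: List[str] = []
    for name, value in options.items():
        # A truthiness test (`if value:`) would wrongly drop 0; test for None instead
        if value is None:
            continue
        args.extend([f"--{name}", str(value)])
    return args


print(build_crawler_args({"delay": 0, "limit": None}))  # prints ['--delay', '0']
```

Testing against `None` rather than relying on truthiness keeps a legitimate value of 0 in the argument list while still skipping options that were never set.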

1.6.1

06 Nov 09:05
a73114d

Changed

  • Using browsertrix-crawler 0.12.1

1.6.0

02 Nov 19:57
9e91406

Changed

  • Scraper fails for all HTTP error codes returned when checking URL at startup (#223)
  • User-Agent now has a default value (#228)
  • Manipulation of spaces with UA suffix and adminEmail has been modified
  • Same User-Agent is used for check_url (Python) and Browsertrix crawler (#227)
  • Using browsertrix-crawler 0.12.0

1.5.3

04 Oct 08:52
0005145

Changed

  • Using browsertrix-crawler 0.11.2

1.5.2

19 Sep 09:14
3769c77

Changed

  • Using browsertrix-crawler 0.11.1

1.5.1

18 Sep 09:13
2be5562

Changed

  • Using browsertrix-crawler 0.11.0
  • Scraper stat file is not created empty (#211)
  • Crawler statistics are not available anymore (#213)
  • Using warc2zim 1.5.4

1.5.0

23 Aug 16:40
12dab25

Added

  • --long-description param

1.4.1

23 Aug 12:18
951241d

Changed

  • Using browsertrix-crawler 0.10.4
  • Using warc2zim 1.5.3

1.4.0

02 Aug 14:46
cbaaa77

Added

  • --title to set ZIM title
  • --description to set ZIM description
  • New crawler options: --maxPageLimit, --delay, --diskUtilization
  • --zim-lang param to set warc2zim's --lang (ISO-639-3)

Changed

  • Using browsertrix-crawler 0.10.2
  • Default and accepted values for --waitUntil from crawler's update
  • Using warc2zim 1.5.2
  • Disabled Chrome updates to prevent incidental inclusion of update data in WARC/ZIM (#172)
  • --failOnFailedSeed used unconditionally
  • --lang now passed to crawler (ISO-639-1)

Removed

  • --newContext from crawler's update