Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Collections and temporary directory parameters #254

Merged
merged 5 commits into from
Nov 23, 2023
Merged

Conversation

benoit74
Copy link
Collaborator

@benoit74 benoit74 commented Nov 23, 2023

Rationale

Changes

  • --collection parameter is now used correctly to find WARC files
  • more logs are displayed explaining the search for WARC files location
  • new --build parameter (optional) to place Browsertrix files in a different directory that ZIM files
    • ZIMs are always placed in --output directory
    • Browsertrix files are placed in a "temporary" subdir of --build (if supplied) or --output by default
  • display warc2zim arguments

@benoit74 benoit74 marked this pull request as ready for review November 23, 2023 08:07
@benoit74
Copy link
Collaborator Author

benoit74 commented Nov 23, 2023

@rgaudin @kelson42 I finally slightly changed my opinion compared to Tuesday discussion, I fixed the --collection parameter logic, it was not that complex to do (and a bit sad to remove that even if not really useful) ; and it made me realized that it was not possible to set a distinct dir for WARC files location so I added this as well.

Copy link
Member

@rgaudin rgaudin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but there are some naming to fix

CHANGELOG.md Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
zimit.py Outdated Show resolved Hide resolved
zimit.py Outdated Show resolved Hide resolved
@benoit74
Copy link
Collaborator Author

Thank you, I also fixed a "bug" where I did not created one temporary subdir per run, which I found problematic in terms of data segregation and very/risky for the cleanup which was deleting the whole build dir while it might have contained other files before the run.

Copy link
Member

@rgaudin rgaudin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@rgaudin rgaudin merged commit a62f31e into main Nov 23, 2023
2 checks passed
@rgaudin rgaudin deleted the collections_param branch November 23, 2023 14:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Zimit does not really support setting the --collection CLI argument
2 participants