tests currently fail for extractall_unicode after a package update (#550)

* tests currently fail for extractall_unicode after a package update
We need to handle zip archives created under Windows and Unicode filenames differently, so I added another if branch.

* black formatting

* added the optional haydn_op20 dependency to the install instructions
nkundiushuti committed Jul 22, 2022
1 parent e591c54 commit c4759b4
Showing 2 changed files with 8 additions and 2 deletions.
1 change: 1 addition & 0 deletions docs/source/contributing.rst
@@ -35,6 +35,7 @@ To install ``mirdata`` for development purposes:
 pip install .[tests]
 pip install .[docs]
 pip install .[dali]
+pip install .[haydn_op20]
 We recommend to install `pyenv <https://github.com/pyenv/pyenv#installation>`_ to manage your Python versions
9 changes: 7 additions & 2 deletions mirdata/download_utils.py
@@ -320,8 +320,13 @@ def extractall_unicode(zfile, out_dir):
             "cp437"
         ).decode(errors="ignore") != filename:
             filename_bytes = filename.encode("cp437")
-            guessed_encoding = chardet.detect(filename_bytes)["encoding"] or "utf8"
-            filename = filename_bytes.decode(guessed_encoding, "replace")
+            if filename_bytes.decode("utf-8", "replace") != filename_bytes.decode(
+                errors="ignore"
+            ):
+                guessed_encoding = chardet.detect(filename_bytes)["encoding"] or "utf8"
+                filename = filename_bytes.decode(guessed_encoding, "replace")
+            else:
+                filename = filename_bytes.decode("utf-8", "replace")
 
         disk_file_name = os.path.join(out_dir, filename)
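
The branch added in this hunk can be read as a standalone helper. What follows is a minimal sketch under stated assumptions, not the code in mirdata/download_utils.py: the names `recover_filename`, `is_valid_utf8` and the `guess_encoding` parameter are hypothetical, and `guess_encoding` stands in for the `chardet.detect` call so the sketch runs without third-party packages. It assumes the entry name was decoded as cp437 by `zipfile` because the UTF-8 flag bit was not set.

```python
from typing import Callable, Optional

ZIP_FILENAME_UTF8_FLAG = 0x800  # general-purpose bit 11: entry name is UTF-8


def is_valid_utf8(raw: bytes) -> bool:
    """Return True when raw decodes as UTF-8 without errors.

    This should match the diff's comparison of decode("utf-8", "replace")
    against decode(errors="ignore"): the two differ only if an invalid
    byte sequence is present.
    """
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def recover_filename(
    name: str,
    flag_bits: int,
    # Hypothetical hook replacing chardet.detect(raw)["encoding"].
    guess_encoding: Callable[[bytes], Optional[str]] = lambda raw: None,
) -> str:
    """Sketch of the filename handling shown in the diff (names are assumptions)."""
    if flag_bits & ZIP_FILENAME_UTF8_FLAG:
        return name  # the archive declares UTF-8; keep the name as read
    raw = name.encode("cp437")  # recover the bytes stored in the archive
    if raw.decode(errors="ignore") == name:
        return name  # round trip is unchanged (e.g. ASCII-only names)
    if is_valid_utf8(raw):
        return raw.decode("utf-8", "replace")  # new else branch in the diff
    # Invalid UTF-8: fall back to a guessed encoding, defaulting to utf8.
    encoding = guess_encoding(raw) or "utf8"
    return raw.decode(encoding, "replace")


# A UTF-8 name stored without the flag is read back through cp437 as mojibake.
mojibake = "café.txt".encode("utf-8").decode("cp437")
print(recover_filename(mojibake, 0))  # café.txt
print(recover_filename("plain.txt", 0))  # plain.txt
```

To mirror the diff more closely, a caller could pass `guess_encoding=lambda raw: chardet.detect(raw)["encoding"]`, assuming `chardet` is installed.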
