fix: add mupdf exception for pdf extraction
engisalor committed Jun 18, 2024
1 parent 2a5393f commit 5f51358
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion corpusama/source/pdf.py
@@ -100,7 +100,7 @@ def _try_extract(file: str, clean: bool, n: int = 0) -> None:
         text = extract_text(file, clean)
         with open(file.with_suffix(".txt"), "w") as f:
             f.write(text)
-    except (fitz.fitz.FileDataError, RuntimeError) as e:
+    except (fitz.fitz.FileDataError, RuntimeError, fitz.mupdf.FzErrorFormat) as e:
         logger.warning(f"{n} - {file} - {e}")
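
Note on the change above: Python evaluates the expression in an `except` clause only when an exception reaches it, so a tuple that names `fitz.mupdf.FzErrorFormat` directly would itself raise `AttributeError` on a PyMuPDF build that lacks that attribute. Whether any build supported by this project lacks it is an assumption; the commit does not say. A minimal sketch of one way to build the tuple defensively follows. The helper `resolve_exceptions` and the stand-in module `fake_fitz` are hypothetical and not part of corpusama or PyMuPDF; the stand-in only lets the sketch run without PyMuPDF installed.

```python
import types


def resolve_exceptions(module, dotted_names, fallback=(RuntimeError,)):
    """Collect exception classes that exist under `module`.

    Hypothetical helper: names that cannot be resolved are skipped
    instead of raising AttributeError.
    """
    found = list(fallback)
    for dotted in dotted_names:
        obj = module
        try:
            for part in dotted.split("."):
                obj = getattr(obj, part)
        except AttributeError:
            continue  # attribute missing in this build; skip it
        is_exception = isinstance(obj, type) and issubclass(obj, BaseException)
        if is_exception and obj not in found:
            found.append(obj)
    return tuple(found)


# Stand-in for the real `fitz` module (assumption for illustration only).
class FakeFormatError(Exception):
    pass


fake_fitz = types.SimpleNamespace(
    mupdf=types.SimpleNamespace(FzErrorFormat=FakeFormatError)
)

# "fitz.FileDataError" is absent from the stand-in, so it is skipped.
PDF_ERRORS = resolve_exceptions(
    fake_fitz, ["fitz.FileDataError", "mupdf.FzErrorFormat"]
)

try:
    raise FakeFormatError("cannot recognize version marker")
except PDF_ERRORS as e:
    caught = str(e)
```

In real use the first argument would be the imported `fitz` module, and the resulting tuple would replace the literal one in the `except` clause.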

