Add unit test for handling PDFs with toUnicode cMaps that omit leading zeros in hex encoded utf16
alexcat3 committed Jul 4, 2024
1 parent bb81286 commit 69de633
Showing 2 changed files with 12 additions and 0 deletions.
Binary file added test/pdfs/issue18099_reduced.pdf
12 changes: 12 additions & 0 deletions test/unit/api_spec.js
@@ -3418,6 +3418,18 @@ Caron Broadcasting, Inc., an Ohio corporation (“Lessee”).`)

      await loadingTask.destroy();
    });
    it("gets text content, correctly handling documents with toUnicode cmaps that omit leading zeros on hex-encoded UTF-16", async function () {
      const loadingTask = getDocument(
        buildGetDocumentParams("issue18099_reduced.pdf")
      );
      const pdfDoc = await loadingTask.promise;
      const pdfPage = await pdfDoc.getPage(1);
      const { items } = await pdfPage.getTextContent({
        disableNormalization: true,
      });
      const text = mergeText(items);
      expect(text).toEqual("Hello world!");
    });

    it("gets text content, and check that out-of-page text is not present (bug 1755201)", async function () {
      if (isNodeJS) {
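Context for the malformed input this test targets: a toUnicode CMap maps character codes to Unicode, and its destination strings are normally hex-encoded UTF-16BE, so "H" is written `<0048>`. Going by the commit title, the reduced PDF instead writes something like `<48>`. The following is a minimal sketch of a tolerant decoder, assuming that a destination with an odd number of bytes has lost the leading zero byte of a UTF-16 code unit. `decodeToUnicodeDst` is a hypothetical helper for illustration, not the pdf.js implementation, and it takes the hex digits without the surrounding angle brackets.

```javascript
// Hypothetical helper (not pdf.js code): decode the destination string of a
// toUnicode CMap entry, given as hex digits such as "0048", to a JS string.
function decodeToUnicodeDst(hex) {
  // PDF hex-string rule: an odd number of digits implies a trailing 0.
  if (hex.length % 2 !== 0) {
    hex += "0";
  }
  const bytes = [];
  for (let i = 0; i < hex.length; i += 2) {
    bytes.push(parseInt(hex.slice(i, i + 2), 16));
  }
  // Assumption: an odd byte count means the producer dropped the leading
  // zero byte of a UTF-16BE code unit (e.g. <48> instead of <0048>),
  // so restore it before pairing bytes into code units.
  if (bytes.length % 2 !== 0) {
    bytes.unshift(0);
  }
  const codeUnits = [];
  for (let i = 0; i < bytes.length; i += 2) {
    codeUnits.push((bytes[i] << 8) | bytes[i + 1]);
  }
  // Surrogate pairs stay as two code units, which JS strings represent natively.
  return String.fromCharCode(...codeUnits);
}

console.log(decodeToUnicodeDst("0048")); // "H"
console.log(decodeToUnicodeDst("48")); // "H" (leading zero byte restored)
```

Restoring the zero at the byte level, rather than left-padding the hex digits, keeps well-formed multi-unit destinations such as `<D83DDE00>` (a surrogate pair) unchanged.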