Any way to extract images from the pdf itself on the front end? #165

chronicadventure · 2024-12-10T15:14:41Z


First off, I just want to say thank you so much for being so responsive, TaTo. I love it when devs really support their work. You've helped me a lot with all of my questions.

My question today is whether there is a way to extract an image from a PDF on the front end. I have a requirement where users create SOPs (standard operating procedures) while referencing a PDF. Sometimes the PDFs contain very helpful images that they would like to include in their SOP.

Is there any way you know of to expose the images so users can select them? If I could figure out how to get the binary data for an image, I could convert it to base64 and display it easily in the WYSIWYG editor for the SOP they are working on.
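The base64 step described above could look roughly like the sketch below. It assumes the image is already available as a `Blob` (for example the PNG blob produced by `OffscreenCanvas.convertToBlob()`). The helper name `blobToDataUrl` is hypothetical. It uses only the standard `Blob.arrayBuffer()` and `btoa()` APIs, so it should run in modern browsers and in Node 18+.

```typescript
// Hypothetical helper: turn image bytes (e.g. a PNG Blob) into a data URL
// that can be used as an <img src> or inserted into a WYSIWYG editor.
async function blobToDataUrl(blob: Blob): Promise<string> {
  const bytes = new Uint8Array(await blob.arrayBuffer());

  // btoa() expects a "binary string" (one char per byte). Build it in chunks
  // so String.fromCharCode never receives too many arguments at once.
  let binary = "";
  const chunkSize = 0x8000;
  for (let i = 0; i < bytes.length; i += chunkSize) {
    binary += String.fromCharCode(...Array.from(bytes.subarray(i, i + chunkSize)));
  }

  // Fall back to a generic MIME type when the Blob does not declare one.
  const mimeType = blob.type || "application/octet-stream";
  return `data:${mimeType};base64,${btoa(binary)}`;
}
```

In a browser, `FileReader.readAsDataURL()` does the same job in one call; the manual version is shown because it also works outside the DOM and makes the byte-to-base64 step explicit.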

Answered by TaTo30


TaTo30 · 2024-12-10T23:28:09Z

Maintainer

Based on: https://stackoverflow.com/questions/18680261/extract-images-from-pdf-file-with-javascript

I did a small test, and this seems to work only when raster images are embedded in the document (e.g. 9.pdf):

```vue
<script setup lang="ts">
import pdf14 from "@samples/42.pdf";

import { VuePDF, usePDF } from "@tato30/vue-pdf";
import * as PDFJS from "pdfjs-dist";

const { pdf } = usePDF(pdf14);

function getPageImages(page: number) {
  pdf.value?.promise.then(async (document) => {
    const pageProxy = await document.getPage(page);
    const ops = await pageProxy.getOperatorList();

    // Collect the object ids of every raster image painted on the page
    const objIds: string[] = [];
    for (let i = 0; i < ops.fnArray.length; i++) {
      if (ops.fnArray[i] === PDFJS.OPS.paintImageXObject) {
        objIds.push(ops.argsArray[i][0]);
      }
    }

    for (const objId of objIds) {
      // Images shared between pages are cached in commonObjs with a "g_" prefix
      const store = objId.startsWith("g_") ? pageProxy.commonObjs : pageProxy.objs;
      store.get(objId, async (obj: any) => {
        // pdf.js may keep some images as raw pixel data (no bitmap); skip those
        if (!obj?.bitmap) return;
        // Copy the bitmap: transferFromImageBitmap() detaches the one it receives
        const bitmap = await createImageBitmap(obj.bitmap);
        const ocanvas = new OffscreenCanvas(bitmap.width, bitmap.height);

        ocanvas.getContext("bitmaprenderer")!.transferFromImageBitmap(bitmap);
        const blob = await ocanvas.convertToBlob({ type: "image/png" });

        // Object URL for the PNG; call URL.revokeObjectURL() when done with it
        console.log(URL.createObjectURL(blob));
      });
    }
  });
}
</script>

<template>
  <div>
    <VuePDF :pdf="pdf" :page="1" @loaded="getPageImages(1)" />
  </div>
</template>
```
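The extraction in this answer boils down to two steps: scan the page's operator list for `paintImageXObject` operators, then resolve each image object id from the page's object store. The sketch below isolates those steps as plain functions so they can be awaited (and unit-tested without a PDF). The interfaces `OperatorListLike` and `CallbackObjectStore` and all function names are hypothetical; they only assume the shapes used in the answer, namely `fnArray`/`argsArray` and `objs.get(id, callback)`. In real code you would pass `PDFJS.OPS.paintImageXObject` as the opcode and `pageProxy.objs` as the store.

```typescript
// Hypothetical structural types covering only what this sketch needs from
// pdf.js: the operator list from page.getOperatorList() and the callback
// form of page.objs.get(id, callback).
interface OperatorListLike {
  fnArray: number[];
  argsArray: unknown[][];
}

interface CallbackObjectStore<T> {
  get(objId: string, callback: (obj: T) => void): unknown;
}

// Return the unique image object ids painted by `paintImageOp` operators.
// For paintImageXObject, the object id is the operator's first argument.
function collectImageObjectIds(ops: OperatorListLike, paintImageOp: number): string[] {
  const ids: string[] = [];
  for (let i = 0; i < ops.fnArray.length; i++) {
    if (ops.fnArray[i] !== paintImageOp) continue;
    const objId = ops.argsArray[i]?.[0];
    // Skip malformed entries and images that were already collected.
    if (typeof objId === "string" && !ids.includes(objId)) {
      ids.push(objId);
    }
  }
  return ids;
}

// Wrap the callback-style lookup in a Promise so it can be awaited.
function getObjectAsync<T>(store: CallbackObjectStore<T>, objId: string): Promise<T> {
  return new Promise<T>((resolve) => {
    store.get(objId, resolve);
  });
}

// Resolve every id, keeping the order in which the images were first painted.
function resolveImageObjects<T>(store: CallbackObjectStore<T>, ids: string[]): Promise<T[]> {
  return Promise.all(ids.map((id) => getObjectAsync(store, id)));
}
```

Returning one promise for all objects, instead of logging inside each callback, lets the UI wait for every image on the page and then present them as a selectable list. Duplicate ids are dropped because the same image XObject can be painted more than once on a page. In current pdf.js releases the renderer looks up ids prefixed with `g_` in `pageProxy.commonObjs` rather than `pageProxy.objs`, so those ids would need that store instead.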


chronicadventure Dec 12, 2024
Author

This looks to be working very well. Thanks!
