ENH: Remove only raster graphics #2208
Comments
@MartinThoma
your opinion |
Yes, that is a bug. It should delete inline images (I vaguely remember that this was also documented somewhere) |
your opinion? |
Making it more flexible is a similar issue to #1831. The solution proposed there is to add a function which receives metadata about the annotation and returns True (delete) or False (don't delete). That is certainly the more flexible design and makes extension easier. I'm currently writing a proposal going in that direction :-) Give me a few minutes xD You're too fast xD |
What do you think about the following:

    from dataclasses import dataclass
    from enum import Enum, auto
    from typing import Callable, Optional

    class ImageType(Enum):
        XOBJECT_IMAGES = auto()
        INLINE_IMAGES = auto()
        DRAWING_IMAGES = auto()

    @dataclass
    class ImageData:
        image_type: ImageType
        ...  # potentially more information, e.g. the name, the size, the encoding, position in the page, reference to the image object, ...

    def constant_true(image_data: ImageData) -> bool:
        # Default predicate: delete every image
        return True

    def remove_images(
        self,
        ignore_byte_string_object: Optional[bool] = None,
        to_delete: Callable[[ImageData], bool] = constant_true,
    ) -> None: ...

|
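To make the proposal above more concrete, here is a minimal, self-contained sketch of how such a predicate could select only raster images. It is a toy model, not pypdf's implementation: `remove_images` here is a hypothetical free function that filters a plain list, and the `name` field and the `raster_only` predicate are assumptions added purely for illustration.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, List


class ImageType(Enum):
    XOBJECT_IMAGES = auto()
    INLINE_IMAGES = auto()
    DRAWING_IMAGES = auto()


@dataclass
class ImageData:
    image_type: ImageType
    name: str = ""  # hypothetical extra field, only for this sketch


def constant_true(image_data: ImageData) -> bool:
    # Default predicate: every image is deleted.
    return True


def raster_only(image_data: ImageData) -> bool:
    # Hypothetical predicate: delete XObject and inline images,
    # keep anything classified as a drawing (vector graphics).
    return image_data.image_type in (
        ImageType.XOBJECT_IMAGES,
        ImageType.INLINE_IMAGES,
    )


def remove_images(
    images: List[ImageData],
    to_delete: Callable[[ImageData], bool] = constant_true,
) -> List[ImageData]:
    # Toy stand-in for the proposed method: return the images that survive.
    return [image for image in images if not to_delete(image)]


images = [
    ImageData(ImageType.XOBJECT_IMAGES, "Im0"),
    ImageData(ImageType.INLINE_IMAGES, "inline"),
    ImageData(ImageType.DRAWING_IMAGES, "background"),
]
remaining = remove_images(images, to_delete=raster_only)
```

With the default predicate nothing survives, while `raster_only` leaves just the drawing entry, which is the behaviour this issue asks for.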
I'm looking for similar functions:
|
Weird little finding: I thought
I'm not sure if that is a good idea though 😅 |
Also, I propose to have |
None is more common for me. |
Sounds good 👍
Also good 👍
Then it's fine to me 👍 What do you think about using the |
The documentation contains two examples regarding image extraction:
This works as expected for pages containing embedded raster graphics. For example:
As a result we get two images and a new PDF document without these images. When we change the page from 0 to the last one:
We get only a reduced PDF document:
Some background images have disappeared (they look like vector graphics) and they are not stored in separate files.
This might look like a bug, but I suppose you are aware of it (which is why I'm requesting a new feature rather than reporting a bug). Nevertheless, it would be great to have a feature that extracts only raster graphics to separate files and leaves vector graphics untouched (or extracts them too, but together with their text, which I guess might be very hard).
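The extraction half of this request can be sketched as follows. This is only an illustration under stated assumptions: `save_page_images` is a hypothetical helper, and it relies on the page object exposing an `images` attribute whose items have `name` and `data`, as pypdf's page objects do. Which image kinds appear there (for example inline images) may depend on the pypdf version. Nothing is removed from the page.

```python
from pathlib import Path
from typing import Any, List


def save_page_images(page: Any, out_dir: Path) -> List[Path]:
    """Write every image exposed by ``page.images`` into ``out_dir``.

    ``page`` only needs an ``images`` attribute yielding objects with
    ``name`` and ``data`` (duck typing); the page itself is not modified.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for index, image in enumerate(page.images):
        # Keep only the final path component so the file stays in out_dir,
        # and prefix the index to avoid clashes between equal names.
        target = out_dir / f"{index}_{Path(image.name).name}"
        target.write_bytes(image.data)
        written.append(target)
    return written
```

With pypdf installed, a call could look like `save_page_images(PdfReader("example.pdf").pages[0], Path("out"))`, where `example.pdf` is a placeholder file name.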