Stop flattening EXIF IFD into getexif() #4947
@@ -3346,91 +3351,100 @@ def tobytes(self, offset=8):
            head = b"MM\x00\x2A\x00\x00\x00\x08"
        ifd = TiffImagePlugin.ImageFileDirectory_v2(ifh=head)
        for tag, value in self.items():
            if tag in [0x8769, 0x8225] and not isinstance(value, dict):
                value = self.get_ifd(tag)
This does not work for Interop, since the Interop IFD is nested inside the Exif IFD.
from PIL import Image
from PIL.Image import Exif

im = Image.open('10_years_of_Wikipedia_by_Guillaume_Paumier.jpg')
good_exif = im.getexif()
# Round-trip the data through tobytes() and load() into a fresh Exif instance
bad_exif = Exif()
bad_exif.load(good_exif.tobytes())
# 0xA005 is the Interop IFD pointer tag
print("Good interop dictionary size: %d" % len(good_exif.get_ifd(0xA005)))
print(" Bad interop dictionary size: %d" % len(bad_exif.get_ifd(0xA005)))
I suspect the same applies to makernote, but since makernote is vendor-specific, I think it would be safer to make it a read-only property.
Ok, I've added another commit to cover Interop.
I've been thinking this through on and off for some time.
From the point of view of a consumer of this class, it would make sense for the Interop value to be decoded into a dict at the time the Exif IFD is read, or shimmed with an ImageFileDirectory_v2 instance. Otherwise the caller must know the Interop tag ID, must know to call get_ifd, etc. That is a bit inconvenient and requires extra knowledge from the consumer.
The same goes for Makernote. However, when saving, we don't want to encode it back to the vendor-specific format, so it is probably worth just copying whatever bytes were there originally.
On the other hand, Exif is quite complicated, and some knowledge from the consumer is expected anyway.
🤷
What do you think of this?
Another benefit of using ImageFileDirectory_v2 as a shim for sub-IFDs is that we get lazy loading.
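To make the lazy-loading idea concrete, here is a minimal sketch using only the standard library. It is not how ImageFileDirectory_v2 is implemented; `LazyIfdView` and `fake_decoder` are hypothetical names invented for illustration, and the decoder is a stand-in for real IFD parsing.

```python
from collections.abc import Mapping

EXIF_IFD = 0x8769  # pointer tag for the Exif sub-IFD


class LazyIfdView(Mapping):
    """Hypothetical shim: decodes a sub-IFD only on first access."""

    def __init__(self, offset, decoder):
        self.offset = offset     # raw offset stays readable by callers
        self._decoder = decoder  # callable: offset -> dict of tags
        self._decoded = None

    def _load(self):
        # Decode once, then reuse the cached result.
        if self._decoded is None:
            self._decoded = self._decoder(self.offset)
        return self._decoded

    def __getitem__(self, tag):
        return self._load()[tag]

    def __iter__(self):
        return iter(self._load())

    def __len__(self):
        return len(self._load())


calls = []


def fake_decoder(offset):
    # Stand-in for real IFD parsing; records each call so laziness is visible.
    calls.append(offset)
    return {0x9003: "2011:01:15 12:00:00"}  # DateTimeOriginal


# Building IFD0 does not trigger any decoding of the sub-IFD.
ifd0 = {0x010F: "ExampleCam", EXIF_IFD: LazyIfdView(90, fake_decoder)}
```

With a shim like this, the consumer gets dict-like access to the sub-IFD without knowing about get_ifd, and the raw offset is still available through the `offset` attribute for anyone who needs it.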
Also consider that Exif might contain multiple IFDs one after another. Currently only the first IFD is preserved on save.
I'm not following. What do you mean, 'only first IFD'?
I was referring to IFD1 - thumbnail image info which might be present in the file. Accessible through
f"Expecting to read {size} bytes but only got "
f"{len(data)}. Skipping tag {ifd_tag}"
if tag not in self._ifds:
    if tag in [0x8769, 0x8825]:
I think the __getitem__ method should be updated as well to call get_ifd for the Exif IFD. I can't comment on unchanged code, but currently that is line 3474, which should be if tag in [0x8769, 0x8825]:
While I can see that this would be more useful for most cases, if you wanted to actually read the IFD offset, you wouldn't be able to anymore (apart from looking directly into _data or _info).
I see your point.
I think getting the offset of the Exif IFD is useless by itself, unless you want to manually parse the IFD structure yourself, in which case you could do so with the raw exif bytes.
Additionally, I think the code should be consistent: either decode both the GPS and Exif IFDs into dictionaries on read in __getitem__, or return the offset in both cases.
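For anyone who does need the offsets, manually parsing the raw bytes is not much code. The following is a rough sketch using only `struct`, assuming a standard TIFF header followed by IFD0; `read_ifd_pointers` is a hypothetical helper, not part of Pillow, and the sample bytes are constructed by hand rather than taken from a real image.

```python
import struct


def read_ifd_pointers(data, wanted=(0x8769, 0x8825)):
    """Return {tag: offset} for sub-IFD pointer tags found in IFD0."""
    byte_order = data[:2]
    if byte_order == b"MM":
        endian = ">"  # big-endian (Motorola)
    elif byte_order == b"II":
        endian = "<"  # little-endian (Intel)
    else:
        raise ValueError("not a TIFF header")

    # Magic number (42) followed by the offset of IFD0.
    magic, ifd0_offset = struct.unpack(endian + "HL", data[2:8])
    if magic != 42:
        raise ValueError("bad TIFF magic number")

    (count,) = struct.unpack_from(endian + "H", data, ifd0_offset)
    pointers = {}
    for i in range(count):
        # Each entry is 12 bytes: tag, type, count, value/offset.
        entry_offset = ifd0_offset + 2 + 12 * i
        tag, type_, n, value = struct.unpack_from(
            endian + "HHLL", data, entry_offset
        )
        # Pointer tags are stored as a single LONG (type 4).
        if tag in wanted and type_ == 4 and n == 1:
            pointers[tag] = value
    return pointers


# Hand-built sample: header, one IFD0 entry (Exif IFD pointer), end marker.
sample = (
    b"MM\x00\x2A\x00\x00\x00\x08"               # header, IFD0 at offset 8
    + struct.pack(">H", 1)                      # IFD0 holds one entry
    + struct.pack(">HHLL", 0x8769, 4, 1, 26)    # Exif IFD pointer -> offset 26
    + struct.pack(">L", 0)                      # no next IFD
)

pointers = read_ifd_pointers(sample)
```

Because both pointer tags are handled by the same loop, a helper like this treats the Exif and GPS IFDs identically, which is the consistency being asked for above.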
Couldn't that easily lead to a situation where the thumbnail image no longer matches the main image, if the main image is modified by Pillow, but the thumbnail image isn't?
That's a great point. In that case it's better to ignore IFD1 for writing.
Co-authored-by: Konstantin Kopachev <kkopachev@popsugar.com>
@kkopachev How does this PR look now?
@hugovk Apart from inconsistent treatment of GPS IFD in
Ok, I've added a commit to treat it consistently.
Thanks both!
Resolves #3973 and #5273
At the moment, the Exif class flattens the EXIF IFD into the rest of the Exif data. Losing information in this way is not ideal. Here is the new test added to demonstrate the difference.
To try and prevent this, while maintaining as much backwards compatibility as possible, this PR stops this flattening only for getexif(), while keeping it in place for the older _getexif(). So this is backwards incompatible, but only for code written after the release of 6.0.0. Only one existing test had to be changed to match this breaking change.
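As a rough illustration of what "losing information" means here, the sketch below compares a flattened and a nested view using plain dictionaries. It does not reproduce Pillow's implementation; `flatten`, `nested`, and the sample values are hypothetical and exist only to show the difference in shape.

```python
EXIF_IFD = 0x8769  # pointer tag for the Exif sub-IFD


def flatten(ifd0, exif_ifd):
    """Merge the Exif sub-IFD into IFD0, as a flattened view would."""
    merged = {tag: v for tag, v in ifd0.items() if tag != EXIF_IFD}
    merged.update(exif_ifd)
    return merged


def nested(ifd0, exif_ifd):
    """Keep the Exif sub-IFD as its own dictionary under the pointer tag."""
    out = dict(ifd0)
    out[EXIF_IFD] = dict(exif_ifd)
    return out


# Illustrative values only: Make in IFD0, DateTimeOriginal in the Exif IFD.
ifd0 = {0x010F: "ExampleCam", EXIF_IFD: 90}
exif_ifd = {0x9003: "2011:01:15 12:00:00"}

flat = flatten(ifd0, exif_ifd)
tree = nested(ifd0, exif_ifd)

# In the flat view, nothing records which IFD 0x9003 came from.
# In the nested view, the tag is reachable only through the Exif IFD.
```

In the nested form, code that writes the data back can put each tag into the IFD it was read from, which the flat form cannot guarantee.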