Fix issues with writing metadata to pdfs #8332
@ThiloteE would you be so kind as to try the exports again? I have to admit you provided a lot of information and I feel a little lost; I don't want to miss anything. What I purposefully left out for now is the error message when importing XMP. I think the existing error is fine. If a user knows what XMP is and explicitly tries to import XMP, they will be able to handle such a message, I think ;) |
I am on it. Just downloaded the pull from https://builds.jabref.org/pull/8332/merge/. Hope this is the correct one |
@ThiloteE you have to wait until the Deployment workflow runs through |
oh |
First test results are in. Great success! The error message that emerged when repeatedly trying to update the file (ii.) is gone. I can update an entry successfully without an error message appearing.
This status message appears when i
Here is the metadata extracted via exiftool: metadata test 3 after second run.txt. This is great! Not sure if unintended repercussions will emerge someday in some niche cases, but this is definitely a fix for the problem I encountered! A great leap forward. While doing this test, though, I noticed that the keywords field is not exported. (keywords = {water},) 😥 |
Second test results are in. Old behavior:
New behavior:
I would desire something like this:
Here the test data: Status message:
metadata test before any run (fresh pdf).txt |
Third (and last) test results are in and correspond to iii in #8278 (comment): Old and current behavior are similar! How I did the test: When trying to write metadata to Améry (1973) Wider den Strukturalismus. Das Beispiel des Michel Foucault.pdf (one of the PDF files for which writing metadata fails), the following error message emerges:
Desired new behavior is something like:
|
For now, I just implemented it so that old metadata is always overwritten. This is something to discuss.
That was not intentional behavior for files linked twice. It happened because, after the first write, the embedded-BibTeX export failed since metadata was already present. So the XMP contained the second set of metadata (because it always overwrote any existing metadata), while the embedded BibTeX contained the first set because the second write failed.
True, this is unfortunate. I think this would mostly be the case for journals and such, right? I think the correct thing to do here would be to write the XMP information of the collection (like journal name and year if all entries are from the same journal) and embed the BibTeX of all entries that are linked. That would be a lot of work, though, which I personally don't have the time for right now. |
I think it is also relevant if you have a file that is a whole book, but then you have InBook references (because you only want to cite a few chapters of the book and not the whole book). People will have chapter entries in their library and may want to make use of the 'crossref' field. As it stands right now, I think having multiple entries in your library that link to the same file is not advisable :D I personally will try to avoid this.
|
Good point! First idea: look up all entries for a PDF, keep only the fields that are the same for all entries, and write those. |
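The "first idea" above can be sketched as follows. This is only a minimal, language-neutral illustration in Python, not JabRef's actual Java code: the function name `common_fields`, the dictionary representation of an entry, and the sample chapter entries are all hypothetical.

```python
from typing import Dict, List


def common_fields(entries: List[Dict[str, str]]) -> Dict[str, str]:
    """Return only the field/value pairs that are identical in every entry.

    Each entry is modeled as a plain dict of field name -> value
    (an assumption made for this sketch only).
    """
    if not entries:
        return {}
    # Start from the first entry and drop every field that any
    # later entry lacks or holds with a different value.
    shared = dict(entries[0])
    for entry in entries[1:]:
        shared = {k: v for k, v in shared.items() if entry.get(k) == v}
    return shared


# Hypothetical example: two chapter entries linked to the same PDF.
chapters = [
    {"booktitle": "Example Book", "year": "1973", "title": "Chapter One"},
    {"booktitle": "Example Book", "year": "1973", "title": "Chapter Two"},
]
print(common_fields(chapters))
```

With these two sample entries, only `booktitle` and `year` survive, so only those would be written to the file; the differing `title` values are dropped.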
I can reproduce, but not explain (yet). Need some more investigation. |
Oh, yes that would be good!
|
Unless you find some nitpicks during the code review, the tests I did suggest that what btut did here is definitely better than before. It fixes ii. of #8278 (comment), which is currently the most bothersome issue. I recommend merging for 5.4. (Although I still think that i. should be fixed somehow at some point! There will be people complaining that the metadata they export is not the metadata they expect, but at least we know why and what the user can do to work around it. iii. seems a tough nut to crack, so maybe one day ...) |
Okay, we merge this now and create a follow-up issue |
* upstream/main: (46 commits)
  New Crowdin updates (#8349)
  Bump pdfbox from 2.0.24 to 2.0.25 (#8345)
  Bump fontbox from 2.0.24 to 2.0.25 (#8348)
  Bump xmlunit-core from 2.8.3 to 2.8.4 (#8347)
  Bump tika-core from 2.1.0 to 2.2.0 (#8346)
  Added missing executable bindings to various commands (#8342)
  Update Gradle Wrapper from 7.3.1 to 7.3.2. (#8343)
  Fix issues with writing metadata to pdfs (#8332)
  add tinylog test (#8339)
  Tinylog (#8226)
  Don't register any database changes to the indexer while dropping a file (#8334)
  Fix ACM fetcher (#8338)
  Squashed 'buildres/csl/csl-styles/' changes from 3bb4b5f..60bf7d5
  Exception shouldn't happen when pasting an entry with a publication date-range of form yyyy-yyyy (#8247)
  Refactor Sidepane logic (#8202)
  New translations JabRef_en.properties (Japanese) (#8331)
  Bump bcprov-jdk15on from 1.69 to 1.70 (#8333)
  Update Controlsfx to 11.1.1 (#8330)
  Update citeproc (#8329)
  Bump classgraph from 4.8.137 to 4.8.138 (#8327)
  ...

# Conflicts:
#	build.gradle
There are still some issues when writing metadata to pdfs.
Fixes #8278