Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keep source urls on file download #10855

Merged
merged 23 commits into from
Feb 18, 2024

Conversation

ror3d
Copy link
Contributor

@ror3d ror3d commented Feb 5, 2024

Adds a new field to the LinkedFile structure to keep the source url from where it was downloaded, allowing for redownload of it later. Also adds a context menu action in the linked file viewer to redownload, as well as a menu action to redownload any missing files from entries that have the source url but don't have the linked file. Closes #10848

Also, file description is now copied to the downloaded file upon first download, so something like "arXiv Fulltext PDF" is retained.

image

image

image

Notes:

  • I have not implemented tests, I don't usually work with Java or with visual interfaces so I am unsure how tests should be implemented.
  • When implementing the download, I copied/adapted the download code currently present in fieldeditors.LinkedFileViewModel.java. I would remove that code from there and use the action too, but there is a couple differences in how duplicated files are handled so I didn't want to break it. Basically, from reading that code that code, I believe that if the downloaded file is found to be duplicate, the information is not stored in the linked file. I think that's an error, but maybe it makes sense so I'd rather ask first before changing it.

Mandatory checks

  • Change in CHANGELOG.md described in a way that is understandable for the average user (if applicable)
  • Tests created for changes (if applicable)
  • Manually tested changed features in running JabRef (always required)
  • Screenshots added in PR description (for UI changes)
  • Checked developer's documentation: Is the information available and up to date? If not, I outlined it in this pull request.
  • Checked documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request to the documentation repository.

Copy link
Member

@calixtus calixtus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, thanks for your interest in JabRef and open source development.
I took a quick look on your changes, I think this would be a good addition, but I have some comments on the implementation.

Also a reminder for the @JabRef/developers to think about:
Our custom format for the file field is horrible. "description:filename:filetype" (and with this PR: ":sourcelink"). If the description is empty ":filename:filetype".
https://markov.htwsaar.de/tex-archive/macros/latex/contrib/biblatex/doc/biblatex.pdf#page=21 states that in the file field only a local link should appear (i know, our implementation here is horrible), The url field (https://markov.htwsaar.de/tex-archive/macros/latex/contrib/biblatex/doc/biblatex.pdf#page=27) instead should hold the uri to the online publication. Is there already an issue for this?

@ror3d
Copy link
Contributor Author

ror3d commented Feb 6, 2024

Also a reminder for the @JabRef/developers to think about: Our custom format for the file field is horrible. "description:filename:filetype" (and with this PR: ":sourcelink"). If the description is empty ":filename:filetype". https://markov.htwsaar.de/tex-archive/macros/latex/contrib/biblatex/doc/biblatex.pdf#page=21 states that in the file field only a local link should appear (i know, our implementation here is horrible), The url field (https://markov.htwsaar.de/tex-archive/macros/latex/contrib/biblatex/doc/biblatex.pdf#page=27) instead should hold the uri to the online publication. Is there already an issue for this?

I think the url field would not be appropriate for this, semantically it is intended to be used for online-only publications, such as a news article, a wikipedia entry, or a github repo, to give some examples. I agree that the current solution is not optimal (e.g. using the same bib file with a version that doesn't have my changes removes the sourceUrl info) but if limited to the standard fields, that's the best that can be done, I think.

@koppor
Copy link
Member

koppor commented Feb 13, 2024

Thank you for the contribution and making JabRef better. I like the idea.

We need to think more about the format. Sorry for that!

This changes the logic of the linked files (somewhat) completely: the media type is replaced by the file link. We do not use the MediaType since ages, thus this is OK. -- This refs #98.

I agree that keeping the information in the file field is useful.

We do not know about effects on older JabRef versions with this PR. The analysis has been done there (but we should check, too)

I agree that the current solution is not optimal (e.g. using the same bib file with a version that doesn't have my changes removes the sourceUrl info) but if limited to the standard fields, that's the best that can be done, I think.

We did not touch the file field (e.g., koppor#487 or the mentioned issue #98), because other tools like Zotero rely on our field.

Note that the current decision (see #98 (comment)) is to introduce attachments as new field with a better format. Maybe JSON? See #10371 for an ADR for that.

@ror3d
Copy link
Contributor Author

ror3d commented Feb 13, 2024

We need to think more about the format. Sorry for that!

Do you mean the format should be changed before this PR is accepted?

This changes the logic of the linked files (somewhat) completely: the media type is replaced by the file link. We do not use the MediaType since ages, thus this is OK. -- This refs #98.

The way I made this work was not supposed to change the format. Currently, according to the FileFieldParser, the format has 3 fields, which are description, link, and file type. If you mean that at some point in the past there was also a MediaType field, then I guess this could break that. If that's the case, a simple solution would be to add an empty field when adding the sourceUrl field, so it would look like description:link:filetype::sourceUrl to avoid breaking old versions.

We do not know about effects on older JabRef versions with this PR. The analysis has been done there (but we should check, too)

What would need be done to check this? Get some old version that has this 4th field and test it?

Note that the current decision (see #98 (comment)) is to introduce attachments as new field with a better format. Maybe JSON? See #10371 for an ADR for that.

Well that would be a feasible change, it would be backwards compatible with anything before, although it would lose functionality as this field would no longer be stored/updated, right?

If the solution of having 4 fields instead of 3 is not a good temporary solution in waiting for the improved attachment field, if there is some specification of how the attachment field should look like, I should be able to implement it.

JSON or just a list of fieldName: value, .... That is a simple first implementation, and can be easily extended to json by making the structure recursive later if needed at some point (treat anything inside {} as a string for forwards compatibility).

@Siedlerchr
Copy link
Member

I would not use this PR to change the format, that is a bit out of scope. I guess @koppor 's concern was mediaType = fileType but I see that you have covered the case. So for me it's okay.

… with mediaType.

Recover/add tests for the file field.
@ror3d
Copy link
Contributor Author

ror3d commented Feb 15, 2024

This latest commit changes the format of my implementation to have the sourceURL be in the 5th position of the colon-separated list instead of the 4th so as to not break compatibility with old versions, as mentioned.

@calixtus
Copy link
Member

ugh, sthg went wrong seperating the commits. i forgot to commit early, now my three commits are mixed up...

Copy link
Contributor Author

@ror3d ror3d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah thanks for this, I wanted to do something of the like initially to have the outside access to the progress but I didn't really know how to approach it so I went with that more hacky solution.

@koppor
Copy link
Member

koppor commented Feb 18, 2024

This latest commit changes the format of my implementation to have the sourceURL be in the 5th position of the colon-separated list instead of the 4th so as to not break compatibility with old versions, as mentioned.

Sorry that my comment was misunderstanding. Your initial implementation was right. There never was a fourth field. The third field always existed (mimetype, filetype, call it whatever you want).

I was too hectic in providing my comments. -- We had a broken JabRef 3.6 release which destroyed meta data (regarding groups) and I wanted to prevent that being happen again.

We will finish this PR with six-eyes-principles. No need to do more work on your side here @ror3d. Thank you for your contribution and putting energy into JabRef!

@calixtus calixtus added this pull request to the merge queue Feb 18, 2024
Merged via the queue into JabRef:main with commit 2777ebc Feb 18, 2024
20 checks passed
@ror3d
Copy link
Contributor Author

ror3d commented Feb 18, 2024

Ah great, thank you for getting this fixed!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Re-download missing linked files
4 participants