Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve CFF import/export and craft a round-trip test #10995

Merged
merged 44 commits into from
Mar 21, 2024
Merged

Improve CFF import/export and craft a round-trip test #10995

merged 44 commits into from
Mar 21, 2024

Conversation

jeanprbt
Copy link
Contributor

@jeanprbt jeanprbt commented Mar 8, 2024

This PR improves the actual CFF importer/exporter by many ways. The exporter is no longer based on a *.layout file, but is now code-based in a brand new CffExporter.java file, so as to handle all the logic related to the CFF format.
Key improvements are the following.

  • Author's names are now parsed to use CFF fields family-names, given-names, name-particle and name-suffix
  • Fields references and preferred-citation are now parsed.
    • The importer now creates separate entries for the topmost software or dataset and for each of the preferred-citation or references cited. Importer adds a cites relationship between topmost software and preferred-citation, whereas it adds a related relationship between topmost software and each of the references.
    • The exporter now parses the software and dataset types correctly without adding a useless preferred-citation field in these cases. For software or dataset entries with JabRef cites or related fields, exporter now exports the corresponding CFF fields.
  • Multiple exports at a time are now supported to comply with the CFF format.
    • If multiple "non-software-nor-dataset" entries are selected, then exporter uses the references field with a dummy topmost software element.
    • If several entries including a software or dataset one are selected, then exporter uses this one as topmost element and the others as references, adding a preferred-citation for the cites element.
    • If several entries including several software ones are selected, then exporter uses a dummy topmost element, and selected entries are exported as references. The cites or related won't be exported in this case.
  • Exporter now supports UnknownField and exports them if they correspond to existing CFF fields
  • Importer and Exporter now support keywords fields
  • Importer and Exporter support all fields from the CFF specification
    Closes Craft round-trip test for CFF importer/exporter #10993.

This PR also features a round-trip test, which imports JabRef's citation file. This creates two entries, that are then exported to one file.

Here are some screenshots to show the updated behavior. These screenshots use the original CITATION.cff file of the JabRef repository.

Here is the original CFF file
original

Here is what is imported
imported

The article has a citation key, used by the software
article

The software has a cites relationship towards the article
software

When exporting both entries, like this:
export

Here is the resulting output
exported

It complies with the CFF format
validate

Sub-tasks

  • Parse author's names to use CFF fields family-names, etc.
  • Support JabRef's UnknownField in the exporter and check if field complies to CFF format
  • Support keywords field in both importer and exporter
  • Support all fields from CFF specification
  • Avoid using useless preferred-citation field when exporting software or dataset
  • Import several entries in the case of a preferred-citation or references field
  • Add a cites or related relationship between imported entries in relevant cases in the importer
  • Parse JabRef cites and related fields correctly in the exporter
  • If a software and its cites entry are both exported, then exporter discards the cites entry since it has already parsed it as a preferred-citation of the software entry
  • Document changes in the MADR format at https://github.com/JabRef/jabref/tree/main/docs/decisions

Mandatory checks

  • Change in CHANGELOG.md described in a way that is understandable for the average user (if applicable)
  • Tests created for changes (if applicable)
  • Manually tested changed features in running JabRef (always required)
  • Screenshots added in PR description (for UI changes)
  • Checked developer's documentation: Is the information available and up to date? If not, I outlined it in this pull request.
  • Checked documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request to the documentation repository.

@jeanprbt jeanprbt marked this pull request as ready for review March 11, 2024 16:01
@jeanprbt
Copy link
Contributor Author

jeanprbt commented Mar 14, 2024

The following is written on the cffconvert README.

cffconvert does not support converting items from references or preferred-citation keys at the moment.

Consistency check will be difficult to perform without these two fields.
But I'll try to include all entries from the CFF format specification.

@koppor
Copy link
Member

koppor commented Mar 14, 2024

Consistency check will be difficult to perform without these two fields.

You mean: consistency check of your work with the result of the tool? - It is OK to be better than other tools :p.

@jeanprbt
Copy link
Contributor Author

Yes that's what I meant, since it is the official tool :)
But let's try to be better than it !

@koppor koppor marked this pull request as draft March 18, 2024 11:02
koppor and others added 16 commits March 18, 2024 19:48
* Convert to @ParameterizedTest

* Convert to csvsource

---------

Co-authored-by: Carl Christian Snethlage <50491877+calixtus@users.noreply.github.com>
* Add test to check parsing of BibDesk Static Groups

* Add test to check parsing of BibDesk Static Groups

* Change isExpanded attribute to false in expected groups

* remove extra blank line

* Add tests to check parsing of BibDesk Smart and mixed groups

* Add parsing of BibDesk Files

* Attempts at plist

* Now parses bdsk-file and shows it as a file in JabRef

* Add test for parsing a bdsk-file field

* Fix formatting

* Add dd-plist library to documentation

---------

Co-authored-by: Tian0602 <646432316@qq.com>

* Add creation of static JabRef group from a BibDesk file

* Creates an empty ExplicitGroup from BibDesk comment

* Adds citations to new groups
modifies group creations to support multiple groups in the same BibDeskFile

* Fix requested changes
Refactor imports since they did not match with main
Add safety check in addBibDeskGroupEntriesToJabRefGroups

---------

Co-authored-by: Filippa Nilsson <filnils@kth.se>

* Refactor newline to match main branch

Co-authored-by: Filippa Nilsson <filnils@kth.se>

* Add changes to CHANGELOG.md

* Reformat indentation to match previous

* Revert external libraries

Adjust groups serializing

* checkstyle and optional magic

* fix

* fix tests

* fix

* fix dangling do

* better group tree metadata setting

* merge group trees, prevent duplicate group assignment in entry
Add new BibDesk group

Fix IOB for change listeing

* fix tests, and extract constant

* return early

* fixtest and checkstyle

---------

Co-authored-by: Anna Maartensson <120831475+annamaartensson@users.noreply.github.com>
Co-authored-by: Tian0602 <646432316@qq.com>
Co-authored-by: LottaJohnsson <35195355+LottaJohnsson@users.noreply.github.com>
Co-authored-by: Filippa Nilsson <filnils@kth.se>
Co-authored-by: Filippa Nilsson <75281470+filippanilsson@users.noreply.github.com>
Co-authored-by: Oliver Kopp <kopp.dev@gmail.com>
Co-authored-by: Siedlerchr <siedlerkiller@gmail.com>
* Fixes Zotero file handling for absolute paths

Fixes #10959

* checkstyle mimiimm

* fix changelog

* cannot fix
…11037)

* [Copy] Include string constants in copy (#11)

Signed-off-by: Anders Blomqvist <anders@minaemail.se>

* [Copy] New method for serializing string constants (#12)

Signed-off-by: Anders Blomqvist <anders@minaemail.se>

* Add a sanity check for null for clipboard content

Currenlty, the clipboard content can be null since the database
does not seem to be updating. This is a sanity check to prevent
the program from adding null to the clipboard.

Link to DD2480-Group1#13

* [Fix] Add parsed serilization when save settings

When loading from existing files or libraries, the parser will set
the serilization of the string constant to the correct value. However,
when editing via the GUI, the serilization was not set and a new
string constant list will be created without the serilization.
This result in the serilization being null and when copying with
the clipboard.

Link to DD2480-Group1#13

* feat: import string constants when pasting #9

Add functionality to import string constants in the paste function

Should add functionality to handle colliding string constants.
Should also check that the constants are valid using the
ConstantsItemModel class.

* feat: Add string constant validity checker and dialog messages #9

Check that a pasted string constant is valid using the
ConstantsItemModel class.

Add diagnostic messages notifying users when adding a string constant
fails while pasting.

* [Copy] Copy referenced constant strings to clipboard  (#16)

* feat: Add parsed serialized string when cloning
* feat: Add sanity check for null in ClipBoardManager
* closes #15

* feat: new unit tests

Add 4 new unit tests, testing the new features added for issue-10872. Specifically the tests are for the `storeSettings` method in the ConstantsPropertiesViewModel.java, and `setContent` in the ClipBaordManager.java.

Closes #6

* Update CHANGELOG with copy and paste function

* Fix Checkstyle failing by reformat the code

* Fix OpenRewrite failing by running rewriteRun

* Refactor by extract methods in setContent

* collet failures

* changelog and use os.newline

* checkstyle

* use real bibentrytypes manager

* Fix CHANGELOG.md

* Swap if branches

* Code cleanup

* Use List for getUsedStringValues

* Fix submodule

* Collection is better

* Fix csl-styles

* Remove empty line

* Group BibTeX string l10n together

---------

Signed-off-by: Anders Blomqvist <anders@minaemail.se>
Co-authored-by: Anders Blomqvist <anders@minaemail.se>
Co-authored-by: ZOU Hetai <33616271+JXNCTED@users.noreply.github.com>
Co-authored-by: Hannes Stig <hannes.a.stig@gmail.com>
Co-authored-by: Elliot <elliot.darth@gmail.com>
Co-authored-by: Oliver Kopp <kopp.dev@gmail.com>
Bumps [gittools/actions](https://github.com/gittools/actions) from 0.13.4 to 1.1.1.
- [Release notes](https://github.com/gittools/actions/releases)
- [Commits](GitTools/actions@v0.13.4...v1.1.1)

---
updated-dependencies:
- dependency-name: gittools/actions
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [com.googlecode.plist:dd-plist](https://github.com/3breadt/dd-plist) from 1.23 to 1.28.
- [Release notes](https://github.com/3breadt/dd-plist/releases)
- [Commits](3breadt/dd-plist@dd-plist-1.23...v1.28.0)

---
updated-dependencies:
- dependency-name: com.googlecode.plist:dd-plist
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps org.apache.pdfbox:xmpbox from 3.0.1 to 3.0.2.

---
updated-dependencies:
- dependency-name: org.apache.pdfbox:xmpbox
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [com.dlsc.gemsfx:gemsfx](https://github.com/dlsc-software-consulting-gmbh/GemsFX) from 2.2.0 to 2.4.0.
- [Release notes](https://github.com/dlsc-software-consulting-gmbh/GemsFX/releases)
- [Changelog](https://github.com/dlsc-software-consulting-gmbh/GemsFX/blob/master/CHANGELOG.md)
- [Commits](dlsc-software-consulting-gmbh/GemsFX@v2.2.0...v2.4.0)

---
updated-dependencies:
- dependency-name: com.dlsc.gemsfx:gemsfx
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps org.apache.pdfbox:fontbox from 3.0.1 to 3.0.2.

---
updated-dependencies:
- dependency-name: org.apache.pdfbox:fontbox
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Add test cases

* Add test cases

* Keep braces for last part

* Refine method description

* Adapt test to new braces keeping

* Add CHANGELOG.md entry

* Adapt tests

* More edge cases

* Minor code beautification

* Simplify code

* Fix braces removing

* Extract static fields, refactor code

* Fix removal of {} for export

* Re-add Objects.requireNonNull

* Fix typo

* Re-add NPE throwing

* Rename to modern terms

* Consistent initialization
* Collect DOI and publication type from semantich scholar to be able to expand the information of the new entries later by search through DOI

* Include abstract in the request. This lets the GUI show the abstract since that was implemented already.
Refactor api request string since most of it is shared

* Add button to open the relation paper's DOI URL.
Fix DOI for some ArXiv entries.

* Don't show the open link button if there is no link to open.

* Make field value null error a bit more useful

* Include SemanticScholar url in the request and use it as the URL field.

* Add changes to changelog

* Change tooltip text to an existing, more informative one

* Run rewriter to fix pull request

* improve url optional handling

---------

Co-authored-by: Siedlerchr <siedlerkiller@gmail.com>
@koppor koppor mentioned this pull request Mar 20, 2024
6 tasks
docs/decisions/0029-cff-export-multiple-entries.md Outdated Show resolved Hide resolved
docs/decisions/0029-cff-export-multiple-entries.md Outdated Show resolved Hide resolved
docs/decisions/0029-cff-export-multiple-entries.md Outdated Show resolved Hide resolved
docs/decisions/0029-cff-export-multiple-entries.md Outdated Show resolved Hide resolved
docs/decisions/0029-cff-export-multiple-entries.md Outdated Show resolved Hide resolved
docs/decisions/0029-cff-export-multiple-entries.md Outdated Show resolved Hide resolved
docs/decisions/0029-cff-export-multiple-entries.md Outdated Show resolved Hide resolved
src/main/java/org/jabref/cli/ArgumentProcessor.java Outdated Show resolved Hide resolved
src/main/java/org/jabref/logic/exporter/CffExporter.java Outdated Show resolved Hide resolved
@jeanprbt
Copy link
Contributor Author

I'm done with all the features I stated above in the PR. The exporter now parses the cites and related relationships correctly, and the round-trip test is fully working : importing CITATION.cff creates two entries, and exporting these two entries only creates one file which is consistent with the original one. I completely modified the behavior of CffExporter to achieve this, I'll let you check it out but in a few words it works like this.

  • Firstly, it checks how many software or dataset entries are contained in the list entries to export.
    • If there is only one, it is used as the main (i.e. topmost) entry
    • If there is none or several, a dummy main entry is created
  • Secondly, the method parseEntry is generic and can manage not only the main but also the preferred-citation and references entries (few differences between main and other entries are handled by the boolean parameter main)
  • This method is called on main entry to have the root data HashMap which will be dumped into the output yaml file.
  • Then, if the main contains a StandardField.CITES and/or a StandardField.RELATED field(s), the corresponding entries are retrieved using the databaseContext and then passed as parameters to parseEntry => they are added either as a preferred-citation or as references to the final HashMap data.
  • Finally, the remaining entries of the original list are passed as parameters to parseEntry to be added to the List<Map<String, Object>> related, which will be put in the main data HashMap as references.

I also fixed the changes you requested. Now, I will try to focus on my proposal for the GSoC project you mentioned earlier in this PR. I would really like to conduct this project and we are in the contributors' proposal phase, that is why I need to take some time for it. Do you think I have the required skills and commitment to conduct such a project, and that I have my chances to be selected ?

Thanks a lot for your responsiveness, your hints and your continuous feedback on this one, hopefully it will be merged in the official JabRef project ! Users will be happy with such a CFF importer/exporter.

@jeanprbt jeanprbt marked this pull request as ready for review March 21, 2024 13:13
@koppor
Copy link
Member

koppor commented Mar 21, 2024

@jeanprbt Your text reads as if should be put as JavaDoc. Otherwise, it will get lost. I know only one contributor who used "git blame" to recover a commit and the deception. Usually, no one checks the PR and all (!) comments of the PR to learn about the reasons behind the code. JavaDoc was intended to capture those... (At least partially). - Regarding GSoC, your skills improved. Will be a tough competition - and you also have chances!

@koppor
Copy link
Member

koppor commented Mar 21, 2024

After more than 80 conversations, this is finally going to an end. 🥂 Hope, the users will like this feature.

If also should be available at the CLI. Thus, we (you @jeanprbt) could craft a blog post with examples and advertise this feature ^^.

Glad that you added test cases! That offered the possibility to improve the code without worrying.

  • Use JabRef's org.jabref.model.entry.Date instead of manual date parsing
  • Use JabRef's org.jabref.model.entry.BibEntry#getEntryLinkList instead of manual link parsing
  • Use more stream magic

@koppor koppor enabled auto-merge March 21, 2024 22:30
Siedlerchr
Siedlerchr previously approved these changes Mar 21, 2024
koppor
koppor previously approved these changes Mar 21, 2024
@koppor koppor added this pull request to the merge queue Mar 21, 2024
@calixtus calixtus dismissed stale reviews from koppor and Siedlerchr via 60904da March 21, 2024 22:38
Merged via the queue into JabRef:main with commit a1b6d38 Mar 21, 2024
27 of 28 checks passed
@jeanprbt jeanprbt deleted the issue/10993 branch March 22, 2024 09:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Craft round-trip test for CFF importer/exporter
6 participants