Define optimal serialization format and variations (dropping or reviving the 2.9.2 format) #116

Closed
12 tasks done
koppor opened this issue Aug 19, 2015 · 29 comments

@koppor
Member

koppor commented Aug 19, 2015

In JabRef 2.10, we introduced the following new feature:

    Changed serialization of BibTeX entries:
        First, the required, then the optional and then all other fields are written.
        Thereby, fields are now ordered by name. Except the title, which is written first.
        The second word of the BibTeX type is capitalized, e.g., Inproceedings became InProceedings.
        Configurable: Start field contents in same column. Enabled by default.
        Configurable: Use camel case for field names (e.g., "HowPublished" instead of "howpublished"). Enabled by default.
        If no field name is given, then "UNKNOWN" is used. For instance, " = {X}" gets " UNKNOWN = {X}".

When a user collaborates with a person using an older version of JabRef, they cannot use version control properly as the serialization always changes. In version 2.11 beta 2, we offer a quick hack to go back to the old 2.9.2 behavior, which is somewhat incomplete. See #10 (comment)

We can just focus on other issues and let time solve the issue or invest time to implement the old serialization again.

Related issue: #115

Request at sourceforge: https://sourceforge.net/p/jabref/feature-requests/864/

To track the progress of implementation, the consensus described below is added here.

Rationale: People are very emotional about formatting -> Modify as little as possible in the bib file

  • Unmodified entries will not be formatted and are written back exactly as read in (including all formatting issues, etc.)
  • Modifying any part of an entry results in a reformatting of the complete entry
  • No more global configuration for sorting of entries on load / save. (Sorting information is now stored in the file.)
  • New entries are added to the bottom of the file
  • Fixed and non-configurable format for new or modified entries. Dropping all field saving options (Preferences -> File -> Field saving options)
  • Fields are sorted as stated in the Bibtex / Biblatex manuals.
  • The = is appended directly to the field key. The value part is indented to one space past the longest field key name + = (so that all values are aligned).
  • Bibtex entry keys are written in camel case, starting upper case
  • Field keys are formatted in lower case only
  • No longer perform any changes of field content (no more space/tab/newline elimination, etc.)
  • Provide formatter (Example: Remove tabs / newlines / duplicate spaces) that can be enabled by a user for specific fields onLoad / onSave. These are not enabled by default.
  • If in doubt, do as biber does
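
The formatting rules in this list can be sketched in a few lines. This is only an illustration, not JabRef's implementation: `format_entry`, the `TYPE_NAMES` table, the two-space indent and the trailing commas are assumptions made for the example, and "entry keys in camel case" is read here as referring to the entry type (as in InProceedings).

```python
# Hypothetical display names for entry types; the thread only mentions "InProceedings".
TYPE_NAMES = {"inproceedings": "InProceedings"}


def format_entry(entry_type, cite_key, fields):
    """Serialize one entry: lower-case field keys, '=' attached to the key, values aligned."""
    type_name = TYPE_NAMES.get(entry_type.lower(), entry_type.capitalize())
    keys = [name.lower() for name in fields]
    # Longest "key=" plus one space, so that every value starts in the same column.
    width = max(len(key) for key in keys) + 2 if keys else 0
    lines = ["@" + type_name + "{" + cite_key + ","]
    for key, value in zip(keys, fields.values()):
        # Field content is written as given; no whitespace clean-up is applied.
        lines.append("  " + (key + "=").ljust(width) + "{" + value + "},")
    lines.append("}")
    return "\n".join(lines)


print(format_entry("article", "smith1980", {"Author": "John Smith", "YEAR": "1980"}))
```

Field order is taken from the input as-is here; sorting according to the manuals would be a separate step.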
@stefan-kolb
Member

  1. We should NOT add compatibility patches. We want people to switch to JabRef 3.
  2. We should probably revert some changes to be more BibTeX conform. We want to encourage usage of JabRef not plain editing and readability of bib files.

Dunno what Configurable: Start field contents in same column. Enabled by default means, yet.

@koppor
Member Author

koppor commented Aug 20, 2015

JabRef is BibTeX conform. There are habits only :)

I think, we should enable readability as the .bib files can be put under version control, compared, quickly edited, etc. Otherwise, we produce a kind of a binary format, which is hard to handle in version control systems (see versioned .docx documents).

I tend to like the new format. We can discuss whether we remove the casing again. I got used to it, but no one else offers such a format.

Maybe we should start a wiki page with all possible formats and vote on the preferred format and implement that.

Configuration options

See Options -> Preferences -> File -> "Field saving options"

  • Use camel case for field names
  • Start field contents in same column
  • Include empty fields
  • Wrap fields as ver 2.9.2
  • Field sorting options: alphabetic, unsorted, user-defined order

Furthermore, one can configure the ordering of the entries: See "Database properties": Save as configured globally, original order, order as specified.

Should we drop all these options and define "the one and only" serialization? What about being as minimally intrusive as possible, for instance, when a group of researchers uses different tools to manage their bibliographies?

@mlep
Contributor

mlep commented Aug 21, 2015

Sharing a file while using a revision control software causes trouble if the Field saving options are not identical among users.
Hence, to facilitate sharing, I suggest adding a "master switch" to Options -> Preferences -> File -> "Field saving options", named something like "Default sharing format". That way, to make sharing smooth, users will simply have to ensure this single switch is selected.
I do not know which of the 5 above options should be selected/unselected by the "Default sharing format" switch, but I am sure that ensuring that everybody has the same settings among 5 options is going to be a pain.

@stefan-kolb
Member

Considering the configuration options you mentioned, Olly, I would suggest removing (most of) them and just defining one format for JabRef that suits our ideas and concept. This removes the problems that mlep described.

@mlep
Contributor

mlep commented Aug 21, 2015

Well, "one format [...] that suits our ideas and concept", may not suit users...
This could be because of personal preferences or because of compatibility with other software. So, I would suggest keeping the possibility of various field saving options.

@koppor
Member Author

koppor commented Aug 21, 2015

I think, this is the discussion GNOMEv2 vs. GNOMEv3, where they removed various configuration options. I see the following options:

A) Remove all configuration options and introduce a "one size fits all" format
B) Keep everything as is
C) Add support for the old 2.9.2 format

A) This increases testability (as fewer configuration combinations have to be tested) and eases sharing of BibTeX files across JabRef users.
B) We still have to add test cases. - Side comment: At "database properties", one can add the file sorting options. We could add a button "make database ready for sharing" or, even better, set a suitable default JabRef configuration. As soon as the old versions vanish (and the users have reset their preferences), additional configuration is only required for special cases.
C) This causes much effort (at least 20h) and is only because there are users not updating their software. Even Debian stable has JabRef v2.10.

In all cases, we should check whether other BibTeX-based managers are compatible. We can also neglect them and assume that other BibTeX managers are seldom used. The Microsoft Office way ^^.
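
To put a rough number on the testability argument for option A, the options listed earlier in the thread can be enumerated. This is a back-of-the-envelope sketch under assumptions that are not from the thread: the four checkboxes are treated as independent booleans, "user-defined order" counts as a single variant, and the three entry-ordering modes from "Database properties" combine freely with everything else.

```python
from itertools import product

# Option names paraphrased from the "Field saving options" list in this thread.
boolean_options = [
    "camel case field names",
    "start field contents in same column",
    "include empty fields",
    "wrap fields as ver 2.9.2",
]
field_sorting = ["alphabetic", "unsorted", "user-defined"]
entry_ordering = ["configured globally", "original order", "as specified"]


def count_configurations():
    """Count every combination of the options above (assumed to be independent)."""
    toggles = [(False, True)] * len(boolean_options)
    return sum(1 for _ in product(*toggles, field_sorting, entry_ordering))


print(count_configurations())
```

Under these assumptions the count is 2^4 · 3 · 3 = 144 serialization variants, compared with a single one for option A.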

@stefan-kolb
Member

I agree with you all, there is no wrong or right here. My main argument is that

  • configuration makes usage difficult as you have to know how to configure and that there is a configuration option for certain features
  • makes portability problematic as there might be a lot of different user configurations
  • is a threat to testing and stability because of various possible states of configuration

Therefore my personal principles

  • should only be used if there is a real benefit by this configuration option
  • the good old convention over configuration concept really condenses my personal favors.

@mlep
Contributor

mlep commented Aug 21, 2015

To me, C) should not be implemented. When a new version (of a freely available software) is released, it is the responsibility of the user if he/she wants to use the old version.
Between A and B, I favor B: if the default settings are the best for sharing, it is the responsibility of the user to use another configuration. And to report a bug if he/she finds one (so, the user does the testing ;-) )

@koppor
Member Author

koppor commented Aug 21, 2015

Regarding C), I get more bug reports than usual, which is an indicator that it is important for our users. See sf feature request #864 and related bugs. I can be the bad guy and just say "We don't have resources for that. If someone volunteers, we will include the patch.". Or we could collect donations for it. Or we should put it in the survey as option for next development (what we currently have)?

FLOSS COACH says: "some people are direct, critical or seem to be rude." 😎

@mlep
Contributor

mlep commented Aug 24, 2015

Well, if C) is a feature requested by users, I guess it should be kept...
And if the main contributors to JabRef do not have time to do it, I do not see anything bad about asking the community for a volunteer.

@lenhard
Member

lenhard commented Aug 24, 2015

I'll throw in my hat for option A.

From my point of view, JabRef has gotten so complex and ill-structured that there's a considerable risk that it will die because no one can maintain or extend it anymore. Following this logic, anything that reduces complexity, even if we lose features, is desirable.

@koppor
Member Author

koppor commented Oct 7, 2015

Regarding A). If we follow that, we should use the sort order given by the biblatex manual (Sections 2.1.1 and 2.1.2). We should also go back to field names written in lowercase letters. OK, this sort ordering is also not always optimal: in incollection, the introduction comes before subtitle. Nevertheless, the current serialization is strange in the case of article: The number comes before volume and in between, there is pages.

@koppor koppor changed the title JabRef should support saving in the old 2.9.2 format again Define optimal serialization format and variations (dropping or reviving the 2.9.2 format) Oct 7, 2015
@lenhard
Member

lenhard commented Oct 13, 2015

Regarding the field ordering, I guess the btxdoc reference available here is the ultimate source. Sect. 3.1, p. 8, states that "The fields within each class (required or optional) are listed in order of occurrence in the output, except that a few entry types may perturb the order slightly, depending on what fields are missing."

Hence, I would suggest that we hard code the order defined in the manual (easy, can be done during the definition of bibtex entries). Unknown fields can be sorted alphabetically after known fields. Btw.: where do entry types, such as "patent" or "periodical" come from?
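
A minimal sketch of that ordering rule: known fields first in a hard-coded order, unknown fields alphabetically afterwards. `KNOWN_ORDER` and `sort_fields` are hypothetical names, and the "article" list is only meant as an illustration of the required-then-optional order; the manual remains the authoritative source.

```python
# Illustrative hard-coded order for one entry type (required fields, then optional ones).
KNOWN_ORDER = {
    "article": ["author", "title", "journal", "year",
                "volume", "number", "pages", "month", "note"],
}


def sort_fields(entry_type, field_names):
    """Known fields in manual order first, then unknown fields alphabetically."""
    known = KNOWN_ORDER.get(entry_type.lower(), [])
    rank = {name: position for position, name in enumerate(known)}

    def sort_key(name):
        lowered = name.lower()
        if lowered in rank:
            return (0, rank[lowered], "")
        return (1, 0, lowered)

    return sorted(field_names, key=sort_key)


print(sort_fields("article", ["pages", "zzz", "number", "abstract", "volume", "author"]))
```

For an entry type without a hard-coded list, this sketch falls back to plain alphabetical order.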

@kubovy
Contributor

kubovy commented Oct 14, 2015

I am also for A). The reason I made some of the options configurable was to support old use of JabRef and keep the option to decide if such "pretty" formatting would eventually be accepted.

I am using VCS for bibliographies and papers and I also share bibliographies between papers, e.g. using Git modules. Therefore a unified "pretty" formatting always contributed to orientation in BibTeX files and effectiveness during changes. I would be definitely against some binary-like serialization. BibTeX is a plain-text file and should be IMHO also readable using other tools, supporting text-view only, e.g. Overleaf (where such formatting also contributes to better orientation especially in huge BibTeX files).

AFAIK camel-case is not a problem in BibTeX but contributes to readability.

Sorting: title first, required fields, optional fields, other fields, grouped and separated with an empty line helps to understand the BibTeX structure when viewed in plain text. Fixed sorting in groups (alphabetically) also contributes to orientation and less conflicts during changes in VCS.

If the order has some compatibility issues, as Lenhard said, or some rules say that, e.g., volume should come before number, then I would also be for making a fixed (hard-coded) ordering of required and optional fields. But I would still keep alphabetical order of other fields and spaces between the groups and CamelCase.

My suggestion would be to remove as many configuration elements as possible and keep only those which are necessary (in my opinion none of those regarding formatting are). Maybe enable some "advanced" mode where some configuration is possible. Fix the serialization so that it produces a BibTeX source that is as readable as possible.

Backwards compatibility regarding this issue is in my opinion not necessary. New versions can still read files produced by old versions and vice versa. The new version produces new formatting, the old version old formatting. The user can decide which version (s)he will use. If used along with a VCS, then there will be one commit with the "JabRef upgrade" comment :-) And from that point new commits will be "pretty".

@mlep
Contributor

mlep commented Oct 14, 2015

@lenhard: About "patent" and "periodical" (but also "standard" and "electronic"), I believe they come from IEEE. See http://mirror.ibcp.fr/pub/CTAN/macros/latex/contrib/IEEEtran/bibtex/IEEEtran_bst_HOWTO.pdf

@lenhard
Member

lenhard commented Oct 14, 2015

I think we are nearing a consensus :-) We will decide the issue during the next developer conference call.

I largely agree with @kubovy maybe except for:

@mlep: Thanks! We can use that guide as a reference to fix the field ordering as well.

@kubovy
Contributor

kubovy commented Oct 14, 2015

http://www.bibtex.org/Format/ they use for example: @string and @string -> it is case insensitive. IMHO camel-case formatting is just about how it appears to the human eye. Also there is this example:

@article{mrx05, 
    auTHor = "Mr. X", 
    Title = {Something Great}, 
    publisher = "nob" # "ody", 
    YEAR = 2005, 
} 

@bluebirch

Sorry for a late instep, but what about biblatex?

@lenhard
Member

lenhard commented Oct 15, 2015

@bluebirch: I'd say we treat biblatex in the same fashion as bibtex. Code-wise, this should make no difference. In JabRef's model, a biblatex entry is also a BibtexEntry. Sorting of fields can be taken from the biblatex manual.

@kubovy: The parser should be able to parse the camel casing (I should write a test). The idea is to remove it when writing back the file.

@bluebirch

@lenhard: Totally agree. Fully satisfied with that answer. ;-)

@lenhard
Member

lenhard commented Nov 11, 2015

Since we were discussing the serialization format in a wider public in this issue, I'd like to document our decisions from yesterday's conference call here:

Rationale: People are very emotional about formatting -> Modify as little as possible in the bib file

For the concrete checks, see the initial post.
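
The "modify as little as possible" rule boils down to keeping the text of each entry exactly as it was read and reformatting only entries that were changed. A minimal sketch of that idea, with hypothetical names (`Entry`, `set_field`, `write_entries`) that are not taken from JabRef's code base:

```python
class Entry:
    """Holds the parsed fields together with the exact text read from the file."""

    def __init__(self, raw_text, fields):
        self.raw_text = raw_text
        self.fields = dict(fields)
        self.changed = False

    def set_field(self, name, value):
        # Only a real change marks the entry for reformatting.
        if self.fields.get(name) != value:
            self.fields[name] = value
            self.changed = True


def write_entries(entries, serialize):
    """Write unchanged entries verbatim; reformat changed ones with `serialize`."""
    parts = [serialize(entry) if entry.changed else entry.raw_text for entry in entries]
    return "\n\n".join(parts)
```

With this split, opening and saving a file without editing anything yields the same entry text, so a version control system sees no spurious differences in untouched entries.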

@mlep
Contributor

mlep commented Nov 11, 2015

Nice!
Have you talked about some default settings for collaborative work using a VCS?

@koppor
Member Author

koppor commented Nov 11, 2015

What do you mean by default settings? By removing all configuration options, we are VCS ready. The only thing that was difficult to decide is "New entries are added to the bottom of the file". This will really cause issues when two collaborators are adding entries. The decision was that we cannot assume a particular order of the entries in the bibtex file. And if we add something after the first match, that could be wrong. First match is something like: First place of bibtex key or first place when considering sorting according to author, title, year. For instance, with existing keys a, d, f, c, the new key b would be added after a, even though the list was not sorted.
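
The two placement strategies can be compared on the key list from this comment. This is only an illustration: "first match" is interpreted here as "before the first existing key that sorts after the new one", and both function names are made up for the example.

```python
def insert_at_first_match(keys, new_key):
    """Rejected idea: put new_key before the first existing key that sorts after it."""
    for index, key in enumerate(keys):
        if key > new_key:
            return keys[:index] + [new_key] + keys[index:]
    return keys + [new_key]


def append_to_end(keys, new_key):
    """Chosen behavior: new entries always go to the bottom of the file."""
    return keys + [new_key]


existing = ["a", "d", "f", "c"]
print(insert_at_first_match(existing, "b"))
print(append_to_end(existing, "b"))
```

In the first variant b lands right after a although the file was never sorted, which is the misleading placement described above. Appending makes no assumption about the order, at the price of both collaborators touching the end of the file.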

@mlep
Contributor

mlep commented Nov 12, 2015

I was thinking about a way of ensuring all users have the same settings for the formatters (for example) when a file is used for collaborative work.
This could be set by a dedicated file property. If set, JabRef will use the default formatters (and not the user configuration). Just an idea.

@koppor
Member Author

koppor commented Nov 12, 2015

@mlep There won't be any configuration settings any more. Nothing more to worry about different settings. This also makes #180 obsolete.

@matthiasgeiger
Member

  • Fixed and non-configurable format (dropping several configuration options) for new or modified entries. Fields are sorted as stated in the Bibtex / Biblatex manuals. The value part after the = is indented to one space past the longest field key name (so that all values are aligned).

Okay - Olly was faster ;-)

So the only requirement is: all users should use the same (new) JabRef version.

@mlep
Contributor

mlep commented Nov 12, 2015

@koppor & @matthiasgeiger :
From a previous discussion, I understood that we will not have any settings for the file content.
But in this thread, I read:

Provide formatter (Example: Remove tabs / newlines / duplicate spaces) that can be enabled by a user for specific fields onLoad / onSave. These are not enabled by default.

What if the user does not use the default formatter settings? (I am afraid it may mess up the VCS...)
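
For reference, an opt-in formatter of the kind quoted above could look roughly like this. It is a sketch under assumptions: `normalize_whitespace` and `apply_save_actions` are hypothetical names, and the mapping from field name to formatter stands in for whatever per-field setting the user enables.

```python
import re


def normalize_whitespace(value):
    """Collapse every run of whitespace (tabs, newlines, repeated spaces) into one space."""
    return re.sub(r"\s+", " ", value)


def apply_save_actions(fields, actions):
    """Apply formatters only to the fields the user opted in for; leave the rest untouched."""
    result = dict(fields)
    for name, formatter in actions.items():
        if name in result:
            result[name] = formatter(result[name])
    return result


fields = {"title": "A\tlong\n   title", "abstract": "keep\n  this"}
print(apply_save_actions(fields, {"title": normalize_whitespace}))
```

With an empty `actions` mapping nothing is changed, which corresponds to the default. The concern raised here still applies to the sketch: two collaborators with different mappings would rewrite the same fields differently.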

@lenhard
Member

lenhard commented Jan 26, 2016

I am closing this issue, since the features discussed here have been implemented.

A nice UI for save actions is still missing. Discussion concerning this UI can carry on in a separate issue: #720

@koppor
Member Author

koppor commented Dec 23, 2016

Just for the sake of documentation: Emacs does it as follows (bibtex-fill-entry)

@Article{smith1980,
  author =       {John Smith},
  title =        {How I Weave Baskets Underwater},
  journal =      {Journal of Underwater Basket Weaving and Nonsensical
                  Latin Placeholder Texts},
  year =         1980,
  abstract =     {Lorem ipsum dolor sit amet, consectetur adipiscing
                  elit, sed do eiusmod tempor incididunt ut labore et
                  dolore magna aliqua. Ut enim ad minim veniam, quis
                  nostrud exercitation ullamco laboris nisi ut aliquip
                  ex ea commodo consequat.  Duis aute irure dolor in
                  reprehenderit in voluptate velit esse cillum dolore
                  eu fugiat nulla pariatur. Excepteur sint occaecat
                  cupidatat non proident, sunt in culpa qui officia
                  deserunt mollit anim id est laborum.},
}

Source: http://tex.stackexchange.com/q/345190/9075
