Write unmodified entries to bib file in the same format as they were read #391

lenhard · 2015-11-25T17:06:37Z

As discussed in #116, we want to write entries that are not modified during a session back in the exact same format as they were read. This PR is WIP in this direction and not complete yet. I will continue working on it.

To be able to write back an entry in the same fashion as it is read, we need to store it upon reading. I added a field to BibtexEntry for storing this and tried to modify BibtexParser to store the file content, but to no avail. The current code of that class is pure hell (it uses a global PushbackReader to read the file in a very confusing way) and this PR is hopefully a step towards its replacement.

Instead of modifying the logic in BibtexParser, I extend it with additional methods that perform the new functionality on top. Current status:

On initialization the complete file is read into a List<String> of its lines before handing it over to the old parser.
After an entry has been parsed, the original representation of the entry is looked up in the List<String> and stored in the entry. Blank lines following the entry belong to the entry.

~~That means we read the file twice, which is terrible for large files, but as this is WIP...~~

The next steps are to detect when an entry has been changed and to modify the writing logic accordingly.

The handling of newlines is currently not consistent.

simonharrer · 2015-11-25T20:27:57Z

Nice. Hm, what to do when two entries share a line? E.g. @article{key1, title={a}}@article{key2, title={b}}?

lenhard · 2015-11-26T09:02:09Z

In that case, the current implementation fails.

But that gives me an idea: Isn't @ a valid delimiter that is only used for setting bibtex entry types, but forbidden in other parts of the file? If so, I could use it for reading the file, instead of the newline character. This is very easy to do with a Scanner and would make things way easier.

oscargus · 2015-11-26T09:14:50Z

There is the issue of comments, which potentially can contain @. But as far as I know, everything outside of @xxx{..} is a comment.

tobiasdiez · 2015-11-26T09:30:00Z

The @ symbol can also occur in an entry, for example in an email address. The current implementation also has the problem that it fails if the entry has no bibtex key. I fear there is no easy way to find an entry in a bibtex file without actually parsing it completely.
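The idea of splitting at `@` and the objection to it can be illustrated with a minimal sketch. `AtDelimiterSplit` is a hypothetical helper written for this illustration, not code from the PR; it only uses `java.util.Scanner` with `@` as the delimiter, as proposed above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper, not part of JabRef: splits raw BibTeX text at every '@'.
class AtDelimiterSplit {

    // Returns the non-blank chunks found between '@' characters.
    static List<String> split(String content) {
        List<String> chunks = new ArrayList<>();
        try (Scanner scanner = new Scanner(content)) {
            scanner.useDelimiter("@");
            while (scanner.hasNext()) {
                String chunk = scanner.next();
                if (!chunk.trim().isEmpty()) {
                    chunks.add(chunk);
                }
            }
        }
        return chunks;
    }
}
```

Two entries sharing one line are separated as hoped, but a single entry whose field value contains an email address is also cut in two, which is the failure case described above.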

koppor · 2015-11-26T09:40:53Z

Isn't it possible to add a reference to the source line and column during parsing? This should be offered by ANTLR somehow, shouldn't it?

I assume that JBibTeX doesn't support that. Refs #123.

lenhard · 2015-11-26T09:57:19Z

@oscargus: that would not be a problem. It's more of a problem if an @ is within an entry

@koppor: This is work in progress. I'll start commenting when I have something closer to finalization. Do we want to parse the file twice? If so, then jbibtex might be an option. Can you tell me if jbibtex writes back an entry in exactly the same format (up to every line feed character) as it was read?

@tobiasdiez Thanks, I am aware of that. This is work in progress. I'd first like to arrive at a solution that does the round-trip for a well-structured database. Then, I'll modify it to match incomplete entries.

lenhard · 2015-11-27T11:46:35Z

Ok, now this works nicely with my large local database that doesn't include anomalies.

Now comes the hard part: Properly reading and writing files with strange content. This is also the reason why some tests are failing at the moment. I might need to discuss some of these aspects and will ping you in this thread if I am unsure.

lenhard · 2015-11-27T13:49:14Z

Dammit, I wanted to get some work done today and now I am only coding JabRef...

Anyways, this seems to work (and passes the tests)! I ditched my previous parsing logic and rewrote it in integration with the BibtexParser, which wasn't as hard as I claimed above (my bad).

I am now able to store each entry exactly as it appears in the file. Each entry's serialization contains all text from the end of the previous entry up to its own end. The main problem with this logic is that the first entry also stores the file header, so the header is serialized twice when rewriting the file. I was able to get around that with some hacking, but there might still be edge cases I am not yet aware of.
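The slicing rule described here can be sketched as follows. This is only an illustration under stated assumptions: `RawEntrySlicer`, `slice`, and `withoutHeader` are hypothetical names, the entry end offsets are assumed to be reported by the parser, and cutting the first slice at its first `@` is just one conceivable workaround for the header problem, not necessarily the one used in the PR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not JabRef's BibtexParser: attaches to each entry all text
// between the end of the previous entry and the entry's own end.
class RawEntrySlicer {

    // entryEnds holds, in file order, the offset just past each entry's closing
    // brace (assumed to be supplied by the parser).
    static List<String> slice(String content, int[] entryEnds) {
        List<String> raw = new ArrayList<>();
        int previousEnd = 0;
        for (int end : entryEnds) {
            raw.add(content.substring(previousEnd, end));
            previousEnd = end;
        }
        return raw;
    }

    // Assumed workaround for the header problem: drop everything before the
    // first '@' of the first slice. This breaks if the header contains an '@'.
    static String withoutHeader(String firstSlice) {
        int at = firstSlice.indexOf('@');
        return at < 0 ? firstSlice : firstSlice.substring(at);
    }
}
```

Concatenating the slices reproduces the file exactly, which is why the first slice carries the header and would write it a second time if the header is also emitted separately.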

To sum up, this needs some more manual testing, but should be pretty close. Feedback is welcome :)

matthiasgeiger · 2015-11-27T14:16:47Z

src/main/java/net/sf/jabref/model/entry/BibtexEntry.java

+    * marks if the complete serialization, which was read from file, should be used.
+    * Is set to false, if parts of the entry change
+     */
+    private boolean useCustomSerialization;


I would rename this (and the corresponding methods, of course) to "useParsedSerialization" or even to "hasChanged", since it is always used as "if not changed then...".

Good point, I'll do some renaming. On second thought, I will also replace the Deque with a Stack, because this is the way in which it is used.
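The flag under review can be pictured with a small sketch. `SketchEntry` is a hypothetical stand-in for BibtexEntry, and the fallback formatting in `serialize` is invented for illustration; only the idea of "write the parsed text unless the entry has changed" comes from this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for BibtexEntry, not the real class.
class SketchEntry {
    private final String type;
    private final String key;
    private final Map<String, String> fields = new LinkedHashMap<>();
    private String parsedSerialization; // raw text as read from the file, if any
    private boolean changed;            // true once a field was set after parsing

    SketchEntry(String type, String key) {
        this.type = type;
        this.key = key;
    }

    // Called by the parser after all fields are set; resets the changed flag.
    void setParsedSerialization(String raw) {
        this.parsedSerialization = raw;
        this.changed = false;
    }

    void setField(String name, String value) {
        fields.put(name, value);
        changed = true;
    }

    // Unchanged entries are written back verbatim; changed ones are reformatted.
    String serialize() {
        if (!changed && parsedSerialization != null) {
            return parsedSerialization;
        }
        StringBuilder sb = new StringBuilder("@").append(type).append('{').append(key);
        for (Map.Entry<String, String> field : fields.entrySet()) {
            sb.append(",\n  ").append(field.getKey()).append(" = {").append(field.getValue()).append('}');
        }
        return sb.append("\n}").toString();
    }
}
```

With a name like `changed` (or `hasChanged`), the condition in `serialize` reads the same way the flag is used, which is the point of the renaming suggestion.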

matthiasgeiger · 2015-11-27T14:30:04Z

Tested it manually with some entries and it works smoothly!

Good job! 👍

lenhard · 2015-11-27T15:23:49Z

Should we really introduce this in v3.0 already? I'm not entirely sure it works in all cases. For instance, I haven't tested it with strings in the bib file so far.

matthiasgeiger · 2015-12-02T11:46:50Z

e.g. https://github.com/JabRef/jabref/blob/master/src/main/java/net/sf/jabref/migrations/FileLinksUpgradeWarning.java#L65

lenhard · 2015-12-02T12:57:34Z

@matthiasgeiger But that does not really depend on the version information. The version is checked, but also every file written by JabRef 3.0 is marked as problematic. The file links upgrade really depends on whether the file contains fields named pdf or ps. It wouldn't make a difference if isActionNecessary always returned true, regardless of the version.

Anyways, if you really really want the version parsing, we can keep it.
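The argument that the upgrade depends on the fields rather than on the version can be sketched like this. `LegacyLinkCheck` and `hasLegacyFileFields` are hypothetical names, and entries are modeled as plain field maps for brevity; the actual check in FileLinksUpgradeWarning is not reproduced here.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: decides on a file-link upgrade from the entries alone,
// without looking at any version header.
class LegacyLinkCheck {

    // Each entry is modeled as a map from field name to value (an assumption).
    static boolean hasLegacyFileFields(List<Map<String, String>> entries) {
        for (Map<String, String> fields : entries) {
            if (fields.containsKey("pdf") || fields.containsKey("ps")) {
                return true;
            }
        }
        return false;
    }
}
```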

matthiasgeiger · 2015-12-02T13:20:48Z

okay... I have to confess that I only performed a "find usages" search without truly checking whether the usages are really needed.

So feel free to remove those lines ;-)

lenhard · 2015-12-02T15:44:33Z

I removed the old meta flag and version parsing. Version headers are now simply removed on rewrite. File upgrade functionality ultimately depends on preferences (and on whether JabRef finds errors in the file to begin with).

lenhard · 2015-12-04T17:36:51Z

So, from my point of view this PR is complete and ready to merge.

@koppor and I did some more testing with more advanced things like meta-data and it seems fine. I'm not guaranteeing the absence of bugs, but there should be no more obvious mistakes.

Somebody else can take a final look and merge this with master.

simonharrer · 2015-12-06T14:07:04Z

Looks good to me. It worked with my large .bib file without any issues.

koppor · 2015-12-06T15:16:13Z

@simonharrer Could you check what happens if you change the sort ordering of the file? Change the ordering. If everything is OK, change the ordering and change an entry. 🐇 If it doesn't work, maybe we drop the ordering possibilities of entries within JabRef now or leave that as an open issue.

simonharrer · 2015-12-06T16:38:40Z

What do you think is the problem here? When I change the sort ordering, the order of the entries is changed in the file. But nothing else is. And if I also change an entry, this entry is changed as well. I do not get what you imply.

lenhard added 2 commits November 25, 2015 17:40

Read each file twice to have a proper String represenation

af04d34

Store serialization in each entry

87aaa0b

Use '@' as delimiter for parsing

f3405c6

lenhard added 10 commits November 27, 2015 10:42

Merge branch 'master' into stable-serialization

51ccd2a

Conflicts: src/main/java/net/sf/jabref/model/entry/BibtexEntry.java

Add boolean variable to check if an entry was been changed

9bd3471

Write original serialization if entry has not been modified

4d7164d

Add lists with entry type names

7282782

Check if parsed tokens start with entry type name

dd4cdea

Set Preferences in BibtexParserTest

aa4b8aa

Compare entry types via lower case

025a101

Always right text back during preprocessing

aa234e1

Only append new lines if an entry is reformatted

dfd93a9

Merge branch 'master' into stable-serialization

3e3ed1e

lenhard added 4 commits November 27, 2015 14:01

Replace parsing of raw serialization with Deque

db65cf9

Fix tests, since parsing of unmodified entries does no longer remove …

14dea02

…newlines

Remove newlines before entry

bf1314f

Only keep newlines if there is another entry before

8980058

matthiasgeiger reviewed Nov 27, 2015

matthiasgeiger added this to the v3.0 milestone Nov 27, 2015

lenhard added 2 commits December 2, 2015 16:10

Remove deprecated META_FLAG_OLD

a4858ee

Remove deprecated version parsing and improve detection for the neces…

bef862f

…sity to do a file upgrade

lenhard and others added 13 commits December 3, 2015 14:13

Merge branch 'master' into stable-serialization

531ec98

Conflicts: src/main/java/net/sf/jabref/model/entry/BibtexString.java

Add test for utf8-bom mark

9b30ba0

Never save the serialization of meta-comments

7c77f96

The serialization of an entry takes care of preceding newlines, but n…

8f47afe

…ot of appending ones

Fix double serialization of @comment{jabref-entrytype

5203b50

Also save using Eclipse Formatting rules

Avoid duplicate serialization of preamble

26b8d9a

Separate file epilog from last entry and always write newlines for me…

9faa5ae

…tadata

Improve Exception Handling in FileActions

6dce25b

Fix serialization of newlines for meta data and custom entry types

c3a756e

Remove unused variable

1956112

Remove obsolete boolean variable

0ffed33

Do not add additional newline for Bibtex Strings

6f0827b

There should be 3 newlines appended to the encoding

d89eb00

lenhard changed the title ~~[WIP] Write unmodified entries to bib file in the same format as they were read~~ Write unmodified entries to bib file in the same format as they were read Dec 4, 2015

koppor added a commit that referenced this pull request Dec 6, 2015

Merge pull request #391 from JabRef/stable-serialization

741c40e

Write unmodified entries to bib file in the same format as they were read

koppor merged commit 741c40e into master Dec 6, 2015

koppor deleted the stable-serialization branch December 6, 2015 16:39

This was referenced Dec 21, 2015

BibTeX source should not start with two empty lines #559

Closed

Rethink displayable fields vs. writeable fields #574

Closed

koppor mentioned this pull request Apr 16, 2018

Bibtex parser - Found unbracketed comment #3956

Closed
