Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Write unmodified entries to bib file in the same format as they were read #391

Merged
merged 52 commits into from
Dec 6, 2015

Conversation

lenhard
Copy link
Member

@lenhard lenhard commented Nov 25, 2015

As discussed in #116, we want to write entries that are not modified during a session back in the exact same format as they were read. This PR is WIP in this direction and not complete yet. I will continue working on it.

To be able to write back an entry in the same fashion as it is read, we need to store it upon reading. I added a field to BibtexEntry for storing this and tried to modify BibtexParser to store the file content, but to no avail. The current code of that class is pure hell (uses a global PushBackReader to read the file in a very confusing way) and this PR is hopefully a step towards its replacement.

Instead of modifying the logic in BibtexParser, I extend it with additional methods that perform the new functionality on top. Current status:

  1. On initialization the complete file is read into a List<String> of its lines before handing it over to the old parser.
  2. After an entry has been parsed, the original represenation of the entry is looked up in the List<String> and stored in the entry. Blank lines following the entry belong to the entry.

That means we read the file twice, which is terrible for large files, but as this is WIP...

Next step will be to detect when an entry has been changed and the modification of the writing logic.

The handling of newlines currently is not consistent.

@simonharrer
Copy link
Contributor

Nice. Hm, what to do when two entries share a line? E.g. @article{key1, title={a}}@article{key2, title={b}}?

@lenhard
Copy link
Member Author

lenhard commented Nov 26, 2015

In that case, the current implementation fails.

But that gives me an idea: Isn't @ a valid delimiter that is only used for setting bibtex entry types, but forbidden in other parts of the file? If so, I could use it for reading the file, instead of the newline character. This is very easy to do with a Scanner and would make things way easier.

@oscargus
Copy link
Contributor

There is the issue of comments, which potentially can contain @. But as far as I know, everything outside of @xxx{..} is a comment.

@tobiasdiez
Copy link
Member

The @ symbol can also occur in an entry, for example in an email-address. The current implementation has also the problem that it fails if the entry has no bibtex key. I fear there is no easy way to find an entry in a bibtex file without actually parsing it completely.

@koppor
Copy link
Member

koppor commented Nov 26, 2015

Isn't it possible to add a reference to the source line and column during parsing? This should be offered by ANTLR somehow, isn't it?

I assume that JBibTeX. Doesn't support that. Refs #123.

@lenhard
Copy link
Member Author

lenhard commented Nov 26, 2015

@oscargus: that would not be a problem. It's more of a problem if an @ is within an entry

@koppor: This is work in progress. I'll start commenting when I have something closer to finalization. Do we want to parse the file twice? If so, then jbibtex might be an option. Can you tell me if jbibtex writes back an entry in exactly the same format (up to every line feed character) as it was read?

@tobiasdiez Thanks, I am aware of that. This is work in progress. I'd first like to arrive at a solution that does the round-trip for a well-structured database. Then, I'll modify it to match incomplete entries.

@lenhard
Copy link
Member Author

lenhard commented Nov 27, 2015

Ok, now this works nicely with my large local database that doesn't include analomies.

Now comes the hard part: Properly reading and writing files with strange content. This is also the reason why some tests are failing at the moment. I might need to discuss some of these aspects and will ping you in this thread if I am unsure.

@lenhard
Copy link
Member Author

lenhard commented Nov 27, 2015

Dammit, I wanted to get some work done today and now I am only coding JabRef...

Anyways, this seems to work (and passes the tests)! I ditched my previous parsing logic and rewrote it in integration with the BibtexParser, which wasn't as hard as I claimed above (my bad).

I am now able to store each entry exactly as it is in the file. Each entry serialization contains all text from its end up to the end of previous entry. The main problem with this is that with this logic, the first entry also stores the file header and then the header is serialized twice when rewriting the file. I was able to get around that with some hacking, but there might still be edge cases I am not yet aware of.

To sum up, this needs some more manual testing, but should be pretty close. Feedback is welcome :)

* marks if the complete serialization, which was read from file, should be used.
* Is set to false, if parts of the entry change
*/
private boolean useCustomSerialization;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rename this (and the corresponding methods, of course) as it is always used as "if not changed then..." to "useParsedSerialization" or even to "hasChanged".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I'll do some renaming. On second thought, I will also replace the Deque with a Stack, because this is the way in which it is used.

@matthiasgeiger
Copy link
Member

Tested it manually with some entries and it works smoothly!

Good job! 👍

@matthiasgeiger matthiasgeiger added this to the v3.0 milestone Nov 27, 2015
@lenhard
Copy link
Member Author

lenhard commented Nov 27, 2015

Should we really introduce this in v3.0 already? I'm not entirely sure it works on all occaison. For instance, I haven't tested it with strings in the bib file so far.

@lenhard
Copy link
Member Author

lenhard commented Dec 2, 2015

@matthiasgeiger But that does not really depend on the version information. The version is checked, but also every file written by JabRef 3.0 is marked as problematic. The file links upgrade really depends on whether the file contains fields named pdf or ps. It wouldn't make a difference isActionNecessary always returned true, regardless of the version.

Anyways, if you really really want the version parsing, we can keep it.

@matthiasgeiger
Copy link
Member

okay... I have to confess that I only performed a "find usages" search without truly checking whether the usages are really needed.

So feel free to remove those lines ;-)

@lenhard
Copy link
Member Author

lenhard commented Dec 2, 2015

I removed the old meta flag and version parsing. Version headers are now simply removed on rewrite. File upgrade functionality ultimately depends on preferences (and on whether JabRef finds errors in the file to begin with).

@lenhard
Copy link
Member Author

lenhard commented Dec 4, 2015

So, from my point of view this PR is complete and ready to merge.

@koppor and I did some more testing with more advanced things like meta-data and it seems fine. I'm not guaranteeing the absence of bugs, but there should be no more obvious mistakes.

Somebody else can take a final look and merge this with master.

@lenhard lenhard changed the title [WIP] Write unmodified entries to bib file in the same format as they were read Write unmodified entries to bib file in the same format as they were read Dec 4, 2015
@simonharrer
Copy link
Contributor

Looks good to me. It worked with my large .bib file without any issues.

@koppor
Copy link
Member

koppor commented Dec 6, 2015

@simonharrer Could you check what happens if you change the sort ordering of the file? Change the ordering. If everything is OK, change the ordering and change an entry. 🐇 I it doesn't work, maybe we drop the ordering possibilities of entries within JabRef now or leave that as open issue.

@simonharrer
Copy link
Contributor

What do you think is the problem here? When I change the sort ordering, the order of the entries is changed in the file. But nothing else is. And if I also change an entry, this entry is changed as well. I do not get what you imply.

koppor added a commit that referenced this pull request Dec 6, 2015
Write unmodified entries to bib file in the same format as they were read
@koppor koppor merged commit 741c40e into master Dec 6, 2015
@koppor koppor deleted the stable-serialization branch December 6, 2015 16:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants