BOM now missing at beginning of bibliography file -- causes JabRef to not recognize existing library #9496

andrewhw · 2022-12-24T15:58:53Z

JabRef version

5.8 (latest release)

Operating system

macOS

Details on version and operating system

Darwin daphne.local 22.2.0 Darwin Kernel Version 22.2.0: Fri Nov 11 02:03:51 PST 2022; root:xnu-8792.61.2~4/RELEASE_ARM64_T6000 arm64

Checked with the latest development build

I made a backup of my libraries before testing the latest development version.
I have tested the latest development version and the problem persists

Steps to reproduce the behaviour

Begin with an existing bibliography file
Update to the newest JabRef
Save the database (no edits required)
You will likely get a warning that "the library has been modified by another program". This is not actually true. Dismiss changes.
Examine the bibliography file using a text editor. The BOM (the bytes at the beginning of the file forming the Byte Order Mark) is now missing.
Reopening the file using JabRef will now cause a "no content in table" error after opening.

Note that if you reestablish the BOM using an external editor and then open the file again using JabRef, all is well until the bibliography is saved again.

Note that this may be apparent on my machine because I have an ARM processor, so this error may not be reproducible on an older Mac with an Intel processor.

The underlying problem is simply that the BOM is now missing during write. Putting the BOM back in (as it was in older JabRef versions) will fix the problem.
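The check in the steps above ("is the BOM still there?") can also be done programmatically. The following is a minimal sketch in Python, not JabRef code; the helper name `sniff_bom` and the sample header strings are assumptions made for illustration.

```python
import codecs

# Known byte order marks, UTF-32 first so that the UTF-32 LE mark
# (FF FE 00 00) is not mistaken for the UTF-16 LE mark (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def sniff_bom(data: bytes):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


# Sample data built in memory; these stand in for the start of a .bib file.
with_bom = codecs.BOM_UTF16_BE + "% Encoding: UTF-16".encode("utf-16-be")
without_bom = "% Encoding: UTF-16BE".encode("utf-16-be")

print(sniff_bom(with_bom))     # utf-16-be
print(sniff_bom(without_bom))  # None
```

To check a real file, pass the result of reading its first few bytes in binary mode to `sniff_bom`.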

Siedlerchr · 2022-12-24T18:49:32Z

Thanks for reporting. Does the bib file include a header line of the form "% Encoding: <encoding>"? In general, JabRef tries to detect the encoding when reading and will write plain UTF-8 if no header line is present.
Additionally, could you please provide the bib file for us for debugging? You can also send it privately to web@jabref.org

andrewhw · 2022-12-24T19:01:16Z

Yes, the bib file does include a % Encoding line. This now reads "% Encoding: UTF-16BE"; however, as of my previous update (when the BOM was working) this line read "% Encoding: UTF-16" (that is, without the "BE").

I have attached two bib files. The first, "tiny-1-withBOM.bib", works fine and can be successfully read by JabRef. If, however, you read this file and save it, it will then match "tiny-2-noBOM.bib". The difference between the files is simply the two BOM bytes (0xFE 0xFF) prior to the '%' beginning the header proper, which are missing in the second one.

Thanks for looking into this.
tiny-bib-example.zip

andrewhw · 2022-12-25T03:05:01Z

I just looked up what "UTF-16BE" is meant to mean, and the "BE" part is trying to flag that the file is "big endian".

The problem with this, in this context, is that the byte order must be known in order to correctly parse the 16-bit characters of the file. Without the BOM, a reader that assumes the wrong order will load the "first" character (the "%" sign) as character 0x2500 ("Box drawings light horizontal") rather than as 0x0025 ("percent").

The "% Encoding" strategy works well for UTF-8, as it is an orderless encoding (one byte processed at a time), but UTF-16 requires the byte order to be known before any characters are parsed at all.
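The 0x0025 versus 0x2500 mix-up can be reproduced in a few lines. This is a small Python illustration using the standard "utf-16-be" and "utf-16-le" codecs; it does not reflect how JabRef itself decodes files.

```python
raw = b"\x00\x25"  # the two bytes that encode '%' in big-endian UTF-16

as_big_endian = raw.decode("utf-16-be")     # correct byte order
as_little_endian = raw.decode("utf-16-le")  # wrong byte order

print(hex(ord(as_big_endian)))     # 0x25   (percent sign)
print(hex(ord(as_little_endian)))  # 0x2500 (box drawings light horizontal)
```

Since the "% Encoding" line is itself stored as 16-bit characters, a reader that picks the wrong order cannot even find that line.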

Not sure if this helps, or if this is already obvious to everyone. Sorry if I am over-explaining.

Siedlerchr · 2022-12-25T09:08:55Z

Thanks for the additional information. For reference, we have been down that rabbit hole in #8947 and unicode-org/icu#2127

andrewhw · 2022-12-25T13:02:26Z

Great – thanks for letting me know!

andrewhw · 2022-12-26T18:17:21Z

In light of the examples in linked threads, maybe it is helpful to show the direct byte encodings in the files. I have shown them here with hexdump(1) and od(1) "octal dump" -- both of these are available command line tools under Linux and MacOSX.

Note the two bytes forming the BOM (0xFE 0xFF) shown prior to the two-byte sequence (<nul>-'%') forming the first readable Unicode character of the file.
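A comparable view of the leading bytes can be produced without hexdump(1) or od(1). The sketch below is Python and uses sample bytes built in the code rather than the attached files; the helper name `first_bytes_hex` is an assumption for illustration, and `bytes.hex` with a separator requires Python 3.8 or later.

```python
import codecs


def first_bytes_hex(data: bytes, count: int = 4) -> str:
    """Format the first `count` bytes as space-separated hex pairs."""
    return data[:count].hex(" ")


# Sample header line encoded as big-endian UTF-16, with and without a BOM.
header = "% Encoding: UTF-16BE".encode("utf-16-be")

print(first_bytes_hex(codecs.BOM_UTF16_BE + header))  # fe ff 00 25
print(first_bytes_hex(header))                        # 00 25 00 20
```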

koppor · 2023-06-06T19:35:28Z

Could you try the latest development version?

I think this is a duplicate of #9926, which was fixed recently.

andrewhw · 2023-06-06T20:37:09Z

Hi Oliver,

As a side note, I tried getting the latest development version for MacOSX using the .dmg file and the resulting application as installed was corrupted. I installed the .gz version and it is fine.

Having installed the .gz version, I *think* that the issue is fixed? The current behaviour seems to be that it reads UTF16BE files if they have a BOM, but the ones that it previously created without the BOM (that I would argue can be seen as invalid) are broken. This will orphan anyone who used the previous version with UTF16BE files previous to the last major release, and they will need to update their files externally -- as long as everyone understands that, then I think we are all on the same page.

Thanks for getting this fixed.

Andrew
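For anyone who needs to update such BOM-less files externally, the repair amounts to putting the two BOM bytes back in front of the data. Below is a minimal Python sketch, assuming the file really is big-endian UTF-16; the function name `restore_utf16be_bom` is hypothetical and this is not a JabRef feature.

```python
import codecs


def restore_utf16be_bom(data: bytes) -> bytes:
    """Prepend the UTF-16BE BOM unless the data already starts with a UTF-16 BOM."""
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return data  # already marked, leave untouched
    return codecs.BOM_UTF16_BE + data


# Sample data standing in for a file saved without the BOM.
broken = "% Encoding: UTF-16BE\n".encode("utf-16-be")
fixed = restore_utf16be_bom(broken)

print(fixed[:2] == codecs.BOM_UTF16_BE)     # True
print(restore_utf16be_bom(fixed) == fixed)  # True, a second pass changes nothing
```

To apply this to a library, read the file in binary mode, pass the bytes through the function, and write the result to a new file so that the original stays available as a backup.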

koppor · 2023-06-07T11:42:56Z

Regarding the Mac OS X bug, there is a workaround: #9553

koppor · 2023-07-31T19:55:51Z

Note to us: There was a fix on May 20 (#9927), but the comment from June says that some files can still be broken. We need:

Test files in the JabRef repository (most likely the files posted in #9496 (comment))
Test cases for the test files

andrewhw · 2023-07-31T22:24:37Z

If you are referring to the files I uploaded in the tiny-bib-example.zip file on Dec 24, 2022 above, then the test cases are simply this:

open JabRef with no database
select one of the two .bib files in the zip file above

Expected behaviour (as far as I understand it):

tiny-1-withBOM.bib -- successfully opens the file
tiny-2-noBOM.bib -- an error (the description mentions parsing) when loaded on my machine (macOS on an M1 chip). I do not know what will happen on an Intel machine, but both processors are little-endian and the file's byte order does not change with the machine reading it, so I suspect that things will go no better on Intel.

Is that what you need?

andrewhw · 2023-07-31T22:34:18Z

If it is helpful, here are the "tiny" files in both big and little endian formats, with and without BOM markers.

tiny-bib-example-endian-and-BOM-combinations.zip
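The four combinations (big and little endian, with and without a BOM) can be generated and checked in a test. The sketch below is Python with in-memory sample data rather than the zipped files. The function `guess_utf16_order` is a hypothetical heuristic, not JabRef's detection logic: it trusts a BOM when present and otherwise relies on the assumption that the file starts with the '%' of the header line.

```python
import codecs


def guess_utf16_order(data: bytes):
    """Guess the UTF-16 byte order: trust a BOM, else look at the leading '%'."""
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    # No BOM: a leading '%' is 00 25 in big-endian and 25 00 in little-endian.
    if data.startswith(b"\x00%"):
        return "utf-16-be"
    if data.startswith(b"%\x00"):
        return "utf-16-le"
    return None


text = "% Encoding: UTF-16\n"
variants = {
    "BE with BOM": codecs.BOM_UTF16_BE + text.encode("utf-16-be"),
    "BE without BOM": text.encode("utf-16-be"),
    "LE with BOM": codecs.BOM_UTF16_LE + text.encode("utf-16-le"),
    "LE without BOM": text.encode("utf-16-le"),
}

results = {name: guess_utf16_order(data) for name, data in variants.items()}
for name, order in results.items():
    print(name, "->", order)
```

The heuristic returns None for any file that has no BOM and does not begin with '%', so it is only a fallback for files that carry the header line.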

Siedlerchr added the "export / save" and "unicode" (unicode related issues) labels Dec 24, 2022

koppor self-assigned this Jan 2, 2023

koppor added this to Prioritization Jan 2, 2023

github-project-automation bot moved this to Normal priority in Prioritization Jan 2, 2023

koppor moved this from Normal priority to High priority in Prioritization Jan 2, 2023

koppor added this to the 6.0-beta milestone Jul 9, 2024
