BOM now missing at beginning of bibliography file -- causes JabRef to not recognize existing library #9496
Comments
Thanks for reporting. Does the bib file include a header line of the form "% Encoding: <encoding>"? In general, JabRef tries to detect the encoding when reading, and it writes plain UTF-8 if no header line is present. |
Yes, the bib file does include a % Encoding line. It now reads "% Encoding: UTF-16BE"; however, as of the last update I had (when the BOM was working), this line read "% Encoding: UTF-16" (that is, without the "BE"). I have attached two bib files. The first, "tiny-1-withBOM.bib", works fine and can be successfully read by JabRef. If, however, you read this file and save it, it will then match "tiny-2-noBOM.bib". The only difference between the files is the two BOM bytes (0xFE 0xFF) that precede the '%' beginning the header proper; they are missing in the second file. Thanks for looking into this. |
I just looked up what "UTF-16BE" means: the "BE" part flags that the file is big-endian. The problem in this context is that the endianness must be known in order to parse the 16-bit characters of the file correctly, so without the BOM a reader that assumes the wrong byte order will load the first character (the "%" sign) as character 0x2500 ("Box drawings light horizontal") rather than as 0x0025 ("percent"). The "% Encoding" strategy works well for UTF-8, as it has no byte-order ambiguity (one byte processed at a time), but UTF-16 requires the byte order to be known before any characters are parsed at all. Not sure if this helps, or if this is already obvious to everyone. Sorry if I am over-explaining. |
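The byte-order point above can be checked with a few lines of Python. This is only an illustration of how UTF-16 decoding behaves in general; it is not JabRef code (JabRef is written in Java), and the variable names are made up for the example.

```python
import codecs
import unicodedata

# "%" encoded as big-endian UTF-16 without a BOM: the two bytes 0x00 0x25.
percent_be = "%".encode("utf-16-be")

# Decoding those same two bytes with each byte order.
as_big_endian = percent_be.decode("utf-16-be")
as_little_endian = percent_be.decode("utf-16-le")

# With a big-endian BOM in front, the generic "utf-16" decoder reads the
# BOM, picks the byte order from it, and strips it from the result.
with_bom = codecs.BOM_UTF16_BE + percent_be
from_bom = with_bom.decode("utf-16")

print(percent_be.hex())
print(hex(ord(as_big_endian)), unicodedata.name(as_big_endian))
print(hex(ord(as_little_endian)), unicodedata.name(as_little_endian))
print(hex(ord(from_bom)), unicodedata.name(from_bom))
```

The same two bytes come out as U+0025 or U+2500 depending only on the assumed byte order, which is why a header line that itself has to be decoded cannot settle the question for UTF-16.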
Thanks for the additional information. For reference, we have been down that rabbit hole in #8947 and unicode-org/icu#2127 |
Great – thanks for letting me know!
|
In light of the examples in the linked threads, it may be helpful to show the raw bytes in the files. I have shown them here with hexdump(1) and od(1) ("octal dump"); both are command-line tools available under Linux and MacOSX. Note the two bytes forming the BOM (0xFE 0xFF) shown prior to the two-byte sequence (<nul>-'%') forming the first readable Unicode character of the file. |
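For readers without hexdump(1) or od(1) at hand, a rough equivalent of inspecting the leading bytes can be sketched in Python. The helper names `hex_prefix` and `describe_bom` are hypothetical, and the sample bytes are built in memory from the header line quoted in this thread rather than read from the attached files.

```python
import codecs

def hex_prefix(data: bytes, count: int = 8) -> str:
    """Return the first `count` bytes as space-separated hex pairs."""
    return " ".join(f"{b:02x}" for b in data[:count])

def describe_bom(data: bytes) -> str:
    """Report which UTF-16 BOM, if any, the data starts with.

    Only the two UTF-16 BOMs are checked; other encodings are ignored.
    """
    if data.startswith(codecs.BOM_UTF16_BE):
        return "UTF-16BE BOM (FE FF)"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "UTF-16LE BOM (FF FE)"
    return "no UTF-16 BOM"

# Assumed sample content: just the header line mentioned in the thread.
header = "% Encoding: UTF-16BE\n".encode("utf-16-be")
with_bom = codecs.BOM_UTF16_BE + header

print(hex_prefix(with_bom, 6), "->", describe_bom(with_bom))
print(hex_prefix(header, 6), "->", describe_bom(header))
```

To apply this to a real file, pass `pathlib.Path(name).read_bytes()` instead of the in-memory sample.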
Could you try the latest development version? I think, this is a duplicate of #9926, which was fixed recently. |
Hi Oliver,
As a side note, I tried getting the latest development version for MacOSX using the .dmg file and the resulting application as installed was corrupted. I installed the .gz version and it is fine.
Having installed the .gz version, I *think* that the issue is fixed?
The current behaviour seems to be that it reads UTF16BE files if they have a BOM, but the ones that it previously created without the BOM (that I would argue can be seen as invalid) are broken.
This will orphan anyone who used the previous version with UTF16BE files prior to the last major release; they will need to update their files externally. As long as everyone understands that, I think we are all on the same page.
Thanks for getting this fixed.
Andrew
|
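The external repair mentioned in the comment above (putting the BOM back on files that were saved without one) could look roughly like the following Python sketch. It assumes the file really is big-endian UTF-16 and starts with the "%" of the header line; the function name `add_utf16be_bom` and the temporary file name are hypothetical. Work on a copy of the library rather than the original.

```python
import codecs
import tempfile
from pathlib import Path

def add_utf16be_bom(path: Path) -> bool:
    """Prepend a UTF-16BE BOM if the file has no UTF-16 BOM yet.

    Returns True if the file was rewritten, False if it already had a BOM.
    """
    data = path.read_bytes()
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return False
    # Sanity check (assumption): a BOM-less big-endian library should
    # decode as UTF-16BE and begin with the "%" of the encoding header.
    text = data.decode("utf-16-be")
    if not text.startswith("%"):
        raise ValueError(f"{path} does not look like a BOM-less UTF-16BE bib file")
    path.write_bytes(codecs.BOM_UTF16_BE + data)
    return True

# Demonstration on a throwaway file, not on a real library.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample-noBOM.bib"
    sample.write_bytes("% Encoding: UTF-16BE\n".encode("utf-16-be"))
    first_run = add_utf16be_bom(sample)    # adds the BOM
    second_run = add_utf16be_bom(sample)   # BOM already present, no change
    repaired_text = sample.read_bytes().decode("utf-16")

print(first_run, second_run, repr(repaired_text))
```

The check for an existing BOM makes the helper safe to run twice on the same file.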
Regarding the Mac OS X bug, there is a work around: #9553 |
Note to us: There was a fix on May 20 (#9927), but the comment from June says that some files can still be broken. We need
|
If you are referring to the files I uploaded in the tiny-bib-example.zip file on Dec 24, 2022 above, then the test cases are simply this:
Expected behaviour (as far as I understand it):
Is that what you need? |
If it is helpful, here are the "tiny" files in both big and little endian formats, with and without BOM markers. |
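A set of four such variants (big and little endian, each with and without a BOM) can also be generated programmatically for tests. The sketch below is an assumption about how one might do that in Python; the sample text and the dictionary keys are made up and do not reproduce the contents or names of the attached files.

```python
import codecs

def utf16_variants(text: str) -> dict[str, bytes]:
    """Encode `text` as UTF-16 in both byte orders, with and without a BOM."""
    be = text.encode("utf-16-be")
    le = text.encode("utf-16-le")
    return {
        "be-with-bom": codecs.BOM_UTF16_BE + be,
        "be-no-bom": be,
        "le-with-bom": codecs.BOM_UTF16_LE + le,
        "le-no-bom": le,
    }

# Hypothetical one-line sample standing in for a tiny bib file.
sample_text = "% tiny test library\n"
variants = utf16_variants(sample_text)

# The BOM variants decode with the generic codec; the BOM-less ones only
# decode correctly when the byte order is supplied from outside.
decoded = {
    "be-with-bom": variants["be-with-bom"].decode("utf-16"),
    "le-with-bom": variants["le-with-bom"].decode("utf-16"),
    "be-no-bom": variants["be-no-bom"].decode("utf-16-be"),
    "le-no-bom": variants["le-no-bom"].decode("utf-16-le"),
}

for name, data in variants.items():
    print(f"{name:12} {data[:4].hex(' ')}")
```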
JabRef version
5.8 (latest release)
Operating system
macOS
Details on version and operating system
Darwin daphne.local 22.2.0 Darwin Kernel Version 22.2.0: Fri Nov 11 02:03:51 PST 2022; root:xnu-8792.61.2~4/RELEASE_ARM64_T6000 arm64
Checked with the latest development build
Steps to reproduce the behaviour
Note that if you reestablish the BOM using an external editor and then open the file again using JabRef, all is well until the bibliography is saved again.
Note that this may be apparent on my machine because I have an ARM processor, so this error may not be reproducible on an older Mac with an Intel processor.
The underlying problem is simply that the BOM is now missing during write. Putting the BOM back in (as it was in older JabRef versions) will fix the problem.
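As an aside on why a BOM can silently disappear on write: in many libraries the generic "UTF-16" codec emits a BOM when encoding, while the codecs with an explicit byte order do not, so a writer that uses an explicit big-endian codec has to add the BOM itself. The Python sketch below shows that behaviour for Python's codecs only; whether this is what happened inside JabRef is not confirmed by this thread, and `encode_utf16be_with_bom` is a hypothetical helper.

```python
import codecs

# Generic codec: Python prepends a BOM in the machine's native byte order.
generic = "%".encode("utf-16")

# Explicit big-endian codec: no BOM is written.
explicit = "%".encode("utf-16-be")

def encode_utf16be_with_bom(text: str) -> bytes:
    """Encode as big-endian UTF-16 and prepend the BOM explicitly."""
    return codecs.BOM_UTF16_BE + text.encode("utf-16-be")

fixed = encode_utf16be_with_bom("%")

print(generic.hex(" "))
print(explicit.hex(" "))
print(fixed.hex(" "))
```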
Appendix
...
Log File