Fix UTF bom parsing (#9927)
Siedlerchr authored May 20, 2023
1 parent 515edcc commit 8d1bb00
Showing 3 changed files with 29 additions and 30 deletions.
11 changes: 7 additions & 4 deletions CHANGELOG.md
@@ -77,15 +77,17 @@ Note that this project **does not** adhere to [Semantic Versioning](http://semve
- We fixed an issue where searching for unlinked files would include the current library's .bib file. [#9735](https://github.com/JabRef/jabref/issues/9735)
- We fixed an issue where it was no longer possible to connect to a shared mysql database due to an exception. [#9761](https://github.com/JabRef/jabref/issues/9761)
- We fixed an issue where an exception was thrown for the user after <kbd>Ctrl</kbd>+<kbd>Z</kbd> command. [#9737](https://github.com/JabRef/jabref/issues/9737)
- We fixed the citation key generation for (`[authors]`, `[authshort]`, `[authorsAlpha]`, `authIniN`, `authEtAl`, `auth.etal`)[https://docs.jabref.org/setup/citationkeypatterns#special-field-markers] to handle `and others` properly. [koppor#626](https://github.com/koppor/jabref/issues/626)
- We fixed the citation key generation for [`[authors]`, `[authshort]`, `[authorsAlpha]`, `[authIniN]`, `[authEtAl]`, `[auth.etal]`](https://docs.jabref.org/setup/citationkeypatterns#special-field-markers) to handle `and others` properly. [koppor#626](https://github.com/koppor/jabref/issues/626)
- We fixed the Save/save as file type shows BIBTEX_DB instead of "Bibtex library". [#9372](https://github.com/JabRef/jabref/issues/9372)
- We fixed the default main file directory for non-English Linux users. [#8010](https://github.com/JabRef/jabref/issues/8010)
- We fixed an issue when overwriting the owner was disabled. [#9896](https://github.com/JabRef/jabref/pull/9896)
- We fixed an issue regarding recording redundant prefixes in search history. [#9685](https://github.com/JabRef/jabref/issues/9685)
- We fixed an issue where passing a URL containing a DOI led to a "No entry found" notification. [#9821](https://github.com/JabRef/jabref/issues/9821)
- We fixed some minor visual inconsistencies and issues in the preferences dialog. [#9866](https://github.com/JabRef/jabref/pull/9866)
- The order of save actions is now retained. [#9890](https://github.com/JabRef/jabref/pull/9890)
- We fixed an issue where the order of save actions was not retained in the bib file. [#9890](https://github.com/JabRef/jabref/pull/9890)
- We fixed an issue in the preferences 'External file types' tab ignoring a custom application path in the edit dialog. [#9895](https://github.com/JabRef/jabref/issues/9895)
- We fixed an issue in the preferences where custom columns could be added to the entry table with no qualifier. [#9913](https://github.com/JabRef/jabref/issues/9913)
- We fixed an issue where the encoding header in a bib file was not respected when the file contained a BOM (Byte Order Mark). [#9926](https://github.com/JabRef/jabref/issues/9926)

### Removed

@@ -111,6 +113,7 @@ Note that this project **does not** adhere to [Semantic Versioning](http://semve
- We now have more "dots" in the offered journal abbreviations. [#9504](https://github.com/JabRef/jabref/pull/9504)
- We now disable the button "Full text search" in the Searchbar by default [#9527](https://github.com/JabRef/jabref/pull/9527)


### Fixed

- The tab "deprecated fields" is shown in biblatex-mode only. [#7757](https://github.com/JabRef/jabref/issues/7757)
@@ -123,8 +126,8 @@ Note that this project **does not** adhere to [Semantic Versioning](http://semve
- For portable versions, the `.deb` file now works on plain debian again. [#9472](https://github.com/JabRef/jabref/issues/9472)
- We fixed an issue where the download of linked online files failed after an import of entries for certain urls. [#9518](https://github.com/JabRef/jabref/issues/9518)
- We fixed an issue where an exception occurred when manually downloading a file from an URL in the entry editor. [#9521](https://github.com/JabRef/jabref/issues/9521)
- We fixed an issue with open office csv file formatting where commas in the abstract field where not escaped. [#9087][https://github.com/JabRef/jabref/issues/9087]
- We fixed an issue with deleting groups where subgroups different from the selected group were deleted. [#9281][https://github.com/JabRef/jabref/issues/9281]
- We fixed an issue with open office csv file formatting where commas in the abstract field where not escaped. [#9087](https://github.com/JabRef/jabref/issues/9087)
- We fixed an issue with deleting groups where subgroups different from the selected group were deleted. [#9281](https://github.com/JabRef/jabref/issues/9281)

## [5.8] - 2022-12-18

@@ -150,14 +150,15 @@ private static Optional<Charset> getSuppliedEncoding(BufferedReader reader) {
String line;
while ((line = reader.readLine()) != null) {
line = line.trim();

// % = char 37; BOM characters may precede it and must be skipped, so we use indexOf
var percentPos = line.indexOf('%', 0);
// Line does not start with %, so there are no comment lines for us and we can stop parsing
if (!line.startsWith("%")) {
if (percentPos == -1) {
return Optional.empty();
}

// Only keep the part after %
line = line.substring(1).trim();
line = line.substring(percentPos + 1).trim();

if (line.startsWith(BibtexImporter.SIGNATURE)) {
// Signature line, so keep reading and skip to next line
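
The change in this hunk can be illustrated with a small standalone sketch. It assumes the situation the comment describes: the first line of the file still carries a BOM character (U+FEFF) in front of the `%`, which `trim()` does not remove, so `startsWith("%")` fails while `indexOf('%')` still finds the comment marker. The class `BomHeaderSketch` and its helpers are hypothetical names for illustration only and are not part of JabRef.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;

public class BomHeaderSketch {

    // Hypothetical helper (not JabRef's API): returns the trimmed text after the
    // first '%' in the line, or empty if the line contains no '%' at all.
    static Optional<String> textAfterPercent(String rawLine) {
        String line = rawLine.trim(); // trim() only strips chars <= U+0020, so U+FEFF stays
        int percentPos = line.indexOf('%');
        if (percentPos == -1) {
            return Optional.empty();
        }
        return Optional.of(line.substring(percentPos + 1).trim());
    }

    // Decodes the four bytes FE FF 00 25 (a BOM followed by '%') as UTF-16BE.
    // Java's UTF-16BE decoder keeps the BOM as the character U+FEFF.
    static String decodeBomAndPercent() {
        byte[] bytes = {(byte) 0xFE, (byte) 0xFF, 0x00, 0x25};
        return new String(bytes, StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        String headerLine = decodeBomAndPercent() + " Encoding: UTF-16BE";

        // The old check: false, because the line starts with U+FEFF, not '%'.
        System.out.println(headerLine.trim().startsWith("%"));

        // The new approach: the text after '%' is still recovered.
        System.out.println(textAfterPercent(headerLine));

        // A line without any '%' is still rejected.
        System.out.println(textAfterPercent("@article{key,"));
    }
}
```

One trade-off of searching for `%` anywhere in the line is that a non-comment line that happens to contain a `%` further in is no longer rejected by this check alone; in the sketch, as in the hunk above, only a line with no `%` at all yields an empty result.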
@@ -12,12 +12,10 @@
import org.jabref.logic.importer.ImportFormatPreferences;
import org.jabref.logic.importer.ParserResult;
import org.jabref.logic.util.StandardFileType;
import org.jabref.model.database.BibDatabaseMode;
import org.jabref.model.entry.BibEntry;
import org.jabref.model.entry.field.StandardField;
import org.jabref.model.entry.field.UnknownField;
import org.jabref.model.entry.types.StandardEntryType;
import org.jabref.model.metadata.MetaData;
import org.jabref.model.util.DummyFileUpdateMonitor;

import org.junit.jupiter.api.BeforeEach;
@@ -141,7 +139,9 @@ static Stream<Arguments> testParsingOfEncodedFileWithHeader() {
return Stream.of(
Arguments.of(StandardCharsets.US_ASCII, "encoding-us-ascii-with-header.bib"),
Arguments.of(StandardCharsets.UTF_8, "encoding-utf-8-with-header.bib"),
Arguments.of(Charset.forName("Windows-1252"), "encoding-windows-1252-with-header.bib")
Arguments.of(Charset.forName("Windows-1252"), "encoding-windows-1252-with-header.bib"),
Arguments.of(StandardCharsets.UTF_16BE, "encoding-utf-16BE-with-header.bib"),
Arguments.of(StandardCharsets.UTF_16BE, "encoding-utf-16BE-without-header.bib")
);
}

@@ -164,36 +164,31 @@ public void testParsingOfWindows1252EncodedFileReadsDegreeCharacterCorrectly(Str
}

@ParameterizedTest
@CsvSource({"encoding-utf-8-with-header.bib", "encoding-utf-8-without-header.bib"})
public void testParsingOfUtf8EncodedFileReadsUmlautCharacterCorrectly(String filename) throws Exception {
@CsvSource({"encoding-utf-8-with-header.bib", "encoding-utf-8-without-header.bib",
"encoding-utf-16BE-with-header.bib", "encoding-utf-16BE-without-header.bib"})
public void testParsingFilesReadsUmlautCharacterCorrectly(String filename) throws Exception {
ParserResult parserResult = importer.importDatabase(
Path.of(BibtexImporterTest.class.getResource(filename).toURI()));
assertEquals(
List.of(new BibEntry(StandardEntryType.Article).withField(StandardField.TITLE, "Ü ist ein Umlaut")),
parserResult.getDatabase().getEntries());
}

@ParameterizedTest
@CsvSource({"encoding-utf-16BE-with-header.bib", "encoding-utf-16BE-without-header.bib"})
public void testParsingOfUtf16EncodedFileReadsUmlautCharacterCorrectly(String filename) throws Exception {
ParserResult parserResult = importer.importDatabase(
Path.of(BibtexImporterTest.class.getResource(filename).toURI()));

assertEquals(
List.of(new BibEntry(StandardEntryType.Article).withField(StandardField.TITLE, "Ü ist ein Umlaut")),
parserResult.getDatabase().getEntries());

MetaData metaData = new MetaData();
metaData.setMode(BibDatabaseMode.BIBTEX);
metaData.setEncoding(StandardCharsets.UTF_16BE);
assertEquals(metaData, parserResult.getMetaData());
private static Stream<Arguments> encodingExplicitlySuppliedCorrectlyDetermined() {
return Stream.of(
Arguments.of("encoding-utf-8-with-header.bib", true),
Arguments.of("encoding-utf-8-without-header.bib", false),
Arguments.of("encoding-utf-16BE-with-header.bib", true),
Arguments.of("encoding-utf-16BE-without-header.bib", false)
);
}

@Test
public void encodingSupplied() throws Exception {
@ParameterizedTest
@MethodSource
public void encodingExplicitlySuppliedCorrectlyDetermined(String filename, boolean encodingExplicitlySupplied) throws Exception {
ParserResult parserResult = importer.importDatabase(
Path.of(BibtexImporterTest.class.getResource("encoding-utf-8-with-header.bib").toURI()));
assertTrue(parserResult.getMetaData().getEncodingExplicitlySupplied());
Path.of(BibtexImporterTest.class.getResource(filename).toURI()));
assertEquals(encodingExplicitlySupplied, parserResult.getMetaData().getEncodingExplicitlySupplied());
}

@Test