Bug encode-csv with two value csv #494

Open
TobiasNx opened this issue Jul 17, 2023 · 2 comments
Comments

@TobiasNx
Contributor

In my example here:
TobiasNx/metafacture_workflows@16308bc

The CSV output sometimes has its columns swapped.
This seems to be due to the order of the fields in the incoming stream:

Hochschulbibliothek Pforzheim, Bereichsbibliothek Technik und Wirtschaft	http://lobid.org/organisations/DE-951#!
http://lobid.org/organisations/DE-1a#!	Staatsbibliothek zu Berlin - Preußischer Kulturbesitz, Haus Potsdamer Straße
Hochschularchiv der ETH Zürich	http://lobid.org/organisations/CH-001807-7#!
Heimatgeschichtliches Museum Modautal	http://lobid.org/organisations/DE-MUS-265910#!
Museum Johannes Reuchlin MJR	http://lobid.org/organisations/DE-MUS-492617#!

If I output JSON instead, it shows that the issue is caused by variation in the field order between records:

{
  "name" : "früher: Frankfurt/Main; Institut für Rechtsgeschichte, Bibliothek",
  "id" : "http://lobid.org/organisations/DE-30-163#!"
}
{
  "name" : "Hochschulbibliothek Pforzheim, Bereichsbibliothek Technik und Wirtschaft",
  "id" : "http://lobid.org/organisations/DE-951#!"
}
{
  "id" : "http://lobid.org/organisations/DE-1a#!",
  "name" : "Staatsbibliothek zu Berlin - Preußischer Kulturbesitz, Haus Potsdamer Straße"
}
{
  "name" : "Hochschularchiv der ETH Zürich",
  "id" : "http://lobid.org/organisations/CH-001807-7#!"
}
{
  "name" : "Heimatgeschichtliches Museum Modautal",
  "id" : "http://lobid.org/organisations/DE-MUS-265910#!"
}
@blackwinter
Member

Currently, the CSV encoder writes literals (values) as they come in, without giving any regard to their names. Hence, if the input order is unstable, the output will be inconsistent.
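The behavior described above can be reproduced outside Metafacture with a small sketch. This is not the encoder's actual code; `encode_positional` is a hypothetical helper that, like the description says, writes values in arrival order and ignores their names. The two sample records are taken from the report.

```python
import csv
import io


def encode_positional(records):
    """Write each record's values in arrival order, ignoring field names.

    Hypothetical helper that mimics the described behavior; it is not
    the Metafacture implementation.
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for record in records:
        # Only the values are written; the names are dropped.
        writer.writerow([value for _name, value in record])
    return out.getvalue()


# Each record is a list of (name, value) literals in stream order.
records = [
    [("name", "Hochschularchiv der ETH Zürich"),
     ("id", "http://lobid.org/organisations/CH-001807-7#!")],
    [("id", "http://lobid.org/organisations/DE-1a#!"),
     ("name", "Staatsbibliothek zu Berlin - Preußischer Kulturbesitz, Haus Potsdamer Straße")],
]

print(encode_positional(records))
```

In the second row the `id` value lands in the first column, which matches the swapped line in the reported output.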

A potential solution might be to write values in the order they were first received, which is also the order of the column headers. But this will get somewhat complicated when also taking repeated fields into account.

@TobiasNx
Contributor Author

Task: map incoming values to the header order; if an element is not yet in the header, add a new column for it.
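A minimal sketch of this task, written in Python for brevity rather than as a patch to the encoder. `encode_by_header` is a hypothetical name, and two simplifications are assumptions on my part: all records are buffered so that the header can still grow before it is written, and for repeated fields only the first value is kept.

```python
import csv
import io


def encode_by_header(records, header=None):
    """Write records with values aligned to the header columns.

    Hypothetical sketch: unknown field names are appended to the header
    as new columns, and missing fields are written as empty cells.
    """
    header = list(header or [])
    rows = []
    for record in records:
        row = {}
        for name, value in record:
            if name not in header:
                # First time this element is seen: add a column for it.
                header.append(name)
            # Assumption: for repeated fields, keep only the first value.
            row.setdefault(name, value)
        rows.append(row)

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=header, restval="",
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


records = [
    [("name", "Hochschularchiv der ETH Zürich"),
     ("id", "http://lobid.org/organisations/CH-001807-7#!")],
    [("id", "http://lobid.org/organisations/DE-1a#!"),
     ("name", "Staatsbibliothek zu Berlin - Preußischer Kulturbesitz, Haus Potsdamer Straße")],
]

print(encode_by_header(records))
```

Buffering keeps the sketch short, but a streaming encoder that has already emitted the header would need a different strategy for late columns, and repeated fields would need an explicit policy.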
