%20 space at end of Internet Archive ID is kept in "Read" url #467

Closed
mekarpeles opened this issue Apr 10, 2017 · 6 comments
Assignees
Labels
Good First Issue Easy issue. Good for newcomers. [managed] metadata

Comments

@mekarpeles
Member

When an IA ID is added with a space at the end, the space is stripped from the ID in the "ID Numbers" section but is apparently retained in the URL linked to the "Read" button. This throws an error rather than taking you to the reader. For example:
https://openlibrary.org/books/OL26221263M/Corolla_Sancti_Eadmundi
Clicking the "Read Online" button takes you to https://www.archive.org/stream/corollasanctiead00hervuoft%20?ref=ol
Reporter: @JeffKaplan
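
To illustrate how the `%20` could end up in the link, here is a minimal sketch. It assumes the "Read" URL is built by appending the percent-encoded identifier to a base URL; `read_url` and `STREAM_BASE` are hypothetical names for illustration, not Open Library's actual code.

```python
from urllib.parse import quote

# Base and query string copied from the URL in the report above.
STREAM_BASE = "https://www.archive.org/stream/"


def read_url(ocaid: str, strip: bool = True) -> str:
    """Build a reader URL (hypothetical helper, for illustration only)."""
    if strip:
        ocaid = ocaid.strip()
    # quote() percent-encodes a space as %20, which is how an unstripped
    # trailing space would surface in the generated link.
    return STREAM_BASE + quote(ocaid) + "?ref=ol"
```

With `strip=False` and the identifier `"corollasanctiead00hervuoft "`, this reproduces the broken URL from the report; with the default it yields the link without `%20`.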

@LeadSongDog

@tfmorris
Contributor

It doesn't make sense to have all downstream consumers have to fix the same problem when it can be fixed at source. The database (and import/update process) should be fixed.

@hornc
Collaborator

hornc commented May 20, 2017

I have just bot-fixed 206 of these ocaids with spaces, which are all the ones I could find by scanning the ol_dump_editions_2017-04-30.txt data dump file.
See https://openlibrary.org/recentchanges/2017/05/20?page=14#bots for the changes. I also normalised any Unicode found to NFC, and fixed some Cyrillic romanisation.
I think that is all the data fixed. Unless the importer is still adding these, I believe this to be resolved.
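
For readers who want to see what such a cleanup might look like, here is a minimal sketch. It is not the bot that was actually run: `clean_edition` is a hypothetical helper, and it assumes an edition is a flat dict whose top-level string values should be NFC-normalised and whose `ocaid` should lose leading and trailing whitespace. Identifiers with interior spaces would still need manual review.

```python
import unicodedata


def clean_edition(edition: dict) -> dict:
    """Return a copy with top-level strings in NFC and the ocaid trimmed.

    Hypothetical helper; nested values are left untouched.
    """
    cleaned = {}
    for key, value in edition.items():
        if isinstance(value, str):
            # Compose characters such as "e" + combining acute into one code point.
            value = unicodedata.normalize("NFC", value)
        cleaned[key] = value
    if isinstance(cleaned.get("ocaid"), str):
        cleaned["ocaid"] = cleaned["ocaid"].strip()
    return cleaned
```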

@mekarpeles
Member Author

@hornc hornc added the metadata label Sep 18, 2017
@hornc hornc self-assigned this Sep 18, 2017
@hornc
Collaborator

hornc commented Sep 19, 2017

Command to find ocaids with spaces in the editions data dump:
egrep 'ocaid": "[^"]*\s'
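
The same check can be sketched in Python, which makes it easy to print the offending identifiers themselves. This assumes, as the egrep pattern does, that each dump line contains the edition JSON with the key written as `"ocaid": "..."`; `ocaids_with_spaces` is a hypothetical name.

```python
import re

# Same idea as the egrep pattern: an ocaid value with whitespace
# somewhere before its closing quote. The value is captured.
OCAID_WITH_SPACE = re.compile(r'ocaid": "([^"]*\s[^"]*)"')


def ocaids_with_spaces(lines):
    """Yield each ocaid value that contains whitespace."""
    for line in lines:
        match = OCAID_WITH_SPACE.search(line)
        if match:
            yield match.group(1)
```

Passing an open file object for the editions dump as `lines` and printing `repr()` of each result makes trailing spaces visible.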

There are some recent user additions which have spaces in the ocaid. I am going to investigate adding field validation on the UI.

@hornc
Collaborator

hornc commented Jan 11, 2018

I have checked for spaces in ocaids again, using the Dec 2017 editions dump. There are 113 ocaids that have spaces somewhere in the string (invalid by definition). Unfortunately, some are recent additions.

The PR above added validation to the edit page, but the Add page (https://github.com/internetarchive/openlibrary/blob/master/openlibrary/templates/books/add.html) works independently and appears to use a different mechanism, so I'm not sure how reusable the approach is :( I overlooked that there are two ways to get the data in. It would be nice if this validation occurred in one place!
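
One way to get validation "in one place" is a single server-side function that both entry points call before saving. The sketch below is only an illustration under that assumption: `validate_ocaid`, `handle_add`, and `handle_edit` are hypothetical names, not existing Open Library functions, and the only rule enforced is the one stated in this thread (no whitespace in an ocaid).

```python
def validate_ocaid(raw: str) -> str:
    """Return the trimmed identifier, or raise ValueError if it is invalid.

    Hypothetical shared validator: trims the ends, then rejects empty
    values and any remaining (interior) whitespace.
    """
    ocaid = raw.strip()
    if not ocaid:
        raise ValueError("ocaid is empty")
    if any(ch.isspace() for ch in ocaid):
        raise ValueError(f"ocaid contains whitespace: {ocaid!r}")
    return ocaid


def handle_add(form: dict) -> dict:
    """Hypothetical Add-page handler: validates before creating a record."""
    return {**form, "ocaid": validate_ocaid(form["ocaid"])}


def handle_edit(record: dict, form: dict) -> dict:
    """Hypothetical Edit-page handler: reuses the same validator."""
    return {**record, **form, "ocaid": validate_ocaid(form["ocaid"])}
```

Because both handlers go through `validate_ocaid`, a rule change only has to be made once, and the importer could call the same function.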

@brad2014 brad2014 added the Good First Issue Easy issue. Good for newcomers. [managed] label Apr 29, 2019

5 participants