Correctly import LCCNs #2865
Perhaps we've already invested all the work that we could have saved, but would it make sense to switch to PyMARC rather than continuing to maintain our own parser?
For the deprecation warnings, I think we should use the Deprecated module and its @deprecated decorator rather than rolling our own.
I presume the large blocks of "new" code are actually just copied from somewhere else, so didn't review them closely. I didn't run tests locally since they passed on CI.
Ooops, thought I was just approving the new commit, not the entire branch.
I still think it would be an improvement to use the @deprecated decorator.
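For reference, a minimal sketch of what the suggested @deprecated decorator looks like in use. The function name read_lccn_legacy and the reason text are hypothetical and not taken from this PR. The fallback branch is only an illustrative stand-in so the snippet runs where the Deprecated package is not installed; it is not that library's implementation.

```python
import functools
import warnings

try:
    # Third-party "Deprecated" package (pip install Deprecated).
    from deprecated import deprecated
except ImportError:
    # Illustrative stand-in with a similar call shape, used only if the
    # package is missing. This is an assumption, not the library's own code.
    def deprecated(reason=""):
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                warnings.warn(
                    f"Call to deprecated function {func.__name__}. ({reason})",
                    category=DeprecationWarning,
                    stacklevel=2,
                )
                return func(*args, **kwargs)
            return wrapper
        return decorator


# Hypothetical duplicate code path, kept only until callers have migrated.
@deprecated(reason="duplicate LCCN reader; use the maintained code path")
def read_lccn_legacy(value):
    return value.strip()
```

Calling read_lccn_legacy still returns its result, but each call also emits a DeprecationWarning, which makes it easier to spot whether an import path still reaches the old code.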
This is the UI component of the LCCN fix so that LCCNs with spaces link to the correct lccn permalink. Example item: https://openlibrary.org/books/OL23558498M/On_the_quiet
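A minimal sketch of the linking behaviour described here, assuming the stored LCCN may contain a space (as in ca 34001802) while the lccn.loc.gov permalink does not (http://lccn.loc.gov/ca34001802). The function name lccn_permalink is hypothetical and is not the template code changed in this PR.

```python
LCCN_BASE_URL = "http://lccn.loc.gov/"


def lccn_permalink(lccn: str) -> str:
    """Build a permalink from a stored LCCN by dropping all whitespace."""
    # str.split() with no arguments splits on any run of whitespace,
    # so joining the pieces removes leading, trailing and inner spaces.
    return LCCN_BASE_URL + "".join(lccn.split())
```

For example, lccn_permalink("ca 34001802") gives http://lccn.loc.gov/ca34001802, while an LCCN without spaces such as 64011739 passes through unchanged.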
@tfmorris regarding PyMARC: I have tried simply to refactor the code that is currently in use, identify where it lives by removing the most glaring duplication, and deprecate duplicate code paths that probably shouldn't be used. Replacing the current code with PyMARC should be a separate effort, if we require it. From what I have seen in this refactor, in various imports, and from working with PyMARC separately, I think OL handles more cases than PyMARC, which is a problem in itself, and the cause of #2877.
This code does use PyMARC for decoding MARC8, which is good:
openlibrary/openlibrary/catalog/marc/marc_binary.py Lines 5 to 11 in b5aa71e
I'm not sure how widely these mnemonics are used: https://github.com/internetarchive/openlibrary/blob/e8d48561a90c3da5ab9a13d09c4117a10a2187c5/openlibrary/catalog/marc/mnemonics.py , but if PyMARC can parse those, we should replace it. I also don't know where 'wrapped line' MARC records are described -- PyMARC does not support them. I'm not sure if it is an official field wrapping convention, but OL does support field continuations that end in
openlibrary/openlibrary/catalog/marc/marc_binary.py Lines 22 to 43 in e8d4856
@mekarpeles This PR will require a new pip install on production servers:
I didn't run tests, but I reviewed all the code and didn't spot any problems by inspection.
Thanks for making the suggested changes!
LGTM, should be blocked on @mekarpeles for installing deprecated module on test + prod
e.g. http://lccn.loc.gov/64-11739 is the same as http://lccn.loc.gov/64011739, but the canonical LCCN listed on each is 64011739. OL will now store 64011739 to aid comparisons.
It does not strip spaces: http://lccn.loc.gov/ca34001802 shows the LCCN with a space: ca 34001802
The input for this example is ca 34-1802. After this PR, we will store the LCCN as ca 34001802; prior to this PR, we were reading it as 38 -- which is very broken.
Technical
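The normalization described above can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: the function name normalize_lccn is hypothetical, and the rule it applies (zero-pad the digits after the hyphen to six places, keep any space in the prefix) is inferred only from the two examples given in this description.

```python
def normalize_lccn(raw: str) -> str:
    """Normalize a hyphenated LCCN to the form shown on lccn.loc.gov.

    Assumed rule, inferred from the examples in this description:
    the part after the hyphen is a serial number that is left-padded
    with zeros to six digits; spaces inside the prefix are preserved.
    """
    lccn = raw.strip()
    if "-" not in lccn:
        # Already in canonical form (or not hyphenated): leave unchanged.
        return lccn
    prefix, serial = lccn.split("-", 1)
    if not serial.isdigit() or len(serial) > 6:
        # Unexpected shape: return the input rather than guess.
        return lccn
    return prefix + serial.zfill(6)
```

With this sketch, normalize_lccn("64-11739") gives 64011739 and normalize_lccn("ca 34-1802") gives ca 34001802, matching the stored values described above.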
There are far too many duplicate functions. I am putting clear deprecation notices in the code, and only deleting code when I am sure it is not being used.
This has turned into a pretty big refactor of the MARC code -- the first problem I had was figuring out which of the multiple lccn methods was being used and needed to be fixed. The refactor helps label the deprecated MARC code, moves a lot of the used code into the correct place, and ensures the code that is used is the code under test. The refactoring is not complete, but the work needed to fix the LCCN issue is complete.
Testing
I'll do some live import tests locally to confirm none of the deprecated modules are being used by the MARC import paths.
Evidence
Stakeholders