-
-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fetching citations from Crossref using REST API #139
Closed
Dominic-DallOsto
wants to merge
15
commits into
diegodlh:master
from
Dominic-DallOsto:Dominic-DallOsto/issue43
Closed
Changes from all commits
Commits
Show all changes
15 commits
Select commit
Hold shift + click to select a range
81bb8a4
Fetching citations from Crossref using REST API
Dominic-DallOsto 6f76d9c
Cleaned up code for getting Crossref references
Dominic-DallOsto c8af711
Added localisable strings
Dominic-DallOsto dfbd5c9
Simplify converting new items to citations
Dominic-DallOsto cd7290e
Move functionality to `Crossref` class
Dominic-DallOsto 6b9fde5
Fetching citations from crossref seems to be working
Dominic-DallOsto b797073
Update workflow for getting Crossref references
Dominic-DallOsto 4d6f0fe
Handle the case where none of the items with DOIs have references on …
Dominic-DallOsto ed795bd
Localise Crossref messages
Dominic-DallOsto 64bb696
Fix bug when no references were found
Dominic-DallOsto c341e09
Updates to code structure from feedback
Dominic-DallOsto 8ba4abc
Change for loops to map or forEach
Dominic-DallOsto c4969d6
Remove version change
Dominic-DallOsto 731b03c
Add comment about 429 responses
Dominic-DallOsto 5cb7683
Catch 429 errors when getting item reference lists
Dominic-DallOsto File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,18 +1,250 @@ | ||
import Wikicite from './wikicite'; | ||
|
||
import Wikicite, { debug } from './wikicite'; | ||
import Citation from './citation'; | ||
import Progress from './progress'; | ||
/* global Services */ | ||
/* global Zotero */ | ||
/* global window */ | ||
|
||
export default class Crossref{ | ||
static getCitations() { | ||
Services.prompt.alert( | ||
|
||
/** | ||
* Get source item citations from CrossRef. | ||
* @param {SourceItemWrapper[]} sourceItems - One or more source items to get citations for. | ||
*/ | ||
static async addCrossrefCitationsToItems(sourceItems){ | ||
|
||
// Make sure that at least some of the source items have DOIs | ||
const sourceItemsWithDOI = sourceItems.filter((sourceItem) => sourceItem.getPID('DOI')); | ||
if (sourceItemsWithDOI.length == 0){ | ||
Services.prompt.alert( | ||
window, | ||
Wikicite.getString('wikicite.crossref.get-citations.no-doi-title'), | ||
Wikicite.getString('wikicite.crossref.get-citations.no-doi-message') | ||
); | ||
return; | ||
} | ||
|
||
// Get reference information for items from CrossRef | ||
const progress = new Progress( | ||
'loading', | ||
Wikicite.getString('wikicite.crossref.get-citations.loading') | ||
); | ||
|
||
let sourceItemReferences; | ||
try{ | ||
sourceItemReferences = await Promise.all( | ||
sourceItemsWithDOI.map(async (sourceItem) => await Crossref.getReferences(sourceItem.doi)) | ||
); | ||
} | ||
catch (error){ | ||
progress.updateLine( | ||
'error', | ||
Wikicite.getString('wikicite.crossref.get-citations.error-getting-references') | ||
); | ||
debug(error); | ||
return; | ||
} | ||
|
||
// Confirm with the user to add these citations | ||
const numberOfCitations = sourceItemReferences.map((references) => references.length); | ||
const itemsToBeUpdated = numberOfCitations.filter((number) => number > 0).length; | ||
const citationsToBeAdded = numberOfCitations.reduce((sum, value) => sum + value, 0); | ||
if (citationsToBeAdded == 0){ | ||
progress.updateLine( | ||
'error', | ||
Wikicite.getString('wikicite.crossref.get-citations.no-references') | ||
); | ||
return; | ||
} | ||
const confirmed = Services.prompt.confirm( | ||
window, | ||
Wikicite.getString('wikicite.global.unsupported'), | ||
Wikicite.getString('wikicite.crossref.get-citations.unsupported') | ||
Wikicite.getString('wikicite.crossref.get-citations.confirm-title'), | ||
Wikicite.formatString('wikicite.crossref.get-citations.confirm-message', [itemsToBeUpdated, sourceItems.length, citationsToBeAdded]) | ||
) | ||
if (!confirmed){ | ||
progress.close(); | ||
return; | ||
} | ||
|
||
// Parse this reference information, then add to sourceItems | ||
progress.updateLine( | ||
'loading', | ||
Wikicite.getString('wikicite.crossref.get-citations.parsing') | ||
); | ||
|
||
try { | ||
let parsedItems = 0; | ||
const parsedItemReferences = await Promise.all(sourceItemReferences.map(async (sourceItemReferenceList) => { | ||
if (!sourceItemReferenceList.length) | ||
return []; | ||
|
||
const parsedReferences = await Crossref.parseReferences(sourceItemReferenceList); | ||
progress.updateLine( | ||
'loading', | ||
Wikicite.formatString('wikicite.crossref.get-citations.parsing-progress', [++parsedItems, itemsToBeUpdated]) | ||
); | ||
return parsedReferences; | ||
})); | ||
|
||
// Add these citations to the items | ||
await Zotero.DB.executeTransaction(async function() { | ||
sourceItemsWithDOI.forEach((sourceItem, index) => { | ||
const newCitedItems = parsedItemReferences[index]; | ||
if (newCitedItems.length > 0){ | ||
const newCitations = newCitedItems.map((newItem) => new Citation({item: newItem, ocis: []}, sourceItem)); | ||
sourceItem.addCitations(newCitations); | ||
} | ||
}); | ||
}); | ||
progress.updateLine( | ||
'done', | ||
Wikicite.getString('wikicite.crossref.get-citations.done') | ||
); | ||
} | ||
catch (error){ | ||
progress.updateLine( | ||
'error', | ||
Wikicite.getString('wikicite.crossref.get-citations.error-parsing-references') | ||
); | ||
debug(error); | ||
} | ||
finally { | ||
progress.close(); | ||
} | ||
} | ||
|
||
/** | ||
* Get a list of references from Crossref for an item with a certain DOI. | ||
* Returned in JSON Crossref format. | ||
* @param {string} doi - DOI for the item for which to get references. | ||
* @returns {Promise<string[]>} list of references, or [] if none. | ||
*/ | ||
static async getReferences(doi) { | ||
const url = `https://api.crossref.org/works/${Zotero.Utilities.cleanDOI( | ||
doi | ||
)}`; | ||
const options = { | ||
headers: { | ||
"User-Agent": `${Wikicite.getUserAgent()} mailto:cita@duck.com`, | ||
}, | ||
responseType: "json", | ||
}; | ||
|
||
const response = await Zotero.HTTP.request("GET", url, options).catch((e) => { | ||
debug(`Couldn't access URL: ${url}. Got status ${e.status}.`); | ||
if (e.status == 429) { | ||
throw new Error("Received a 429 rate limit response from Crossref (https://github.com/CrossRef/rest-api-doc#rate-limits). Try getting references for fewer items at a time.") | ||
} | ||
}); | ||
if (!response) return []; | ||
|
||
return response.response.message.reference || []; | ||
} | ||
|
||
/** | ||
* Parse a list of references in JSON Crossref format. | ||
* @param {string[]} crossrefReferences - Array of Crossref references to parse to Zotero items. | ||
* @returns {Promise<Zotero.Item[]>} Zotero items parsed from references (where parsing is possible). | ||
*/ | ||
static async parseReferences(crossrefReferences) { | ||
if (!crossrefReferences.length) { | ||
debug("Item found in Crossref but doesn't contain any references"); | ||
return []; | ||
} | ||
|
||
const parsedReferences = await Zotero.Promise.allSettled( | ||
crossrefReferences.map(async (crossrefItem) => { | ||
if (crossrefItem.DOI) | ||
return this.getItemFromIdentifier({ DOI: crossrefItem.DOI }); | ||
|
||
if (crossrefItem.isbn) | ||
return this.getItemFromIdentifier({ ISBN: crossrefItem.ISBN }); | ||
|
||
return this.parseItemFromCrossrefReference(crossrefItem); | ||
}) | ||
); | ||
} | ||
return parsedReferences | ||
.map((reference) => reference.value || null) | ||
.filter(Boolean); | ||
} | ||
|
||
/** | ||
* Get a Zotero Item from a valid Zotero identifier - includes DOI, ISBN, PMID, ArXiv ID, and more. | ||
* @param {{string: string}} identifier - A reference item in JSON Crossref format. | ||
* @returns {Promise<Zotero.Item | null>} Zotero item parsed from the identifier, or null if parsing failed. | ||
*/ | ||
static async getItemFromIdentifier(identifier){ | ||
await Zotero.Schema.schemaUpdatePromise; | ||
let translation = new Zotero.Translate.Search(); | ||
translation.setIdentifier(identifier); | ||
|
||
let jsonItems; | ||
try { | ||
// set libraryID to false so we don't save this item in the Zotero library | ||
jsonItems = await translation.translate({libraryID: false}); | ||
} catch { | ||
debug(`No items returned for identifier ${identifier}`); | ||
// We could get a 429 error inside the translation if we make too many | ||
// requests to Crossref too quickly. Would need to be fixed in Zotero... | ||
// I don't think we can identify this here unfortunately... | ||
// (See https://forums.zotero.org/discussion/84985/new-odd-result-when-importing-pmids-with-magic-wand) | ||
} | ||
} | ||
|
||
if (jsonItems) { | ||
const jsonItem = jsonItems[0]; | ||
// delete irrelevant fields to avoid warnings in Item#fromJSON | ||
delete jsonItem['notes']; | ||
delete jsonItem['seeAlso']; | ||
delete jsonItem['attachments']; | ||
|
||
static getDOI() { | ||
const newItem = new Zotero.Item(jsonItem.itemType); | ||
newItem.fromJSON(jsonItem); | ||
return newItem; | ||
} | ||
else{ | ||
return null; | ||
} | ||
} | ||
|
||
} | ||
} | ||
/** | ||
* Get a Zotero Item from a Crossref reference item that doesn't include an identifier. | ||
* @param {string} crossrefItem - A reference item in JSON Crossref format. | ||
* @returns {Promise<Zotero.Item>} Zotero item parsed from the identifier, or null if parsing failed. | ||
*/ | ||
static parseItemFromCrossrefReference(crossrefItem){ | ||
let jsonItem = {}; | ||
if (crossrefItem['journal-title']){ | ||
jsonItem.itemType = 'journalArticle'; | ||
jsonItem.title = crossrefItem['article-title'] || crossrefItem['volume-title']; | ||
} | ||
else if(crossrefItem['volume-title']){ | ||
jsonItem.itemType = 'book'; | ||
jsonItem.title = crossrefItem['volume-title']; | ||
} | ||
else if(crossrefItem.unstructured){ | ||
// todo: Implement reference text parsing here | ||
return Promise.reject("Couldn't parse Crossref reference - unstructured references are not yet supported. " + JSON.stringify(crossrefItem)); | ||
} | ||
else{ | ||
return Promise.reject("Couldn't determine type of Crossref reference - doesn't contain `journal-title` or `volume-title` field. " + JSON.stringify(crossrefItem)); | ||
} | ||
jsonItem.date = crossrefItem.year; | ||
jsonItem.pages = crossrefItem['first-page']; | ||
jsonItem.volume = crossrefItem.volume; | ||
jsonItem.issue = crossrefItem.issue; | ||
jsonItem.creators = [{ | ||
'creatorType': 'author', | ||
'name': crossrefItem.author | ||
}]; | ||
// remove undefined properties | ||
for (let key in jsonItem){ | ||
if(jsonItem[key] === undefined){ | ||
delete jsonItem[key]; | ||
} | ||
} | ||
const newItem = new Zotero.Item(jsonItem.itemType); | ||
newItem.fromJSON(jsonItem); | ||
return Promise.resolve(newItem); | ||
} | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is
Promise.all
the right way to combine these promises?When any one is rejected, the whole thing will fail, and we'll return.
This means that the confirmation message later will always say references have been found for 100% of items.
I suspect what you want here is
Promise.allSettled
; then you'll need to filter the resulting (settled) promises to include only thefulfilled
promises, for further processing. I suspect you'll want to print warnings for rejected promises.However, if there's no rate-limiting in
Zotero.HTTP.request
(which seems to be the case),Promise.allSettled
will fire all the requests at once, which we don't want to do. We should rate limit, appropriately. The simplest way is to use support for limiting concurrent requests from a library, but it isn't that complicated to write rate limit code. I don't think this should block this PR from being merged but it should block a new release.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My intention in this case was that
getReferences
would handle all errors, so these Promises should always be fulfilled, either with a list of references or []. An error here would be unexpected, and in this case we should stop and return early?I don't quite get what you mean here? For the items where we didn't find any references or an error occurred getting the list, the entry in
sourceItemReferences
will be[]
, so it will be filtered out ofitemsToBeUpdated
. Or am I missing something?Yeah, the rate limiting should be added, but pragmatically it doesn't seem the biggest issue at the moment. We only fire one request per item (which normally would just be 1 or a few). I tested with 130 items at once and didn't get any errors. Like you say, it'd probably make sense to focus on it after everything else is working.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I made a mistake — I missed that it was a call to
sourceItems.length
, and you're filteringsourceItems
at the top of the function. Here's some of why:That makes sense, though I don't agree with your design choice. Here's why: you're implementing your own error handling logic, rather than using the logic provided by the language, and I don't see how it adds value (whereas I do see how it adds complexity). There is already a way to return early from the method in an (unanticipated) error case — don't catch the error, here. And rather than returning
[]
for fetch failures, you canPromise.reject
.Here's an alternative:
getReferences
(now much clearer — thank you) should return aPromise
with the (possibly empty) list of references. If the request fails it shouldPromise.reject
. In the highlighted snippet, withPromise.allSettled
, only the fulfilled promises are further processed. Any unanticipated error causes the whole method to fail; this needs handling somewhere (but not in the method itself).As I've said before, my goal here is to make review as easy as possible for the maintainer so your work can be merged. Every added layer of indirection makes the code more complicated to review (and maintain). It's often worth it, and often not. I think adding an implicit error handling layer on top of the one in the language is not worth it, here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I agree with what you're saying. I just don't quite see how it would fit perfectly here, but please suggest something I might be missing.
getReferences
can innocently return an empty list if there are no references, it seems a bit weird toreject
in this case? Or do you mean only for when an error occurs getting the references? But then I don't see the benefit - ifgetReferences
wouldreject
, we'd do the same as if it returned an empty list anyway?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The latter.
The benefit is that it is clear what is happening. It is obvious that there are error cases, and that the code is handling them.
That is the correct understanding, and it is just one option.
OK, this is good. The code needs to be clear. It also ought to present users with helpful error messages. I think the complication comes because the logic for fetching references is interspersed with the logic for showing the UI.
I'll approach this in a separate message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incidentally, are you missing a
progress.close()
before thereturn
?Also how does this whole plugin work: can I do anything else with zotero while the items are being fetched?
My less structured thoughts are below the break — I'll have to come back to this.
The simple change is to scrap the error message.
Failed to get citations for item(s) from Crossref
is not that helpful to users. Run the whole method in atry
/catch
/finally
block and thenprogress.close()
once, in thefinally
.I've missed something, in my review, that I should have highlighted earlier: handling the data processing and the user interaction together complicates this code.
I fear the right solution might be uniodiomatic for Zotero: to separate the code that deals with the
progress
from the code doing the fetching. I'd normally solve this in javascript withaddEventListener
anddispatchEvent
. I see examples of the former in this repository, but none of the latter — I presume they work with zotero but I don't know the conventions.The real challenge I have concerns the need for user interaction.
Services.prompt.confirm
is called synchronously? So execution of the whole thread is just suspended until it returns? That… that is not how I would expect things to work. Is this the whole program thread? Plugin thread? I… don't like this, at all. Plausible that this dip into zotero development goes no further…I'm going to have to come back to this, another time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it'll be clearer if you are able to build and test the plugin. But in general yeah, everything is running independently of the main UI. So you can start fetching citations from Crossref then go about doing other stuff in Zotero and the progress box will keep you updated with what's going on. That's why I think it's fairly important - the progress bar is your only way as a user to know your command is still running / what it's doing / when it might finish.
I agree it's a bit ugly having all these progress messages in the code, but that's how the rest of the plugin is structured. Would replacing the progress bar updates with
dispatchEvent
make a big practical difference?As you point out -
Services.prompt.alert
andServices.prompt.confirm
are blocking - for when we can't proceeed without user interaction. Here both of these occur more or less towards the start of the method, so it shouldn't be too jarring for the user to need to provide some input given they've just clicked a button to do something (rather than this popping up 2 minutes later).