Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fetching citations from Crossref using REST API #139

Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
252 changes: 242 additions & 10 deletions src/crossref.js
Original file line number Diff line number Diff line change
@@ -1,18 +1,250 @@
import Wikicite from './wikicite';

import Wikicite, { debug } from './wikicite';
import Citation from './citation';
import Progress from './progress';
/* global Services */
/* global Zotero */
/* global window */

export default class Crossref{
static getCitations() {
Services.prompt.alert(

/**
* Get source item citations from CrossRef.
* @param {SourceItemWrapper[]} sourceItems - One or more source items to get citations for.
*/
static async addCrossrefCitationsToItems(sourceItems){

// Make sure that at least some of the source items have DOIs
const sourceItemsWithDOI = sourceItems.filter((sourceItem) => sourceItem.getPID('DOI'));
if (sourceItemsWithDOI.length == 0){
Services.prompt.alert(
window,
Wikicite.getString('wikicite.crossref.get-citations.no-doi-title'),
Wikicite.getString('wikicite.crossref.get-citations.no-doi-message')
);
return;
}

// Get reference information for items from CrossRef
const progress = new Progress(
'loading',
Wikicite.getString('wikicite.crossref.get-citations.loading')
);

let sourceItemReferences;
try{
sourceItemReferences = await Promise.all(
sourceItemsWithDOI.map(async (sourceItem) => await Crossref.getReferences(sourceItem.doi))
);
}
catch (error){
progress.updateLine(
'error',
Wikicite.getString('wikicite.crossref.get-citations.error-getting-references')
);
debug(error);
return;
}
Comment on lines +33 to +46
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is Promise.all the right way to combine these promises?

When any one is rejected, the whole thing will fail, and we'll return.

This means that the confirmation message later will always say references have been found for 100% of items.

wikicite.crossref.get-citations.confirm-message =
  Found references for %s out of %s item(s).\n%s citation(s) will be parsed.

I suspect what you want here is Promise.allSettled; then you'll need to filter the resulting (settled) promises to include only the fulfilled promises, for further processing. I suspect you'll want to print warnings for rejected promises.

However, if there's no rate-limiting in Zotero.HTTP.request (which seems to be the case), Promise.allSettled will fire all the requests at once, which we don't want to do. We should rate limit, appropriately. The simplest way is to use support for limiting concurrent requests from a library, but it isn't that complicated to write rate limit code. I don't think this should block this PR from being merged but it should block a new release.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My intention in this case was that getReferences would handle all errors, so these Promises should always be fulfilled, either with a list of references or []. An error here would be unexpected, and in this case we should stop and return early?

This means that the confirmation message later will always say references have been found for 100% of items.

I don't quite get what you mean here? For the items where we didn't find any references or an error occurred getting the list, the entry in sourceItemReferences will be [], so it will be filtered out of itemsToBeUpdated. Or am I missing something?

Yeah, the rate limiting should be added, but pragmatically it doesn't seem the biggest issue at the moment. We only fire one request per item (which normally would just be 1 or a few). I tested with 130 items at once and didn't get any errors. Like you say, it'd probably make sense to focus on it after everything else is working.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means that the confirmation message later will always say references have been found for 100% of items.

I don't quite get what you mean here? For the items where we didn't find any references or an error occurred getting the list, the entry in sourceItemReferences will be [], so it will be filtered out of itemsToBeUpdated. Or am I missing something?

No, I made a mistake — I missed that it was a call to sourceItems.length, and you're filtering sourceItems at the top of the function. Here's some of why:

My intention in this case was that getReferences would handle all errors, so these Promises should always be fulfilled, either with a list of references or []. An error here would be unexpected, and in this case we should stop and return early?

That makes sense, though I don't agree with your design choice. Here's why: you're implementing your own error handling logic, rather than using the logic provided by the language, and I don't see how it adds value (whereas I do see how it adds complexity). There is already a way to return early from the method in an (unanticipated) error case — don't catch the error, here. And rather than returning [] for fetch failures, you can Promise.reject.

Here's an alternative: getReferences (now much clearer — thank you) should return a Promise with the (possibly empty) list of references. If the request fails it should Promise.reject. In the highlighted snippet, with Promise.allSettled, only the fulfilled promises are further processed. Any unanticipated error causes the whole method to fail; this needs handling somewhere (but not in the method itself).

As I've said before, my goal here is to make review as easy as possible for the maintainer so your work can be merged. Every added layer of indirection makes the code more complicated to review (and maintain). It's often worth it, and often not. I think adding an implicit error handling layer on top of the one in the language is not worth it, here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I agree with what you're saying. I just don't quite see how it would fit perfectly here, but please suggest something I might be missing.

  • because getReferences can innocently return an empty list if there are no references, it seems a bit weird to reject in this case? Or do you mean only for when an error occurs getting the references? But then I don't see the benefit - if getReferences would reject, we'd do the same as if it returned an empty list anyway?
  • If I understand you correctly, basically you're saying to remove the try/catch here and let an error naturally bubble up and stop execution? While that would make the code much clearer, I think the progress bar error message to a user is somewhat helpful?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • because getReferences can innocently return an empty list if there are no references, it seems a bit weird to reject in this case? Or do you mean only for when an error occurs getting the references?

The latter.

But then I don't see the benefit - if getReferences would reject, we'd do the same as if it returned an empty list anyway?

The benefit is that it is clear what is happening. It is obvious that there are error cases, and that the code is handling them.

  • If I understand you correctly, basically you're saying to remove the try/catch here and let an error naturally bubble up and stop execution?

That is the correct understanding, and it is just one option.

While that would make the code much clearer, I think the progress bar error message to a user is somewhat helpful?

OK, this is good. The code needs to be clear. It also ought to present users with helpful error messages. I think the complication comes because the logic for fetching references is interspersed with the logic for showing the UI.

I'll approach this in a separate message.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While that would make the code much clearer, I think the progress bar error message to a user is somewhat helpful?

Incidentally, are you missing a progress.close() before the return?

Also how does this whole plugin work: can I do anything else with zotero while the items are being fetched?

My less structured thoughts are below the break — I'll have to come back to this.


The simple change is to scrap the error message. Failed to get citations for item(s) from Crossref is not that helpful to users. Run the whole method in a try/catch/finally block and then progress.close() once, in the finally.

I've missed something, in my review, that I should have highlighted earlier: handling the data processing and the user interaction together complicates this code.

I fear the right solution might be uniodiomatic for Zotero: to separate the code that deals with the progress from the code doing the fetching. I'd normally solve this in javascript with addEventListener and dispatchEvent. I see examples of the former in this repository, but none of the latter — I presume they work with zotero but I don't know the conventions.

The real challenge I have concerns the need for user interaction. Services.prompt.confirm is called synchronously? So execution of the whole thread is just suspended until it returns? That… that is not how I would expect things to work. Is this the whole program thread? Plugin thread? I… don't like this, at all. Plausible that this dip into zotero development goes no further…

I'm going to have to come back to this, another time.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it'll be clearer if you are able to build and test the plugin. But in general yeah, everything is running independently of the main UI. So you can start fetching citations from Crossref then go about doing other stuff in Zotero and the progress box will keep you updated with what's going on. That's why I think it's fairly important - the progress bar is your only way as a user to know your command is still running / what it's doing / when it might finish.

I agree it's a bit ugly having all these progress messages in the code, but that's how the rest of the plugin is structured. Would replacing the progress bar updates with dispatchEvent make a big practical difference?

As you point out - Services.prompt.alert and Services.prompt.confirm are blocking - for when we can't proceeed without user interaction. Here both of these occur more or less towards the start of the method, so it shouldn't be too jarring for the user to need to provide some input given they've just clicked a button to do something (rather than this popping up 2 minutes later).


// Confirm with the user to add these citations
const numberOfCitations = sourceItemReferences.map((references) => references.length);
const itemsToBeUpdated = numberOfCitations.filter((number) => number > 0).length;
const citationsToBeAdded = numberOfCitations.reduce((sum, value) => sum + value, 0);
if (citationsToBeAdded == 0){
progress.updateLine(
'error',
Wikicite.getString('wikicite.crossref.get-citations.no-references')
);
return;
}
const confirmed = Services.prompt.confirm(
window,
Wikicite.getString('wikicite.global.unsupported'),
Wikicite.getString('wikicite.crossref.get-citations.unsupported')
Wikicite.getString('wikicite.crossref.get-citations.confirm-title'),
Wikicite.formatString('wikicite.crossref.get-citations.confirm-message', [itemsToBeUpdated, sourceItems.length, citationsToBeAdded])
)
if (!confirmed){
progress.close();
return;
}

// Parse this reference information, then add to sourceItems
progress.updateLine(
'loading',
Wikicite.getString('wikicite.crossref.get-citations.parsing')
);

try {
let parsedItems = 0;
const parsedItemReferences = await Promise.all(sourceItemReferences.map(async (sourceItemReferenceList) => {
if (!sourceItemReferenceList.length)
return [];

const parsedReferences = await Crossref.parseReferences(sourceItemReferenceList);
progress.updateLine(
'loading',
Wikicite.formatString('wikicite.crossref.get-citations.parsing-progress', [++parsedItems, itemsToBeUpdated])
);
return parsedReferences;
}));

// Add these citations to the items
await Zotero.DB.executeTransaction(async function() {
sourceItemsWithDOI.forEach((sourceItem, index) => {
const newCitedItems = parsedItemReferences[index];
if (newCitedItems.length > 0){
const newCitations = newCitedItems.map((newItem) => new Citation({item: newItem, ocis: []}, sourceItem));
sourceItem.addCitations(newCitations);
}
});
});
progress.updateLine(
'done',
Wikicite.getString('wikicite.crossref.get-citations.done')
);
}
catch (error){
progress.updateLine(
'error',
Wikicite.getString('wikicite.crossref.get-citations.error-parsing-references')
);
debug(error);
}
finally {
progress.close();
}
}

/**
* Get a list of references from Crossref for an item with a certain DOI.
* Returned in JSON Crossref format.
* @param {string} doi - DOI for the item for which to get references.
* @returns {Promise<string[]>} list of references, or [] if none.
*/
static async getReferences(doi) {
const url = `https://api.crossref.org/works/${Zotero.Utilities.cleanDOI(
doi
)}`;
const options = {
headers: {
"User-Agent": `${Wikicite.getUserAgent()} mailto:cita@duck.com`,
},
responseType: "json",
};

const response = await Zotero.HTTP.request("GET", url, options).catch((e) => {
debug(`Couldn't access URL: ${url}. Got status ${e.status}.`);
if (e.status == 429) {
throw new Error("Received a 429 rate limit response from Crossref (https://github.com/CrossRef/rest-api-doc#rate-limits). Try getting references for fewer items at a time.")
}
});
if (!response) return [];

return response.response.message.reference || [];
}

/**
* Parse a list of references in JSON Crossref format.
* @param {string[]} crossrefReferences - Array of Crossref references to parse to Zotero items.
* @returns {Promise<Zotero.Item[]>} Zotero items parsed from references (where parsing is possible).
*/
static async parseReferences(crossrefReferences) {
if (!crossrefReferences.length) {
debug("Item found in Crossref but doesn't contain any references");
return [];
}

const parsedReferences = await Zotero.Promise.allSettled(
crossrefReferences.map(async (crossrefItem) => {
if (crossrefItem.DOI)
return this.getItemFromIdentifier({ DOI: crossrefItem.DOI });

if (crossrefItem.isbn)
return this.getItemFromIdentifier({ ISBN: crossrefItem.ISBN });

return this.parseItemFromCrossrefReference(crossrefItem);
})
);
}
return parsedReferences
.map((reference) => reference.value || null)
.filter(Boolean);
}

/**
* Get a Zotero Item from a valid Zotero identifier - includes DOI, ISBN, PMID, ArXiv ID, and more.
* @param {{string: string}} identifier - A reference item in JSON Crossref format.
* @returns {Promise<Zotero.Item | null>} Zotero item parsed from the identifier, or null if parsing failed.
*/
static async getItemFromIdentifier(identifier){
await Zotero.Schema.schemaUpdatePromise;
let translation = new Zotero.Translate.Search();
translation.setIdentifier(identifier);

let jsonItems;
try {
// set libraryID to false so we don't save this item in the Zotero library
jsonItems = await translation.translate({libraryID: false});
} catch {
debug(`No items returned for identifier ${identifier}`);
// We could get a 429 error inside the translation if we make too many
// requests to Crossref too quickly. Would need to be fixed in Zotero...
// I don't think we can identify this here unfortunately...
// (See https://forums.zotero.org/discussion/84985/new-odd-result-when-importing-pmids-with-magic-wand)
}
}

if (jsonItems) {
const jsonItem = jsonItems[0];
// delete irrelevant fields to avoid warnings in Item#fromJSON
delete jsonItem['notes'];
delete jsonItem['seeAlso'];
delete jsonItem['attachments'];

static getDOI() {
const newItem = new Zotero.Item(jsonItem.itemType);
newItem.fromJSON(jsonItem);
return newItem;
}
else{
return null;
}
}

}
}
/**
* Get a Zotero Item from a Crossref reference item that doesn't include an identifier.
* @param {string} crossrefItem - A reference item in JSON Crossref format.
* @returns {Promise<Zotero.Item>} Zotero item parsed from the identifier, or null if parsing failed.
*/
static parseItemFromCrossrefReference(crossrefItem){
let jsonItem = {};
if (crossrefItem['journal-title']){
jsonItem.itemType = 'journalArticle';
jsonItem.title = crossrefItem['article-title'] || crossrefItem['volume-title'];
}
else if(crossrefItem['volume-title']){
jsonItem.itemType = 'book';
jsonItem.title = crossrefItem['volume-title'];
}
else if(crossrefItem.unstructured){
// todo: Implement reference text parsing here
return Promise.reject("Couldn't parse Crossref reference - unstructured references are not yet supported. " + JSON.stringify(crossrefItem));
}
else{
return Promise.reject("Couldn't determine type of Crossref reference - doesn't contain `journal-title` or `volume-title` field. " + JSON.stringify(crossrefItem));
}
jsonItem.date = crossrefItem.year;
jsonItem.pages = crossrefItem['first-page'];
jsonItem.volume = crossrefItem.volume;
jsonItem.issue = crossrefItem.issue;
jsonItem.creators = [{
'creatorType': 'author',
'name': crossrefItem.author
}];
// remove undefined properties
for (let key in jsonItem){
if(jsonItem[key] === undefined){
delete jsonItem[key];
}
}
const newItem = new Zotero.Item(jsonItem.itemType);
newItem.fromJSON(jsonItem);
return Promise.resolve(newItem);
}
}
16 changes: 1 addition & 15 deletions src/sourceItemWrapper.js
Original file line number Diff line number Diff line change
Expand Up @@ -496,21 +496,7 @@ class SourceItemWrapper extends ItemWrapper {
// if not, do it for all

getFromCrossref() {
Crossref.getCitations();
// fail if item doesn't have a DOI specified
// In general I would say to try and get DOI with another plugin if not available locally
// call the crossref api
// the zotero-citationcounts already calls crossref for citations. Check it
// call this.add multiple times, or provide an aray
// if citation retrieved has doi, check if citation already exists locally
// if yes, set providers.crossref to true
// decide whether to ignore citations retrieved without doi
// or if I will check if they exist already using other fields (title, etc)

// offer to automatically get QID from wikidata for target items
// using Wikidata.getQID(items)

// offer to automatically link to zotero items
Crossref.addCrossrefCitationsToItems([this]);
}

getFromOcc() {
Expand Down
10 changes: 7 additions & 3 deletions src/zoteroOverlay.jsx
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
import Wikicite, { debug } from './wikicite';
import Citation from './citation';
import Citations from './citations';
import CitationsBoxContainer from './containers/citationsBoxContainer';
import Crossref from './crossref';
Expand Down Expand Up @@ -360,12 +361,15 @@ const zoteroOverlay = {
}
},

getFromCrossref: function(menuName) {
getFromCrossref: async function(menuName) {
// get items selected
// filter items with doi
// generate batch call to crossref
// only add items not available locally yet
Crossref.getCitations();
const items = await this.getSelectedItems(menuName, true);
if (items.length){
Crossref.addCrossrefCitationsToItems(items);
}
},

getFromOCC: function(menuName) {
Expand Down Expand Up @@ -861,7 +865,7 @@ const zoteroOverlay = {
doc.getElementById('citation-menu-fetch-qid').disabled = false;
doc.getElementById('citation-menu-file-export').disabled = false;
doc.getElementById('citation-menu-croci-export').disabled = !sourceItem.doi || !targetItem.doi;
doc.getElementById('citation-menu-oci-crossref').disabled = !ociSuppliers.includes('crossref');
doc.getElementById('citation-menu-oci-crossref').disabled = !sourceItem.doi || !ociSuppliers.includes('crossref');
doc.getElementById('citation-menu-oci-occ').disabled = !ociSuppliers.includes('occ');
doc.getElementById('citation-menu-oci-wikidata').disabled = !ociSuppliers.includes('wikidata');
},
Expand Down
12 changes: 11 additions & 1 deletion static/chrome/locale/en-US/wikicite.properties
Original file line number Diff line number Diff line change
Expand Up @@ -34,7 +34,17 @@ wikicite.citations-pane.linked.confirm.message = This citation is linked to item
wikicite.citations-pane.linked.confirm.title = Linked citation
wikicite.citations-pane.linked.confirm.unlink = Unlink
wikicite.citations-pane.more = More...
wikicite.crossref.get-citations.unsupported = Getting citations from Crossref not yet supported
wikicite.crossref.get-citations.confirm-message = Found references for %s out of %s item(s).\n%s citation(s) will be parsed.
wikicite.crossref.get-citations.confirm-title = Add citations from Crossref?
wikicite.crossref.get-citations.done = Got item citations from Crossref
wikicite.crossref.get-citations.error-getting-references = Failed to get citations for item(s) from Crossref
wikicite.crossref.get-citations.error-parsing-references = Failed to parse citations from Crossref
wikicite.crossref.get-citations.loading = Getting item citations from Crossref
wikicite.crossref.get-citations.no-references = Didn't find any Crossref citations for item(s)
wikicite.crossref.get-citations.no-doi-message = No items with a DOI provided - reference data can't be retrieved from Crossref.
wikicite.crossref.get-citations.no-doi-title = Can't get Crossref references for items without DOIs
wikicite.crossref.get-citations.parsing = Parsing item citations from Crossref
wikicite.crossref.get-citations.parsing-progress = Parsed citations for %s out of %s items
wikicite.editor.cancel = Cancel
wikicite.editor.save = Save
wikicite.editor.title = Citation Editor
Expand Down