-
-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fetching citations from Crossref using REST API #139
Fetching citations from Crossref using REST API #139
Conversation
Seems to be working nicely now. A couple of questions though:
|
Hi, incredible work on the extension. Thank you so much for it! I was wondering if this particular feature has seen any progress? It would, in my opinion (or, tbh, for my work), be the most useful addition, since CrossRef is such a big part of the infrastructure and has so much citation data. Again, thank you, I know I'm just clamoring for features on pro bono work 🙈. |
Hi, @urspx! @Dominic-DallOsto has probably done a great work to implement this, but unfortunately I haven't had the time to check it yet. I'm slowly starting to catch up on all his contributions. Hopefully I'll manage to review this one soon. In the meantime, maybe he has a beta XPI for you to give it a try? |
Amazing, thank you! Looking forward :) I'm happy to test / report bugs where possible to at least contribute something useful (and, once my schedule clears up a bit, perhaps even help with documentation – I know this is off topic, though). |
Thank you, @urspx! All bug reports and enhancement requests are welcome. We'll try to address them as fast as we can. Regarding testing, we have in mind opening a parallel beta channel to help us test our latest changes. We'll post news about this here: #161 And better documentation would definitely be AMAZING. You can contribute whatever (text, screenshots, video) here: https://www.wikidata.org/wiki/Wikidata:Zotero/Cita#Documentation |
I just checked out the branch, built an xpi and tested it and got "undefined" |
I tried improving this pull request with the code above, but for some reason I neither get an error nor an alert with the DOI. 🤷♂️ |
Thanks, I found the issue there. I'm in the middle of fixing it at the moment. But it raises the more general question of how to deal with batch actions in the overlay. I guess we should have a different progress message for the batch actions? And would it be best to just call the source item wrapper functions to avoid code duplication? Or to reimplement it for a batch at once? |
Nice! I look forward to seeing the solution. |
Yes, the other functions seem to take "items" as argument and process them. I think it would be a good idea to be consistent. I don't know how the notification system works currently, but we could limit the functionality for now to only support fetching 1 item from crossref at a time. That would work around the issue, but probably lead to user frustration. |
c8340cf
to
47dd543
Compare
47dd543
to
128fa83
Compare
Hi, why the PR is not merged on master ? |
@Dominic-DallOsto Is there a way to easily debug with an IDE like vscode ? |
Did you find a way to get the debugging in VSCode to work? I tried once before but didn't manage.
The only issue is here Line 44 in 128fa83
|
No, I didn't find a way. I debugged this part of your code (manually with the JS console of Zotero) and I found (but I might be wrong) that isn't defined anywhere. |
128fa83
to
cd7290e
Compare
Ok - I finally got things to a more or less working level. The popups aren't yet properly translated, but I hope the functionality is error-free. You can add citations from Crossref by right clicking the item(s), or from the citations menu for an item. The performance isn't amazing for large numbers of citations, because it currently gets each individually by DOI from Crossref. Any feedback is more than welcome, thanks! @urspx @dpriskorn @Futur3r Here's a debug build (you'll have to unzip it - Github won't let me upload an xpi here). zotero-cita-v0.6.0-beta-crossref.xpi.zip |
Hello, great to see this is working! I've been following this PR (from a distance) for a while, so it's great to see some progress. I had a question and some comments.
Lines 166 to 180 in 0508d84
For one example of the latter two points, this method, static async getCitations(doi) {
const JSONResponse = await Crossref.getDOI(doi);
if (JSONResponse){
const references = JSONResponse.message.reference
if (references && references.length > 0){
const parsedReferences = await Promise.all(references.map(async (reference) => {
const parsedReference = await Crossref.parseReferenceItem(reference);
return parsedReference;
}));
return parsedReferences.filter(Boolean);
}
else{
debug("Item found in Crossref but doesn't contain any references");
}
}
return [];
} could be something like static async getCitations(doi) {
const JSONResponse = await Crossref.getDOI(doi);
if (!JSONResponse) return [];
const references = JSONResponse.message.reference;
if (!references.length) {
debug("Item found in Crossref but doesn't contain any references");
return [];
}
const parsedReferences = await Promise.all(
references.map((reference) => Crossref.parseReferenceItem(reference))
);
return parsedReferences.filter(Boolean);
} |
1. Get list of references from Crossref for items - show how many to the user 2. Parse the references source item by source item, indicating progress because it can take a long time 3. Add references for all source items
Thanks for the detailed feedback!
It's here. From what I can tell it handles retrying requests if they fail, but not Crossref "you got rate limited" responses. But getting the reference list from Crossref occurs at most once per item. Most of the calls to Crossref actually happen when parsing the references, which is handled by Zotero's Crossref translator. If we start to experience rate limiting errors, I think the solution would have to be implemented there. I finally got around to adding the user agent string, thanks!
Sorry about the tabs/space mixture - not sure how that slipped through. The main logic is a bit longer / more verbose now (particularly with all the progress messages), but I hope at least clear enough to follow? In particular, I added a progress message saying how many items have successfully had their references parsed, because this tends to be quite slow if items have 50+ references. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! With that, I've taken a look at this, and it's great! I do understand what is happening, though I haven't studied all the underlying zotero APIs. So, for example, I presume the Services.prompt.confirm
function is synchronous (this would explain some annoying Zotero behaviour).
I figured this PR has been waiting long enough that a review with some feedback might help get this merged… so feel free to ignore what I have to say, or do whatever you want, but I'm hoping with some improvements we can get this merged and then the feature will be available.
OK from a high level, this seems like a very sensible approach (and it is very annoying that crossref doesn't have a batch API). The division of methods mostly makes sense (though it's clearer if you merge some of them). I didn't spot any minor correctness issues, but I also didn't run the code (need to figure out how to do that as the zotero javascript console did not work as I'd thought).
There is one major issue I spotted, and it's the implicit idea that Promise
s will always be fulfilled. Given the delays that you have identified, and the potential for transient network failures (made more likely due to crossref rate limiting), it does not seem right to assume all promises will be fulfilled. I've assumed this is not an explicit design decision, as you have a code path that assumes otherwise and thus cannot be reached (I've highlighted this in the review).
The fix needs a design decision: what happens when certain Promises
reject?
- Fetching references from upstream (you do handle this, but in a more complicated way than necessary)
- Parsing references
- Adding citations to the database
There is value to failing early; however I think the user experience would be so much better to process citations that are successfully identified, even if there are failures.
Additionally, I think this needs some rate-limiting code, but it should probably be in a subsequent PR (before the next release).
let sourceItemReferences; | ||
try{ | ||
sourceItemReferences = await Promise.all( | ||
sourceItemsWithDOI.map(async (sourceItem) => await Crossref.getReferences(sourceItem.doi)) | ||
); | ||
} | ||
catch (error){ | ||
progress.updateLine( | ||
'error', | ||
Wikicite.getString('wikicite.crossref.get-citations.error-getting-references') | ||
); | ||
debug(error); | ||
return; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is Promise.all
the right way to combine these promises?
When any one is rejected, the whole thing will fail, and we'll return.
This means that the confirmation message later will always say references have been found for 100% of items.
wikicite.crossref.get-citations.confirm-message =
Found references for %s out of %s item(s).\n%s citation(s) will be parsed.
I suspect what you want here is Promise.allSettled
; then you'll need to filter the resulting (settled) promises to include only the fulfilled
promises, for further processing. I suspect you'll want to print warnings for rejected promises.
However, if there's no rate-limiting in Zotero.HTTP.request
(which seems to be the case), Promise.allSettled
will fire all the requests at once, which we don't want to do. We should rate limit, appropriately. The simplest way is to use support for limiting concurrent requests from a library, but it isn't that complicated to write rate limit code. I don't think this should block this PR from being merged but it should block a new release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My intention in this case was that getReferences
would handle all errors, so these Promises should always be fulfilled, either with a list of references or []. An error here would be unexpected, and in this case we should stop and return early?
This means that the confirmation message later will always say references have been found for 100% of items.
I don't quite get what you mean here? For the items where we didn't find any references or an error occurred getting the list, the entry in sourceItemReferences
will be []
, so it will be filtered out of itemsToBeUpdated
. Or am I missing something?
Yeah, the rate limiting should be added, but pragmatically it doesn't seem the biggest issue at the moment. We only fire one request per item (which normally would just be 1 or a few). I tested with 130 items at once and didn't get any errors. Like you say, it'd probably make sense to focus on it after everything else is working.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means that the confirmation message later will always say references have been found for 100% of items.
I don't quite get what you mean here? For the items where we didn't find any references or an error occurred getting the list, the entry in
sourceItemReferences
will be[]
, so it will be filtered out ofitemsToBeUpdated
. Or am I missing something?
No, I made a mistake — I missed that it was a call to sourceItems.length
, and you're filtering sourceItems
at the top of the function. Here's some of why:
My intention in this case was that
getReferences
would handle all errors, so these Promises should always be fulfilled, either with a list of references or []. An error here would be unexpected, and in this case we should stop and return early?
That makes sense, though I don't agree with your design choice. Here's why: you're implementing your own error handling logic, rather than using the logic provided by the language, and I don't see how it adds value (whereas I do see how it adds complexity). There is already a way to return early from the method in an (unanticipated) error case — don't catch the error, here. And rather than returning []
for fetch failures, you can Promise.reject
.
Here's an alternative: getReferences
(now much clearer — thank you) should return a Promise
with the (possibly empty) list of references. If the request fails it should Promise.reject
. In the highlighted snippet, with Promise.allSettled
, only the fulfilled promises are further processed. Any unanticipated error causes the whole method to fail; this needs handling somewhere (but not in the method itself).
As I've said before, my goal here is to make review as easy as possible for the maintainer so your work can be merged. Every added layer of indirection makes the code more complicated to review (and maintain). It's often worth it, and often not. I think adding an implicit error handling layer on top of the one in the language is not worth it, here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I agree with what you're saying. I just don't quite see how it would fit perfectly here, but please suggest something I might be missing.
- because
getReferences
can innocently return an empty list if there are no references, it seems a bit weird toreject
in this case? Or do you mean only for when an error occurs getting the references? But then I don't see the benefit - ifgetReferences
wouldreject
, we'd do the same as if it returned an empty list anyway? - If I understand you correctly, basically you're saying to remove the try/catch here and let an error naturally bubble up and stop execution? While that would make the code much clearer, I think the progress bar error message to a user is somewhat helpful?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- because
getReferences
can innocently return an empty list if there are no references, it seems a bit weird toreject
in this case? Or do you mean only for when an error occurs getting the references?
The latter.
But then I don't see the benefit - if
getReferences
wouldreject
, we'd do the same as if it returned an empty list anyway?
The benefit is that it is clear what is happening. It is obvious that there are error cases, and that the code is handling them.
- If I understand you correctly, basically you're saying to remove the try/catch here and let an error naturally bubble up and stop execution?
That is the correct understanding, and it is just one option.
While that would make the code much clearer, I think the progress bar error message to a user is somewhat helpful?
OK, this is good. The code needs to be clear. It also ought to present users with helpful error messages. I think the complication comes because the logic for fetching references is interspersed with the logic for showing the UI.
I'll approach this in a separate message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While that would make the code much clearer, I think the progress bar error message to a user is somewhat helpful?
Incidentally, are you missing a progress.close()
before the return
?
Also how does this whole plugin work: can I do anything else with zotero while the items are being fetched?
My less structured thoughts are below the break — I'll have to come back to this.
The simple change is to scrap the error message. Failed to get citations for item(s) from Crossref
is not that helpful to users. Run the whole method in a try
/catch
/finally
block and then progress.close()
once, in the finally
.
I've missed something, in my review, that I should have highlighted earlier: handling the data processing and the user interaction together complicates this code.
I fear the right solution might be uniodiomatic for Zotero: to separate the code that deals with the progress
from the code doing the fetching. I'd normally solve this in javascript with addEventListener
and dispatchEvent
. I see examples of the former in this repository, but none of the latter — I presume they work with zotero but I don't know the conventions.
The real challenge I have concerns the need for user interaction. Services.prompt.confirm
is called synchronously? So execution of the whole thread is just suspended until it returns? That… that is not how I would expect things to work. Is this the whole program thread? Plugin thread? I… don't like this, at all. Plausible that this dip into zotero development goes no further…
I'm going to have to come back to this, another time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it'll be clearer if you are able to build and test the plugin. But in general yeah, everything is running independently of the main UI. So you can start fetching citations from Crossref then go about doing other stuff in Zotero and the progress box will keep you updated with what's going on. That's why I think it's fairly important - the progress bar is your only way as a user to know your command is still running / what it's doing / when it might finish.
I agree it's a bit ugly having all these progress messages in the code, but that's how the rest of the plugin is structured. Would replacing the progress bar updates with dispatchEvent
make a big practical difference?
As you point out - Services.prompt.alert
and Services.prompt.confirm
are blocking - for when we can't proceeed without user interaction. Here both of these occur more or less towards the start of the method, so it shouldn't be too jarring for the user to need to provide some input given they've just clicked a button to do something (rather than this popping up 2 minutes later).
Thanks again for the detailed feedback! Most of those changes make sense - otherwise I left a few comments. Like you say, adding references despite an error would be a nicer user experience, particularly if running this on a large number of items. |
Glad you've found it helpful. Happy to keep up the pace of development on this to get this merged. I'll figure out how to build the plugin and give it a spin, over the weekend. (Edit: been delayed on this — I'll give this a go in the week.) |
Are you following dev:client coding:plugin development [Zotero Documentation] to test the plugin? |
Yeah. The steps here are hopefully easy enough to follow to get it up and running |
I did a quick test using the bottleneck library to rate limit requests with a minimum of 20 ms between each request. Actually this made the requests both slower and less reliable. I don't know if |
Nice — good job.
That's what I'd expect: it's how the library is designed, no? The docs for There's no rate limiting in So let's not bother with rate limiting, but we should respond to 429 requests by backing off. Seems reasonable? |
Sorry, that wasn't super clear. With the requests being sent out over a period of time I was referring to even when using If I enable the rate limiting, the time for all the requests to be fulfilled increases (fair enough), but a number of requests also fail to get a response. I guess in the ideal case Zotero would add some 429 handling upstream and we wouldn't have to worry about it? But I'm yet to be able to actually get a 429 response despite my best efforts... |
@Dominic-DallOsto Had to step away from this for a bit — if you can add some handling for the 429 case then this will be fine (for now) and better than it was, and you should ping the maintainer to check it over. We want to be good citizens, and that means handling 429s in every version that is released. Our counterparty might turn on 429 responses in the future. We can't force upgrades. Thanks for your work on this; I think it will be a really useful feature, when merged. I'm unlikely to get back to this for at least a month — hopefully this will be merged by then! |
I dug into this a bit more and it's not really trivial to handle 429 responses where they're more likely to occur (when we use Zotero translators to parse the reference lists we get). I'm not sure whether we can even identify this case, or whether it will just appear to us as if the translator failed to find any item. I added 429 handling for when we directly access the Crossref API, but it just throws and error that is logged and caught. |
Replaced by #300 |
Fixes #43
Uses CrossRef's REST API to get citations for an item. Mostly this works well because the cited items have DOIs which we can resolve. For items without identifiers, I attempt to roughly parse as much information as I can from them. Currently I just discard unstructured citations, but often some data (title, extra authors) is present in here that isn't in the strucutured data, so parsing them would be nice once #113 is resolved. For some items like this the data is CrossRef is still quite messy.
todo: