Integrates Google Scholar's citation count functionality, a websocket client for JabRef and other extensions/fixes #131
…ed + startUrl extended, package.json: missing dependencies added
…content/zotero/xpcom/data/item.js
…md extended, manifest.json extended, connector.js extended
I am looking for a very easy way to wait until all (asynchronously) fetched citation counts have been fetched and updated in the method |
… currently, the citation data is not fetched asynchronously (which probably reduces the amount of required captchas from Google Scholar to show that one is not a robot)
Fetching and transferring citation counts to JabRef works now. :) Currently, the citation data is not fetched asynchronously, which probably has the benefit of reducing the number of captchas Google Scholar requires to show that one is not a robot. |
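To illustrate the trade-off discussed above, here is a minimal sketch of both ways to wait for all citation counts. It is not the code from this PR: `fetchCount` is a hypothetical stand-in for the real per-reference lookup, and both function names are made up for illustration. The sequential variant issues one request at a time, which is the behaviour described above; the concurrent variant uses `Promise.all` to wait until every lookup has finished.

```javascript
// Assumption: fetchCount(item) is a hypothetical async function that
// resolves with the citation count for one reference.

// Sequential: each lookup is awaited before the next one starts,
// so at most one request is in flight at any time.
async function fetchAllSequentially(items, fetchCount) {
  const results = [];
  for (const item of items) {
    results.push({ item, count: await fetchCount(item) });
  }
  return results;
}

// Concurrent: all lookups start immediately; Promise.all resolves
// once every one of them has finished (or rejects on the first failure).
async function fetchAllConcurrently(items, fetchCount) {
  const counts = await Promise.all(items.map((item) => fetchCount(item)));
  return items.map((item, i) => ({ item, count: counts[i] }));
}
```

Either function returns a promise, so the calling method only needs a single `await` before updating the entries.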
Thanks a lot for your PR. I really like the feature and think it's a valuable addition to the JabRef ecosystem. However, I'm not convinced that this should be baked into the browser extension. Instead, I would prefer it to be implemented directly in JabRef. That feels like the more flexible approach and also covers other use cases (e.g. articles added by hand instead of via the browser). But I guess you have thought about this as well. Why did you choose to implement it as an addition to the browser extension? |
My main reasoning was as follows:
I have also thought about the use case where citation counts of references within JabRef should be fetched or updated (see: JabRef/jabref#5849), which covers references imported via the browser extension as well as, e.g., manually imported ones. Actually, I see this use case (triggering the fetching of citation counts from JabRef) as the primary one. For this case, my idea was to send a request (with either one or several references) from JabRef to the JabRef-Browser-Extension, which then fetches the required data and sends it back to JabRef. If the user is required to solve a captcha, the user could be notified within JabRef. Opening a corresponding browser tab and loading a URL could probably also be possible/sensible. The formal steps in JabRef would be:
Even if a headless web browser were used within JabRef, there would still be the challenge of solving Google Scholar's "not-a-robot" checks. Maybe it is possible to permanently "hold" a browser instance within JabRef for retrieving data from Google Scholar and to show a browser window whenever user interaction is required to solve a captcha; the user solves it and hopefully Google Scholar accepts it. But from a bird's-eye perspective, this is actually already quite similar to the current approach, right? So I thought the existing JabRef-Browser-Extension was a good starting point for integrating this. Special cases probably need special treatment. The cool thing about browser extensions is that they are really customizable. I know the current design is probably a somewhat clumsy approach, but at least it works really well, and in the end guaranteeing functionality has the highest priority. In my opinion, this was at least the easiest way to go for now. |
What do you think is the proper roadmap for this situation? |
Those are some really good points that you brought up. Let me think about it for a bit... I'm pretty busy at the moment, sorry. I'll get back to you soon, promised! |
Maybe relevant for decision process:
|
So, I finally found some time to think about these matters. Thanks again for the detailed outline above, which was good food for thought. There are a few things of concern:
Thus, it appears that there is no good way to fetch citation counts from Google. Using either JabFox or JabRef leads to nasty issues. Proposal: Don't use Google. There are a few services that provide citation metadata via a freely available API, for example the Microsoft Academic API, OpenCitations, or Semantic Scholar. Personally, I would tend to use Semantic Scholar, as they seem to have the most extensive data coverage. These APIs can simply be consumed directly from JabRef. The advantages of this approach are multiple: you have a stable API to consume, which makes it a reliable and future-proof solution; you get more metadata than just the citation count (e.g. it could be extended in the future to actually show a list of citing works, providing easy ways to import them, etc.); and you don't need to worry about cross-communication JabRef <-> JabFox. What do you think? |
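As a rough illustration of what consuming such an API could look like, here is a minimal sketch for Semantic Scholar. It assumes the Graph API's paper lookup by DOI and its `citationCount` field; the endpoint and field name should be checked against the current Semantic Scholar documentation. The function names and the injectable `fetchImpl` parameter are made up for this example, and the sketch is in JavaScript only for consistency with the extension code, even though the proposal is to call the API from JabRef itself.

```javascript
// Assumption: Semantic Scholar Graph API paper lookup by DOI,
// requesting only the citationCount field.
const S2_BASE = "https://api.semanticscholar.org/graph/v1/paper/";

function buildCitationCountUrl(doi) {
  // Note: DOIs containing reserved URL characters would need extra escaping.
  return `${S2_BASE}DOI:${doi.trim()}?fields=citationCount`;
}

// fetchImpl is injectable (hypothetical parameter) so the function can be
// exercised without network access; it defaults to the global fetch.
async function fetchCitationCount(doi, fetchImpl = fetch) {
  const response = await fetchImpl(buildCitationCountUrl(doi));
  if (!response.ok) {
    return null; // e.g. unknown DOI or rate limiting
  }
  const data = await response.json();
  return typeof data.citationCount === "number" ? data.citationCount : null;
}
```

Returning `null` instead of throwing keeps a missing entry from aborting a batch of lookups; a real implementation would probably distinguish "not found" from rate limiting.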
For now, I cannot respond in detail. I need to inspect the APIs in detail as well. Just some fragments to think about:
Google Scholar
|
I am in the process of adding a general structure for fetching various kinds of reference metadata. For now, I will add Semantic Scholar for fetching citation counts, since it is very easy. Furthermore, I am also in the process of adding a small websocket server in JabRef for bidirectional communication between JabRef and JabFox, which will be much more stable. It will now be used for fetching the citation counts from Google Scholar as well. (Google Scholar has the most accurate and most complete information. For example, Semantic Scholar reports 3 citations for one reference, whereas Google Scholar reports 33. Furthermore, Semantic Scholar requires an identifier such as a DOI, but not every reference has one. Additionally, Google Scholar finds entries where others don't. I am confident that this approach will work acceptably if used properly and moderately, and it could be optional.) This websocket server can later be used for other communication purposes as well (e.g. to exchange additional information with JabFox or any other application). |
Sounds really nice! Thanks for your work @systemoperator. I agree that the data of Semantic Scholar is not yet on the same level as Google Scholar, but I hope they are getting there. As you said, they have a nice API. Regarding web sockets: the last time I looked at them, it was not possible to communicate from a browser extension via sockets (i.e. the web sockets API is not accessible). According to https://bugzilla.mozilla.org/show_bug.cgi?id=1247628 this is still the case. If I remember correctly, it might work for Chrome, but you need additional permissions. If you find a solution, that would be nice. This would make it possible to make progress on #32 and JabRef/jabref#5719 |
Currently, I start the websocket client as a background script (using Firefox). I have made some tests with it and it seems to meet all my requirements. At least, I could already send some test websocket messages to JabRef and receive some as well. :) I hope I will not find any pitfalls and I hope this meets all future requirements as well. |
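A minimal sketch of how such a background-script client might dispatch incoming commands is shown below. It is an illustration under assumptions, not the code in this PR: the JSON message shape `{ command, payload }`, the reply shape, and the names `handleMessage` and `connect` are all hypothetical. Only the standard browser `WebSocket` API (`onmessage`, `send`) is relied upon.

```javascript
// Assumption: messages are JSON objects of the form
// { "command": string, "payload": any }. This shape is hypothetical.
// handlers is a Map from command name to an async handler function.
async function handleMessage(raw, handlers) {
  let message;
  try {
    message = JSON.parse(raw);
  } catch (e) {
    return { ok: false, error: "invalid JSON" };
  }
  if (message === null || typeof message !== "object") {
    return { ok: false, error: "invalid message" };
  }
  const handler = handlers.get(message.command);
  if (!handler) {
    return { ok: false, error: "unknown command" };
  }
  return { ok: true, command: message.command, result: await handler(message.payload) };
}

// WebSocketImpl is injectable (hypothetical parameter) for testing;
// in a background script it defaults to the browser's WebSocket.
function connect(url, handlers, WebSocketImpl = WebSocket) {
  const socket = new WebSocketImpl(url);
  socket.onmessage = async (event) => {
    // Every incoming message gets exactly one JSON reply.
    const reply = await handleMessage(event.data, handlers);
    socket.send(JSON.stringify(reply));
  };
  return socket;
}
```

Keeping the dispatch logic in a plain function separate from the socket makes it testable without a running JabRef instance, and new commands only require another entry in the map.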
…and date field and can process various different date formats properly); handlerCmdFetchGoogleScholarCitationCounts() implemented, ...
I don't understand how this code fragment creates the BibTeX data: JabRef-Browser-Extension/connector.js Lines 49 to 68 in 7072f37
Where and how does the conversion process take place? |
Thanks for the follow-up! Codewise this looks good to me.
I'll have a look at your other PR @JabRef and then merge both at the same time.
# Conflicts: # package-lock.json # package.json
The PR can be merged now; the ws client is disabled until JabRef's counterpart has been integrated. :) Is it possible to keep this branch so that I can continue to use it? |
Sorry for taking so long to come back to you.
I finally found the time to go through your PR again. I only have two small remarks concerning the code. Could you please take care of these and fix the merge conflicts? Then I will merge and release a new version. Thanks!
# Conflicts: # data/progressPanel.js # package-lock.json # package.json
done :) |
Many thanks again! |
done