Full text crawlers #101
About "Do we want to include more Crawlers? If yes, which?":
@mlep Thanks 👍
Your feature works pretty well. But there are cases in which a file can be downloaded but is not the intended file. Thus, there must be a way to preview the PDF before adding it.
We need to evaluate how often this will happen. Can you give me an example? I think with a reasonable number of crawlers this will work with very high accuracy.
OK, deletion after download is fine if it can be done easily. At the moment, I cannot easily delete the file link from the BibTeX entry AND also delete the file from the hard drive. Once we fix this, we do not need the check feature in the first place. Good idea! My example was: I wanted to download the conference article Barros, Service Interaction Patterns, but got the technical report instead. Another issue was that, when attempting to download a very old version of the BPMN standard, I got a PDF of slides in a language I do not understand, probably Czech.
Regarding "clean all used fields from LaTeX stuff, e.g. {~ etc.": this should be part of the "Edit -> Cleanup entries" functionality, shouldn't it? If not, it should be added there, too. Regarding "check for duplicates": JabRef offers "Search -> find duplicates". This functionality should be reused here.
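To illustrate the kind of cleanup meant by "{~ etc.", here is a minimal standalone sketch. It is not JabRef's cleanup code: the class `LatexFieldCleaner` and its `clean` method are hypothetical, and it only handles braces and the `~` non-breaking space, not LaTeX commands.

```java
class LatexFieldCleaner {
    /**
     * Strips a few common LaTeX artifacts from a field value,
     * e.g. before the value is used in a search query.
     * Hypothetical helper, only for illustration.
     */
    static String clean(String field) {
        return field
                .replace("~", " ")        // "~" is a non-breaking space in LaTeX
                .replaceAll("[{}]", "")   // drop grouping braces
                .replaceAll("\\s+", " ")  // collapse runs of whitespace
                .trim();
    }

    public static void main(String[] args) {
        // prints "Service Interaction Patterns"
        System.out.println(clean("{Service} Interaction~Patterns"));
    }
}
```

In the real code base this would presumably live with the existing cleanup actions rather than in the crawler, as suggested above.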
A similar approach might have been taken by Christoph Lehner: https://sourceforge.net/p/jabref/discussion/318824/thread/6e5fea64/ Is it possible to synchronize somehow?
The approach of C. Lehner is the base repository https://github.com/lehner/LocalCopy.
What is missing to get this PR functional and integrated into master?
This is just too messy to understand right now. Fixing this would require separating the GUI event dispatch thread (EDT) from the other code parts, which is a major effort. The issue is how to implement Swing actions that require multiple user interactions while their task is running. Normally, this would require nesting SwingWorker classes, one for each step, and starting the next step within the EDT update method. What is more, sometimes SwingWorker and sometimes the Spin framework is used. All of this makes the issue even more complicated. Someone else may take a look if they see this issue more clearly than I do.
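The nesting described here can be sketched roughly as follows: each step is its own `SwingWorker`, and the `done()` method, which runs on the EDT, handles the user interaction and starts the next step. All class names, the placeholder URL, and the `confirm` callback are assumptions for illustration; this is not the code from the PR.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Predicate;
import javax.swing.SwingWorker;

class ChainedSteps {
    /** Step 1: background lookup, then a user decision on the EDT. */
    static class FindStep extends SwingWorker<String, Void> {
        private final Predicate<String> confirm;    // stands in for a dialog, called on the EDT
        private final Consumer<String> onFinished;  // receives the final outcome on the EDT

        FindStep(Predicate<String> confirm, Consumer<String> onFinished) {
            this.confirm = confirm;
            this.onFinished = onFinished;
        }

        @Override
        protected String doInBackground() {
            return "https://example.org/paper.pdf"; // placeholder for the real crawler call
        }

        @Override
        protected void done() {
            try {
                String url = get();
                if (confirm.test(url)) {
                    new DownloadStep(url, onFinished).execute(); // start step 2 from the EDT
                } else {
                    onFinished.accept("cancelled");
                }
            } catch (InterruptedException | ExecutionException e) {
                onFinished.accept("failed");
            }
        }
    }

    /** Step 2: background download, result reported on the EDT. */
    static class DownloadStep extends SwingWorker<String, Void> {
        private final String url;
        private final Consumer<String> onFinished;

        DownloadStep(String url, Consumer<String> onFinished) {
            this.url = url;
            this.onFinished = onFinished;
        }

        @Override
        protected String doInBackground() {
            return "downloaded " + url; // placeholder for the real download
        }

        @Override
        protected void done() {
            try {
                onFinished.accept(get());
            } catch (InterruptedException | ExecutionException e) {
                onFinished.accept("failed");
            }
        }
    }

    /** Runs both steps and blocks until they finish. Must not be called from the EDT. */
    static String runAndWait(Predicate<String> confirm) {
        CountDownLatch latch = new CountDownLatch(1);
        String[] result = new String[1];
        new FindStep(confirm, r -> { result[0] = r; latch.countDown(); }).execute();
        try {
            if (!latch.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(runAndWait(url -> true));
    }
}
```

Even with only two steps the control flow is spread over two classes and two `done()` methods, which shows why longer interaction sequences become hard to follow.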
So you are saying that you will not complete the PR? I cannot see anybody else who will. There is no point in having this hang around in limbo until it becomes outdated. So we can close this PR without merging and close the related open issues as won't fix. @simonharrer: Please confirm!
Merge as good enough.
Rework fulltext crawlers and first prototype
…ernal links could be set.
Hi, I've just tested the newly implemented "Full text article download" function. For several articles it downloaded a version from ResearchGate, but I want to have the one from Elsevier. Depending on who you work for, you have special access privileges on certain publisher sites, so it would be great to specify in the preferences which crawler the download function should prefer. Perhaps even define a priority list.
There is a hard-coded priority list right now which prefers the official publishers over Google Scholar, for example. If you want to file a new feature request or enhancement, please create a separate issue.
Done at #435.
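For readers unfamiliar with the idea, the priority list discussed above could look roughly like this: the crawlers are kept in an ordered list and the first one that returns a result wins. This is only a sketch under assumptions; `Finder`, `findFullText`, `defaultFinders`, and the example URLs are hypothetical names, not JabRef's API. A user-defined priority would amount to building the list from preferences instead of hard-coding it.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

class PrioritizedFulltextLookup {
    /** Hypothetical stand-in for a crawler: a name plus a lookup from DOI to PDF URL. */
    record Finder(String name, Function<String, Optional<String>> lookup) {}

    /** Tries each finder in list order and returns the first hit, if any. */
    static Optional<String> findFullText(List<Finder> finders, String doi) {
        for (Finder finder : finders) {
            Optional<String> result = finder.lookup().apply(doi);
            if (result.isPresent()) {
                return result;
            }
        }
        return Optional.empty();
    }

    /** Example order: publisher first, general search engine last (placeholder lookups). */
    static List<Finder> defaultFinders() {
        return List.of(
                new Finder("publisher", doi -> doi.startsWith("10.1000/")
                        ? Optional.of("https://publisher.example.org/" + doi + ".pdf")
                        : Optional.empty()),
                new Finder("scholar", doi ->
                        Optional.of("https://scholar.example.org/" + doi + ".pdf")));
    }

    public static void main(String[] args) {
        // The publisher knows this DOI, so its result is preferred.
        System.out.println(findFullText(defaultFinders(), "10.1000/xyz"));
        // The publisher has no hit here, so the lookup falls through to the next finder.
        System.out.println(findFullText(defaultFinders(), "10.2000/abc"));
    }
}
```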
This PR enables automatic PDF fulltext downloads.
Current catalogs:
Questions:
Tools menu. Imho it should be either auto-downloaded or included with the download or auto button in the detailed entry view.
TODO:
403 Forbidden (Bot detection)