AI chatting functionality (#11430)
* Fix the code from code review

* Fix from code review and create new AiChatTabWorking

* Improve chat history storage code

* More fix from code review

* Remove obsolete parameter

* Add JavaDoc comment

* Fix checkstyle

* Fix JavaDoc

* Fix more checkstyle

* More checkstyle fixes

* Fix code changes

* Improve the PR

* Rework ADR-0031 to enable to use another option

* Add many LOGGER.trace statements

* Change "message window" to "context window"

* Fix compiler errors

* Fix issue list index issue of langchain4j

* Fix lint issue

* Update 0031-store-chats-alongside-database.md

* More tracing

* Refine logging

* Remove closing of AiChatLanguageModel (because it's not closable)

* Use external package for OpenAI API connection

* Provide a custom executor for RetrievalAugmentor

* Fix shutdown issue (I hope)

* Refactor classes

* Change BibDatabaseChatHistoryFile

* Revert BibDatabaseChatHistoryFile to old version because of langchain4j

* Make round corners for chat messages

* Refactor embeddings generation

* Refactor embeddings generation

* Refactor embeddings generation

* Fix CHANGELOG.md

* Remove jpro-mdfx

* Add comment

* Fix localizations

* Fix checkstyle and remove OpenAI from PRIVACY.md

* Remove unnecessary comments

* Fix privacy notice UI

* Introduce new ApiKeyMissingComponent

* Thanks Tobiaz Diez for writing such a good EntryEditorTab class

* Fix InAnYan/jabref issues

* Merge `build.gradle` and `settings.gradle` from main branch

* Update ADRs

* Implement rethought ADR for chat history

* Use OpenAI embedding model

* Use Deep Java embedding model

* Remove old langchain4j embedding models

* Fix checkstyle errors

* Fix checkstyle and remove old dependencies

* Fixes from code review

* Restructure

* Fix checkstyle errors

* Add API base URL parameter

* Fix localization

* Fix from code review + ADR

* Something broken

* Now MistralAI and Hugging Face work

* Fix base URL for other LLM providers

* Fix base URL for other LLM providers

* Refactor MVStore usage

* Load embedding model in background

* Bump langchain4j version

* Fix bug

* Fix checkstyle and localization

* Implement summarization

* Fix checkstyle and localization

* Improve PrivacyNoticeComponent

* Fix from code review

* Update localization

* Wrap text

* Add padding

* Fix markdown

* Use stuff algorithm

* Add GPT-4o-mini

* Make chat model editable

* Update context window size and summarization

* Fix checkstyle

* Update PrivacyNoticeComponent.fxml

* Update AI summary tab

* Fix localization

* Change order so that there is no diff

* Reorder dependencies

* Add missing CHANGELOG.md entry

* Refine ADR-0033

* Refine ADR0034

* Fix typos

* Refine ADR-0036

* Fix ADR-0037

* Fix title case

* Fix changes in module-info.java

* Readd removed requires org.apache.httpcomponents.core5.httpcore5

* Revert change in JabRefGUI to avoid conflicts

* Remove empty lines

* Reorder entries in JabRef_en.properties

* Simplify SummariesStorage (and add test)

* Use region/endregion

* Fix position of comment

* Add comment why the event bus is needed

* Do not show exception to the user - just that an error is occurred (saves %0 in localization)

* Use "URL %0" without colon (consistency)

* Fix typos

* History has to be kept

* Remove empty lines

* Fix language (hopefully)

* Compilefix

* Simplify BibDatabaseChatHistoryManager

* Fix from code review

* Fix issue #103

* Rework embeddings cache clearing

* Fix #99 and partially #101

* Partially fixing shutdown issues and UI progress monitor issue

* Add "requires scala.library" and add "region:" / "endregion"

* More grouping (move de.saxsys.mvvmfx.validation up)

* Add alphabetical hint

* Fix InAnYan#101 and InAnYan#106

* Discard changes to settings.gradle

* Fix InAnYan#105

* Follow-up fix for InAnYan#103

* Follow-up fix for InAnYan#103

* Remove obsolete class

* Partially fix InAnYan#98

* We do need dependencies to the AI providers, don't we?

* Fix InAnYan#93

* Simplify code

* Partially fix InAnYan#92

* Fix checkstyle and localization

* Fix hyperlinks and text in ApiKeyMissingComponent

* Fixes from code review

* Fix InAnYan#120

* Remove "X% work done" messages

* Fix InAnYan#114

* Partially fix InAnYan#113

* Partially fix InAnYan#110

* Fix InAnYan#110

* Fix InAnYan#111

* Improve embedding model downloading notifications

* Fix InAnYan#124

* Fix InAnYan#122

* Fix wrong context window size when expert settings customization is turned off

* Attempt to fix InAnYan#95

* Finally fix InAnYan#105

* Fix InAnYan#108

* Attempt to fix InAnYan#98

* Fix for InAnYan#104

* Fix for InAnYan#98

* Fix for InAnYan#95 (comment)

* Fix for InAnYan#98 (comment)

* Fix for InAnYan#126

* Fix for InAnYan#115

* Fix for InAnYan#113

* Fix for InAnYan#91

* Fix for InAnYan#121

* Fix for InAnYan#112 and InAnYan#116

* Fix for InAnYan#125

* Fixes from commit comments

* Fix for InAnYan#115

* Fix for InAnYan#120

* Fix for InAnYan#132

* Fix for InAnYan#132

* Fix for InAnYan#104

* Fix for InAnYan#118

* Fix for InAnYan#114

* Fix for InAnYan#104

* Store error messages in chat history

* Make error be a ChatMessageComponent

* Implement delete messages InAnYan#136

* Fix for InAnYan#118

* Fix for InAnYan#92

* Fix checkstyle and localization. And refactoring

* Fix for InAnYan#92

* Fix for InAnYan#139

* Show "Delete message" button only when necessary

* Fix for InAnYan#83

* Update src/main/java/org/jabref/logic/ai/AiService.java

Co-authored-by: Oliver Kopp <kopp.dev@gmail.com>

* Update src/main/java/org/jabref/logic/ai/chathistory/BibDatabaseChatHistoryManager.java

Co-authored-by: Oliver Kopp <kopp.dev@gmail.com>

* Update src/main/java/org/jabref/logic/ai/AiService.java

Co-authored-by: Oliver Kopp <kopp.dev@gmail.com>

* Update src/main/java/org/jabref/gui/Base.css

Co-authored-by: Oliver Kopp <kopp.dev@gmail.com>

* Update src/main/java/org/jabref/gui/Base.css

Co-authored-by: Oliver Kopp <kopp.dev@gmail.com>

* Fix from code review

* Partial fix for InAnYan#125

* Update colors for error message

* Fix for InAnYan#145 and InAnYan#142

* Make progress for embedding model download

* Fix checkstyle and localization

* Add workaround to get FileHistoryMenuTest running again

* Small fixes

* Revert "Small fixes"

This reverts commit 85382a1.

* Introduce AiApiKeyProvider

* Fix IDE setup instructions

* Do not load API keys on startup

* Rely on keystore encryption

* Prevent multiple rebuilds when multiple preferences are updated

* Fix localization to be more provider independent

* Fix method names

* Add poor man's solution to notify of API key changes

* Reduce calls to key store (and fix key saving)

* Fix for InAnYan#148 and partially InAnYan#146

* Revert "Fix for InAnYan#148 and partially InAnYan#146"

This reverts commit 5fa3bb5.

* Fix for scrolling down when deleting a message

* Sort EmbeddingModel enum variants

* Fix GenerateSummaryTask progress indication

* Fix dark mode

* Add notice for embedding models size

---------

Co-authored-by: Oliver Kopp <kopp.dev@gmail.com>
InAnYan and koppor committed Aug 14, 2024
1 parent 1374813 commit 8a0edc2
Showing 93 changed files with 6,340 additions and 26 deletions.
16 changes: 8 additions & 8 deletions .github/workflows/deployment-jdk-ea.yml
@@ -35,7 +35,7 @@ jobs:
strategy:
fail-fast: false
matrix:
os: [ubuntu-latest, windows-latest, macos-latest, buildjet-4vcpu-ubuntu-2204-arm]
os: [ubuntu-latest, windows-latest, macos-latest, buildjet-8vcpu-ubuntu-2204-arm]
jdk: [22]
javafx: [23]
include:
@@ -45,7 +45,7 @@ jobs:
- os: windows-latest
displayName: windows
archivePortable: 7z a -r build/distribution/JabRef-portable_windows.zip ./build/distribution/JabRef && rm -R build/distribution/JabRef
- os: buildjet-4vcpu-ubuntu-2204-arm
- os: buildjet-8vcpu-ubuntu-2204-arm
displayName: "linux-arm"
archivePortable: "tar -c -C build/distribution JabRef | pigz --rsyncable > build/distribution/JabRef-portable_linux-arm64.tar.gz && rm -R build/distribution/JabRef"
- os: macos-latest
@@ -78,7 +78,7 @@ jobs:
submodules: 'true'
show-progress: 'false'
- name: Install pigz and cache (linux)
if: (matrix.os == 'ubuntu-latest') || (matrix.os == 'buildjet-4vcpu-ubuntu-2204-arm')
if: (matrix.os == 'ubuntu-latest') || (matrix.os == 'buildjet-8vcpu-ubuntu-2204-arm')
uses: awalsh128/cache-apt-pkgs-action@master
with:
packages: pigz
@@ -115,7 +115,7 @@ jobs:
# JavaFX
- name: Download and extract JavaFX ${{ matrix.javafx }}
if: (matrix.os != 'buildjet-4vcpu-ubuntu-2204-arm')
if: (matrix.os != 'buildjet-8vcpu-ubuntu-2204-arm')
shell: bash
run: |
cd javafx
@@ -127,7 +127,7 @@ jobs:
EXTRACT="tar xzf *.tar.gz"
EXT="tar.gz"
;;
"buildjet-4vcpu-ubuntu-2204-arm")
"buildjet-8vcpu-ubuntu-2204-arm")
OS="linux"
EXTRACT="tar xzf *.tar.gz"
EXT="tar.gz"
@@ -163,19 +163,19 @@ jobs:
$EXTRACT
rm *.$EXT
- name: 'Set JavaFX ${{ matrix.javafx }} (linux, Windows)'
if: (matrix.os != 'macos-latest') && (matrix.os != 'buildjet-4vcpu-ubuntu-2204-arm')
if: (matrix.os != 'macos-latest') && (matrix.os != 'buildjet-8vcpu-ubuntu-2204-arm')
run: |
sed -i '/javafx {/{n;s#version = ".*"#sdk = "javafx/javafx-sdk-${{ matrix.javafx }}"#}' build.gradle
sed -i "s#jlink {#jlink { addExtraModulePath 'javafx/javafx-jmods-${{ matrix.javafx }}'#" build.gradle
cat build.gradle
- name: 'Set JavaFX ${{ matrix.javafx }} (macOS)'
if: (matrix.os == 'macos-latest') && (matrix.os != 'buildjet-4vcpu-ubuntu-2204-arm')
if: (matrix.os == 'macos-latest') && (matrix.os != 'buildjet-8vcpu-ubuntu-2204-arm')
run: |
sed -i '.bak' -e '/javafx {/{n' -e 's#version = ".*"#sdk = "javafx/javafx-sdk-${{ matrix.javafx }}"#;}' build.gradle
sed -i '.bak' -e "s#jlink {#jlink { addExtraModulePath 'javafx/javafx-jmods-${{ matrix.javafx }}'#" build.gradle
cat build.gradle
- name: 'Set JavaFX ${{ matrix.javafx }} (linux-arm)'
if: (matrix.os == 'buildjet-4vcpu-ubuntu-2204-arm')
if: (matrix.os == 'buildjet-8vcpu-ubuntu-2204-arm')
# No JavaFX EA build for ARM at https://jdk.java.net/javafx23/, therefore using Maven Central artifact
run: |
curl -s "https://search.maven.org/solrsearch/select?q=g:org.openjfx+AND+a:javafx&rows=10&core=gav" > /tmp/versions.json
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -11,6 +11,8 @@ Note that this project **does not** adhere to [Semantic Versioning](https://semv

### Added

- We added an AI-based chat for entries with linked PDF files. [#11430](https://github.com/JabRef/jabref/pull/11430)
- We added an AI-based summarization possibility for entries with linked PDF files. [#11430](https://github.com/JabRef/jabref/pull/11430)
- We added support for selecting and using CSL Styles in JabRef's OpenOffice/LibreOffice integration for inserting bibliographic and in-text citations into a document. [#2146](https://github.com/JabRef/jabref/issues/2146), [#8893](https://github.com/JabRef/jabref/issues/8893)
- We added Tools > New library based on references in PDF file... to create a new library based on the references section in a PDF file. [#11522](https://github.com/JabRef/jabref/pull/11522)
- When converting the references section of a paper (PDF file), more than the last page is treated. [#11522](https://github.com/JabRef/jabref/pull/11522)
31 changes: 29 additions & 2 deletions build.gradle
@@ -45,6 +45,7 @@ version = project.findProperty('projVersion') ?: '100.0.0'
java {
sourceCompatibility = JavaVersion.VERSION_21
targetCompatibility = JavaVersion.VERSION_21

// Workaround needed for Eclipse, probably because of https://github.com/gradle/gradle/issues/16922
// Should be removed as soon as Gradle 7.0.1 is released ( https://github.com/gradle/gradle/issues/16922#issuecomment-828217060 )
modularity.inferModulePath.set(false)
@@ -125,6 +126,8 @@ repositories {
maven { url 'https://s01.oss.sonatype.org/content/repositories/snapshots/' }
maven { url 'https://jitpack.io' }
maven { url 'https://oss.sonatype.org/content/groups/public' }

// Required for one.jpro.jproutils:tree-showing
maven { url 'https://sandec.jfrog.io/artifactory/repo' }
}

@@ -238,12 +241,17 @@ dependencies {
exclude module: 'commons-lang3'
exclude group: 'org.apache.commons.validator'
exclude group: 'org.apache.commons.commons-logging'
exclude module: 'kotlin-stdlib-jdk8'
exclude group: 'com.squareup.retrofit2'
exclude group: 'org.openjfx'
exclude group: 'org.apache.logging.log4j'
exclude group: 'tech.units'
}
// Required by gemsfx
implementation 'tech.units:indriya:2.2'
implementation ('com.squareup.retrofit2:retrofit:2.11.0') {
exclude group: 'com.squareup.okhttp3'
}

implementation 'org.controlsfx:controlsfx:11.2.1'

@@ -315,6 +323,25 @@ dependencies {
// YAML formatting
implementation 'org.yaml:snakeyaml:2.2'

// AI
implementation 'dev.langchain4j:langchain4j:0.33.0'
// Even though we use jvm-openai for LLM connection, we still need this package for tokenization.
implementation('dev.langchain4j:langchain4j-open-ai:0.33.0') {
exclude group: 'org.jetbrains.kotlin', module: 'kotlin-stdlib-jdk8'
}
implementation('dev.langchain4j:langchain4j-mistral-ai:0.33.0')
implementation('dev.langchain4j:langchain4j-hugging-face:0.33.0')
implementation 'ai.djl:api:0.29.0'
implementation 'ai.djl.pytorch:pytorch-model-zoo:0.29.0'
implementation 'ai.djl.huggingface:tokenizers:0.29.0'
implementation 'io.github.stefanbratanov:jvm-openai:0.9.3'
// openai depends on okhttp, which needs kotlin - see https://github.com/square/okhttp/issues/5299 for details
implementation ('com.squareup.okhttp3:okhttp:4.12.0') {
exclude group: 'org.jetbrains.kotlin', module: 'kotlin-stdlib-jdk8'
}
// GemxFX also (transitively) depends on kotlin
implementation 'org.jetbrains.kotlin:kotlin-stdlib-jdk8:1.9.24'

implementation 'commons-io:commons-io:2.16.1'

testImplementation 'io.github.classgraph:classgraph:4.8.174'
@@ -455,8 +482,8 @@ compileJava {
options.generatedSourceOutputDirectory.set(file("src-gen/main/java"))

moduleOptions {
// TODO: Remove access to internal api
addExports = [
// TODO: Remove access to internal api
'javafx.controls/com.sun.javafx.scene.control' : 'org.jabref',
'org.controlsfx.controls/impl.org.controlsfx.skin' : 'org.jabref'
]
@@ -470,10 +497,10 @@ run {
application.applicationDefaultJvmArgs = []
}

// TODO: Remove access to internal api
moduleOptions {
// On a change here, also adapt "application > applicationDefaultJvmArgs"
addExports = [
// TODO: Remove access to internal api
'javafx.base/com.sun.javafx.event' : 'org.jabref.merged.module',
'javafx.controls/com.sun.javafx.scene.control' : 'org.jabref',

66 changes: 66 additions & 0 deletions docs/decisions/0032-store-chats-in-local-user-folder.md
@@ -0,0 +1,66 @@
---
nav_order: 0032
parent: Decision Records
---
# Store Chats in Local User Folder

## Context and Problem Statement

Chats with AI should be stored somewhere. But where and how?

## Considered Options

* Inside `.bib` file
* In local user folder
* Alongside `.bib` file

## Decision Drivers

* Should work when shared with OneDrive, Dropbox or similar asynchronous services
* Should work on network drives
* Should be "easy" for users to follow
* Should be the same in a shared and non-shared setting (e.g., whether Dropbox is used or not should not make a difference)

## Decision Outcome

Chosen option: "In local user folder", because
a chats file stored in a shared location is very hard to handle: if two users work
simultaneously on one library, the content of the AI chats file becomes arbitrary
and unmergeable.

## Pros and Cons of the Options

### Inside `.bib` file

* Good, because we already have the machinery for managing the fields and other information of BIB entries
* Good, because chats are stored inside one file, and if the `.bib` file is moved, the chat history is preserved
* Bad, because there may be lots of chats and messages, so the `.bib` file becomes cluttered and large, which slows down its processing
* Bad, because a user who shares a `.bib` file also shares the chat messages; since chats are not perfect, the user may not want to share them

### In local user folder

One can use `%APPDATA%`, where JabRef stores the Lucene index and other information.
See `org.jabref.gui.desktop.os.NativeDesktop#getFulltextIndexBaseDirectory` for use in JabRef and
<https://github.com/harawata/appdirs> for general information.

Concrete example for backup folder: `C:\Users\${username}\AppData\Local\org.jabref\jabref\backups`.
Example filename: `4a070cf3--Chocolate.bib--2024-03-25--14.20.12.bak`.

* Good, because `.bib` file is kept clean
* Good, because chat messages are saved locally
* Neutral, because may be a little harder to implement
* Bad, because chat messages cannot be easily shared
* Bad, because when path of a `.bib` file is changed, the chats are lost

### Alongside `.bib` file

* Good, because simple implementation
* Good, because the user can send the chats file alongside the `.bib` file if they want to share the chats. If users do not want to share the messages, they can omit the chats file
* Good, because the `.bib` file is kept clean
* Bad, because the user may not expect that a new file will be created alongside their `.bib` (or other LaTeX-related) files
* Bad, because it may be inconvenient to share both files (`.bib` file and chats file) in order to share the chat history
* Bad, because if `.bib` files are edited externally (meaning, outside JabRef), the chats file will not be updated correspondingly
* Bad, because if the user moves the `.bib` file, they have to move the chats file too
* Bad, because if two persons work in parallel using a OneDrive share, the file is overwritten or a conflict file is generated ([Dropbox "conflicted copy"](https://help.dropbox.com/en-en/organize/conflicted-copy))
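
As an illustration of the "In local user folder" option, the following is a minimal sketch of how a chats file could be derived from the path of a `.bib` file. The class `ChatStorageLocation`, the method `chatsFileFor`, the `.chats` suffix, and the `ai-chats` folder are hypothetical names chosen for this example; they are not taken from JabRef. The hash prefix only mimics the shape of the backup filename shown above. The sketch also makes the last drawback visible: the file name depends on the library's absolute path, so moving the `.bib` file yields a different chats file.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.zip.CRC32;

// Hypothetical helper for illustration only, not JabRef's actual implementation.
final class ChatStorageLocation {

    private ChatStorageLocation() {
    }

    /**
     * Maps a library file to a chats file inside a local user folder.
     * The 8-digit hex prefix is a CRC32 checksum of the library's absolute path,
     * which keeps libraries with the same file name in different folders apart.
     */
    static Path chatsFileFor(Path localUserFolder, Path bibFile) {
        Path absolute = bibFile.toAbsolutePath().normalize();
        CRC32 crc = new CRC32();
        crc.update(absolute.toString().getBytes(StandardCharsets.UTF_8));
        String prefix = String.format("%08x", crc.getValue());
        // ".chats" is an assumed suffix for this sketch.
        return localUserFolder.resolve(prefix + "--" + absolute.getFileName() + ".chats");
    }

    public static void main(String[] args) {
        Path folder = Path.of("ai-chats"); // assumed base folder
        System.out.println(chatsFileFor(folder, Path.of("papers", "Chocolate.bib")));
        System.out.println(chatsFileFor(folder, Path.of("archive", "Chocolate.bib")));
    }
}
```

In a real application, the base folder would presumably come from the same place that provides the full-text index directory mentioned above.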
53 changes: 53 additions & 0 deletions docs/decisions/0033-store-chats-in-mvstore.md
@@ -0,0 +1,53 @@
---
nav_order: 0033
parent: Decision Records
---

# Store Chats in MVStore

## Context and Problem Statement

This is a follow-up to [ADR-0032](0032-store-chats-in-local-user-folder.md).

The chats with AI should be saved on exit from JabRef and retrieved on launch. We need to decide the format of
the serialized messages.

## Decision Drivers

* Easy to implement and maintain
* Memory-efficient (because JabRef is said to consume much memory)

## Considered Options

* JSON
* MVStore
* Custom format

## Decision Outcome

Chosen option: "MVStore", because it is simple and memory-efficient.

## Pros and Cons of the Options

### JSON

* Good, because allows for easy storing and loading of chats
* Good, because cross-platform
* Good, because widely used and accepted, so there are lots of libraries for JSON format
* Good, because it is even possible to reuse the chats file for other purposes
* Good, because has potential for being mergeable by external tooling
* Bad, because too verbose (meaning the file size could be much smaller)

### MVStore

* Good, because automatic loading and saving to disk
* Good, because memory-efficient
* Bad, because it does not support mutable values in maps
* Bad, because the order of messages needs to be "hand-crafted" (e.g., by mapping from an Integer to the concrete message), since [MVStore does not support storing lists that are updated](https://github.com/koppor/mvstore-mwe/pull/1)
* Bad, because it stores data as key-value pairs, but not as a custom data type (like tables in an RDBMS)

### Custom format

* Good, because we have full control
* Bad, because it involves writing our own language and parser
* Bad, because we need to implement optimizations found in databases on our own (keeping some data in RAM, the rest on disk)
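
The "hand-crafted" message order mentioned for MVStore can be sketched as follows. This is a minimal illustration, not JabRef's code: the class `OrderedChatHistory` is hypothetical, and a plain `java.util.Map` stands in for the persistent map so that the example runs without the MVStore dependency. Messages are keyed by their position, and an edit is done by putting a new value instead of mutating a stored one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a java.util.Map stands in for the persistent key-value map.
final class OrderedChatHistory {

    private final Map<Integer, String> messagesByIndex;

    OrderedChatHistory(Map<Integer, String> messagesByIndex) {
        this.messagesByIndex = messagesByIndex;
    }

    /** Appends a message under the next free index (0, 1, 2, ...). */
    void add(String message) {
        messagesByIndex.put(messagesByIndex.size(), message);
    }

    /** Replaces a message by putting a new value; stored values are never mutated in place. */
    void replace(int index, String newMessage) {
        if (!messagesByIndex.containsKey(index)) {
            throw new IndexOutOfBoundsException("No message at index " + index);
        }
        messagesByIndex.put(index, newMessage);
    }

    /** Rebuilds the ordered list by reading the keys 0..size-1. */
    List<String> inOrder() {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < messagesByIndex.size(); i++) {
            result.add(messagesByIndex.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        OrderedChatHistory history = new OrderedChatHistory(new HashMap<>());
        history.add("user: What is this paper about?");
        history.add("ai: It is about chocolate.");
        System.out.println(history.inOrder());
    }
}
```

Because the class depends only on the `Map` interface, a persistent map could presumably be passed in its place. Deleting a message would additionally require re-indexing the later entries, which this sketch leaves out.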
69 changes: 69 additions & 0 deletions docs/decisions/0034-use-citation-key-for-grouping-chat-messages.md
@@ -0,0 +1,69 @@
---
nav_order: 0034
parent: Decision Records
---

# Use Citation Key for Grouping Chat Messages

## Context and Problem Statement

Because we do not store chat messages inside a BIB entry in the `.bib` file, the chats file is represented as a map from a
BIB entry to a list of messages. We need to specify the key of this map. It turns out that this is not that easy.

## Decision Drivers

* The key should exist for every BIB entry
* The key should be unique among the BIB entries in one library file
* The key should not change at run-time, between launches of JabRef, and should be cross-platform (most important)

## Considered Options

* `BibEntry` Java object
* `BibEntry`'s `id`
* `BibEntry`'s Citation key
* `BibEntry`'s `ShareId`

## Decision Outcome

Chosen option: "`BibEntry`'s Citation key", because this is the only choice that complies with the third point in Decision Drivers.

### Positive Consequences

* Easy to implement
* Cross-platform

### Negative Consequences

* If the citation key is changed externally, the chats file becomes out-of-sync
* Additional user interaction is needed to make the citation key comply with the first and second points of Decision Drivers

## Pros and Cons of the Options

### `BibEntry` Java object

Very bad, because it works only at run-time and is not stable.

### `BibEntry`'s `id`

JabRef stores a unique identifier for each `BibEntry`.
This identifier is created on each load of a library (and not stored permanently).

Very bad, for the same reasons as `BibEntry` Java object.

### `BibEntry`'s Citation key

* Good, because it is cross-platform and stable (meaning it stays the same across launches of JabRef)
* Bad, because it is not guaranteed that a citation key exists on a `BibEntry`, and that it is unique among the other
`BibEntry`s in the library

### `BibEntry`'s `ShareId`

[ADR-0027](0027-synchronization.md) describes the procedure of synchronization of a Bib(La)TeX library with a server.
Thereby, also local and remote entries need to be kept consistent.
The solution chosen there is that the **server** creates a UUID for each entry.

This approach cannot be used here, because there is no server running which we can ask for a UUID of an entry.

## More Information

Refer to [issue #160](https://github.com/JabRef/jabref/issues/160) in the JabRef main repository.
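
The first two decision drivers (the key exists and is unique) are not guaranteed for citation keys, so some check is needed before a chat can be attached to an entry. The following is a minimal sketch of such a check. The class `ChatKeyCheck` and its method are hypothetical and work on plain strings rather than on JabRef's `BibEntry` type.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper for illustration only; it does not use JabRef's model classes.
final class ChatKeyCheck {

    private ChatKeyCheck() {
    }

    /**
     * A citation key can serve as the key of the chats map only if it is
     * present, non-blank, and occurs exactly once in the library.
     */
    static boolean isUsableChatKey(Optional<String> citationKey, List<String> allKeysInLibrary) {
        if (citationKey.isEmpty() || citationKey.get().isBlank()) {
            return false;
        }
        String key = citationKey.get();
        long occurrences = allKeysInLibrary.stream().filter(key::equals).count();
        return occurrences == 1;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("Smith2020", "Doe2021", "Doe2021");
        System.out.println(isUsableChatKey(Optional.of("Smith2020"), keys));
        System.out.println(isUsableChatKey(Optional.of("Doe2021"), keys));
    }
}
```

When the check fails, the user would have to set or change the citation key first, which is the additional user interaction listed under Negative Consequences.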