Summarize email thread #8508

Closed
5 tasks done
Tracked by #70
ChristophWurst opened this issue Jun 1, 2023 · 17 comments · Fixed by #8653
Assignees
Labels
3. to review design enhancement skill:backend Issues and PRs that require backend development skills skill:frontend Issues and PRs that require JavaScript/Vue/styling development skills

Comments

@ChristophWurst
Member

ChristophWurst commented Jun 1, 2023

Is your feature request related to a problem? Please describe.

As a user I receive long email threads and I want to efficiently get the gist of them without reading the details.

Describe the solution you'd like

Show a text box below (long) threads with a few sentences of summary of the messages above.

A reference concept can be seen at https://www.youtube.com/watch?v=6DaJVZBXETE&t=25s.

Implementation

An API will be provided to which we can send text and retrieve a summary. Therefore, the Mail app has to be changed to collect a thread's message texts, have the text summarized and show the results in the UI.

Sync vs async processing

If the summary should show instantly when the thread loads, the processing has to happen in the background and the result has to be persisted with the thread data.
If the summary is generated async, the data can be processed on demand, but depending on the performance of the AI there can be a significant delay.

Opt-out

There will be people who don't want this feature. We should allow them to turn off the feature.

Describe alternatives you've considered

No response

Additional context

No response

@ChristophWurst ChristophWurst added 1. to develop design blocked skill:backend Issues and PRs that require backend development skills skill:frontend Issues and PRs that require JavaScript/Vue/styling development skills and removed 0. to triage labels Jun 1, 2023
@ChristophWurst
Member Author

Backend skills are only required if the summary API is a PHP API. Then we need a small controller to invoke the API. If the API is exposed via OCS, a frontend developer can query results directly.

@ChristophWurst
Member Author

The original idea was to do the processing in an AJAX request on-demand when the thread is loaded by the user. With the insights from nextcloud/server#38578 (comment) we'll have to remodel the architecture. It's not reasonable to process the summary on-demand.

The mail app will continue to synchronize mailboxes in the background like it does now. As the last step of this process, threads are (re)built. We have to add logic to detect when threads are new or changed. Then we have to fire an async text processing task to build the summary and register a listener to process/store the result. The result can go into something like an oc_threads table.

If a thread changes, the previous summary has to be discarded immediately so we don't show an outdated summary to the user if they open the thread earlier than the finished background task.
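
A rough sketch of the detect/discard/store flow described above, just to make the ordering concrete. Everything here is an assumption for illustration: the in-memory `ThreadSummaryStore` stands in for something like the oc_threads table, `schedule_task` stands in for firing the async text processing task, and none of the names come from the Mail app or the server API.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class ThreadSummaryStore:
    """Hypothetical in-memory stand-in for a thread summary table."""
    fingerprints: Dict[str, str] = field(default_factory=dict)
    summaries: Dict[str, Optional[str]] = field(default_factory=dict)


def thread_fingerprint(message_ids: List[str]) -> str:
    """Hash the set of message ids so a new or changed thread is detectable."""
    joined = "\n".join(sorted(message_ids))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def on_thread_rebuilt(
    store: ThreadSummaryStore,
    thread_id: str,
    message_ids: List[str],
    schedule_task: Callable[[str, str], None],
) -> bool:
    """Run after the sync (re)built a thread. Returns True if a task was fired."""
    fingerprint = thread_fingerprint(message_ids)
    if store.fingerprints.get(thread_id) == fingerprint:
        return False  # unchanged thread, keep the existing summary
    store.fingerprints[thread_id] = fingerprint
    # Discard the old summary right away so an outdated one is never shown.
    store.summaries[thread_id] = None
    schedule_task(thread_id, fingerprint)
    return True


def on_summary_ready(
    store: ThreadSummaryStore, thread_id: str, fingerprint: str, summary: str
) -> bool:
    """Listener for a finished task. Drops results for outdated thread states."""
    if store.fingerprints.get(thread_id) != fingerprint:
        return False  # the thread changed again while the task was running
    store.summaries[thread_id] = summary
    return True
```

Passing the fingerprint through the task means a slow result for an old thread state cannot overwrite the cleared summary of a newer state.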

Storing thread summaries sounds expensive, so I'm thinking of limiting the feature to threads of at least x messages, e.g. only start summarizing once there are three or more messages.

@marcoambrosini also raised that if you have an organizational instance with emails sent between a group of people, the same summary might be processed n times. It's to be evaluated if an optimization is possible.
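
One possible shape for such an optimization, purely as a sketch and not evaluated: key the summary on the thread's content rather than on the user, so identical threads in different mailboxes share one result. This assumes the `Message-ID` headers are the same for every recipient and that all users see the same set of messages, which will not always hold (e.g. partial threads). `SharedSummaryCache` and `summarize` are hypothetical names.

```python
import hashlib
from typing import Callable, Dict, Iterable


class SharedSummaryCache:
    """Hypothetical cache that shares summaries between identical threads."""

    def __init__(self, summarize: Callable[[str], str]) -> None:
        # `summarize` stands in for whatever produces the summary text.
        self._summarize = summarize
        self._summaries: Dict[str, str] = {}

    @staticmethod
    def key_for(message_ids: Iterable[str]) -> str:
        """Build an order-independent key from the Message-ID headers."""
        normalized = sorted(mid.strip() for mid in message_ids)
        return hashlib.sha256("\n".join(normalized).encode("utf-8")).hexdigest()

    def get_or_create(self, message_ids: Iterable[str], thread_text: str) -> str:
        """Summarize only once per distinct set of messages."""
        key = self.key_for(message_ids)
        if key not in self._summaries:
            self._summaries[key] = self._summarize(thread_text)
        return self._summaries[key]
```

Whether sharing results across users is acceptable from a privacy point of view would need its own evaluation.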

@nimishavijay
Member

nimishavijay commented Jul 13, 2023

After discussion with the design team, here is what it could look like:

image

  • A container at the top of the thread with outline in --color-primary-element to differentiate it from the rest of the messages
  • Similar to a normal message, the summary container also has a few actions (which would be considered thread-level actions):
    • Reply all icon+text secondary button
    • Favorite icon only tertiary
    • Mark important icon only tertiary
    • Mark unread icon only tertiary
  • Contents:
    • ✨ Nextcloud Assistant chip for indicating an assistant feature
    • Conversation summary heading
    • Conversation contents summarized and presented as a list for easy scanning
    • Attachments
  • Summarisation could start from 2 messages. My worry with always showing a summary is that people start to ignore it
  • The option to turn it off could be in bottom left settings ("Mail settings") and is reflected for all accounts of that user
  • Nice-to-have: Items in the summary list that have been changed/added since your last visit are shown in bold

@ChristophWurst
Member Author

Attachments

What is the logic behind that? Does it show all attachments in the thread or does the LLM do some sort of selection?

@nimishavijay
Member

What is the logic behind that? Does it show all attachments in the thread or does the LLM do some sort of selection?

Most ideal: the LLM decides which attachments are relevant and shows those (e.g. if a colleague shared an updated invoicing template, that will be shown and not the original).

It would also be nice if all the attachments were shown whenever the associated text is also summarized and shown. For example:

- Alice shared the invoicing template
  [invoice_template.pdf]

- Bob shared the scripts and slides for the release presentation
  [script_hub5.md] [Frank's part .md] [Alice's part .md] [slides.pptx]

- Alice updated the invoice template
  [invoice_template (1).pdf]

Would any of those be in scope?

@ChristophWurst
Member Author

@DaphneMuller @marcelklehr would it? ^

@ChristophWurst
Member Author

ChristophWurst commented Jul 20, 2023

How do we feed messages of a thread into the LLM so the LLM understands who said/shared/sent what? Do we feed attachments too?
E.g. in the example above the result contains information about who said what. If we only concatenate the message text without any sender information it won't be possible to generate such results.
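
To illustrate the difference, a minimal sketch of concatenating messages with sender information instead of bare text. The `ThreadMessage` fields, the `From:`/`Attachments:` labels and the separator are assumptions made up for this example. Only attachment names are included here, not attachment contents.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ThreadMessage:
    """Hypothetical minimal view of one message in a thread."""
    sender: str
    sent_at: str
    body: str
    attachment_names: List[str] = field(default_factory=list)


def build_summary_input(messages: List[ThreadMessage]) -> str:
    """Concatenate messages in order, keeping who sent what."""
    parts = []
    for message in messages:
        header = f"From: {message.sender} ({message.sent_at})"
        if message.attachment_names:
            header += "\nAttachments: " + ", ".join(message.attachment_names)
        parts.append(header + "\n" + message.body.strip())
    # A visible separator keeps message boundaries clear in the plain text.
    return "\n\n---\n\n".join(parts)


thread = [
    ThreadMessage("Alice", "2023-07-18", "Here is the invoicing template.",
                  ["invoice_template.pdf"]),
    ThreadMessage("Bob", "2023-07-19", "Thanks, looks good to me."),
]
print(build_summary_input(thread))
```

Whether the model actually makes use of these headers would have to be tested with real output.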

Just thinking out loud.

@ChristophWurst
Member Author

nextcloud/server#38578 is in.

@nimishavijay the LLM won't be able to provide links or lists, it's only plain text that will be returned. Ref nextcloud/server#38578 (comment). We'll have to expect a simpler summarization for the first iteration.

@ChristophWurst
Member Author

Conversation summary heading

Could it be thread instead of conversation to avoid two terms for the same thing?

@nimishavijay
Member

We'll have to expect a simpler summarization for the first iteration.

No worries, we can make design changes if needed based on the first version :)

Could it be thread instead of conversation to avoid two terms for the same thing?

Works for me!

@ChristophWurst
Member Author

ChristophWurst commented Jul 25, 2023

We will make the feature opt-in for admins so it doesn't put too much load on a system. Additionally we only summarize threads of three or more messages.
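
Put together with the user-level switch discussed earlier, the decision could look roughly like this. The function and parameter names are made up for illustration, and the actual settings storage is not shown.

```python
MIN_MESSAGES_FOR_SUMMARY = 3  # threshold from this thread: three or more messages


def should_summarize(
    admin_enabled: bool,
    user_disabled: bool,
    message_count: int,
    min_messages: int = MIN_MESSAGES_FOR_SUMMARY,
) -> bool:
    """Decide whether a thread qualifies for a summary.

    admin_enabled: the admin opted in for the instance (off by default).
    user_disabled: the user turned the feature off in their Mail settings.
    """
    if not admin_enabled or user_disabled:
        return False
    return message_count >= min_messages
```

Checking the admin switch first means no text processing task is created at all on instances that never opted in.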

@st3iny st3iny self-assigned this Jul 25, 2023
@hamza221
Contributor

hamza221 commented Jul 26, 2023

After talking to @ChristophWurst and @jancborchardt, we agreed that actions are not really usable on the summary. We're replacing them with a button to scroll to the newest message instead.

@ChristophWurst
Member Author

Short update on the text summarization experiments:

The local LLM of https://github.com/nextcloud/llm doesn't provide usable summaries at the moment. Short text stays as-is. Longer text processes for 20 minutes and runs into the process timeout.

We could look into OpenAI/ChatGPT instead for a proof of concept. The app doesn't support the text processing APIs yet: nextcloud/integration_openai#35.

@DaphneMuller

DaphneMuller commented Jul 27, 2023

We will probably add support for this in the OpenAI integration, but it has a lower priority than getting Marcel's on-premise AI to work.

@marcelklehr
Member

The local LLM of nextcloud/llm doesn't provide usable summaries at the moment. Short text stays as-is. Longer text processes for 20 minutes and runs into the process timeout.

Timeout depends on the machine it runs on I'd guess. I've changed the model to a more up-to-date, lightweight one that should also give higher-quality output, increased the timeout and improved the summary algorithm. Also see nextcloud/server#39567 for a few fixes and improvements to the textprocessing feature.

@ChristophWurst
Member Author

MVP is in. Reopening for the follow-ups.

@ChristophWurst
Member Author

Planned follow-ups seem to be done.


6 participants