Invalidating Previous Nodes #139

Open
fredngg opened this issue Sep 21, 2024 · 3 comments

@fredngg

fredngg commented Sep 21, 2024

My friend and I love the idea behind graphiti and were testing how to invalidate facts using the episode functions, but each episode just adds a new fact. A new episode may be invalidated from the start if it adds a historical fact, but graphiti doesn't seem to look back at old facts for the same entity and invalidate them. Is this intended? Do we need to write our own logic to check old facts for the same entity and invalidate them? Otherwise, search doesn't return the most fitting response at the top.

Love to get your thoughts. Thanks!

Question - What is Nicholas drinking now
Correct answer - Coffee*

Nicholas is drinking green tea.
r.invalid_at=None
r.valid_at=datetime.datetime(2023, 9, 21, 10, 0, tzinfo=)

Nicholas is not drinking green tea at the moment.
r.invalid_at=None
r.valid_at=datetime.datetime(2024, 9, 21, 5, 32, 30, tzinfo=)

Nicholas started drinking coffee.
r.invalid_at=None
r.valid_at=datetime.datetime(2023, 9, 23, 0, 0, tzinfo=)

Nicholas stopped drinking green tea.
r.invalid_at=None
r.valid_at=datetime.datetime(2023, 9, 22, 0, 0, tzinfo=)

@prasmussen15
Collaborator

Hey, thanks for the interest and questions!

First off, for the time being, make sure to add a group_id to ingested episodes (they can all be the same, like group_id='1'). We had a bug where deduplication wasn't occurring between edges with nil group ids. I thought we had resolved the issue, but it looks like it is still happening for nodes. We will have a bug fix for this early next week, though.
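As a rough illustration of the group_id advice, here is a minimal sketch of a helper that always attaches a shared group_id when preparing an episode. The helper name `episode_kwargs` and the `source_description` value are made up for this example, and the keyword names are an assumption about what `add_episode` accepts, so check them against your installed version.

```python
from datetime import datetime, timezone

# Assumption: one shared group for every test episode, as suggested above.
SHARED_GROUP_ID = "1"


def episode_kwargs(name: str, body: str, reference_time: datetime,
                   group_id: str = SHARED_GROUP_ID) -> dict:
    """Build keyword arguments for an episode, refusing an empty group_id."""
    if not group_id:
        raise ValueError("group_id must be a non-empty string")
    return {
        "name": name,
        "episode_body": body,
        "source_description": "manual test episode",  # hypothetical label
        "reference_time": reference_time,
        "group_id": group_id,
    }


kwargs = episode_kwargs(
    "drink-1",
    "Nicholas is drinking green tea.",
    datetime(2023, 9, 21, 10, 0, tzinfo=timezone.utc),  # UTC is assumed here
)
# With a connected client this would be passed along roughly as:
#     await graphiti.add_episode(**kwargs)
```

Routing every ingest through one helper makes it hard to accidentally add an episode with a nil group id.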

I think your question can be broken into two parts: (1) how do we invalidate facts? and (2) how do invalidated facts affect search?

For (1), we basically do a search on existing facts that serve as potential candidates for invalidation based on the new fact being added. If the LLM determines that the facts are in conflict with each other, then they will be invalidated based on timestamp-dependent logic. I ran through your examples and it looks like the LLM is determining that these statements aren't in contradiction with each other as they are temporally sequential. The invalidation prompt is something that we will improve over time with prompt engineering, and I could see us having some amount of custom invalidation logic in the future as use cases for that can be very different across domains.
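The flow described for (1) can be sketched in plain Python. This is not graphiti's implementation: the `Fact` class, the `apply_invalidation` function, and the `contradicts` callback (a stand-in for the LLM's conflict judgment) are all hypothetical, and the timestamp rule shown is only one plausible reading of "timestamp-dependent logic".

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Iterable, Optional


@dataclass
class Fact:
    text: str
    valid_at: datetime
    invalid_at: Optional[datetime] = None


def apply_invalidation(new_fact: Fact, candidates: Iterable[Fact],
                       contradicts: Callable[[Fact, Fact], bool]) -> None:
    """Close the validity window of whichever conflicting fact is older."""
    for old in candidates:
        if not contradicts(old, new_fact):
            continue  # judged compatible, e.g. temporally sequential statements
        if old.valid_at <= new_fact.valid_at:
            # The existing fact stops being valid when the new one starts.
            if old.invalid_at is None or new_fact.valid_at < old.invalid_at:
                old.invalid_at = new_fact.valid_at
        else:
            # The new fact is historical: a later fact already superseded it.
            if new_fact.invalid_at is None or old.valid_at < new_fact.invalid_at:
                new_fact.invalid_at = old.valid_at


utc = timezone.utc  # assumption: the issue's timestamps are treated as UTC
green_tea = Fact("Nicholas is drinking green tea.",
                 datetime(2023, 9, 21, 10, 0, tzinfo=utc))
coffee = Fact("Nicholas started drinking coffee.",
              datetime(2023, 9, 23, 0, 0, tzinfo=utc))

# Stand-in for the LLM: here it always reports a conflict.
apply_invalidation(coffee, [green_tea], contradicts=lambda old, new: True)
```

If the callback returns False, as the LLM apparently did for the examples in this issue, no `invalid_at` is set and every fact stays open-ended.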

For the time being, our perspective is that extracting the correct facts with the correct timestamps is the more important part to be consistent, as this allows us to store the information in a non-lossy way in the graph. This means that when the information is retrieved and passed to an LLM, it will be able to understand the timeline and answer questions accordingly.

For (2), we currently don't have a way to filter the search on things like the timestamps or other properties. This is something we have discussed internally and will be doing, but we want to make sure that we build the filter field in a safe, flexible, and extensible way. As such, we aren't actively working on filtering at the moment, as our focus is on other high-priority tasks.
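Until search supports filtering, one workaround is to filter and order results on the client side. The sketch below is not part of graphiti: `valid_facts_at` is a hypothetical helper working on plain `(fact, valid_at, invalid_at)` tuples, and the timestamps from the issue are assumed to be UTC.

```python
from datetime import datetime, timezone
from typing import Iterable, List, Optional, Tuple

# (fact text, valid_at, invalid_at)
FactRow = Tuple[str, datetime, Optional[datetime]]


def valid_facts_at(rows: Iterable[FactRow], at: datetime) -> List[FactRow]:
    """Keep facts whose validity window contains `at`, newest valid_at first."""
    kept = [r for r in rows if r[1] <= at and (r[2] is None or at < r[2])]
    return sorted(kept, key=lambda r: r[1], reverse=True)


utc = timezone.utc
rows = [
    ("Nicholas is drinking green tea.",
     datetime(2023, 9, 21, 10, 0, tzinfo=utc), None),
    ("Nicholas is not drinking green tea at the moment.",
     datetime(2024, 9, 21, 5, 32, 30, tzinfo=utc), None),
    ("Nicholas started drinking coffee.",
     datetime(2023, 9, 23, 0, 0, tzinfo=utc), None),
    ("Nicholas stopped drinking green tea.",
     datetime(2023, 9, 22, 0, 0, tzinfo=utc), None),
]

# Facts already in effect at noon on 2023-09-22, most recent first.
snapshot = valid_facts_at(rows, datetime(2023, 9, 22, 12, 0, tzinfo=utc))
```

Because every `invalid_at` in the issue's data is `None`, this only removes facts that start after the query time; it cannot hide a fact that should have been invalidated but was not.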

Thanks again for the interest and let me know if you have any other questions!

@prasmussen15 prasmussen15 self-assigned this Sep 22, 2024
@fredngg
Author

fredngg commented Sep 25, 2024

Thanks for the quick response @prasmussen15. With the group id, the deduplication works now. I can see its use as a per-individual id, ensuring that multiple people with the same name are captured as separate nodes.

That said, I'm wondering why a search using the question "where is Fred now" returns responses from nodes that are not about Fred. The nodes currently capture that Fred has left his workplace after winning the lottery.

image

image

@prasmussen15
Collaborator

Hey, sorry for the late response. It's hard to know exactly what's going on from the information presented here. To clarify, group_ids create independent subgraphs, so you should use them for different users whose information you don't want interacting.

group_id is also a parameter in search, so is it possible that the search is only looking at Timothy's subgraph? Alternatively, we offer a reranker based on distance to a node. Is it possible that you passed in Timothy's node to be reranked on?
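One way to check the first possibility is to group whatever the search returned by group_id and see which subgraph the results came from. This is only a diagnostic sketch: `facts_by_group` is a hypothetical helper, it assumes each result exposes `fact` and `group_id` attributes, and the sample results (including the Timothy fact and the group names) are invented stand-ins, not output from the issue.

```python
from collections import defaultdict
from types import SimpleNamespace
from typing import Dict, Iterable, List, Optional


def facts_by_group(results: Iterable) -> Dict[Optional[str], List[str]]:
    """Map each group_id to the fact texts returned for it."""
    grouped: Dict[Optional[str], List[str]] = defaultdict(list)
    for edge in results:
        grouped[getattr(edge, "group_id", None)].append(edge.fact)
    return dict(grouped)


# Made-up stand-ins for search results.
results = [
    SimpleNamespace(fact="Timothy works at the office.", group_id="timothy"),
    SimpleNamespace(fact="Fred left his workplace after winning the lottery.",
                    group_id="fred"),
]

summary = facts_by_group(results)
for group_id, facts in summary.items():
    print(group_id, facts)
```

If all returned facts fall under a single group that isn't Fred's, the search was probably scoped to that group. For the second possibility, check whether a node was passed as the center for distance-based reranking (assumed here to be a search argument; verify the name in your version).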
