Invalidating Previous Nodes #139

fredngg · 2024-09-21T11:23:38Z

My friend and I love the idea behind graphiti and were trying to test how we can invalidate facts using the episode functions, but each episode just adds a new fact. A new episode may be invalidated from the beginning if it adds a historical fact, but it doesn't seem to look back at old facts (for the same entity) to invalidate them. Is this intended? Do we need to write our own logic to check old facts for the same entity and invalidate them? Otherwise, search doesn't return the most fitting response at the top.

Love to get your thoughts. Thanks!

Question - What is Nicholas drinking now
Correct answer - Coffee*

Nicholas is drinking green tea.
r.invalid_at=None
r.valid_at=datetime.datetime(2023, 9, 21, 10, 0, tzinfo=)

Nicholas is not drinking green tea at the moment.
r.invalid_at=None
r.valid_at=datetime.datetime(2024, 9, 21, 5, 32, 30, tzinfo=)

Nicholas started drinking coffee.
r.invalid_at=None
r.valid_at=datetime.datetime(2023, 9, 23, 0, 0, tzinfo=)

Nicholas stopped drinking green tea.
r.invalid_at=None
r.valid_at=datetime.datetime(2023, 9, 22, 0, 0, tzinfo=)

prasmussen15 · 2024-09-22T04:00:33Z

Hey, thanks for the interest and questions!

First off, for the time being, make sure to add a group_id to ingested episodes (they can all be the same, like group_id='1'). We had a bug where deduplication wasn't occurring between edges with nil group ids. I thought we had resolved the issue, but it looks like it is still happening for nodes. We will have a bug fix for this early next week, though.

I think your question can be broken into two parts: (1) how do we invalidate facts? and (2) how do invalidated facts affect search?

For (1), we basically do a search on existing facts that serve as potential candidates for invalidation based on the new fact being added. If the LLM determines that the facts are in conflict with each other, then they will be invalidated based on timestamp-dependent logic. I ran through your examples and it looks like the LLM is determining that these statements aren't in contradiction with each other as they are temporally sequential. The invalidation prompt is something that we will improve over time with prompt engineering, and I could see us having some amount of custom invalidation logic in the future as use cases for that can be very different across domains.
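To make the timestamp-dependent part concrete, here is a minimal sketch of the idea. It is not graphiti's actual implementation: `Fact`, `invalidate_conflicts`, and the `contradicts` callback are hypothetical names, and the callback stands in for the LLM's conflict decision. The timestamps are assumed to be UTC.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, List, Optional


@dataclass
class Fact:
    """Hypothetical stand-in for an edge carrying a fact and its validity window."""
    text: str
    valid_at: datetime
    invalid_at: Optional[datetime] = None


def invalidate_conflicts(
    new_fact: Fact,
    candidates: List[Fact],
    contradicts: Callable[[Fact, Fact], bool],
) -> List[Fact]:
    """Apply timestamp-dependent invalidation and return the older facts that were invalidated."""
    invalidated = []
    for old in candidates:
        # In the real pipeline an LLM makes this call; here it is a plain callback.
        if not contradicts(old, new_fact):
            continue
        if old.valid_at < new_fact.valid_at:
            # The new fact is more recent: the old fact stops being valid when it starts.
            if old.invalid_at is None:
                old.invalid_at = new_fact.valid_at
                invalidated.append(old)
        elif new_fact.valid_at < old.valid_at:
            # The new fact is historical: it is already superseded by the existing fact.
            if new_fact.invalid_at is None or old.valid_at < new_fact.invalid_at:
                new_fact.invalid_at = old.valid_at
    return invalidated


green_tea = Fact("Nicholas is drinking green tea.",
                 datetime(2023, 9, 21, 10, 0, tzinfo=timezone.utc))
stopped = Fact("Nicholas stopped drinking green tea.",
               datetime(2023, 9, 22, tzinfo=timezone.utc))

# Assumption for illustration: the conflict check says these two facts contradict.
invalidated = invalidate_conflicts(stopped, [green_tea], lambda old, new: True)
print(green_tea.invalid_at)
```

If the conflict check returns False, as the LLM did for the examples in this thread, nothing is invalidated and every `invalid_at` stays `None`.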

For the time being, our perspective is that extracting the correct facts with the correct timestamps is the more important part to get consistently right, as this allows us to store the information in a non-lossy way in the graph. This means that when the information is retrieved and passed to an LLM, it will be able to understand the timeline and answer questions accordingly.
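As a rough illustration of what passing the timeline to an LLM could look like on the caller's side, the sketch below sorts the four facts from the question by `valid_at` and formats them as context lines. `build_timeline` is a hypothetical helper, not part of graphiti, and the timestamps are assumed to be UTC.

```python
from datetime import datetime, timezone

# The facts and valid_at values reported earlier in this thread.
facts = [
    ("Nicholas is drinking green tea.",
     datetime(2023, 9, 21, 10, 0, tzinfo=timezone.utc)),
    ("Nicholas is not drinking green tea at the moment.",
     datetime(2024, 9, 21, 5, 32, 30, tzinfo=timezone.utc)),
    ("Nicholas started drinking coffee.",
     datetime(2023, 9, 23, tzinfo=timezone.utc)),
    ("Nicholas stopped drinking green tea.",
     datetime(2023, 9, 22, tzinfo=timezone.utc)),
]


def build_timeline(facts):
    """Return the facts as one line each, ordered from oldest to newest valid_at."""
    ordered = sorted(facts, key=lambda item: item[1])
    return "\n".join(f"{valid_at.isoformat()}: {text}" for text, valid_at in ordered)


timeline = build_timeline(facts)
print(timeline)
```

Read in this order, the facts say green tea first, then stopping, then coffee, which is the sequence an LLM needs in order to answer "What is Nicholas drinking now".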

For (2), we currently don't have a way to filter the search on things like timestamps or other properties. This is something we have discussed internally and will be doing, but we want to make sure that we build the filter field in a safe, flexible, and extensible way. As such, we aren't actively working on filtering at the moment, with our focus on other high-priority tasks.
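Until a filter like that exists, one possible workaround is to post-filter search results on the client side. The sketch below assumes each result exposes `valid_at` and `invalid_at` as in the output earlier in this thread; `Edge` and `valid_as_of` are hypothetical names, not graphiti API, and the example assumes an invalidation has actually been recorded on the green tea fact.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List, Optional


@dataclass
class Edge:
    """Hypothetical stand-in for a search result with a validity window."""
    fact: str
    valid_at: Optional[datetime]
    invalid_at: Optional[datetime] = None


def valid_as_of(edges: Iterable[Edge], as_of: datetime) -> List[Edge]:
    """Keep only edges whose validity window contains as_of, preserving order."""
    kept = []
    for edge in edges:
        started = edge.valid_at is None or edge.valid_at <= as_of
        not_ended = edge.invalid_at is None or as_of < edge.invalid_at
        if started and not_ended:
            kept.append(edge)
    return kept


results = [
    Edge("Nicholas is drinking green tea.",
         datetime(2023, 9, 21, 10, 0, tzinfo=timezone.utc),
         invalid_at=datetime(2023, 9, 22, tzinfo=timezone.utc)),
    Edge("Nicholas started drinking coffee.",
         datetime(2023, 9, 23, tzinfo=timezone.utc)),
]

current = valid_as_of(results, datetime(2024, 9, 21, tzinfo=timezone.utc))
print([edge.fact for edge in current])
```

This only helps when `invalid_at` has been set; with every `invalid_at` left as `None`, the filter keeps all facts that have already started.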

Thanks again for the interest and let me know if you have any other questions!

fredngg · 2024-09-25T17:19:22Z

Thanks for the quick response @prasmussen15. With the group id, the deduplication works now. I can see its use as a per-individual id, so that multiple people with the same name are captured as separate nodes.

That said, I'm wondering why a search using this question - where is Fred now - is returning responses from nodes that are not about Fred. The nodes currently capture that Fred has left his workplace after winning the lottery.

prasmussen15 · 2024-10-03T14:13:33Z

Hey, sorry for the late response to this reply. It's hard to know exactly what's going on with the information presented here. I would clarify that group_ids create independent subgraphs, so you should use them for different users whose information you don't want to interact with each other.

group_id is also a parameter in search, so is it possible that it is only looking at Timothy's subgraph? Alternatively, we offer a reranker based on distance to a node. Is it possible that you have passed in Timothy's node to be reranked on?

prasmussen15 self-assigned this Sep 22, 2024
