
Details about cost / token usage etc #200

Open · heyitsaamir opened this issue Oct 23, 2024 · 2 comments

@heyitsaamir
Hello!

This is an extremely intriguing project. Thank you for building it!

I was curious whether there is any benchmarking on cost and latency when it comes to extraction with this tool. Aside from the problem you mention with GraphRAG ("GraphRAG did not address our core problem: It’s primarily designed for static documents and doesn’t inherently handle temporal aspects of data."), GraphRAG is also slow and expensive, at least when building knowledge graphs for static documents. For incremental data, how does Graphiti do when it comes to building knowledge graphs?

Thanks!

@prasmussen15 prasmussen15 self-assigned this Oct 28, 2024
@prasmussen15 (Collaborator)

Thanks for the insightful question. Cost and latency are very important considerations for us, and they're something we spend a lot of time tracking and evaluating internally. In the future, we also plan to share analyses of these metrics publicly; keep an eye on our blog for updates: https://blog.getzep.com.

Now, to answer your questions: the RAG (and more broadly IR) domain has two fundamental components, data ingestion and data retrieval. The two have different latency requirements, so we discuss our approach to each below.

Ingestion:
During ingestion, Graphiti not only extracts the nodes and edges from the episodes, it also deduplicates that data against information already in the graph, extracts timestamps, invalidates stale edges, and generates summaries for the entity nodes. We have many approaches to speed this up (as described in our blog), but ultimately we are using an LLM to perform these tasks, so it is going to be slow compared to non-LLM-based ingestion techniques. We optimize our prompts for small models, which tend to be faster and cheaper. GPT-4o-mini is the default OpenAI model we use, and it costs $0.15/1M input tokens and $0.60/1M output tokens. That means you could add roughly 2,000 PDF pages to Graphiti for about a dollar.
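
As a rough back-of-the-envelope check on that figure (the tokens-per-page, number of LLM passes, and output ratio below are illustrative assumptions, not measured Graphiti values):

```python
# Back-of-the-envelope ingestion cost at the GPT-4o-mini prices quoted above.
# All per-page and per-pass numbers are illustrative assumptions.
INPUT_PRICE = 0.15 / 1_000_000   # $ per input token
OUTPUT_PRICE = 0.60 / 1_000_000  # $ per output token

pages = 2000
tokens_per_page = 500   # rough figure for a text-heavy PDF page
llm_passes = 3          # e.g. extraction, deduplication, summarization
output_ratio = 0.2      # output tokens as a fraction of input tokens

input_tokens = pages * tokens_per_page * llm_passes
output_tokens = int(input_tokens * output_ratio)

cost = input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE
print(f"~${cost:.2f} for {pages} pages")  # ~$0.81 under these assumptions
```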

Another note on this: in contrast to many other GraphRAG approaches, we don't feed the entire graph back into the LLM when extracting more data, so once a certain saturation is reached, the time it takes to add new data to the graph won't grow indefinitely with graph size.
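
For reference, incremental ingestion looks like this in code. This is a minimal sketch based on the add_episode entry point in the Graphiti README; the connection details are placeholders and exact signatures may vary by version:

```python
# Minimal incremental-ingestion sketch; signatures per the Graphiti README
# and may differ by version. Connection details are placeholders.
import asyncio
from datetime import datetime, timezone

from graphiti_core import Graphiti
from graphiti_core.nodes import EpisodeType

async def main():
    graphiti = Graphiti("bolt://localhost:7687", "neo4j", "password")
    # Each episode is reconciled against the existing graph (dedup, edge
    # invalidation) rather than re-sending the whole graph to the LLM,
    # so per-episode cost stays roughly flat as the graph grows.
    await graphiti.add_episode(
        name="support_ticket_42",
        episode_body="Customer reports latency spikes after the v2 rollout.",
        source=EpisodeType.text,
        source_description="support ticket",
        reference_time=datetime.now(timezone.utc),
    )
    await graphiti.close()

asyncio.run(main())
```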

Now, Graphiti also offers the option to build communities, which summarize closely connected entities. This process is very similar to the one in Microsoft's GraphRAG paper and has similar associated cost and latency. However, unlike Microsoft's GraphRAG, Graphiti is fully functional and very powerful even without communities, so if this is a barrier we recommend trying Graphiti on some test cases without them.
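
If you do want communities, they are an explicit, separate step rather than part of every ingestion call; something like the following, inside the async context above (the build_communities method name is taken from the Graphiti README and may differ by version):

```python
# Optional: build community summaries after ingestion. This is the step
# with GraphRAG-like cost/latency; skip it entirely if that's a concern.
await graphiti.build_communities()
```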

Retrieval:
Retrieval has much stricter cost and latency requirements: we understand that users have their own business logic to execute once Graphiti returns results, and that can be long and expensive for agent applications.

Our retrieval is very fast and very cheap, especially compared to Microsoft's GraphRAG. This is because we don't use text-to-text LLMs as part of our retrieval flow (although, depending on the search method employed, we do use sentence embedders and cross-encoders). With the default Graphiti flow it's not uncommon for >50% of the latency to come from OpenAI's sentence embedder; in that case you will be getting results back in a few hundred milliseconds at negligible cost.
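
To make that concrete, here is a hedged sketch of a timed query, continuing in the async context of the ingestion example above (search signature and result shape per the Graphiti README; both may vary by version):

```python
# Retrieval sketch: no text-to-text LLM in the loop, so latency is
# dominated by embedding the query plus graph/fulltext lookups.
import time

start = time.perf_counter()
results = await graphiti.search(
    "What did the customer report after the v2 rollout?"
)
elapsed_ms = (time.perf_counter() - start) * 1000

for edge in results:
    print(edge.fact)  # hybrid search returns edges carrying 'fact' text
print(f"search took ~{elapsed_ms:.0f} ms")
```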

This contrasts with Microsoft's GraphRAG, which runs an LLM-based map-reduce-style summarization over all communities on every query. Even if you choose to employ communities in Graphiti, we use other search techniques to identify which communities are most relevant to the query and simply return those summaries to the user, rather than repeatedly summarizing over the entire graph.

@heyitsaamir (Author)

Thank you for the detailed answer! Let me digest this and I'll post if I have more questions. And thank you for the blog link, super fascinating topic.
