
Details about cost / token usage etc #200

Open · heyitsaamir opened this issue Oct 23, 2024 · 2 comments

@heyitsaamir
Hello!

This is an extremely intriguing project. Thank you for building it!

I was curious whether there is any benchmarking on cost and latency when it comes to extraction with this tool. Aside from the problem you mention with GraphRAG ("GraphRAG did not address our core problem: It’s primarily designed for static documents and doesn’t inherently handle temporal aspects of data."), GraphRAG is also slow and expensive, at least when building knowledge graphs for static documents. For incremental data, how does Graphiti do when it comes to building knowledge graphs?

Thanks!

@prasmussen15 prasmussen15 self-assigned this Oct 28, 2024
@prasmussen15 (Collaborator)

Thanks for the insightful question. Cost and latency are very important considerations for us, and they're something we spend a lot of time tracking and evaluating internally. In the future, we also plan to share analyses of these metrics publicly; keep an eye on our blog for updates: https://blog.getzep.com.

Now, to answer your questions: the RAG (and more broadly IR) domain has two fundamental components, data ingestion and data retrieval. The two have different latency requirements, so we discuss our approach to each below.

Ingestion:
During ingestion, Graphiti not only extracts the nodes and edges from the episodes, it also deduplicates that data against information already in the graph, extracts timestamps, invalidates stale edges, and generates summaries for the entity nodes. We have many approaches to speed this up (as described in our blog), but ultimately we are using an LLM to perform these tasks, so it is going to be slow compared to non-LLM-based ingestion techniques. We optimize our prompts for small models, which tend to be faster and cheaper. GPT-4o-mini is the default OpenAI model we use, and it costs $0.15/1M input tokens and $0.60/1M output tokens. That means you could add roughly 2,000 PDF pages to Graphiti for about a dollar.
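
As a rough back-of-the-envelope check on that figure (the tokens-per-page, number of LLM passes, and output ratio below are illustrative assumptions, not measured Graphiti values):

```python
# Back-of-the-envelope ingestion cost at the GPT-4o-mini prices quoted above.
# All per-page and per-pass numbers are illustrative assumptions.
INPUT_PRICE = 0.15 / 1_000_000   # $ per input token
OUTPUT_PRICE = 0.60 / 1_000_000  # $ per output token

pages = 2000
tokens_per_page = 500   # rough figure for a text-heavy PDF page
llm_passes = 3          # e.g. extraction, deduplication, summarization
output_ratio = 0.2      # output tokens as a fraction of input tokens

input_tokens = pages * tokens_per_page * llm_passes
output_tokens = int(input_tokens * output_ratio)

cost = input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE
print(f"~${cost:.2f} for {pages} pages")  # ~$0.81 under these assumptions
```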

Another note on this: in contrast to many other GraphRAG approaches, we don't feed the entire graph back into the LLM when extracting more data, so once a certain saturation is reached, the time it takes to add new data to the graph won't grow indefinitely with graph size.
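
For reference, incremental ingestion looks like this in code. This is a minimal sketch based on the add_episode entry point in the Graphiti README; the connection details are placeholders and exact signatures may vary by version:

```python
# Minimal incremental-ingestion sketch; signatures per the Graphiti README
# and may differ by version. Connection details are placeholders.
import asyncio
from datetime import datetime, timezone

from graphiti_core import Graphiti
from graphiti_core.nodes import EpisodeType

async def main():
    graphiti = Graphiti("bolt://localhost:7687", "neo4j", "password")
    # Each episode is reconciled against the existing graph (dedup, edge
    # invalidation) rather than re-sending the whole graph to the LLM,
    # so per-episode cost stays roughly flat as the graph grows.
    await graphiti.add_episode(
        name="support_ticket_42",
        episode_body="Customer reports latency spikes after the v2 rollout.",
        source=EpisodeType.text,
        source_description="support ticket",
        reference_time=datetime.now(timezone.utc),
    )
    await graphiti.close()

asyncio.run(main())
```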

Now, Graphiti also offers the option to build communities, which summarize closely connected entities. This process is very similar to the one in Microsoft's GraphRAG paper and has similar associated cost and latency. However, unlike Microsoft's GraphRAG, Graphiti is fully functional and very powerful even without communities, so if this is a barrier we recommend trying Graphiti on some test cases without them.
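
If you do want communities, they are an explicit, separate step rather than part of every ingestion call; something like the following, inside the async context above (the build_communities method name is taken from the Graphiti README and may differ by version):

```python
# Optional: build community summaries after ingestion. This is the step
# with GraphRAG-like cost/latency; skip it entirely if that's a concern.
await graphiti.build_communities()
```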

Retrieval:
Retrieval has much stricter cost and latency requirements: we understand that users have their own business logic to execute once Graphiti returns results, and that can be long and expensive for agent applications.

Our retrieval is very fast and very cheap, especially compared to Microsoft's GraphRAG. This is because we don't use text-to-text LLMs as part of our retrieval flow (although, depending on the search method employed, we do use sentence embedders and cross-encoders). With the default Graphiti flow it's not uncommon for >50% of the latency to come from OpenAI's sentence embedder; in that case you will be getting results back in a few hundred milliseconds at negligible cost.
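
To make that concrete, here is a hedged sketch of a timed query, continuing in the async context of the ingestion example above (search signature and result shape per the Graphiti README; both may vary by version):

```python
# Retrieval sketch: no text-to-text LLM in the loop, so latency is
# dominated by embedding the query plus graph/fulltext lookups.
import time

start = time.perf_counter()
results = await graphiti.search(
    "What did the customer report after the v2 rollout?"
)
elapsed_ms = (time.perf_counter() - start) * 1000

for edge in results:
    print(edge.fact)  # hybrid search returns edges carrying 'fact' text
print(f"search took ~{elapsed_ms:.0f} ms")
```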

This contrasts with Microsoft's GraphRAG, which runs an LLM-based map-reduce-style summarization over all communities on every query. Even if you choose to employ communities in Graphiti, we use other search techniques to identify which communities are most relevant to the query and simply return those summaries to the user, rather than repeatedly summarizing over the entire graph.

@heyitsaamir (Author)

Thank you for the detailed answer! Let me digest this and I'll post if I have more questions. And thank you for the blog link, super fascinating topic.
