KeyError: 'previous_episodes' on bulk ingest #223

Open
StefanDimitrov95 opened this issue Dec 3, 2024 · 2 comments

@StefanDimitrov95

StefanDimitrov95 commented Dec 3, 2024

Hello, I'm trying to ingest a chunked text file into an empty graph using the add_episode_bulk() method. I ran build_indices_and_constraints() first.
Using:

  • Neo4j 5.25.1
  • Python 3.12.7
  • Graphiti 0.4.2
  • gpt-4o-mini
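
import datetime

from graphiti_core.nodes import EpisodeType
from graphiti_core.utils.bulk_utils import RawEpisode


async def bulk_ingest(graphiti, doc_metadata: dict, doc_chunks: list[str]):
    # One RawEpisode per text chunk; every chunk shares the document's metadata.
    bulk_episodes = [
        RawEpisode(
            name=doc_metadata["composite_file_name"],
            content=row,
            source=EpisodeType.text,
            source_description=doc_metadata["source"],
            reference_time=datetime.datetime.strptime(
                doc_metadata["report_date"], r"%Y-%m-%d"
            ),
        )
        for row in doc_chunks
    ]
    # Only the first two chunks are sent while reproducing the error.
    await graphiti.add_episode_bulk(bulk_episodes[:2])
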
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/python@3.12/3.12.7_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/runpy.py", line 198, in _run_module_as_main
    return _run_code(code, main_globals, None,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.7_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/runpy.py", line 88, in _run_code
    exec(code, run_globals)
  File "/Users/stefandimitrov/.vscode/extensions/ms-python.debugpy-2024.12.0-darwin-arm64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 71, in <module>
    cli.main()
  File "/Users/stefandimitrov/.vscode/extensions/ms-python.debugpy-2024.12.0-darwin-arm64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 501, in main
    run()
  File "/Users/stefandimitrov/.vscode/extensions/ms-python.debugpy-2024.12.0-darwin-arm64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 351, in run_file
    runpy.run_path(target, run_name="__main__")
  File "/Users/stefandimitrov/.vscode/extensions/ms-python.debugpy-2024.12.0-darwin-arm64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 310, in run_path
    return _run_module_code(code, init_globals, run_name, pkg_name=pkg_name, script_name=fname)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stefandimitrov/.vscode/extensions/ms-python.debugpy-2024.12.0-darwin-arm64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 127, in _run_module_code
    _run_code(code, mod_globals, init_globals, mod_name, mod_spec, pkg_name, script_name)
  File "/Users/stefandimitrov/.vscode/extensions/ms-python.debugpy-2024.12.0-darwin-arm64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 118, in _run_code
    exec(code, run_globals)
  File "/Users/stefandimitrov/Documents/Projects/graph_ingest/main.py", line 138, in <module>
    asyncio.run(main(config))
  File "/opt/homebrew/Cellar/python@3.12/3.12.7_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.7_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.7_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/Users/stefandimitrov/Documents/Projects/graph_ingest/main.py", line 82, in main
    await bulk_ingest(graphiti, filing, doc_chunks)
  File "/Users/stefandimitrov/Documents/Projects/graph_ingest/main.py", line 111, in bulk_ingest
    await graphiti.add_episode_bulk(bulk_episodes[:2])
  File "/Users/stefandimitrov/Documents/Projects/graph_ingest/.venv/lib/python3.12/site-packages/graphiti_core/graphiti.py", line 593, in add_episode_bulk
    raise e
  File "/Users/stefandimitrov/Documents/Projects/graph_ingest/.venv/lib/python3.12/site-packages/graphiti_core/graphiti.py", line 557, in add_episode_bulk
    (nodes, uuid_map), extracted_edges_timestamped = await asyncio.gather(
                                                     ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stefandimitrov/Documents/Projects/graph_ingest/.venv/lib/python3.12/site-packages/graphiti_core/utils/bulk_utils.py", line 177, in dedupe_nodes_bulk
    await asyncio.gather(
  File "/Users/stefandimitrov/Documents/Projects/graph_ingest/.venv/lib/python3.12/site-packages/graphiti_core/utils/maintenance/node_operations.py", line 189, in dedupe_extracted_nodes
    llm_response = await llm_client.generate_response(prompt_library.dedupe_nodes.node(context))
                                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stefandimitrov/Documents/Projects/graph_ingest/.venv/lib/python3.12/site-packages/graphiti_core/prompts/lib.py", line 109, in __call__
    return self.func(context)
           ^^^^^^^^^^^^^^^^^^
  File "/Users/stefandimitrov/Documents/Projects/graph_ingest/.venv/lib/python3.12/site-packages/graphiti_core/prompts/dedupe_nodes.py", line 43, in node
    {json.dumps([ep for ep in context['previous_episodes']], indent=2)}
                              ~~~~~~~^^^^^^^^^^^^^^^^^^^^^
KeyError: 'previous_episodes'

The context built in dedupe_extracted_nodes (graphiti_core/utils/maintenance/node_operations.py) contains only two keys:

context = {
    'existing_nodes': existing_nodes_context,
    'extracted_nodes': extracted_nodes_context,
}
llm_response = await llm_client.generate_response(prompt_library.dedupe_nodes.node(context))

but the dedupe_nodes.node prompt (graphiti_core/prompts/dedupe_nodes.py) also reads 'previous_episodes' and 'episode_content':

<PREVIOUS MESSAGES>
{json.dumps([ep for ep in context['previous_episodes']], indent=2)}
</PREVIOUS MESSAGES>
<CURRENT MESSAGE>
{context["episode_content"]}
</CURRENT MESSAGE>
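The mismatch can be reproduced without Neo4j or an LLM. This is a minimal sketch, assuming a simplified stand-in for the prompt function: render_dedupe_prompt is hypothetical and covers only the part of the template quoted in this issue. A context holding only existing_nodes and extracted_nodes fails with the same KeyError: 'previous_episodes'. build_dedupe_context is likewise a hypothetical illustration of one possible fix (passing the episode data into the context); which episodes should fill previous_episodes during bulk deduplication is a decision for the maintainers, and the values used here are placeholders.

```python
import json
from typing import Optional


def render_dedupe_prompt(context: dict) -> str:
    """Hypothetical stand-in for the part of dedupe_nodes.node quoted in this issue."""
    return f"""<PREVIOUS MESSAGES>
{json.dumps([ep for ep in context['previous_episodes']], indent=2)}
</PREVIOUS MESSAGES>
<CURRENT MESSAGE>
{context['episode_content']}
</CURRENT MESSAGE>"""


def missing_key(context: dict) -> Optional[str]:
    """Return the key that rendering fails on, or None if the prompt renders."""
    try:
        render_dedupe_prompt(context)
    except KeyError as exc:
        return exc.args[0]
    return None


def build_dedupe_context(
    existing_nodes: list, extracted_nodes: list, episode_content: str, previous_episodes: list
) -> dict:
    """Hypothetical helper that supplies every key the template reads."""
    return {
        'existing_nodes': existing_nodes,
        'extracted_nodes': extracted_nodes,
        'episode_content': episode_content,
        'previous_episodes': previous_episodes,
    }


if __name__ == "__main__":
    # The two-key context from the issue fails on the first key the template reads.
    print(missing_key({'existing_nodes': [], 'extracted_nodes': []}))
    # Supplying all four keys lets the prompt render.
    print(missing_key(build_dedupe_context([], [], 'current chunk', ['earlier chunk'])))
```

The f-string evaluates previous_episodes before episode_content, which is why that key is the one named in the traceback.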

@prasmussen15
Collaborator

The add_episode_bulk method is currently a work in progress; I can add a comment about that in the code. That is also why we don't have it in our documentation.

Sorry about the confusion!

@StefanDimitrov95
Author

It's in the docs here: https://help.getzep.com/graphiti/graphiti/adding-episodes#loading-episodes-in-bulk

Do you have a roadmap for such features? It would be great to ingest in bulk, given that the regular add_episode method gets progressively slower with every episode. Could you also share a guideline on the optimal episode content size?
Thanks!
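To put numbers on the "progressively slower" observation, one option is to time each call as episodes are added. This is a generic sketch, not part of Graphiti: timed_sequential and slowdown_ratio are hypothetical helpers, and add_one would wrap your own graphiti.add_episode(...) call. The ratio compares the mean latency of the second half of the run with the first half, so a value well above 1 indicates a slowdown.

```python
import asyncio
import time
from typing import Awaitable, Callable, Iterable, TypeVar

T = TypeVar("T")


async def timed_sequential(
    items: Iterable[T], add_one: Callable[[T], Awaitable[object]]
) -> list[float]:
    """Await add_one(item) for each item in order; return per-item wall-clock seconds."""
    durations: list[float] = []
    for item in items:
        start = time.perf_counter()
        await add_one(item)
        durations.append(time.perf_counter() - start)
    return durations


def slowdown_ratio(durations: list[float]) -> float:
    """Mean duration of the second half divided by the mean of the first half."""
    half = len(durations) // 2
    if half == 0:
        raise ValueError("need at least two timings")
    first, second = durations[:half], durations[half:]
    return (sum(second) / len(second)) / (sum(first) / len(first))


if __name__ == "__main__":
    async def fake_add(delay: float) -> None:
        await asyncio.sleep(delay)  # stands in for a growing add_episode latency

    timings = asyncio.run(timed_sequential([0.01, 0.02, 0.03, 0.04], fake_add))
    print(f"slowdown ratio: {slowdown_ratio(timings):.2f}")
```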

prasmussen15 self-assigned this on Dec 4, 2024.