Refactor to breakout config from rest of code #289

Merged
whitead merged 40 commits into september-2024-release from issue-283
Sep 8, 2024

Conversation

Collaborator

@whitead whitead commented Jun 20, 2024

Large refactor to bring us closer to decoupling Docs, LLMs, and Config.

  1. Moves to centralized config that can be loaded from package data or defaults. This changes many function calls so that they pass config as an object or name of a config object.
  2. Removed doc_match and doc_index - these were only sometimes useful and not really part of current usage. We can add them back if needed, but they just complicated things.
  3. Removed the unnecessarily complicated get_callbacks factories. Now, callbacks can take a kwarg called name to get access to the name of the chain being called.
  4. Switched to contextvars for setting answer_id in LLMResults so that we do not need to have so much back and forth for callbacks.
  5. Deferred updates to Answer objects until end of functions so that retrying is possible (except token counts)
  6. Generally moved all config out of Docs, except LLMModel which will be fixed in port to litellm
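
Item 4 above can be illustrated with a minimal sketch. Every name here (`cvar_answer_id`, `set_answer_id`, and this stand-in `LLMResult`) is hypothetical and chosen for illustration; the actual paperqa code may be structured differently. The idea is that a result object reads the current answer ID from a context variable when it is created, so callbacks do not need to pass it around.

```python
import contextvars
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Optional
from uuid import UUID, uuid4

# Hypothetical context variable holding the ID of the answer being built.
cvar_answer_id: contextvars.ContextVar = contextvars.ContextVar(
    "answer_id", default=None
)


@contextmanager
def set_answer_id(answer_id: UUID):
    """Set the answer ID for the enclosed block, then restore the old value."""
    token = cvar_answer_id.set(answer_id)
    try:
        yield
    finally:
        cvar_answer_id.reset(token)


@dataclass
class LLMResult:
    """Stand-in result type (an assumption, not the real paperqa class)."""

    text: str
    # Read from the context variable at construction time.
    answer_id: Optional[UUID] = field(default_factory=cvar_answer_id.get)


answer_id = uuid4()
with set_answer_id(answer_id):
    inside = LLMResult(text="made while answering")
outside = LLMResult(text="made outside any answer")
```

Because context variables are scoped per asyncio task, concurrent queries would each see their own answer ID under this approach.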

@whitead whitead marked this pull request as draft June 20, 2024 01:05
@whitead whitead changed the base branch from main to september-2024-release August 30, 2024 19:11
paperqa/agents/search.py (outdated, resolved)
paperqa/agents/search.py (outdated, resolved)
paperqa/configs/debug.json (outdated, resolved)
paperqa/config.py (outdated, resolved)
paperqa/config.py (outdated, resolved)
paperqa/config.py (outdated, resolved)
paperqa/config.py (outdated, resolved)
MaybeSettings = Settings | str | None


def get_settings(config_or_name: MaybeSettings = None) -> Settings:
Collaborator

Let's make this a classmethod of Settings, since that's mostly what this is
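
A minimal sketch of what this suggestion could look like. It assumes a simplified stand-in `Settings` dataclass, a hypothetical `from_name` classmethod, and JSON config files in a caller-supplied directory; the actual paperqa `Settings` class and its loading logic may differ.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class Settings:
    """Stand-in for the real settings class (field and default are assumptions)."""

    embedding: str = "default-embedding"

    @classmethod
    def from_name(
        cls, config_name: Optional[str] = None, config_dir: Optional[Path] = None
    ) -> "Settings":
        """Return defaults when no name is given, else load <config_name>.json."""
        if config_name is None:
            return cls()
        path = (config_dir or Path(".")) / f"{config_name}.json"
        if not path.exists():
            raise FileNotFoundError(
                f"No configuration file found for {config_name}"
            )
        return cls(**json.loads(path.read_text()))
```

With the loader living on the class, callers would write `Settings.from_name("debug")` rather than calling a module-level function.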

raise FileNotFoundError(f"No configuration file found for {config_name}")


MaybeSettings = Settings | str | None
Collaborator

I don't really like this type alias, can we remove it?

@@ -240,7 +240,7 @@ def iterate(  # noqa: C901, PLR0912
 _pdfs = [self.get_pdf(item) for item in _items]

 # Filter:
-for item, pdf in zip(_items, _pdfs):
+for item, pdf in zip(_items, _pdfs, strict=False):
Collaborator

Can you add a comment why we're not strict here? To me, it seems we should be strict

Collaborator Author

black junk - fixed

paperqa/core.py (resolved)
paperqa/core.py (resolved)
paperqa/docs.py (outdated, resolved)
paperqa/prompts.py (resolved)
paperqa/types.py (outdated, resolved)
paperqa/utils.py (outdated, resolved)
paperqa/utils.py (outdated, resolved)
settings.agent.paper_directory = stub_data_dir
settings.agent.index_directory = agent_index_dir
settings.agent.search_count = 2
settings.embedding = "sparse"
Collaborator

Can we configure this in the instantiation of Settings(embedding="sparse")?

Collaborator Author

Not opposed - but why?

tests/test_configs.py (outdated, resolved)
paperqa/config.py (outdated, resolved)
@whitead whitead marked this pull request as ready for review September 7, 2024 19:27
Collaborator Author

whitead commented Sep 7, 2024

Ready for review - the PR is getting too sprawling to keep going.

TODO list [for future PRs]

  • Get CLI tests back
  • Add litellm logging to configure_cli_logging
  • Remove tiktoken now that we have litellm
  • Track costs using litellm instead of our list (CC @jamesbraza has code somewhere for this)
  • Regenerate VCR
  • Make it possible to build_index from CLI
  • Fix search query - if you try to search a local index, it finds the question field missing (CC @mskarlin - not sure if this was working)
  • Switch to Aviary env with baseline agent
  • Kickass README
  • Cool pre-configured settings for fast, local open source model optimized, extreme-recall (do not retrieve - run on all items), wikicrow, etc.
  • tenacity retries on model exceptions
  • aiohttp to httpx

Collaborator

@jamesbraza jamesbraza left a comment

Sorry I realized I didn't have my text in the approval text box

I was gonna say, really excellent work here both from Andrew and Mike, LGTM

@whitead whitead merged commit b9b5980 into september-2024-release Sep 8, 2024
2 of 4 checks passed
@whitead whitead changed the title from "Refactor to functional API" to "Refactor to breakout config from rest of code" Sep 8, 2024
@whitead whitead deleted the issue-283 branch September 8, 2024 06:41
This was referenced Sep 11, 2024
3 participants