Knowledge #1567
base: main
Conversation
from crewai.knowledge.source.base_knowledge_source import BaseKnowledgeSource


class Knowledge(BaseModel):
Let's write some docs about:
- how to use this (a rough usage sketch follows below)
- setting your own custom embedder for this
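For the first point, here's the rough shape a usage example could take. The constructor arguments and the StringKnowledgeSource class are assumptions for illustration, not a confirmed API from this PR:

from crewai.knowledge.knowledge import Knowledge
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

# Hypothetical usage: class paths and constructor arguments are assumptions.
knowledge = Knowledge(
    sources=[StringKnowledgeSource(content="CrewAI ships releases every Friday.")],
)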
@@ -0,0 +1,82 @@
import os
I'd drop the Ollama version: support OpenAI, then let anyone bring their own embedder function (super easy), and have the knowledge_config setup mirror the embedder_config setup for our RAG storage.
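A hedged sketch of the bring-your-own-embedder idea: subclass BaseEmbedder from this PR and wrap whatever embedding call you like. embed_chunks is assumed to be the abstract hook (that's how the embedder is called elsewhere in the diff), and the OpenAI model name is just an example:

from typing import List

import numpy as np
from openai import OpenAI

from crewai.knowledge.embedder.base_embedder import BaseEmbedder


class OpenAIEmbedder(BaseEmbedder):
    """Illustrative embedder backed by OpenAI's embeddings endpoint."""

    model: str = "text-embedding-3-small"

    def embed_chunks(self, chunks: List[str]) -> List[np.ndarray]:
        client = OpenAI()  # reads OPENAI_API_KEY from the environment
        response = client.embeddings.create(model=self.model, input=chunks)
        return [np.asarray(item.embedding) for item in response.data]

Any provider works the same way: the only contract is one vector back per chunk.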
    @abstractmethod
    def add(self, embedder: BaseEmbedder) -> None:
        """Process content, chunk it, compute embeddings, and save them."""
        pass

    def get_embeddings(self) -> List[np.ndarray]:
        """Return the list of embeddings for the chunks."""
Let's make this save to the project directory instead of the root.
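One way to do that, where the .crewai/knowledge location and the helper name are assumptions rather than anything in this PR:

from pathlib import Path


def knowledge_storage_path() -> Path:
    """Resolve <project>/.crewai/knowledge, creating it if needed."""
    path = Path.cwd() / ".crewai" / "knowledge"
    path.mkdir(parents=True, exist_ok=True)
    return path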
        # Compute embeddings for the new chunks
        new_embeddings = embedder.embed_chunks(new_chunks)
        # Save the embeddings
        self.chunk_embeddings.extend(new_embeddings)
We should also be saving this to a db and persisting it, like our RAGStorage.
We should do this so we can generate the embeddings for files once, then just query them if they already exist. Otherwise, users will spend tokens regenerating embeddings that already exist.
I'll help with this.
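A minimal sketch of the persist-and-reuse idea, using sqlite3 only for brevity (the real version would presumably mirror RAGStorage on ChromaDB); the EmbeddingCache name and schema are illustrative:

import hashlib
import sqlite3
from typing import Optional

import numpy as np


class EmbeddingCache:
    """Persist chunk embeddings keyed by content hash so each is computed once."""

    def __init__(self, db_path: str = "knowledge.db") -> None:
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS embeddings "
            "(chunk_hash TEXT PRIMARY KEY, vector BLOB)"
        )

    def get(self, chunk: str) -> Optional[np.ndarray]:
        """Return the cached vector for this chunk, or None if unseen."""
        key = hashlib.sha256(chunk.encode()).hexdigest()
        row = self.conn.execute(
            "SELECT vector FROM embeddings WHERE chunk_hash = ?", (key,)
        ).fetchone()
        return np.frombuffer(row[0], dtype=np.float32) if row else None

    def put(self, chunk: str, vector: np.ndarray) -> None:
        """Store a freshly computed vector under the chunk's hash."""
        key = hashlib.sha256(chunk.encode()).hexdigest()
        self.conn.execute(
            "INSERT OR REPLACE INTO embeddings VALUES (?, ?)",
            (key, vector.astype(np.float32).tobytes()),
        )
        self.conn.commit()

add() would then call cache.get(chunk) before embedder.embed_chunks, so only unseen chunks spend tokens.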
from pydantic import BaseModel, ConfigDict, Field

from crewai.knowledge.embedder.base_embedder import BaseEmbedder
Extending this to save embeddings to the db, then using the Knowledge class to query from there.
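The query side could then be plain cosine similarity over the stored vectors; the function and parameter names below are illustrative, not the PR's final API:

from typing import List

import numpy as np


def query_chunks(
    question_embedding: np.ndarray,
    chunk_embeddings: List[np.ndarray],
    chunks: List[str],
    top_k: int = 3,
) -> List[str]:
    """Return the top_k chunks most similar to the question embedding."""
    sims = [
        float(np.dot(question_embedding, e))
        / (np.linalg.norm(question_embedding) * np.linalg.norm(e))
        for e in chunk_embeddings
    ]
    order = np.argsort(sims)[::-1][:top_k]
    return [chunks[i] for i in order]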
src/crewai/agent.py
Outdated
@@ -85,6 +88,10 @@ class Agent(BaseAgent):
    llm: Union[str, InstanceOf[LLM], Any] = Field(
        description="Language model that will run the agent.", default=None
    )
    knowledge_sources: Optional[List[BaseKnowledgeSource]] = Field(
        default=None,
        description="Knowledge sources for the agent.",
    )
It would be better to declare this on the Crew class: the task prompt will query the relevant knowledge, trickling down to the agent level, rather than defining it here at the agent level.
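The shape being suggested, sketched with crewai's Agent/Task/Crew API; the crew-level knowledge_sources argument and StringKnowledgeSource are what this PR is still discussing, so treat them as proposed rather than final:

from crewai import Agent, Crew, Task
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

source = StringKnowledgeSource(content="Our refund window is 30 days.")

agent = Agent(
    role="Support Analyst",
    goal="Answer product questions accurately",
    backstory="You know the product documentation inside out.",
)

task = Task(
    description="How long is the refund window?",
    expected_output="A one-sentence answer.",
    agent=agent,
)

# Knowledge declared once at the crew level; the task prompt queries it and
# the relevant chunks trickle down to the agent.
crew = Crew(agents=[agent], tasks=[task], knowledge_sources=[source])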
No description provided.