Knowledge #1567
Status: Open. bhancockio wants to merge 14 commits into main from knowledge.
Changes from 12 commits
Commits (14):
- 75322b2 initial knowledge (joaomdmoura)
- dc314c1 Merge branch 'main' into knowledge (bhancockio)
- a8a2f80 WIP (bhancockio)
- 1a35114 Adding core knowledge sources (bhancockio)
- 6131dba Improve types and better support for file paths (bhancockio)
- 617ee98 added additional sources (bhancockio)
- 4af263c Merge branch 'main' into knowledge (bhancockio)
- 59165cb fix linting (bhancockio)
- 86ede83 update yaml to include optional deps (bhancockio)
- 7b59c5b adding in lorenze feedback (bhancockio)
- 98a708c Merge branch 'main' of github.com:crewAIInc/crewAI into knowledge (lorenzejay)
- 10f445e ensure embeddings are persisted (lorenzejay)
- cb03ee6 improvements all around Knowledge class (lorenzejay)
- cdf5233 Merge branch 'main' of github.com:crewAIInc/crewAI into knowledge (lorenzejay)
path/to/src/crewai/knowledge/source/base_knowledge_source.py (32 additions, 0 deletions)

```python
from abc import ABC, abstractmethod
from typing import List

from crewai.knowledge.embedder.base_embedder import BaseEmbedder


class BaseKnowledgeSource(ABC):
    """Abstract base class for different types of knowledge sources."""

    def __init__(
        self,
        chunk_size: int = 1000,
        chunk_overlap: int = 200,
    ):
        self.chunk_size = chunk_size
        self.chunk_overlap = chunk_overlap
        self.chunks: List[str] = []

    @abstractmethod
    def load_content(self):
        """Load and preprocess content from the source."""
        pass

    @abstractmethod
    def add(self, embedder: BaseEmbedder) -> None:
        """Add content to the knowledge base, chunk it, and compute embeddings."""
        pass

    @abstractmethod
    def query(self, embedder: BaseEmbedder, query: str, top_k: int = 3) -> str:
        """Query the knowledge base using semantic search."""
        pass
```
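The base class leaves chunking and search to its subclasses. The sketch below shows one way `chunk_size`/`chunk_overlap` and `query(..., top_k=3)` could be interpreted, assuming character-based sliding windows and cosine similarity over the embedder's vectors. `chunk_text` and `top_k_chunks` are hypothetical helpers for illustration, not part of this PR.

```python
from typing import List

import numpy as np


def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Hypothetical helper: split text into character windows overlapping by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def top_k_chunks(
    query_vec: np.ndarray, chunk_vecs: np.ndarray, chunks: List[str], top_k: int = 3
) -> List[str]:
    """Hypothetical helper: rank chunks by cosine similarity to the query vector."""
    norms = np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(query_vec)
    # Guard against zero-length vectors to avoid division by zero
    scores = (chunk_vecs @ query_vec) / np.where(norms == 0, 1.0, norms)
    best = np.argsort(scores)[::-1][:top_k]  # indices of the highest scores first
    return [chunks[i] for i in best]
```

With the defaults, consecutive chunks share 200 characters, so a sentence cut at a window boundary still appears whole in at least one chunk as long as it is shorter than the overlap.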
Empty file.
Empty file.
New file (55 additions):

```python
from abc import ABC, abstractmethod
from typing import List

import numpy as np


class BaseEmbedder(ABC):
    """
    Abstract base class for text embedding models
    """

    @abstractmethod
    def embed_chunks(self, chunks: List[str]) -> np.ndarray:
        """
        Generate embeddings for a list of text chunks

        Args:
            chunks: List of text chunks to embed

        Returns:
            Array of embeddings
        """
        pass

    @abstractmethod
    def embed_texts(self, texts: List[str]) -> np.ndarray:
        """
        Generate embeddings for a list of texts

        Args:
            texts: List of texts to embed

        Returns:
            Array of embeddings
        """
        pass

    @abstractmethod
    def embed_text(self, text: str) -> np.ndarray:
        """
        Generate embedding for a single text

        Args:
            text: Text to embed

        Returns:
            Embedding array
        """
        pass

    @property
    @abstractmethod
    def dimension(self) -> int:
        """Get the dimension of the embeddings"""
        pass
```
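To make the contract concrete: a subclass must implement all three embedding methods plus the `dimension` property before it can be instantiated. `HashingEmbedder` below is a hypothetical toy (a hashed bag-of-words, not a real model and not part of this PR), and the ABC is re-declared in trimmed form so the snippet runs without crewAI installed. It returns 2-D `np.ndarray` values, matching the base-class annotations.

```python
import hashlib
from abc import ABC, abstractmethod
from typing import List

import numpy as np


class BaseEmbedder(ABC):
    """Trimmed local copy of the interface above so this sketch is self-contained."""

    @abstractmethod
    def embed_chunks(self, chunks: List[str]) -> np.ndarray: ...

    @abstractmethod
    def embed_texts(self, texts: List[str]) -> np.ndarray: ...

    @abstractmethod
    def embed_text(self, text: str) -> np.ndarray: ...

    @property
    @abstractmethod
    def dimension(self) -> int: ...


class HashingEmbedder(BaseEmbedder):
    """Hypothetical toy embedder: counts lowercase tokens into fixed-size hash buckets."""

    def __init__(self, dim: int = 64):
        self._dim = dim

    def embed_text(self, text: str) -> np.ndarray:
        vec = np.zeros(self._dim, dtype=np.float32)
        for token in text.lower().split():
            # md5 gives a stable bucket across runs, unlike Python's salted hash()
            bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % self._dim
            vec[bucket] += 1.0
        return vec

    def embed_texts(self, texts: List[str]) -> np.ndarray:
        if not texts:
            return np.zeros((0, self._dim), dtype=np.float32)
        # One row per input text: shape (len(texts), dimension)
        return np.stack([self.embed_text(t) for t in texts])

    def embed_chunks(self, chunks: List[str]) -> np.ndarray:
        # Chunks are just texts here; a real model might batch them differently
        return self.embed_texts(chunks)

    @property
    def dimension(self) -> int:
        return self._dim
```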
New file (93 additions):

```python
from pathlib import Path
from typing import List, Optional, Union

import numpy as np

from .base_embedder import BaseEmbedder

# Prefer the GPU build if it is importable, otherwise fall back to the CPU package
try:
    from fastembed_gpu import TextEmbedding  # type: ignore

    FASTEMBED_AVAILABLE = True
except ImportError:
    try:
        from fastembed import TextEmbedding

        FASTEMBED_AVAILABLE = True
    except ImportError:
        FASTEMBED_AVAILABLE = False


class FastEmbed(BaseEmbedder):
    """
    A wrapper class for text embedding models using FastEmbed
    """

    def __init__(
        self,
        model_name: str = "BAAI/bge-small-en-v1.5",
        cache_dir: Optional[Union[str, Path]] = None,
    ):
        """
        Initialize the embedding model

        Args:
            model_name: Name of the model to use
            cache_dir: Directory to cache the model
        """
        if not FASTEMBED_AVAILABLE:
            raise ImportError(
                "FastEmbed is not installed. Please install it with: "
                "uv pip install fastembed or uv pip install fastembed-gpu for GPU support"
            )

        self.model = TextEmbedding(
            model_name=model_name,
            cache_dir=str(cache_dir) if cache_dir else None,
        )

    def embed_chunks(self, chunks: List[str]) -> np.ndarray:
        """
        Generate embeddings for a list of text chunks

        Args:
            chunks: List of text chunks to embed

        Returns:
            Array of embeddings, one row per chunk
        """
        # model.embed yields one vector per input; stack them to match BaseEmbedder
        return np.asarray(list(self.model.embed(chunks)))

    def embed_texts(self, texts: List[str]) -> np.ndarray:
        """
        Generate embeddings for a list of texts

        Args:
            texts: List of texts to embed

        Returns:
            Array of embeddings, one row per text
        """
        return np.asarray(list(self.model.embed(texts)))

    def embed_text(self, text: str) -> np.ndarray:
        """
        Generate embedding for a single text

        Args:
            text: Text to embed

        Returns:
            Embedding array
        """
        return self.embed_texts([text])[0]

    @property
    def dimension(self) -> int:
        """Get the dimension of the embeddings"""
        # Generate a test embedding to get dimensions
        test_embed = self.embed_text("test")
        return len(test_embed)
```
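As written, `dimension` runs a full model inference on every access. One possible refinement, assuming the model's output size never changes after construction, is to compute it once with `functools.cached_property` (Python 3.8+). `FakeTextEmbedding` is a hypothetical stand-in for fastembed's `TextEmbedding` so the sketch runs offline, and `CachedDimensionEmbedder` is likewise illustrative only.

```python
from functools import cached_property
from typing import Iterable, Iterator, List

import numpy as np


class FakeTextEmbedding:
    """Hypothetical stand-in for fastembed's TextEmbedding; counts embed() runs."""

    def __init__(self, dim: int = 8):
        self.dim = dim
        self.calls = 0

    def embed(self, documents: Iterable[str]) -> Iterator[np.ndarray]:
        self.calls += 1  # executes when iteration starts
        for doc in documents:
            yield np.full(self.dim, float(len(doc)), dtype=np.float32)


class CachedDimensionEmbedder:
    """Sketch: probe the embedding size once, then reuse the cached value."""

    def __init__(self, model):
        self.model = model

    def embed_texts(self, texts: List[str]) -> np.ndarray:
        return np.asarray(list(self.model.embed(texts)))

    def embed_text(self, text: str) -> np.ndarray:
        return self.embed_texts([text])[0]

    @cached_property
    def dimension(self) -> int:
        # First access embeds a probe string; later accesses read the instance cache
        return len(self.embed_text("test"))
```

A `cached_property` also satisfies an abstract `dimension` property on an ABC, since the override no longer carries the abstract marker.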
Review comment: It would be better to declare this on the crew class. The task prompt would query the relevant knowledge and trickle it down to the agent level, rather than defining it here on the agent level.
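One reading of the reviewer's suggestion, sketched with hypothetical `Crew`, `Agent`, and `KnowledgeStore` classes (these are not crewAI's actual API): knowledge is declared once on the crew, queried when a task prompt is built, and passed down to whichever agent runs the task, with an optional per-agent override. Simple keyword matching stands in for the semantic search the real sources would perform.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class KnowledgeStore:
    """Hypothetical store: keyword substring matching stands in for semantic search."""
    documents: List[str] = field(default_factory=list)

    def query(self, text: str, top_k: int = 3) -> List[str]:
        words = text.lower().split()
        hits = [d for d in self.documents if any(w in d.lower() for w in words)]
        return hits[:top_k]


@dataclass
class Agent:
    role: str
    knowledge: Optional[KnowledgeStore] = None  # optional per-agent override


@dataclass
class Crew:
    agents: List[Agent]
    knowledge: Optional[KnowledgeStore] = None  # declared once, shared by all agents

    def build_task_prompt(self, agent: Agent, task: str) -> str:
        # Prefer the agent's own store if set, otherwise use the crew-level one
        store = agent.knowledge or self.knowledge
        context = store.query(task) if store else []
        lines = [f"Role: {agent.role}", f"Task: {task}"]
        if context:
            lines.append("Context:\n" + "\n".join(context))
        return "\n".join(lines)
```

Querying at the crew level means each task hits the knowledge base once, and agents receive only the retrieved context rather than each holding their own copy of the sources.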