docs: add Loaders components page #3672
Merged
@@ -0,0 +1,68 @@
# Loaders

Loaders are components that load documents from sources such as databases, websites, and local files, and convert the fetched data into a format that other components can process.

## Confluence

The Confluence component integrates with the Confluence wiki collaboration platform to load and process documents. It uses the ConfluenceLoader from LangChain to fetch content from a specified Confluence space.

### Parameters

#### Inputs:

| Name | Display Name | Info |
| --- | --- | --- |
| url | Site URL | The base URL of the Confluence site (e.g., https://<company>.atlassian.net/wiki) |
| username | Username | Atlassian user email (e.g., email@example.com) |
| api_key | API Key | Atlassian API Key (Create at: https://id.atlassian.com/manage-profile/security/api-tokens) |
| space_key | Space Key | The key of the Confluence space to access |
| cloud | Use Cloud? | Whether to use Confluence Cloud (default: true) |
| content_format | Content Format | Specify content format (default: STORAGE) |
| max_pages | Max Pages | Maximum number of pages to retrieve (default: 1000) |

#### Outputs:

| Name | Display Name | Info |
| --- | --- | --- |
| data | Data | List of Data objects containing the loaded Confluence documents |
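
As a rough illustration of how these inputs might map onto LangChain's ConfluenceLoader, the sketch below collects them into keyword arguments using the defaults from the table. `ConfluenceSettings`, `to_loader_kwargs`, and `load_space` are hypothetical names, and the sketch assumes a `langchain_community` release in which `ConfluenceLoader` accepts `space_key`, `content_format`, and `max_pages` as constructor arguments. It is not the component's actual source.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ConfluenceSettings:
    """Hypothetical container mirroring the component inputs above."""

    url: str
    username: str
    api_key: str
    space_key: str
    cloud: bool = True               # default from the table
    content_format: str = "STORAGE"  # default from the table
    max_pages: int = 1000            # default from the table

    def to_loader_kwargs(self) -> dict[str, Any]:
        """Return the inputs as a plain dict of keyword arguments."""
        return {
            "url": self.url,
            "username": self.username,
            "api_key": self.api_key,
            "space_key": self.space_key,
            "cloud": self.cloud,
            "content_format": self.content_format,
            "max_pages": self.max_pages,
        }


def load_space(settings: ConfluenceSettings) -> list:
    """Fetch the documents of one space (assumes langchain_community is installed)."""
    # Imported lazily so the settings class stays usable without LangChain.
    from langchain_community.document_loaders import ConfluenceLoader
    from langchain_community.document_loaders.confluence import ContentFormat

    kwargs = settings.to_loader_kwargs()
    # Assumption: the format name matches a ContentFormat enum member.
    kwargs["content_format"] = ContentFormat[settings.content_format]
    return ConfluenceLoader(**kwargs).load()
```

Only `url`, `username`, `api_key`, and `space_key` need to be supplied; the remaining fields fall back to the documented defaults.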

## GitLoader

The GitLoader component uses the GitLoader from LangChain to fetch and load documents from a specified Git repository.

### Parameters

#### Inputs:

| Name | Display Name | Info |
| --- | --- | --- |
| repo_path | Repository Path | The local path to the Git repository |
| clone_url | Clone URL | The URL to clone the Git repository from (optional) |
| branch | Branch | The branch to load files from (default: 'main') |
| file_filter | File Filter | Patterns to filter files (e.g., '.py' to include only .py files, '!.py' to exclude .py files) |
| content_filter | Content Filter | A regex pattern to filter files based on their content |

#### Outputs:

| Name | Display Name | Info |
| --- | --- | --- |
| data | Data | List of Data objects containing the loaded Git repository documents |
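
The File Filter and Content Filter inputs can be pictured as a single predicate applied to each file. The sketch below is one possible reading, not the component's implementation: it assumes each pattern is matched as a path suffix, that a leading `!` marks an exclusion, and that the content regex is applied with `re.search`. `build_file_filter` is a hypothetical helper.

```python
import re
from typing import Callable, Iterable, Optional


def build_file_filter(
    patterns: Iterable[str],
    content_regex: Optional[str] = None,
) -> Callable[..., bool]:
    """Build a predicate that decides whether a file should be loaded."""
    patterns = list(patterns)
    # Assumption: a leading "!" marks an exclusion pattern.
    includes = [p for p in patterns if not p.startswith("!")]
    excludes = [p[1:] for p in patterns if p.startswith("!")]
    compiled = re.compile(content_regex) if content_regex else None

    def keep(path: str, text: str = "") -> bool:
        # Assumption: patterns are compared against the end of the path.
        if any(path.endswith(p) for p in excludes):
            return False
        if includes and not any(path.endswith(p) for p in includes):
            return False
        # The content filter only applies when a regex was provided.
        if compiled is not None and compiled.search(text) is None:
            return False
        return True

    return keep


only_python = build_file_filter([".py"])
no_python = build_file_filter(["!.py"])
print(only_python("src/app.py"), no_python("src/app.py"))
```

LangChain's GitLoader itself takes a `file_filter` callable that receives a file path, so a predicate like `keep` (called with the path only) could be passed through in that form.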

## Unstructured

The Unstructured component uses the [Unstructured](https://unstructured.io/) library to load and parse PDF, DOCX, and TXT files into structured data. It works with both the open-source library and the Unstructured API.

### Parameters

#### Inputs:

| Name | Display Name | Info |
| --- | --- | --- |
| file | File | The path to the file to be parsed (supported types: pdf, docx, txt) |
| api_key | API Key | Unstructured API Key (optional, if not provided, open-source library will be used) |

#### Outputs:

| Name | Display Name | Info |
| --- | --- | --- |
| data | Data | List of Data objects containing the parsed content from the input file |
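
The API Key input decides which backend parses the file. A minimal sketch of that decision follows, assuming the three file types listed in the inputs table. The function `choose_backend` and the labels `"api"` and `"open-source"` are hypothetical and do not come from the component's source.

```python
from pathlib import Path
from typing import Optional

# File types listed as supported in the inputs table.
SUPPORTED_SUFFIXES = {".pdf", ".docx", ".txt"}


def choose_backend(file_path: str, api_key: Optional[str] = None) -> str:
    """Return which backend would parse the file under the rules above."""
    suffix = Path(file_path).suffix.lower()
    if suffix not in SUPPORTED_SUFFIXES:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
    # With no key, fall back to the open-source library.
    return "api" if api_key else "open-source"
```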

As of now, this is correct. However, in an upcoming release (1.0.19), this should be updated to:
| api_key | API Key | Unstructured API Key. Create at: https://app.unstructured.io/ |
since the key will be required. I think it's okay to leave this as is for the interim period, though.