docs: add Loaders components page #3672

Merged 3 commits on Sep 3, 2024
68 changes: 68 additions & 0 deletions docs/docs/Components/components-loaders.md
@@ -0,0 +1,68 @@
# Loaders

Loaders are components used to load documents from various sources, such as databases, websites, and local files. They can be used to fetch data from external sources and convert it into a format that can be processed by other components.

## Confluence

The Confluence component connects to the Confluence wiki collaboration platform to load and process documents. It uses LangChain's ConfluenceLoader to fetch content from a specified Confluence space.

### Parameters

#### Inputs:

| Name | Display Name | Info |
| --- | --- | --- |
| url | Site URL | The base URL of the Confluence Space (e.g., `https://<company>.atlassian.net/wiki`) |
| username | Username | Atlassian User E-mail (e.g., email@example.com) |
| api_key | API Key | Atlassian API Key (Create at: https://id.atlassian.com/manage-profile/security/api-tokens) |
| space_key | Space Key | The key of the Confluence space to access |
| cloud | Use Cloud? | Whether to use Confluence Cloud (default: true) |
| content_format | Content Format | Specify content format (default: STORAGE) |
| max_pages | Max Pages | Maximum number of pages to retrieve (default: 1000) |

#### Outputs:

| Name | Display Name | Info |
| --- | --- | --- |
| data | Data | List of Data objects containing the loaded Confluence documents |
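
As a rough illustration of how the inputs and defaults in the table fit together, here is a minimal sketch. `ConfluenceInputs` and `to_kwargs` are hypothetical names used only for this example, not part of the component. The field names mirror the Inputs table and are assumed to line up with the keyword arguments of LangChain's ConfluenceLoader, which should be checked against the installed version.

```python
from dataclasses import asdict, dataclass


@dataclass
class ConfluenceInputs:
    """Hypothetical container; field names mirror the Inputs table."""

    url: str        # base URL of the Confluence site
    username: str   # Atlassian account e-mail
    api_key: str    # Atlassian API key
    space_key: str  # key of the space to load
    cloud: bool = True               # documented default
    content_format: str = "STORAGE"  # documented default
    max_pages: int = 1000            # documented default

    def to_kwargs(self) -> dict:
        """Return the inputs as a plain dict after basic sanity checks."""
        for name in ("url", "username", "api_key", "space_key"):
            if not getattr(self, name):
                raise ValueError(f"'{name}' must not be empty")
        if self.max_pages < 1:
            raise ValueError("'max_pages' must be at least 1")
        return asdict(self)


# Example usage with placeholder values.
inputs = ConfluenceInputs(
    url="https://example.atlassian.net/wiki",
    username="email@example.com",
    api_key="placeholder-token",
    space_key="DOCS",
)
loader_kwargs = inputs.to_kwargs()
```

Only the four required values need to be supplied; the remaining fields fall back to the defaults listed in the table.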

## GitLoader

The GitLoader component uses the GitLoader from LangChain to fetch and load documents from a specified Git repository.

### Parameters

#### Inputs:

| Name | Display Name | Info |
| --- | --- | --- |
| repo_path | Repository Path | The local path to the Git repository |
| clone_url | Clone URL | The URL to clone the Git repository from (optional) |
| branch | Branch | The branch to load files from (default: 'main') |
| file_filter | File Filter | Patterns to filter files (e.g., '.py' to include only .py files, '!.py' to exclude .py files) |
| content_filter | Content Filter | A regex pattern to filter files based on their content |

#### Outputs:

| Name | Display Name | Info |
| --- | --- | --- |
| data | Data | List of Data objects containing the loaded Git repository documents |
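
The two filter inputs can be pictured with a small sketch. This is an assumption-laden illustration, not the component's implementation: `make_filter` is a hypothetical helper, patterns are assumed to be matched as filename suffixes, and a leading `!` is assumed to negate a pattern, following the examples in the table. The component's real matching rules may differ (for example, glob-style patterns).

```python
import re
from typing import Callable, Iterable, Optional


def make_filter(
    file_patterns: Iterable[str],
    content_regex: Optional[str] = None,
) -> Callable[[str, str], bool]:
    """Build a predicate deciding whether a (path, content) pair is kept."""
    patterns = list(file_patterns)
    includes = [p for p in patterns if not p.startswith("!")]
    excludes = [p[1:] for p in patterns if p.startswith("!")]
    compiled = re.compile(content_regex) if content_regex else None

    def keep(path: str, content: str) -> bool:
        # Exclusion patterns win over inclusion patterns.
        if any(path.endswith(suffix) for suffix in excludes):
            return False
        # With at least one inclusion pattern, the path must match one of them.
        if includes and not any(path.endswith(suffix) for suffix in includes):
            return False
        # The content filter keeps only files whose text matches the regex.
        if compiled is not None and compiled.search(content) is None:
            return False
        return True

    return keep


# Example usage: keep only .py files that define at least one function.
keep = make_filter([".py"], content_regex=r"def \w+\(")
selected = keep("src/app.py", "def main():\n    pass")
```

Returning a predicate keeps the file filter and the content filter in one place, so a file is loaded only when both conditions hold.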

## Unstructured

This component uses the [Unstructured](https://unstructured.io/) library to load and parse PDF, DOCX, and TXT files into structured data. It works with both the open-source library and the Unstructured API.

### Parameters

#### Inputs:

| Name | Display Name | Info |
| --- | --- | --- |
| file | File | The path to the file to be parsed (supported types: pdf, docx, txt) |
| api_key | API Key | Unstructured API Key (optional; if not provided, the open-source library is used) |

Review comment (Collaborator):

As of now, this is correct. However, in an upcoming release (1.0.19) the key will be required, so this row should be updated to:

| api_key | API Key | Unstructured API Key. Create at: https://app.unstructured.io/ |

I think it's okay to leave this as is for the interim period, though.


#### Outputs:

| Name | Display Name | Info |
| --- | --- | --- |
| data | Data | List of Data objects containing the parsed content from the input file |
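
The choice between the open-source library and the Unstructured API, as this page describes it, can be sketched as follows. `plan_parse` and `SUPPORTED_EXTENSIONS` are hypothetical names for illustration only; the sketch does not call the Unstructured library. It only shows the documented rule that a missing API key selects the open-source path, and that only pdf, docx, and txt files are accepted.

```python
from pathlib import Path
from typing import Optional

# File types listed as supported in the Inputs table.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt"}


def plan_parse(file: str, api_key: Optional[str] = None) -> dict:
    """Validate the file type and pick a backend (hypothetical helper)."""
    extension = Path(file).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {extension or '(none)'}")
    # No key: fall back to the open-source library, as documented here.
    backend = "api" if api_key else "open-source"
    return {"file": file, "backend": backend}


# Example usage with a placeholder path and no API key.
plan = plan_parse("report.pdf")
```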