
components llm_ingest_dbcopilot_faiss_e2e


Data Ingestion for DB Data Output to FAISS E2E Deployment

llm_ingest_dbcopilot_faiss_e2e

Overview

Single-job pipeline that chunks data from an AzureML DB datastore and creates a FAISS embeddings index.

Version: 0.0.57

View in Studio: https://ml.azure.com/registries/azureml/components/llm_ingest_dbcopilot_faiss_e2e/version/0.0.57
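
As a minimal sketch (not part of the generated component spec), the component can be loaded from the public azureml registry with the Azure ML Python SDK v2. It assumes `DefaultAzureCredential` can authenticate in your environment:

```python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Client scoped to the public "azureml" registry that hosts this component.
registry_client = MLClient(
    credential=DefaultAzureCredential(),
    registry_name="azureml",
)

# Retrieve the component at the pinned version shown above.
ingest_component = registry_client.components.get(
    name="llm_ingest_dbcopilot_faiss_e2e",
    version="0.0.57",
)
```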

Inputs

| Name | Description | Type | Default | Optional | Enum |
| ---- | ----------- | ---- | ------- | -------- | ---- |
| db_datastore | Database datastore URI in the format 'azureml://datastores/{datastore_name}' | string | | | |
| sample_data | Sample data to be used for data ingestion. Format: 'azureml:samples-test:1' | uri_folder | path: "azureml:samples-test:1" | True | |

data ingest settings

| Name | Description | Type | Default | Optional | Enum |
| ---- | ----------- | ---- | ------- | -------- | ---- |
| embeddings_model | The model used to generate embeddings. Format: 'azure_open_ai://endpoint/{endpoint_name}/deployment/{deployment_name}/model/{model_name}' | string | | | |
| chat_aoai_deployment_name | The name of the chat AOAI deployment | string | | True | |
| embedding_aoai_deployment_name | The name of the embedding AOAI deployment | string | | | |

grounding settings

| Name | Description | Type | Default | Optional | Enum |
| ---- | ----------- | ---- | ------- | -------- | ---- |
| max_tables | | integer | | True | |
| max_columns | | integer | | True | |
| max_rows | | integer | | True | |
| max_sampling_rows | | integer | | True | |
| max_text_length | | integer | | True | |
| max_knowledge_pieces | | integer | | True | |
| selected_tables | The list of tables to be ingested. If not specified, all tables will be ingested. Format: ["table1","table2","table3"] | string | | True | |
| column_settings | | string | | True | |

copilot settings

| Name | Description | Type | Default | Optional | Enum |
| ---- | ----------- | ---- | ------- | -------- | ---- |
| tools | The names of the tools for dbcopilot. Supported tools: "tsql", "python". Format: ["tsql", "python"] | string | | True | |

deploy settings

| Name | Description | Type | Default | Optional | Enum |
| ---- | ----------- | ---- | ------- | -------- | ---- |
| endpoint_name | The name of the endpoint | string | | | |
| deployment_name | The name of the deployment | string | blue | | |
| mir_environment | The name of the MIR environment. Format: azureml://registries/{registry_name}/environments/llm-dbcopilot-mir | string | | | |

compute settings

| Name | Description | Type | Default | Optional | Enum |
| ---- | ----------- | ---- | ------- | -------- | ---- |
| serverless_instance_count | | integer | 1 | True | |
| serverless_instance_type | | string | Standard_DS3_v2 | True | |
| embedding_connection | Azure OpenAI workspace connection ARM ID for embeddings | string | | True | |
| llm_connection | Azure OpenAI workspace connection ARM ID for the LLM | string | | True | |
| temperature | | number | 0.0 | True | |
| top_p | | number | 0.0 | True | |
| include_builtin_examples | | boolean | True | True | |
| knowledge_pieces | The list of knowledge pieces to be used for grounding. | string | | True | |
| include_views | Whether to turn on views. | boolean | | True | |
| instruct_template | The instruct template for the LLM. | string | | True | |
| managed_identity_enabled | Whether to connect using managed identity. | boolean | False | True | |
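
The sketch below, which reuses the `ingest_component` handle from the earlier snippet, shows one way the inputs above might be wired into a pipeline job. All angle-bracketed values are hypothetical placeholders, not component defaults, and only a subset of the inputs is shown:

```python
from azure.ai.ml import MLClient
from azure.ai.ml.dsl import pipeline
from azure.identity import DefaultAzureCredential

# Workspace client used to submit the job; all IDs below are placeholders.
ws_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

@pipeline()
def dbcopilot_faiss_ingest():
    # Every value here is illustrative, not a component default.
    step = ingest_component(
        db_datastore="azureml://datastores/<db_datastore_name>",
        embeddings_model="azure_open_ai://endpoint/<aoai-endpoint>/deployment/<embedding-deployment>/model/<model-name>",
        embedding_aoai_deployment_name="<embedding-deployment>",
        chat_aoai_deployment_name="<chat-deployment>",
        embedding_connection="<embedding-connection-arm-id>",
        llm_connection="<llm-connection-arm-id>",
        endpoint_name="<mir-endpoint-name>",
        deployment_name="blue",
        mir_environment="azureml://registries/azureml/environments/llm-dbcopilot-mir",
    )
    # Expose the component outputs as pipeline outputs.
    return {
        "grounding_index": step.outputs.grounding_index,
        "db_context": step.outputs.db_context,
    }

job = ws_client.jobs.create_or_update(
    dbcopilot_faiss_ingest(), experiment_name="dbcopilot-faiss-ingest"
)
```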

Outputs

| Name | Description | Type |
| ---- | ----------- | ---- |
| grounding_index | | uri_folder |
| db_context | | uri_folder |