
feat(wren-ai-service): sql pairs #961

Open · wants to merge 52 commits into main
Conversation

@cyyeh (Member) commented Nov 28, 2024

  • new apis:
    • POST /sql-pairs-preparations
    • DELETE /sql-pairs-preparations
    • GET /sql-pairs-preparations/{sql_pairs_preparation_id}/status
  • remove async_timer, timer
  • rename historical_question to historical_question_retrieval
  • remove user_id, since we don't use it in apis
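To make the API surface above concrete, here is an illustrative sketch of plausible request/response shapes for the new endpoints, based on the model names that appear later in this review (SqlPairsPreparationRequest, DeleteSqlPairsResponse). Field names other than sql_pairs_preparation_id are assumptions for illustration, not the actual service code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SqlPair:
    id: str
    sql: str
    question: str = ""  # assumed field


@dataclass
class SqlPairsPreparationRequest:
    sql_pairs: List[SqlPair]
    query_id: Optional[str] = None


@dataclass
class SqlPairsPreparationResponse:
    sql_pairs_preparation_id: str


@dataclass
class DeleteSqlPairsResponse:
    sql_pairs_preparation_id: str


def status_path(preparation_id: str) -> str:
    # the GET status endpoint is addressed by preparation id
    return f"/sql-pairs-preparations/{preparation_id}/status"
```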

Summary by CodeRabbit

  • New Features

    • Introduced new pipeline stages for SQL pair handling: sql_pairs_preparation, sql_pairs_deletion, and sql_pairs_retrieval.
    • Added endpoints for managing SQL pairs preparation and deletion in the API.
  • Bug Fixes

    • Enhanced error handling and logging in several services to improve clarity and robustness.
  • Documentation

    • Updated README and configuration files to reflect new pipeline entries and modifications.
  • Chores

    • Removed deprecated settings and consolidated imports across multiple files for better organization.

@cyyeh cyyeh added module/ai-service ai-service related ci/ai-service ai-service related labels Nov 28, 2024
@cyyeh cyyeh force-pushed the feat/ai-service/sql-pairs branch from 0d5aa5d to bf2515b Compare November 29, 2024 05:35
@cyyeh cyyeh marked this pull request as ready for review November 29, 2024 05:39
@cyyeh cyyeh requested a review from paopa November 29, 2024 05:40
@cyyeh cyyeh force-pushed the feat/ai-service/sql-pairs branch from 1f952c1 to 3142348 Compare November 29, 2024 05:46
@cyyeh cyyeh force-pushed the feat/ai-service/sql-pairs branch from 3142348 to 63999d4 Compare December 3, 2024 05:12
Comment on lines 155 to 158
- name: sql_pairs_preparation
document_store: qdrant
embedder: openai_embedder.text-embedding-3-large
llm: litellm_llm.gpt-4o-mini
Member:

How about renaming it to sql_pairs_indexing? I think that would be more consistent with the style of the other pipelines.

Comment on lines 103 to 113
- document_store: qdrant
embedder: openai_embedder.text-embedding-3-large
llm: litellm_llm.gpt-4o-mini
name: sql_pairs_preparation
- document_store: qdrant
embedder: openai_embedder.text-embedding-3-large
name: sql_pairs_deletion
- document_store: qdrant
embedder: openai_embedder.text-embedding-3-large
name: sql_pairs_retrieval
llm: litellm_llm.gpt-4o-mini
Member:

Would you move the pipe name to the top of each entry and sort the entries? That would be easier for us to read.

Member Author:

fixed
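The reordering the reviewer asked for can be done mechanically. A small sketch, assuming the config has already been parsed (e.g. by a YAML loader) into a list of dicts:

```python
def normalize_pipes(pipes: list[dict]) -> list[dict]:
    # put the "name" key first in each entry, then sort entries by name
    reordered = [
        {"name": p["name"], **{k: v for k, v in p.items() if k != "name"}}
        for p in pipes
    ]
    return sorted(reordered, key=lambda p: p["name"])


pipes = [
    {"document_store": "qdrant", "name": "sql_pairs_preparation"},
    {"document_store": "qdrant", "name": "sql_pairs_deletion"},
]
normalized = normalize_pipes(pipes)
# each entry now lists "name" first, and entries are sorted alphabetically
```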

def __init__(self, sql_pairs_store: DocumentStore) -> None:
self._sql_pairs_store = sql_pairs_store

@component.output_types(documents=List[Document])
Member:

I'm curious about the output type. Will it return a list of documents?

Member Author:

fixed
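For context on the decorator under discussion: in Haystack-style pipelines, @component.output_types declares the schema of the dict that a component's run() returns, so the pipeline can validate connections between components. A minimal stand-in that illustrates the mechanism (this is not the real haystack-ai implementation):

```python
from typing import List


class Document:  # placeholder for the real Document type
    pass


def output_types(**types):
    # record the declared output schema on the run method
    def decorator(fn):
        fn.__output_types__ = types
        return fn
    return decorator


class SqlPairsCleaner:
    @output_types(documents=List[Document])
    def run(self, sql_pair_ids):
        # would delete matching documents and return the remaining ones
        return {"documents": []}


# the declared schema is inspectable by the pipeline machinery
schema = SqlPairsCleaner.run.__output_types__
```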

Comment on lines 46 to 61
def visualize(self, sql_pair_ids: List[str], id: Optional[str] = None) -> None:
destination = "outputs/pipelines/indexing"
if not Path(destination).exists():
Path(destination).mkdir(parents=True, exist_ok=True)

self._pipe.visualize_execution(
["delete_sql_pairs"],
output_file_path=f"{destination}/sql_pairs_deletion.dot",
inputs={
"sql_pair_ids": sql_pair_ids,
"id": id or "",
**self._components,
},
show_legend=True,
orient="LR",
)
Member:

Gentle reminder: please remove this method, thanks!

Member Author:

done


@observe(name="SQL Pairs Deletion")
async def run(
self, sql_pair_ids: List[str], id: Optional[str] = None
Member:

I'm considering using project_id instead of id to make the naming more purposeful. WDYT?

"""


@router.post("/sql-pairs-preparations")
Member:

How about removing -preparations? I think it's a bit redundant.

Member Author:

fixed

return DeleteSqlPairsResponse(sql_pairs_preparation_id=id)


@router.get("/sql-pairs-preparations/{sql_pairs_preparation_id}/status")
Member:

I think we could remove /status to make it more concise.

Member Author:

fixed

Member:

Could you remove this test file? It seems to duplicate test_historical_question.

Comment on lines 3 to 4
from src.pipelines.generation import intent_classification, sql_generation, sql_summary
-from src.pipelines.retrieval import historical_question, retrieval
+from src.pipelines.retrieval import historical_question_retrieval, retrieval
Member:

I think we could import directly from the generation and retrieval modules. Could you also update the import?

Member Author:

fixed

@@ -14,7 +14,7 @@
sql_correction,
sql_generation,
)
-from src.pipelines.retrieval import historical_question, retrieval
+from src.pipelines.retrieval import historical_question_retrieval, retrieval
Member:

same as the previous comment, thanks!

Member Author:

fixed

coderabbitai bot commented Dec 26, 2024

Walkthrough

This pull request introduces significant enhancements to the Wren AI service, focusing on SQL pairs management, pipeline restructuring, and removal of timer-related functionality. The changes include new pipelines for SQL pairs preparation, deletion, and retrieval, a consolidation of module imports, and the removal of the enable_timer setting across multiple components. The modifications streamline the codebase, improve modularity, and introduce more flexible SQL pair handling capabilities.

Changes

  • Configuration Files: added new pipeline entries for SQL pairs preparation, deletion, and retrieval; removed the enable_timer setting; renamed some pipeline components
  • Pipeline Components: introduced new classes for SQL pairs management; removed the async_timer and timer decorators; restructured imports from the generation and retrieval modules
  • Web Services: removed user_id from various request models; updated service class names for consistency; removed timing decorators
  • Testing: updated test fixtures and mocks; added tests for SQL pairs preparation and deletion

Poem

🐰 Hop, hop, through code so bright,
SQL pairs dance with pure delight!
Decorators shed, timers cease,
Pipelines flow with newfound peace.
A rabbit's refactor, clean and neat! 🌟


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1afc694 and 7d96661.

📒 Files selected for processing (1)
  • wren-ai-service/src/pipelines/generation/intent_classification.py (0 hunks)
💤 Files with no reviewable changes (1)
  • wren-ai-service/src/pipelines/generation/intent_classification.py


@coderabbitai bot left a comment

Actionable comments posted: 9

🧹 Nitpick comments (41)
wren-ai-service/src/web/v1/services/chart_adjustment.py (1)

9-9: Confirmed import replacement from async_timer to trace_metadata.

Previously, the async_timer import was used for timing. Now you’ve replaced it with trace_metadata, which focuses on tracing rather than capturing performance metrics. This is okay if timing is no longer needed. If performance measurement or timing is still required, consider adding an alternative.
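If execution timing is still wanted after removing async_timer, a minimal stdlib replacement could look like the following. This is a sketch only; the original decorator's exact behavior is not shown in this PR.

```python
import asyncio
import functools
import logging
import time

logger = logging.getLogger(__name__)


def async_timed(fn):
    # log wall-clock duration of an async callable, even on failure
    @functools.wraps(fn)
    async def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return await fn(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            logger.info("%s took %.3fs", fn.__name__, elapsed)
    return wrapper


@async_timed
async def do_work():
    await asyncio.sleep(0.01)
    return "ok"


result = asyncio.run(do_work())
```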

wren-ai-service/src/pipelines/retrieval/sql_pairs_retrieval.py (6)

1-16: Ensure consistent logging strategy for debugging and production.
While the imports and logging setup appear fine, consider whether logger.setLevel(...) or additional contextual logs are needed to help debug issues in production, especially given the asynchronous nature of this pipeline.


18-34: Enhance documentation for OutputFormatter.
This class is straightforward, but consider adding a docstring or clarifying comment explaining how the output structure is intended to be consumed by downstream processes.


61-82: Ensure filter conditions align with Qdrant indexing.
If id is empty or None, the filters object is null. Confirm that an unfiltered retrieval is appropriate. This could cause performance/memory issues when the index is large.


84-90: Parameterize score thresholds in filtered_documents.
The filter uses score=0.7 as a hard-coded threshold. Consider making it configurable, possibly from environment variables or pipeline settings, to allow more flexible tuning.
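One way to apply this suggestion: read the threshold from an environment variable with the current value as the default. The variable name below is an assumption for illustration, not an existing setting.

```python
import os

# fall back to the current hard-coded value when the variable is unset
SCORE_THRESHOLD = float(os.getenv("SQL_PAIRS_RETRIEVAL_SCORE_THRESHOLD", "0.7"))


def filter_documents(documents: list[dict], threshold: float = SCORE_THRESHOLD):
    # keep only documents whose retrieval score meets the threshold
    return [d for d in documents if d.get("score", 0.0) >= threshold]


docs = [{"id": "a", "score": 0.9}, {"id": "b", "score": 0.5}]
kept = filter_documents(docs, 0.7)
```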


105-127: Add docstring for the SqlPairsRetrieval class.
This class orchestrates the retrieval pipeline. A brief docstring explaining the main usage, inputs, outputs, and expected environment would help future maintainers.


141-148: Remove or secure the main block in production.
This main block is useful for local testing but may not be intended for production. Consider guarding it with an if DEBUG: check or removing it in deployment contexts.

wren-ai-service/src/web/v1/services/sql_pairs_preparation.py (6)

1-12: Consistently handle exceptions in service modules.
While logging is present, consider a centralized error-handling mechanism or standardized exception classes for consistent error responses across all service modules.


34-37: Validate the necessity of a dedicated response model.
SqlPairsPreparationResponse only has a single field. Consider whether it could be combined with the request model or a more general wrapper.


39-55: Service layering for deletion.
DeleteSqlPairsRequest and DeleteSqlPairsResponse mirror the prepare request/response. Validate usage to avoid duplication. If differences are minimal, a shared model might be more maintainable.


62-69: Assess error details in SqlPairsPreparationStatusResponse.
SqlPairsPreparationError uses a very general code OTHERS. If more error categories exist or are needed, consider enumerating them for clearer telemetry and troubleshooting.


71-81: Ensure TTLCache suits concurrency and memory constraints.
The cache is set to a potentially large maxsize=1_000_000. Confirm that this usage pattern doesn’t cause memory issues under high load.
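To illustrate the memory trade-off being raised: a TTL cache bounds memory only through its maxsize, so a very large bound offers little protection. A tiny stdlib sketch of the eviction behavior (not the cachetools implementation the service actually uses):

```python
import time
from collections import OrderedDict


class TinyTTLCache:
    def __init__(self, maxsize: int, ttl: float):
        self.maxsize, self.ttl = maxsize, ttl
        self._data = OrderedDict()  # key -> (expiry, value)

    def __setitem__(self, key, value):
        self._data[key] = (time.monotonic() + self.ttl, value)
        self._data.move_to_end(key)
        while len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict the oldest entry

    def get(self, key, default=None):
        item = self._data.get(key)
        if item is None or item[0] < time.monotonic():
            self._data.pop(key, None)  # drop expired entries lazily
            return default
        return item[1]


cache = TinyTTLCache(maxsize=2, ttl=60)
cache["a"], cache["b"], cache["c"] = 1, 2, 3  # "a" is evicted by the bound
```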


169-188: Distinct logging for “id not found” in get_prepare_sql_pairs_status.
Currently logs as an exception. Consider logging at a warning or info level if it’s expected that some requests might query missing IDs.

wren-ai-service/src/pipelines/indexing/sql_pairs_preparation.py (6)

1-21: Add docstrings or references for domain usage.
The imports and usage are not fully explained in the code. For maintainability, clarify in docstrings how these providers and classes fit into the broader SQL pairs preparation architecture.


80-87: Document usage of prompts function.
This function transforms raw SQL pairs into prompt dicts. A short docstring explaining the rationale for the prompt_builder would help future maintainers.


89-99: Parallel tasks with gather: handle partial failures.
If one generation task fails, asyncio.gather() raises, discarding all results. Consider using return_exceptions=True or a fallback mechanism if partial progress is acceptable.
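What the suggestion looks like in practice: with return_exceptions=True, asyncio.gather places exceptions in the result list instead of raising and discarding the successful results.

```python
import asyncio


async def generate(i: int) -> str:
    if i == 1:
        raise ValueError("generation failed")
    return f"result-{i}"


async def main() -> list:
    results = await asyncio.gather(
        *(generate(i) for i in range(3)), return_exceptions=True
    )
    # keep partial progress; collect the failures separately for logging
    ok = [r for r in results if not isinstance(r, BaseException)]
    failed = [r for r in results if isinstance(r, BaseException)]
    return [ok, failed]


ok, failed = asyncio.run(main())
```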


126-134: Check embedding context.
If the documents are large, consider chunking or partial embedding strategies. Large documents can cause memory or performance bottlenecks.


157-160: Expand on field usage in SqlIntentionGenerationResult.
Currently, only intention is stored. If more fields are needed later (e.g., potential error code, data lineage), consider extending this model now for future-proofing.


173-217: Centralize pipeline component initialization.
SqlPairsPreparation includes numerous components in _components. If more logic is added, consider splitting into separate factory methods or classes to maintain clarity.

wren-ai-service/src/globals.py (13)

9-9: Optimize import style.
While wildcard imports can be convenient, consider using explicit imports for clarity and to avoid naming conflicts, especially in large codebases.


20-21: Consistent naming for new services.
SqlExplanationService and SqlPairsPreparationService follow a different naming style from older classes. Confirm naming guidelines to keep the codebase consistent.


39-41: Documentation for newly introduced class members.
sql_explanation_service, sql_regeneration_service, and sql_pairs_preparation_service are newly added. Add docstrings describing their purpose.


61-61: Track semantic version changes for generation pipeline.
Modules like generation.SemanticsDescription are newly integrated. If new features are introduced, reflect that in service versioning or changelogs.


84-87: Standardize specialized pipeline references.
intent_classification and data_assistance are grouped here. Confirm these belong in generation pipelines. Possibly rename them to reflect domain usage.


100-102: Assess performance implications of sql_pairs_retrieval.
It’s newly introduced, and retrieving large volumes of pairs can be memory-intensive. Evaluate the need for pagination or partial retrieval.


119-122: Guard or configure chart generation.
Chart generation can be resource-hungry. If runtime constraints exist, consider gating or caching repeated chart generation requests.


130-133: Enforce consistent naming between chart_generation and chart_adjustment.
One pipeline is chart_generation, the other is chart_adjustment. Validate that this difference is intentional and documented.


152-155: Docstring improvement for ask_details_service.
Explain how sql_breakdown and sql_summary pipelines collectively handle the request.


174-180: Consolidate naming for sql_summary.
sql_summary is used in multiple services. If they differ in usage or logic, clarify via better naming or shared references.


182-188: Potential to unify explanation logic.
sql_explanation_service might overlap partially with sql_summary. If so, consider unifying or layering them to reduce duplication.


190-198: Document advanced usage for sql_regeneration_service.
Regeneration can be complex. Provide docstrings or references regarding the logic behind regenerating queries.


221-231: Integration tests for SQL pairs prep.
Before finalizing, ensure integration tests are available that call sql_pairs_preparation_service end to end, verifying typical and edge use cases (e.g., empty requests, large-scale requests).

wren-ai-service/src/pipelines/retrieval/__init__.py (1)

1-5: Import clarity for new retrieval classes.
You have introduced five new imports here. Ensure each class is truly needed by external modules and that no cyclical dependencies occur.

wren-ai-service/src/pipelines/indexing/sql_pairs_deletion.py (1)

59-67: Dry run invocation
Providing a main block for quick debugging or demonstration is helpful. Any advanced dry-run logic (e.g., fetching test data) could be added if needed.

wren-ai-service/src/pipelines/retrieval/historical_question_retrieval.py (1)

89-89: [Parameterization suggestion] The threshold value 0.9 might be preferable as a configurable setting.
Hardcoding score=0.9 may hamper flexibility if different retrieval contexts demand varying thresholds. Consider passing this as a parameter or configuring it externally for better adaptability.

wren-ai-service/src/pipelines/generation/sql_generation.py (1)

58-62: Include sanitization or validation for sample data.

These lines introduce a template section for SQL samples. Consider validating or sanitizing the sample.summary and sample.sql fields to avoid potential injection or formatting issues in downstream usage.

wren-ai-service/src/web/v1/services/ask.py (1)

270-276: New sql_pairs_retrieval pipeline usage.
The introduction of sql_pairs_retrieval is coherent with the PR’s objective to manage SQL pairs. Ensure you handle empty results or errors consistently in the pipeline.

docker/config.example.yaml (1)

113-113: Fix trailing whitespace.

Remove the trailing space after openai_embedder.text-embedding-3-large.

-    embedder: openai_embedder.text-embedding-3-large 
+    embedder: openai_embedder.text-embedding-3-large
🧰 Tools
🪛 yamllint (1.35.1)

[error] 113-113: trailing spaces

(trailing-spaces)

wren-ai-service/src/pipelines/indexing/__init__.py (1)

90-110: Consider handling the return value from delete_documents.

The implementation looks good, but the return value from delete_documents might contain useful information about the number of documents deleted. Consider either:

  1. Removing the explicit return statement since it's not needed, or
  2. Returning the result to the caller for potential logging or verification.
-        return await self._sql_pairs_store.delete_documents(filters)
+        result = await self._sql_pairs_store.delete_documents(filters)
+        logger.info(f"Deleted {result} SQL pair documents")
+        return result
wren-ai-service/tools/config/config.example.yaml (1)

131-131: Fix trailing whitespace.

Remove the trailing space after openai_embedder.text-embedding-3-large.

-    embedder: openai_embedder.text-embedding-3-large 
+    embedder: openai_embedder.text-embedding-3-large
🧰 Tools
🪛 yamllint (1.35.1)

[error] 131-131: trailing spaces

(trailing-spaces)

wren-ai-service/tools/config/config.full.yaml (1)

150-150: Remove trailing whitespace

There is a trailing space after openai_embedder.text-embedding-3-large.

-    embedder: openai_embedder.text-embedding-3-large 
+    embedder: openai_embedder.text-embedding-3-large
🧰 Tools
🪛 yamllint (1.35.1)

[error] 150-150: trailing spaces

(trailing-spaces)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bd66a8f and 1afc694.

📒 Files selected for processing (65)
  • deployment/kustomizations/base/cm.yaml (2 hunks)
  • docker/config.example.yaml (1 hunks)
  • wren-ai-service/README.md (0 hunks)
  • wren-ai-service/docs/configuration.md (0 hunks)
  • wren-ai-service/eval/evaluation.py (1 hunks)
  • wren-ai-service/eval/pipelines.py (2 hunks)
  • wren-ai-service/src/config.py (0 hunks)
  • wren-ai-service/src/globals.py (8 hunks)
  • wren-ai-service/src/pipelines/generation/__init__.py (1 hunks)
  • wren-ai-service/src/pipelines/generation/chart_adjustment.py (1 hunks)
  • wren-ai-service/src/pipelines/generation/chart_generation.py (0 hunks)
  • wren-ai-service/src/pipelines/generation/data_assistance.py (0 hunks)
  • wren-ai-service/src/pipelines/generation/followup_sql_generation.py (6 hunks)
  • wren-ai-service/src/pipelines/generation/intent_classification.py (0 hunks)
  • wren-ai-service/src/pipelines/generation/sql_answer.py (0 hunks)
  • wren-ai-service/src/pipelines/generation/sql_breakdown.py (0 hunks)
  • wren-ai-service/src/pipelines/generation/sql_correction.py (0 hunks)
  • wren-ai-service/src/pipelines/generation/sql_expansion.py (0 hunks)
  • wren-ai-service/src/pipelines/generation/sql_explanation.py (0 hunks)
  • wren-ai-service/src/pipelines/generation/sql_generation.py (4 hunks)
  • wren-ai-service/src/pipelines/generation/sql_regeneration.py (0 hunks)
  • wren-ai-service/src/pipelines/generation/sql_summary.py (0 hunks)
  • wren-ai-service/src/pipelines/indexing/__init__.py (1 hunks)
  • wren-ai-service/src/pipelines/indexing/sql_pairs_deletion.py (1 hunks)
  • wren-ai-service/src/pipelines/indexing/sql_pairs_preparation.py (1 hunks)
  • wren-ai-service/src/pipelines/retrieval/__init__.py (1 hunks)
  • wren-ai-service/src/pipelines/retrieval/historical_question_retrieval.py (5 hunks)
  • wren-ai-service/src/pipelines/retrieval/preprocess_sql_data.py (0 hunks)
  • wren-ai-service/src/pipelines/retrieval/retrieval.py (0 hunks)
  • wren-ai-service/src/pipelines/retrieval/sql_executor.py (0 hunks)
  • wren-ai-service/src/pipelines/retrieval/sql_pairs_retrieval.py (1 hunks)
  • wren-ai-service/src/utils.py (0 hunks)
  • wren-ai-service/src/web/development.py (0 hunks)
  • wren-ai-service/src/web/v1/routers/__init__.py (2 hunks)
  • wren-ai-service/src/web/v1/routers/ask.py (0 hunks)
  • wren-ai-service/src/web/v1/routers/ask_details.py (0 hunks)
  • wren-ai-service/src/web/v1/routers/semantics_preparation.py (0 hunks)
  • wren-ai-service/src/web/v1/routers/sql_answers.py (0 hunks)
  • wren-ai-service/src/web/v1/routers/sql_expansions.py (0 hunks)
  • wren-ai-service/src/web/v1/routers/sql_explanations.py (0 hunks)
  • wren-ai-service/src/web/v1/routers/sql_pairs_preparation.py (1 hunks)
  • wren-ai-service/src/web/v1/routers/sql_regenerations.py (0 hunks)
  • wren-ai-service/src/web/v1/services/ask.py (5 hunks)
  • wren-ai-service/src/web/v1/services/ask_details.py (1 hunks)
  • wren-ai-service/src/web/v1/services/chart.py (1 hunks)
  • wren-ai-service/src/web/v1/services/chart_adjustment.py (1 hunks)
  • wren-ai-service/src/web/v1/services/semantics_preparation.py (0 hunks)
  • wren-ai-service/src/web/v1/services/sql_answer.py (1 hunks)
  • wren-ai-service/src/web/v1/services/sql_expansion.py (1 hunks)
  • wren-ai-service/src/web/v1/services/sql_explanation.py (1 hunks)
  • wren-ai-service/src/web/v1/services/sql_pairs_preparation.py (1 hunks)
  • wren-ai-service/src/web/v1/services/sql_regeneration.py (1 hunks)
  • wren-ai-service/tests/data/config.test.yaml (1 hunks)
  • wren-ai-service/tests/pytest/pipelines/indexing/test_sql_pairs_deletion.py (1 hunks)
  • wren-ai-service/tests/pytest/pipelines/indexing/test_sql_pairs_preparation.py (1 hunks)
  • wren-ai-service/tests/pytest/services/mocks.py (4 hunks)
  • wren-ai-service/tests/pytest/services/test_ask.py (3 hunks)
  • wren-ai-service/tests/pytest/services/test_ask_details.py (2 hunks)
  • wren-ai-service/tests/pytest/services/test_sql_pairs_preparation.py (1 hunks)
  • wren-ai-service/tests/pytest/test_config.py (0 hunks)
  • wren-ai-service/tests/pytest/test_main.py (1 hunks)
  • wren-ai-service/tests/pytest/test_utils.py (1 hunks)
  • wren-ai-service/tools/config/config.example.yaml (1 hunks)
  • wren-ai-service/tools/config/config.full.yaml (1 hunks)
  • wren-ui/README.md (0 hunks)
💤 Files with no reviewable changes (28)
  • wren-ai-service/src/config.py
  • wren-ui/README.md
  • wren-ai-service/tests/pytest/test_config.py
  • wren-ai-service/src/web/v1/routers/sql_explanations.py
  • wren-ai-service/src/web/v1/routers/ask_details.py
  • wren-ai-service/src/web/v1/services/semantics_preparation.py
  • wren-ai-service/README.md
  • wren-ai-service/docs/configuration.md
  • wren-ai-service/src/web/v1/routers/sql_answers.py
  • wren-ai-service/src/web/development.py
  • wren-ai-service/src/pipelines/retrieval/preprocess_sql_data.py
  • wren-ai-service/src/web/v1/routers/sql_regenerations.py
  • wren-ai-service/src/pipelines/generation/data_assistance.py
  • wren-ai-service/src/pipelines/retrieval/sql_executor.py
  • wren-ai-service/src/pipelines/generation/sql_breakdown.py
  • wren-ai-service/src/pipelines/generation/sql_answer.py
  • wren-ai-service/src/web/v1/routers/sql_expansions.py
  • wren-ai-service/src/utils.py
  • wren-ai-service/src/web/v1/routers/ask.py
  • wren-ai-service/src/pipelines/generation/sql_expansion.py
  • wren-ai-service/src/web/v1/routers/semantics_preparation.py
  • wren-ai-service/src/pipelines/generation/sql_correction.py
  • wren-ai-service/src/pipelines/generation/intent_classification.py
  • wren-ai-service/src/pipelines/generation/sql_summary.py
  • wren-ai-service/src/pipelines/generation/sql_explanation.py
  • wren-ai-service/src/pipelines/generation/sql_regeneration.py
  • wren-ai-service/src/pipelines/generation/chart_generation.py
  • wren-ai-service/src/pipelines/retrieval/retrieval.py
🧰 Additional context used
🪛 yamllint (1.35.1)
deployment/kustomizations/base/cm.yaml

[error] 132-132: syntax error: could not find expected ':'

(syntax)

wren-ai-service/tools/config/config.full.yaml

[error] 150-150: trailing spaces

(trailing-spaces)

docker/config.example.yaml

[error] 113-113: trailing spaces

(trailing-spaces)

wren-ai-service/tools/config/config.example.yaml

[error] 131-131: trailing spaces

(trailing-spaces)

🔇 Additional comments (90)
wren-ai-service/src/pipelines/generation/followup_sql_generation.py (6)

3-3: Import statement looks good.

The newly imported Dict is now utilized for the sql_samples parameter. This aligns well with the code's usage of typed dictionaries in the subsequent changes.


107-116: Check for missing or invalid data in template rendering.

You’re conditionally injecting {{sample.summary}} and {{sample.sql}} within the prompt. It might be worth adding validation or at least a fallback if summary or sql fields are missing or None. This ensures the rendered prompt doesn't break if partial data is passed.


133-133: Optional parameter addition is consistent with Python 3.10+.

Declaring sql_samples: List[Dict] | None = None requires Python 3.10 or above (for the union operator “|”). Ensure the runtime environment is updated accordingly if it isn’t already.


145-145: Parameter forwarding to prompt_builder.run() is valid.

Forwarding sql_samples here is consistent with the prompt function’s updated signature and ensures that sql_samples are utilized in the Jinja template.


220-220: Extended method signature for run().

Adding the sql_samples parameter broadens the function’s scope to incorporate SQL examples. Verify that all upstream invocations provide or handle this parameter properly.


232-232: Ensure consistent usage of sql_samples in downstream pipelines.

The dictionary key sql_samples is correctly added. Verify that any follow-up or third-party pipeline steps expecting additional data still function as intended.

wren-ai-service/src/web/v1/services/chart_adjustment.py (1)

Line range hint 42-65: Double-check for potential gaps left by removing the @async_timer decorator.

The removal of the @async_timer decorator might impact any monitoring, performance analysis, or logging processes that previously relied on it. If there's still a need for execution time tracking, ensure that you have another suitable mechanism in place or confirm that timing is no longer a project requirement.

wren-ai-service/src/pipelines/retrieval/sql_pairs_retrieval.py (4)

36-51: Validate filters logic in count_documents.
When id is None, filters are not applied. Ensure that this is intended and that it won’t return documents from unintended projects. If project isolation is crucial, you may want to handle the None case differently.


53-59: Check for empty embeddings usage.
If count_documents is zero, the function returns {}. Validate that an empty embedding dict does not lead to unexpected behavior in later steps, e.g., silent failures or bypassed logic.


92-100: Handle potential empty outputs in formatted_output.
When filtered_documents is empty or doesn’t contain the key "documents", the function returns an empty list by default. Verify the calling code properly handles the scenario of zero documents.


129-139: Confirm resilience in the run method.
The pipeline depends on external components (_components). If any are improperly initialized or fail during execution, confirm that the pipeline either retries or raises a clear exception.

wren-ai-service/src/web/v1/services/sql_pairs_preparation.py (3)

20-34: Ensure each SqlPairsPreparationRequest has a unique query_id.
The _query_id is optional but is used in status tracking. An accidental reuse of query_id may overwrite a previous request’s status.
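A sketch of one mitigation, generating a fresh id per request so reuse cannot occur (a stdlib dataclass stands in for the service's actual Pydantic model):

```python
import uuid
from dataclasses import dataclass, field
from typing import List

@dataclass
class PreparationRequest:
    # Illustrative model: a default_factory gives every request its own
    # query_id, so one request's status can never overwrite another's.
    sql_pairs: List[dict]
    query_id: str = field(default_factory=lambda: str(uuid.uuid4()))
```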


57-60: Consider input validation for sql_pairs_preparation_id.
No validation is performed on sql_pairs_preparation_id. Add checks for format or allowed characters, especially if used as a key in caches or logs.
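A possible whitelist check (the allowed charset and length limit here are assumptions; adjust them to whatever the cache and logging layers actually accept):

```python
import re

# Assumed format: alphanumerics, underscore, hyphen, up to 64 chars.
_ID_PATTERN = re.compile(r"^[A-Za-z0-9_-]{1,64}$")

def validate_preparation_id(value: str) -> str:
    # Reject ids that could break cache keys or pollute log lines.
    if not _ID_PATTERN.fullmatch(value):
        raise ValueError(f"invalid sql_pairs_preparation_id: {value!r}")
    return value
```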


83-125: Add concurrency or partial-completion checks in prepare_sql_pairs.
The method updates the cache status on success/failure. If multiple calls with the same query_id occur concurrently, the status might fluctuate unexpectedly. Consider locking or unique IDs.
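The locking idea could be sketched with one asyncio.Lock per query_id, so concurrent updates to the same id are serialized while unrelated requests stay unblocked (class and method names are illustrative, not the service's API):

```python
import asyncio

class StatusStore:
    """Illustrative store: per-query locks guard status writes."""

    def __init__(self):
        self._statuses = {}
        self._locks = {}

    def _lock_for(self, query_id):
        # Lazily create one lock per query_id.
        return self._locks.setdefault(query_id, asyncio.Lock())

    async def set_status(self, query_id, status):
        async with self._lock_for(query_id):
            self._statuses[query_id] = status

    def get_status(self, query_id):
        return self._statuses.get(query_id)

async def _demo():
    store = StatusStore()
    # Two concurrent writers to the same query_id are serialized.
    await asyncio.gather(
        store.set_status("q1", "indexing"),
        store.set_status("q1", "finished"),
    )
    return store.get_status("q1")

final_status = asyncio.run(_demo())
```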

wren-ai-service/src/pipelines/indexing/sql_pairs_preparation.py (5)

51-78: Consider adding validation or a fallback for incomplete fields in the run method (SqlPairsDescriptionConverter).
If sql_pair data is missing keys, .get() returns None. Confirm that id or sql is always present to avoid empty documents or indexing errors.


117-124: Input validation for convert_sql_pairs_to_documents.
Similar to the SqlPairsDescriptionConverter, ensure fields are present and typed correctly.


136-147: Confirm the rationale for calling delete_sql_pairs within the same pipeline.
delete_sql_pairs references sql_pairs_cleaner. If these pairs are also needed for embedding or further processing, ensure the cleaning logic is indeed desired here.


162-171: Monitor JSON schema validity.
Ensure the defined JSON schema in SQL_INTENTION_GENERATION_MODEL_KWARGS remains in sync if additional fields are added to SqlIntentionGenerationResult.


219-226: Validate the final pipeline run for empty input.
If sql_pairs=[], the pipeline might skip certain steps. Confirm that this path is tested and returns a meaningful response to the caller.

wren-ai-service/src/globals.py (5)

96-99: Check historical_question_retrieval usage.
Renaming historical_question to historical_question_retrieval might require changes in tests or data references. Verify consistent usage across the codebase.


105-108: Check completeness of sql_generation and sql_correction logic.
These pipelines can be interdependent. If one fails, confirm the fallback or error route is properly handled.


141-144: Potential concurrency checks in preprocess_sql_data.
If preprocess_sql_data fetches or transforms large data sets, confirm concurrency usage is safe.


168-171: Remove or confirm necessity of sql_correction in expansion.
sql_expansion sometimes duplicates logic found in sql_correction. Confirm there’s no overlap or repeated overhead.


Line range hint 206-215: Verify that retrieval pipelines are correctly reused.
question_recommendation reuses database retrieval logic. Confirm that the pipeline handles different search contexts without collisions.

wren-ai-service/src/pipelines/retrieval/__init__.py (1)

7-13: Use the __all__ list consistently.
Explicitly listing public exports in __all__ is a good practice. Periodically check whether new classes need to be added or removed as the module evolves.

wren-ai-service/src/web/v1/routers/__init__.py (4)

11-11: Use of singular naming improves clarity
Renaming semantics_preparations to semantics_preparation provides a clearer singular concept. This is consistent with best practices for naming modules and routes.


15-15: New router inclusion for SQL Pairs Preparation
Great addition of sql_pairs_preparation. This aligns with the PR objective of expanding SQL pairs functionality.


25-25: Confirm references to updated router name
Ensure references to the old module or plural naming (semantics_preparations) are removed throughout the codebase to avoid confusion or import errors.


32-32: SQL pairs preparation route
This newly added route effectively wires up the SQL Pairs Preparation service. Looks good for handling the new functionality.

wren-ai-service/src/pipelines/generation/__init__.py (2)

1-16: Comprehensive imports
All newly imported classes are succinctly grouped. This enhances discoverability and modular usage in downstream code. Ensure these imports do not introduce circular dependencies.


18-35: Explicit public API
Defining __all__ clarifies the intended exports and prevents accidental usage of internal classes. Good practice!

wren-ai-service/tests/pytest/pipelines/indexing/test_sql_pairs_deletion.py (4)

10-11: Asynchronous test function
Using @pytest.mark.asyncio is correct for testing async code. This allows for better concurrency handling while verifying pipeline behaviors.


21-24: Multiple SQL pairs insertion
The SqlPair instantiation ensures each pair has a unique identifier. This tests that deletion operations can handle multiple entries, which is a solid coverage scenario.


37-38: Validation after first deletion
The assertion that the document count is still 2 provides a partial deletion scenario verification. This is good for ensuring partial or selective deletions operate as expected.


39-42: Full deletion scenario
Asserting 0 documents remain ensures complete deletion functionality is verified. This thoroughly validates the end-to-end flow of the pipeline.

wren-ai-service/src/pipelines/indexing/sql_pairs_deletion.py (3)

16-23: Leverage SqlPairsCleaner effectively
The delete_sql_pairs function simplifies the pipeline approach by delegating deletion to sql_pairs_cleaner. This encapsulation is clean and maintainable.


29-40: Pipeline constructor is coherent
Passing in a DocumentStoreProvider and building a dictionary of components is a structured approach. Ensure that the dataset_name="sql_pairs" remains consistent with any future expansions (e.g. multi-dataset support).


45-58: Final pipeline execution
Using self._pipe.execute within run is flexible, allowing additional steps to be added if the pipeline grows. The observe decorator aligns well with logging and monitoring.

wren-ai-service/tests/pytest/services/mocks.py (5)

3-3: Consolidated imports look good.

The consolidated import from src.pipelines helps keep the imports cleaner. No issues here.


16-16: Ensure consistent naming usage.

HistoricalQuestionMock now inherits from HistoricalQuestionRetrieval. Verify all references to HistoricalQuestionMock and historical_question.HistoricalQuestion across the codebase are updated or removed as needed.


24-24: Inheritance updated to IntentClassification.

Confirm that the prior import references to intent_classification.IntentClassification have been removed or updated.


34-34: Refactored to use generation.SQLGeneration.

Check all references to sql_generation.SQLGeneration to ensure they are replaced with the new generation.SQLGeneration.


55-55: No issues with the renamed SQLSummary.

Inheriting from generation.SQLSummary is consistent with the new import patterns.

wren-ai-service/tests/pytest/test_utils.py (1)

77-77: user_id parameter is set to None.

This ensures consistency with the broader removal of user_id from various request classes. Verify that any logic relying on user_id has been updated accordingly.

wren-ai-service/tests/pytest/pipelines/indexing/test_sql_pairs_preparation.py (7)

1-6: Imports are modular and clear.

Imports for settings, DocumentStoreProvider, and SqlPairsPreparation are well arranged and explicit.


9-19: Test function naming and structure are descriptive.

The test name test_sql_pairs_preparation_saving_to_document_store accurately conveys the test’s purpose. Setup steps (e.g., store creation) are straightforward.


20-29: SqlPairsPreparation instantiation is well-focused.

Parameters are clearly passed in. Validate that the pipeline components defined in settings.components["sql_pairs_preparation"] exist and are correct.


31-37: Document validations look correct.

All essential assertions (content, meta, sql_pair_id, sql) are present. This ensures robust coverage.


40-50: Second test function effectively covers multiple IDs scenario.

This test replicates the pipeline run with different project IDs, ensuring coverage for varied contexts.


51-69: Repeated run logic is consistent.

Running the pipeline multiple times is a valid approach to confirm isolation by id.


70-79: Filter-based validation is sound.

Filtering documents by project_id checks data partitioning logic and ensures correct categorization per ID.

wren-ai-service/tests/pytest/services/test_ask_details.py (2)

6-6: Broad import of generation is consistent with ongoing refactoring.

No concerns here; it aligns with other updated references to the generation module.


20-23: Creation of SQLBreakdown and SQLSummary is aligned with the new prefix.

Ensure references to the older module or class names (sql_breakdown / sql_summary) are removed or replaced throughout.

wren-ai-service/src/pipelines/retrieval/historical_question_retrieval.py (4)

13-13: [Import usage confirmation] ScoreFilter appears to be correctly imported and used.
It’s good to see ScoreFilter continue to be used here. Ensure no references to any removed or renamed classes remain.


107-107: [Rename alignment] The new class name aligns well with its retrieval purpose.
Renaming HistoricalQuestion to HistoricalQuestionRetrieval clarifies the pipeline’s intent. Confirm that all references to the old name have been removed throughout the codebase.


132-132: [Log statement consistency] The logger statement is appropriate to the renamed class.
Good job updating the logger message to match the new HistoricalQuestionRetrieval pipeline name.


147-147: [Consistency check] The pipeline’s dry run reference is properly updated to the new class name.
This usage in dry_run_pipeline looks correct.

wren-ai-service/src/web/v1/services/sql_explanation.py (1)

55-55: [Naming practice] Changing the class name from SQLExplanationService to SqlExplanationService improves consistency.
Ensure references to the previous name are fully removed, and confirm adherence to project naming conventions.

wren-ai-service/src/web/v1/services/sql_answer.py (1)

10-10: [Trace metadata addition] The trace_metadata decorator could provide valuable debugging insight.
No apparent issues with the addition. Confirm that the relevant logs or tracing dashboards can consume this metadata effectively.

wren-ai-service/src/web/v1/services/sql_regeneration.py (1)

78-78: [Service rename] Transitioning from SQLRegenerationService to SqlRegenerationService is consistent with the style used elsewhere.
Verify that all imports or references reflect this updated name.

wren-ai-service/src/pipelines/generation/sql_generation.py (4)

85-85: Parameter rename looks consistent.

Changing samples to sql_samples clarifies usage. Ensure any external references to this parameter have also been updated to avoid runtime errors.


93-93: Good usage of the newly introduced parameter.

Forwarding sql_samples to the prompt builder is correctly aligned with the updated parameter name.


168-168: Addition of the sql_samples parameter aligns with the prompt signature.

The run() method signature now correctly matches the updated function parameters, improving readability.


178-178: Properly passing sql_samples.

Adding sql_samples to the inputs ensures consistent data flow through the pipeline.

wren-ai-service/src/web/v1/routers/sql_pairs_preparation.py (3)

68-88: POST /sql-pairs endpoint creation looks solid.

The endpoint’s asynchronous flow using BackgroundTasks is a neat solution for offloading long-running tasks. Make sure to handle potential exceptions within the background task to keep the status consistent.


91-111: DELETE /sql-pairs endpoint is well-structured.

The logic for generating a unique query ID and tracking deletion status is clearly implemented. Confirm that any exceptions in the deletion process also appropriately update the status.


114-123: GET /sql-pairs/{sql_pairs_preparation_id} endpoint.

Retrieving the current status based on the unique ID completes a full asynchronous workflow. Looks good for monitoring and handling user queries about the process.

wren-ai-service/src/web/v1/services/chart.py (1)

9-9: New import for tracing metadata.

The introduction of trace_metadata aligns with the simplified removal of @async_timer. Ensuring each request is traceable without extra timers makes the code cleaner.

wren-ai-service/tests/pytest/services/test_sql_pairs_preparation.py (3)

18-32: Fixture for sql_pairs_preparation_service.

The fixture sets up the environment and document store for SQL pairs. This is a clean approach to ensure each test starts with the correct, isolated context.


49-93: Test for SQL pairs preparation.

The loop that checks status until "finished" or "failed" ensures asynchronous correctness. Great job verifying the final state of the document store after preparation.


95-164: Test for SQL pairs deletion.

Reusing the same query_id for deletion is a nice approach that demonstrates a real-world flow (prepare → delete). Checking the empty document store confirms the logic is complete.

wren-ai-service/src/web/v1/services/ask_details.py (2)

10-10: Ensure import consistency and usage clarity.

The added import of trace_metadata appears consistent with the usage in the ask_details method. No issues found.


Line range hint 25-36: Confirm removal of user_id from AskDetailsRequest is fully handled.

We see that user_id is no longer part of the class. Please verify that all references to user_id have been removed or refactored accordingly elsewhere, so that downstream functionality is not impacted.

wren-ai-service/eval/evaluation.py (1)

121-122: Validate fallback path when keys are absent.

Switching to meta.get("session_id") and meta.get("user_id") prevents potential KeyError. Ensure that if these values are absent and become None, downstream logic can safely handle it or provide a suitable default.

wren-ai-service/tests/pytest/services/test_ask.py (2)

37-52: Refactored pipeline references look good.

Using generation and retrieval modules consistently improves readability and aligns naming across components. Good job!


163-163: Renamed mock aligns with real service naming.

Updating historical_question to historical_question_retrieval maintains consistency. Looks good!

wren-ai-service/tests/pytest/test_main.py (1)

26-26: Test name correction is fine—ensure references are updated if needed.

Renaming from test_semantics_preparations to test_semantics_preparation clarifies the intent. Verify that other references, if any, have been updated consistently.

wren-ai-service/src/web/v1/services/sql_expansion.py (1)

9-9: Removal of async_timer looks consistent.
No issues spotted with removing the @async_timer. The remaining @observe and @trace_metadata decorators ensure the method still has observability and trace functionality. This change also aligns with other timer removals throughout the PR.

wren-ai-service/eval/pipelines.py (2)

145-146: Graceful fallback on missing session and user data.
Switching to .get("session_id") and .get("user_id") mitigates potential KeyError exceptions when these fields are missing. This improves resilience.


162-163: Consistent session and user ID handling.
Similar improvement here—replacing direct dictionary key access with .get(...) ensures safer metadata retrieval.

wren-ai-service/src/web/v1/services/ask.py (4)

10-10: Trace metadata import usage is correct.
No further action needed; consistent with how other services incorporate tracing.


245-247: Renamed pipeline key from historical_question to historical_question_retrieval.
This aligns with the updated naming convention in configuration files. No functional issues apparent.


286-286: sql_samples parameter for followup_sql_generation.
Accepting sql_samples adds flexibility, but confirm that both the retrieval and generation pipelines handle the relevant data shape, including potentially empty arrays.


297-297: sql_samples parameter for sql_generation.
The same suggestion applies here—validate or test thoroughly to ensure the new parameter is consistently consumed and does not introduce unexpected behaviors.

wren-ai-service/tests/data/config.test.yaml (1)

45-45: Consistent pipeline rename in config.
Renaming historical_question to historical_question_retrieval keeps the configuration aligned with code changes. This helps avoid runtime mismatches.

docker/config.example.yaml (1)

107-117: LGTM: SQL pairs pipeline configuration.

The new SQL pairs pipelines are properly configured with the required components (document store, embedder, and LLM where needed).

🧰 Tools
🪛 yamllint (1.35.1)

[error] 113-113: trailing spaces

(trailing-spaces)

wren-ai-service/src/pipelines/indexing/__init__.py (1)

119-125: LGTM: Export list updated correctly.

The __all__ list is properly updated to include the new SQL pairs components.

wren-ai-service/tools/config/config.example.yaml (1)

125-135: LGTM: SQL pairs pipeline configuration.

The new SQL pairs pipelines are properly configured with the required components (document store, embedder, and LLM where needed).

🧰 Tools
🪛 yamllint (1.35.1)

[error] 131-131: trailing spaces

(trailing-spaces)

wren-ai-service/tools/config/config.full.yaml (1)

144-147: Consider renaming for consistency with existing patterns

The pipeline name sql_pairs_preparation could be renamed to sql_pairs_indexing to maintain consistency with other indexing operations like db_schema_indexing and historical_question_indexing.

wren-ai-service/src/pipelines/generation/chart_adjustment.py (1)

175-175: LGTM: Clean simplification of post-processor invocation

The change to use direct class invocation instead of the run method simplifies the code while maintaining the same functionality.

Comment on lines +14 to +18
# POST /v1/sql-pairs
class SqlPair(BaseModel):
    sql: str
    id: str


⚠️ Potential issue

Add validation or docstrings for SqlPair.
Although it’s straightforward, clarifying constraints on the sql field (like maximum length or allowed characters) can prevent upstream injection issues.
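A sketch of such checks using a stdlib dataclass (the actual model is a Pydantic BaseModel, where the same logic would live in field validators; the length limit is an assumed value):

```python
from dataclasses import dataclass

MAX_SQL_LENGTH = 10_000  # assumed limit, purely illustrative

@dataclass(frozen=True)
class SqlPair:
    # Reject empty ids/SQL and oversized statements up front,
    # before they reach indexing or logging.
    id: str
    sql: str

    def __post_init__(self):
        if not self.id.strip():
            raise ValueError("id must be non-empty")
        if not self.sql.strip():
            raise ValueError("sql must be non-empty")
        if len(self.sql) > MAX_SQL_LENGTH:
            raise ValueError(f"sql longer than {MAX_SQL_LENGTH} characters")
```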

Comment on lines +126 to +168
    @observe(name="Delete SQL Pairs")
    @trace_metadata
    async def delete_sql_pairs(
        self,
        delete_sql_pairs_request: DeleteSqlPairsRequest,
        **kwargs,
    ):
        results = {
            "metadata": {
                "error_type": "",
                "error_message": "",
            },
        }

        try:
            await self._pipelines["sql_pairs_deletion"].run(
                sql_pair_ids=delete_sql_pairs_request.ids,
                id=delete_sql_pairs_request.project_id,
            )

            self._prepare_sql_pairs_statuses[
                delete_sql_pairs_request.query_id
            ] = SqlPairsPreparationStatusResponse(
                status="finished",
            )
        except Exception as e:
            logger.exception(f"Failed to delete SQL pairs: {e}")

            self._prepare_sql_pairs_statuses[
                delete_sql_pairs_request.query_id
            ] = SqlPairsPreparationStatusResponse(
                status="failed",
                error=SqlPairsPreparationStatusResponse.SqlPairsPreparationError(
                    code="OTHERS",
                    message=f"Failed to delete SQL pairs: {e}",
                ),
            )

            results["metadata"]["error_type"] = "DELETION_FAILED"
            results["metadata"]["error_message"] = str(e)

        return results


🛠️ Refactor suggestion

Avoid silent data corruption in delete_sql_pairs.
If an exception occurs mid-deletion, some pairs might be partially removed. Consider introducing a rollback approach or partial-completion logs for better transparency.
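One lightweight alternative to a full rollback is a per-id report, so a mid-batch failure is at least visible (function and parameter names are hypothetical):

```python
import logging

logger = logging.getLogger(__name__)

def delete_with_report(sql_pair_ids, delete_one):
    """Delete pair-by-pair and report per-id outcomes, so a mid-batch
    failure never silently hides a partial deletion."""
    deleted, failed = [], []
    for pair_id in sql_pair_ids:
        try:
            delete_one(pair_id)
            deleted.append(pair_id)
        except Exception as exc:
            logger.warning("failed to delete sql pair %s: %s", pair_id, exc)
            failed.append(pair_id)
    return {"deleted": deleted, "failed": failed}
```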

Comment on lines +23 to +49

sql_intention_generation_system_prompt = """
### TASK ###

You are a data analyst great at generating the concise and readable summary of a SQL query.

### INSTRUCTIONS ###

- Summary should be concise and readable.
- Summary should be no longer than 20 words.
- Don't rephrase keywords in the SQL query, just use them as they are.

### OUTPUT ###

You need to output a JSON object as following:
{
    "intention": "<CONCISE_AND_READABLE_SUMMARY_STRING>"
}
"""

sql_intention_generation_user_prompt_template = """
### INPUT ###
SQL: {{sql}}

Please think step by step
"""


⚠️ Potential issue

Review system prompt for potential PII exposure.
The system prompt instructs on summarizing queries. If queries can contain sensitive data, ensure that the prompt or the generation doesn't expose private details externally, especially when using an external LLM.

Comment on lines +101 to +115
@observe()
def post_process(
generate_sql_intention: List[dict],
sql_pairs: List[SqlPair],
) -> List[Dict[str, Any]]:
intentions = [
orjson.loads(result["replies"][0])["intention"]
for result in generate_sql_intention
]

return [
{"id": sql_pair.id, "sql": sql_pair.sql, "intention": intention}
for sql_pair, intention in zip(sql_pairs, intentions)
]


⚠️ Potential issue

Heavy reliance on user content in orjson decoding.
Parsing the replies payload returned by external LLM calls can raise decode errors. Confirm robust error handling for malformed or incomplete JSON.
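A defensive-parsing sketch (stdlib json is used here so the example is self-contained; the pipeline itself uses orjson, whose JSONDecodeError could be caught the same way):

```python
import json

def safe_parse_intention(reply: str, default: str = "") -> str:
    """Return the generated intention, falling back to a default on
    malformed or incomplete LLM output instead of raising."""
    try:
        payload = json.loads(reply)
    except json.JSONDecodeError:
        return default
    if not isinstance(payload, dict):
        return default
    intention = payload.get("intention")
    return intention if isinstance(intention, str) else default
```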

Comment on lines +149 to +155
@observe(capture_input=False)
async def write_sql_pairs(
embed_sql_pairs: Dict[str, Any],
sql_pairs_writer: AsyncDocumentWriter,
) -> None:
return await sql_pairs_writer.run(documents=embed_sql_pairs["documents"])


🛠️ Refactor suggestion

Disable partial writes if a portion fails.
write_sql_pairs does not appear to handle exceptions or partial writes. Consider transaction-like mechanisms: all or nothing, or log partial success.

Comment on lines +140 to +143
  - name: chart_generation
    llm: litellm_llm.gpt-4o-mini-2024-07-18
  - name: chart_adjustment
    llm: litellm_llm.gpt-4o-mini-2024-07-18

⚠️ Potential issue

Remove duplicate pipeline entries.

The chart_generation and chart_adjustment pipelines are duplicated. Remove the duplicate entries as they are identical to the existing ones.

-  - name: chart_generation
-    llm: litellm_llm.gpt-4o-mini-2024-07-18
-  - name: chart_adjustment
-    llm: litellm_llm.gpt-4o-mini-2024-07-18

Comment on lines +159 to +161
  - name: chart_generation
    llm: litellm_llm.gpt-4o-mini-2024-07-18
  - name: chart_adjustment

⚠️ Potential issue

Remove duplicate chart pipeline entries

The chart_generation and chart_adjustment entries appear twice in the pipeline configuration. Remove the duplicate entries to avoid confusion.

Comment on lines +171 to +172
  - name: preprocess_sql_data
    llm: litellm_llm.gpt-4o-mini-2024-07-18

⚠️ Potential issue

Remove duplicate pipeline entry

The preprocess_sql_data pipeline entry appears twice in the configuration. Remove the duplicate entry to maintain a clean configuration.

Comment on lines +131 to +135
<<<<<<< HEAD
=======
      - name: preprocess_sql_data
        llm: litellm_llm.gpt-4o-mini-2024-07-18
>>>>>>> main

⚠️ Potential issue

Resolve merge conflict markers

The file contains unresolved merge conflict markers. Please resolve the conflict and remove the markers.

-<<<<<<< HEAD
-=======
-      - name: preprocess_sql_data
-        llm: litellm_llm.gpt-4o-mini-2024-07-18
->>>>>>> main
🧰 Tools
🪛 yamllint (1.35.1)

[error] 132-132: syntax error: could not find expected ':'

(syntax)
