feat(wren-ai-service): spider metrics in evaluation #761

Merged
merged 36 commits into main from feat/spider-metrics
Oct 16, 2024
326def6
restructure eval files
cyyeh Oct 14, 2024
9f94d26
add spider_database
cyyeh Oct 14, 2024
d559465
ignore spider database
cyyeh Oct 14, 2024
61e670d
update
cyyeh Oct 14, 2024
58d380a
generate mdl and question sql pairs by db
cyyeh Oct 14, 2024
a7ecfa6
refactor
cyyeh Oct 14, 2024
f77db0d
update
cyyeh Oct 15, 2024
695961c
add comments
cyyeh Oct 15, 2024
277f247
allow data curation app to connect to duckdb
cyyeh Oct 15, 2024
d848ee0
fix bug
cyyeh Oct 15, 2024
eb7894b
revert
cyyeh Oct 15, 2024
d2a947c
fix relationship name
cyyeh Oct 15, 2024
8c875aa
feat: implement exact match accuracy metric of spider eval
paopa Oct 15, 2024
8902f7a
chore: move the metric to an independent module
paopa Oct 15, 2024
45ce662
chore: remove redundant module
paopa Oct 15, 2024
e775c21
chore: fix typo
paopa Oct 15, 2024
9befa19
chore: update lock file
paopa Oct 15, 2024
5cbe200
chore: change the path of kmap and db dir
paopa Oct 15, 2024
00f1dce
feat: put the catalog name into additional metadata in every prediction
paopa Oct 15, 2024
39e6d5c
fix: get catalog from the wrong place
paopa Oct 15, 2024
bdb2cec
feat: support duckdb on prediction
paopa Oct 15, 2024
5e72436
feat: modify the metric config to use duckdb in eval
paopa Oct 15, 2024
e011bf1
update README and command
cyyeh Oct 16, 2024
fa862ad
update
cyyeh Oct 16, 2024
c88a182
refine env name
cyyeh Oct 16, 2024
5548ac1
fix eval bug using duckdb
cyyeh Oct 16, 2024
89d55b9
fix bug
cyyeh Oct 16, 2024
6c049cd
fix: wrong additional metadata
paopa Oct 16, 2024
84dec12
feat: implement Spider Execution Accuracy
paopa Oct 16, 2024
036a6c7
feat: add execution accuracy into generation and end-to-end eval pipe…
paopa Oct 16, 2024
6583eed
fix: rewrite sql
paopa Oct 16, 2024
3643a79
fix bug
cyyeh Oct 16, 2024
c698550
fix mdl table construction bug
cyyeh Oct 16, 2024
679f3a5
Merge branch 'main' into feat/spider-metrics
cyyeh Oct 16, 2024
f3f40a3
fix: don't rewrite sql in execution accuracy metric
paopa Oct 16, 2024
7f95b7d
try fix github actions issue
cyyeh Oct 16, 2024
2 changes: 1 addition & 1 deletion .github/workflows/ai-service-test.yaml
@@ -28,7 +28,7 @@ defaults:
jobs:
pytest:
if: ${{ contains(github.event.pull_request.labels.*.name, 'ci/ai-service') || github.event_name == 'push' }}
runs-on: ubuntu-latest
runs-on: ubuntu-22.04
timeout-minutes: 10
steps:
- uses: actions/checkout@v4
1 change: 1 addition & 0 deletions .gitignore
@@ -10,6 +10,7 @@ wren-ai-service/src/eval/wren-engine/etc/archived
wren-ai-service/src/eval/data
wren-ai-service/**/outputs/
wren-ai-service/**/spider/
!wren-ai-service/**/metrics/spider/
!wren-ai-service/tests/data
!wren-ai-service/src/eval/data/book_2*.json
!wren-ai-service/src/eval/data/baseball_1*.json
3 changes: 3 additions & 0 deletions wren-ai-service/Justfile
@@ -10,6 +10,9 @@ start:
curate_eval_data:
poetry run streamlit run eval/data_curation/app.py

prep:
poetry run python -m eval.preparation

predict dataset pipeline='ask':
@poetry run python -u eval/prediction.py --file {{dataset}} --pipeline {{pipeline}}

8 changes: 5 additions & 3 deletions wren-ai-service/eval/.env.example
@@ -3,8 +3,10 @@ OPENAI_API_KEY=
WREN_IBIS_ENDPOINT=http://localhost:8000
WREN_ENGINE_ENDPOINT=http://localhost:8080
WREN_IBIS_TIMEOUT=10
BATCH_SIZE=4
BATCH_INTERVAL=1

DATA_SOURCE=bigquery
bigquery.project-id=
bigquery.dataset-id=
bigquery.credentials-key=
BATCH_SIZE=4
BATCH_INTERVAL=1
bigquery.credentials-key=
19 changes: 19 additions & 0 deletions wren-ai-service/eval/README.md
@@ -15,6 +15,25 @@ The dataset curation process is used to prepare the evaluation dataset for the W
- copy `.env.example` to `.env` and fill in the environment variables
- execute the command under the `wren-ai-service` folder: `just curate_eval_data`

## Eval Dataset Preparation (if using the Spider 1.0 dataset)

This command does two things:
1. download the Spider 1.0 dataset into `wren-ai-service/tools/dev/spider1.0`, which contains two folders: `database` and `spider_data`
    - `database`: contains the test data, downloaded from [this repo](https://github.com/taoyds/test-suite-sql-eval).
    - `spider_data`: contains table schemas, ground truths (question-SQL pairs), etc. For more information, please refer to [this repo](https://github.com/taoyds/spider).
2. prepare the evaluation datasets and put them in `wren-ai-service/eval/dataset`. The file name of an eval dataset for Spider looks like this: `spider_<db_name>_eval_dataset.toml`

```cli
just prep
```
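The per-database output naming in step 2 can be sketched as follows. This is a minimal illustration only: the real logic lives in the `eval.preparation` module, and the paths and helper name here mirror the README text above, not the actual code.

```python
from pathlib import Path

# Paths follow the README above; the real implementation is eval.preparation.
SPIDER_ROOT = Path("wren-ai-service/tools/dev/spider1.0")
DATASET_DIR = Path("wren-ai-service/eval/dataset")


def eval_dataset_path(db_name: str) -> Path:
    """Per-database eval dataset file, named spider_<db_name>_eval_dataset.toml."""
    return DATASET_DIR / f"spider_{db_name}_eval_dataset.toml"


print(eval_dataset_path("book_2").name)  # spider_book_2_eval_dataset.toml
```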

## Evaluation Dataset Schema

- dataset_id (UUID)
- date
- mdl
- eval dataset
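
A dataset following the schema above could serialize to a TOML file along these lines. The field names come from the schema list; the placeholder values and the question/SQL pair layout are illustrative guesses, not the actual file format.

```toml
# Illustrative only -- values are placeholders and the pair layout is assumed.
dataset_id = "00000000-0000-0000-0000-000000000000"  # UUID
date = "2024-10-16"
mdl = "<serialized MDL>"

[[eval_dataset]]
question = "How many books are there?"
sql = "SELECT COUNT(*) FROM book"
```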

## Prediction Process

The prediction process produces results for the evaluation dataset using the Wren AI service. It creates traces and a session on Langfuse to make the results available to the user. You can use the following command to run predictions on an evaluation dataset under the `eval/dataset` directory:
33 changes: 26 additions & 7 deletions wren-ai-service/eval/data_curation/app.py
@@ -12,16 +12,23 @@
from streamlit_tags import st_tags
from utils import (
DATA_SOURCES,
WREN_ENGINE_ENDPOINT,
WREN_IBIS_ENDPOINT,
get_contexts_from_sqls,
get_data_from_wren_engine_with_sqls,
get_documents_given_contexts,
get_eval_dataset_in_toml_string,
get_openai_client,
get_question_sql_pairs,
is_sql_valid,
prettify_sql,
)

from eval.utils import (
get_documents_given_contexts,
get_eval_dataset_in_toml_string,
prepare_duckdb_init_sql,
prepare_duckdb_session_sql,
)

st.set_page_config(layout="wide")
st.title("WrenAI Data Curation App")

@@ -101,11 +108,17 @@ def on_click_setup_uploaded_file():
uploaded_file.getvalue().decode("utf-8")
)

st.session_state["connection_info"] = {
"project_id": os.getenv("bigquery.project-id"),
"dataset_id": os.getenv("bigquery.dataset-id"),
"credentials": os.getenv("bigquery.credentials-key"),
}
if data_source == "bigquery":
st.session_state["connection_info"] = {
"project_id": os.getenv("bigquery.project-id"),
"dataset_id": os.getenv("bigquery.dataset-id"),
"credentials": os.getenv("bigquery.credentials-key"),
}
elif data_source == "duckdb":
prepare_duckdb_session_sql(WREN_ENGINE_ENDPOINT)
prepare_duckdb_init_sql(
WREN_ENGINE_ENDPOINT, st.session_state["mdl_json"]["catalog"]
)
else:
st.session_state["data_source"] = None
st.session_state["mdl_json"] = None
@@ -126,6 +139,9 @@ def on_change_sql(i: int, key: str):
st.session_state["data_source"],
st.session_state["mdl_json"],
st.session_state["connection_info"],
WREN_ENGINE_ENDPOINT
if st.session_state["data_source"] == "duckdb"
else WREN_IBIS_ENDPOINT,
)
)
if valid:
@@ -388,6 +404,9 @@ def on_click_remove_candidate_dataset_button(i: int):
st.session_state["data_source"],
st.session_state["mdl_json"],
st.session_state["connection_info"],
WREN_ENGINE_ENDPOINT
if st.session_state["data_source"] == "duckdb"
else WREN_IBIS_ENDPOINT,
)
)[0]
st.dataframe(
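The endpoint selection that this diff adds at both call sites can be factored into a small helper. This is a sketch of the routing logic only: the constant values are the defaults from `eval/.env.example` above, and the helper name is illustrative, not part of the PR.

```python
# Defaults taken from eval/.env.example in this PR.
WREN_ENGINE_ENDPOINT = "http://localhost:8080"
WREN_IBIS_ENDPOINT = "http://localhost:8000"


def endpoint_for(data_source: str) -> str:
    """Route duckdb queries straight to the wren engine; other sources
    (e.g. bigquery) go through wren ibis, mirroring the diff above."""
    return WREN_ENGINE_ENDPOINT if data_source == "duckdb" else WREN_IBIS_ENDPOINT


print(endpoint_for("duckdb"))    # http://localhost:8080
print(endpoint_for("bigquery"))  # http://localhost:8000
```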