MSchema is a semi-structured schema representation of database structure, which could be used in various scenarios such as Text-to-SQL. This repository contains the code for connecting to the database and constructing M-Schema. We support a variety of relational databases, such as MySQL, PostgreSQL, Oracle, etc.
- python >= 3.9
You can install the required packages with the following command:
pip install -r requirements.txt
You can just connect to the database using sqlalchemy
and construct M-Schema representation.
- Create a database connection.
Take PostgreSQL as an example:
from sqlalchemy import create_engine
db_name = 'mydatabase'
db_engine = create_engine(f"postgresql+psycopg2://scott:tiger@localhost:5432/{db_name}")
Connect to MySQL:
db_engine = create_engine(f"mysql+pymysql://{db_user_name}:{db_pwd}@{db_host}:{port}/{db_name}")
Connect to SQLite:
import os
db_path = ""
abs_path = os.path.abspath(db_path)
db_engine = create_engine(f'sqlite:///{abs_path}')
- Construct M-Schema representation.
from schema_engine import SchemaEngine
schema_engine = SchemaEngine(engine=db_engine, db_name=db_name)
mschema = schema_engine.mschema
mschema_str = mschema.to_mschema()
print(mschema_str)
mschema.save(f'./{db_name}.json')
- Use for Text-to-SQL.
dialect = db_engine.dialect.name
question = ''
evidence = ''
prompt = """You are now a {dialect} data analyst, and you are given a database schema as follows:
【Schema】
{db_schema}
【Question】
{question}
【Evidence】
{evidence}
Please read and understand the database schema carefully, and generate an executable SQL based on the user's question and evidence. The generated SQL is protected by ```sql and ```.
""".format(dialect=dialect, question=question, db_schema=mschema_str, evidence=evidence)
# Replace the function call_llm() with your own function or method to interact with a LLM API.
# response = call_llm(prompt)
If you find our work helpful, feel free to give us a cite.
@article{xiyansql,
title={XiYan-SQL: A Multi-Generator Ensemble Framework for Text-to-SQL},
author={Yingqi Gao and Yifu Liu and Xiaoxia Li and Xiaorong Shi and Yin Zhu and Yiming Wang and Shiqi Li and Wei Li and Yuntao Hong and Zhiling Luo and Jinyang Gao and Liyu Mou and Yu Li},
year={2024},
journal={arXiv preprint arXiv:2411.08599},
url={https://arxiv.org/abs/2411.08599},
primaryClass={cs.AI}
}