This is an example application that uses ChatGPT-like models via LangChain (see the LangChain documentation).
In this example, there is a Python API that accepts a POST request with text, connects to BigQuery, and returns the result processed by the ChatGPT model you have specified.
This walkthrough provides step-by-step instructions for building a solution that enables chatting with Google's BigQuery service.
Please note that the primary focus of this guide is to assist you in creating a chat interface for Google BigQuery. It does not dive deep into the specifics of Google BigQuery or Google Cloud management itself. Therefore, it is assumed that you have already set up a Google Cloud project and have billing enabled for it.
It is crucial to ensure that the following Google Cloud Project APIs are active:
- IAM API
- BigQuery API
The following steps will guide you through the process of authentication:
- Begin by creating a service account. This service account should be assigned the following roles in BigQuery:
  - BigQuery User
  - BigQuery Data Viewer
  - BigQuery Job User
- Finally, download a JSON key for your service account.
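Before pointing the application at the downloaded key, it can help to sanity-check the file. The sketch below is an illustration (not part of the repository) that parses the key and verifies the fields a standard Google Cloud service-account key contains:

```python
import json

# Fields present in every Google Cloud service-account JSON key.
REQUIRED_FIELDS = {"type", "project_id", "private_key", "client_email"}

def check_service_account_key(path):
    """Return the parsed key if it looks like a valid service-account file."""
    with open(path) as f:
        key = json.load(f)
    missing = REQUIRED_FIELDS - key.keys()
    if missing:
        raise ValueError(f"key file is missing fields: {sorted(missing)}")
    if key["type"] != "service_account":
        raise ValueError(f"expected type 'service_account', got {key['type']!r}")
    return key
```

Running this once after downloading the key catches a truncated or wrong file early, before the script fails deeper inside the BigQuery client.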
The script uses the following environment variables, which need to be set for successful execution:
- `SERVICE_ACCOUNT_FILE`: The path to your Google Cloud service account key file. This key file provides the credentials the script needs to interact with your BigQuery service.
- `PROJECT`: The ID of the Google Cloud project where your BigQuery service is hosted.
- `DATASET`: The specific dataset in BigQuery that you want to query.
- `OPENAI_API_KEY`: Your OpenAI API key, which the script needs to interact with the OpenAI GPT models.
- `MODEL`: The specific OpenAI model that you want to use.
- `TOP_K`: Limits how many rows the generated SQL queries may return (passed through to the LangChain SQL agent).
- `DEBUG`: Specifies whether to run the Flask application in debug mode.
- `LANGCHAIN_VERBOSE`: Sets the verbosity of the LangChain agent executor. If true, the executor provides more detailed logs of its operations.
- `REQUEST_TIMEOUT`: The maximum time in seconds that the application will wait for a request to be processed before it times out.
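These variables could be read into a single settings dictionary at startup. The helper below is a hypothetical sketch (the repository may load them differently) showing the type conversions involved, since environment variables always arrive as strings:

```python
import os

def load_config():
    """Read the script's settings from environment variables.

    TOP_K and REQUEST_TIMEOUT are cast to int; DEBUG and
    LANGCHAIN_VERBOSE are treated as true only for the usual
    truthy spellings ("1", "true", "yes", case-insensitive).
    """
    truthy = {"1", "true", "yes"}
    return {
        "service_account_file": os.environ["SERVICE_ACCOUNT_FILE"],
        "project": os.environ["PROJECT"],
        "dataset": os.environ["DATASET"],
        "openai_api_key": os.environ["OPENAI_API_KEY"],
        "model": os.environ.get("MODEL", "gpt-3.5-turbo-16k"),
        "top_k": int(os.environ.get("TOP_K", "1000")),
        "debug": os.environ.get("DEBUG", "False").lower() in truthy,
        "langchain_verbose": os.environ.get("LANGCHAIN_VERBOSE", "False").lower() in truthy,
        "request_timeout": int(os.environ.get("REQUEST_TIMEOUT", "90")),
    }
```

The required variables deliberately use `os.environ[...]` so a missing value fails immediately with a `KeyError` rather than surfacing later as a confusing BigQuery or OpenAI error.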
Be sure to replace these values with your own before running the script.
Descriptions of the OpenAI API models are available in the official documentation.
In order to use OpenAI's models, you will need to obtain an API key. Here's a step-by-step guide on how to get it:
- Create an OpenAI account: Go to OpenAI's website and create an account if you don't already have one.
- Dashboard access: Once you've signed up and logged in, navigate to the dashboard. The dashboard is typically accessible from the user menu.
- API key: In the dashboard, look for an option to generate an API key. If it's your first time, you might need to create a new key.
- Generate and copy: Follow the instructions to generate a new key. Once the key is generated, be sure to copy it and keep it secure. This key provides the necessary authentication to interact with OpenAI's models.
Please remember, this API key is sensitive information and should be kept secure. Don't expose it in public repositories or share it with unauthorized individuals.
The image is available on Docker Hub:

```shell
docker pull bibikovvitaly/langchain-bigquery-rest
```
Create a `.env` file: the application expects certain environment variables that are not included in the repository for security reasons. Create a `.env` file in the root directory of the project and fill it with the necessary values, such as your OpenAI API key, Google Cloud credentials, and the other variables described earlier in this document. Here's an example of what this file might look like:
```
SERVICE_ACCOUNT_FILE=./path-to-your-service-account.json
PROJECT=your-google-cloud-project-id
DATASET=your-bigquery-dataset
OPENAI_API_KEY=your-openai-api-key
MODEL=gpt-3.5-turbo-16k
TOP_K=1000
DEBUG=False
LANGCHAIN_VERBOSE=True
REQUEST_TIMEOUT=90
```
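Libraries such as python-dotenv are commonly used to load a file like this; purely for illustration, a minimal stdlib-only loader might look like the sketch below (it assumes simple `KEY=VALUE` lines with `#` comments and no quoting, which matches the example above):

```python
import os

def load_dotenv_file(path):
    """Load simple KEY=VALUE lines from a .env file into os.environ.

    Blank lines and lines starting with '#' are skipped; variables
    already present in the environment are not overwritten.
    """
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```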
```shell
docker-compose up --build
```
The API listens on port 6000 by default.
To interact with the API, you need to send a POST request to the `/execute` endpoint. The body of the request should be a JSON object with a `query` field, whose value is a natural-language query that you want the model to interpret and execute against the BigQuery dataset.
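On the server side, the endpoint would typically validate this body before handing the query to the model. The function below is an illustrative sketch of that validation (the name and exact checks are assumptions, not taken from the repository):

```python
def extract_query(body):
    """Return the natural-language query from a parsed /execute request body.

    Raises ValueError when the body is not the expected shape, so the
    endpoint can respond with a 400 instead of a downstream model error.
    """
    if not isinstance(body, dict):
        raise ValueError("request body must be a JSON object")
    query = body.get("query")
    if not isinstance(query, str) or not query.strip():
        raise ValueError("'query' must be a non-empty string")
    return query.strip()
```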
Here's an example of how to send a request to the API using `curl` in the command line:

```shell
curl --location 'http://localhost:6000/execute' \
--header 'Content-Type: application/json' \
--data '{
    "query": "Create a query to get the number of users who have deposited money for the last 7 days from some_table table"
}'
```
Or, if you're using Python's `requests` library, it might look like this:

```python
import requests

url = "http://localhost:6000/execute"
data = {
    "query": "Create a query to get the number of users who have deposited money for the last 7 days from some_table table"
}

response = requests.post(url, json=data)
print(response.json())
```
The response from the API will be a JSON object that contains the result of the query. For example:
```
"There are 38 users who have deposited money in the last 7 days."
```