RudderStack is a customer data pipeline tool for collecting, routing and processing data from your websites, apps, cloud tools, and data warehouse.
More information on RudderStack can be found here.
The Event Query Generator is a tool that allows you to seamlessly navigate through the warehouse event schema through an intuitive UI, and generate complex SQL queries to interact with your customer event data.
Businesses collect the event data and persist it in a data warehouse with the intent of performing analytics, which generally involves writing complex SQL queries. The users face two major challenges in doing so:
-
To generate these SQL queries, they need access to a lot of implicit information within the event data, such as different events and their attributes. Getting easy access to this information within the data warehouse can be quite a challenge.
-
Customer Data Platforms like RudderStack and Segment dump the data in the warehouse in a specific format. We noticed that our customers often run a similar set of queries on this data for their use-cases, e.g event analytics, funnel analytics etc. Some of these queries can be quite tricky to write - you need to find out the column names, join multiple tables, etc.
We built the Event Query Generator to address these challenges. With this tool, you can navigate through the event schema through an easy-to-use interface. You can specify the data filtering conditions through the UI, and the tool generates the SQL queries. You can then run these queries on your warehouse to obtain the required data.
See the Event Query Generator in Action: https://query-gen-app.dev.rudderlabs.com/
✔️ Note: As of this writing, the RudderStack warehouse schema is compatible with Segment. Hence, this tool can be applied to the downstream data warehouses where the Segment event data is persisted.
- Simple, easy-to-use interface for generating the queries
- Caches data from the warehouse for better performance
- Fully tested for Snowflake; support for Redshift and other data warehouses is under development
The Event Query Generator has the following workflow:
- Pre-populates the list of events and their associated properties present in the data warehouse.
- Retrieves the list of events and their properties based on the filtering conditions set by you through the UI.
- Automatically generates the SQL query based on the user-specified filters. You can then run the query on your data warehouse to obtain the required data.
- Clone this repository
- Start the Node.js server by running the following commands from the local repository root:
cd api; npm install; npm start
Here's a short demo:
✔️ Note: All the sample input JSON files are placed under
api/data
.
✔️ To know more about how the backend server works, how to operate the backend server in the standalone mode, or how the events are pre-populated for query generation, check out our Wiki.
- First, install the necessary dependencies for the app by running the following commands:
cd ui; npm i
- Then, run the following command:
REACT_APP_QUERY_GEN_BACKEND_URL=<SERVER_URL> REACT_APP_WH=<SNOWFLAKE/REDSHIFT> REACT_APP_DATABASE=<WAREHOUSE_DB_NAME> REACT_APP_SCHEMA=<WAREHOUSE_SCHEMA_NAME> REACT_APP_ACCOUNT=<WAREHOUSE_ACCOUNT_URL> REACT_APP_USERNAME=<WAREHOUSE_ACCOUNT_NAME> REACT_APP_PASSWORD=<WAREHOUSE_ACCOUNT_PASSWORD> REACT_APP_CACHE_REFRESH_HOURS=<INTERVAL_BETWEEN_LAST_UPDATED_TIME_OF_PREPOPULATED_FILE_AND_NOW> npm start
⚠️ Important: The above command runs the app in the development mode.
The environment variables are:
REACT_APP_QUERY_GEN_BACKEND_URL
- The Query Generator backend runs on this URLREACT_APP_DATABASE
- Name of the warehouse database goes hereREACT_APP_SCHEMA
- Name of the warehouse schema goes hereREACT_APP_ACCOUNT
- Specify the warehouse account URLREACT_APP_USERNAME
- Specify the warehouse account usernameREACT_APP_PASSWORD
- Specify the warehouse account passwordREACT_APP_CACHE_REFRESH_HOURS
- This variable determines whether to fetch the data from the data warehouse or read from the cached files. Set a higher value here to make the UI more responsive.
✔️ Note: The front-end app passes the variables needed by backend to make the warehouse connection and fetch the schema, column and row values from the respective event tables and cache them locally for future lookups via the front-end.
To build the app for production to the build
folder, run the following command:
npm run build
The interface for the Event Query Generator is bootstrapped with Craco. Make sure you pass the env
variables, so that the Craco build picks them up before packaging.
Here is how you can generate a query by adding the Users filter:
You can then copy the queries and run them on your warehouse data to get the desired results!
RudderStack is an open-source Segment alternative for collecting, storing and routing customer event data securely to your data warehouse and dozens of other tools. Read more about RudderStack here.
For more information on the Event Query Generator, feel free to reach out to us on any of the platforms below: