This page describes how to develop a dbt project on your computer using dbt-databricks. We will create an empty dbt project configured with the information it needs to connect to Databricks. We will then run our first dbt models.
Prerequisites:
- Access to a Databricks workspace
- Ability to create a Personal Access Token (PAT)
- Python 3.8+
- dbt-core v1.1.0+
- dbt-databricks v1.1.0+
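The version requirements above can be checked from Python before going further. The following is a minimal sketch, not part of dbt itself: the helper names `parse_version`, `meets_minimum`, and `check_prerequisites` are hypothetical, and the comparison only looks at the leading numeric part of a version string, so a pre-release such as 1.1.0rc1 is treated like 1.1.0.

```python
import re
import sys
from importlib import metadata


def parse_version(text):
    """Return the leading numeric components of a version string as a 3-tuple."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    parts = tuple(int(part) for part in match.group(0).split("."))
    # Pad short versions such as "1.1" so they compare equal to "1.1.0".
    return parts + (0,) * max(0, 3 - len(parts))


def meets_minimum(installed, minimum):
    """True if the installed version is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)


def check_prerequisites():
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8+ is required")
    # Package names as published on PyPI; minimums taken from the list above.
    for package, minimum in (("dbt-core", "1.1.0"), ("dbt-databricks", "1.1.0")):
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package} is not installed")
            continue
        if not meets_minimum(installed, minimum):
            problems.append(f"{package} {installed} is older than {minimum}")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(problem)
```

Running the script prints one line per unmet requirement and prints nothing when every requirement is satisfied.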
Before you scaffold a new dbt project, you have to collect some information which dbt will use to connect to Databricks. Where you find this information depends on whether you are using Databricks Clusters or Databricks SQL endpoints. We recommend that you develop dbt models against Databricks SQL endpoints as they provide the latest SQL features and optimizations.
If you are using a Databricks SQL endpoint:
- Log in to your Databricks workspace
- Click the SQL persona in the left navigation bar to switch to Databricks SQL
- Click SQL Endpoints
- Choose the SQL endpoint you want to connect to
- Click Connection details
- Copy the value of Server hostname. This will be the value of host when you scaffold a dbt project.
- Copy the value of HTTP path. This will be the value of http_path when you scaffold a dbt project.
If you are using a Databricks cluster:
- Log in to your Databricks workspace
- Click the Data Science & Engineering persona in the left navigation bar
- Click Compute
- Click on the cluster you want to connect to
- Near the bottom of the page, click Advanced options
- Scroll down some more and click JDBC/ODBC
- Copy the value of Server Hostname. This will be the value of host when you scaffold a dbt project.
- Copy the value of HTTP Path. This will be the value of http_path when you scaffold a dbt project.
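Copied values sometimes pick up a URL scheme or a trailing slash, and it is easy to mix up the HTTP path of a cluster with that of a SQL endpoint. The sketch below shows one way to tidy the two values before using them. The function names are hypothetical, and the path prefixes it looks for (`sql/1.0/` for SQL endpoints, `sql/protocolv1/` for clusters) are assumptions based on commonly seen values, so treat the result as a hint rather than a guarantee.

```python
def normalize_host(value):
    """Strip a URL scheme and trailing slash so only the bare hostname remains."""
    host = value.strip()
    for scheme in ("https://", "http://"):
        if host.lower().startswith(scheme):
            host = host[len(scheme):]
    return host.rstrip("/")


def classify_http_path(http_path):
    """Guess whether an HTTP path belongs to a SQL endpoint or a cluster.

    The prefixes checked here are assumptions, not an official contract.
    """
    path = http_path.strip().lstrip("/")
    if path.startswith("sql/protocolv1/"):
        return "cluster"
    if path.startswith("sql/1.0/"):
        return "sql_endpoint"
    return "unknown"


if __name__ == "__main__":
    print(normalize_host("https://myworkspace.cloud.databricks.com/"))
```

If `classify_http_path` returns "unknown", go back to the connection details page and copy the HTTP path again.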
Now, we are ready to scaffold a new dbt project. Switch to your terminal and type:
dbt init databricks_demo
In the choice that follows, type 1, which instructs dbt to use the dbt-databricks adapter:
Which database would you like to use?
[1] databricks
[2] spark
Next, in the host field, enter the full hostname of your Databricks workspace, which is the Server hostname you noted above. For example, if your workspace is myworkspace.cloud.databricks.com, enter exactly that.
In the http_path field, enter the HTTP path you noted above.
In the token field, enter the PAT you created earlier.
In the catalog field, enter the name of the Unity Catalog catalog if you are using it. Otherwise, enter null. This field only shows if you are using dbt-databricks>=1.1.1 and is only relevant to users of Unity Catalog.
In schema, enter databricks_demo, which is the schema you created earlier.
Leave threads at 1 for now.
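The answers given above end up as a profile entry in ~/.dbt/profiles.yml. As a rough illustration of what that entry looks like, the sketch below renders one from the same fields. The `render_profile` helper is hypothetical, the target name `dev` is an assumption about the default, and the host, HTTP path, and token shown are placeholders, not real values. The function does no YAML quoting, so it is only suitable for simple values like these.

```python
def render_profile(project, host, http_path, token, schema, catalog=None, threads=1):
    """Render a dbt-databricks profile entry as YAML text (no quoting or escaping)."""
    # A missing catalog is written as YAML null, matching the prompt above.
    catalog_value = "null" if catalog is None else catalog
    lines = [
        f"{project}:",
        "  target: dev",  # assumed default target name
        "  outputs:",
        "    dev:",
        "      type: databricks",
        f"      catalog: {catalog_value}",
        f"      schema: {schema}",
        f"      host: {host}",
        f"      http_path: {http_path}",
        f"      token: {token}",
        f"      threads: {threads}",
    ]
    return "\n".join(lines) + "\n"


# Placeholder values only; substitute the details you collected earlier.
print(render_profile(
    project="databricks_demo",
    host="myworkspace.cloud.databricks.com",
    http_path="/sql/1.0/endpoints/<endpoint-id>",
    token="<personal-access-token>",
    schema="databricks_demo",
))
```

Comparing this shape with your own profiles.yml is a quick way to spot a missing or misspelled field.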
You are now ready to test the connection to Databricks. In the terminal, enter the following command:
dbt debug
If all goes well, you will see a successful connection. If you cannot connect to Databricks, double-check the host, HTTP path, and PAT, and update them accordingly in ~/.dbt/profiles.yml.
At this point, you can simply run the demo models in the models/example directory. In your terminal, type:
dbt run
Once the dbt run completes, switch to Databricks, click Data in the left navigation bar, and find the tables you just created! If you used the databricks_demo schema entered earlier, you will find two tables:
databricks_demo.my_first_dbt_model
databricks_demo.my_second_dbt_model