This project both documents and provides a script to create a SQL proxy in Google Cloud Composer. I am assuming you know why you need to create a proxy, as outlined here:
See: Connect from Google Kubernetes Engine
Unfortunately, these instructions will not suffice. There are also several blogs on how to do this, but these seem outdated. Recent versions of Composer use namespaces in kubernetes, so much of the documentation (including that from Google itself) will not work.
The script creates a SQL proxy connection using Workload Identity with the sidecar pattern.
- Get the name of your Composer cluster. See the section below if you don't know this.
- Install the kubectl client and the GKE auth plugin:
gcloud components install kubectl
gcloud components install gke-gcloud-auth-plugin
- Create a config.ini file. See the examples.
  - ksa_name: can be anything you want
  - cluster_name: the name of the cluster
  - region_name: region id of your cluster
  - deployment_name: can be anything
  - db_secret_name: optional, only if you need to create a kubernetes secret; can be anything
  - db_name: optional, only use if creating a kubernetes secret; the name of the db you are connecting to
  - db_user_name: optional, only use if creating a kubernetes secret; the name of the db user you are connecting to
  - db_port: port of the instance
  - instance_connection_name: the connection name of your Cloud SQL instance, found in the GCP console
  - service_account: can be anything
  - project_id: your project id
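A config.ini along these lines should work (all values below are placeholders, and the section name is an assumption; see the examples directory for the real format):

```ini
[proxy]
ksa_name = sql-proxy-ksa
cluster_name = my-composer-cluster
region_name = us-west2
deployment_name = sql-proxy
db_port = 5432
instance_connection_name = my-project:us-west2:my-instance
service_account = sql-proxy-sa
project_id = my-project
```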
- Run python scripts/create.py <path to config>
  - Use the -v option for verbosity. The default is no messaging; use 1, 2, or 3 for increasing verbosity.
  - Use the -s or --use-secret option if you want to create a kubernetes secret.
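As a rough sketch of how such a config might be consumed (the [proxy] section name and all values are assumptions, not necessarily the script's actual format), Python's configparser handles this file shape directly:

```python
import configparser

# Hypothetical config text mirroring the keys listed above; the real
# scripts/create.py may expect a different section name or layout.
SAMPLE = """
[proxy]
ksa_name = sql-proxy-ksa
region_name = us-west2
db_port = 5432
project_id = my-project
"""

cfg = configparser.ConfigParser()
cfg.read_string(SAMPLE)
proxy = cfg["proxy"]
print(proxy["region_name"])     # us-west2
print(proxy.getint("db_port"))  # 5432, parsed as an int
```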
Get the identity of the worker pod (where Composer runs its workers, and where the connection to Cloud SQL must exist):
kubectl get pods --all-namespaces
In the second column, look for a name like airflow-worker-xxxx. Note the pod name and the namespace; you will need both to connect to the pod below.
- Check that the service was created
kubectl get services --all-namespaces
You should see your service
- Connect to the worker pod:
kubectl --namespace=<namespace> exec -it <pod name> -- bash
For Postgres:
psql -h <service-name>.default.svc.cluster.local --user <user>
Click on your instance:
Click on "Environment Configuration"
Scroll down until you see the section "GKE Cluster". Your cluster name is the string after the word "/clusters/".
This will be used in the config.ini file (see the examples), as well as for connecting to the cluster.
In addition, get the region name. In this case, it is "us-west2".
- I am not sure creating a kubernetes secret serves any purpose. My understanding is that this secret is mounted in your cluster, so you could access it. But I don't see any place in Google Composer to do so.
- Google documentation states: "Finally, configure your application to connect using 127.0.0.1". This will not work, because Composer now uses namespaces in kubernetes, and pods in different namespaces can't communicate with each other that way. Instead, you need the full name: <service_name>.<namespace>.svc.cluster.local.
- Google documentation does not state that you need to create a service, but apparently you do. I followed the configuration for airflow-sqlproxy, which needed a service.
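As a sketch, a Service along these lines exposes the proxy to other namespaces (the name, selector label, and port are placeholders; match them to your deployment's labels and your db_port):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: sql-proxy-service   # placeholder name
spec:
  selector:
    app: sql-proxy          # must match the labels on your proxy deployment
  ports:
    - protocol: TCP
      port: 5432            # your db_port
      targetPort: 5432
```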
- Google documentation for the yaml file (sidecar) would not work. I had to bind the proxy to the address "0.0.0.0" to get the service recognized.
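The key change in the sidecar container spec is binding the proxy to 0.0.0.0 instead of the default 127.0.0.1, so the Service can route to it. A sketch using the legacy v1 cloud_sql_proxy flag syntax (image tag, instance connection name, and port are placeholders):

```yaml
- name: cloud-sql-proxy
  image: gcr.io/cloudsql-docker/gce-proxy:1.33.2   # placeholder tag
  command:
    - "/cloud_sql_proxy"
    # tcp:0.0.0.0:<port> makes the proxy listen on all interfaces;
    # the default 127.0.0.1 is reachable only from within the same pod
    - "-instances=my-project:us-west2:my-instance=tcp:0.0.0.0:5432"
```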