An Exasol script written on Lua that allows to perform Data Lineage analysis.
The script analyzes SQL without running it by means of in-built SQL parsing library. For each output column it identifies a list of source columns.
- determines output columns origin
- multiple schemas
- multiple source columns
- CTE
- UNION
- FROM
- JOINS
- EMITS
- Subqueries
- LOCAL keyword
- quoted identifiers
- expression columns
- table and column aliases
- the script doesn't check SQL syntax
- only one statement at a time
- ON and USING clauses, WHERE conditions are not analyzed yet
- Connect to Exasol cluster
- Open schema in which you want to install scripts
- Execute *.sql files from scripts directory
SQL_DATA_LINEAGE script expects 2 arguments:
- SQL statement. It is allowed to pass SELECT or CREATE VIEW statements.
- Current schema. If null value passed, script takes current schema from session.
EXECUTE SCRIPT FN.SQL_DATA_LINEAGE(
'CREATE OR REPLACE VIEW test_view AS SELECT * FROM users',
'TEST_DATA_LINEAGE'
)
Output
+-------------+--------------------+--------------------+--------------------+----------+--------+------------------+
| COLUMN_NAME | SOURCE_SCHEMA_NAME | SOURCE_OBJECT_NAME | SOURCE_COLUMN_NAME | FNAME | IS_AGG | ORDINAL_POSITION |
+-------------+--------------------+--------------------+--------------------+----------+--------+------------------+
| USER_ID | TEST_DATA_LINEAGE | USERS | USER_ID | (null) | false | 1 |
| NAME | TEST_DATA_LINEAGE | USERS | NAME | (null) | false | 2 |
| REGISTERED | TEST_DATA_LINEAGE | USERS | REGISTERED | (null) | false | 3 |
| STATUS | TEST_DATA_LINEAGE | USERS | STATUS | (null) | false | 4 |
+-------------+--------------------+--------------------+--------------------+----------+--------+------------------+
EXECUTE SCRIPT FN.SQL_DATA_LINEAGE(
'
WITH
users AS (
SELECT
user_id
, name
, status AS status_id
FROM users
WHERE status != 3
),
status AS (
SELECT
id AS status_id
, name AS status_name
FROM dim_status
)
SELECT
a.*
, COALESCE(b.status_name, ''Unknown'') AS status_name
FROM users a
LEFT JOIN status b ON (a.status_id = b.status_id)
',
'TEST_DATA_LINEAGE'
)
Output
+-------------+--------------------+--------------------+--------------------+----------+--------+------------------+
| COLUMN_NAME | SOURCE_SCHEMA_NAME | SOURCE_OBJECT_NAME | SOURCE_COLUMN_NAME | FNAME | IS_AGG | ORDINAL_POSITION |
+-------------+--------------------+--------------------+--------------------+----------+--------+------------------+
| USER_ID | TEST_DATA_LINEAGE | USERS | USER_ID | (null) | false | 1 |
| NAME | TEST_DATA_LINEAGE | USERS | NAME | (null) | false | 2 |
| STATUS_ID | TEST_DATA_LINEAGE | USERS | STATUS | (null) | false | 3 |
| STATUS_NAME | TEST_DATA_LINEAGE | DIM_STATUS | NAME | COALESCE | false | 4 |
+-------------+--------------------+--------------------+--------------------+----------+--------+------------------+
- install PyEXASOL driver
- set Exasol credentials in tests/config.py
cd tests/
python -m unittest test_sql_data_lineage.py
- Dmitry Umarov d.umarov@team.bumble.com