test
acracker committed Jun 1, 2024
1 parent 68ffa2b commit 45ede66
Showing 2 changed files with 170 additions and 66 deletions.
133 changes: 67 additions & 66 deletions README.md
@@ -1,103 +1,104 @@
# data-watchtower

A data monitoring and verification tool.
Detect issues before your CTO does.

## Installation

Install via pip:
```
pip install data-watchtower
```

## Data Loader

Loads data into memory so that validators can check it.

## Validators

Validate whether the data loaded by the data loader meets expectations.

### Built-in Validators

- ExpectColumnValuesToNotBeNull
- ExpectColumnRecentlyUpdated
- ExpectColumnStdToBeBetween
- ExpectColumnMeanToBeBetween
- ExpectColumnNullRatioToBeBetween
- ExpectRowCountToBeBetween
- ExpectColumnDistinctValuesToContainSet
- ExpectColumnDistinctValuesToEqualSet
- ExpectColumnDistinctValuesToBeInSet
- More...

### Custom Loaders

Custom loaders can be added to handle specific data sources.

## Macros

Custom macros let you reference variables, such as dates or configuration files, within monitoring items.

### Scope of Effect

Macros are expanded in:

- Watchtower names
- Validator parameters
- Data loader parameters

### Custom Macros

Define your own macros to insert dynamic values.
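
The README does not show how macros are expanded internally. As a rough illustration only, the sketch below resolves `${name}` placeholders with Python's standard `string.Template`, which happens to use the same placeholder syntax. The function name `expand_macros` is hypothetical, and the handling of `{'impl': callable}` entries is an assumption based on the shape of `custom_macro_map` in the example; it is not taken from the data-watchtower source.

```python
import datetime
from string import Template


def expand_macros(text, macro_map):
    """Replace ${name} placeholders in text using macro_map.

    Assumption: a value is either a plain string or a dict whose 'impl'
    entry is a zero-argument callable evaluated at expansion time.
    """
    resolved = {}
    for name, value in macro_map.items():
        if isinstance(value, dict) and callable(value.get("impl")):
            resolved[name] = str(value["impl"]())
        else:
            resolved[name] = str(value)
    # safe_substitute leaves unknown placeholders untouched instead of raising
    return Template(text).safe_substitute(resolved)


macro_map = {
    "today": {"impl": lambda: datetime.date.today().strftime("%Y-%m-%d")},
    "column": "name",
}

# Prints the query with both placeholders filled in
print(expand_macros("SELECT ${column} FROM score WHERE date='${today}'", macro_map))
```

Evaluating callables at expansion time, rather than when the map is defined, means a macro such as `today` reflects the date on which the check actually runs.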

## Supported Databases

- MySQL
- PostgreSQL
- SQLite
- And more...

## TODO

- Web frontend

## Example

```python
import datetime
from data_watchtower import (DbServices, Watchtower, DatabaseLoader,
                             ExpectRowCountToBeBetween, ExpectColumnValuesToNotBeNull)

# Database URLs
dw_test_data_db_url = "sqlite:///test.db"
dw_backend_db_url = "sqlite:///data.db"

# Custom macro definitions
custom_macro_map = {
    'today': {'impl': lambda: datetime.datetime.today().strftime("%Y-%m-%d")},
    'start_date': '2024-04-01',
    'column': 'name',
}

# Configure the data loader that loads the data to be validated
query = "SELECT * FROM score WHERE date='${today}'"
data_loader = DatabaseLoader(query=query, connection=dw_test_data_db_url)
data_loader.load()

# Create the monitoring item (Watchtower)
wt = Watchtower(name='Score Data of ${today}', data_loader=data_loader, custom_macro_map=custom_macro_map)

# Add validators
row_count_params = ExpectRowCountToBeBetween.Params(min_value=20, max_value=None)
wt.add_validator(ExpectRowCountToBeBetween(row_count_params))

null_check_params = ExpectColumnValuesToNotBeNull.Params(column='${column}')
wt.add_validator(ExpectColumnValuesToNotBeNull(null_check_params))

# Run the validation
validation_result = wt.run()
print(validation_result['success'])

# Persist the monitoring configuration and results
db_service = DbServices(dw_backend_db_url)
# Create the tables
db_service.create_tables()
# Save the monitoring configuration
db_service.add_watchtower(wt)
# Save the monitoring result
db_service.save_result(wt, validation_result)
# Recompute the success status of the monitoring item
db_service.update_watchtower_success_status(wt)
```

103 changes: 103 additions & 0 deletions README_zh-cn.md
@@ -0,0 +1,103 @@
# data_watchtower

A data monitoring and validation tool.

Find problems before your CTO does.

## Installation

```
pip install data-watchtower
```

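## Data Loader

Loads data into memory for use by validators.

## Validators

Validate whether the data loaded by the data loader meets expectations.

### Built-in Validators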

* ExpectColumnValuesToNotBeNull
* ExpectColumnRecentlyUpdated
* ExpectColumnStdToBeBetween
* ExpectColumnMeanToBeBetween
* ExpectColumnNullRatioToBeBetween
* ExpectRowCountToBeBetween
* ExpectColumnDistinctValuesToContainSet
* ExpectColumnDistinctValuesToEqualSet
* ExpectColumnDistinctValuesToBeInSet
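
The validator names suggest what each check measures, but the README does not define them precisely. The plain-Python sketch below shows one plausible reading of three of them over a list of row dictionaries. The function names are hypothetical and this is not the library's implementation. It assumes that `None` counts as null and that a bound of `None` means "unbounded", mirroring `max_value=None` in the example.

```python
def row_count_between(rows, min_value=None, max_value=None):
    """True if the number of rows lies within the given bounds (None = unbounded)."""
    count = len(rows)
    if min_value is not None and count < min_value:
        return False
    if max_value is not None and count > max_value:
        return False
    return True


def values_not_null(rows, column):
    """True if no row has a missing (None) value in the column."""
    return all(row.get(column) is not None for row in rows)


def null_ratio(rows, column):
    """Fraction of rows whose value in the column is None."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(column) is None)
    return missing / len(rows)


rows = [
    {"name": "alice", "score": 90},
    {"name": None, "score": 75},
    {"name": "carol", "score": 82},
    {"name": "dave", "score": 68},
]

print(row_count_between(rows, min_value=2))  # True
print(values_not_null(rows, "name"))         # False
print(null_ratio(rows, "name"))              # 0.25
```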

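### Custom Loaders

...

## Macros

Custom macros let you reference user-defined variables, such as dates or configuration files, in monitoring items.

### Scope of Effect

* Watchtower names
* Validator parameters
* Data loader parameters

### Custom Macros

...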

## Supported Databases

* MySQL
* PostgreSQL
* SQLite
* ...

## Example

```python
import datetime
from data_watchtower import (DbServices, Watchtower, DatabaseLoader,
                             ExpectRowCountToBeBetween, ExpectColumnValuesToNotBeNull)

dw_test_data_db_url = "sqlite:///test.db"
dw_backend_db_url = "sqlite:///data.db"

# Custom macro definitions
custom_macro_map = {
    'today': {'impl': lambda: datetime.datetime.today().strftime("%Y-%m-%d")},
    'start_date': '2024-04-01',
    'column': 'name',
}
# Configure the data loader that loads the data to be validated
query = "SELECT * FROM score where date='${today}'"
data_loader = DatabaseLoader(query=query, connection=dw_test_data_db_url)
data_loader.load()
# Create the monitoring item
wt = Watchtower(name='score of ${today}', data_loader=data_loader, custom_macro_map=custom_macro_map)
# Add validators
params = ExpectRowCountToBeBetween.Params(min_value=20, max_value=None)
wt.add_validator(ExpectRowCountToBeBetween(params))

params = ExpectColumnValuesToNotBeNull.Params(column='${column}')
wt.add_validator(ExpectColumnValuesToNotBeNull(params))

result = wt.run()
print(result['success'])

# Save the monitoring configuration and results
db_svr = DbServices(dw_backend_db_url)
# Create the tables
db_svr.create_tables()
# Save the monitoring configuration
db_svr.add_watchtower(wt)
# Save the monitoring result
db_svr.save_result(wt, result)
# Recompute the success status of the monitoring item
db_svr.update_watchtower_success_status(wt)
```
