-
Nice, this makes a lot of sense. I'm totally in favor of taking dbt syntax and making some bindings to Pandera's API.
Then I think there's an additional conversation to be had about types. When I was implementing metadata stuff, I ended up copying what was in the READMEs, but this would be a nice opportunity to revisit that. For example, for numeric types with a decimal, what should we call those? I called them doubles, copying the READMEs, but... is that what they are? (As opposed to floats, decimals, etc.) Or, another example, you've got
Then for accepted values, I think that they serve a dual purpose as a
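A rough sketch of what such a binding could look like, assuming the tests have already been parsed from yaml into plain Python lists. The helper names `normalize_tests` and `build_column` are hypothetical, and only three dbt generic tests (`unique`, `not_null`, `accepted_values`) are covered. `pandera` is imported lazily, so the parsing helper runs without it.

```python
from __future__ import annotations

from typing import Any


def normalize_tests(tests: list) -> list[tuple[str, dict[str, Any]]]:
    """Turn dbt-style test entries into (name, kwargs) pairs.

    dbt accepts either a bare string ("not_null") or a single-key
    mapping ({"accepted_values": {"values": [...]}}).
    """
    normalized = []
    for entry in tests:
        if isinstance(entry, str):
            normalized.append((entry, {}))
        elif isinstance(entry, dict) and len(entry) == 1:
            name, kwargs = next(iter(entry.items()))
            normalized.append((name, dict(kwargs or {})))
        else:
            raise ValueError(f"Unsupported test entry: {entry!r}")
    return normalized


def build_column(dtype: Any, tests: list):
    """Hypothetical binding from dbt-style tests to a pandera Column."""
    import pandera as pa  # lazy import: only needed when building a schema

    # dbt allows nulls and duplicates unless a test says otherwise.
    options = {"nullable": True, "unique": False}
    checks = []
    for name, kwargs in normalize_tests(tests):
        if name == "not_null":
            options["nullable"] = False
        elif name == "unique":
            options["unique"] = True
        elif name == "accepted_values":
            checks.append(pa.Check.isin(kwargs["values"]))
        else:
            raise ValueError(f"No pandera binding for test: {name!r}")
    return pa.Column(dtype, checks=checks, **options)
```

For example, `build_column(str, ["not_null", {"accepted_values": {"values": ["a", "b"]}}])` would describe a non-nullable column restricted to those two values. One thing to watch: dbt allows nulls unless `not_null` is listed, while `pa.Column` defaults to `nullable=False`, hence the explicit default in the sketch.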
-
@damonmcc @fvankrieken would love to get your thoughts on this
-
As far as the yml syntax goes, this looks great. My one pushback is against checks being "schema"-level; I think "dataset"-level makes more sense. And in that case, I think dataset-level
The types question is a funny one. I think it'd be best to stay more generalized and human-oriented (e.g. "text" over "string") and not Python-specific, largely for consistency with other docs. Hopefully that won't complicate things too much.
I need to think about @alexrichey's point about accepted_values being under tests vs. its own special field. I think I agree, though.
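To make the "text" over "string" idea concrete, here is a minimal sketch of a lookup from human-oriented type names to pandas dtype strings. Apart from "text", every name in the table is a placeholder assumption, not an agreed list, and mapping "decimal" to `float64` is just one possible answer to the naming question. `resolve_dtype` is a hypothetical helper.

```python
# Hypothetical lookup: human-oriented names (as they might appear in yaml)
# mapped to dtype strings that pandas understands.
HUMAN_TO_PANDAS_DTYPE = {
    "text": "string",
    "integer": "Int64",      # nullable integer
    "decimal": "float64",    # placeholder choice, see the doubles/floats question
    "boolean": "boolean",    # nullable boolean
    "date": "datetime64[ns]",
}


def resolve_dtype(name: str) -> str:
    """Return the pandas dtype string for a human-oriented type name."""
    key = name.strip().lower()
    try:
        return HUMAN_TO_PANDAS_DTYPE[key]
    except KeyError:
        known = ", ".join(sorted(HUMAN_TO_PANDAS_DTYPE))
        raise ValueError(f"Unknown type {name!r}; expected one of: {known}") from None
```

Keeping the table in one place means the yaml files and other docs can share the same vocabulary, with the Python-specific names confined to the lookup.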
-
Proposed data types:
What about geom types?
-
And a related question: for geom types, is the projection a part of the type? Or does that fall under
-
What
We will be implementing one data validation framework using `pandera` in `ingest` & `distribution`. The goal of this discussion is to agree on a syntax we will be using in metadata and recipe template yaml files when defining data checks.
Common tests
From my research, the most common tests we will be using at the ingest and packaging stages are:
schema level:
`dbt`)
column level:
Requirements for syntax:
`dbt` test syntax, so that we just need to know 1 syntax that could be used everywhere.
`dbt` test syntax:
Proposed syntax
Combining existing metadata syntax + `dbt` syntax, I'm proposing something like this: