How to define a strategy function for a custom wide check #538
Comments
hi @suned sorry for the long reply-time! currently this isn't quite possible, but we're working on adding more flexibility to the data synthesis strategy functionality:
Will keep you updated on when support for your use case will be added! |
I've got a different use case, but similarly want to be able to provide a strategy for data synthesis for multiple column checks. @cosmicBboy is this currently a work in progress (I know there are a couple related PRs)? If not, I'm new to pandera's code base, but I am willing to do what I can to help contribute this functionality.
def check_maturity(df):
    # row-wise check: the gap between the two dates must equal Age
    is_okay = df["Maturity_Date"] - df["Orig_Date"] == df["Age"]
    return is_okay |
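One way to read the check_maturity use case above: since the three columns are tied together, a strategy can draw the free columns and derive the dependent one, so the constraint holds by construction. The following is only a minimal sketch built directly on hypothesis, not on pandera's strategy machinery (which, per this thread, does not support this yet). It assumes the comparison in the check is meant to be equality, that Age is a timedelta, and the name maturity_frames and the date and age bounds are made up for illustration.

```python
import datetime as dt

import pandas as pd
from hypothesis import strategies as st


def check_maturity(df: pd.DataFrame) -> pd.Series:
    """Row-wise wide check, assuming the intended comparison is equality."""
    return df["Maturity_Date"] - df["Orig_Date"] == df["Age"]


@st.composite
def maturity_frames(draw, max_rows: int = 10) -> pd.DataFrame:
    """Hypothetical strategy: draw the free columns, derive the dependent one."""
    n = draw(st.integers(min_value=1, max_value=max_rows))
    # Arbitrary illustrative bounds, chosen to stay well inside pandas' range.
    orig_dates = draw(
        st.lists(
            st.dates(min_value=dt.date(2000, 1, 1), max_value=dt.date(2030, 1, 1)),
            min_size=n,
            max_size=n,
        )
    )
    age_days = draw(
        st.lists(st.integers(min_value=0, max_value=3650), min_size=n, max_size=n)
    )
    orig = pd.Series(pd.to_datetime(orig_dates))
    age = pd.Series(pd.to_timedelta(age_days, unit="D"))
    # Maturity_Date is computed rather than drawn, so the check cannot fail.
    return pd.DataFrame({"Orig_Date": orig, "Maturity_Date": orig + age, "Age": age})
```

A strategy like this can be passed to hypothesis's @given in a test, and every generated frame should pass check_maturity without any rejection sampling.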
hi @RNKuhns, can you elaborate on the column checks for which you want to generate data? It'll help with the design and implementation of the data synthesis strategy customization effort. No one's currently working on this, so would be happy to work with you to get started. At a high level, we want to do the following:
This is pretty down in the weeds of the pandera codebase, let me know if you'd like to tackle it! |
@cosmicBboy I would definitely like to help out. Do you have some time to have a quick meeting? I can explain some of what I'm hoping to accomplish and you can give me some quick pointers on the code base. |
hi @RNKuhns feel free to find a time here! https://calendly.com/niels-bantilan/30min |
high-level approach:
tactically:
contribution as per:
for example, the
def make_strategy(schema, ...):  # with some other args TBD
    # return hypothesis strategy using schema metadata |
If I'm understanding correctly, I am trying to accomplish essentially what the OP is. I want to make sure that I'm in the right issue and that this is still not currently possible. My example - I am applying the following dataframe check to schemas:
How would I go about defining a strategy so that we can correctly generate data for testing purposes? Or is this not possible at this time? I'm fairly new to hypothesis and strategy generation so I apologize if I'm missing something. |
I'm attempting to implement a dataframe level custom check for one-hot encoding for a subset of the columns in a dataframe. I'm interested in using the resulting schema to create a hypothesis strategy, so to that end I'm writing a strategy function as described here. A minimal example:
Since is_one_hot_strategy is called once per column in the schema, it's quite hard to implement a strategy for a wide check since, by definition, a wide check is a constraint between columns, not on a single column.
My use-case is further complicated by the fact that the is_one_hot check only operates on a subset of the columns, and it's tricky to tell just from the dtype which columns should be constrained in is_one_hot_strategy.
My questions:
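As a possible workaround for the one-hot case described in this issue, the whole dataframe can be generated in one go with a plain hypothesis composite strategy, so the cross-column constraint is handled in a single place instead of once per column. This is a sketch under assumptions, not the original minimal example and not pandera's API: the function names, the column names "a", "b", "c" and "other", and the exact definition of one-hot (each row has exactly one 1 and the rest 0 across the chosen columns) are all illustrative.

```python
import numpy as np
import pandas as pd
from hypothesis import strategies as st


def is_one_hot(df: pd.DataFrame, columns) -> bool:
    """Wide check: across `columns`, every row holds exactly one 1 and 0s elsewhere."""
    block = df[list(columns)]
    only_zeros_and_ones = block.isin([0, 1]).to_numpy().all()
    one_per_row = (block.sum(axis=1) == 1).all()
    return bool(only_zeros_and_ones and one_per_row)


@st.composite
def one_hot_frames(draw, columns=("a", "b", "c"), max_rows: int = 10) -> pd.DataFrame:
    """Hypothetical strategy: pick the hot column per row, then expand to 0/1 columns."""
    n = draw(st.integers(min_value=1, max_value=max_rows))
    hot = draw(
        st.lists(
            st.integers(min_value=0, max_value=len(columns) - 1),
            min_size=n,
            max_size=n,
        )
    )
    # Indexing the identity matrix by the drawn positions gives one one-hot row each.
    df = pd.DataFrame(np.eye(len(columns), dtype=int)[hot], columns=list(columns))
    # An unconstrained column outside the one-hot subset, drawn independently.
    df["other"] = draw(
        st.lists(
            st.floats(allow_nan=False, allow_infinity=False), min_size=n, max_size=n
        )
    )
    return df
```

Because the constrained subset is passed in by name, the strategy does not need to guess from the dtype which columns belong to the one-hot group.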