Pandera: A flexible and expressive pandas data validation library. #12

Closed · 11 of 22 tasks
cosmicBboy opened this issue Aug 14, 2019 · 46 comments
@cosmicBboy

cosmicBboy commented Aug 14, 2019

Submitting Author: Niels Bantilan (@cosmicBboy)
All current maintainers: (@cosmicBboy)
Package Name: pandera
One-Line Description of Package: validate the types, properties, and statistics of pandas data structures
Repository Link: https://github.com/unionai-oss/pandera
Version submitted: 0.1.5
Editor: @lwasser
Reviewer 1: @mbjoseph
Reviewer 2: @xmnlab
Archive: https://github.com/pandera-dev/pandera/releases/tag/v0.2.3
Version accepted: v0.2.3
Date Accepted: 10/10/2019


Description

pandas data structures can hide a lot of information, and explicitly
validating them at runtime in production-critical or reproducible research
settings is a good idea for building reliable data transformation pipelines.
pandera enables users to:

  1. Check the types and properties of columns in a DataFrame or values in
    a Series.
  2. Perform descriptive and inferential statistical validation, e.g. two-sample
    t-tests.
  3. Seamlessly integrate with existing data analysis/processing pipelines
    via function decorators.

pandera provides a flexible and expressive API for performing data validation
on tidy (long-form) and wide data to make data processing pipelines more
readable and robust.
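
For illustration, here is a minimal sketch of the validation workflow described above, written against a recent pandera release (the exact API differed slightly in the 0.1.x/0.2.x versions under review); the column names, data, and check logic are hypothetical:

    import pandas as pd
    import pandera as pa

    # Declare the expected schema: column dtypes plus vectorized checks.
    schema = pa.DataFrameSchema({
        "height_cm": pa.Column(float, pa.Check(lambda s: s > 0)),
        "group": pa.Column(str, pa.Check(lambda s: s.isin(["A", "B"]))),
    })

    df = pd.DataFrame({"height_cm": [170.0, 182.5], "group": ["A", "B"]})

    # Raises a SchemaError describing the failure cases if any check fails;
    # otherwise returns the validated DataFrame.
    validated = schema.validate(df)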

Scope

  • Please indicate which category or categories this package falls under:
    • Data retrieval
    • Data extraction
    • Data munging
    • Data deposition
    • Reproducibility
    • Geospatial
    • Education
    • Data visualization*

* Please fill out a pre-submission inquiry before submitting a data visualization package. For more info, see this section of our guidebook.

  • Explain how and why the package falls under these categories (briefly, 1-2 sentences):

Data munging: the package makes ETL, data analysis, and data processing
pipelines more robust and reliable by providing users with tools to validate
assumptions about the schema and statistical properties of datasets.
This package supports validation on long (tidy) data and wide data.

Reproducibility: This package enables users to validate DataFrame or Series
objects at runtime or as unit/integration tests, and can easily be integrated
to existing pipelines using the check_input and check_output decorators.
It also supports collaboration and reproducible research by programmatically
enforcing assertions made about the statistical properties of a dataset in
addition to making it easier to review pandas code in production-critical
contexts.
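
As a rough sketch of the decorator integration mentioned above (recent pandera API; the schemas and the add_log_price function are made up for illustration):

    import numpy as np
    import pandas as pd
    import pandera as pa

    in_schema = pa.DataFrameSchema({"price": pa.Column(float, pa.Check(lambda s: s >= 0))})
    out_schema = pa.DataFrameSchema({"price": pa.Column(float), "log_price": pa.Column(float)})

    # check_input validates the first positional argument before the function runs;
    # check_output validates the returned DataFrame.
    @pa.check_input(in_schema)
    @pa.check_output(out_schema)
    def add_log_price(df: pd.DataFrame) -> pd.DataFrame:
        return df.assign(log_price=np.log1p(df["price"]))

    add_log_price(pd.DataFrame({"price": [10.0, 25.0]}))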

  • Who is the target audience and what are scientific applications of this package?

The target audience of pandera consists of data scientists, data engineers,
machine learning engineers, and machine learning scientists who use pandas in
their data processing pipelines for various purposes, e.g., transforming data
for reporting, analytics, model training, and data visualization. This tool is
built on top of pandas and scipy to provide a user-friendly interface for
explicitly specifying the set of properties that a DataFrame or Series must
fulfill in order to be considered valid. Since pandera makes no assumptions
about the domain of study or contents of these pandas data structures, it
could be used in a wide variety of quantitative fields that involve the
analysis of tabular data.

  • Are there other Python packages that accomplish the same thing? If so, how does yours differ?

There are a few alternatives to pandera in the Python ecosystem, and here
is how they compare:

Key differentiators of pandera:

  • column data types, nullability, and uniqueness are first-class concepts.

  • check_input and check_output decorators enable seamless integration with
    existing code.

  • Checks provide flexibility and performance by giving users direct access to
    the pandas API by design.

  • Hypothesis class provides a tidy-first interface for statistical hypothesis
    testing (see the sketch below).

  • Checks and Hypothesis objects support both tidy and wide data validation.

  • Comprehensive documentation on key functionality.
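
A hedged sketch of the Hypothesis interface on tidy data, adapted from the pattern shown in the pandera documentation (the column names, groups, and alpha level here are illustrative, and hypothesis checks require scipy):

    import pandera as pa

    schema = pa.DataFrameSchema({
        # Two-sample t-test on a tidy (long-form) column, grouped by "sex":
        # assert that group "M" heights are greater than group "F" heights.
        "height_in_feet": pa.Column(float, [
            pa.Hypothesis.two_sample_ttest(
                sample1="M",
                sample2="F",
                groupby="sex",
                relationship="greater_than",
                alpha=0.05,
            ),
        ]),
        "sex": pa.Column(str, pa.Check(lambda s: s.isin(["M", "F"]))),
    })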

  • If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted:

https://pyopensci.discourse.group/t/candidate-package-pandera-a-flexible-pandas-data-structure-validation-package/92

Technical checks

For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:

  • does not violate the Terms of Service of any service it interacts with.
  • has an OSI approved license
  • contains a README with instructions for installing the development version.
  • includes documentation with examples for all functions.
  • contains a vignette with examples of its essential functions and uses.
  • has a test suite.
  • has continuous integration, such as Travis CI, AppVeyor, CircleCI, and/or others.

Publication options

JOSS Checks
  • The package has an obvious research application according to JOSS's definition in their submission requirements. Be aware that completing the pyOpenSci review process does not guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS.
  • The package is not a "minor utility" as defined by JOSS's submission requirements: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria.
  • The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
  • The package is deposited in a long-term repository with the DOI:

Note: Do not submit your package separately to JOSS

Are you OK with Reviewers Submitting Issues to your Repo Directly?

This option will allow reviewers to open smaller issues that can then be linked to PRs, rather than submitting a single dense, text-based review. It will also allow you to demonstrate addressing each issue via PR links.

  • Yes I am OK with reviewers submitting requested changes as issues to my repo. Reviewers will then link to the issues in their submitted review.

Code of conduct

P.S. Have feedback/comments about our review process? Leave a comment here

Editor and Review Templates

Editor and review templates can be found here

Previous Repo: https://github.com/cosmicBboy/pandera

@lwasser
Member

lwasser commented Aug 19, 2019

Thank you @cosmicBboy !! we will get back to you with the editor / review process next steps !!

@lwasser
Member

lwasser commented Aug 23, 2019

Editor checks:

  • Fit: The package meets criteria for fit and overlap.
  • Automated tests: Package has a testing suite and is tested via Travis-CI or another CI service. Might add better dev setup instructions for contributing... but I see a dev environment .txt file
  • License: The package has an OSI accepted license MIT License
  • Repository: The repository link resolves correctly
  • Archive (JOSS only, may be post-review): The repository DOI resolves correctly
  • Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

Editor comments


Reviewers: @mbjoseph @xmnlab
Due date: @mbjoseph we agreed to do reviews one at a time. Given that, is a 2-week deadline (which would be September 6) OK for your schedule? If that is OK, then @xmnlab I will ping you once Max's review is in and you can begin your review!! @cosmicBboy has agreed to issues and PRs if you want to create a review using that approach rather than all text in this issue (links to the issue and/or PR may be preferred). Thank you all for your time!!

@mbjoseph
Member

@lwasser yes! A 2 week deadline works for me. I'll have my review in by Sep 6.

@lwasser
Member

lwasser commented Aug 28, 2019

@mbjoseph thank you!! and thank you for being willing to help @xmnlab out as well by submitting the first review. Ivan, we can totally support your first review for pyOpenSci!! so psyched to have you on board with us.

@cosmicBboy
Author

thanks everyone for participating in this review! Just FYI, the pandera issues page has a couple of tickets that may be of interest for reviewers.

We're planning on a 0.2.0 release in the next week or so.

@xmnlab

xmnlab commented Aug 28, 2019

@lwasser thank you so much! I am excited to contribute to pyopensci project! <3

@mbjoseph
Member

mbjoseph commented Sep 3, 2019

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all user-facing functions
  • Examples for all user-facing functions
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer.

Readme requirements
The package meets the readme requirements below:

  • Package has a README.md file in the root directory.

The README should include, from top to bottom:

  • The package name
  • Badges for continuous integration and test coverage, the badge for pyOpenSci peer-review once it has started (see below), a repostatus.org badge, and any other badges. If the README has many more badges, you might want to consider using a table for badges, see this example, that one and that one. Such a table should be more wide than high.
  • Short description of goals of package, with descriptive links to all vignettes (rendered, i.e. readable, cf the documentation website section) unless the package is small and there’s only one vignette repeating the README.
  • Installation instructions
  • Any additional setup required (authentication tokens, etc)
  • Brief demonstration usage
  • Direction to more detailed documentation (e.g. your documentation files or website).
  • If applicable, how the package compares to other similar packages and/or how it relates to other packages
  • Citation information

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software have been confirmed.
  • Performance: Any performance claims of the software have been confirmed.
  • Automated tests: Tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Continuous Integration: Has continuous integration, such as Travis CI, AppVeyor, CircleCI, and/or others.
  • Packaging guidelines: The package conforms to the pyOpenSci packaging guidelines.

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 6


Review Comments

Overall, this is a great package with a clear scope, good docs, and good testing infrastructure. Clearly, a lot of effort has been put into its development, and as somebody who works with raw data, I would find something like this immediately useful. With this in mind, most of my comments are fairly minor.

Bigger points:

These relate to the top-level boxes for the pyOpenSci review process that I could not check.

  1. API documentation is in pretty good shape, but there are some things without a description in the API docs (e.g., https://pandera.readthedocs.io/en/stable/API.html#pandera.Check.error_message).

  2. I am not checking the box for "Examples for all user-facing functions". Taken literally, there are user-facing functions that do not have examples (e.g., generic_error_message), though I believe the examples cover the most common use cases. It might be a good idea to prefix some of these methods that users aren't expected to use with an underscore, or if it makes more sense to add examples (e.g., via doctest in the API docs), that could also be worth considering.

Minor notes

These are a smattering of questions I ran into, and notes that might help improve the package.

>>> pylint pandera
************* Module pandera
pandera/__init__.py:1:0: C0111: Missing module docstring (missing-docstring)
************* Module pandera.dtypes
pandera/dtypes.py:6:0: C0111: Missing class docstring (missing-docstring)
pandera/dtypes.py:17:0: C0103: Constant name "Bool" doesn't conform to UPPER_CASE naming style (invalid-name)
pandera/dtypes.py:18:0: C0103: Constant name "DateTime" doesn't conform to UPPER_CASE naming style (invalid-name)
pandera/dtypes.py:19:0: C0103: Constant name "Category" doesn't conform to UPPER_CASE naming style (invalid-name)
pandera/dtypes.py:20:0: C0103: Constant name "Float" doesn't conform to UPPER_CASE naming style (invalid-name)
pandera/dtypes.py:21:0: C0103: Constant name "Int" doesn't conform to UPPER_CASE naming style (invalid-name)
pandera/dtypes.py:22:0: C0103: Constant name "Object" doesn't conform to UPPER_CASE naming style (invalid-name)
pandera/dtypes.py:23:0: C0103: Constant name "String" doesn't conform to UPPER_CASE naming style (invalid-name)
pandera/dtypes.py:24:0: C0103: Constant name "Timedelta" doesn't conform to UPPER_CASE naming style (invalid-name)
************* Module pandera.constants
pandera/constants.py:1:0: C0111: Missing module docstring (missing-docstring)
************* Module pandera.errors
pandera/errors.py:4:0: C0111: Missing class docstring (missing-docstring)
pandera/errors.py:8:0: C0111: Missing class docstring (missing-docstring)
pandera/errors.py:12:0: C0111: Missing class docstring (missing-docstring)
************* Module pandera.schemas
pandera/schemas.py:252:0: C0330: Wrong hanging indentation (add 1 space).
                            constants.N_FAILURE_CASES).to_dict()))
                            ^| (bad-continuation)
pandera/schemas.py:258:0: C0330: Wrong hanging indentation (add 1 space).
                            constants.N_FAILURE_CASES).to_dict()))
                            ^| (bad-continuation)
pandera/schemas.py:268:0: C0330: Wrong hanging indentation (add 1 space).
                        constants.N_FAILURE_CASES).to_dict()))
                        ^| (bad-continuation)
pandera/schemas.py:11:0: R0205: Class 'DataFrameSchema' inherits from object, can be safely removed from bases in python3 (useless-object-inheritance)
pandera/schemas.py:14:4: R0913: Too many arguments (7/5) (too-many-arguments)
pandera/schemas.py:56:4: R0913: Too many arguments (6/5) (too-many-arguments)
pandera/schemas.py:79:25: W0212: Access to a protected member _checks of a client class (protected-access)
pandera/schemas.py:105:28: C1801: Do not use `len(SEQUENCE)` to determine if a sequence is empty (len-as-condition)
pandera/schemas.py:118:4: R0913: Too many arguments (6/5) (too-many-arguments)
pandera/schemas.py:172:0: R0205: Class 'SeriesSchemaBase' inherits from object, can be safely removed from bases in python3 (useless-object-inheritance)
pandera/schemas.py:175:4: R0913: Too many arguments (6/5) (too-many-arguments)
pandera/schemas.py:246:16: R1720: Unnecessary "else" after "raise" (no-else-raise)
pandera/schemas.py:219:4: R0912: Too many branches (13/12) (too-many-branches)
pandera/schemas.py:172:0: R0903: Too few public methods (1/2) (too-few-public-methods)
pandera/schemas.py:285:0: C0111: Missing class docstring (missing-docstring)
pandera/schemas.py:287:4: R0913: Too many arguments (6/5) (too-many-arguments)
pandera/schemas.py:287:4: W0235: Useless super delegation in method '__init__' (useless-super-delegation)
pandera/schemas.py:285:0: R0903: Too few public methods (1/2) (too-few-public-methods)
pandera/schemas.py:5:0: C0411: standard import "from typing import Optional" should be placed before "import pandas as pd" (wrong-import-order)
************* Module pandera.checks
pandera/checks.py:98:0: C0330: Wrong hanging indentation (remove 4 spaces).
                "%s failed element-wise validator %d:\n"
            |   ^ (bad-continuation)
pandera/checks.py:100:0: C0330: Wrong hanging indentation (remove 4 spaces).
                (parent_schema, check_index,
            |   ^ (bad-continuation)
pandera/checks.py:59:8: C0103: Attribute name "fn" doesn't conform to snake_case naming style (invalid-name)
pandera/checks.py:12:0: C0111: Missing class docstring (missing-docstring)
pandera/checks.py:12:0: R0205: Class 'Check' inherits from object, can be safely removed from bases in python3 (useless-object-inheritance)
pandera/checks.py:14:4: R0913: Too many arguments (7/5) (too-many-arguments)
pandera/checks.py:77:4: C0111: Missing method docstring (missing-docstring)
pandera/checks.py:163:4: R0201: Method could be a function (no-self-use)
pandera/checks.py:194:8: R1705: Unnecessary "elif" after "return" (no-else-return)
pandera/checks.py:212:8: R1705: Unnecessary "else" after "return" (no-else-return)
pandera/checks.py:238:12: R1705: Unnecessary "elif" after "return" (no-else-return)
pandera/checks.py:261:8: R1720: Unnecessary "elif" after "raise" (no-else-raise)
pandera/checks.py:160:8: W0201: Attribute 'failure_cases' defined outside __init__ (attribute-defined-outside-init)
pandera/checks.py:5:0: C0411: standard import "from functools import partial" should be placed before "import pandas as pd" (wrong-import-order)
pandera/checks.py:6:0: C0411: standard import "from typing import Union, Optional, List, Dict" should be placed before "import pandas as pd" (wrong-import-order)
************* Module pandera.decorators
pandera/decorators.py:64:0: C0330: Wrong hanging indentation (remove 4 spaces).
                        "error in check_input decorator of function '%s': the "
                    |   ^ (bad-continuation)
pandera/decorators.py:68:0: C0330: Wrong hanging indentation (remove 4 spaces).
                        (fn.__name__,
                    |   ^ (bad-continuation)
pandera/decorators.py:74:0: C0330: Wrong hanging indentation.
                        )
                |   |   ^ (bad-continuation)
pandera/decorators.py:13:0: C0103: Argument name "fn" doesn't conform to snake_case naming style (invalid-name)
pandera/decorators.py:22:0: R0913: Too many arguments (6/5) (too-many-arguments)
pandera/decorators.py:57:4: C0103: Argument name "fn" doesn't conform to snake_case naming style (invalid-name)
pandera/decorators.py:62:12: C0103: Variable name "e" doesn't conform to snake_case naming style (invalid-name)
pandera/decorators.py:88:12: C0103: Variable name "e" doesn't conform to snake_case naming style (invalid-name)
pandera/decorators.py:57:21: W0613: Unused argument 'instance' (unused-argument)
pandera/decorators.py:100:0: R0913: Too many arguments (6/5) (too-many-arguments)
pandera/decorators.py:135:4: C0103: Argument name "fn" doesn't conform to snake_case naming style (invalid-name)
pandera/decorators.py:153:8: C0103: Variable name "e" doesn't conform to snake_case naming style (invalid-name)
pandera/decorators.py:135:21: W0613: Unused argument 'instance' (unused-argument)
************* Module pandera.schema_components
pandera/schema_components.py:9:0: C0111: Missing class docstring (missing-docstring)
pandera/schema_components.py:11:4: R0913: Too many arguments (7/5) (too-many-arguments)
pandera/schema_components.py:70:4: W0222: Signature differs from overridden '__call__' method (signature-differs)
pandera/schema_components.py:85:0: C0111: Missing class docstring (missing-docstring)
pandera/schema_components.py:87:4: R0913: Too many arguments (6/5) (too-many-arguments)
pandera/schema_components.py:87:4: W0235: Useless super delegation in method '__init__' (useless-super-delegation)
pandera/schema_components.py:101:4: W0222: Signature differs from overridden '__call__' method (signature-differs)
pandera/schema_components.py:110:0: C0111: Missing class docstring (missing-docstring)
pandera/schema_components.py:115:21: W0212: Access to a protected member _name of a client class (protected-access)
pandera/schema_components.py:115:46: W0212: Access to a protected member _name of a client class (protected-access)
pandera/schema_components.py:116:20: W0212: Access to a protected member _pandas_dtype of a client class (protected-access)
pandera/schema_components.py:117:27: W0212: Access to a protected member _checks of a client class (protected-access)
pandera/schema_components.py:118:29: W0212: Access to a protected member _nullable of a client class (protected-access)
pandera/schema_components.py:119:37: W0212: Access to a protected member _allow_duplicates of a client class (protected-access)
pandera/schema_components.py:127:4: W0222: Signature differs from overridden '__call__' method (signature-differs)
************* Module pandera.hypotheses
pandera/hypotheses.py:237:0: C0301: Line too long (103/100) (line-too-long)
pandera/hypotheses.py:30:4: R0913: Too many arguments (8/5) (too-many-arguments)
pandera/hypotheses.py:148:12: R1720: Unnecessary "else" after "raise" (no-else-raise)
pandera/hypotheses.py:168:8: R1705: Unnecessary "else" after "return" (no-else-return)
pandera/hypotheses.py:177:4: R0913: Too many arguments (8/5) (too-many-arguments)
pandera/hypotheses.py:5:0: C0411: standard import "from functools import partial" should be placed before "import pandas as pd" (wrong-import-order)
pandera/hypotheses.py:8:0: C0411: standard import "from typing import Union, Optional, List, Dict" should be placed before "import pandas as pd" (wrong-import-order)
pandera/hypotheses.py:1:0: R0801: Similar lines in 3 files
==pandera.schema_components:86
==pandera.schemas:174
==pandera.schemas:286
    def __init__(
            self,
            pandas_dtype,
            checks: callable = None,
            nullable: bool = False,
            allow_duplicates: bool = True,
            name: str = None): (duplicate-code)
pandera/hypotheses.py:1:0: R0801: Similar lines in 3 files
==pandera.schema_components:10
==pandera.schemas:174
==pandera.schemas:286
    def __init__(
            self,
            pandas_dtype,
            checks: callable = None,
            nullable: bool = False,
            allow_duplicates: bool = True, (duplicate-code)

------------------------------------------------------------------

@lwasser
Member

lwasser commented Sep 10, 2019

thank you @mbjoseph for this extremely thorough review. gosh i'm not sure why i didn't see this in my github notifications. my apologies. @xmnlab you can have a look at the review above. Do you want to give the second review a go after seeing what max has pointed out above? If you need any guidance, please say the word!!

@xmnlab

xmnlab commented Sep 10, 2019

@lwasser sure thing! I am planning to start to work on that today :) thanks!

@lwasser
Member

lwasser commented Sep 10, 2019

awesome @xmnlab please reach out if you have any questions !! we are all here to support. @cosmicBboy just a note that the second reviewer is starting the process. You could have a look at @mbjoseph's review if you'd like in the meantime!! thank you all!! :)

@cosmicBboy
Author

cosmicBboy commented Sep 11, 2019

thanks @lwasser!

@mbjoseph your review is much appreciated! I've released v0.2.1, where I addressed many of the points that you raised; check out the release notes. @xmnlab FYI, I've taken a crack at some of @mbjoseph's comments.

Most notable changes:

  • add citation information
  • add dev installation instructions
  • improve formatting and wording of sphinx documentation (this addresses several of the points you made about formatting and wording in the documentation)
  • make SchemaError message formatting functions private (generic_error_message and the other such methods should have been private all along)
  • add docstrings to error classes

Minor points:

Test coverage is pretty high - any particular reason why the remaining lines are not tested?

I haven't really had too much time to cover the rest, though I'd like to prioritize the biggest holes and cover those.

There are some deprecation warnings that arise in running the tests: https://travis-ci.org/pandera-dev/pandera/jobs/579197344#L2287

Planning to do this as part of unionai-oss/pandera#110

CI testing on OSX and Windows might be nice too.

Made an issue for this: unionai-oss/pandera#109

Why not conda-forge instead of the cosmicbboy conda channel?

Yes, would love to get a conda-forge recipe going: unionai-oss/pandera#90

pylint points out some places where the code could be streamlined a bit (e.g., unnecessary else statements, and some cases where object is explicitly declared as a parent class), but none of the output is indicative of major problems. Feel free to address or ignore any of these checks:

Cool, made an issue to add pylint to CI: unionai-oss/pandera#108

@xmnlab

xmnlab commented Sep 11, 2019

just one question: the version submitted for review is 0.1.5,
but it seems pandera has had 2 more versions since then.

should I review just 0.1.5? and does the same apply to the documentation on readthedocs?

@mbjoseph
Member

IMO @xmnlab you should focus on the most recent version, but @lwasser may also have a preference!

@lwasser
Member

lwasser commented Sep 11, 2019

@mbjoseph i think that is a reasonable suggestion!! may i assume you reviewed the most recent version as well? if that is the case then the reviews will be consistent. thank you both!!

@mbjoseph
Member

That's right @lwasser -- my review was for the most recent version at the time, but the package has been updated since (including updates that address my review). So, probably better to work on the most recent version for review 2.

@cosmicBboy
Author

sorry for throwing a wrench in the review process! I probably should have waited on review 2 before updating the package

@xmnlab

xmnlab commented Sep 13, 2019

thanks for the feedback @mbjoseph and @lwasser ! I am doing the review on the latest version. thanks

@xmnlab

xmnlab commented Sep 14, 2019

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all user-facing functions
  • Examples for all user-facing functions
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer.

Readme requirements
The package meets the readme requirements below:

  • Package has a README.md file in the root directory.

The README should include, from top to bottom:

  • The package name
  • Badges for continuous integration and test coverage, the badge for pyOpenSci peer-review once it has started (see below), a repostatus.org badge, and any other badges. If the README has many more badges, you might want to consider using a table for badges, see this example, that one and that one. Such a table should be more wide than high.
  • Short description of goals of package, with descriptive links to all vignettes (rendered, i.e. readable, cf the documentation website section) unless the package is small and there’s only one vignette repeating the README.
  • Installation instructions
  • Any additional setup required (authentication tokens, etc)
  • Brief demonstration usage
  • Direction to more detailed documentation (e.g. your documentation files or website).
  • If applicable, how the package compares to other similar packages and/or how it relates to other packages
  • Citation information

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software have been confirmed.
  • Performance: Any performance claims of the software have been confirmed.
  • Automated tests: Tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Continuous Integration: Has continuous integration, such as Travis CI, AppVeyor, CircleCI, and/or others.
  • Packaging guidelines: The package conforms to the pyOpenSci packaging guidelines.

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 4:30


Review Comments

The package looks very good: the package structure, documentation, tests, and CI all look to be in very good shape. Some points reported by @mbjoseph were already fixed or already tracked as a GitHub issue.

I am adding just 2 more comments. Actually, the 1st is just a comment related to an issue that was already partially fixed (installation for development), but maybe it could be improved.

  • Installation instructions: the documentation should probably recommend python setup.py develop or pip install -e . for installation in development mode (as @mbjoseph suggested)
  • Examples: maybe consider the usage of example sections in docstrings. It seems the project is using sphinx-style docstrings. I didn't find official documentation for that, but maybe this could help: http://queirozf.com/entries/python-docstrings-reference-examples
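
For example, a sphinx-style docstring with a doctest-based Example section could look like the following (a generic, hypothetical helper for illustration, not an actual pandera function):

    def clip_negative(series):
        """Replace negative values in a pandas Series with zero.

        :param series: input pandas Series of numeric values
        :returns: a copy of the Series with negative values set to 0

        :Example:

        >>> import pandas as pd
        >>> clip_negative(pd.Series([-1, 2, 3]))
        0    0
        1    2
        2    3
        dtype: int64
        """
        return series.clip(lower=0)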

@lwasser
Member

lwasser commented Sep 16, 2019

awesome. thanks @xmnlab and great job on your first review !!! @cosmicBboy please note the new round of review comments. Ping me when changes have been implemented / you have questions etc!! Thank you all for a really smooth review process!!

@cosmicBboy
Author

thanks @lwasser @xmnlab @mbjoseph!

I've cut a new pandera release 0.2.2 that adds example docstrings to all public-facing classes and methods. The commit also:

  • ensures the docstring examples are reflected in the rendered docs
  • updates the README with new development installation instructions
  • adds more test coverage in schema.py
  • fixes pandas deprecation (FutureWarning) warnings in the unit tests

Please let me know if you have any questions.

@lwasser
Member

lwasser commented Sep 30, 2019

thank you @cosmicBboy !! @mbjoseph @xmnlab will you please have a look at the latest release? let me know if the changes are acceptable given your review! if so, you can check the "the author has responded to my review" box at the bottom of your review submission. If you see anything that wasn't addressed to your satisfaction please let me know!!

thank you all for such a smooth review process!

@lwasser lwasser closed this as completed Sep 30, 2019
@mbjoseph
Member

@cosmicBboy thanks for addressing my suggestions - v0.2.2 looks good to me!

@lwasser
Member

lwasser commented Oct 1, 2019

@xmnlab can you kindly have a look at the above and, if you are happy with the edits, check the box in your review that states that the author has addressed everything to your satisfaction.

@lwasser
Member

lwasser commented Nov 13, 2019

given this has been APPROVED, i will close this issue. If there is any reason to reopen it, please say the word!!!

@lwasser
Member

lwasser commented Jul 16, 2021

reopening to keep tabs on JOSS submission!

@astrojuanlu

astrojuanlu commented Aug 31, 2021

I tried to locate the pandera paper on JOSS, without success. Am I missing anything?

@lwasser
Member

lwasser commented Aug 31, 2021

hey there @astrojuanlu i believe that @cosmicBboy hasn't yet submitted to JOSS. I briefly chatted with Niels, over Twitter i think or maybe at SciPy, and it wasn't submitted yet; it may not be under review yet. @cosmicBboy can you confirm? i can also remove that tag if you don't plan on submitting there, but it sounded like you were interested in doing that at some point. the submission process is fast with JOSS once it goes through our review.

@cosmicBboy
Author

Hi @lwasser @astrojuanlu yes I do intend on submitting a paper to JOSS, I'm still working on a draft and plan on submitting within the next 2-3 weeks.

@lwasser
Member

lwasser commented Dec 16, 2021

hey there @cosmicBboy did this ever go through JOSS? i just didn't see the issue referenced here. I am going to close this for the time being but if it does go into JOSS please reference this issue and we can update it accordingly! thank you!

@lwasser lwasser closed this as completed Dec 16, 2021
@cosmicBboy
Author

thanks @lwasser will do! Just got swamped with other things, but am committed to submitting through JOSS in the new year

@lwasser
Member

lwasser commented Sep 15, 2022

hey 👋 @cosmicBboy @mbjoseph @xmnlab ! I hope that you are all well. I am reaching out here to all reviewers and maintainers about pyOpenSci now that i am working full time on the project (read more here). We have a survey that we'd like you to fill out so we can:

  1. invite you to our slack channel to participate in our community (if you wish to join - no worries if that is not how you prefer to communicate / participate).
  2. collect information from you about how we can improve our review process and also better serve maintainers.

🔗 HERE IS THE SURVEY LINK 🔗

The survey should take about 10 minutes to complete depending upon how much you decide to write. This information will help us greatly as we make decisions about how pyOpenSci grows and serves the community. Thank you so much in advance for filling it out.

NOTE: this is different from the form designed for reviewers to sign up to review.
If there are other maintainers for this project, please ping them here and ask them to fill out the survey as well. It is important that we ensure packages are supported long term or sunsetted with sufficient communication to users. Thus we will check in with maintainers annually about maintenance.

Thank you in advance for doing this and supporting pyOpenSci.

@lwasser
Member

lwasser commented Sep 28, 2022

hey there @cosmicBboy @mbjoseph 👋 Just a friendly reminder to take 5-10 minutes to fill out our survey. We really appreciate it. Thank you in advance for helping us by filling out the survey!! 🙌 Niels, it's really important for us to collect information from our maintainers so that we can both stay in touch with you regarding package maintenance and also support you through time. We really appreciate your time in filling this out. Also, are you the sole maintainer of this package? if not, please have your co-maintainers also fill it out and please list them here as well. Many thanks in advance!

✨ Ivan you only need to do this once :) ping me on slack with any questions!! 🙌

🔗 HERE IS THE SURVEY LINK 🔗

@lwasser
Member

lwasser commented Oct 19, 2022

hi again @cosmicBboy and @mbjoseph i'd be super appreciative if you'd fill out our survey

🔗 HERE IS THE SURVEY LINK 🔗!

I know you are busy and Niels I know you have super exciting job transition life happening now. But i'd appreciate your time. We'd like to check in with maintainers once a year to ensure all is well with package maintenance. Also your input on the survey helps us improve and show funders we are doing good things! Many thanks for your time!

@cosmicBboy
Author

just filled it out!

@lwasser
Member

lwasser commented Oct 24, 2022

You rock!! thanks Niels!

@NickleDave
Contributor

Hi @cosmicBboy we are updating our metadata to be consistent.

When you have a second, can you please confirm for me that at the time of this review you were the only core maintainer? I have added that in the "all current maintainers" field above (as in #109)

@cosmicBboy
Author

Hi @NickleDave sorry for the late response 😅

can you please confirm for me that at the time of this review you were the only core maintainer?
Yes, confirmed

@lwasser lwasser moved this to pyos-accepted in peer-review-status Jul 11, 2023