Add tests for docstring Validation Script + py27 compat #20061

Merged
merged 36 commits into from
Aug 17, 2018

Conversation

WillAyd
Member

@WillAyd WillAyd commented Mar 8, 2018

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@jorisvandenbossche @toobaz a few edits here should make it work for both Py3 and Py2 users

@codecov

codecov bot commented Mar 8, 2018

Codecov Report

Merging #20061 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master   #20061   +/-   ##
=======================================
  Coverage   92.08%   92.08%           
=======================================
  Files         169      169           
  Lines       50706    50706           
=======================================
  Hits        46691    46691           
  Misses       4015     4015

Flag         Coverage Δ
#multiple    90.49% <ø> (ø) ⬆️
#single      42.33% <ø> (ø) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 601d71f...1fb5405. Read the comment docs.

@toobaz
Member

toobaz commented Mar 9, 2018

Tested, works great, thanks!

@jreback jreback added the Docs label Mar 9, 2018
@jreback
Contributor

jreback commented Mar 9, 2018

could you add a test for this somewhere? (need to make sure the test runner runs this); or can you have the lint.sh script actually run (a sub-set of this). that way we know changes to this will be good.

@WillAyd
Member Author

WillAyd commented Mar 9, 2018

Fair point. The only issue with CI here is going to be the location of the files in the scripts folder, which I assume neither pytest nor the tests themselves would have access to, given it's outside of the top-level pandas package. lint.sh could fix that, but I do think it would be better to have dedicated unit tests for the functionality.

Have to think it over but if you have any ideas on how to ideally configure the imports for testing let me know

@jreback
Contributor

jreback commented Mar 9, 2018

can just add pandas/tests/scripts; might be ok

@pep8speaks

pep8speaks commented Mar 9, 2018

Hello @WillAyd! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on August 13, 2018 at 20:40 Hours UTC

@WillAyd
Member Author

WillAyd commented Mar 9, 2018

Latest commit will fail but should be a starting point for a test module. @jorisvandenbossche @datapythonista I took these examples directly from the guide, but it looks like even the examples in the guide fail the validation script. In this case do you know if it's the script or the examples that need to be updated?

Here are the major failures quasi-summarized:

  • Parameters {'kwargs'} not documented
  • Unknown parameters {'**kwargs'}
  • Parameter "**kwargs" has not type
  • No returns section found (some methods don't return anything)
  • No see also section found (docs technically say these are optional)
  • No examples section found (docs suggest these may be required, but aren't explicit)
  • Parameters {'letters', 'length'} not documented (this is wrong for test random_letters)
  • Docstring text should start in the line immediately... (head examples in docs are wrong)

Curious to hear how you would like to approach the above
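The `kwargs` / `**kwargs` mismatch in the list above can be illustrated with a small sketch. This is not the validation script's code; `signature_parameter_names` and `parameter_mismatches` are hypothetical helpers, and the sketch assumes Python 3 (`inspect.signature` is not available on Python 2.7). The idea is to prefix variadic parameters with their asterisks so they match how the docstring guide writes them.

```python
import inspect


def signature_parameter_names(func):
    """Return parameter names, writing variadics as '*args' / '**kwargs'."""
    names = []
    for param in inspect.signature(func).parameters.values():
        if param.kind is param.VAR_POSITIONAL:
            names.append("*" + param.name)
        elif param.kind is param.VAR_KEYWORD:
            names.append("**" + param.name)
        else:
            names.append(param.name)
    return names


def parameter_mismatches(func, documented):
    """Return (not documented, unknown) parameter names, each sorted."""
    # 'self' and 'cls' are never documented, so leave them out.
    actual = [name for name in signature_parameter_names(func)
              if name not in ("self", "cls")]
    missing = sorted(set(actual) - set(documented))
    unknown = sorted(set(documented) - set(actual))
    return missing, unknown


def plot(self, kind, **kwargs):
    """Stand-in method used only for this illustration."""
```

With this normalization, documenting `**kwargs` produces no mismatch, while documenting it as `kwargs` is reported on both sides.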

@@ -186,12 +185,11 @@ def signature_parameters(self):
            # accessor classes have a signature, but don't want to show this
            return tuple()
        try:
            signature = inspect.signature(self.method_obj)
            params = self.method_obj.__code__.co_varnames
Member

unfortunately, this does not give the correct result (it seems to give some extra names). Eg

In [156]: pd.DataFrame.apply.__code__.co_varnames
Out[156]: 
('self',
 'func',
 'axis',
 'broadcast',
 'raw',
 'reduce',
 'result_type',
 'args',
 'kwds',
 'frame_apply',
 'op')

the last two should not be there
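The extra names appear because `co_varnames` lists the arguments first and then every local variable of the function body. A minimal sketch of one way to keep only the arguments follows; `parameter_names` and `sample_apply` are hypothetical names for illustration, not part of the script, and `co_kwonlyargcount` assumes Python 3.

```python
import inspect


def parameter_names(func):
    """Return only the argument names from a function's code object."""
    code = func.__code__
    # Positional and keyword-only arguments come first in co_varnames.
    count = code.co_argcount + code.co_kwonlyargcount
    # *args and **kwargs follow when the function declares them.
    if code.co_flags & inspect.CO_VARARGS:
        count += 1
    if code.co_flags & inspect.CO_VARKEYWORDS:
        count += 1
    return code.co_varnames[:count]


def sample_apply(self, func, axis=0, *args, **kwds):
    """Stand-in with a local variable, mimicking the case reported above."""
    op = (func, axis)
    return op
```

Here `sample_apply.__code__.co_varnames` still contains the local `op`, while `parameter_names(sample_apply)` stops after `kwds`.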

@toobaz
Member

toobaz commented Mar 9, 2018

With the new version of the script, an error of the kind:

Line 141, in pandas.DataFrame.apply
Failed example:
    df.apply(lambda x: Series([1, 2], index=['foo', 'bar']), axis=1)
Exception raised:
    Traceback (most recent call last):
      File "/usr/lib/python3.5/doctest.py", line 1321, in __run
        compileflags, 1), test.globs)
      File "<doctest pandas.DataFrame.apply[7]>", line 1, in <module>
        df.apply(lambda x: Series([1, 2], index=['foo', 'bar']), axis=1)
      File "/home/nobackup/repo/pandas/pandas/core/frame.py", line 5004, in apply
        return op.get_result()
      File "/home/nobackup/repo/pandas/pandas/core/apply.py", line 136, in get_result
        return self.apply_standard()
      File "/home/nobackup/repo/pandas/pandas/core/apply.py", line 242, in apply_standard
        self.apply_series_generator()
      File "/home/nobackup/repo/pandas/pandas/core/apply.py", line 271, in apply_series_generator
        results[i] = self.f(v)
      File "<doctest pandas.DataFrame.apply[7]>", line 1, in <lambda>
        df.apply(lambda x: Series([1, 2], index=['foo', 'bar']), axis=1)
    NameError: ("name 'Series' is not defined", 'occurred at index 0')

becomes under python2

Traceback (most recent call last):
  File "validate_docstrings.py", line 496, in <module>
    sys.exit(main(args.function))
  File "validate_docstrings.py", line 482, in main
    return validate_one(function)
  File "validate_docstrings.py", line 459, in validate_one
    examples_errs = doc.examples_errors
  File "validate_docstrings.py", line 265, in examples_errors
    runner.run(test, out=f.write)
  File "/usr/lib/python2.7/doctest.py", line 1454, in run
    return self.__run(test, compileflags, out)
  File "/usr/lib/python2.7/doctest.py", line 1363, in __run
    self.report_failure(out, test, example, got)
  File "/usr/lib/python2.7/doctest.py", line 1228, in report_failure
    self._checker.output_difference(example, got, self.optionflags))
TypeError: unicode argument expected, got 'str'

(obtained with python validate_docstrings.py pandas.DataFrame.apply).
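The Python 2 traceback is consistent with doctest writing native byte strings into a text-only buffer. The sketch below shows one possible workaround, choosing the buffer type by interpreter version; it is an assumption about the cause, not the fix that was committed in this PR, and `run_docstring_examples` is a hypothetical helper.

```python
import doctest
import io
import sys


def run_docstring_examples(docstring, name="example"):
    """Run the doctest examples in `docstring` and return the failure report."""
    # doctest writes the native `str` type: bytes on Python 2, text on
    # Python 3. io.StringIO accepts only text, so pick the buffer to match.
    buffer = io.BytesIO() if sys.version_info[0] == 2 else io.StringIO()
    test = doctest.DocTestParser().get_doctest(docstring, {}, name, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    runner.run(test, out=buffer.write)
    return buffer.getvalue()
```

A passing docstring yields an empty report, and a failing one yields the usual "Failed example" text instead of a `TypeError`.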

@toobaz
Member

toobaz commented Mar 9, 2018

As I stated in gitter, fixing the remaining issues for py2 support would be great, but I think changing #!/usr/bin/env python to #!/usr/bin/env python3 and telling users that they must have python 3 is also perfectly acceptable.

@WillAyd
Member Author

WillAyd commented Mar 9, 2018

Thanks @toobaz and @jorisvandenbossche. So it looks like there are a couple of things to do here, and with the sprint being tomorrow I'm wondering if it's even worth trying to get the compatibility to work, rather than doing as @toobaz suggested and just explicitly requiring python3.

I think the bigger issue is that the test cases here taken from the doc don't pass the script. @jorisvandenbossche I saw your clarification on the **kwargs piece on the Google group, but curious what you think of the other issues?


    @pytest.fixture(autouse=True, scope="class")
    def import_scripts(self):
        up = os.path.dirname
Contributor

pretty magical can you add a comment
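For readers wondering what the `up = os.path.dirname` alias is for: a test module under pandas/tests/scripts has to climb several directories to reach the repository root before the top-level scripts folder can be imported. The commented sketch below is illustrative only; `repo_root_from` and `import_script` are hypothetical helpers, and the four-level depth assumes the test file lives at pandas/tests/scripts/.

```python
import importlib
import os
import sys


def repo_root_from(test_file, levels=4):
    """Climb `levels` directories up from a test file to the repository root."""
    up = os.path.dirname
    path = os.path.abspath(test_file)
    # file -> pandas/tests/scripts -> pandas/tests -> pandas -> repo root
    for _ in range(levels):
        path = up(path)
    return path


def import_script(repo_root, module_name="validate_docstrings"):
    """Import a module from <repo_root>/scripts, which is not a package."""
    scripts_dir = os.path.join(repo_root, "scripts")
    # Expose the directory to the import system only for this one import.
    sys.path.insert(0, scripts_dir)
    try:
        return importlib.import_module(module_name)
    finally:
        sys.path.remove(scripts_dir)
```

Removing the path entry again keeps the scripts folder from leaking into imports made by unrelated tests.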

@@ -0,0 +1,327 @@
import os
Contributor

not sure if the CI is actually running this?

@WillAyd
Member Author

WillAyd commented Mar 12, 2018

@datapythonista @jorisvandenbossche @TomAugspurger in the last commit I blended a few things together, namely:

  • Added the test cases
  • Fixed Py27 support
  • Fixed varargs and keyword args validation AND
  • Relaxed errors on some sections (namely Extended Summary and Examples)

I copied most of the test cases from the document, but even then some of them weren't passing the validation script, so I took a few liberties in modifying them. Ideally we would build out test cases to unit test particular aspects of the validation script, but for now I've just done a blanket pass for success / failures.

            errs.append('No returns section found')
        if not doc.returns and "return" in doc.method_source:
            errs.append('No Returns section found')
        if "yield" in doc.method_source:
Member Author

This would ideally have the same structure as the returns check directly above it, but I think there's a bug in numpy doc where it doesn't parse Yields sections
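The relaxed check in this hunk only asks for a Returns section when the method's source actually contains a `return`. A rough sketch of that heuristic, extended symmetrically to Yields, could look like the following. `missing_sections` and its arguments are hypothetical; the real script works on a parsed docstring object rather than a set of section names.

```python
def missing_sections(method_source, documented_sections):
    """Report Returns / Yields sections the source suggests but the docs lack."""
    errs = []
    # Plain substring checks: cheap, but a comment or string that merely
    # mentions the keyword will also trigger them.
    if "return" in method_source and "Returns" not in documented_sections:
        errs.append("No Returns section found")
    if "yield" in method_source and "Yields" not in documented_sections:
        errs.append("No Yields section found")
    return errs
```

A method that returns nothing is no longer flagged, which addresses the "some methods don't return anything" failure listed earlier in the thread.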

@jreback jreback added this to the 0.23.0 milestone Mar 13, 2018
@jreback jreback added the Testing pandas testing functions or related to the test suite label Mar 13, 2018
@jreback
Contributor

jreback commented Mar 13, 2018

pandas/tests/scripts/test_validate_docstrings.py .............. [100%]
normal? maybe should catch this

================================================================================================================================ warnings summary =================================================================================================================================
pandas/tests/scripts/test_validate_docstrings.py::TestValidator::()::test_good_functions[sample_values]
  /Users/jreback/pandas/doc/sphinxext/numpydoc/docscrape.py:119: UserWarning: Unknown section Yields
    warn("Unknown section %s" % key)

-- Docs: http://doc.pytest.org/en/latest/warnings.html
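One way to "catch this" is to record warnings around the parsing call so they can be asserted on instead of leaking into the test summary. The sketch below uses only the standard library; `parse_sections` is a stand-in that imitates the "Unknown section" warning, not numpydoc's parser, and `KNOWN_SECTIONS` is an assumed, shortened list.

```python
import warnings

# Assumption: a shortened list of section names an older parser recognizes.
KNOWN_SECTIONS = {"Parameters", "Returns", "Examples"}


def parse_sections(names):
    """Stand-in parser that warns about sections it does not know."""
    parsed = []
    for name in names:
        if name not in KNOWN_SECTIONS:
            warnings.warn("Unknown section %s" % name)
            continue
        parsed.append(name)
    return parsed


def parse_collecting_warnings(names):
    """Run the parser and return (parsed sections, warning messages)."""
    with warnings.catch_warnings(record=True) as caught:
        # Make sure repeated warnings are not suppressed by the default filter.
        warnings.simplefilter("always")
        parsed = parse_sections(names)
    return parsed, [str(item.message) for item in caught]
```

In a pytest suite the same effect is usually achieved with `pytest.warns`, which additionally fails the test if the expected warning is not raised.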

@WillAyd
Member Author

WillAyd commented Mar 13, 2018

The issue with 'Yields' stems from the fact that we include a copy of numpydoc directly in pandas, but it doesn't appear to be a recent version (Yields support was added back in numpydoc 0.6)

Is there a particular reason we decided to copy that package into pandas rather than manage as a dependency? @jorisvandenbossche

@TomAugspurger
Contributor

TomAugspurger commented Mar 13, 2018 via email

@WillAyd
Member Author

WillAyd commented Aug 14, 2018

Not sure why I originally added a skip_if_no('sphinx') a few months back but that was the culprit for not running. Should be good to go now.

One failure in Travis is around clipboard tests so unrelated imo. Any other feedback lmk

@WillAyd
Member Author

WillAyd commented Aug 17, 2018

Any other feedback on this change? Hoping to avoid more rebasing :-)

@datapythonista
Member

lgtm, just wanted to run it for some docstrings before merging it, and didn't have the time. But will do during the weekend.

@jorisvandenbossche jorisvandenbossche changed the title Added Py27 Support for Validation Script Add tests for docstring Validation Script + py27 compat Aug 17, 2018
@jorisvandenbossche jorisvandenbossche added this to the 0.24.0 milestone Aug 17, 2018
Member

@jorisvandenbossche jorisvandenbossche left a comment

This looks good to me!
Quickly skimmed through the code and changes look good to me. Some nice improvements! And tried some docstrings and seems to work nicely.

One thing (but not necessarily for this PR), when doing the check that a parameter description ends with a '.', we should check that the last line does not contain 'versionadded' (that's a typical false positive in the script that should not be hard to catch)
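The `versionadded` false positive could be avoided along these lines. This is only a sketch of the suggestion, not code from the script: `description_ends_with_period` and the `DIRECTIVES` tuple are hypothetical, and it handles only directives that occupy the final line of the description.

```python
# Assumption: the directives that commonly trail a parameter description.
DIRECTIVES = (".. versionadded::", ".. versionchanged::", ".. deprecated::")


def description_ends_with_period(description_lines):
    """Check the trailing period, ignoring trailing version directives."""
    lines = [line.strip() for line in description_lines if line.strip()]
    # Drop directive lines from the end so the check sees the real prose.
    while lines and lines[-1].startswith(DIRECTIVES):
        lines.pop()
    return bool(lines) and lines[-1].endswith(".")
```

A description such as "Axis to apply along." followed by `.. versionadded:: 0.23.0` then passes, while one that really lacks the period is still reported.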

        if hasattr(self.method_obj, '_accessors') and (
                self.method_name.split('.')[-1] in
                self.method_obj._accessors):
            # accessor classes have a signature but don't want to show this
Member

This does not seem to work for me. Eg checking pandas.Series.dt with the script complains about parameter 'data' not being documented.

(but anyhow, it is better than on master, where it raises an attribute error somewhere in the script)

Member Author

Good to know. I’d be happy to open that up as a follow up issue here - worth adding dedicated tests for accessors

@datapythonista
Member

I've been doing tests, and besides what @jorisvandenbossche said, there are a few other cases where the script still does not work (unrelated to this PR). For example pandas.Timestamp.combine, pandas.Series.droplevel or pandas.Panel.transpose.

Merging, so we can keep improving the scripts and the docs in separate PRs.

Thanks a lot @WillAyd, I know this PR has been a lot of work, but it'll help a lot.

@datapythonista datapythonista merged commit 9f6c02d into pandas-dev:master Aug 17, 2018
@jorisvandenbossche
Member

jorisvandenbossche commented Aug 18, 2018

@datapythonista what is wrong for pandas.Series.droplevel ? (for Panel.transpose it fails because the docstring is just badly misformatted)

@datapythonista
Member

@jorisvandenbossche not sure what was the problem, I was executing the validation for all docstrings, and it failed, but probably caused by something in my changes.

I fixed the problem with Panel.transpose in #22408, which also allows running the validation of all docstrings, returning the result as a json. Only 11% pass all the validations. A bit more, as some will fail because of the problem with .. versionadded:: (#22405). But seems like we're still far from having all ready.

@WillAyd WillAyd deleted the script-compat branch August 18, 2018 21:50
gfyoung added a commit to forking-repos/pandas that referenced this pull request Aug 19, 2018
The scripts directory is not visible if the
installation is not inplace.

Follow-up to pandas-devgh-20061.
@jreback
Contributor

jreback commented Aug 20, 2018

this broke all of the wheel building:
https://travis-ci.org/MacPython/pandas-wheels/jobs/418057182

something wrong with the imports

@jorisvandenbossche
Member

@jreback are the tests run there for a wheel-installed version of pandas? (because I suppose that then the scripts directory is not available?)

If that is the case, easiest solution is probably to check the existence of the script, and otherwise simply skip the tests
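The "check the existence of the script, otherwise skip" idea could be sketched as below. This is not the follow-up fix itself: it uses the standard library's `unittest.skipUnless` so that it runs without pytest, and `script_path` and `skip_unless_script_exists` are hypothetical names.

```python
import os
import unittest

SCRIPT_NAME = "validate_docstrings.py"


def script_path(repo_root):
    """Location of the validation script in an in-place checkout."""
    return os.path.join(repo_root, "scripts", SCRIPT_NAME)


def skip_unless_script_exists(repo_root):
    """Build a decorator that skips tests when the script is not shipped."""
    # A wheel installs only the pandas package, so scripts/ is missing there.
    return unittest.skipUnless(
        os.path.isfile(script_path(repo_root)),
        "scripts directory not available in this installation",
    )
```

In a pytest-based suite the same condition would typically feed `pytest.skip` or a `skipif` marker.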

@jorisvandenbossche
Member

Ah, this is already discussed in #22413

gfyoung added a commit to forking-repos/pandas that referenced this pull request Aug 20, 2018
If the pandas is not inplace, the scripts directory
will not exist, and the tests will fail.

Follow-up to pandas-devgh-20061.
jreback pushed a commit that referenced this pull request Aug 21, 2018
If the pandas is not inplace, the scripts directory
will not exist, and the tests will fail.

Follow-up to gh-20061.
@jorisvandenbossche
Member

@WillAyd the test log output on travis et al now includes output of the validation script from running it in its tests. We should maybe capture stdout for the tests here to avoid that?
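Capturing stdout around the call is one way to keep the validation output out of the CI log. The sketch below assumes Python 3 (`contextlib.redirect_stdout`) and uses a stand-in `noisy_validate` function rather than the real script; pytest's `capsys` fixture is the more common tool inside a pytest suite.

```python
import contextlib
import io


def noisy_validate(name):
    """Stand-in for a validation call that prints a report to stdout."""
    print("Validating %s" % name)
    return 0


def run_quietly(func, *args):
    """Call `func` with stdout redirected; return (result, captured text)."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        result = func(*args)
    return result, buffer.getvalue()
```

The captured text remains available for assertions, so a test can still check the report without printing it.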

@WillAyd
Member Author

WillAyd commented Aug 23, 2018

Makes sense - opened #22483 for that.

On the road for the next few days so wouldn’t be able to get around to it personally until next week, though it’s out there for the community to pick up as well :-)

@h-vetinari
Contributor

h-vetinari commented Aug 31, 2018

Since I rebased, I'm getting an error related to this PR (I believe), but only in the 3.6 travis job (https://travis-ci.org/pandas-dev/pandas/jobs/422781966 and https://travis-ci.org/pandas-dev/pandas/jobs/422883422):

UNEXPECTED EXCEPTION: SyntaxError('unexpected EOF while parsing', ('<doctest pandas.core.frame.DataFrame.duplicated[0]>', 1, 66, "data = {'species': ['lama', 'cow', 'lama', 'ant', 'lama', 'bee'],\n"))

The relevant section from the docstring is:

        Examples
        --------
        By default, for each set of duplicated values, the first occurrence is
        set on False and all others on True:

        >>> data = {'species': ['lama', 'cow', 'lama', 'ant', 'lama', 'bee'],
                    'type': ['mammal'] * 3 + ['insect', 'mammal', 'insect']}
        >>> animals = pd.DataFrame(data, index=[1, 4, 9, 16, 25, 36])

and I can't see what's wrong with it. How can one have a line-break in there, or is this just not supported / a bug?

@jorisvandenbossche
Member

@h-vetinari it's not related to this PR, but to this one: #19952
But, this is actually the point of that test, to catch such errors. I commented on your PR what is wrong.
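The explanation given on the other PR is not quoted here, but the `SyntaxError` is what doctest produces when a continuation line lacks the `...` prompt: the parser treats the second line as expected output and tries to compile the first line alone. A small sketch with a shortened version of the docstring shows the difference; `example_sources` and `compiles` are hypothetical helpers.

```python
import doctest

# Continuation line without a prompt, as in the docstring quoted above.
BROKEN = """
>>> data = {'species': ['lama', 'cow'],
            'type': ['mammal', 'mammal']}
"""

# Same statement with the '...' continuation prompt.
FIXED = """
>>> data = {'species': ['lama', 'cow'],
...         'type': ['mammal', 'mammal']}
"""


def example_sources(text):
    """Return the source code doctest extracts for each example."""
    return [ex.source for ex in doctest.DocTestParser().get_examples(text)]


def compiles(source):
    """Report whether doctest would be able to compile this example."""
    try:
        compile(source, "<doctest>", "single")
    except SyntaxError:
        return False
    return True
```

For `BROKEN` the single extracted example ends at the first line's trailing comma and fails to compile, while for `FIXED` both lines form one example that compiles.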

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
If the pandas is not inplace, the scripts directory
will not exist, and the tests will fail.

Follow-up to pandas-devgh-20061.
@thoo
Contributor

thoo commented Nov 5, 2018

When I run scripts/validate_docstrings.py pandas.read_fwf , I got this error.

Parameters {**kwds} not documented
                Unknown parameters {mangle_dupe_cols, compression, doublequote, warn_bad_lines, quotechar, usecols, na_values, converters, chunksize, skiprows, na_filter, true_values, escapechar, comment, memory_map, delim_whitespace, squeeze, low_memory, index_col, parse_dates, lineterminator, float_precision, iterator, dtype, keep_default_na, dialect, infer_datetime_format, encoding, dayfirst, decimal, verbose, delimiter, skip_blank_lines, quoting, names, tupleize_cols, error_bad_lines, header, nrows, keep_date_col, thousands, false_values, skipfooter, date_parser, prefix, skipinitialspace}

Should this one also fix **kwds? Thanks.
