Add tests for docstring Validation Script + py27 compat #20061

WillAyd · 2018-03-08T21:07:04Z

closes #xxxx
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

@jorisvandenbossche @toobaz a few edits here should make it work for both Py3 and Py2 users

codecov · 2018-03-08T22:10:41Z

Codecov Report

Merging #20061 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master   #20061   +/-   ##
=======================================
  Coverage   92.08%   92.08%           
=======================================
  Files         169      169           
  Lines       50706    50706           
=======================================
  Hits        46691    46691           
  Misses       4015     4015

Flag	Coverage Δ
#multiple	`90.49% <ø> (ø)`	⬆️
#single	`42.33% <ø> (ø)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 601d71f...1fb5405.

toobaz · 2018-03-09T00:06:31Z

Tested, works great, thanks!

jreback · 2018-03-09T00:27:55Z

could you add a test for this somewhere? (need to make sure the test runner runs this); or can you have the lint.sh script actually run (a sub-set of this). that way we know changes to this will be good.

WillAyd · 2018-03-09T01:08:47Z

Fair point. The only issue with CI here is going to be the location of the files in the scripts folder, which I assume neither pytest nor the tests themselves would have access to, given it's outside of the top-level pandas package. lint.sh could fix that, but I do think it would be better to have dedicated unit tests for the functionality.

Have to think it over but if you have any ideas on how to ideally configure the imports for testing let me know

jreback · 2018-03-09T01:29:37Z

can just add

pandas/tests/scripts might be ok

pep8speaks · 2018-03-09T03:32:49Z

Hello @WillAyd! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on August 13, 2018 at 20:40 Hours UTC

WillAyd · 2018-03-09T03:52:00Z

Latest commit will fail but should be a starting point for a test module. @jorisvandenbossche @datapythonista I took these examples directly from the guide, but it looks like even the examples in the guide fail the validation script. In this case do you know if it's the script or the examples that need to be updated?

Here are the major failures quasi-summarized:

Parameters {'kwargs'} not documented
Unknown parameters {'**kwargs'}
Parameter "**kwargs" has not type
No returns section found (some methods don't return anything)
No see also section found (docs technically say these are optional)
No examples section found (docs suggest these may be required, but aren't explicit)
Parameters {'letters', 'length'} not documented (this is wrong for test random_letters)
Docstring text should start in the line immediately... (head examples in docs are wrong)

Curious to hear how you would like to approach the above

jorisvandenbossche · 2018-03-09T09:53:46Z

scripts/validate_docstrings.py

@@ -186,12 +185,11 @@ def signature_parameters(self):
            # accessor classes have a signature, but don't want to show this
            return tuple()
        try:
-            signature = inspect.signature(self.method_obj)
+            params = self.method_obj.__code__.co_varnames


unfortunately, this does not give the correct result (it seems to give some extra names). Eg

In [156]: pd.DataFrame.apply.__code__.co_varnames
Out[156]: ('self', 'func', 'axis', 'broadcast', 'raw', 'reduce', 'result_type', 'args', 'kwds', 'frame_apply', 'op')

the last two should not be there
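
The extra names are local variables: `co_varnames` lists a function's parameters first and its locals afterwards. A minimal sketch of trimming the tuple back to the signature, assuming a toy `apply` function (a hypothetical stand-in, not the pandas implementation) and a helper name, `parameter_names`, that is made up for illustration:

```python
import inspect


def apply(self, func, axis=0, args=(), **kwds):
    # Hypothetical stand-in for a method; `op` is a local, not a parameter.
    op = (func, axis, args, kwds)
    return op


def parameter_names(func):
    """Return only the names from co_varnames that belong to the signature."""
    code = func.__code__
    # Positional parameters, plus keyword-only ones where the attribute exists.
    count = code.co_argcount + getattr(code, "co_kwonlyargcount", 0)
    # *args and **kwargs are not counted above; each is signalled by a flag.
    count += bool(code.co_flags & inspect.CO_VARARGS)
    count += bool(code.co_flags & inspect.CO_VARKEYWORDS)
    return code.co_varnames[:count]


all_names = apply.__code__.co_varnames      # includes the local 'op'
signature_names = parameter_names(apply)    # stops at 'kwds'
```

Slicing by the argument counts keeps the approach independent of `inspect.signature`, which is not available on Python 2.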

toobaz · 2018-03-09T10:51:09Z

With the new version of the script, an error of the kind:

Line 141, in pandas.DataFrame.apply
Failed example:
    df.apply(lambda x: Series([1, 2], index=['foo', 'bar']), axis=1)
Exception raised:
    Traceback (most recent call last):
      File "/usr/lib/python3.5/doctest.py", line 1321, in __run
        compileflags, 1), test.globs)
      File "<doctest pandas.DataFrame.apply[7]>", line 1, in <module>
        df.apply(lambda x: Series([1, 2], index=['foo', 'bar']), axis=1)
      File "/home/nobackup/repo/pandas/pandas/core/frame.py", line 5004, in apply
        return op.get_result()
      File "/home/nobackup/repo/pandas/pandas/core/apply.py", line 136, in get_result
        return self.apply_standard()
      File "/home/nobackup/repo/pandas/pandas/core/apply.py", line 242, in apply_standard
        self.apply_series_generator()
      File "/home/nobackup/repo/pandas/pandas/core/apply.py", line 271, in apply_series_generator
        results[i] = self.f(v)
      File "<doctest pandas.DataFrame.apply[7]>", line 1, in <lambda>
        df.apply(lambda x: Series([1, 2], index=['foo', 'bar']), axis=1)
    NameError: ("name 'Series' is not defined", 'occurred at index 0')

becomes under python2

Traceback (most recent call last):
  File "validate_docstrings.py", line 496, in <module>
    sys.exit(main(args.function))
  File "validate_docstrings.py", line 482, in main
    return validate_one(function)
  File "validate_docstrings.py", line 459, in validate_one
    examples_errs = doc.examples_errors
  File "validate_docstrings.py", line 265, in examples_errors
    runner.run(test, out=f.write)
  File "/usr/lib/python2.7/doctest.py", line 1454, in run
    return self.__run(test, compileflags, out)
  File "/usr/lib/python2.7/doctest.py", line 1363, in __run
    self.report_failure(out, test, example, got)
  File "/usr/lib/python2.7/doctest.py", line 1228, in report_failure
    self._checker.output_difference(example, got, self.optionflags))
TypeError: unicode argument expected, got 'str'

(obtained with python validate_docstrings.py pandas.DataFrame.apply).
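
The Python 2 traceback points at the writer passed to `runner.run`: `io.StringIO` only accepts unicode text, while doctest on Python 2 hands over byte strings. A rough sketch of one way to guard against that, assuming a helper named `examples_errors` (the name mirrors the traceback, but the body here is illustrative and not the script's actual code):

```python
import doctest
import io


def examples_errors(docstring, name="sample"):
    """Run the doctest examples in `docstring` and return the failure report."""
    flags = doctest.NORMALIZE_WHITESPACE | doctest.ELLIPSIS
    test = doctest.DocTestParser().get_doctest(docstring, {}, name, None, 0)
    runner = doctest.DocTestRunner(verbose=False, optionflags=flags)
    buffer = io.StringIO()

    def write(text):
        # Byte strings (as produced on Python 2) are decoded before being
        # written, since io.StringIO rejects them.
        if isinstance(text, bytes):
            text = text.decode("utf-8")
        buffer.write(text)

    runner.run(test, out=write)
    return buffer.getvalue()


clean_report = examples_errors(">>> 1 + 1\n2\n")    # passing example
failing_report = examples_errors(">>> 1 + 1\n3\n")  # wrong expected output
```

With `verbose=False` the runner only writes when an example fails, so an empty string means every example passed.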

toobaz · 2018-03-09T10:52:18Z

As I stated in gitter, fixing the remaining issues for py2 support would be great, but I think changing #!/usr/bin/env python to #!/usr/bin/env python3 and telling users that they must have python 3 is also perfectly acceptable.

WillAyd · 2018-03-09T16:51:29Z

Thanks @toobaz and @jorisvandenbossche. So it looks like there are a couple of things to do here, and with the sprint being tomorrow I'm wondering whether it's even worth trying to get the compatibility to work, rather than doing as @toobaz suggested and just explicitly requiring python3.

I think the bigger issue is that the test cases here taken from the doc don't pass the script. @jorisvandenbossche I saw your clarification on the **kwargs piece on the Google group, but curious what you think of the other issues?

jreback · 2018-03-10T12:42:56Z

pandas/tests/scripts/test_validate_docstrings.py

+
+    @pytest.fixture(autouse=True, scope="class")
+    def import_scripts(self):
+        up = os.path.dirname


pretty magical can you add a comment
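
For readers following the thread, the idea behind such a fixture is to make a directory that lives outside the installed package importable for the duration of the tests. A minimal sketch under stated assumptions: the helper `import_script`, the module name `demo_validator`, and the temporary layout are all invented for illustration and are not the fixture from this PR.

```python
import importlib
import os
import sys
import tempfile


def import_script(scripts_dir, module_name):
    """Import `module_name` from `scripts_dir`, which is not an installed package."""
    sys.path.insert(0, scripts_dir)
    try:
        importlib.invalidate_caches()
        return importlib.import_module(module_name)
    finally:
        # Leave sys.path as it was found, whether or not the import worked.
        sys.path.remove(scripts_dir)


# Throwaway layout standing in for "<repo root>/scripts".
root = tempfile.mkdtemp()
scripts_dir = os.path.join(root, "scripts")
os.mkdir(scripts_dir)
with open(os.path.join(scripts_dir, "demo_validator.py"), "w") as fh:
    fh.write("def validate_one(name):\n    return 'checked ' + name\n")

validator = import_script(scripts_dir, "demo_validator")
result = validator.validate_one("pandas.DataFrame.apply")
```

In a real test module the directory would be derived from the test file's own location, for example by applying `os.path.dirname` repeatedly to walk up to the repository root.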

jreback · 2018-03-10T12:43:28Z

pandas/tests/scripts/test_validate_docstrings.py

@@ -0,0 +1,327 @@
+import os


not sure if the CI is actually running this?

WillAyd · 2018-03-12T19:02:52Z

@datapythonista @jorisvandenbossche @TomAugspurger in the last commit I blended a few things together, namely:

Added the test cases
Fixed Py27 support
Fixed varargs and keyword args validation AND
Relaxed errors on some sections (namely Extended Summary and Examples)

I copied most of the test cases from the document, but even then some of them weren't passing the validation script so I took a few liberties to modify. Ideally we would build out test cases to unit test particular aspects of the validation script, but for now I've just done a blanket pass for success / failures.
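
The `kwargs` versus `**kwargs` failures listed earlier come down to comparing names from the signature with names from the docstring in the same spelling. A small sketch of that comparison, assuming Python 3's `inspect.signature` and two hypothetical helpers (`signature_parameters` here is not the script's method of the same name):

```python
import inspect


def signature_parameters(func):
    """Parameter names as they would be written in a numpydoc Parameters section."""
    names = []
    for param in inspect.signature(func).parameters.values():
        if param.kind is param.VAR_POSITIONAL:
            names.append("*" + param.name)
        elif param.kind is param.VAR_KEYWORD:
            names.append("**" + param.name)
        else:
            names.append(param.name)
    return tuple(names)


def parameter_mismatches(func, documented):
    """Return (in signature but undocumented, documented but not in signature)."""
    actual = [p for p in signature_parameters(func) if p not in ("self", "cls")]
    return set(actual) - set(documented), set(documented) - set(actual)


def sample(self, kind, *args, **kwargs):
    # Hypothetical method used only to exercise the helpers.
    pass


missing, unknown = parameter_mismatches(sample, ["kind", "**kwargs"])
```

Prefixing the stars on the signature side means a docstring that documents `**kwargs` is neither reported as missing `kwargs` nor as containing an unknown parameter.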

WillAyd · 2018-03-12T19:03:48Z

scripts/validate_docstrings.py

-        errs.append('No returns section found')
+    if not doc.returns and "return" in doc.method_source:
+        errs.append('No Returns section found')
+    if "yield" in doc.method_source:


This would ideally have the same structure as the returns check directly above it, but I think there's a bug in numpy doc where it doesn't parse Yields sections
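
A rough sketch of the relaxed check described here, assuming the method source and the set of documented section names are already available as plain values. The function name `section_errors` and the Yields message are hypothetical, since the diff above is cut off after the `yield` test:

```python
def section_errors(method_source, documented_sections):
    """Flag missing Returns/Yields sections based on a plain substring check."""
    errs = []
    if "return" in method_source and "Returns" not in documented_sections:
        errs.append("No Returns section found")
    # Hypothetical message: the Yields branch is truncated in the diff.
    if "yield" in method_source and "Yields" not in documented_sections:
        errs.append("No Yields section found")
    return errs


generator_source = (
    "def sample_values(self):\n"
    "    for value in [1, 2]:\n"
    "        yield value\n"
)
setter_source = "def set_flag(self):\n    self.flag = True\n"

generator_errs = section_errors(generator_source, {"Parameters"})
setter_errs = section_errors(setter_source, set())
```

A substring test is coarse: the word "return" inside a comment or the docstring itself would also trigger it, so this trades some false positives for simplicity.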

jreback · 2018-03-13T10:14:50Z

pandas/tests/scripts/test_validate_docstrings.py .............. [100%]
normal? maybe should catch this

================================================================================================================================ warnings summary =================================================================================================================================
pandas/tests/scripts/test_validate_docstrings.py::TestValidator::()::test_good_functions[sample_values]
  /Users/jreback/pandas/doc/sphinxext/numpydoc/docscrape.py:119: UserWarning: Unknown section Yields
    warn("Unknown section %s" % key)

-- Docs: http://doc.pytest.org/en/latest/warnings.html
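
One way to "catch this" is to record warnings around the call and assert on them, so an unexpected warning fails loudly instead of scrolling past in the summary. A minimal sketch using only the standard library; `parse_sections` is a hypothetical stand-in that imitates the `warn("Unknown section %s" % key)` call shown in the output above:

```python
import warnings


def parse_sections(keys, known=("Parameters", "Returns")):
    """Hypothetical parser that warns about section names it does not know."""
    for key in keys:
        if key not in known:
            warnings.warn("Unknown section %s" % key)
    return [key for key in keys if key in known]


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure nothing is filtered out
    parsed = parse_sections(["Parameters", "Yields"])

messages = [str(w.message) for w in caught]
categories = [w.category for w in caught]
```

In a pytest suite, `pytest.warns(UserWarning)` offers the same kind of assertion as a context manager.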

WillAyd · 2018-03-13T16:27:29Z

The issue with 'Yields' stems from the fact that we include a copy of numpydoc directly in pandas, but it doesn't appear to be a recent version (Yields support was added back in numpydoc 0.6)

Is there a particular reason we decided to copy that package into pandas rather than manage as a dependency? @jorisvandenbossche

TomAugspurger · 2018-03-13T16:30:03Z

Long story involving many awful hacks :) #18147 I think the final blocker is numpy/numpydoc#106


WillAyd · 2018-08-14T03:26:14Z

Not sure why I originally added a skip_if_no('sphinx') a few months back but that was the culprit for not running. Should be good to go now.

One failure in Travis is around clipboard tests so unrelated imo. Any other feedback lmk

WillAyd · 2018-08-17T11:53:06Z

Any other feedback on this change? Hoping to avoid more rebasing :-)

datapythonista · 2018-08-17T11:58:57Z

lgtm, just wanted to run it for some docstrings before merging it, and didn't have the time. But will do during the weekend.

jorisvandenbossche

This looks good to me!
Quickly skimmed through the code and changes look good to me. Some nice improvements! And tried some docstrings and seems to work nicely.

One thing (but not necessarily for this PR), when doing the check that a parameter description ends with a '.', we should check that the last line does not contain 'versionadded' (that's a typical false positive in the script that should not be hard to catch)
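
A possible shape for that check, sketched under the assumption that a parameter description arrives as a single string. The helper name `description_missing_period` is made up, and only the `versionadded` directive named above is handled:

```python
def description_missing_period(description):
    """True if the last real line of a parameter description lacks a final period."""
    lines = [line.strip() for line in description.splitlines() if line.strip()]
    # Skip trailing '.. versionadded::' lines before looking at the ending.
    while lines and lines[-1].startswith("..") and "versionadded" in lines[-1]:
        lines.pop()
    return bool(lines) and not lines[-1].endswith(".")


ok = description_missing_period("Axis to use.\n\n.. versionadded:: 0.23.0")
bad = description_missing_period("Axis to use\n\n.. versionadded:: 0.23.0")
plain_bad = description_missing_period("Axis to use")
```

The directive is skipped rather than treated as a valid ending, so a description that is missing its period is still reported even when a `versionadded` line follows it.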

jorisvandenbossche · 2018-08-17T15:55:19Z

scripts/validate_docstrings.py

+            if hasattr(self.method_obj, '_accessors') and (
+                    self.method_name.split('.')[-1] in
+                    self.method_obj._accessors):
+                # accessor classes have a signature but don't want to show this


This does not seem to work for me. Eg checking pandas.Series.dt with the script complains about parameter 'data' not being documented.

(but anyhow, it is better than on master, where it raises an attribute error somewhere in the script)

WillAyd · 2018-08-17

Good to know. I’d be happy to open that up as a follow-up issue here. It's worth adding dedicated tests for accessors.

datapythonista · 2018-08-17T19:36:06Z

I've been doing tests, and besides what @jorisvandenbossche said, there are a few other cases where the script still does not work (unrelated to this PR). For example pandas.Timestamp.combine, pandas.Series.droplevel or pandas.Panel.transpose.

Merging, so we can keep improving the scripts and the docs in separate PRs.

Thanks a lot @WillAyd, I know this PR has been a lot of work, but it'll help a lot.

jorisvandenbossche · 2018-08-18T07:26:05Z

@datapythonista what is wrong for pandas.Series.droplevel ? (for Panel.transpose it fails because the docstring is just badly misformatted)

datapythonista · 2018-08-18T13:34:00Z

@jorisvandenbossche not sure what the problem was. I was executing the validation for all docstrings and it failed, but that was probably caused by something in my changes.

I fixed the problem with Panel.transpose in #22408, which also allows running the validation of all docstrings and returning the result as JSON. Only 11% pass all the validations. The real share is a bit higher, as some fail only because of the problem with .. versionadded:: (#22405). But it seems like we're still far from having everything ready.

jreback · 2018-08-20T09:46:12Z

this broke all of the wheel building:
https://travis-ci.org/MacPython/pandas-wheels/jobs/418057182

something wrong with the imports

jorisvandenbossche · 2018-08-20T09:55:58Z

@jreback are the tests run there for a wheel-installed version of pandas? (because I suppose that then the scripts directory is not available?)

If that is the case, easiest solution is probably to check the existence of the script, and otherwise simply skip the tests
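
A sketch of that existence check. It uses the standard library's `unittest.skipUnless` rather than pytest so the snippet runs without extra dependencies, which is a swapped-in technique and not what the follow-up PR does; the class, helper names and paths are hypothetical:

```python
import os
import unittest


def make_test_case(scripts_dir):
    """Build a test case that is skipped unless `scripts_dir` exists."""

    @unittest.skipUnless(os.path.isdir(scripts_dir),
                         "scripts directory not available")
    class TestValidator(unittest.TestCase):
        def test_script_present(self):
            self.assertTrue(os.path.isdir(scripts_dir))

    return TestValidator


def run(case):
    """Run one test case quietly and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case)
    with open(os.devnull, "w") as sink:
        return unittest.TextTestRunner(stream=sink).run(suite)


# A path that should not exist, imitating a non-inplace (wheel) install.
missing = run(make_test_case(os.path.join(os.getcwd(), "no-such-checkout", "scripts")))
# A directory that does exist, imitating a source checkout.
present = run(make_test_case(os.getcwd()))
```

With pytest, the same condition could be passed to `pytest.mark.skipif` on the test class.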

jorisvandenbossche · 2018-08-20T09:56:36Z

Ah, this is already discussed in #22413

jorisvandenbossche · 2018-08-23T08:42:49Z

@WillAyd the test log output on travis et al now includes output of the validation script from running it in its tests. We should maybe capture stdout for the tests here to avoid that?
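
A minimal sketch of capturing that output, assuming Python 3 (`contextlib.redirect_stdout` does not exist on Python 2) and a hypothetical `validate_one` stand-in that just prints a line:

```python
import contextlib
import io


def validate_one(name):
    # Hypothetical stand-in for the script's entry point: prints a report
    # and returns an exit code.
    print("Docstring for %s checked" % name)
    return 0


buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    exit_code = validate_one("pandas.DataFrame.apply")

captured = buffer.getvalue()
```

Inside a pytest test, the built-in `capsys` fixture gives the same result without a manual buffer, and it also lets the test assert on what was printed.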

WillAyd · 2018-08-23T13:03:40Z

Makes sense - opened #22483 for that.

On the road for the next few days so wouldn’t be able to get around to it personally until next week, though it’s out there for the community to pick up as well :-)

h-vetinari · 2018-08-31T07:28:56Z

Since I rebased, I'm getting an error related to this PR (I believe), but only in the 3.6 travis job (https://travis-ci.org/pandas-dev/pandas/jobs/422781966 and https://travis-ci.org/pandas-dev/pandas/jobs/422883422):

UNEXPECTED EXCEPTION: SyntaxError('unexpected EOF while parsing', ('<doctest pandas.core.frame.DataFrame.duplicated[0]>', 1, 66, "data = {'species': ['lama', 'cow', 'lama', 'ant', 'lama', 'bee'],\n"))

The relevant section from the docstring is:

        Examples
        --------
        By default, for each set of duplicated values, the first occurrence is
        set on False and all others on True:

        >>> data = {'species': ['lama', 'cow', 'lama', 'ant', 'lama', 'bee'],
                    'type': ['mammal'] * 3 + ['insect', 'mammal', 'insect']}
        >>> animals = pd.DataFrame(data, index=[1, 4, 9, 16, 25, 36])

and I can't see what's wrong with it. How can one have a line-break in there, or is this just not supported / a bug?

jorisvandenbossche · 2018-08-31T07:40:50Z

@h-vetinari it's not related to this PR, but to this one: #19952
But, this is actually the point of that test, to catch such errors. I commented on your PR what is wrong.
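
For reference, doctest requires continuation lines of a multi-line statement to start with `...` in place of `>>>`; without it, the second line is read as expected output and the first line fails to compile on its own. A small sketch contrasting the two forms, using shortened example data and a helper name, `failures`, invented for illustration:

```python
import doctest

broken = """
>>> data = {'species': ['lama', 'cow'],
            'type': ['mammal'] * 2}
"""

fixed = """
>>> data = {'species': ['lama', 'cow'],
...         'type': ['mammal'] * 2}
>>> sorted(data)
['species', 'type']
"""


def failures(docstring):
    """Count failing doctest examples in `docstring`, discarding the report."""
    test = doctest.DocTestParser().get_doctest(docstring, {}, "example", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test, out=lambda text: None).failed


broken_failures = failures(broken)  # the dict literal is cut off: SyntaxError
fixed_failures = failures(fixed)    # continuation line is part of the source
```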

thoo · 2018-11-05T19:21:24Z

When I run scripts/validate_docstrings.py pandas.read_fwf, I get this error:

Parameters {**kwds} not documented
                Unknown parameters {mangle_dupe_cols, compression, doublequote, warn_bad_lines, quotechar, usecols, na_values, converters, chunksize, skiprows, na_filter, true_values, escapechar, comment, memory_map, delim_whitespace, squeeze, low_memory, index_col, parse_dates, lineterminator, float_precision, iterator, dtype, keep_default_na, dialect, infer_datetime_format, encoding, dayfirst, decimal, verbose, delimiter, skip_blank_lines, quoting, names, tupleize_cols, error_bad_lines, header, nrows, keep_date_col, thousands, false_values, skipfooter, date_parser, prefix, skipinitialspace}

Should this one also fix **kwds? Thanks.

jreback added the Docs label Mar 9, 2018

WillAyd added 3 commits March 8, 2018 19:31

Added Py27 support for validation script

4673475

Removed contextlib import

9abc004

Added test for script validator

752c6db

WillAyd force-pushed the script-compat branch from 7438852 to 752c6db Compare March 9, 2018 03:32

Fixed writer arg to doctest.runner

3eaf3ba

jorisvandenbossche reviewed Mar 9, 2018

jreback requested changes Mar 10, 2018

WillAyd mentioned this pull request Mar 12, 2018

DOC: docstring validation script improvements #20298

Open

19 tasks

WillAyd added 2 commits March 12, 2018 11:57

Py27 compat and updated tests / logic

876337f

Merge remote-tracking branch 'upstream/master' into script-compat

aa2a0f9

WillAyd commented Mar 12, 2018

WillAyd added 2 commits March 12, 2018 23:20

Merge remote-tracking branch 'upstream/master' into script-compat

beb56d2

Added skipif for no sphinx

d0e0ad6

jreback added this to the 0.23.0 milestone Mar 13, 2018

jreback added the Testing label Mar 13, 2018

LINT fixup

1fb5405

jorisvandenbossche changed the title ~~Added Py27 Support for Validation Script~~ Add tests for docstring Validation Script + py27 compat Aug 17, 2018

jorisvandenbossche added this to the 0.24.0 milestone Aug 17, 2018

jorisvandenbossche approved these changes Aug 17, 2018

datapythonista merged commit 9f6c02d into pandas-dev:master Aug 17, 2018

WillAyd deleted the script-compat branch August 18, 2018 21:50

gfyoung added a commit to forking-repos/pandas that referenced this pull request Aug 19, 2018

BLD: Install scripts tests only during inplace

189c585

The scripts directory is not visible if the installation is not inplace. Follow-up to pandas-devgh-20061.

gfyoung mentioned this pull request Aug 19, 2018

BLD: Install scripts tests only during inplace #22413

Merged

gfyoung added a commit to forking-repos/pandas that referenced this pull request Aug 20, 2018

TST: Skip scripts test if scripts doesn't exist

9f2ad74

If the pandas is not inplace, the scripts directory will not exist, and the tests will fail. Follow-up to pandas-devgh-20061.

gfyoung added a commit to forking-repos/pandas that referenced this pull request Aug 20, 2018

TST: Skip scripts test if scripts doesn't exist

96ec3d7

If the pandas is not inplace, the scripts directory will not exist, and the tests will fail. Follow-up to pandas-devgh-20061.

gfyoung added a commit to forking-repos/pandas that referenced this pull request Aug 20, 2018

TST: Skip scripts test if scripts doesn't exist

c14c540

If the pandas is not inplace, the scripts directory will not exist, and the tests will fail. Follow-up to pandas-devgh-20061.

jreback pushed a commit that referenced this pull request Aug 21, 2018

TST: Skip scripts test if scripts doesn't exist (#22413)

9c35865

If the pandas is not inplace, the scripts directory will not exist, and the tests will fail. Follow-up to gh-20061.

WillAyd mentioned this pull request Aug 23, 2018

Capture STDOUT For Validations Script Tests #22483

Closed

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018

Adding tests for validate_docstrings.py and making it compatible with…

6942152

… py27 (pandas-dev#20061)

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018

TST: Skip scripts test if scripts doesn't exist (pandas-dev#22413)

a379e0b

If the pandas is not inplace, the scripts directory will not exist, and the tests will fail. Follow-up to pandas-devgh-20061.

thoo mentioned this pull request Nov 5, 2018

DOC: Fix docstring of read_csv and related methods #23496

Closed
