Commit

merge upstream/master
simonjayhawkins committed Feb 10, 2019
1 parent b12f658 commit 0e581ad
Showing 313 changed files with 8,137 additions and 7,393 deletions.
10 changes: 5 additions & 5 deletions .github/CONTRIBUTING.md
@@ -8,16 +8,16 @@ Our main contributing guide can be found [in this repo](https://github.com/panda

If you are looking to contribute to the *pandas* codebase, the best place to start is the [GitHub "issues" tab](https://github.com/pandas-dev/pandas/issues). This is also a great place for filing bug reports and making suggestions for ways in which we can improve the code and documentation.

If you have additional questions, feel free to ask them on the [mailing list](https://groups.google.com/forum/?fromgroups#!forum/pydata) or on [Gitter](https://gitter.im/pydata/pandas). Further information can also be found in the "[Where to start?](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst#where-to-start)" section.
If you have additional questions, feel free to ask them on the [mailing list](https://groups.google.com/forum/?fromgroups#!forum/pydata) or on [Gitter](https://gitter.im/pydata/pandas). Further information can also be found in the "[Where to start?](https://github.com/pandas-dev/pandas/blob/master/doc/source/development/contributing.rst#where-to-start)" section.

## Filing Issues

If you notice a bug in the code or documentation, or have suggestions for how we can improve either, feel free to create an issue on the [GitHub "issues" tab](https://github.com/pandas-dev/pandas/issues) using [GitHub's "issue" form](https://github.com/pandas-dev/pandas/issues/new). The form contains some questions that will help us best address your issue. For more information regarding how to file issues against *pandas*, please refer to the "[Bug reports and enhancement requests](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst#bug-reports-and-enhancement-requests)" section.
If you notice a bug in the code or documentation, or have suggestions for how we can improve either, feel free to create an issue on the [GitHub "issues" tab](https://github.com/pandas-dev/pandas/issues) using [GitHub's "issue" form](https://github.com/pandas-dev/pandas/issues/new). The form contains some questions that will help us best address your issue. For more information regarding how to file issues against *pandas*, please refer to the "[Bug reports and enhancement requests](https://github.com/pandas-dev/pandas/blob/master/doc/source/development/contributing.rst#bug-reports-and-enhancement-requests)" section.

## Contributing to the Codebase

The code is hosted on [GitHub](https://www.github.com/pandas-dev/pandas), so you will need to use [Git](http://git-scm.com/) to clone the project and make changes to the codebase. Once you have obtained a copy of the code, you should create a development environment that is separate from your existing Python environment so that you can make and test changes without compromising your own work environment. For more information, please refer to the "[Working with the code](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst#working-with-the-code)" section.
The code is hosted on [GitHub](https://www.github.com/pandas-dev/pandas), so you will need to use [Git](http://git-scm.com/) to clone the project and make changes to the codebase. Once you have obtained a copy of the code, you should create a development environment that is separate from your existing Python environment so that you can make and test changes without compromising your own work environment. For more information, please refer to the "[Working with the code](https://github.com/pandas-dev/pandas/blob/master/doc/source/development/contributing.rst#working-with-the-code)" section.

Before submitting your changes for review, make sure to check that your changes do not break any tests. You can find more information about our test suites in the "[Test-driven development/code writing](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst#test-driven-development-code-writing)" section. We also have guidelines regarding coding style that will be enforced during testing, which can be found in the "[Code standards](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst#code-standards)" section.
Before submitting your changes for review, make sure to check that your changes do not break any tests. You can find more information about our test suites in the "[Test-driven development/code writing](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst#test-driven-development-code-writing)" section. We also have guidelines regarding coding style that will be enforced during testing, which can be found in the "[Code standards](https://github.com/pandas-dev/pandas/blob/master/doc/source/development/contributing.rst#code-standards)" section.

Once your changes are ready to be submitted, make sure to push your changes to GitHub before creating a pull request. Details about how to do that can be found in the "[Contributing your changes to pandas](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst#contributing-your-changes-to-pandas)" section. We will review your changes, and you will most likely be asked to make additional changes before it is finally ready to merge. However, once it's ready, we will merge it, and you will have successfully contributed to the codebase!
Once your changes are ready to be submitted, make sure to push your changes to GitHub before creating a pull request. Details about how to do that can be found in the "[Contributing your changes to pandas](https://github.com/pandas-dev/pandas/blob/master/doc/source/development/contributing.rst#contributing-your-changes-to-pandas)" section. We will review your changes, and you will most likely be asked to make additional changes before it is finally ready to merge. However, once it's ready, we will merge it, and you will have successfully contributed to the codebase!
4 changes: 2 additions & 2 deletions .gitignore
@@ -101,14 +101,14 @@ asv_bench/pandas/
# Documentation generated files #
#################################
doc/source/generated
doc/source/api/generated
doc/source/user_guide/styled.xlsx
doc/source/reference/api
doc/source/_static
doc/source/vbench
doc/source/vbench.rst
doc/source/index.rst
doc/build/html/index.html
# Windows specific leftover:
doc/tmp.sv
doc/source/styled.xlsx
env/
doc/source/savefig/
1 change: 0 additions & 1 deletion Makefile
@@ -23,4 +23,3 @@ doc:
cd doc; \
python make.py clean; \
python make.py html
python make.py spellcheck
1 change: 1 addition & 0 deletions asv_bench/benchmarks/__init__.py
@@ -0,0 +1 @@
"""Pandas benchmarks."""
3 changes: 1 addition & 2 deletions asv_bench/benchmarks/algorithms.py
@@ -5,7 +5,6 @@
import pandas as pd
from pandas.util import testing as tm


for imp in ['pandas.util', 'pandas.tools.hashing']:
try:
hashing = import_module(imp)
@@ -142,4 +141,4 @@ def time_quantile(self, quantile, interpolation, dtype):
self.idx.quantile(quantile, interpolation=interpolation)


from .pandas_vb_common import setup # noqa: F401
from .pandas_vb_common import setup # noqa: F401 isort:skip
19 changes: 13 additions & 6 deletions asv_bench/benchmarks/categoricals.py
@@ -223,12 +223,19 @@ class CategoricalSlicing(object):

def setup(self, index):
N = 10**6
values = list('a' * N + 'b' * N + 'c' * N)
indices = {
'monotonic_incr': pd.Categorical(values),
'monotonic_decr': pd.Categorical(reversed(values)),
'non_monotonic': pd.Categorical(list('abc' * N))}
self.data = indices[index]
categories = ['a', 'b', 'c']
values = [0] * N + [1] * N + [2] * N
if index == 'monotonic_incr':
self.data = pd.Categorical.from_codes(values,
categories=categories)
elif index == 'monotonic_decr':
self.data = pd.Categorical.from_codes(list(reversed(values)),
categories=categories)
elif index == 'non_monotonic':
self.data = pd.Categorical.from_codes([0, 1, 2] * N,
categories=categories)
else:
raise ValueError('Invalid index param: {}'.format(index))

self.scalar = 10000
self.list = list(range(10000))
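The new `setup` hands integer codes and a category list to `pd.Categorical.from_codes` instead of building a list of `3 * N` one-character strings and letting the constructor infer the categories. The pure-Python sketch below only illustrates how codes relate to the values they stand for. `decode` is a hypothetical helper introduced for illustration, not a pandas function, and `N` is shrunk so the result is easy to inspect.

```python
def decode(codes, categories):
    """Map integer codes back to the category values they stand for."""
    return [categories[code] for code in codes]


N = 3  # the benchmark uses 10**6; a tiny N keeps the sketch readable
categories = ['a', 'b', 'c']

# Same three layouts as the benchmark's `index` parameter.
incr_codes = [0] * N + [1] * N + [2] * N
decr_codes = list(reversed(incr_codes))
mixed_codes = [0, 1, 2] * N

incr_values = decode(incr_codes, categories)
decr_values = decode(decr_codes, categories)
mixed_values = decode(mixed_codes, categories)
```

Each decoded list matches what the removed lines built from strings, for example `list('a' * N + 'b' * N + 'c' * N)` for the increasing case.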
2 changes: 1 addition & 1 deletion asv_bench/benchmarks/ctors.py
@@ -72,7 +72,7 @@ class SeriesDtypesConstructors(object):

def setup(self):
N = 10**4
self.arr = np.random.randn(N, N)
self.arr = np.random.randn(N)
self.arr_str = np.array(['foo', 'bar', 'baz'], dtype=object)
self.s = Series([Timestamp('20110101'), Timestamp('20120101'),
Timestamp('20130101')] * N * 10)
3 changes: 2 additions & 1 deletion asv_bench/benchmarks/index_object.py
@@ -138,7 +138,8 @@ def setup(self, dtype):
self.sorted = self.idx.sort_values()
half = N // 2
self.non_unique = self.idx[:half].append(self.idx[:half])
self.non_unique_sorted = self.sorted[:half].append(self.sorted[:half])
self.non_unique_sorted = (self.sorted[:half].append(self.sorted[:half])
.sort_values())
self.key = self.sorted[N // 4]

def time_boolean_array(self, dtype):
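The added `.sort_values()` matters because appending a sorted half to itself gives two ascending runs back to back, which is not sorted overall. A minimal sketch with plain lists standing in for the index (the helper name `is_monotonic_increasing` is an assumption for this example, not the pandas property):

```python
def is_monotonic_increasing(values):
    """Return True if every element is >= the one before it."""
    return all(a <= b for a, b in zip(values, values[1:]))


sorted_idx = list(range(10))      # stand-in for self.sorted
half = len(sorted_idx) // 2

# Old behaviour: two ascending runs concatenated.
old_non_unique_sorted = sorted_idx[:half] + sorted_idx[:half]

# New behaviour: sort again after appending.
new_non_unique_sorted = sorted(old_non_unique_sorted)
```

With these inputs the old value is `[0, 1, 2, 3, 4, 0, 1, 2, 3, 4]` and the new one is `[0, 0, 1, 1, 2, 2, 3, 3, 4, 4]`.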
4 changes: 2 additions & 2 deletions asv_bench/benchmarks/strings.py
@@ -102,10 +102,10 @@ def setup(self, repeats):
N = 10**5
self.s = Series(tm.makeStringIndex(N))
repeat = {'int': 1, 'array': np.random.randint(1, 3, N)}
self.repeat = repeat[repeats]
self.values = repeat[repeats]

def time_repeat(self, repeats):
self.s.str.repeat(self.repeat)
self.s.str.repeat(self.values)


class Cat(object):
2 changes: 1 addition & 1 deletion azure-pipelines.yml
@@ -104,7 +104,7 @@ jobs:
if git diff upstream/master --name-only | grep -q "^asv_bench/"; then
cd asv_bench
asv machine --yes
ASV_OUTPUT="$(asv dev)"
ASV_OUTPUT="$(asv run --quick --show-stderr --python=same --launch-method=spawn)"
if [[ $(echo "$ASV_OUTPUT" | grep "failed") ]]; then
echo "##vso[task.logissue type=error]Benchmarks run with errors"
echo "$ASV_OUTPUT"
13 changes: 7 additions & 6 deletions ci/code_checks.sh
@@ -93,7 +93,7 @@ if [[ -z "$CHECK" || "$CHECK" == "lint" ]]; then
# this particular codebase (e.g. src/headers, src/klib, src/msgpack). However,
# we can lint all header files since they aren't "generated" like C files are.
MSG='Linting .c and .h' ; echo $MSG
cpplint --quiet --extensions=c,h --headers=h --recursive --filter=-readability/casting,-runtime/int,-build/include_subdir pandas/_libs/src/*.h pandas/_libs/src/parser pandas/_libs/ujson pandas/_libs/tslibs/src/datetime
cpplint --quiet --extensions=c,h --headers=h --recursive --filter=-readability/casting,-runtime/int,-build/include_subdir pandas/_libs/src/*.h pandas/_libs/src/parser pandas/_libs/ujson pandas/_libs/tslibs/src/datetime pandas/io/msgpack pandas/_libs/*.cpp pandas/util
RET=$(($RET + $?)) ; echo $MSG "DONE"

echo "isort --version-number"
@@ -174,9 +174,10 @@ if [[ -z "$CHECK" || "$CHECK" == "patterns" ]]; then
MSG='Check that no file in the repo contains trailing whitespace' ; echo $MSG
set -o pipefail
if [[ "$AZURE" == "true" ]]; then
! grep -n --exclude="*.svg" -RI "\s$" * | awk -F ":" '{print "##vso[task.logissue type=error;sourcepath=" $1 ";linenumber=" $2 ";] Trailing whitespace found: " $3}'
# we exclude all c/cpp files as the c/cpp files of pandas code base are tested when Linting .c and .h files
! grep -n '--exclude=*.'{svg,c,cpp,html} -RI "\s$" * | awk -F ":" '{print "##vso[task.logissue type=error;sourcepath=" $1 ";linenumber=" $2 ";] Trailing whitespace found: " $3}'
else
! grep -n --exclude="*.svg" -RI "\s$" * | awk -F ":" '{print $1 ":" $2 ":Trailing whitespace found: " $3}'
! grep -n '--exclude=*.'{svg,c,cpp,html} -RI "\s$" * | awk -F ":" '{print $1 ":" $2 ":Trailing whitespace found: " $3}'
fi
fi
RET=$(($RET + $?)) ; echo $MSG "DONE"
fi
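The whitespace check above greps every text file for a line ending in whitespace and, after this change, skips `.svg`, `.c`, `.cpp` and `.html` files. A rough Python sketch of the same rule, assuming file contents are passed in as a dict rather than read from disk (`find_trailing_whitespace` and its return shape are hypothetical, and grep's `-I` binary-file skipping is not modelled):

```python
import re

TRAILING_WS = re.compile(r'\s$')
# Suffixes excluded by the new grep invocation.
EXCLUDED_SUFFIXES = ('.svg', '.c', '.cpp', '.html')


def find_trailing_whitespace(files):
    """Return (name, line number, line) for each line ending in whitespace.

    `files` maps a file name to its text content.
    """
    hits = []
    for name, text in files.items():
        if name.endswith(EXCLUDED_SUFFIXES):
            continue
        for lineno, line in enumerate(text.splitlines(), start=1):
            if TRAILING_WS.search(line):
                hits.append((name, lineno, line))
    return hits


sample = {
    'a.py': 'x = 1 \ny = 2\n',   # line 1 ends with a space
    'b.c': 'int x; \n',          # excluded by suffix
}
hits = find_trailing_whitespace(sample)
```

Only `a.py` line 1 is reported here; `b.c` is skipped because C sources are already covered by the cpplint step.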
@@ -206,7 +207,7 @@ if [[ -z "$CHECK" || "$CHECK" == "doctests" ]]; then

MSG='Doctests frame.py' ; echo $MSG
pytest -q --doctest-modules pandas/core/frame.py \
-k"-axes -combine -itertuples -join -pivot_table -query -reindex -reindex_axis -round"
-k" -itertuples -join -reindex -reindex_axis -round"
RET=$(($RET + $?)) ; echo $MSG "DONE"

MSG='Doctests series.py' ; echo $MSG
@@ -240,8 +241,8 @@ fi
### DOCSTRINGS ###
if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then

MSG='Validate docstrings (GL06, GL07, GL09, SS04, PR03, PR05, EX04)' ; echo $MSG
$BASE_DIR/scripts/validate_docstrings.py --format=azure --errors=GL06,GL07,GL09,SS04,PR03,PR05,EX04
MSG='Validate docstrings (GL06, GL07, GL09, SS04, PR03, PR05, PR10, EX04, RT04, SS05, SA05)' ; echo $MSG
$BASE_DIR/scripts/validate_docstrings.py --format=azure --errors=GL06,GL07,GL09,SS04,PR03,PR05,EX04,RT04,SS05,SA05
RET=$(($RET + $?)) ; echo $MSG "DONE"

fi
Binary file modified doc/cheatsheet/Pandas_Cheat_Sheet.pdf
Binary file not shown.
Binary file modified doc/cheatsheet/Pandas_Cheat_Sheet.pptx
Binary file not shown.
Binary file modified doc/cheatsheet/Pandas_Cheat_Sheet_JA.pdf
Binary file not shown.
Binary file modified doc/cheatsheet/Pandas_Cheat_Sheet_JA.pptx
Binary file not shown.
86 changes: 80 additions & 6 deletions doc/make.py
@@ -15,15 +15,18 @@
import sys
import os
import shutil
import csv
import subprocess
import argparse
import webbrowser
import docutils
import docutils.parsers.rst


DOC_PATH = os.path.dirname(os.path.abspath(__file__))
SOURCE_PATH = os.path.join(DOC_PATH, 'source')
BUILD_PATH = os.path.join(DOC_PATH, 'build')
BUILD_DIRS = ['doctrees', 'html', 'latex', 'plots', '_static', '_templates']
REDIRECTS_FILE = os.path.join(DOC_PATH, 'redirects.csv')


class DocBuilder:
@@ -50,7 +53,7 @@ def __init__(self, num_jobs=0, include_api=True, single_doc=None,
if single_doc and single_doc.endswith('.rst'):
self.single_doc_html = os.path.splitext(single_doc)[0] + '.html'
elif single_doc:
self.single_doc_html = 'api/generated/pandas.{}.html'.format(
self.single_doc_html = 'reference/api/pandas.{}.html'.format(
single_doc)
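The branch above decides which built HTML page to open after a single-page build. A standalone sketch of that mapping, assuming `single_doc` has already had any leading `pandas.` prefix stripped (`single_doc_html_path` is a hypothetical free function written for this example; in `make.py` the logic lives inside `DocBuilder.__init__`):

```python
import os


def single_doc_html_path(single_doc):
    """Return the built HTML path for a single-page build, or None."""
    if single_doc and single_doc.endswith('.rst'):
        # A narrative page: same path, .html extension.
        return os.path.splitext(single_doc)[0] + '.html'
    elif single_doc:
        # An API object: its page now lives under reference/api/.
        return 'reference/api/pandas.{}.html'.format(single_doc)
    return None


rst_page = single_doc_html_path('categorical.rst')
api_page = single_doc_html_path('DataFrame.head')
```

Before this commit the second branch pointed at `api/generated/` instead of `reference/api/`.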

def _process_single_doc(self, single_doc):
@@ -60,7 +63,7 @@ def _process_single_doc(self, single_doc):
For example, categorical.rst or pandas.DataFrame.head. For the latter,
return the corresponding file path
(e.g. generated/pandas.DataFrame.head.rst).
(e.g. reference/api/pandas.DataFrame.head.rst).
"""
base_name, extension = os.path.splitext(single_doc)
if extension in ('.rst', '.ipynb'):
@@ -118,8 +121,6 @@ def _sphinx_build(self, kind):
raise ValueError('kind must be html or latex, '
'not {}'.format(kind))

self.clean()

cmd = ['sphinx-build', '-b', kind]
if self.num_jobs:
cmd += ['-j', str(self.num_jobs)]
Expand All @@ -139,6 +140,77 @@ def _open_browser(self, single_doc_html):
single_doc_html)
webbrowser.open(url, new=2)

    def _get_page_title(self, page):
        """
        Open the rst file `page` and extract its title.
        """
        fname = os.path.join(SOURCE_PATH, '{}.rst'.format(page))
        option_parser = docutils.frontend.OptionParser(
            components=(docutils.parsers.rst.Parser,))
        doc = docutils.utils.new_document(
            '<doc>',
            option_parser.get_default_values())
        with open(fname) as f:
            data = f.read()

        parser = docutils.parsers.rst.Parser()
        # do not generate any warning when parsing the rst
        with open(os.devnull, 'a') as f:
            doc.reporter.stream = f
            parser.parse(data, doc)

        section = next(node for node in doc.children
                       if isinstance(node, docutils.nodes.section))
        title = next(node for node in section.children
                     if isinstance(node, docutils.nodes.title))

        return title.astext()

    def _add_redirects(self):
        """
        Create in the build directory an html file with a redirect,
        for every row in REDIRECTS_FILE.
        """
        html = '''
        <html>
            <head>
                <meta http-equiv="refresh" content="0;URL={url}"/>
            </head>
            <body>
                <p>
                    The page has been moved to <a href="{url}">{title}</a>
                </p>
            </body>
        </html>
        '''
        with open(REDIRECTS_FILE) as mapping_fd:
            reader = csv.reader(mapping_fd)
            for row in reader:
                if not row or row[0].strip().startswith('#'):
                    continue

                path = os.path.join(BUILD_PATH,
                                    'html',
                                    *row[0].split('/')) + '.html'

                try:
                    title = self._get_page_title(row[1])
                except Exception:
                    # the file can be an ipynb and not an rst, or docutils
                    # may not be able to read the rst because it has some
                    # sphinx specific stuff
                    title = 'this page'

                if os.path.exists(path):
                    raise RuntimeError((
                        'Redirection would overwrite an existing file: '
                        '{}').format(path))

                with open(path, 'w') as moved_page_fd:
                    moved_page_fd.write(
                        html.format(url='{}.html'.format(row[1]),
                                    title=title))
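The redirect step can be tried in isolation with only the standard library. The sketch below is an approximation under stated assumptions: `write_redirects`, `REDIRECT_TEMPLATE`, `demo` and the sample CSV rows are all hypothetical, the title is always the fallback `'this page'` (no docutils lookup), and missing parent directories are created, which the method above does not do.

```python
import csv
import io
import os
import tempfile

REDIRECT_TEMPLATE = (
    '<html><head>'
    '<meta http-equiv="refresh" content="0;URL={url}"/>'
    '</head><body><p>'
    'The page has been moved to <a href="{url}">{title}</a>'
    '</p></body></html>'
)


def write_redirects(rows, html_dir):
    """Write one redirect stub per (old_page, new_page) row; return paths."""
    written = []
    for row in rows:
        # Skip blank rows and comment rows, as the build script does.
        if not row or row[0].strip().startswith('#'):
            continue
        old_page, new_page = row[0], row[1]
        path = os.path.join(html_dir, *old_page.split('/')) + '.html'
        if os.path.exists(path):
            raise RuntimeError(
                'Redirection would overwrite an existing file: ' + path)
        # Assumption for the sketch: create missing parent directories.
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, 'w') as fd:
            fd.write(REDIRECT_TEMPLATE.format(url=new_page + '.html',
                                              title='this page'))
        written.append(path)
    return written


# Hypothetical redirects.csv content: old page, new page.
SAMPLE_CSV = '# old,new\nio,user_guide/io\n\nwhatsnew,whatsnew/index\n'


def demo():
    """Write the sample redirects into a temp dir and report the result."""
    rows = csv.reader(io.StringIO(SAMPLE_CSV))
    with tempfile.TemporaryDirectory() as html_dir:
        paths = write_redirects(rows, html_dir)
        with open(paths[0]) as fd:
            first_page = fd.read()
        return [os.path.relpath(p, html_dir) for p in paths], first_page
```

Calling `demo()` writes two stubs, skipping the comment row and the blank row. Raising on an existing file, as the original does, keeps a redirect from silently replacing a page that Sphinx actually built.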

def html(self):
"""
Build HTML documentation.
@@ -150,6 +222,8 @@ def html(self):

if self.single_doc_html is not None:
self._open_browser(self.single_doc_html)
else:
self._add_redirects()
return ret_code

def latex(self, force=False):
@@ -184,7 +258,7 @@ def clean():
Clean documentation generated files.
"""
shutil.rmtree(BUILD_PATH, ignore_errors=True)
shutil.rmtree(os.path.join(SOURCE_PATH, 'api', 'generated'),
shutil.rmtree(os.path.join(SOURCE_PATH, 'reference', 'api'),
ignore_errors=True)

def zip_html(self):