DOC: create shared includes for comparison docs #38771

Closed
wants to merge 2 commits into from

65 changes: 6 additions & 59 deletions doc/source/getting_started/comparison/comparison_with_sas.rst
@@ -8,7 +8,7 @@ For potential users coming from `SAS <https://en.wikipedia.org/wiki/SAS_(softwar
this page is meant to demonstrate how different SAS operations would be
performed in pandas.

.. include:: comparison_boilerplate.rst
.. include:: includes/introduction.rst

.. note::

@@ -93,16 +93,7 @@ specifying the column names.
;
run;

A pandas ``DataFrame`` can be constructed in many different ways,
but for a small number of values, it is often convenient to specify it as
a Python dictionary, where the keys are the column names
and the values are the data.

.. ipython:: python

df = pd.DataFrame({"x": [1, 3, 5], "y": [2, 4, 6]})
df

.. include:: includes/construct_dataframe.rst

Reading external data
~~~~~~~~~~~~~~~~~~~~~
@@ -217,12 +208,7 @@ or more columns.
DATA step begins and can also be used in PROC statements */
run;

DataFrames can be filtered in multiple ways; the most intuitive of which is using
:ref:`boolean indexing <indexing.boolean>`

.. ipython:: python

tips[tips["total_bill"] > 10].head()
.. include:: includes/filtering.rst

If/then logic
~~~~~~~~~~~~~
@@ -239,18 +225,7 @@ In SAS, if/then logic can be used to create new columns.
else bucket = 'high';
run;

The same operation in pandas can be accomplished using
the ``where`` method from ``numpy``.

.. ipython:: python

tips["bucket"] = np.where(tips["total_bill"] < 10, "low", "high")
tips.head()

.. ipython:: python
:suppress:

tips = tips.drop("bucket", axis=1)
.. include:: includes/if_then.rst

Date functionality
~~~~~~~~~~~~~~~~~~
@@ -278,28 +253,7 @@ functions pandas supports other Time Series features
not available in Base SAS (such as resampling and custom offsets) -
see the :ref:`timeseries documentation<timeseries>` for more details.

.. ipython:: python

tips["date1"] = pd.Timestamp("2013-01-15")
tips["date2"] = pd.Timestamp("2015-02-15")
tips["date1_year"] = tips["date1"].dt.year
tips["date2_month"] = tips["date2"].dt.month
tips["date1_next"] = tips["date1"] + pd.offsets.MonthBegin()
tips["months_between"] = tips["date2"].dt.to_period("M") - tips[
"date1"
].dt.to_period("M")

tips[
["date1", "date2", "date1_year", "date2_month", "date1_next", "months_between"]
].head()

.. ipython:: python
:suppress:

tips = tips.drop(
["date1", "date2", "date1_year", "date2_month", "date1_next", "months_between"],
axis=1,
)
.. include:: includes/time_date.rst

Selection of columns
~~~~~~~~~~~~~~~~~~~~
@@ -349,14 +303,7 @@ Sorting in SAS is accomplished via ``PROC SORT``
by sex total_bill;
run;

pandas objects have a :meth:`~DataFrame.sort_values` method, which
takes a list of columns to sort by.

.. ipython:: python

tips = tips.sort_values(["sex", "total_bill"])
tips.head()

.. include:: includes/sorting.rst

String processing
-----------------

@@ -14,7 +14,7 @@ terminology and link to documentation for Excel, but much will be the same/simil
`Apple Numbers <https://www.apple.com/mac/numbers/compatibility/functions.html>`_, and other
Excel-compatible spreadsheet software.

.. include:: comparison_boilerplate.rst
.. include:: includes/introduction.rst

Data structures
---------------

21 changes: 3 additions & 18 deletions doc/source/getting_started/comparison/comparison_with_sql.rst
@@ -8,7 +8,7 @@ Since many potential pandas users have some familiarity with
`SQL <https://en.wikipedia.org/wiki/SQL>`_, this page is meant to provide some examples of how
various SQL operations would be performed using pandas.

.. include:: comparison_boilerplate.rst
.. include:: includes/introduction.rst

Most of the examples will utilize the ``tips`` dataset found within pandas tests. We'll read
the data into a DataFrame called ``tips`` and assume we have a database table of the same name and
@@ -65,24 +65,9 @@ Filtering in SQL is done via a WHERE clause.

SELECT *
FROM tips
WHERE time = 'Dinner'
LIMIT 5;
Member Author

@afeld Dec 29, 2020

This is the only change of substance: I left out the LIMIT and the .head(5) below, since they aren't needed and LIMIT is covered elsewhere.


DataFrames can be filtered in multiple ways; the most intuitive of which is using
:ref:`boolean indexing <indexing.boolean>`

.. ipython:: python

tips[tips["time"] == "Dinner"].head(5)

The above statement is simply passing a ``Series`` of True/False objects to the DataFrame,
returning all rows with True.

.. ipython:: python
WHERE time = 'Dinner';

is_dinner = tips["time"] == "Dinner"
is_dinner.value_counts()
tips[is_dinner].head(5)
.. include:: includes/filtering.rst

Just like SQL's OR and AND, multiple conditions can be passed to a DataFrame using | (OR) and &
(AND).
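A minimal, self-contained sketch of combining conditions with ``&`` and ``|``. The small ``tips`` frame below is a hypothetical stand-in for the real dataset (values invented for illustration). Each comparison is wrapped in parentheses because Python's ``&`` and ``|`` bind more tightly than ``==`` and ``>``.

```python
import pandas as pd

# Hypothetical stand-in for the tips dataset (values invented for illustration).
tips = pd.DataFrame(
    {
        "total_bill": [16.99, 10.34, 25.29, 8.77, 5.75],
        "tip": [1.01, 1.66, 4.71, 2.00, 5.20],
        "time": ["Dinner", "Lunch", "Dinner", "Lunch", "Dinner"],
    }
)

# AND: both conditions must hold. The parentheses are required, since
# & has higher precedence than the comparison operators.
dinner_and_big_tip = tips[(tips["time"] == "Dinner") & (tips["tip"] > 5.00)]

# OR: rows matching either condition are kept.
big_tip_or_small_bill = tips[(tips["tip"] >= 5) | (tips["total_bill"] < 10)]

print(dinner_and_big_tip)
print(big_tip_or_small_bill)
```

Unlike SQL's ``AND``/``OR`` keywords, these operators act elementwise on boolean ``Series``.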

65 changes: 6 additions & 59 deletions doc/source/getting_started/comparison/comparison_with_stata.rst
@@ -8,7 +8,7 @@ For potential users coming from `Stata <https://en.wikipedia.org/wiki/Stata>`__
this page is meant to demonstrate how different Stata operations would be
performed in pandas.

.. include:: comparison_boilerplate.rst
.. include:: includes/introduction.rst

.. note::

@@ -89,16 +89,7 @@ specifying the column names.
5 6
end

A pandas ``DataFrame`` can be constructed in many different ways,
but for a small number of values, it is often convenient to specify it as
a Python dictionary, where the keys are the column names
and the values are the data.

.. ipython:: python

df = pd.DataFrame({"x": [1, 3, 5], "y": [2, 4, 6]})
df

.. include:: includes/construct_dataframe.rst

Reading external data
~~~~~~~~~~~~~~~~~~~~~
@@ -210,12 +201,7 @@ Filtering in Stata is done with an ``if`` clause on one or more columns.

list if total_bill > 10

DataFrames can be filtered in multiple ways; the most intuitive of which is using
:ref:`boolean indexing <indexing.boolean>`.

.. ipython:: python

tips[tips["total_bill"] > 10].head()
.. include:: includes/filtering.rst

If/then logic
~~~~~~~~~~~~~
@@ -227,18 +213,7 @@ In Stata, an ``if`` clause can also be used to create new columns.
generate bucket = "low" if total_bill < 10
replace bucket = "high" if total_bill >= 10

The same operation in pandas can be accomplished using
the ``where`` method from ``numpy``.

.. ipython:: python

tips["bucket"] = np.where(tips["total_bill"] < 10, "low", "high")
tips.head()

.. ipython:: python
:suppress:

tips = tips.drop("bucket", axis=1)
.. include:: includes/if_then.rst

Date functionality
~~~~~~~~~~~~~~~~~~
@@ -266,28 +241,7 @@ functions, pandas supports other Time Series features
not available in Stata (such as time zone handling and custom offsets) --
see the :ref:`timeseries documentation<timeseries>` for more details.

.. ipython:: python

tips["date1"] = pd.Timestamp("2013-01-15")
tips["date2"] = pd.Timestamp("2015-02-15")
tips["date1_year"] = tips["date1"].dt.year
tips["date2_month"] = tips["date2"].dt.month
tips["date1_next"] = tips["date1"] + pd.offsets.MonthBegin()
tips["months_between"] = tips["date2"].dt.to_period("M") - tips[
"date1"
].dt.to_period("M")

tips[
["date1", "date2", "date1_year", "date2_month", "date1_next", "months_between"]
].head()

.. ipython:: python
:suppress:

tips = tips.drop(
["date1", "date2", "date1_year", "date2_month", "date1_next", "months_between"],
axis=1,
)
.. include:: includes/time_date.rst

Selection of columns
~~~~~~~~~~~~~~~~~~~~
@@ -327,14 +281,7 @@ Sorting in Stata is accomplished via ``sort``

sort sex total_bill

pandas objects have a :meth:`DataFrame.sort_values` method, which
takes a list of columns to sort by.

.. ipython:: python

tips = tips.sort_values(["sex", "total_bill"])
tips.head()

.. include:: includes/sorting.rst

String processing
-----------------

@@ -0,0 +1,11 @@
:orphan:

A pandas ``DataFrame`` can be constructed in many different ways,
but for a small number of values, it is often convenient to specify it as
a Python dictionary, where the keys are the column names
and the values are the data.

.. ipython:: python

   df = pd.DataFrame({"x": [1, 3, 5], "y": [2, 4, 6]})
   df
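A self-contained sketch of the same dict-based construction that runs outside the docs build. The docs rely on the page setup to import ``pd``; here the import is explicit. The column names ``x`` and ``y`` are taken from the example above.

```python
import pandas as pd

# Keys become column names; each list supplies one column's values.
df = pd.DataFrame({"x": [1, 3, 5], "y": [2, 4, 6]})

print(df.columns.tolist())  # ['x', 'y']
print(df.shape)             # (3, 2)
```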

25 changes: 25 additions & 0 deletions doc/source/getting_started/comparison/includes/filtering.rst
@@ -0,0 +1,25 @@
:orphan:

DataFrames can be filtered in multiple ways, the most intuitive of which is
:ref:`boolean indexing <indexing.boolean>`.

.. ipython:: python
   :suppress:

   # ensure tips is defined when scanning with flake8-rst
   if 'tips' not in vars():
Contributor

I think you can just add this

   url = (
       "https://raw.github.com/pandas-dev"
       "/pandas/master/pandas/tests/io/data/csv/tips.csv"
   )
   tips = pd.read_csv(url)

but maybe @jorisvandenbossche has a better soln

Member

But we would want to avoid reading this data multiple times in each file during the doc build, I think, so ideally we can let flake8-rst ignore this in a different way

Contributor

> But we would want to avoid reading this data multiple times in each file during the doc build, I think, so ideally we can let flake8-rst ignore this in a different way

sure, any idea how?

       tips = {}

.. ipython:: python

Contributor

I think in all of these you need a :suppress: directive that defines tips :->

Member Author

Done. It's a bit of a hack, so let me know if you'd like it done differently.

   tips[tips["total_bill"] > 10]

The above statement simply passes a ``Series`` of ``True``/``False`` values to the DataFrame,
which returns all rows where the value is ``True``.

.. ipython:: python

   is_dinner = tips["time"] == "Dinner"
   is_dinner
   is_dinner.value_counts()
   tips[is_dinner]
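A standalone sketch of the boolean-mask mechanics described above. The four-row ``tips`` frame is hypothetical, invented so the snippet runs without the real dataset. It also shows that ``.loc[mask]`` selects the same rows and that chaining ``.head(n)`` plays the role of SQL's ``LIMIT n``.

```python
import pandas as pd

# Hypothetical miniature tips table; only the "time" column matters here.
tips = pd.DataFrame(
    {
        "total_bill": [16.99, 10.34, 21.01, 23.68],
        "time": ["Dinner", "Lunch", "Dinner", "Lunch"],
    }
)

# A comparison yields a boolean Series aligned on tips.index.
is_dinner = tips["time"] == "Dinner"

# Indexing with the mask keeps only the rows where it is True;
# .loc[mask] is an equivalent, more explicit spelling.
dinner = tips[is_dinner]
assert dinner.equals(tips.loc[is_dinner])

# Chaining .head(n) limits the result, much like SQL's LIMIT n.
first_dinner = tips[is_dinner].head(1)
print(first_dinner)
```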

21 changes: 21 additions & 0 deletions doc/source/getting_started/comparison/includes/if_then.rst
@@ -0,0 +1,21 @@
:orphan:

The same operation in pandas can be accomplished using
the ``where`` function from ``numpy``.

.. ipython:: python
   :suppress:

   # ensure tips is defined when scanning with flake8-rst
   if 'tips' not in vars():
       tips = {}

.. ipython:: python

   tips["bucket"] = np.where(tips["total_bill"] < 10, "low", "high")
   tips.head()

.. ipython:: python
   :suppress:

   tips = tips.drop("bucket", axis=1)
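A self-contained sketch of the ``np.where`` bucketing above, using a hypothetical list of bills in place of the real ``tips`` data. The threshold of 10 comes from the SAS and Stata examples.

```python
import numpy as np
import pandas as pd

# Hypothetical bills; the threshold of 10 matches the SAS/Stata examples.
tips = pd.DataFrame({"total_bill": [16.99, 8.77, 10.00, 3.07]})

# np.where(condition, value_if_true, value_if_false) is evaluated
# elementwise, like the SAS if/else or the Stata generate/replace pair.
tips["bucket"] = np.where(tips["total_bill"] < 10, "low", "high")
print(tips)
```

A bill of exactly 10.00 lands in ``"high"``, consistent with Stata's ``replace bucket = "high" if total_bill >= 10``.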
@@ -1,3 +1,5 @@
:orphan:

If you're new to pandas, you might want to first read through :ref:`10 Minutes to pandas<10min>`
to familiarize yourself with the library.

16 changes: 16 additions & 0 deletions doc/source/getting_started/comparison/includes/sorting.rst
@@ -0,0 +1,16 @@
:orphan:

pandas objects have a :meth:`DataFrame.sort_values` method, which
takes a list of columns to sort by.

.. ipython:: python
   :suppress:

   # ensure tips is defined when scanning with flake8-rst
   if 'tips' not in vars():
       tips = {}

.. ipython:: python

   tips = tips.sort_values(["sex", "total_bill"])
   tips.head()
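A self-contained sketch of a two-key sort, with a hypothetical four-row ``tips`` frame. It sorts by ``sex`` first and then by ``total_bill`` within each group, mirroring ``PROC SORT ... by sex total_bill`` and Stata's ``sort sex total_bill``. The ``ascending`` parameter of ``sort_values`` also accepts one flag per column.

```python
import pandas as pd

# Hypothetical rows; in the docs, tips is the full dataset.
tips = pd.DataFrame(
    {
        "sex": ["Male", "Female", "Male", "Female"],
        "total_bill": [20.0, 15.0, 10.0, 30.0],
    }
)

# Sort by sex, then by total_bill within each sex (both ascending).
tips = tips.sort_values(["sex", "total_bill"])

# One ascending flag per column: sex ascending, total_bill descending.
by_sex_then_largest_bill = tips.sort_values(
    ["sex", "total_bill"], ascending=[True, False]
)
print(tips)
print(by_sex_then_largest_bill)
```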

31 changes: 31 additions & 0 deletions doc/source/getting_started/comparison/includes/time_date.rst
@@ -0,0 +1,31 @@
:orphan:

.. ipython:: python
   :suppress:

   # ensure tips is defined when scanning with flake8-rst
   if 'tips' not in vars():
       tips = {}

.. ipython:: python

   tips["date1"] = pd.Timestamp("2013-01-15")
   tips["date2"] = pd.Timestamp("2015-02-15")
   tips["date1_year"] = tips["date1"].dt.year
   tips["date2_month"] = tips["date2"].dt.month
   tips["date1_next"] = tips["date1"] + pd.offsets.MonthBegin()
   tips["months_between"] = tips["date2"].dt.to_period("M") - tips[
       "date1"
   ].dt.to_period("M")

   tips[
       ["date1", "date2", "date1_year", "date2_month", "date1_next", "months_between"]
   ].head()

.. ipython:: python
   :suppress:

   tips = tips.drop(
       ["date1", "date2", "date1_year", "date2_month", "date1_next", "months_between"],
       axis=1,
   )
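A standalone sketch of the ``months_between`` calculation, using the two dates from the include above in a one-row frame. Subtracting monthly ``Period`` values yields offset objects (for example ``<25 * MonthEnds>``) rather than integers. The sketch reads the count from each offset's ``.n`` attribute and cross-checks it with plain year/month arithmetic. This is an illustrative alternative, not what the docs page itself does.

```python
import pandas as pd

# The two dates used in the include above, wrapped in a one-row frame.
df = pd.DataFrame(
    {"date1": [pd.Timestamp("2013-01-15")], "date2": [pd.Timestamp("2015-02-15")]}
)

# Period subtraction gives a month offset per row; .n holds the count.
period_diff = df["date2"].dt.to_period("M") - df["date1"].dt.to_period("M")
months_from_periods = period_diff.map(lambda offset: offset.n)

# Equivalent integer arithmetic on the year and month components:
# (2015 - 2013) * 12 + (2 - 1) == 25
months_from_parts = (df["date2"].dt.year - df["date1"].dt.year) * 12 + (
    df["date2"].dt.month - df["date1"].dt.month
)
print(months_from_periods.iloc[0], months_from_parts.iloc[0])
```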