DOC: create shared includes for content shared by comparison docs
This will help ensure consistency between the examples.
afeld committed Dec 29, 2020
1 parent fb35344 commit cce169a
Showing 10 changed files with 94 additions and 137 deletions.
65 changes: 6 additions & 59 deletions doc/source/getting_started/comparison/comparison_with_sas.rst
@@ -8,7 +8,7 @@ For potential users coming from `SAS <https://en.wikipedia.org/wiki/SAS_(softwar
this page is meant to demonstrate how different SAS operations would be
performed in pandas.

.. include:: comparison_boilerplate.rst
.. include:: includes/introduction.rst

.. note::

@@ -93,16 +93,7 @@ specifying the column names.
;
run;
A pandas ``DataFrame`` can be constructed in many different ways,
but for a small number of values, it is often convenient to specify it as
a Python dictionary, where the keys are the column names
and the values are the data.

.. ipython:: python
df = pd.DataFrame({"x": [1, 3, 5], "y": [2, 4, 6]})
df
.. include:: includes/construct_dataframe.rst

Reading external data
~~~~~~~~~~~~~~~~~~~~~
@@ -217,12 +208,7 @@ or more columns.
DATA step begins and can also be used in PROC statements */
run;
DataFrames can be filtered in multiple ways; the most intuitive of which is using
:ref:`boolean indexing <indexing.boolean>`

.. ipython:: python
tips[tips["total_bill"] > 10].head()
.. include:: includes/filtering.rst

If/then logic
~~~~~~~~~~~~~
@@ -239,18 +225,7 @@ In SAS, if/then logic can be used to create new columns.
else bucket = 'high';
run;
The same operation in pandas can be accomplished using
the ``where`` method from ``numpy``.

.. ipython:: python
tips["bucket"] = np.where(tips["total_bill"] < 10, "low", "high")
tips.head()
.. ipython:: python
:suppress:
tips = tips.drop("bucket", axis=1)
.. include:: includes/if_then.rst

Date functionality
~~~~~~~~~~~~~~~~~~
@@ -278,28 +253,7 @@ functions pandas supports other Time Series features
not available in Base SAS (such as resampling and custom offsets) -
see the :ref:`timeseries documentation<timeseries>` for more details.

.. ipython:: python
tips["date1"] = pd.Timestamp("2013-01-15")
tips["date2"] = pd.Timestamp("2015-02-15")
tips["date1_year"] = tips["date1"].dt.year
tips["date2_month"] = tips["date2"].dt.month
tips["date1_next"] = tips["date1"] + pd.offsets.MonthBegin()
tips["months_between"] = tips["date2"].dt.to_period("M") - tips[
"date1"
].dt.to_period("M")
tips[
["date1", "date2", "date1_year", "date2_month", "date1_next", "months_between"]
].head()
.. ipython:: python
:suppress:
tips = tips.drop(
["date1", "date2", "date1_year", "date2_month", "date1_next", "months_between"],
axis=1,
)
.. include:: includes/time_date.rst

Selection of columns
~~~~~~~~~~~~~~~~~~~~
@@ -349,14 +303,7 @@ Sorting in SAS is accomplished via ``PROC SORT``
by sex total_bill;
run;
pandas objects have a :meth:`~DataFrame.sort_values` method, which
takes a list of columns to sort by.

.. ipython:: python
tips = tips.sort_values(["sex", "total_bill"])
tips.head()
.. include:: includes/sorting.rst

String processing
-----------------
Original file line number Diff line number Diff line change
@@ -14,7 +14,7 @@ terminology and link to documentation for Excel, but much will be the same/simil
`Apple Numbers <https://www.apple.com/mac/numbers/compatibility/functions.html>`_, and other
Excel-compatible spreadsheet software.

.. include:: comparison_boilerplate.rst
.. include:: includes/introduction.rst

Data structures
---------------
21 changes: 3 additions & 18 deletions doc/source/getting_started/comparison/comparison_with_sql.rst
@@ -8,7 +8,7 @@ Since many potential pandas users have some familiarity with
`SQL <https://en.wikipedia.org/wiki/SQL>`_, this page is meant to provide some examples of how
various SQL operations would be performed using pandas.

.. include:: comparison_boilerplate.rst
.. include:: includes/introduction.rst

Most of the examples will utilize the ``tips`` dataset found within pandas tests. We'll read
the data into a DataFrame called ``tips`` and assume we have a database table of the same name and
@@ -65,24 +65,9 @@ Filtering in SQL is done via a WHERE clause.
SELECT *
FROM tips
WHERE time = 'Dinner'
LIMIT 5;
DataFrames can be filtered in multiple ways; the most intuitive of which is using
:ref:`boolean indexing <indexing.boolean>`

.. ipython:: python
tips[tips["time"] == "Dinner"].head(5)
The above statement is simply passing a ``Series`` of True/False objects to the DataFrame,
returning all rows with True.

.. ipython:: python
WHERE time = 'Dinner';
is_dinner = tips["time"] == "Dinner"
is_dinner.value_counts()
tips[is_dinner].head(5)
.. include:: includes/filtering.rst

Just like SQL's OR and AND, multiple conditions can be passed to a DataFrame using | (OR) and &
(AND).
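
As a minimal sketch of combining conditions, the example below builds a small, hypothetical stand-in for the ``tips`` table (the values are illustrative, not taken from the real dataset). Each condition is wrapped in parentheses because ``&`` and ``|`` bind more tightly than comparison operators in Python.

```python
import pandas as pd

# Hypothetical stand-in for the tips table; values are made up for illustration.
tips = pd.DataFrame(
    {
        "total_bill": [48.27, 12.50, 3.07, 50.81],
        "tip": [6.73, 5.50, 1.00, 10.00],
        "size": [4, 2, 1, 5],
        "time": ["Dinner", "Dinner", "Lunch", "Lunch"],
    }
)

# AND: both conditions must hold (SQL: WHERE time = 'Dinner' AND tip > 5.00).
dinner_big_tip = tips[(tips["time"] == "Dinner") & (tips["tip"] > 5.00)]

# OR: either condition may hold (SQL: WHERE size >= 5 OR total_bill > 45).
large_parties = tips[(tips["size"] >= 5) | (tips["total_bill"] > 45)]

print(dinner_big_tip)
print(large_parties)
```

Omitting the parentheses would make Python evaluate ``"Dinner" & tips["tip"]`` first, which typically raises an error.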
65 changes: 6 additions & 59 deletions doc/source/getting_started/comparison/comparison_with_stata.rst
@@ -8,7 +8,7 @@ For potential users coming from `Stata <https://en.wikipedia.org/wiki/Stata>`__
this page is meant to demonstrate how different Stata operations would be
performed in pandas.

.. include:: comparison_boilerplate.rst
.. include:: includes/introduction.rst

.. note::

@@ -89,16 +89,7 @@ specifying the column names.
5 6
end
A pandas ``DataFrame`` can be constructed in many different ways,
but for a small number of values, it is often convenient to specify it as
a Python dictionary, where the keys are the column names
and the values are the data.

.. ipython:: python
df = pd.DataFrame({"x": [1, 3, 5], "y": [2, 4, 6]})
df
.. include:: includes/construct_dataframe.rst

Reading external data
~~~~~~~~~~~~~~~~~~~~~
@@ -210,12 +201,7 @@ Filtering in Stata is done with an ``if`` clause on one or more columns.
list if total_bill > 10
DataFrames can be filtered in multiple ways; the most intuitive of which is using
:ref:`boolean indexing <indexing.boolean>`.

.. ipython:: python
tips[tips["total_bill"] > 10].head()
.. include:: includes/filtering.rst

If/then logic
~~~~~~~~~~~~~
@@ -227,18 +213,7 @@ In Stata, an ``if`` clause can also be used to create new columns.
generate bucket = "low" if total_bill < 10
replace bucket = "high" if total_bill >= 10
The same operation in pandas can be accomplished using
the ``where`` method from ``numpy``.

.. ipython:: python
tips["bucket"] = np.where(tips["total_bill"] < 10, "low", "high")
tips.head()
.. ipython:: python
:suppress:
tips = tips.drop("bucket", axis=1)
.. include:: includes/if_then.rst

Date functionality
~~~~~~~~~~~~~~~~~~
@@ -266,28 +241,7 @@ functions, pandas supports other Time Series features
not available in Stata (such as time zone handling and custom offsets) --
see the :ref:`timeseries documentation<timeseries>` for more details.

.. ipython:: python
tips["date1"] = pd.Timestamp("2013-01-15")
tips["date2"] = pd.Timestamp("2015-02-15")
tips["date1_year"] = tips["date1"].dt.year
tips["date2_month"] = tips["date2"].dt.month
tips["date1_next"] = tips["date1"] + pd.offsets.MonthBegin()
tips["months_between"] = tips["date2"].dt.to_period("M") - tips[
"date1"
].dt.to_period("M")
tips[
["date1", "date2", "date1_year", "date2_month", "date1_next", "months_between"]
].head()
.. ipython:: python
:suppress:
tips = tips.drop(
["date1", "date2", "date1_year", "date2_month", "date1_next", "months_between"],
axis=1,
)
.. include:: includes/time_date.rst

Selection of columns
~~~~~~~~~~~~~~~~~~~~
@@ -327,14 +281,7 @@ Sorting in Stata is accomplished via ``sort``
sort sex total_bill
pandas objects have a :meth:`DataFrame.sort_values` method, which
takes a list of columns to sort by.

.. ipython:: python
tips = tips.sort_values(["sex", "total_bill"])
tips.head()
.. include:: includes/sorting.rst

String processing
-----------------
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
:orphan:

A pandas ``DataFrame`` can be constructed in many different ways,
but for a small number of values, it is often convenient to specify it as
a Python dictionary, where the keys are the column names
and the values are the data.

.. ipython:: python

   df = pd.DataFrame({"x": [1, 3, 5], "y": [2, 4, 6]})
   df
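
For readers who want to run the construction outside the docs build, here is a self-contained sketch of the same dictionary-based approach. The only addition is the explicit ``import``; the inspection calls at the end are just one way to confirm the result.

```python
import pandas as pd

# Keys become column names; each list holds the values of that column.
df = pd.DataFrame({"x": [1, 3, 5], "y": [2, 4, 6]})

# Three rows, two columns, and a default integer index (0, 1, 2).
print(df.shape)
print(list(df.columns))
print(df)
```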
18 changes: 18 additions & 0 deletions doc/source/getting_started/comparison/includes/filtering.rst
@@ -0,0 +1,18 @@
:orphan:

DataFrames can be filtered in multiple ways, the most intuitive of which is using
:ref:`boolean indexing <indexing.boolean>`.

.. ipython:: python

   tips[tips["total_bill"] > 10]

The above statement simply passes a ``Series`` of ``True``/``False`` objects to the DataFrame,
returning all rows with ``True``.

.. ipython:: python

   is_dinner = tips["time"] == "Dinner"
   is_dinner
   is_dinner.value_counts()
   tips[is_dinner]
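
The two-step nature of boolean indexing can be sketched on a tiny frame. The data below is a hypothetical stand-in for ``tips`` (illustrative values only), small enough that the mask and the filtered rows can be checked by eye.

```python
import pandas as pd

# Hypothetical stand-in for the tips dataset; values are illustrative.
tips = pd.DataFrame(
    {
        "total_bill": [8.50, 16.99, 21.01, 9.60],
        "time": ["Lunch", "Dinner", "Dinner", "Lunch"],
    }
)

# Step 1: the comparison yields a boolean Series aligned with the index.
mask = tips["total_bill"] > 10

# Step 2: indexing with that Series keeps only the rows marked True.
filtered = tips[mask]

print(mask.tolist())
print(filtered)
```

Note that the filtered frame keeps the original index labels of the surviving rows; call ``reset_index(drop=True)`` if a fresh index is wanted.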
14 changes: 14 additions & 0 deletions doc/source/getting_started/comparison/includes/if_then.rst
@@ -0,0 +1,14 @@
:orphan:

The same operation in pandas can be accomplished using
the ``where`` function from ``numpy``.

.. ipython:: python

   tips["bucket"] = np.where(tips["total_bill"] < 10, "low", "high")
   tips.head()

.. ipython:: python
   :suppress:

   tips = tips.drop("bucket", axis=1)
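
A standalone sketch of the same bucketing logic follows, using a small hypothetical frame instead of the real ``tips`` data. ``np.where(condition, a, b)`` picks ``a`` where the condition is true and ``b`` elsewhere, so a bill of exactly 10 falls into the ``"high"`` bucket.

```python
import numpy as np
import pandas as pd

# Hypothetical bills chosen to include the boundary value 10.0.
tips = pd.DataFrame({"total_bill": [8.50, 16.99, 10.00]})

# Element-wise if/else: "low" where total_bill < 10, otherwise "high".
tips["bucket"] = np.where(tips["total_bill"] < 10, "low", "high")

print(tips)
```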
Original file line number Diff line number Diff line change
@@ -1,3 +1,5 @@
:orphan:

If you're new to pandas, you might want to first read through :ref:`10 Minutes to pandas<10min>`
to familiarize yourself with the library.

9 changes: 9 additions & 0 deletions doc/source/getting_started/comparison/includes/sorting.rst
@@ -0,0 +1,9 @@
:orphan:

pandas objects have a :meth:`DataFrame.sort_values` method, which
takes a list of columns to sort by.

.. ipython:: python

   tips = tips.sort_values(["sex", "total_bill"])
   tips.head()
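
To make the multi-column ordering concrete, the sketch below sorts a small hypothetical frame (illustrative values, not the real ``tips`` data). Rows are ordered by ``sex`` first, and ties within each group are broken by ``total_bill``, both ascending by default.

```python
import pandas as pd

# Hypothetical rows, deliberately out of order.
tips = pd.DataFrame(
    {
        "sex": ["Male", "Female", "Male", "Female"],
        "total_bill": [20.0, 15.0, 10.0, 5.0],
    }
)

# sort_values returns a new DataFrame; the original is left unchanged.
sorted_tips = tips.sort_values(["sex", "total_bill"])

print(sorted_tips)
```

The original index labels travel with their rows, which makes it easy to see how the rows were rearranged.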
24 changes: 24 additions & 0 deletions doc/source/getting_started/comparison/includes/time_date.rst
@@ -0,0 +1,24 @@
:orphan:

.. ipython:: python

   tips["date1"] = pd.Timestamp("2013-01-15")
   tips["date2"] = pd.Timestamp("2015-02-15")
   tips["date1_year"] = tips["date1"].dt.year
   tips["date2_month"] = tips["date2"].dt.month
   tips["date1_next"] = tips["date1"] + pd.offsets.MonthBegin()
   tips["months_between"] = tips["date2"].dt.to_period("M") - tips[
       "date1"
   ].dt.to_period("M")

   tips[
       ["date1", "date2", "date1_year", "date2_month", "date1_next", "months_between"]
   ].head()

.. ipython:: python
   :suppress:

   tips = tips.drop(
       ["date1", "date2", "date1_year", "date2_month", "date1_next", "months_between"],
       axis=1,
   )
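
The date operations above can be tried on a one-row frame. This is a sketch under a stated assumption: instead of subtracting periods as the include does, it counts months with plain year/month arithmetic, which yields an ordinary integer. The frame name ``dates`` is hypothetical.

```python
import pandas as pd

# One-row frame holding the same two dates used in the include.
dates = pd.DataFrame(
    {
        "date1": [pd.Timestamp("2013-01-15")],
        "date2": [pd.Timestamp("2015-02-15")],
    }
)

# The .dt accessor exposes datetime components column-wise.
dates["date1_year"] = dates["date1"].dt.year
dates["date2_month"] = dates["date2"].dt.month

# Alternative to period subtraction: whole calendar months between the dates.
dates["months_between"] = (dates["date2"].dt.year - dates["date1"].dt.year) * 12 + (
    dates["date2"].dt.month - dates["date1"].dt.month
)

# MonthBegin rolls a timestamp forward to the first day of the next month.
next_month_start = pd.Timestamp("2013-01-15") + pd.offsets.MonthBegin()

print(dates)
print(next_month_start)
```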
