pandas-dev · jreback · Nov 29, 2018 · Oct 30, 2018 · Nov 6, 2018 · Nov 11, 2018
diff --git a/doc/source/10min.rst b/doc/source/10min.rst
@@ -113,13 +113,40 @@ Here is how to view the top and bottom rows of the frame:
    df.head()
    df.tail(3)
 
-Display the index, columns, and the underlying NumPy data:
+Display the index and columns:
 
 .. ipython:: python
 
    df.index
    df.columns
-   df.values
+
+:meth:`DataFrame.to_numpy` gives a NumPy representation of the underlying data.
+Note that this can be an expensive operation when your :class:`DataFrame` has
+columns with different data types, which comes down to a fundamental difference
+between pandas and NumPy: **NumPy arrays have one dtype for the entire array,
+while pandas DataFrames have one dtype per column**. When you call
+:meth:`DataFrame.to_numpy`, pandas will find the NumPy dtype that can hold *all*
+of the dtypes in the DataFrame. This may end up being ``object``, which requires
+casting every value to a Python object.
+
+For ``df``, our :class:`DataFrame` of all floating-point values,
+:meth:`DataFrame.to_numpy` is fast and doesn't require copying data.
+
+.. ipython:: python
+
+   df.to_numpy()
+
+For ``df2``, the :class:`DataFrame` with multiple dtypes,
+:meth:`DataFrame.to_numpy` is relatively expensive.
+
+.. ipython:: python
+
+   df2.to_numpy()
+
+.. note::
+
+   :meth:`DataFrame.to_numpy` does *not* include the index or column
+   labels in the output.
 
 :func:`~DataFrame.describe` shows a quick statistic summary of your data:
 

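A minimal sketch of the dtype rule described in the ``10min.rst`` hunk above. The frames ``floats``, ``numeric`` and ``mixed`` are hypothetical stand-ins for ``df`` and ``df2``, which are defined elsewhere in that file and are not shown in this diff:

```python
import numpy as np
import pandas as pd

# All columns share one dtype, so the result keeps that dtype.
floats = pd.DataFrame(np.zeros((3, 2)), columns=["A", "B"])
floats_arr = floats.to_numpy()

# int64 and float64 columns: float64 can hold both.
numeric = pd.DataFrame({"i": [1, 2, 3], "f": [0.5, 1.5, 2.5]})
numeric_arr = numeric.to_numpy()

# Numbers and strings: pandas falls back to the object dtype.
mixed = pd.DataFrame({"f": [0.5, 1.5, 2.5], "s": ["x", "y", "z"]})
mixed_arr = mixed.to_numpy()

print(floats_arr.dtype, numeric_arr.dtype, mixed_arr.dtype)
```

Only the last case needs every value boxed as a Python object, which is where the cost mentioned in the new text comes from.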
diff --git a/doc/source/advanced.rst b/doc/source/advanced.rst
@@ -188,7 +188,7 @@ highly performant. If you want to see only the used levels, you can use the
 
 .. ipython:: python
 
-   df[['foo', 'qux']].columns.values
+   df[['foo', 'qux']].columns.to_numpy()
 
    # for a specific level
    df[['foo', 'qux']].columns.get_level_values(0)
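A small sketch of what ``columns.to_numpy()`` returns for ``MultiIndex`` columns, as used in the ``advanced.rst`` hunk above. The frame built here is a hypothetical stand-in for the ``df`` in that file:

```python
import numpy as np
import pandas as pd

# Hypothetical two-level columns; the real ``df`` lives in advanced.rst.
columns = pd.MultiIndex.from_product([["foo", "qux"], ["one", "two"]])
df = pd.DataFrame(np.zeros((2, 4)), columns=columns)

# A MultiIndex converts to a 1-D object array of tuples.
labels = df[["foo", "qux"]].columns.to_numpy()

# A single level is available without materialising the tuples.
top_level = df[["foo", "qux"]].columns.get_level_values(0)

print(labels)
print(list(top_level))
```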

diff --git a/doc/source/basics.rst b/doc/source/basics.rst
@@ -46,8 +46,8 @@ of elements to display is five, but you may pass a custom number.
 
 .. _basics.attrs:
 
-Attributes and the raw ndarray(s)
----------------------------------
+Attributes and Underlying Data
+------------------------------
 
 pandas objects have a number of attributes enabling you to access the metadata
 
@@ -65,14 +65,28 @@ Note, **these attributes can be safely assigned to**!
    df.columns = [x.lower() for x in df.columns]
    df
 
-To get the actual data inside a data structure, one need only access the
-**values** property:
+Pandas objects (:class:`Index`, :class:`Series`, :class:`DataFrame`) can be
+thought of as containers for arrays, which hold the actual data and do the
+actual computation. For many types, the underlying array is a
+:class:`numpy.ndarray`. However, pandas and third-party libraries may *extend*
+NumPy's type system to add support for custom arrays
+(see :ref:`basics.dtypes`).
+
+To get the actual data inside an :class:`Index` or :class:`Series`, use
+the **array** property:
 
 .. ipython:: python
 
-    s.values
-    df.values
-    wp.values
+   s.array
+   s.index.array
+
+Getting the "raw data" inside a :class:`DataFrame` is possibly a bit more
+complex. When your ``DataFrame`` only has a single data type for all the
+columns, :meth:`DataFrame.to_numpy` will return the underlying data:
+
+.. ipython:: python
+
+   df.to_numpy()
 
 If a DataFrame or Panel contains homogeneously-typed data, the ndarray can
 actually be modified in-place, and the changes will be reflected in the data
@@ -541,7 +555,7 @@ will exclude NAs on Series input by default:
 .. ipython:: python
 
    np.mean(df['one'])
-   np.mean(df['one'].values)
+   np.mean(df['one'].array)
 
 :meth:`Series.nunique` will return the number of unique non-NA values in a
 Series:
@@ -839,7 +853,7 @@ Series operation on each column or row:
 
    tsdf = pd.DataFrame(np.random.randn(10, 3), columns=['A', 'B', 'C'],
                        index=pd.date_range('1/1/2000', periods=10))
-   tsdf.values[3:7] = np.nan
+   tsdf.iloc[3:7] = np.nan
 
 .. ipython:: python
 
@@ -1875,17 +1889,29 @@ dtypes
 ------
 
 For the most part, pandas uses NumPy arrays and dtypes for Series or individual
-columns of a DataFrame. The main types allowed in pandas objects are ``float``,
-``int``, ``bool``, and ``datetime64[ns]`` (note that NumPy does not support
-timezone-aware datetimes).
-
-In addition to NumPy's types, pandas :ref:`extends <extending.extension-types>`
-NumPy's type-system for a few cases.
-
-* :ref:`Categorical <categorical>`
-* :ref:`Datetime with Timezone <timeseries.timezone_series>`
-* :ref:`Period <timeseries.periods>`
-* :ref:`Interval <indexing.intervallindex>`
+columns of a DataFrame. NumPy provides support for ``float``,
+``int``, ``bool``, ``timedelta64[ns]`` and ``datetime64[ns]`` (note that NumPy
+does not support timezone-aware datetimes).
+
+Pandas and third-party libraries *extend* NumPy's type system in a few places.
+This section describes the extensions pandas has made internally.
+See :ref:`extending.extension-types` for how to write your own extension that
+works with pandas. See :ref:`ecosystem.extensions` for a list of third-party
+libraries that have implemented an extension.
+
+The following table lists all of pandas' extension types. See the respective
+documentation sections for more on each type.
+
+=================== ========================= ================== ============================= =============================
+Kind of Data        Data Type                 Scalar             Array                         Documentation
+=================== ========================= ================== ============================= =============================
+tz-aware datetime   :class:`DatetimeTZDtype`  :class:`Timestamp` :class:`arrays.DatetimeArray` :ref:`timeseries.timezone`
+Categorical         :class:`CategoricalDtype` (none)             :class:`Categorical`          :ref:`categorical`
+period (time spans) :class:`PeriodDtype`      :class:`Period`    :class:`arrays.PeriodArray`   :ref:`timeseries.periods`
+sparse              :class:`SparseDtype`      (none)             :class:`arrays.SparseArray`   :ref:`sparse`
+intervals           :class:`IntervalDtype`    :class:`Interval`  :class:`arrays.IntervalArray` :ref:`advanced.intervalindex`
+nullable integer    :class:`Int64Dtype`, ...  (none)             :class:`arrays.IntegerArray`  :ref:`integer_na`
+=================== ========================= ================== ============================= =============================
 
 Pandas uses the ``object`` dtype for storing strings.
 
@@ -1989,7 +2015,7 @@ force some *upcasting*.
 
 .. ipython:: python
 
-   df3.values.dtype
+   df3.to_numpy().dtype
 
 astype
 ~~~~~~
@@ -2211,11 +2237,11 @@ dtypes:
                       'float64': np.arange(4.0, 7.0),
                       'bool1': [True, False, True],
                       'bool2': [False, True, False],
-                      'dates': pd.date_range('now', periods=3).values,
+                      'dates': pd.date_range('now', periods=3),
                       'category': pd.Series(list("ABC")).astype('category')})
    df['tdeltas'] = df.dates.diff()
    df['uint64'] = np.arange(3, 6).astype('u8')
-   df['other_dates'] = pd.date_range('20130101', periods=3).values
+   df['other_dates'] = pd.date_range('20130101', periods=3)
    df['tz_aware_dates'] = pd.date_range('20130101', periods=3, tz='US/Eastern')
    df
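The extension-type table added to the ``dtypes`` section of ``basics.rst`` can be checked interactively. This is a rough sketch, assuming a pandas version that exposes these classes under ``pandas.arrays``; the ``examples`` dict and its keys are made up for illustration:

```python
import pandas as pd

# One small Series per row of the table (illustrative data only).
examples = {
    "tz-aware datetime": pd.Series(
        pd.date_range("2000-01-01", periods=2, tz="UTC")),
    "categorical": pd.Series(["a", "b"], dtype="category"),
    "period": pd.Series(pd.period_range("2000-01", periods=2, freq="M")),
    "sparse": pd.Series(pd.arrays.SparseArray([0, 0, 1])),
    "interval": pd.Series(pd.interval_range(start=0, end=2)),
    "nullable integer": pd.Series([1, None], dtype="Int64"),
}

# Map each kind of data to (dtype class name, array class name).
pairs = {kind: (type(s.dtype).__name__, type(s.array).__name__)
         for kind, s in examples.items()}

for kind, (dtype_name, array_name) in pairs.items():
    print(kind, dtype_name, array_name)
```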
 

diff --git a/doc/source/categorical.rst b/doc/source/categorical.rst
@@ -178,7 +178,7 @@ are consistent among all columns.
 
     To perform table-wise conversion, where all labels in the entire ``DataFrame`` are used as
     categories for each column, the ``categories`` parameter can be determined programmatically by
-    ``categories = pd.unique(df.values.ravel())``.
+    ``categories = pd.unique(df.to_numpy().ravel())``.
 
 If you already have ``codes`` and ``categories``, you can use the 
 :func:`~pandas.Categorical.from_codes` constructor to save the factorize step 
@@ -955,7 +955,7 @@ Use ``.astype`` or ``union_categoricals`` to get ``category`` result.
    pd.concat([s1, s3])
 
    pd.concat([s1, s3]).astype('category')
-   union_categoricals([s1.values, s3.values])
+   union_categoricals([s1.array, s3.array])
 
 
 Following table summarizes the results of ``Categoricals`` related concatenations.

diff --git a/doc/source/dsintro.rst b/doc/source/dsintro.rst
@@ -137,7 +137,43 @@ However, operations such as slicing will also slice the index.
     s[[4, 3, 1]]
     np.exp(s)
 
-We will address array-based indexing in a separate :ref:`section <indexing>`.
+.. note::
+
+   We will address array-based indexing like ``s[[4, 3, 1]]``
+   in a separate :ref:`section <indexing>`.
+
+Like a NumPy array, a pandas Series has a :attr:`~Series.dtype`.
+
+.. ipython:: python
+
+   s.dtype
+
+This is often a NumPy dtype. However, pandas and third-party libraries
+extend NumPy's type system in a few places, in which case the dtype would
+be a :class:`~pandas.api.extensions.ExtensionDtype`. Some examples within
+pandas are :ref:`categorical` and :ref:`integer_na`. See :ref:`basics.dtypes`
+for more.
+
+If you need the actual array backing a ``Series``, use :attr:`Series.array`.
+
+.. ipython:: python
+
+   s.array
+
+Again, this is often a NumPy array, but may instead be a
+:class:`~pandas.api.extensions.ExtensionArray`. See :ref:`basics.dtypes` for more.
+Accessing the array can be useful when you need to do some operation without the
+index (to disable :ref:`automatic alignment <dsintro.alignment>`, for example).
+
+While Series is ndarray-like, if you need an *actual* ndarray, then use
+:meth:`Series.to_numpy`.
+
+.. ipython:: python
+
+   s.to_numpy()
+
+Even if the Series is backed by a :class:`~pandas.api.extensions.ExtensionArray`,
+:meth:`Series.to_numpy` will return a NumPy ndarray.
 
 Series is dict-like
 ~~~~~~~~~~~~~~~~~~~
@@ -617,6 +653,8 @@ slicing, see the :ref:`section on indexing <indexing>`. We will address the
 fundamentals of reindexing / conforming to new sets of labels in the
 :ref:`section on reindexing <basics.reindexing>`.
 
+.. _dsintro.alignment:
+
 Data alignment and arithmetic
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 

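The ``dsintro.rst`` text above says that dropping down to the array is a way to disable automatic alignment. A minimal sketch of that difference, using a hypothetical three-element Series rather than the ``s`` from the docs:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
reversed_s = s[::-1]  # same labels, opposite order

# Series arithmetic matches values by label, so each value meets itself.
aligned = s + reversed_s

# Plain ndarrays carry no labels, so values are paired by position.
positional = s.to_numpy() + reversed_s.to_numpy()

print(aligned)
print(positional)
```

Here ``Series.to_numpy`` is used because the example needs an actual ndarray; ``Series.array`` would return the backing array instead, which may be an ``ExtensionArray``.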
diff --git a/doc/source/enhancingperf.rst b/doc/source/enhancingperf.rst
@@ -221,7 +221,7 @@ the rows, applying our ``integrate_f_typed``, and putting this in the zeros arra
 
    You can **not pass** a ``Series`` directly as a ``ndarray`` typed parameter
    to a Cython function. Instead pass the actual ``ndarray`` using the
-   ``.values`` attribute of the ``Series``. The reason is that the Cython
+   :meth:`Series.to_numpy` method. The reason is that the Cython
    definition is specific to an ndarray and not the passed ``Series``.
 
    So, do not do this:
@@ -230,11 +230,13 @@ the rows, applying our ``integrate_f_typed``, and putting this in the zeros arra
 
         apply_integrate_f(df['a'], df['b'], df['N'])
 
-   But rather, use ``.values`` to get the underlying ``ndarray``:
+   But rather, use :meth:`Series.to_numpy` to get the underlying ``ndarray``:
 
    .. code-block:: python
 
-        apply_integrate_f(df['a'].values, df['b'].values, df['N'].values)
+        apply_integrate_f(df['a'].to_numpy(),
+                          df['b'].to_numpy(),
+                          df['N'].to_numpy())
 
 .. note::
 

diff --git a/doc/source/extending.rst b/doc/source/extending.rst
@@ -186,7 +186,7 @@ Instead, you should detect these cases and return ``NotImplemented``.
 When pandas encounters an operation like ``op(Series, ExtensionArray)``, pandas
 will
 
-1. unbox the array from the ``Series`` (roughly ``Series.values``)
+1. unbox the array from the ``Series`` (``Series.array``)
 2. call ``result = op(values, ExtensionArray)``
 3. re-box the result in a ``Series``
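The three steps above can be sketched in plain Python. This is only an illustration of the unbox / operate / re-box idea; ``apply_op`` is a hypothetical helper, not the function pandas uses internally, and it ignores alignment, names from ``other``, and error handling:

```python
import operator

import pandas as pd


def apply_op(op, series, other):
    """Hypothetical sketch of op(Series, ExtensionArray)."""
    values = series.array            # 1. unbox the array
    result = op(values, other)       # 2. let the arrays do the work
    # 3. re-box the result, reusing the original index and name
    return pd.Series(result, index=series.index, name=series.name)


s = pd.Series([1, 2, None], dtype="Int64", name="x")
other = pd.array([10, 20, 30], dtype="Int64")
result = apply_op(operator.add, s, other)
print(result)
```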
 

diff --git a/doc/source/indexing.rst b/doc/source/indexing.rst
@@ -190,7 +190,7 @@ columns.
 
    .. ipython:: python
 
-      df.loc[:,['B', 'A']] = df[['A', 'B']].values
+      df.loc[:,['B', 'A']] = df[['A', 'B']].to_numpy()
       df[['A', 'B']]
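A short sketch of why the ``indexing.rst`` example converts the right-hand side with ``to_numpy()``. The frame here is a hypothetical stand-in with integer columns ``A`` and ``B``:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# Assigning a DataFrame aligns on column labels first,
# so each column receives its own values back.
aligned = df.copy()
aligned.loc[:, ["B", "A"]] = aligned[["A", "B"]]

# An ndarray has no labels, so values are assigned by position
# and the two columns are swapped.
swapped = df.copy()
swapped.loc[:, ["B", "A"]] = swapped[["A", "B"]].to_numpy()

print(aligned)
print(swapped)
```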
 
 

diff --git a/doc/source/missing_data.rst b/doc/source/missing_data.rst
@@ -678,7 +678,7 @@ Replacing more than one value is possible by passing a list.
 
 .. ipython:: python
 
-   df00 = df.values[0, 0]
+   df00 = df.iloc[0, 0]
    df.replace([1.5, df00], [np.nan, 'a'])
    df[1].dtype
 

diff --git a/doc/source/reshaping.rst b/doc/source/reshaping.rst
@@ -27,12 +27,12 @@ Reshaping by pivoting DataFrame objects
    tm.N = 3
 
    def unpivot(frame):
-           N, K = frame.shape
-           data = {'value': frame.values.ravel('F'),
-                   'variable': np.asarray(frame.columns).repeat(N),
-                   'date': np.tile(np.asarray(frame.index), K)}
-           columns = ['date', 'variable', 'value']
-           return pd.DataFrame(data, columns=columns)
+      N, K = frame.shape
+      data = {'value': frame.to_numpy().ravel('F'),
+              'variable': np.asarray(frame.columns).repeat(N),
+              'date': np.tile(np.asarray(frame.index), K)}
+      columns = ['date', 'variable', 'value']
+      return pd.DataFrame(data, columns=columns)
 
    df = unpivot(tm.makeTimeDataFrame())
 
@@ -54,7 +54,7 @@ For the curious here is how the above ``DataFrame`` was created:
 
    def unpivot(frame):
        N, K = frame.shape
-       data = {'value': frame.values.ravel('F'),
+       data = {'value': frame.to_numpy().ravel('F'),
                'variable': np.asarray(frame.columns).repeat(N),
                'date': np.tile(np.asarray(frame.index), K)}
        return pd.DataFrame(data, columns=['date', 'variable', 'value'])
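To see what ``unpivot`` from ``reshaping.rst`` produces, here is a usage sketch on a tiny hand-made frame. The frame ``wide`` is hypothetical and replaces ``tm.makeTimeDataFrame()``, which is a pandas testing helper:

```python
import numpy as np
import pandas as pd


def unpivot(frame):
    N, K = frame.shape
    # Column-major ravel keeps each column's values together.
    data = {'value': frame.to_numpy().ravel('F'),
            'variable': np.asarray(frame.columns).repeat(N),
            'date': np.tile(np.asarray(frame.index), K)}
    return pd.DataFrame(data, columns=['date', 'variable', 'value'])


wide = pd.DataFrame({'A': [1.0, 2.0], 'B': [3.0, 4.0]},
                    index=pd.date_range('2000-01-03', periods=2))
long = unpivot(wide)
print(long)
```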

diff --git a/doc/source/text.rst b/doc/source/text.rst
@@ -317,8 +317,8 @@ All one-dimensional list-likes can be combined in a list-like container (includi
 
     s
     u
-    s.str.cat([u.values,
-               u.index.astype(str).values], na_rep='-')
+    s.str.cat([u.array,
+               u.index.astype(str).array], na_rep='-')
 
 All elements must match in length to the calling ``Series`` (or ``Index``), except those having an index if ``join`` is not None:
 

diff --git a/doc/source/timeseries.rst b/doc/source/timeseries.rst
@@ -2436,22 +2436,22 @@ a convert on an aware stamp.
 
 .. note::
 
-   Using the ``.values`` accessor on a ``Series``, returns an NumPy array of the data.
+   Using :meth:`Series.to_numpy` on a ``Series`` returns a NumPy array of the data.
    These values are converted to UTC, as NumPy does not currently support timezones (even though it is *printing* in the local timezone!).
 
    .. ipython:: python
 
-      s_naive.values
-      s_aware.values
+      s_naive.to_numpy()
+      s_aware.to_numpy()
 
    Further note that once converted to a NumPy array these would lose the tz tenor.
 
    .. ipython:: python
 
-      pd.Series(s_aware.values)
+      pd.Series(s_aware.to_numpy())
 
    However, these can be easily converted:
 
    .. ipython:: python
 
-      pd.Series(s_aware.values).dt.tz_localize('UTC').dt.tz_convert('US/Eastern')
+      pd.Series(s_aware.to_numpy()).dt.tz_localize('UTC').dt.tz_convert('US/Eastern')
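A sketch of the round trip described in the ``timeseries.rst`` note, using a hypothetical ``s_aware``. It passes ``dtype="datetime64[ns]"`` explicitly, since the result of a bare ``to_numpy()`` call on timezone-aware data may differ between pandas versions:

```python
import numpy as np
import pandas as pd

s_aware = pd.Series(
    pd.date_range("2013-01-01", periods=3, tz="US/Eastern"))

# Request a NumPy datetime64 array: the timezone is dropped and
# the values are expressed in UTC.
utc_values = s_aware.to_numpy(dtype="datetime64[ns]")

# Restore the timezone: localize to UTC, then convert back.
round_trip = (pd.Series(utc_values)
              .dt.tz_localize("UTC")
              .dt.tz_convert("US/Eastern"))

print(utc_values)
print(round_trip)
```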