Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEPR: deprecate some top-level non-used functions #15538

Merged
merged 4 commits into from
Mar 3, 2017

Conversation

jreback
Copy link
Contributor

@jreback jreback commented Mar 1, 2017

closes #13790

pd.pnow()
pd.groupby()
pd.match()
pd.Term
pd.Expr

remove info.py

@jreback jreback added the Deprecate Functionality to remove in pandas label Mar 1, 2017
@jreback jreback added this to the 0.20.0 milestone Mar 1, 2017
@jorisvandenbossche
Copy link
Member

@jreback Thanks for the PR!

For me, the main question is, do we actually need those functions? If not, I don't think we have to recommend an alternative import path, but just say they are "deprecated and will be removed in a future version"

  • groupby -> referring to the method is of course OK
  • pnow -> you can also use Period.now() (similar to Timestamp.now), so if we do a recommendation, I would refer to that instead of tseries.periods.pnow (that would also mean the actual tseries.periods.pnow can be deprecated, instead of only the top-level one by wrapping it)
  • Term / Expr: I never used them, so I can't really judge: do they have a usecase? If so, it is OK to refer to a more nested path for me, but then that should be documented / added to the public API.
  • match: the same. Do we want to have this as a public function? If not, I would recommend the alternative import path.

@jreback
Copy link
Contributor Author

jreback commented Mar 2, 2017

  • pnow -> you can also use Period.now() (similar to Timestamp.now), so if we do a recommendation, I would refer to that instead of tseries.periods.pnow (that would also mean the actual tseries.periods.pnow can be deprecated, instead of only the top-level one by wrapping it)

didn't realize this, yes I will point that out as instead (and deprecate Period.pnow() as well)

  • Term / Expr: I never used them, so I can't really judge: do they have a usecase? If so, it is OK to refer to a more nested path for me, but then that should be documented / added to the public API.
    match: the same. Do we want to have this as a public function? If not, I would recommend the alternative import path.

Expr, you never need to directly use this (its used in .eval under the hood, should not have been exposed at all). Term was needed quite a while back, but string expressions replaced this (years ago).

  • match: the same. Do we want to have this as a public function? If not, I would recommend the alternative import path.

This is barely tested, not documented, and AFAICT never used anywhere. So it might have utility, but certainly not as a top-level function.

@jreback jreback force-pushed the top_level branch 3 times, most recently from da53aac to 7c52dd3 Compare March 2, 2017 14:52
@codecov-io
Copy link

codecov-io commented Mar 2, 2017

Codecov Report

Merging #15538 into master will decrease coverage by -0.02%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #15538      +/-   ##
==========================================
- Coverage   91.04%   91.02%   -0.02%     
==========================================
  Files         136      136              
  Lines       49088    49107      +19     
==========================================
+ Hits        44694    44702       +8     
- Misses       4394     4405      +11
Impacted Files Coverage Δ
pandas/computation/api.py 100% <100%> (ø)
pandas/init.py 91.66% <100%> (ø)
pandas/tseries/period.py 92.8% <100%> (+0.02%)
pandas/core/api.py 100% <100%> (ø)
pandas/io/api.py 100% <100%> (ø)
pandas/io/gbq.py 25% <0%> (-58.34%)
pandas/tools/merge.py 91.78% <0%> (-0.35%)
pandas/core/frame.py 97.82% <0%> (-0.1%)
pandas/tseries/index.py 95.32% <0%> (-0.1%)
pandas/core/common.py 91.36% <0%> (+0.33%)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cd67704...59d2b61. Read the comment docs.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update!

I personally wouldn't even mention the alternative path for match, if you say that it is "barely tested, not documented, and AFAICT never used anywhere", we shouldn't encourage it.
If someone uses it, sees the removal message, they can always complain that it was useful and we can think about how to publicly expose it (I don't think core.algorithms should be that place).

@@ -534,6 +534,12 @@ Deprecations
- ``Series.sortlevel`` and ``DataFrame.sortlevel`` have been deprecated in favor of ``Series.sort_index`` and ``DataFrame.sort_index`` (:issue:`15099`)
- importing ``concat`` from ``pandas.tools.merge`` has been deprecated in favor of imports from the ``pandas`` namespace. This should only affect explict imports (:issue:`15358`)
- ``Series/DataFrame/Panel.consolidate()`` been deprecated as a public method. (:issue:`15483`)
- The following top-level pandas functions have been deprecated and will be removed in a future version (:issue:`13790`)
* ``pd.now()``, replaced by a direct import from ``pandas.tseries.period``
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we have to refer to tseries.period, but just remove it altogether? (or at least not recommend it)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually you also have pd.Period.now(freq='H')

pandas/io/api.py Outdated

warnings.warn("pd.Term is deprecated as it is not "
"applicable to user code. Instead use in-line "
"string expressions in for searching HDFStore",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add this sentence also to the whatsnew overview?

pandas - a powerful data analysis and manipulation library for Python
=====================================================================

See http://pandas.pydata.org/ for full documentation. Otherwise, see the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea to remove this file. To have the same effect (people doing pd? see some info), you can put eg what is above this comment in __init__.py as module docstring. It's not that useful, but it does also no harm to have a minimal explanation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could theorectically replace this with something (maybe a minimal README type of thing, so its on the actual doc-string). any thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah should replace with something, maybe a pointer to the docs (plus some minmal stuff). I can simply define this directly I think (in init.py). Any thoughts on what this should say?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you can just put the exact same string on the first lines of __init__.py, and it will be interpreted as the module docstring.

I would put just the same as here (title, and link to full docs). In principle we could do some clever parsing of a README file and adding that, but I wouldn't do that effort personally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just did this

In [1]: pd?
Type:        module
String form: <module 'pandas' from '/Users/jreback/pandas/pandas/__init__.py'>
File:        ~/pandas/pandas/__init__.py
Docstring:  
pandas - a powerful data analysis and manipulation library for Python
=====================================================================

**pandas** is a Python package providing fast, flexible, and expressive data
structures designed to make working with "relational" or "labeled" data both
easy and intuitive. It aims to be the fundamental high-level building block for
doing practical, **real world** data analysis in Python. Additionally, it has
the broader goal of becoming **the most powerful and flexible open source data
analysis / manipulation tool available in any language**. It is already well on
its way toward this goal.

Main Features
-------------
Here are just a few of the things that pandas does well:

  - Easy handling of missing data in floating point as well as
    non-floating point data
  - Size mutability: columns can be inserted and deleted from
    DataFrame and higher dimensional objects
  - Automatic and explicit data alignment: objects can  be
    explicitly aligned to a set of labels, or the user can simply
    ignore the labels and let `Series`, `DataFrame`, etc. automatically
    align the data for you in computations
  - Powerful, flexible group by functionality to perform
    split-apply-combine operations on data sets, for both aggregating
    and transforming data
  - Make it easy to convert ragged, differently-indexed data in
    other Python and NumPy data structures into DataFrame objects
  - Intelligent label-based slicing, fancy indexing, and subsetting of
    large data sets
  - Intuitive merging and joining data sets
  - Flexible reshaping and pivoting of data sets
  - Hierarchical labeling of axes (possible to have multiple
    labels per tick)
  - Robust IO tools for loading data from flat files
    (CSV and delimited), Excel files, databases,
    and saving/loading data from the ultrafast HDF5 format
  - Time series-specific functionality: date range
    generation and frequency conversion, moving window statistics,
    moving window linear regressions, date shifting and lagging, etc.

as the README.md has all kinds of html and such (and is too long)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

jreback added 2 commits March 2, 2017 13:36
closes pandas-dev#13790

pd.pnow
pd.groupby
pd.match
pd.Term
pd.Expr

remove info.py
@jreback
Copy link
Contributor Author

jreback commented Mar 2, 2017

so we have this mini example in

http://pandas-docs.github.io/pandas-docs-travis/comparison_with_r.html

In [1]: s = pd.Series(np.arange(5),dtype=np.float32)

In [2]: s
Out[2]: 
0    0.0
1    1.0
2    2.0
3    3.0
4    4.0
dtype: float32

In [3]: pd.Series(pd.match(s,[2,4],np.nan))
Out[3]: 
0    NaN
1    NaN
2    0.0
3    NaN
4    1.0
dtype: float64

which is actually confusing / misleading.


In [7]: s.where(s.isin([2,4]))
Out[7]: 
0    NaN
1    NaN
2    2.0
3    NaN
4    4.0
dtype: float32

is much more useful (I don't know why you would want that kind of transformation)

though I don't use R much, so not sure what it is trying to show.

@jorisvandenbossche
Copy link
Member

Hmm, I would maybe just remove that example (isin is already showed just above the match example).
match has its use (if you want exactly what is in the example), question is whether we need to support that.

@jreback
Copy link
Contributor Author

jreback commented Mar 2, 2017

@jorisvandenbossche yep sounds good. Yeah the function still exists, its just not top-level (like .unique, .factorize, .value_counts); though even the latter 2 don't really need to be exposed directly to the user (.value_counts is a method, .factorize is just an implementation detail for category).

These made much more sense when directly allowing numpy arrays was useful (but that is less and less useful). But this is another story.

ok will remove that part of the doc.

@jorisvandenbossche jorisvandenbossche merged commit 524a9a0 into pandas-dev:master Mar 3, 2017
AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this pull request Mar 21, 2017
closes pandas-dev#13790

pd.pnow
pd.groupby
pd.match
pd.Term
pd.Expr

remove info.py
gfyoung added a commit to forking-repos/pandas that referenced this pull request Oct 27, 2018
Drops the following:

* pd.pnow
* pd.match
* pd.groupby
* pd.get_store

* pd.Expr
* pd.Term

xref pandas-devgh-15538.
xref pandas-devgh-15940.
jreback pushed a commit that referenced this pull request Oct 28, 2018
Drops the following:

* pd.pnow
* pd.match
* pd.groupby
* pd.get_store

* pd.Expr
* pd.Term

xref gh-15538.
xref gh-15940.
tm9k1 pushed a commit to tm9k1/pandas that referenced this pull request Nov 19, 2018
Drops the following:

* pd.pnow
* pd.match
* pd.groupby
* pd.get_store

* pd.Expr
* pd.Term

xref pandas-devgh-15538.
xref pandas-devgh-15940.
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Drops the following:

* pd.pnow
* pd.match
* pd.groupby
* pd.get_store

* pd.Expr
* pd.Term

xref pandas-devgh-15538.
xref pandas-devgh-15940.
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Drops the following:

* pd.pnow
* pd.match
* pd.groupby
* pd.get_store

* pd.Expr
* pd.Term

xref pandas-devgh-15538.
xref pandas-devgh-15940.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Deprecate Functionality to remove in pandas
Projects
None yet
Development

Successfully merging this pull request may close these issues.

API: clean-up top-level namespace
3 participants