Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLN: reorg type inference & introspection #13147

Closed
wants to merge 1 commit into from

Conversation

jreback
Copy link
Contributor

@jreback jreback commented May 11, 2016

closes #12503

So this is the big boy. destroyed core/common.py and split up. Now have no more import issues and a nice private API. Nothing is exported by default. You have to explicity import things.

There is now a pandas.api, so:

In [1]: from pandas import api

In [2]: dir(api)
Out[2]: 
['__builtins__',
 '__doc__',
 '__file__',
 '__name__',
 '__package__',
 '__path__',
 'is_any_int_dtype',
 'is_bool',
 'is_bool_dtype',
 'is_categorical',
 'is_categorical_dtype',
 'is_complex',
 'is_complex_dtype',
 'is_datetime64_any_dtype',
 'is_datetime64_dtype',
 'is_datetime64_ns_dtype',
 'is_datetime64tz_dtype',
 'is_datetimetz',
 'is_dtype_equal',
 'is_extension_type',
 'is_float',
 'is_float_dtype',
 'is_floating_dtype',
 'is_int64_dtype',
 'is_integer',
 'is_integer_dtype',
 'is_number',
 'is_numeric_dtype',
 'is_object_dtype',
 'is_scalar',
 'is_sparse',
 'is_string_dtype',
 'is_timedelta64_dtype',
 'is_timedelta64_ns_dtype',
 'np',
 'pandas_dtype',
 'types']

Previously pandas.core.common was a huge namespace of everything under the sun. As a result, you often had to do com. imports to do stuff, and sometimes import inside a function.

These are essentially hierarchically arrange in order of deps.

  • pandas.types
    • generic : ABC* class infererence, no deps
    • dtypes : extension dtypes, no deps
    • inference : scalar / and is_list_list instrospection, no deps
    • common : imports all from inference (for convenience), adds all is_*_dtype things. deps on dtypes, generic, inference
    • missing : isnull, notnull, array_equivalent : deps on common
    • cast : the old core.convert and all types of inference / conversions that littered core.common, deps on common. Mostly this is private stuff that Series/DataFrame uses (e.g. most methods are _ leading)

pandas.core.common now doesn't depend on anything else.

So now can easily write:

from pandas.types.common import is_integer or whatever at the top of any file and it will just work. Further almost all references to com gone.

This only change was I added array_equivalent to the pd. namespace, as formerly it was accessible as by:

from pandas.core.common import array_equivalent

There should be NO user facing changes at all, but I will put a note that warns if people were using the 'private' API (IOW if they were directly importing stuff), then things have moved.

@wesm @jorisvandenbossche @shoyer @TomAugspurger @sinhrks

@jreback jreback added the Clean label May 11, 2016
@jreback jreback added this to the 0.18.2 milestone May 11, 2016
@wesm
Copy link
Member

wesm commented May 12, 2016

There is a slightly problematic aspect here -- other libraries that needed to be able to use is_categorical_dtype (I'm guilty of this: https://github.com/wesm/feather/blob/master/python/feather/ext.pyx#L83) -- it was in one place, and now it's in another. I get that this wasn't necessarily a public API -- is there some other way to get at this information, like s.dtype == 'category'?

@jreback
Copy link
Contributor Author

jreback commented May 12, 2016

no that type of dtype comparison isn't safe

i can simply import some things back into the pdcom space
though am trying to limit what we import

@shoyer
Copy link
Member

shoyer commented May 12, 2016

One safe option (the best we came up, if I recall correctly) is hasattr(s,
'cat')

On Wed, May 11, 2016 at 7:15 PM, Jeff Reback notifications@github.com
wrote:

no that type of dtype comparison isn't safe

i can simply import some things back into the pdcom space
though am trying to limit what we import


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#13147 (comment)

@jreback
Copy link
Contributor Author

jreback commented May 12, 2016

cc @llllllllll
are you guys using the 'private' API here?

@wesm
Copy link
Member

wesm commented May 12, 2016

It might be worth defining a public API for this purpose. Of course the ship has sailed on earlier versions of pandas

@jreback
Copy link
Contributor Author

jreback commented May 12, 2016

These are also added back to pandas.core.common (for backwards compat at least). I am not sure how to even go about deprecating these as this would require a module level deprecation on the import (IIRC this is possible as I think @njsmith did it somewhere)..

In [2]: from pandas.api import types as pt

In [3]: dir(pt)
Out[3]: 
['__builtins__',
 '__doc__',
 '__file__',
 '__name__',
 '__package__',
 '__path__',
 'is_any_int_dtype',
 'is_bool',
 'is_bool_dtype',
 'is_categorical',
 'is_categorical_dtype',
 'is_complex',
 'is_complex_dtype',
 'is_datetime64_any_dtype',
 'is_datetime64_dtype',
 'is_datetime64_ns_dtype',
 'is_datetime64tz_dtype',
 'is_datetimetz',
 'is_dtype_equal',
 'is_extension_type',
 'is_float',
 'is_float_dtype',
 'is_floating_dtype',
 'is_int64_dtype',
 'is_integer',
 'is_integer_dtype',
 'is_number',
 'is_numeric_dtype',
 'is_object_dtype',
 'is_scalar',
 'is_sparse',
 'is_string_dtype',
 'is_timedelta64_dtype',
 'is_timedelta64_ns_dtype',
 'np',
 'pandas_dtype']

@jreback jreback force-pushed the types branch 4 times, most recently from 166181e to 329f44c Compare May 12, 2016 13:23
@ssanderson
Copy link
Contributor

@jreback the only things we have in Zipline that are touching pandas.core are in a utility module that's already doing backwards-compat stuff anyway:

git --no-pager grep -n -e pandas.core -- 
zipline/lib/_factorize.pyx:100:        # This is all adapted from pandas.core.algorithms.factorize.
zipline/utils/munge.py:15:from pandas.core.common import mask_missing
zipline/utils/munge.py:17:    from pandas.core.common import backfill_2d, pad_2d
zipline/utils/munge.py:19:    # In 0.17, pad_2d and backfill_2d werw moved from pandas.core.common to
zipline/utils/munge.py:20:    # pandas.core.missing
zipline/utils/munge.py:21:    from pandas.core.missing import backfill_2d, pad_2d

@shoyer
Copy link
Member

shoyer commented May 12, 2016

@jreback See https://github.com/njsmith/metamodule. It requires some messy tricks to support Python <3.5 though.

@jreback
Copy link
Contributor Author

jreback commented May 12, 2016

hmm, that might do the trick of deprecating these de-facto apis (though prob will do that in the 0.19.0).

@njsmith
Copy link

njsmith commented May 12, 2016

The messy tricks aren't too bad because they only have to support a specific list of old, non-moving Python versions, and I already went through and checked that all of those versions are handled correctly. All future versions of Python have a clean upstream supported way to do this.

But, if you want to avoid that, you actually can -- the really clever stuff in meta module is required to support the case where you need to fix up a module from inside that module (e.g. if you wanted to deprecate to attribute of pandas itself), without rewriting all your API export logic. For pandas.core.common you could have pandas.core import it and then replace it with a metamodule-style subclass without getting ctypes involved.

@max-sixty
Copy link
Contributor

More generally, what do people think about having an explicit contract for what's in the public vs private API?

@shoyer
Copy link
Member

shoyer commented May 12, 2016

More generally, what do people think about having an explicit contract for what's in the public vs private API?

👍

I don't know if it's stated explicitly, but everything in pandas.core that isn't exported to the pandas namespace is private API. The problem here is that private stuff is useful, so people have used it anyways!

@jreback
Copy link
Contributor Author

jreback commented May 14, 2016

@TomAugspurger
Copy link
Contributor

Tried to do a github search for where pandas.core.common was used, but it's mostly just a bunch of people's virtualenvs with pandas. Did find

No uses of pandas.core.common in seaborn or statsmodels, so that's good.

@jreback
Copy link
Contributor Author

jreback commented May 14, 2016

yeah I added back the public stuff anyhow (though we should maybe use the meta-module tricks to deprecate it in 0.19.0).

@njsmith
Copy link

njsmith commented May 14, 2016

Tried to do a github search for where pandas.core.common was used, but it's mostly just a bunch of people's virtualenvs with pandas

There's some tooling here to work around this obnoxious feature of GH search, in case you find it useful: https://github.com/njsmith/codetrawl/

Example: https://github.com/njsmith/ufunc-abi-analysis

@jreback
Copy link
Contributor Author

jreback commented Jul 13, 2016

Ok this was pretty easy to do

In [1]: from pandas.api.types import is_integer

In [2]: is_integer(5)
Out[2]: 1

In [3]: from pandas.core.common import is_integer

In [4]: is_integer(5)
/Users/jreback/miniconda/bin/ipython:1: FutureWarning: pandas.core.common.is_integer is deprecated. import from the public API: pandas.api.types.is_integer instead
  #!/bin/bash /Users/jreback/miniconda/bin/python.app
Out[4]: 1

@shoyer
Copy link
Member

shoyer commented Jul 13, 2016

I agree with @jorisvandenbossche that it would be cleaner to simply have pandas.types. @jreback can you reiterate again the advantages of pandas.api? Currently, I think our policy is that everything outside of pandas.core is considered API?

@jorisvandenbossche
Copy link
Member

Can you reiterate again the advantages of pandas.api? Currently, I think our policy is that everything outside of pandas.core is considered API?

To not let hold up this PR, maybe we should move this discussion to another issue? (or discuss here, but regard it as independent of merging the PR) As the pandas.api thing is only a small part of this PR, and can easily changed afterwards (or split off from this PR)

@jreback
Copy link
Contributor Author

jreback commented Jul 13, 2016

yes, pandas.api is really just testing at this point, with a very small actuall exposed public API.

and I think we SHOULD do this, and not just assume EVERYTHING outside is API, that's non-controllable.

Merging on pass (had to fix some deprecations).

warnings.warn("pandas.core.common.{t} is deprecated. "
"import from the public API: "
"pandas.api.types.{t} instead".format(t=t),
FutureWarning, stacklevel=2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback I think in this case case we should deviate from our normal rule of using FutureWarning for deprecations, and actually use a DeprecationWarning.
Typically, if people have used this, it will rather be in a package than in interactive explorative coding. And I think it is not needed that users of those packages see this warning (as they can't change it themselves)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But can be in a follow-up PR if you want to merge

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh so this wont' show in downstream, ok, then. Though this does make it less visible. Let's decide in the followup issue.

@jreback jreback closed this in 7dd4091 Jul 13, 2016
@jreback jreback mentioned this pull request Jul 13, 2016
13 tasks
@sinhrks sinhrks mentioned this pull request Aug 1, 2016
2 tasks
nateGeorge pushed a commit to nateGeorge/pandas that referenced this pull request Aug 15, 2016
closes pandas-dev#12503

Author: Jeff Reback <jeff@reback.net>

Closes pandas-dev#13147 from jreback/types and squashes the following commits:

244649a [Jeff Reback] CLN: reorg type inference & introspection
gfyoung added a commit to forking-repos/pandas that referenced this pull request Jan 3, 2018
gfyoung added a commit to forking-repos/pandas that referenced this pull request Jan 3, 2018
gfyoung added a commit to forking-repos/pandas that referenced this pull request Jan 3, 2018
gfyoung added a commit to forking-repos/pandas that referenced this pull request Jan 3, 2018
jreback pushed a commit that referenced this pull request Jan 3, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CLN & REORG core/common.py
9 participants