Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REF] implement internals as dir #21903

Merged
merged 6 commits into from
Jul 21, 2018
Merged

Conversation

jbrockmendel
Copy link
Member

In the name of a) cleaning up internals and b) isolating BlockManager from everything else, this separates core.internals into internals.managers, internals.blocks, internals.concat.


from pandas.compat import range, map, zip

import pandas.core.dtypes.generic as gt
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed this usage in tests.dtypes.test_generic, think it's a nice way to cut down massive import blocks/namespaces.

else:
concat_values = concat_values.copy()
blocks.append(r)
elif isinstance(result, BlockManager):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only place where internals.blocks depends on either internals.concat or internals.managers. DAG FTW

@jorisvandenbossche
Copy link
Member

I am not fully convinced this is needed, before we actually have concrete plans to actively refactor the code. For example, splitting for just splitting makes git blame not work anymore

@jbrockmendel
Copy link
Member Author

The git blame point is reasonable, but I'd like to make a case that this is a step in the right direction.

  1. The long-term goal of ripping out the BlockManager will be made easier if core.internals is better isolated from the rest of the code.
    a) Decreasing the exposed surface of core.internals will help
  2. The Block classes themselves need a lot of work (e.g. several datetime64 and timedelta64 interpolation issues), which will be much easier to do if they can be reasoned about in isolation of the rest of the core.internals machinery.

That said, while I do think this would make working in this area much easier, my opinion isn't strong enough to make a big deal about it.

@codecov
Copy link

codecov bot commented Jul 14, 2018

Codecov Report

Merging #21903 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #21903   +/-   ##
=======================================
  Coverage   91.99%   91.99%           
=======================================
  Files         167      167           
  Lines       50578    50578           
=======================================
  Hits        46530    46530           
  Misses       4048     4048
Flag Coverage Δ
#multiple 90.4% <ø> (ø) ⬆️
#single 42.17% <ø> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/internals/__init__.py 95.49% <ø> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 322dbf4...9ef46a0. Read the comment docs.

@jreback
Copy link
Contributor

jreback commented Jul 14, 2018

I am ok with this. internals is just a giant file, things are much easier to grok for bug fixes / refactorings if its not so giant. still have to review.

@jreback jreback added the Refactor Internal refactoring of code label Jul 14, 2018
@jbrockmendel
Copy link
Member Author

still have to review.

The diff GH shows for the blocks.py file is way more complicated than it needs to be since this is pretty much cut/paste. LMK if doing this in smaller pieces would be easier (i.e. first move internals.py unchanged to internals.__init__.py before breaking out the independent modules)

from pandas.io.formats.printing import pprint_thing

from .blocks import (
form_blocks,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can u use absolute imports

@jreback
Copy link
Contributor

jreback commented Jul 20, 2018

The diff GH shows for the blocks.py file is way more complicated than it needs to be since this is pretty much cut/paste. LMK if doing this in smaller pieces would be easier (i.e. first move internals.py unchanged to internals.init.py before breaking out the independent modules)

yes let's do that

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm merge on green

@jbrockmendel
Copy link
Member Author

yes let's do that

done.

@jreback jreback added this to the 0.24.0 milestone Jul 20, 2018
@jreback
Copy link
Contributor

jreback commented Jul 20, 2018

needs a rebase

@jreback
Copy link
Contributor

jreback commented Jul 20, 2018

merge as soon as travis is green.

@jorisvandenbossche
Copy link
Member

@jbrockmendel to come back to your points why this is an improvement

The long-term goal of ripping out the BlockManager will be made easier if core.internals is better isolated from the rest of the code.

But Blocks and the BlockManager would be ripped out together, so they are already isolated in that sense. And the code / implementation of both are strongly intertwined., slitting those both up does make it IMO not easier to work with.

The Block classes themselves need a lot of work

In general, by using extension arrays for our internal ones, the code in the specific Blocks should decrease, as we should move towards using ExtensionBlock for all those.

So I am still not convinced of the benefit of this change in general (there might be smaller (eg generally useable) parts that can be refactored / moved out).

@jreback
Copy link
Contributor

jreback commented Jul 20, 2018

@jorisvandenbossche this is just in general a good change. Making things simplier / easier to grok is a huge +1. Wether / IF block manager is every completely removed is completely not the point here. In order to even attempt that, you have to gradually move away from it. So making code more clear is the first step.

Trying to make sweeping changes is always a disaster, they take too long to review and are very disruptive. Incremental changes are much much much better.

@jreback
Copy link
Contributor

jreback commented Jul 20, 2018

In general, by using extension arrays for our internal ones, the code in the specific Blocks should decrease, as we should move towards using ExtensionBlock for all those.

maybe, though the ExtensionBlock themselves need quite a bit of work too. As things are moving closer together it is simply better to have a way to refactor code intelligently. 2000+ line modules is not the way forward.

@jbrockmendel
Copy link
Member Author

And the code / implementation of both are strongly intertwined., slitting those both up does make it IMO not easier to work with.

If you recall the previous version of the PR, the implementations were not especially "intertwined", the dependencies ran almost entirely one-direction.

There is also a bunch of internals-aware code in core.reshape that I consider "intertwined", hopefully we can tighten the exposed surface there. (I don't know that area of the code especially well, so not sure how hard that will be)

@jbrockmendel jbrockmendel reopened this Jul 21, 2018
@jbrockmendel jbrockmendel merged commit 67b6277 into pandas-dev:master Jul 21, 2018
@jbrockmendel jbrockmendel deleted the internals branch July 21, 2018 17:32
@jorisvandenbossche
Copy link
Member

If you recall the previous version of the PR, the implementations were not especially "intertwined", the dependencies ran almost entirely one-direction.

That's what I meant with "intertwined" (BlockManager methods calling a lot of Block methods, and many Block methods only used in BlockManager)

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
* implement internals as dir

* move internals.py to internals/__init__.py unchanged
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Refactor Internal refactoring of code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants