[REF] implement internals as dir #21903
pandas/core/internals/managers.py
from pandas.compat import range, map, zip

import pandas.core.dtypes.generic as gt
I noticed this usage in tests.dtypes.test_generic, think it's a nice way to cut down massive import blocks/namespaces.
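A minimal sketch of the aliasing pattern this comment describes, using the standard library's collections.abc as a stand-in so that it runs without pandas. The alias `cabc` and the helper `describe` are hypothetical names chosen for illustration, not anything from the PR.

```python
# Import the module once under a short alias instead of listing every name
# in a long "from ... import (A, B, C, ...)" block.
import collections.abc as cabc


def describe(obj):
    """Classify obj using ABCs reached through the module alias (hypothetical helper)."""
    if isinstance(obj, cabc.Mapping):
        return "mapping"
    if isinstance(obj, cabc.Sequence):
        return "sequence"
    return "other"


print(describe({"a": 1}))
print(describe([1, 2]))
```

With the alias from the diff above, the equivalent check would presumably read `isinstance(obj, gt.ABCSeries)`, keeping the module's namespace visible at each call site.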
pandas/core/internals/blocks.py
else:
concat_values = concat_values.copy()
blocks.append(r)
elif isinstance(result, BlockManager):
This is the only place where internals.blocks depends on either internals.concat or internals.managers. DAG FTW
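One way to check the "DAG" property is to topologically sort the import graph. The sketch below is illustrative only: the edges in `deps` are assumed for the example and are not read from the PR, and `import_order` is a hypothetical helper. It needs Python 3.9+ for graphlib.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical import graph: module -> modules it imports at load time.
# These edges are an assumption for illustration, not the PR's actual imports.
deps = {
    "internals.blocks": set(),
    "internals.managers": {"internals.blocks"},
    "internals.concat": {"internals.blocks", "internals.managers"},
}


def import_order(graph):
    """Return an order in which every module comes after its dependencies.

    Raises graphlib.CycleError if the graph contains a cycle.
    """
    return list(TopologicalSorter(graph).static_order())


order = import_order(deps)
print(order)

# Adding a load-time edge from blocks back to managers creates a cycle.
cyclic = dict(deps, **{"internals.blocks": {"internals.managers"}})
try:
    import_order(cyclic)
    has_cycle = False
except CycleError:
    has_cycle = True
print(has_cycle)
```

If the sort succeeds, the modules can be read and tested in that order; a CycleError points at the pair of modules that still depend on each other.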
I am not fully convinced this is needed before we actually have concrete plans to actively refactor the code. For example, splitting just for the sake of splitting means git blame no longer works.
The git blame point is reasonable, but I'd like to make a case that this is a step in the right direction.
That said, while I do think this would make working in this area much easier, my opinion isn't strong enough to make a big deal about it.
Codecov Report
@@           Coverage Diff           @@
##           master   #21903   +/-   ##
=======================================
  Coverage   91.99%   91.99%
=======================================
  Files         167      167
  Lines       50578    50578
=======================================
  Hits        46530    46530
  Misses       4048     4048
I am ok with this. internals is just a giant file; things are much easier to grok for bug fixes / refactorings if it's not so giant. Still have to review.
The diff GH shows for the blocks.py file is way more complicated than it needs to be since this is pretty much cut/paste. LMK if doing this in smaller pieces would be easier (i.e. first move internals.py unchanged to internals/__init__.py before breaking out the independent modules)
pandas/core/internals/managers.py
from pandas.io.formats.printing import pprint_thing

from .blocks import (
    form_blocks,
Can you use absolute imports?
yes let's do that
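To make the request concrete, here is a small self-contained sketch of the absolute-import form. It builds a throwaway package named demo_internals in a temporary directory; the package name and the stub form_blocks are hypothetical stand-ins, not pandas code.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk (hypothetical stand-in for the real layout).
root = Path(tempfile.mkdtemp())
pkg = root / "demo_internals"
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "blocks.py").write_text("def form_blocks():\n    return 'blocks'\n")

# Relative form, as in the diff above:  from .blocks import form_blocks
# Absolute form, as requested in review: spell out the full package path.
(pkg / "managers.py").write_text(
    "from demo_internals.blocks import form_blocks\n"
)

# Make the temporary directory importable, then load the module.
sys.path.insert(0, str(root))
importlib.invalidate_caches()
managers = importlib.import_module("demo_internals.managers")

result = managers.form_blocks()
print(result)
```

Both forms resolve to the same module; the absolute form simply spells out the full path, so the line reads the same regardless of which file it sits in.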
lgtm merge on green
done.
needs a rebase
merge as soon as travis is green.
@jbrockmendel to come back to your points why this is an improvement
But Blocks and the BlockManager would be ripped out together, so they are already isolated in that sense. And the code / implementation of both are strongly intertwined; splitting them up does not, IMO, make it easier to work with.
In general, by using extension arrays for our internal ones, the code in the specific Blocks should decrease, as we should move towards using ExtensionBlock for all those. So I am still not convinced of the benefit of this change in general (there might be smaller (e.g. generally usable) parts that can be refactored / moved out).
@jorisvandenbossche this is just in general a good change. Making things simpler / easier to grok is a huge +1. Whether / if the block manager is ever completely removed is completely not the point here. In order to even attempt that, you have to gradually move away from it. So making code more clear is the first step. Trying to make sweeping changes is always a disaster; they take too long to review and are very disruptive. Incremental changes are much much much better.
maybe, though the ExtensionBlocks themselves need quite a bit of work too. As things are moving closer together it is simply better to have a way to refactor code intelligently. 2000+ line modules are not the way forward.
If you recall the previous version of the PR, the implementations were not especially "intertwined", the dependencies ran almost entirely one-direction. There is also a bunch of internals-aware code in core.reshape that I consider "intertwined", hopefully we can tighten the exposed surface there. (I don't know that area of the code especially well, so not sure how hard that will be)
That's what I meant with "intertwined" (BlockManager methods calling a lot of Block methods, and many Block methods only used in BlockManager)
* implement internals as dir
* move internals.py to internals/__init__.py unchanged
In the name of a) cleaning up internals and b) isolating BlockManager from everything else, this separates core.internals into internals.managers, internals.blocks, internals.concat.