-
-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
REF: Separate out DataFrame/Series Construction Helpers #24100
Conversation
Hello @jbrockmendel! Thanks for submitting the PR.
|
Codecov Report
@@ Coverage Diff @@
## master #24100 +/- ##
===========================================
- Coverage 92.21% 42.53% -49.68%
===========================================
Files 161 162 +1
Lines 51684 51713 +29
===========================================
- Hits 47658 21998 -25660
- Misses 4026 29715 +25689
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #24100 +/- ##
==========================================
- Coverage 92.21% 92.2% -0.01%
==========================================
Files 161 162 +1
Lines 51684 51700 +16
==========================================
+ Hits 47658 47672 +14
- Misses 4026 4028 +2
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these can be done in a followup along with renaming the functions to be more consistent with what they do, e.g. init_ndarray, should really be: create_block_manager_from_ndarray (though we have a slight conflict on that name). but for sure need to separate out public / private things.
|
||
return create_block_manager_from_blocks([values], [columns, index]) | ||
return init_ndarray(values, index, columns, dtype=dtype, copy=copy) | ||
# TODO: can we just get rid of this as a method? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes i would just call these directly (your new routines)
arrays = [data[k] for k in keys] | ||
|
||
return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype) | ||
return init_dict(data, index, columns, dtype=dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment as below, yes)
@@ -0,0 +1,699 @@ | |||
""" | |||
Functions for preparing various inputs passed to the DataFrame or Series | |||
constructors before passing them to aBlockManager. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
space before BlockManager
thanks. prob should open an issue with a todo list for followups. |
commit 28c61d770f6dfca6857fd0fa6979d4119a31129e Author: Tom Augspurger <tom.w.augspurger@gmail.com> Date: Thu Dec 6 12:18:19 2018 -0600 uncomment commit bae2e322523efc73a1344464f51611e2dc555ccb Author: Tom Augspurger <tom.w.augspurger@gmail.com> Date: Thu Dec 6 12:17:09 2018 -0600 maybe fixes commit 6cb4db05c9d6ceba3794096f0172cae5ed5f6019 Author: Tom Augspurger <tom.w.augspurger@gmail.com> Date: Thu Dec 6 09:57:37 2018 -0600 we back commit d97ab57fb32cb23371169d9ed659ccfac34cfe45 Merge: a117de4 b78aa8d Author: Tom Augspurger <tom.w.augspurger@gmail.com> Date: Thu Dec 6 09:51:51 2018 -0600 Merge remote-tracking branch 'upstream/master' into disown-tz-only-rebased2 commit b78aa8d Author: gfyoung <gfyoung17+GitHub@gmail.com> Date: Thu Dec 6 07:18:44 2018 -0500 REF/TST: Add pytest idiom to reshape/test_tile (pandas-dev#24107) commit 2993b8e Author: gfyoung <gfyoung17+GitHub@gmail.com> Date: Thu Dec 6 07:17:55 2018 -0500 REF/TST: Add more pytest idiom to scalar/test_nat (pandas-dev#24120) commit b841374 Author: evangelineliu <hsiyinliu@gmail.com> Date: Wed Dec 5 18:21:46 2018 -0500 BUG: Fix concat series loss of timezone (pandas-dev#24027) commit 4ae63aa Author: jbrockmendel <jbrockmendel@gmail.com> Date: Wed Dec 5 14:44:50 2018 -0800 Implement DatetimeArray._from_sequence (pandas-dev#24074) commit 2643721 Author: jbrockmendel <jbrockmendel@gmail.com> Date: Wed Dec 5 14:43:45 2018 -0800 CLN: Follow-up to pandas-dev#24100 (pandas-dev#24116) commit 8ea7744 Author: chris-b1 <cbartak@gmail.com> Date: Wed Dec 5 14:21:23 2018 -0600 PERF: ascii c string functions (pandas-dev#23981) commit cb862e4 Author: jbrockmendel <jbrockmendel@gmail.com> Date: Wed Dec 5 12:19:46 2018 -0800 BUG: fix mutation of DTI backing Series/DataFrame (pandas-dev#24096) commit aead29b Author: topper-123 <contribute@tensortable.com> Date: Wed Dec 5 19:06:00 2018 +0000 API: rename MultiIndex.labels to MultiIndex.codes (pandas-dev#23752)
Why? In implementing #24096 I found it tough to tell all the paths by which a DatetimeIndex get passed to a DataFrame or Series. Collecting all these helper functions is a step towards reducing the number of paths available so these things can be caught in one place.
The main thing this PR does is move helper functions from core.series and core.frame. A few ancillary things it does
_try_cast
and_get_axes
are de-nested