REF: Separate out DataFrame/Series Construction Helpers #24100

jbrockmendel · 2018-12-04T21:03:14Z

Why? In implementing #24096 I found it tough to tell all the paths by which a DatetimeIndex get passed to a DataFrame or Series. Collecting all these helper functions is a step towards reducing the number of paths available so these things can be caught in one place.

The main thing this PR does is move helper functions from core.series and core.frame. A few ancillary things it does

remove runtime imports (there is only one left in construction.py)
use ABC classes for isinstance checks
nested functions _try_cast and _get_axes are de-nested
de-privatize functions that are imported elsewhere. A function in construction.py has a leading underscore if and only if it is not imported elsewhere.

pep8speaks · 2018-12-04T21:03:28Z

Hello @jbrockmendel! Thanks for submitting the PR.

There are no PEP8 issues in the file pandas/core/arrays/categorical.py !
There are no PEP8 issues in the file pandas/core/arrays/sparse.py !
There are no PEP8 issues in the file pandas/core/frame.py !
There are no PEP8 issues in the file pandas/core/internals/construction.py !
There are no PEP8 issues in the file pandas/core/series.py !
There are no PEP8 issues in the file pandas/core/sparse/frame.py !

…nstruct

codecov · 2018-12-05T03:00:30Z

Codecov Report

Merging #24100 into master will decrease coverage by 49.67%.
The diff coverage is 62.79%.

@@             Coverage Diff             @@
##           master   #24100       +/-   ##
===========================================
- Coverage   92.21%   42.53%   -49.68%     
===========================================
  Files         161      162        +1     
  Lines       51684    51713       +29     
===========================================
- Hits        47658    21998    -25660     
- Misses       4026    29715    +25689

Flag	Coverage Δ
#multiple	`?`
#single	`42.53% <62.79%> (-0.47%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/series.py	`48.39% <100%> (-45.29%)`	⬇️
pandas/core/arrays/categorical.py	`41.89% <100%> (-53.52%)`	⬇️
pandas/core/arrays/sparse.py	`44.84% <100%> (-47.12%)`	⬇️
pandas/core/frame.py	`35.92% <50%> (-61.11%)`	⬇️
pandas/core/internals/construction.py	`62.59% <62.59%> (ø)`
pandas/core/sparse/frame.py	`27.46% <66.66%> (-67.38%)`	⬇️
pandas/io/formats/latex.py	`0% <0%> (-100%)`	⬇️
pandas/core/categorical.py	`0% <0%> (-100%)`	⬇️
pandas/io/sas/sas_constants.py	`0% <0%> (-100%)`	⬇️
pandas/tseries/plotting.py	`0% <0%> (-100%)`	⬇️
... and 125 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4b5f4d1...4f0e7f4. Read the comment docs.

codecov · 2018-12-05T03:00:32Z

Codecov Report

Merging #24100 into master will decrease coverage by <.01%.
The diff coverage is 96.44%.

@@            Coverage Diff             @@
##           master   #24100      +/-   ##
==========================================
- Coverage   92.21%    92.2%   -0.01%     
==========================================
  Files         161      162       +1     
  Lines       51684    51700      +16     
==========================================
+ Hits        47658    47672      +14     
- Misses       4026     4028       +2

Flag	Coverage Δ
#multiple	`90.6% <96.44%> (-0.01%)`	⬇️
#single	`43.02% <64.69%> (+0.01%)`	⬆️

Impacted Files	Coverage Δ
pandas/core/sparse/frame.py	`94.84% <100%> (+0.01%)`	⬆️
pandas/core/series.py	`93.69% <100%> (+0.01%)`	⬆️
pandas/core/arrays/categorical.py	`95.4% <100%> (ø)`	⬆️
pandas/core/arrays/sparse.py	`91.95% <100%> (ø)`	⬆️
pandas/core/frame.py	`96.79% <91.66%> (-0.23%)`	⬇️
pandas/core/internals/construction.py	`96.62% <96.62%> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4b5f4d1...4f0e7f4. Read the comment docs.

jreback

these can be done in a followup along with renaming the functions to be more consistent with what they do, e.g. init_ndarray, should really be: create_block_manager_from_ndarray (though we have a slight conflict on that name). but for sure need to separate out public / private things.

jreback · 2018-12-05T12:49:51Z

pandas/core/frame.py

-
-        return create_block_manager_from_blocks([values], [columns, index])
+        return init_ndarray(values, index, columns, dtype=dtype, copy=copy)
+        # TODO: can we just get rid of this as a method?


yes i would just call these directly (your new routines)

jreback · 2018-12-05T12:50:00Z

pandas/core/frame.py

-            arrays = [data[k] for k in keys]
-
-        return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
+        return init_dict(data, index, columns, dtype=dtype)


same comment as below, yes)

jreback · 2018-12-05T12:51:40Z

pandas/core/internals/construction.py

@@ -0,0 +1,699 @@
+"""
+Functions for preparing various inputs passed to the DataFrame or Series
+constructors before passing them to aBlockManager.


space before BlockManager

jreback · 2018-12-05T12:55:59Z

thanks. prob should open an issue with a todo list for followups.

commit 28c61d770f6dfca6857fd0fa6979d4119a31129e Author: Tom Augspurger <tom.w.augspurger@gmail.com> Date: Thu Dec 6 12:18:19 2018 -0600 uncomment commit bae2e322523efc73a1344464f51611e2dc555ccb Author: Tom Augspurger <tom.w.augspurger@gmail.com> Date: Thu Dec 6 12:17:09 2018 -0600 maybe fixes commit 6cb4db05c9d6ceba3794096f0172cae5ed5f6019 Author: Tom Augspurger <tom.w.augspurger@gmail.com> Date: Thu Dec 6 09:57:37 2018 -0600 we back commit d97ab57fb32cb23371169d9ed659ccfac34cfe45 Merge: a117de4 b78aa8d Author: Tom Augspurger <tom.w.augspurger@gmail.com> Date: Thu Dec 6 09:51:51 2018 -0600 Merge remote-tracking branch 'upstream/master' into disown-tz-only-rebased2 commit b78aa8d Author: gfyoung <gfyoung17+GitHub@gmail.com> Date: Thu Dec 6 07:18:44 2018 -0500 REF/TST: Add pytest idiom to reshape/test_tile (pandas-dev#24107) commit 2993b8e Author: gfyoung <gfyoung17+GitHub@gmail.com> Date: Thu Dec 6 07:17:55 2018 -0500 REF/TST: Add more pytest idiom to scalar/test_nat (pandas-dev#24120) commit b841374 Author: evangelineliu <hsiyinliu@gmail.com> Date: Wed Dec 5 18:21:46 2018 -0500 BUG: Fix concat series loss of timezone (pandas-dev#24027) commit 4ae63aa Author: jbrockmendel <jbrockmendel@gmail.com> Date: Wed Dec 5 14:44:50 2018 -0800 Implement DatetimeArray._from_sequence (pandas-dev#24074) commit 2643721 Author: jbrockmendel <jbrockmendel@gmail.com> Date: Wed Dec 5 14:43:45 2018 -0800 CLN: Follow-up to pandas-dev#24100 (pandas-dev#24116) commit 8ea7744 Author: chris-b1 <cbartak@gmail.com> Date: Wed Dec 5 14:21:23 2018 -0600 PERF: ascii c string functions (pandas-dev#23981) commit cb862e4 Author: jbrockmendel <jbrockmendel@gmail.com> Date: Wed Dec 5 12:19:46 2018 -0800 BUG: fix mutation of DTI backing Series/DataFrame (pandas-dev#24096) commit aead29b Author: topper-123 <contribute@tensortable.com> Date: Wed Dec 5 19:06:00 2018 +0000 API: rename MultiIndex.labels to MultiIndex.codes (pandas-dev#23752)

…4100)

jbrockmendel added 4 commits December 4, 2018 12:34

refactor array constructors into construction.py

21dac45

remove runtime import

9c1057b

update imports, re-privatize

d9a1dc3

update import

4f0cf90

Merge branch 'master' of https://github.com/pandas-dev/pandas into co…

4f0e7f4

…nstruct

jbrockmendel mentioned this pull request Dec 5, 2018

Implement DatetimeArray._from_sequence #24074

Merged

jreback added Internals Related to non-user accessible pandas implementation Clean labels Dec 5, 2018

jreback added this to the 0.24.0 milestone Dec 5, 2018

jreback reviewed Dec 5, 2018

View reviewed changes

jreback merged commit 669cb27 into pandas-dev:master Dec 5, 2018

jbrockmendel deleted the construct branch December 5, 2018 14:46

jbrockmendel added a commit to jbrockmendel/pandas that referenced this pull request Dec 5, 2018

Follow-ups to pandas-dev#24100

792e4cd

jreback pushed a commit that referenced this pull request Dec 5, 2018

CLN: Follow-up to #24100 (#24116)

2643721

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

REF: Separate out DataFrame/Series Construction Helpers (pandas-dev#2…

08b339d

…4100)

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

CLN: Follow-up to pandas-dev#24100 (pandas-dev#24116)

cc96411

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

REF: Separate out DataFrame/Series Construction Helpers (pandas-dev#2…

7a9b32a

…4100)

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

CLN: Follow-up to pandas-dev#24100 (pandas-dev#24116)

3528621

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

REF: Separate out DataFrame/Series Construction Helpers #24100

REF: Separate out DataFrame/Series Construction Helpers #24100

jbrockmendel commented Dec 4, 2018

pep8speaks commented Dec 4, 2018

codecov bot commented Dec 5, 2018

codecov bot commented Dec 5, 2018 •

edited

Loading

jreback left a comment

jreback Dec 5, 2018

jreback Dec 5, 2018

jreback Dec 5, 2018

jreback commented Dec 5, 2018

REF: Separate out DataFrame/Series Construction Helpers #24100

REF: Separate out DataFrame/Series Construction Helpers #24100

Conversation

jbrockmendel commented Dec 4, 2018

pep8speaks commented Dec 4, 2018

codecov bot commented Dec 5, 2018

Codecov Report

codecov bot commented Dec 5, 2018 • edited Loading

Codecov Report

jreback left a comment

Choose a reason for hiding this comment

jreback Dec 5, 2018

Choose a reason for hiding this comment

jreback Dec 5, 2018

Choose a reason for hiding this comment

jreback Dec 5, 2018

Choose a reason for hiding this comment

jreback commented Dec 5, 2018

codecov bot commented Dec 5, 2018 •

edited

Loading