Refactor initialization for simpler maintenance #730

coroa · 2023-02-22T10:56:34Z

Please confirm that this PR has done the following:

Tests Added
~~Documentation Added~~
~~Name of contributors Added to AUTHORS.rst~~
Description in RELEASE_NOTES.md Added

Description of PR

~~On-top of #729.~~ Rebased to main.

Splits `utils.format_data` into 6 different functions:

- `_convert_r_columns(df)` - Check and convert R-style year columns
- `_knead_data(df, **kwargs)` - Replace, rename and concat according to user arguments
- `_format_from_database(df)` - Post-process database results
- `_intuit_column_groups(df, index)` - Check and categorise columns in dataframe
- `_format_data_to_series(df, index)` - Convert a long or wide pandas dataframe to a series with the required columns
- `_validate_complete_index(df)`

The change should be functionally neutral. Review the individual commits to follow the refactoring trail.
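To illustrate the kind of small, single-purpose step this split produces, here is a minimal sketch of converting R-style year labels. It assumes "R-style" means the `X2005` form that R produces when it reads numeric column names. The helper name `convert_r_style_label` and the sample labels are hypothetical; this is not the code of `_convert_r_columns` in this PR.

```python
def convert_r_style_label(label):
    """Return an int year for labels like "X2005"; leave other labels unchanged.

    Hypothetical helper for illustration only, not pyam's implementation.
    """
    if isinstance(label, str) and label.startswith("X"):
        try:
            return int(label[1:])
        except ValueError:
            # a regular column whose name merely starts with "X"
            return label
    return label


# assumed sample column labels
columns = ["model", "X2005", "X2010", "Xenon"]
converted = [convert_r_style_label(c) for c in columns]
print(converted)
```

Keeping a step like this in its own function means it can be tested in isolation, which fits the maintenance goal stated in the PR title.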

codecov · 2023-02-22T11:15:03Z

Codecov Report

Merging #730 (58eb241) into main (7a97516) will increase coverage by 0.0%.
The diff coverage is 100.0%.

@@          Coverage Diff          @@
##            main    #730   +/-   ##
=====================================
  Coverage   95.0%   95.0%           
=====================================
  Files         59      59           
  Lines       6014    6020    +6     
=====================================
+ Hits        5717    5725    +8     
+ Misses       297     295    -2

| Impacted Files | Coverage Δ | |
|---|---|---|
| pyam/core.py | `95.4% <100.0%> (ø)` | |
| pyam/utils.py | `92.6% <100.0%> (+0.7%)` | ⬆️ |

Co-authored-by: Matthew Gidden <matthew.gidden@gmail.com>

gidden · 2023-02-22T15:36:31Z

This LGTM but best if @danielhuppmann approves!

danielhuppmann

Looks good, a few suggestions inline.

Also, maybe move `_validate_complete_index(df)` above `format_data()` so that all components are defined (in the order in which they are called) before the actual function.

danielhuppmann · 2023-02-23T11:44:29Z

pyam/utils.py

-        df = df.stack(dropna=True)
-        df.name = "value"
-        df.index.names = df.index.names[:-1] + [time_col]
+    df, time_col, extra_cols = _format_data_to_series(df, index)

    # cast value column to numeric


Might also put these two checks into their own function _check_data_integrity()...
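For readers following the diff in this thread: the three removed lines reshape a wide table (one column per time step) into a long series named `value`. Below is a minimal sketch of that pattern, assuming a small made-up frame. The function name `wide_to_series` and the sample data are hypothetical, and the sketch is not the body of `_format_data_to_series`. It calls `stack()` followed by `dropna()` rather than `stack(dropna=True)`, because handling of that argument differs between pandas versions.

```python
import pandas as pd


def wide_to_series(df, index_cols, time_col="year"):
    """Stack a wide frame into a long series named "value".

    Hypothetical helper for illustration only, not pyam's implementation.
    """
    wide = df.set_index(index_cols)
    # stack() moves the time columns into the innermost index level;
    # dropna() then removes missing observations
    series = wide.stack().dropna()
    series.name = "value"
    # the stacked level has no name yet, so label it as the time column
    series.index.names = list(series.index.names[:-1]) + [time_col]
    return series


# assumed sample data: two variables, two years, one missing value
wide = pd.DataFrame(
    {
        "model": ["m", "m"],
        "variable": ["a", "b"],
        2005: [1.0, None],
        2010: [2.0, 3.0],
    }
)
series = wide_to_series(wide, ["model", "variable"])
print(series)
```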

Co-authored-by: Daniel Huppmann <dh@dergelbesalon.at>

gidden · 2023-02-24T10:51:12Z

LGTM - thanks @coroa! Will merge after tests pass.

coroa requested a review from danielhuppmann February 22, 2023 10:56

coroa force-pushed the split-format-data branch from 145ee39 to c841dcb Compare February 22, 2023 11:05

This was referenced Feb 22, 2023

Add fast-path to format data #731

Merged

Next iteration at a fast format_data #727

Closed

Improve performance of format_data() #729

Merged

coroa and others added 7 commits February 22, 2023 13:40

Split format_data into functions

d2ed30c

Co-authored-by: Matthew Gidden <matthew.gidden@gmail.com>

Reorder format_data

1154ae6

Co-authored-by: Matthew Gidden <matthew.gidden@gmail.com>

Split out format_data_to_series from format_data

15dc654

Fix unused import

3cd632d

Avoid copying data

fc870ea

Co-authored-by: Matthew Gidden <matthew.gidden@gmail.com>

Add function doc-strings

a0497b5

Update release notes

3146714

coroa force-pushed the split-format-data branch from c841dcb to 3146714 Compare February 22, 2023 12:40

danielhuppmann approved these changes Feb 23, 2023

Apply suggestions from code review

abcc95c

Co-authored-by: Daniel Huppmann <dh@dergelbesalon.at>

gidden reviewed Feb 24, 2023

Update pyam/utils.py

58eb241

gidden merged commit 128fb19 into main Feb 24, 2023

gidden deleted the split-format-data branch February 24, 2023 12:03

gidden restored the split-format-data branch February 24, 2023 12:04

danielhuppmann mentioned this pull request Mar 6, 2023

Improve performance of IamDataFrame initialization (phase 2) #580

Closed

1 task
