Override Pandas DataFrames created from I/O pandas operations #207

westernguy2 · 2021-01-08T09:02:19Z

Issues such as #188 and #150 reported I/O operations returning plain Pandas DataFrames instead of LuxDataFrames. This PR fixes that by manually overriding each DataFrame reference that the various pd.io modules can use to create a DataFrame.

Specifically, for the issue in #188, I locally tested read_sql (as well as read_sql_table and read_sql_query).
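
To illustrate why each I/O module needs its own override, here is a minimal, standard-library-only sketch. It uses toy stand-ins rather than pandas or Lux code: `toy_core`, `toy_io`, and `read_table` are hypothetical names invented for this example. The assumption it models is that a reader module holds its own reference to `DataFrame`, so rebinding the top-level name alone does not change what the reader constructs.

```python
import types


# Toy stand-ins (hypothetical, not real pandas or Lux classes).
class DataFrame:
    """Plays the role of the plain Pandas DataFrame."""


class LuxDataFrame(DataFrame):
    """Plays the role of the Lux subclass."""


# "core" module: where the class is publicly exposed.
core = types.ModuleType("toy_core")
core.DataFrame = DataFrame

# "io" module: keeps its own reference, as if it had run
# `from toy_core import DataFrame` at import time.
io_mod = types.ModuleType("toy_io")
io_mod.DataFrame = core.DataFrame


def read_table():
    # Resolves DataFrame through the io module's namespace at call time.
    return io_mod.DataFrame()


io_mod.read_table = read_table

# Overriding only the top-level name leaves the io module's reference untouched.
core.DataFrame = LuxDataFrame
before = type(io_mod.read_table()).__name__

# Overriding the io module's own reference as well changes what the reader builds.
io_mod.DataFrame = LuxDataFrame
after = type(io_mod.read_table()).__name__

print(before, after)
```

In this toy model the reader keeps returning the base class until its own module-level reference is rebound, which is the same shape of problem the description above attributes to the pd.io modules.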

…master

* Similarity as a default action (#182) * similarity formatting fixed * added another similarity test case; fixed bug where colored heatmap dimension is temporal (invalidate all 2 msr 1 temporal case) * filter and similarity together * filter and similarity together * remove filter * black line length * file reorg and clean; change sim metric Co-authored-by: Caitlyn Chen <caitlynachen@berkeley.edu> Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com> * bump numpy min version for travis * Special character issue (#184) * rename col * broken * fixed period replacement bug * add tests * refine tests * refine tests * remove cols * fix tests * add agg * fixed tests * clean up PR Co-authored-by: Caitlyn Chen <caitlynachen@berkeley.edu> Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com> * Colored bar interestingness bug (#189) * rewrote chi2 contingency with pd.crosstab * catching KeyError issue with chi2 contingency * padding interestingness with warning instead of error * interestingness now reuses ndim and nmsr computed in Compiler * bug fix for parser with int values * improve Vis repr to better display inferred intent when data is absent but fully compiled intent (all clauses) * Add sampling parameters as a global config (#192) * update export tutorial to add explanation for standalone argument * minor fixes and remove cell output in notebooks * added contributing doc * fix bugs and uncomment some tests * remove raise warning * remove unnecessary import * split up rename test into two parts * fix setting warning, fix data_type bugs and add relevant tests * remove ordinal data type * add test for small dataframe resetting index * add loc and iloc tests * fix attribute access directly to dataframe * add small changes to code * added test for qcut and cut * add check if dtype is Interval * added qcut test * fix Record KeyError * add tests * take care of reset_index case * small edits * add data_model to column_group Clause * small edits for row_group * fixes to row 
group * add config for start and cap for samples * finish sampling config and tests * black formatting * add documentation for sampling config * remove small added issues * minor changes to docs * implement heatmap flag and add tests * black formatting and documentation edits Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com> * Coalesce all data_type attributes of frame into one (#185) * coalesce data_types into data_type_lookup * black reformat * changed to better variable names * lux not defined error * fixed * black format * Update CONTRIBUTING.md * Bug Fix: User-provided Index causes KeyError in Pandas Execution (#191) * Moved Executor Parameters to Global Config * Black formatting * Moved table_name parameter to frame.py. Removed executor_type parameter executor_type parameter no longer necessary to maintain * Fixed reference to table_name parameter table_name is now a parameter within frame.py * Adjusted Functions to Set SQL Connection Moved set_SQL_connection function to config. Added set_SQL_table function within frame.py to let users specify which database table will be associated with their dataframe * Update SQLExecutor name parameter * Fix Executor Reference Update current_vis() to reference lux.config.executor * Update frame.py * Moved set functions to global config * Fixed Index Issue in Pandas Executor Issue caused when user sets an index. The Pandas Executor was not correctly renaming this new index column to Record in execute_aggregate() * Added tests for set_index functions * Black formatting * Update Pandas Executor to handle NA values Readded missing dropna parameter within execute_aggregate() groupby function call * Updated Pandas Coverage Tests Commented out set_index case which has not been addressed yet * Black Formatting * Update to Pandas Executor Index Handling Cleaned up how execute_aggregrate renames index columns. Now retrieves the index name from vis.data instead of filtering out non-index columns. 
Created separate test function for when user specifies an index in read_csv. Co-authored-by: 19thyneb <thyne.boonmark@gmail.com> Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com> * Initialize Config once only during __init__ (#194) * basic matplotlib chart example * migrate register default action to init * config class * move actions * fixed tests * changes * alright * fix plot_config * black reformat * black reformat Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com> Co-authored-by: Caitlyn Chen <caitlynachen@berkeley.edu> Co-authored-by: Ujjaini Mukhopadhyay <ujjaini@berkeley.edu> * Update README.md * Series Bugfix for describe and convert_dtypes (#197) * bugfix for describe and convert_dtypes * added back metadata series test * black * default to pandas display when df.dtypes printed * Update Lux Docs (#195) * add black to travis * reformat all code and adjust test * remove .idea * fix contributing doc * small change in contributing * update * reformat, update command to fix version * remove dev dependencies * first pass -- inline comments * _config/config.py * delete test notebook * action * line length 105 * executor * interestingness * processor * vislib * tests, travis, CONTRIBUTING * .format () changed * replace tabs with escape chars * update using black * more rewrites and merges into single line * update pyproject.toml and makefile * coalesce data_types into data_type_lookup * black reformat * changed to better variable names * lux not defined error * fixed * black format * config doc updated * fix link for executor * more links * fixed overview * more links fixed * pandas methods no longer included * updates to some docstrings * black reformat * minor fixes * minor fix Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com> * Supporting dataframe with integer columns (#203) * bugfix for describe and convert_dtypes * added back metadata series test * black * default to pandas display when df.dtypes printed * various fixes to support int columns * 
fixed merge conflict issues. vis.data shows None DF. * Override Pandas DataFrames created from I/O pandas operations (#207) * update export tutorial to add explanation for standalone argument * minor fixes and remove cell output in notebooks * added contributing doc * fix bugs and uncomment some tests * remove raise warning * remove unnecessary import * split up rename test into two parts * fix setting warning, fix data_type bugs and add relevant tests * remove ordinal data type * add test for small dataframe resetting index * add loc and iloc tests * fix attribute access directly to dataframe * add small changes to code * added test for qcut and cut * add check if dtype is Interval * added qcut test * fix Record KeyError * add tests * take care of reset_index case * small edits * add data_model to column_group Clause * small edits for row_group * fixes to row group * add config for start and cap for samples * finish sampling config and tests * black formatting * add documentation for sampling config * remove small added issues * minor changes to docs * implement heatmap flag and add tests * black formatting and documentation edits * add pd.io equalities for DataFrames Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com> * Merge master into sql-engine + minor mergeconflict fixes * Removing the PYNB * Cleaning up obsolete code * Configuration for topk and sort order (#206) * bugfix for describe and convert_dtypes * added back metadata series test * black * default to pandas display when df.dtypes printed * various fixes to support int columns * skip series vis for df.iterrows series element * config setting for modifying top K and sorting * note about regenerated config * Version lock for jupyter-client (#211) * move to single requirements-dev without lux-widget install manually * pin jedi version * pin jupyter-client version * add back old travis and requirement-dev * Mixed dtype issue (#205) * coalesce data_types into data_type_lookup * merge fixed * merge 
conflicts * add warning and suggestion on how to fix * formatting for warnings version * change to internal data * legibility update * test added * update test * test updated * xlrd in dev reqs * black * update link * changes to test logic, minor string format for warning Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com> * Fixes issue where value_counts was not returning LuxSeries (#210) * add series equality and value counts test * black formatting * fix old value counts test instead * minor fix Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com> * bump version * update README Co-authored-by: Caitlyn Chen <caitlynachen@gmail.com> Co-authored-by: Caitlyn Chen <caitlynachen@berkeley.edu> Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com> Co-authored-by: Kunal Agarwal <32151899+westernguy2@users.noreply.github.com> Co-authored-by: jinimukh <46768380+jinimukh@users.noreply.github.com> Co-authored-by: thyneb19 <thyneboonmark@berkeley.edu> Co-authored-by: 19thyneb <thyne.boonmark@gmail.com> Co-authored-by: Ujjaini Mukhopadhyay <ujjaini@berkeley.edu>

westernguy2 and others added 30 commits September 18, 2020 01:59

update export tutorial to add explanation for standalone argument

403bdd6

minor fixes and remove cell output in notebooks

be1ddc3

added contributing doc

2f1c5b2

fix bugs and uncomment some tests

a5caa69

remove raise warning

3fb197d

remove unnecessary import

ef82410

split up rename test into two parts

21d71ea

fix setting warning, fix data_type bugs and add relevant tests

2b8abe1

remove ordinal data type

7942161

add test for small dataframe resetting index

98f4c2e

add loc and iloc tests

18cace7

fix merge conflicts

6e9195b

fix attribute access directly to dataframe

dbdfdcd

add small changes to code

d63d006

Merge branch 'master' into master

4faff66

Merge branch 'master' of github.com:westernguy2/lux into westernguy2-…

083e091

…master

added test for qcut and cut

a998646

add check if dtype is Interval

acdd9c9

added qcut test

b8fa059

Merge branch 'master' of github.com:westernguy2/lux into westernguy2-…

1838ea9

…master

fix Record KeyError

a826e34

add tests

afc4f71

take care of reset_index case

a96baa4

small edits

da4c602

add data_model to column_group Clause

a03f275

small edits for row_group

4ff25e8

Merge branch 'master' of github.com:westernguy2/lux into westernguy2-…

cfe8772

…master

fixes to row group

cfcc50c

add config for start and cap for samples

3f60ca9

finish sampling config and tests

71e481d

westernguy2 and others added 9 commits December 28, 2020 18:36

black formatting

86006d6

add documentation for sampling config

f015c34

remove small added issues

f87d63b

Merge branch 'master' of github.com:westernguy2/lux into westernguy2-…

89d6310

…master

minor changes to docs

0782095

implement heatmap flag and add tests

6f09f93

black formatting and documentation edits

2ed47b8

Merge remote-tracking branch 'upstream/master'

aa968bb

add pd.io equalities for DataFrames

d30fd23

dorisjlee self-requested a review January 8, 2021 12:02

dorisjlee merged commit 9dc0958 into lux-org:master Jan 8, 2021
