Merging the master branch changes into sql-engine (#208)
* Similarity as a default action (#182)

* similarity formatting fixed

* added another similarity test case; fixed bug where colored heatmap dimension is temporal (invalidate all 2 msr 1 temporal case)

* filter and similarity together

* filter and similarity together

* remove filter

* black line length

* file reorg and clean; change sim metric

Co-authored-by: Caitlyn Chen <caitlynachen@berkeley.edu>
Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com>

* bump numpy min version for travis

* Special character issue (#184)

* rename col

* broken

* fixed period replacement bug

* add tests

* refine tests

* refine tests

* remove cols

* fix tests

* add agg

* fixed tests

* clean up PR

Co-authored-by: Caitlyn Chen <caitlynachen@berkeley.edu>
Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com>

* Colored bar interestingness bug (#189)
* rewrote chi2 contingency with pd.crosstab
* catching KeyError issue with chi2 contingency
* padding interestingness with warning instead of error
* interestingness now reuses ndim and nmsr computed in Compiler
* bug fix for parser with int values
* improve Vis repr to better display inferred intent when data is absent but fully compiled intent (all clauses)

* Add sampling parameters as a global config (#192)

* update export tutorial to add explanation for standalone argument

* minor fixes and remove cell output in notebooks

* added contributing doc

* fix bugs and uncomment some tests

* remove raise warning

* remove unnecessary import

* split up rename test into two parts

* fix setting warning, fix data_type bugs and add relevant tests

* remove ordinal data type

* add test for small dataframe resetting index

* add loc and iloc tests

* fix attribute access directly to dataframe

* add small changes to code

* added test for qcut and cut

* add check if dtype is Interval

* added qcut test

* fix Record KeyError

* add tests

* take care of reset_index case

* small edits

* add data_model to column_group Clause

* small edits for row_group

* fixes to row group

* add config for start and cap for samples

* finish sampling config and tests

* black formatting

* add documentation for sampling config

* remove small added issues

* minor changes to docs

* implement heatmap flag and add tests

* black formatting and documentation edits

Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com>

* Coalesce all data_type attributes of frame into one (#185)

* coalesce data_types into data_type_lookup

* black reformat

* changed to better variable names

* lux not defined error

* fixed

* black format

* Update CONTRIBUTING.md

* Bug Fix: User-provided Index causes KeyError in Pandas Execution (#191)

* Moved Executor Parameters to Global Config

* Black formatting

* Moved table_name parameter to frame.py. Removed executor_type parameter

executor_type parameter no longer necessary to maintain

* Fixed reference to table_name parameter

table_name is now a parameter within frame.py

* Adjusted Functions to Set SQL Connection

Moved set_SQL_connection function to config. Added set_SQL_table function within frame.py to let users specify which database table will be associated with their dataframe

* Update SQLExecutor name parameter

* Fix Executor Reference

Update current_vis() to reference lux.config.executor

* Update frame.py

* Moved set functions to global config

* Fixed Index Issue in Pandas Executor

Issue caused when user sets an index. The Pandas Executor was not correctly renaming this new index column to Record in execute_aggregate()

* Added tests for set_index functions

* Black formatting

* Update Pandas Executor to handle NA values

Readded missing dropna parameter within execute_aggregate() groupby function call

* Updated Pandas Coverage Tests

Commented out set_index case which has not been addressed yet

* Black Formatting

* Update to Pandas Executor Index Handling

Cleaned up how execute_aggregrate renames index columns. Now retrieves the index name from vis.data instead of filtering out non-index columns.

Created separate test function for when user specifies an index in read_csv.

Co-authored-by: 19thyneb <thyne.boonmark@gmail.com>
Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com>

* Initialize Config once only during __init__ (#194)

* basic matplotlib chart example

* migrate register default action to init

* config class

* move actions

* fixed tests

* changes

* alright

* fix plot_config

* black reformat

* black reformat

Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com>
Co-authored-by: Caitlyn Chen <caitlynachen@berkeley.edu>
Co-authored-by: Ujjaini Mukhopadhyay <ujjaini@berkeley.edu>

* Update README.md

* Series Bugfix for describe and convert_dtypes (#197)

* bugfix for describe and convert_dtypes

* added back metadata series test

* black

* default to pandas display when df.dtypes printed

* Update Lux Docs (#195)

* add black to travis

* reformat all code and adjust test

* remove .idea

* fix contributing doc

* small change in contributing

* update

* reformat, update command to fix version

* remove dev dependencies

* first pass -- inline comments

* _config/config.py

* delete test notebook

* action

* line length 105

* executor

* interestingness

* processor

* vislib

* tests, travis, CONTRIBUTING

* .format() changed

* replace tabs with escape chars

* update using black

* more rewrites and merges into single line

* update pyproject.toml and makefile

* coalesce data_types into data_type_lookup

* black reformat

* changed to better variable names

* lux not defined error

* fixed

* black format

* config doc updated

* fix link for executor

* more links

* fixed overview

* more links fixed

* pandas methods no longer included

* updates to some docstrings

* black reformat

* minor fixes

* minor fix

Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com>

* Supporting dataframe with integer columns  (#203)

* bugfix for describe and convert_dtypes

* added back metadata series test

* black

* default to pandas display when df.dtypes printed

* various fixes to support int columns

* fixed merge conflict issues. vis.data shows None DF.

* Merge master into sql-engine + minor mergeconflict fixes

* Removing the PYNB

* Cleaning up obsolete code

Co-authored-by: Caitlyn Chen <caitlynachen@gmail.com>
Co-authored-by: Caitlyn Chen <caitlynachen@berkeley.edu>
Co-authored-by: Doris Lee <dorisjunglinlee@gmail.com>
Co-authored-by: Kunal Agarwal <32151899+westernguy2@users.noreply.github.com>
Co-authored-by: jinimukh <46768380+jinimukh@users.noreply.github.com>
Co-authored-by: thyneb19 <thyneboonmark@berkeley.edu>
Co-authored-by: 19thyneb <thyne.boonmark@gmail.com>
Co-authored-by: Ujjaini Mukhopadhyay <ujjaini@berkeley.edu>
9 people authored Jan 9, 2021
1 parent d3819b7 commit 289f670
Showing 70 changed files with 1,767 additions and 910 deletions.
9 changes: 5 additions & 4 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
Lux is a project undergoing active development. If you are interested in contributing to Lux, the open tasks on [GitHub issues](https://github.com/lux-org/lux/issues), esp. issues labelled with the tag [`easy`](https://github.com/lux-org/lux/labels/easy), are good places for newcomers to contribute. This guide contains information on the workflow for contributing to the Lux codebase. For more information on the Lux architecture, see this [documentation page](https://lux-api.readthedocs.io/en/latest/source/advanced/architecture.html). For any additional questions and issues, please post on the [Slack channel](http://lux-project.slack.com/).
Lux is a project undergoing active development. If you are interested in contributing to Lux, the open tasks on [GitHub issues](https://github.com/lux-org/lux/issues), esp. issues labelled with the tag [`easy`](https://github.com/lux-org/lux/labels/easy), are good places for newcomers to contribute. This guide contains information on the workflow for contributing to the Lux codebase. For more information on the Lux architecture, see this [documentation page](https://lux-api.readthedocs.io/en/latest/source/advanced/architecture.html).


# Setting up Build and Installation Process
@@ -14,6 +14,7 @@ You can install Lux by building from the source code in your fork directly:
```bash
cd lux/
pip install --user -r requirements.txt
pip install --user -r requirements-dev.txt
python setup.py install
```

@@ -36,7 +37,7 @@ lux/
```

# Code Formatting
In order to keep our codebase clean and readible, we are using PEP8 guidelines. To help us maintain and check code style, we are using [black](https://github.com/psf/black). Simply run `black .` before commiting. Failure to do so may fail the tests run on Travis. This package should have been installed for you.
In order to keep our codebase clean and readible, we are using PEP8 guidelines. To help us maintain and check code style, we are using [black](https://github.com/psf/black). Simply run `black .` before commiting. Failure to do so may fail the tests run on Travis. This package should have been installed for you as part of [requirements-dev](https://github.com/lux-org/lux/blob/master/requirements-dev.txt).

# Running the Test Suite

@@ -67,11 +68,11 @@ Once the pull request is submitted, the maintainer will get notified and review

# Building Documentation

To build the documentation in HTML, you can run this command locally in the `doc/` folder:
Lux uses [Sphinx](https://www.sphinx-doc.org/en/master/) to generate the documentations, which contains both the docstring and the written documentation in the `doc/` folder. To build the documentation in HTML, you can run this command locally in the `doc/` folder:

```bash
make html
```

This generates all the HTML documentation files in `doc/_build/html/`. The configuration file `conf.py` contains information related to Sphinx settings. The Sphinx documentations are written as ReStructuredText (`*.rst` files) and mostly stored in the `source/` folder. The documentation inside `source/reference` is auto-generated by Sphinx. The repository is linked with ReadTheDocs, which triggers the build for the latest documentation based on the most recent commit. As a result, we do not commit anything inside `doc/_build` in the Github repository.
This generates all the HTML documentation files in `doc/_build/html/`. The configuration file `conf.py` contains information related to Sphinx settings. The Sphinx documentations are written as ReStructuredText (`*.rst` files) and mostly stored in the `source/` folder. The documentation inside `source/reference` is auto-generated by Sphinx. The repository is linked with [ReadTheDocs](https://readthedocs.org/projects/lux-api/), which triggers the build for the latest documentation based on the most recent commit. As a result, we do not commit anything inside `doc/_build` in the Github repository.

4 changes: 2 additions & 2 deletions README.md
@@ -40,10 +40,10 @@ import lux
import pandas as pd
```

Then, Lux can be used as-is, without modifying any of your existing Pandas code. Here, we use Pandas's [read_csv](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html) command to load in a [dataset of colleges](https://collegescorecard.ed.gov/data/documentation/) and their properties.
Then, Lux can be used as-is, without modifying any of your existing Pandas code. Here, we use Pandas's [read_csv](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html) command to load in a [dataset of colleges](https://github.com/lux-org/lux-datasets/blob/master/data/college.csv) and their properties.

```python
df = pd.read_csv("college.csv")
df = pd.read_csv("https://raw.githubusercontent.com/lux-org/lux-datasets/master/data/college.csv")
df
```

2 changes: 1 addition & 1 deletion doc/conf.py
@@ -63,7 +63,7 @@
"sphinx_automodapi.automodsumm",
]

autodoc_default_flags = ["members", "inherited-members"]
autodoc_default_flags = ["members", "no-undoc-members"]
autodoc_member_order = "groupwise"
autosummary_generate = True
numpydoc_show_class_members = False
2 changes: 1 addition & 1 deletion doc/source/advanced/architecture.rst
@@ -80,4 +80,4 @@ Number of Dimensions Number of Measures Mark Type
Executor
----------
The data executor populates each Vis with a subset of the dataframe based on the specified intent.
You can learn more about executors in Lux `here <https://lux-api.readthedocs.io/en/dfapi/source/guide/executor.html>`_.
You can learn more about executors in Lux `here <https://lux-api.readthedocs.io/en/latest/source/advanced/executor.html>`_.
2 changes: 1 addition & 1 deletion doc/source/advanced/date.rst
@@ -98,7 +98,7 @@ Below we look at an example stocks dataset that also has `date` field with each

.. code-block:: python
df = pd.read_csv("../../lux/data/stocks.csv")
df = pd.read_csv("https://github.com/lux-org/lux-datasets/blob/master/data/stocks.csv?raw=true")
df.dtypes
33 changes: 18 additions & 15 deletions doc/source/advanced/interestingness.rst
@@ -1,24 +1,24 @@
**********************
*******************************
Interestingness Scoring
**********************
*******************************

In Lux, recommended visualizations are scored and ranked based on their statistical properties.
Lux uses various standard metrics for determining how interesting a visualization is.
The choice of an interestingness metric is dependent on the chart type, as shown in the following table.

+----------------+---------+------------------------------------------------------------------+
| Chart Type | Filter? | Function |
+================+=========+==================================================================+
| Bar/Line Chart || :func:`lux.interestingness.interestingness.unevenness` |
| +---------+------------------------------------------------------------------+
+----------------+---------+--------------------------------------------------------------------+
| Chart Type | Filter? | Function |
+================+=========+====================================================================+
| Bar/Line Chart || :func:`lux.interestingness.interestingness.unevenness` |
| +---------+--------------------------------------------------------------------+
| | X | :func:`lux.interestingness.interestingness.deviation_from_overall` |
+----------------+---------+------------------------------------------------------------------+
| Histogram || :func:`lux.interestingness.interestingness.skewness` |
| +---------+------------------------------------------------------------------+
+----------------+---------+--------------------------------------------------------------------+
| Histogram || :func:`lux.interestingness.interestingness.skewness` |
| +---------+--------------------------------------------------------------------+
| | X | :func:`lux.interestingness.interestingness.deviation_from_overall` |
+----------------+---------+------------------------------------------------------------------+
| Scatterplot | ✔/X | :func:`lux.interestingness.interestingness.monotonicity` |
+----------------+---------+------------------------------------------------------------------+
+----------------+---------+--------------------------------------------------------------------+
| Scatterplot | ✔/X | :func:`lux.interestingness.interestingness.monotonicity` |
+----------------+---------+--------------------------------------------------------------------+

Bar Chart Interestingness
=========================
@@ -30,7 +30,7 @@ Bar charts without filters: Unevenness

A chart is scored higher if it is more uneven, indicating high variation
in the individual bar values in the chart. The score is computed based
on the difference between the value of the bar chart .. math::`V` and the flat uniform distribution .. math::`V_{flat}`.
on the difference between the value of the bar chart :math:`V` and the flat uniform distribution :math:`V_{flat}`.
The difference is captured via the Euclidean distance (L2 norm).
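
The description above can be sketched in a few lines of Python. This is only an illustration of the stated idea (L2 distance between the bar values and a flat distribution), not the code in `lux.interestingness`; the function name `unevenness_sketch` is hypothetical, and normalizing the bars so that they sum to 1 is an assumption made here so that the two vectors are comparable.

```python
import numpy as np


def unevenness_sketch(bar_values):
    """Hypothetical helper: L2 distance between bar values and a flat distribution."""
    v = np.asarray(bar_values, dtype=float)
    total = v.sum()
    if v.size == 0 or total == 0:
        # Nothing to compare against; treat the chart as perfectly even.
        return 0.0
    # Assumption: normalize the bars so that they sum to 1.
    v = v / total
    # V_flat: every bar holds an equal share of the total.
    v_flat = np.full(v.size, 1.0 / v.size)
    # Euclidean distance (L2 norm) between V and V_flat.
    return float(np.linalg.norm(v - v_flat))
```

Under these assumptions, a chart whose bars are all equal scores 0, and the score grows as more of the total concentrates in fewer bars.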


@@ -42,6 +42,7 @@ The difference is captured via the Euclidean distance (L2 norm).
.. Example: "Occurrence" recommendation
.. _barWithFilter:

Bar charts with filters: Deviation from Overall
-----------------------------------------------

@@ -77,6 +78,7 @@ The skewness is computed based on `scipy.stats.skew <https://docs.scipy.org/doc/
.. _histoWithFilter:

Histogram with filters: Deviation from overall
-----------------------------------------------

@@ -91,9 +93,10 @@ The deviation measures how different is the filtered distribution from the overa
.. Example: "Filter" recommendation where the intent only has 1 measure.
Scatterplot Interestingness
=========================
==============================

.. _scatter:

Scatterplot: Monotonicity
-----------------------------------

18 changes: 7 additions & 11 deletions doc/source/getting_started/overview.rst
@@ -2,7 +2,7 @@
Overview
********

.. note:: You can follow along this tutorial in a Jupyter notebook. [`Github <https://github.com/lux-org/lux-binder/blob/master/tutorial/tutorial/0-overview.ipynb>`_] [`Binder <https://mybinder.org/v2/gh/lux-org/lux-binder/master?urlpath=tree/tutorial/0-overview.ipynb>`_]
.. note:: You can follow along this tutorial in a Jupyter notebook. [`Github <https://github.com/lux-org/lux-binder/blob/master/tutorial/0-overview.ipynb>`_] [`Binder <https://mybinder.org/v2/gh/lux-org/lux-binder/master?urlpath=tree/tutorial/0-overview.ipynb>`_]

This tutorial provides an overview of how you can use Lux in your data exploration workflow.

@@ -25,8 +25,7 @@ Lux preserves the Pandas dataframe semantics -- which means that you can apply a
df = pd.read_csv("lux/data/college.csv")
Lux is built on the philosophy that generating useful visualizations should be as simple as printing out a dataframe.
When you print out the dataframe in the notebook, you should see the default Pandas table display with an additional Toggle button.
To visualize your dataframe in Lux, simply print out the dataframe. You should see the default Pandas table display with an additional toggle button.

.. code-block:: python
@@ -37,7 +36,7 @@ When you print out the dataframe in the notebook, you should see the default Pan
:align: center
:alt: click on toggle, scroll on Correlation

By clicking on the Toggle button, you can now explore the data visually through Lux. You should see three tabs of visualizations recommended to you.
By clicking on the Toggle button, you can now explore the data visually through Lux. You should see several categories of visualizations recommended to you by browsing through the different tabs.

.. image:: ../../../../lux-resources/doc_img/overview-2.gif
:width: 700
@@ -75,7 +74,7 @@ As shown in the example above, by default, we display three types of actions sho
:alt: Example of even and uneven category distributions


Refer to :doc:`this page <../advanced/action>` for details on different types of action in Lux.
Refer to :doc:`this page <../reference/lux.action>` for details on different types of action in Lux.

Expressing Analysis Interest and Goals with User `Intent`
----------------------------------------------------------
@@ -111,7 +110,7 @@ You can specify a variety of things that you might be interested in, for example
df.intent = ["MedianEarnings", "FundingModel=Public"]
df
For more advance use of intent, refer to :doc:`this page <../getting_started/intent>` on how to specify the intent.
For more advance use of intent, refer to :doc:`this page <../guide/intent>` on how to specify the intent.

Steering Recommendations via User Intent
----------------------------------------
@@ -129,7 +128,7 @@ Given the updated intent, additional actions (Enhance and Filter) are generated.
- {MedianEarnings, **AverageCost**}
- {MedianEarnings, **AverageFacultySalary**}.

.. image:: https://github.com/lux-org/lux-resources/blob/master/doc_img/overview-4.png
.. image:: https://github.com/lux-org/lux-resources/blob/master/doc_img/overview-4.png?raw=true
:width: 700
:align: center
:alt: screenshot of Enhance
@@ -140,10 +139,7 @@ Given the updated intent, additional actions (Enhance and Filter) are generated.
- {MedianEarnings, **Region=Southeast**}
- {MedianEarnings, **Region=Great Lakes**}.

.. image:: https://github.com/lux-org/lux-resources/blob/master/doc_img/overview-5.png
.. image:: https://github.com/lux-org/lux-resources/blob/master/doc_img/overview-5.png?raw=true
:width: 700
:align: center
:alt: screenshot of Filter


.. Lux is built on the principle that users should always be able to visualize and explore anything they specify, without having to think about how the visualization should look like.