tex labels in MultiIndex columns #215
Merged
Changes from 61 commits
72 commits
1f3d6c0 Made the default index be a set of 1s (williamjameshandley)
4d203ca Tested out the new weighted pandas functionality (williamjameshandley)
26a0aae Correction for 3.6. concat is preferred over append (which is depreca… (williamjameshandley)
082f554 Merge branch 'master' into default_index (lukashergt)
8256244 Added tests to check status of weighted series (williamjameshandley)
4f20ab7 Stuck with transposes (williamjameshandley)
04c73e4 Merge branch 'master' into default_index (williamjameshandley)
1d05949 Reverted anesthetic samples to master (williamjameshandley)
2b4efcf First draft (williamjameshandley)
db1a183 All tests passing (williamjameshandley)
54cee67 Merge branch 'master' into default_index (williamjameshandley)
8b2a3e8 Tidied up docstrings and hidden functions (williamjameshandley)
96fc8d5 Added formatting code for fixed with column labels (williamjameshandley)
63cea7b Removed annotations for 3.6 compatibility (williamjameshandley)
7044548 Fix for python 3.6 (williamjameshandley)
c8fd855 Merge branch 'master' into tex (lukashergt)
6b81c3b started merging (williamjameshandley)
7780cc8 Majority of tests now passing (williamjameshandley)
cae0657 Fixed apart from tex deep copying (williamjameshandley)
64fbcd0 Merge branch 'master' into default_index (williamjameshandley)
2caa19d Removed A.tex is B.tex tests, as these now fail in pandas (williamjameshandley)
a1338aa Reorganised to improve diff, and allow set_weights to unweight with None (williamjameshandley)
fa95dfd Merge branch 'default_index' into tex (williamjameshandley)
988faca Fixed bug in getdist test (williamjameshandley)
ba0eff5 Updated to test for unweighted correlation behaviour (williamjameshandley)
0b69992 Increased coverage (williamjameshandley)
4640519 Bringing up to coverage (williamjameshandley)
469b6db Upgrades to corrwith (williamjameshandley)
634a06f lint fix (williamjameshandley)
a18776f Increasing coverage (williamjameshandley)
9dd3756 First idea (williamjameshandley)
c152bc8 Merge branch 'default_index' into tex (williamjameshandley)
41f3dff Attempting a copy with a weighted labelled frame (williamjameshandley)
56c853d Merge branch 'tex' of github.com:williamjameshandley/anesthetic into tex (williamjameshandley)
15cfe4f Trying to squash importance and merging bug (williamjameshandley)
897ae49 Merge branch 'master' into default_index (williamjameshandley)
c4faabf Removed reordering of index (williamjameshandley)
a8830da Added tests of reordering of indices (williamjameshandley)
bbe259c Added some tests to check ordering (williamjameshandley)
2625ce8 Issues with assignment (williamjameshandley)
3998192 Merge branch 'master' into tex (williamjameshandley)
8c7c0ae Removed incomplete merge (williamjameshandley)
8ffaedd Merge branch 'default_index' into tex (williamjameshandley)
82b29fb Removed label structures (williamjameshandley)
44936f0 Merge branch 'master' into tex (williamjameshandley)
618a56c LabelledSeries now tested (williamjameshandley)
26da126 Trying a different strategy (williamjameshandley)
65ca252 More robust setup. (williamjameshandley)
5ecc6f9 Removed merge file (williamjameshandley)
4e0eee9 First draft of new setup (williamjameshandley)
4d80e76 Added column tests for multiindex labelled frame (williamjameshandley)
0068918 Most tests passing (williamjameshandley)
db3ba6a Minor corrections (williamjameshandley)
f90d270 increasing coverage (williamjameshandley)
d05a43a lint corrections (williamjameshandley)
95b65a6 Added axis functionality (williamjameshandley)
54c9239 Added better labels keyword (williamjameshandley)
3519fc2 set_label functionality now available (williamjameshandley)
847a503 Merge branch 'master' into tex (williamjameshandley)
1b463c3 Updated for when getdist is not installed (williamjameshandley)
2318671 Fixes for python 3.6 (williamjameshandley)
d5d19f6 Merge branch 'master' into tex (williamjameshandley)
3158dd3 Updated test post merge (williamjameshandley)
3dabdb8 Added tests for uncovered code (williamjameshandley)
05c75f5 Made a new WeightedLabelled class (williamjameshandley)
ae15fe7 Improved passing on of labels to slices (williamjameshandley)
3fe2472 Merge branch 'master' into tex (williamjameshandley)
0124c98 Merge branch 'master' into tex (lukashergt)
3273cae Merge branch 'master' into tex (williamjameshandley)
442e3fd Merge branch 'tex' of github.com:williamjameshandley/anesthetic into tex (williamjameshandley)
50ed374 Changed axis=0 to axis=1 for labelled portion of LabelledDataFrame (williamjameshandley)
0bb2275 Removed all superfluous axis=1 specifications (williamjameshandley)
@@ -0,0 +1,55 @@
# flake8: noqa
from pandas.io.formats.format import (
    DataFrameFormatter as DataFrameFormatter,
    _make_fixed_width, is_numeric_dtype
)
from pandas import MultiIndex


class _DataFrameFormatter(DataFrameFormatter):

    def _get_formatted_column_labels(self, frame):
        try:
            from pandas.core.indexes.multi import sparsify_labels
        except ImportError:
            sparsify_labels = lambda x, *args: x

        columns = frame.columns

        if isinstance(columns, MultiIndex):
            fmt_columns = columns.format(sparsify=False, adjoin=False)
            fmt_columns = list(zip(*fmt_columns))
            dtypes = self.frame.dtypes._values

            # if we have a Float level, they don't use leading space at all
            restrict_formatting = any(level.is_floating for level in columns.levels)
            need_leadsp = dict(zip(fmt_columns, map(is_numeric_dtype, dtypes)))

            def space_format(x, y):
                if (
                    y not in self.formatters
                    and need_leadsp[x]
                    and not restrict_formatting
                ):
                    return " " + y
                return y

            str_columns = list(
                zip(*([space_format(x, y) for y in x] for x in fmt_columns))
            )
            if self.sparsify and len(str_columns):
                str_columns = sparsify_labels(str_columns)

            str_columns = [list(x) for x in zip(*str_columns)]
            str_columns = [_make_fixed_width(x) for x in str_columns]
        else:
            fmt_columns = columns.format()
            dtypes = self.frame.dtypes
            need_leadsp = dict(zip(fmt_columns, map(is_numeric_dtype, dtypes)))
            str_columns = [
                [" " + x if not self._get_formatter(i) and need_leadsp[x] else x]
                for i, x in enumerate(fmt_columns)
            ]
            str_columns = [_make_fixed_width(x) for x in str_columns]
        # self.str_columns = str_columns
        return str_columns
@@ -0,0 +1,216 @@
"""Pandas DataFrame and Series with labelled columns."""
from pandas import Series, DataFrame, MultiIndex
from pandas.core.indexing import (_LocIndexer as _LocIndexer_,
                                  _AtIndexer as _AtIndexer_)
import numpy as np
from functools import cmp_to_key


def ac(funcs, *args):
    """Accessor function helper.

    Given a list of callables `funcs`, and their arguments `*args`, evaluate
    each of these, catching exceptions, and then sort results by their
    dimensionality, smallest first. Return the non-exceptional result with the
    smallest dimensionality.
    """
    results = []
    errors = []
    for f in funcs:
        try:
            results.append(f(*args))
        except Exception as e:
            errors.append(e)

    def cmp(x, y):
        if x.ndim > y.ndim:
            return 1
        elif x.ndim < y.ndim:
            return -1
        else:
            x_levels = 0
            y_levels = 0
            if x.ndim > 0:
                x_levels += x.index.nlevels
                y_levels += y.index.nlevels
            if x.ndim > 1:
                x_levels += x.columns.nlevels
                y_levels += y.columns.nlevels

            if x_levels < y_levels:
                return 1
            elif x_levels > y_levels:
                return -1
            else:
                return 0

    results.sort(key=cmp_to_key(cmp))

    for s in results:
        if s is not None:
            return s
    raise errors[-1]


class _LocIndexer(_LocIndexer_):
    def __getitem__(self, key):
        return ac([_LocIndexer_("loc", self.obj.drop_labels(i)).__getitem__
                   for i in self.obj._all_axes()] + [super().__getitem__], key)


class _AtIndexer(_AtIndexer_):
    def __getitem__(self, key):
        return ac([_AtIndexer_("at", self.obj.drop_labels(i)).__getitem__
                   for i in self.obj._all_axes()] + [super().__getitem__], key)


class _LabelledObject(object):
    """Common methods for LabelledSeries and LabelledDataFrame."""

    def __init__(self, *args, **kwargs):
        self._labels = ("labels", "labels")
        labels = kwargs.pop(self._labels[0], None)
        super().__init__(*args, **kwargs)
        if labels is not None:
            self.set_labels(labels, inplace=True)

    def islabelled(self, axis=0):
        """Determine if labels are actually present."""
        return (self._labels[axis] is not None
                and self._labels[axis] in self._get_axis(axis).names)

    def get_labels(self, axis=0):
        """Retrieve labels from an axis."""
        if self.islabelled(axis):
            return self._get_axis(axis).get_level_values(
                    self._labels[axis]).to_numpy()
        else:
            return None

    def get_labels_map(self, axis=0):
        """Retrieve mapping from paramnames to labels from an axis."""
        index = self._get_axis(axis)
        if self.islabelled(axis):
            return index.to_frame().droplevel('labels')['labels']
        else:
            return Series('', index=index)

    def get_label(self, param, axis=0):
        """Retrieve mapping from paramnames to labels from an axis."""
        return self.get_labels_map(axis)[param]

    def set_label(self, param, value, axis=0, inplace=False):
        labels = self.get_labels_map(axis)
        labels[param] = value
        return self.set_labels(labels, axis=axis, inplace=inplace)

    def drop_labels(self, axis=0):
        axes = np.atleast_1d(axis)
        result = self
        for axis in axes:
            if self.islabelled(axis):
                result = result.droplevel(self._labels[axis], axis)
        return result

    def _all_axes(self):
        if isinstance(self, LabelledSeries):
            return [0]
        else:
            return [0, 1, [0, 1]]

    @property
    def loc(self):
        return _LocIndexer("loc", self)

    @property
    def at(self):
        return _AtIndexer("at", self)

    def xs(self, key, axis=0, level=None, drop_level=True):
        return ac([super(_LabelledObject, self.drop_labels(i)).xs
                   for i in self._all_axes()] + [super().xs],
                  key, axis, level, drop_level)

    def __getitem__(self, key):
        return ac([super(_LabelledObject, self.drop_labels(i)).__getitem__
                   for i in self._all_axes()] + [super().__getitem__], key)

    def __setitem__(self, key, val):
        super().__setitem__(key, val)

    def set_labels(self, labels, axis=0, inplace=False, level=None):
        """Set labels along an axis."""
        if inplace:
            result = self
        else:
            result = self.copy()

        if labels is None:
            if result.islabelled(axis=axis):
                result = result.drop_labels(axis)
        else:
            names = [n for n in result._get_axis(axis).names
                     if n != self._labels[axis]]
            index = [result._get_axis(axis).get_level_values(n) for n in names]
            if level is None:
                if result.islabelled(axis):
                    level = result._get_axis(axis
                                             ).names.index(self._labels[axis])
                else:
                    level = len(index)
            index.insert(level, labels)
            names.insert(level, self._labels[axis])

            index = MultiIndex.from_arrays(index, names=names)
            result.set_axis(index, axis=axis, inplace=True)

        if inplace:
            self._update_inplace(result)
        else:
            return result.__finalize__(self, "set_labels")

    def reset_index(self, level=None, drop=False, inplace=False,
                    *args, **kwargs):
        """Reset the index, retaining labels."""
        labels = self.get_labels()
        answer = super().reset_index(level=level, drop=drop,
                                     inplace=False, *args, **kwargs)
        answer.set_labels(labels, inplace=True)
        if inplace:
            self._update_inplace(answer)
        else:
            return answer.__finalize__(self, "reset_index")


class LabelledSeries(_LabelledObject, Series):
    """Labelled version of pandas.Series."""

    _metadata = Series._metadata + ['_labels']

    @property
    def _constructor(self):
        return LabelledSeries

    @property
    def _constructor_expanddim(self):
        return LabelledDataFrame


class LabelledDataFrame(_LabelledObject, DataFrame):
    """Labelled version of pandas.DataFrame."""

    _metadata = DataFrame._metadata + ['_labels']

    @property
    def _constructor_sliced(self):
        return LabelledSeries

    @property
    def _constructor(self):
        return LabelledDataFrame

    def transpose(self, copy=False):
        """Transpose."""
        result = super().transpose(copy=copy)
        result._labels = (result._labels[1], result._labels[0])
        return result
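The core idea in the file above, storing tex labels as an extra MultiIndex level named "labels" and dropping that level again for ordinary access, can be tried with plain pandas. The following is a minimal sketch under stated assumptions, not the PR's implementation: `attach_labels` and `drop_labels` are hypothetical free functions that only handle single-level columns, whereas the PR implements this as methods on subclasses and for either axis.

```python
import pandas as pd


def attach_labels(frame, labels, name="labels"):
    """Return a copy of `frame` whose columns carry `labels` as an extra level."""
    result = frame.copy()
    result.columns = pd.MultiIndex.from_arrays(
        [frame.columns, labels], names=[frame.columns.name, name])
    return result


def drop_labels(frame, name="labels"):
    """Return `frame` without the labels level, or unchanged if it has none."""
    if name in frame.columns.names:
        return frame.droplevel(name, axis=1)
    return frame


samples = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]], columns=["x0", "x1"])
labelled = attach_labels(samples, ["$x_0$", "$x_1$"])

# Access with the full (name, label) key ...
by_full_key = labelled[("x0", "$x_0$")]
# ... or with the parameter name alone once the labels level is dropped
by_name = drop_labels(labelled)["x0"]
```

Both lookups return the same column of data. The `ac` helper in the diff appears to automate this choice by trying the access with and without the labels level and keeping the lowest-dimensional result.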
get_labels_map or get_labels_dict?
In the latest upgrade, I've made the 'map' be a pandas dataframe (which is easy to generate). Still not happy with the nomenclature though.
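For readers following the naming discussion, here is a small sketch of what such a "map" from parameter names to labels looks like when built with plain pandas, mirroring the `index.to_frame().droplevel('labels')['labels']` expression in the diff. The free function `labels_map` is a hypothetical stand-in for the method, and it returns a Series here, as the diff shown above does.

```python
import pandas as pd


def labels_map(index, name="labels"):
    """Map parameter names to labels for a (possibly labelled) index."""
    if name in index.names:
        # The frame has one column per level; dropping the labels level from
        # its index leaves the parameter names as keys.
        return index.to_frame().droplevel(name)[name]
    # Unlabelled index: every parameter maps to an empty label
    return pd.Series("", index=index)


columns = pd.MultiIndex.from_arrays(
    [["x0", "x1"], ["$x_0$", "$x_1$"]], names=[None, "labels"])

mapping = labels_map(columns)
unlabelled = labels_map(pd.Index(["x0", "x1"]))
```

Because the result is indexed by parameter name, `mapping["x0"]` behaves like a dictionary lookup while still supporting pandas operations such as assignment by key.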