PairGrid and missing values? #407

Closed
jseabold opened this issue Dec 24, 2014 · 7 comments

Comments

@jseabold
Contributor

Should the dropna argument work for this and do pairwise dropping?

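import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# pd.util.testing.makeMissingDataframe() is not available in current pandas;
# build a comparable frame with scattered NaNs instead.
rng = np.random.default_rng(0)
dta = pd.DataFrame(rng.normal(size=(30, 4)), columns=list("ABCD"))
dta = dta.mask(rng.random(dta.shape) < 0.1)  # roughly 10% missing values
g = sns.PairGrid(dta)
g.map_diag(plt.hist)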
@jseabold
Contributor Author

Workarounds for histogram/scatter.

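import matplotlib.pyplot as plt
import pandas as pd


def na_hist(x, *args, **kwargs):
    # Delegate to pandas, whose hist plot skips NaN values.
    x.plot(kind='hist', *args, **kwargs)


def na_scatter(x, y, *args, **kwargs):
    # Pairwise dropping: keep only the rows where both x and y are present.
    dta = pd.concat([x, y], axis=1).dropna()
    plt.scatter(dta.iloc[:, 0], dta.iloc[:, 1], *args, **kwargs)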

These seem to be missing some things, though, e.g., the x-axis labels and ticks.

@DizietAsahi

This still does not work. Am I misunderstanding how dropna works?

> from matplotlib import pyplot as plt
> import numpy as np
> import pandas as pd
> import seaborn as sns
> sns.__version__
'0.7.dev'
> # generate an example DataFrame
> a = pd.DataFrame(data={
    'a': np.random.normal(size=(100,)),
    'b': np.random.lognormal(size=(100,)),
    'c': np.random.exponential(size=(100,))})
> sns.pairplot(a)  # this works as expected
(...)
> b = a.copy()
> b.iloc[5, 2] = np.nan  # replace one value in column 'c' with a NaN
> sns.pairplot(b)  # this fails with
                   # "AttributeError: max must be larger than min in range parameter."
                   # raised in histogram(a, bins, range, normed, weights, density)
> sns.pairplot(b, dropna=True)  # same error as above
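The traceback points at the histogram call, where a NaN in the column leaves the automatic bin range undefined. A minimal NumPy-only sketch of that failure and of the fix (drop the NaNs before binning) is below; the toy array is made up, and the exact exception type and message depend on the NumPy version (the traceback above shows an AttributeError from an older release).

```python
import numpy as np

# Toy data with one missing value (made up for illustration).
values = np.array([0.5, 1.2, np.nan, 2.3, 0.8])

# The automatic bin range comes from min/max, which are both NaN here.
failed = False
try:
    np.histogram(values, bins=3)
except (ValueError, AttributeError) as err:
    # Recent NumPy raises ValueError; the traceback above shows AttributeError.
    failed = True
    print("histogram failed:", err)

# Dropping the NaNs first leaves a finite range and a valid histogram.
clean = values[~np.isnan(values)]
counts, edges = np.histogram(clean, bins=3)
print(counts)  # [2 1 1]
```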

@wehlutyk

wehlutyk commented Apr 1, 2016

Any news on this or where it's going? 0.7.0 still has the same behaviour.

@Alsanis

Alsanis commented May 5, 2016

Has there been any progress on this problem?

@cbrnr

cbrnr commented Jun 22, 2016

In fact, the dropna argument seems to be misleading (or at least I haven't really figured out what it is supposed to do). IMO the expected behavior should be equivalent to the following line in the example by @DizietAsahi above:

sns.pairplot(b.dropna())
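To make the effect of this listwise (complete-case) deletion concrete, here is a small pandas sketch on a made-up toy frame: DataFrame.dropna() removes a row if any column is missing, so even a column with no NaNs loses observations.

```python
import numpy as np
import pandas as pd

# Made-up toy frame: one NaN in column 'a', one in column 'c', none in 'b'.
df = pd.DataFrame({
    "a": [1.0, 2.0, np.nan, 4.0, 5.0],
    "b": [2.0, 1.0, 3.0, 5.0, 4.0],
    "c": [1.5, np.nan, 2.5, 3.5, 4.5],
})

# Listwise deletion: drop every row that has a NaN in any column.
complete = df.dropna()

# Non-missing values per column before and after dropping.
before = df.notna().sum()
after = complete.notna().sum()
print(pd.DataFrame({"before": before, "after": after}))
```

Column 'b' has five valid values but keeps only three, because the rows where 'a' or 'c' is missing are discarded as well.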

@DizietAsahi

The problem with using sns.pairplot(b.dropna()) is that b.dropna() drops the whole row, even though some pairs of values in it could still be plotted. IMHO, NaNs should only be dropped at the level of each pair of values.
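Pairwise deletion would instead drop NaNs separately for each pair of columns. A minimal sketch, assuming a hypothetical helper pairwise_complete (not part of seaborn or pandas) and a made-up toy frame, counting how many points each off-diagonal panel would keep:

```python
import itertools

import numpy as np
import pandas as pd

# Made-up toy frame: one NaN in column 'a', one in column 'c'.
df = pd.DataFrame({
    "a": [1.0, 2.0, np.nan, 4.0, 5.0],
    "b": [2.0, 1.0, 3.0, 5.0, 4.0],
    "c": [1.5, np.nan, 2.5, 3.5, 4.5],
})


def pairwise_complete(frame, x, y):
    """Hypothetical helper: the rows where both x and y are non-missing."""
    return frame[[x, y]].dropna()


# Points kept in each off-diagonal panel under pairwise deletion.
n_points = {
    (x, y): len(pairwise_complete(df, x, y))
    for x, y in itertools.combinations(df.columns, 2)
}
print(n_points)  # {('a', 'b'): 4, ('a', 'c'): 3, ('b', 'c'): 4}
```

With df.dropna(), every panel would be limited to the 3 fully complete rows; pairwise deletion keeps a fourth point in the (a, b) and (b, c) panels.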

@cbrnr

cbrnr commented Jun 22, 2016

I agree. It might even be useful to specify the dropping strategy through the dropna argument (as R does, e.g., complete cases, pairwise, ...). In any case, the current dropna implementation does not work at all with the default setting of histograms on the diagonal (it does work with KDEs, though).
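pandas already provides a concrete example of the two strategies: DataFrame.corr excludes missing values pair by pair (comparable to R's cor(..., use = "pairwise.complete.obs")), while calling .corr() after .dropna() uses complete cases only (comparable to use = "complete.obs"). A small sketch on made-up data showing that the two can give different answers; it illustrates the strategies, not any seaborn option.

```python
import numpy as np
import pandas as pd

# Made-up toy frame: one NaN in column 'a', one in column 'c'.
df = pd.DataFrame({
    "a": [1.0, 2.0, np.nan, 4.0, 5.0],
    "b": [2.0, 1.0, 3.0, 5.0, 4.0],
    "c": [1.5, np.nan, 2.5, 3.5, 4.5],
})

# Pairwise: each correlation uses the rows where that pair is complete.
pairwise = df.corr()

# Complete cases: every correlation uses only rows with no NaN at all.
complete = df.dropna().corr()

print("pairwise a-b:", pairwise.loc["a", "b"])
print("complete a-b:", complete.loc["a", "b"])
```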


6 participants