ENH: Standard Error of the Mean (sem) aggregation method #6897
Comments
Does statsmodels do this?
|
Not as far as I can find. And I don't think it really belongs in statsmodels. In my opinion it is a pretty basic data wrangling task, like getting a mean or standard deviation, not the more advanced statistical modeling provided by statsmodels. |
can u point to the scipy method? |
http://docs.scipy.org/doc/scipy-0.13.0/reference/generated/scipy.stats.sem.html
@toddrjen What do you mean by an optimized method? And by the way, scipy.stats.sem is not that 'unoptimized'. In fact, it is even faster, since it does not do, e.g., the extra NaN-checking that pandas does:
But of course, the question still remains, do we provide a shortcut to this functionality in the form of a |
would be code-bloat IMHO; closing. thanks for the suggestion. if you disagree, pls comment. |
@jreback i don't think this is code bloat relative to the alternative: You can't really use
Okay, so let's try it with
That's hardly what I would expect here, and masked arrays are almost as fun as recarrays. I'm +1 on reopening this. Here's what it would take to get the desired result from
|
no, but isn't this just `s.std()/np.sqrt(len(s))`? and even that's 'arbitrary' in my book. not an issue with the code-bloat per se, but the definition |
agreed. that's really simple. i was just making a point about the nan handling, you can't just do |
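To make the NaN point concrete, here is a minimal sketch (the sample values are made up for illustration, not taken from this thread). It assumes the usual definition of SEM as the sample standard deviation divided by the square root of the number of valid observations. `Series.std()` skips NaN by default, while `len(s)` still counts the missing row, so the one-liner above divides by the wrong `n`:

```python
import numpy as np
import pandas as pd

# Illustrative data with one missing value (hypothetical, not from the thread)
s = pd.Series([1.0, 2.0, np.nan, 4.0])

# std() ignores the NaN, but len() counts all 4 rows, so n is too large
naive = s.std() / np.sqrt(len(s))

# count() returns the number of non-NaN values (3), matching what std() used
nan_aware = s.std() / np.sqrt(s.count())

print(naive, nan_aware)
```

Using `s.count()` instead of `len(s)` keeps the denominator consistent with the values that actually went into `std()`.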
not averse to this, but it just seems so simple that a user should do this (as I might want a different definition); that said if this is pretty 'standard' then would be ok |
every science institution i've ever worked in (just 3 really so not a whole lot of weight there) has used |
ok...will reopen for consideration in 0.15 then |
I have also been at three different institutions, and they also all used SEM. And I have seen it on hundreds of papers, presentations, and posters. |
ok...that's fine then, pls submit a PR! (needs to go in |
Pull request submitted: #7133 |
Pandas has a `df.sem()` method, as well as `series.sem()`.
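For reference, a short sketch of how those methods can be called on a Series, a DataFrame, and a groupby. The column names and values are hypothetical, and the manual comparison assumes the default `ddof=1`:

```python
import numpy as np
import pandas as pd

# Hypothetical example data; "group" and "value" are illustrative names
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "value": [1.0, 2.0, 4.0, 3.0, np.nan, 5.0],
})

series_sem = df["value"].sem()                  # SEM of one column
frame_sem = df[["value"]].sem()                 # SEM per column, returned as a Series
group_sem = df.groupby("group")["value"].sem()  # SEM per group

# Manual equivalent for the Series case: std over sqrt of the non-NaN count
manual = df["value"].std() / np.sqrt(df["value"].count())

print(series_sem, manual)
print(group_sem)
```

NaN values are skipped by default, so the missing entry in group "b" does not inflate the count.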
A very common operation when trying to work with data is to find out the error range for the data. In scientific research, including error ranges is required.
There are two main ways to do this: standard deviation and standard error of the mean. Pandas has an optimized std aggregation method for both dataframe and groupby. However, it does not have an optimized standard error method, meaning users who want to compute error ranges have to rely on the unoptimized scipy method.
Since computing error ranges is such a common operation, I think it would be very useful if there was an optimized `sem` method like there is for `std`.