Import weighted stats and moments from StatsBase to Statistics
This includes methods for mean, quantile, median, var, std, cov and cor,
plus new functions skewness and kurtosis, and weight types.
Code is copied from StatsBase with some cleanup where needed, in particular
for dispatch, to move from `@nloops`/`@nrefs` to cartesian indexing and to
be closer to the mapreducedim code.
Weights are now passed via a keyword argument rather than by dispatching on
AbstractWeights, so that any array can be used as weights wherever all weights
types give the same result.
nalimilan committed Mar 18, 2019
1 parent b93fd23 commit 4da493e
Showing 7 changed files with 1,684 additions and 66 deletions.
76 changes: 68 additions & 8 deletions stdlib/Statistics/docs/src/index.md
@@ -4,22 +4,82 @@
DocTestSetup = :(using Statistics)
```

The Statistics module contains basic statistics functionality.
The Statistics module contains basic statistics functionality: mean, median, quantiles,
standard deviation, variance, skewness, kurtosis, correlation and covariance.
Statistics can be weighted, and several weights types are distinguished to apply appropriate
corrections where necessary.

## Mean, median and quantiles

```@docs
Statistics.mean
Statistics.mean!
Statistics.median
Statistics.median!
Statistics.middle
Statistics.quantile
Statistics.quantile!
```

## Moments

```@docs
Statistics.std
Statistics.stdm
Statistics.var
Statistics.varm
Statistics.skewness
Statistics.kurtosis
Statistics.moment
```

## Correlation and covariance

```@docs
Statistics.cor
Statistics.cov
Statistics.mean!
Statistics.mean
Statistics.median!
Statistics.median
Statistics.middle
Statistics.quantile!
Statistics.quantile
```

## Weights types

Four statistical weights types are provided which inherit from the `AbstractWeights` type:

- `Weights` is a generic type for arbitrary weights. Using this type will trigger an error
with functions which rely on assumptions about a particular definition of weights.
- `AnalyticWeights` describe the relative importance for each observation.
These weights may also be referred to as reliability weights, precision weights
or inverse variance weights. These are typically used when the observations
are aggregate values (e.g. averages) with differing variances.
- `FrequencyWeights` describe the number of times (or frequency) each observation
was observed. These weights may also be referred to as case weights or repeat weights.
- `ProbabilityWeights` represent the inverse of the sampling probability
for each observation, providing a correction mechanism for under- or over-sampling
certain population groups. These weights may also be referred to as sampling weights.
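
As a minimal sketch of the frequency-weights interpretation, the snippet below computes a weighted mean by hand and compares it with the plain mean of the data in which each observation is repeated as many times as its weight says. It uses only Base Julia; the helper `weighted_mean` is a hypothetical name introduced for illustration, not a function from this commit.

```julia
# Hypothetical helper: weighted mean written from scratch, no weights API assumed.
weighted_mean(x, w) = sum(x .* w) / sum(w)

x = [2.0, 4.0, 7.0]
w = [3, 1, 2]   # frequency weights: 2.0 seen three times, 4.0 once, 7.0 twice

# Expand the data by repeating each observation w[i] times.
expanded = reduce(vcat, [fill(xi, wi) for (xi, wi) in zip(x, w)])

m_weighted = weighted_mean(x, w)
m_expanded = sum(expanded) / length(expanded)

# Both routes give the same mean.
@assert m_weighted ≈ m_expanded
```

For the mean, all four weights types lead to the same formula; the types only start to matter once a bias correction is involved.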

The choice of weights impacts how bias is corrected in several methods.
See the [`var`](@ref), [`std`](@ref), [`cov`](@ref) and [`quantile`](@ref)
docstrings for more details.
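
To illustrate how the weights type changes the correction, here is a sketch of the corrected weighted variance using the correction factors documented for StatsBase's `var`. Since the commit message says the code is copied from StatsBase, these factors are assumed to carry over, but the docstrings remain the authoritative source. All function names below are hypothetical and written in Base Julia only.

```julia
# Correction factors as documented for StatsBase's weighted `var`;
# assumed (not verified here) to apply unchanged in this commit.
analytic_factor(w) = 1 / (sum(w) - sum(abs2, w) / sum(w))
frequency_factor(w) = 1 / (sum(w) - 1)
function probability_factor(w)
    n = count(!iszero, w)   # number of nonzero weights
    return n / ((n - 1) * sum(w))
end

# Weighted sum of squared deviations from the weighted mean, scaled by `factor(w)`.
function weighted_var(x, w, factor)
    m = sum(x .* w) / sum(w)
    return factor(w) * sum(w .* abs2.(x .- m))
end

x = [2.0, 4.0, 7.0]
w = [3, 1, 2]

v_analytic    = weighted_var(x, w, analytic_factor)
v_frequency   = weighted_var(x, w, frequency_factor)
v_probability = weighted_var(x, w, probability_factor)

# With frequency weights, the result matches the ordinary sample variance
# of the data with each observation repeated w[i] times.
expanded = [2.0, 2.0, 2.0, 4.0, 7.0, 7.0]
m = sum(expanded) / length(expanded)
v_expanded = sum(abs2, expanded .- m) / (length(expanded) - 1)
@assert v_frequency ≈ v_expanded
```

The same data and the same numeric weights give three different variances, which is why the generic `Weights` type cannot pick a correction on its own.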

Short-hand constructors `weights`, `aweights`, `fweights` and `pweights`
are provided for convenience.

!!! note
    - The weight vector is a light-weight wrapper of the input vector.
      The input vector is NOT copied during construction.
    - The weight vector maintains the sum of weights, which is computed upon construction.
      If the value of the sum is pre-computed, one can supply it as the second argument
      to the constructor and save the time of computing the sum again.
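
The two points in the note can be pictured with a small stand-in type. `SketchWeights` below is a hypothetical, simplified type written for illustration only; it is not the definition used by the package, and it merely mimics the "wrap without copying, cache the sum" behaviour described above.

```julia
# Hypothetical type for illustration only; not the package's definition.
struct SketchWeights{T<:Real, V<:AbstractVector{T}}
    values::V   # reference to the caller's vector, not a copy
    sum::T      # sum of the weights, stored once at construction
end

# The one-argument constructor computes the sum; the default two-argument
# constructor lets a caller pass a precomputed sum instead.
SketchWeights(v::AbstractVector{<:Real}) = SketchWeights(v, sum(v))

raw = [0.5, 1.5, 2.0]
w = SketchWeights(raw)

aliases_input = w.values === raw   # the wrapper holds the very same array
w2 = SketchWeights(raw, 4.0)       # reuses a sum computed elsewhere
```

One consequence of this design is that a supplied sum is trusted as given, and in this sketch mutating `raw` after construction would leave the stored sum stale.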

```@docs
Statistics.AbstractWeights
Statistics.Weights
Statistics.AnalyticWeights
Statistics.FrequencyWeights
Statistics.ProbabilityWeights
Statistics.weights
Statistics.aweights
Statistics.fweights
Statistics.pweights
```

```@meta