Merge pull request #658 from brk00/docs_corrections
Other docs added and corrections made to sphinxdocs
johnmyleswhite committed Jul 25, 2014
2 parents f052f52 + 5d7e0e9 commit a845fbd
Showing 14 changed files with 1,772 additions and 188 deletions.
781 changes: 781 additions & 0 deletions sphinxdoc/other/design_details.rst

Large diffs are not rendered by default.

589 changes: 589 additions & 0 deletions sphinxdoc/other/function_reference_guide.rst

Large diffs are not rendered by default.

214 changes: 214 additions & 0 deletions sphinxdoc/other/specification.rst
@@ -0,0 +1,214 @@
********************
Formal Specification
********************

DataFrames Data Structures
==========================

* Type Definitions and Type Hierarchy
* Constructors
* Indexing (Refs / Assigns)
* Operators

* Unary Operators:

* ``+``, ``-``, ``!``, ``'``

* Elementary Unary Functions

* ``abs``, ...

* Binary Operators:

* Arithmetic Operators:

* Scalar Arithmetic: ``+``, ``-``, ``*``, ``/``
* Array Arithmetic: ``+``, ``.+``, ``-``, ``.-``, ``.*``, ``./``, ``.^``

* Bit Operators: ``&``, ``|``
* Comparison Operators:

* Scalar Comparisons: ``==``, ``!=``, ``<``, ``<=``, ``>``, ``>=``
* Array Comparisons: ``.==``, ``.!=``, ``.<``, ``.<=``, ``.>``, ``.>=``

* Container Operations
* Broadcasting / Recycling
* Type Promotion and Conversion
* String Representations
* IO
* Copying
* Properties

* size
* length
* ndims
* eltype

* Predicates
* Handling NA's
* Iteration
* Miscellaneous

The NAtype
==========

Behavior under Unary Operators
------------------------------

The unary operators

Behavior under Arithmetic Operators
-----------------------------------

Constructors
============

* NA's

* Constructor: ``NAtype()``
* Const alias: ``NA``

* DataVector's

* From (Vector, BitVector): ``DataArray([1, 2, 3], falses(3))``
* From (Vector, Vector{Bool}): ``DataArray([1, 2, 3], [false, false, false])``
* From (Vector): ``DataArray([1, 2, 3])``
* From (BitVector, BitVector): ``DataArray(trues(3), falses(3))``
* From (BitVector): ``DataArray(trues(3))``
* From (Range1): ``DataArray(1:3)``
* From (DataVector): ``DataArray(DataArray([1, 2, 3]))``
* From (Type, Int): ``DataArray(Int, 3)``
* From (Int): ``DataArray(3)`` (Type defaults to Float64)
* From (): ``DataArray()`` (Type defaults to Float64, length defaults to 0)
* Initialized with Float64 zeros: ``datazeros(3)``
* Initialized with typed zeros: ``datazeros(Int, 3)``
* Initialized with Float64 ones: ``dataones(3)``
* Initialized with typed ones: ``dataones(Int, 3)``
* Initialized with falses: ``datafalses(3)``
* Initialized with trues: ``datatrues(3)``
* Literal syntax: ``DataVector[1, 2, NA]``

* PooledDataVector's

* From (Vector, BitVector): ``PooledDataArray([1, 2, 3], falses(3))``
* From (Vector, Vector{Bool}): ``PooledDataArray([1, 2, 3], [false, false, false])``
* From (Vector): ``PooledDataArray([1, 2, 3])``
* From (BitVector, BitVector): ``PooledDataArray(trues(3), falses(3))``
* From (BitVector, Vector{Bool}): ``PooledDataArray(trues(3), [false, false, false])``
* From (BitVector): ``PooledDataArray(trues(3))``
* From (Range1): ``PooledDataArray(1:3)``
* From (DataVector): ``PooledDataArray(DataArray([1, 2, 3]))``
* From (Type, Int): ``PooledDataArray(Int, 3)``
* From (Int): ``PooledDataArray(3)`` (Type defaults to Float64)
* From (): ``PooledDataArray()`` (Type defaults to Float64, length defaults to 0)
* Initialized with Float64 zeros: ``pdatazeros(3)``
* Initialized with typed zeros: ``pdatazeros(Int, 3)``
* Initialized with Float64 ones: ``pdataones(3)``
* Initialized with typed ones: ``pdataones(Int, 3)``
* Initialized with falses: ``pdatafalses(3)``
* Initialized with trues: ``pdatatrues(3)``
* Literal syntax: ``PooledDataVector[1, 2, NA]``

* DataMatrix

* From (Array, BitArray): ``DataMatrix([1 2; 3 4], falses(2, 2))``
* From (Array, Array{Bool}): ``DataMatrix([1 2; 3 4], [false false; false false])``
* From (Array): ``DataMatrix([1 2; 3 4])``
* From (BitArray, BitArray): ``DataMatrix(trues(2, 2), falses(2, 2))``
* From (BitArray): ``DataMatrix(trues(2, 2))``
* From (DataVector...): ``DataMatrix(DataVector[1, NA], DataVector[NA, 2])``
* From (Range1...): ``DataMatrix(1:3, 1:3)``
* From (DataMatrix): ``DataMatrix(DataArray([1 2; 3 4]))``
* From (Type, Int, Int): ``DataMatrix(Int, 2, 2)``
* From (Int, Int): ``DataMatrix(2, 2)`` (Type defaults to Float64)
* From (): ``DataMatrix()`` (Type defaults to Float64, size defaults to (0, 0))
* Initialized with Float64 zeros: ``dmzeros(2, 2)``
* Initialized with typed zeros: ``dmzeros(Int, 2, 2)``
* Initialized with Float64 ones: ``dmones(2, 2)``
* Initialized with typed ones: ``dmones(Int, 2, 2)``
* Initialized with falses: ``dmfalses(2, 2)``
* Initialized with trues: ``dmtrues(2, 2)``
* Initialized identity matrix: ``dmeye(2, 2)``
* Initialized identity matrix: ``dmeye(2)``
* Initialized diagonal matrix: ``dmdiagm([2, 1])``
* Literal syntax: ``DataMatrix[1 2; NA 2]``

* DataFrame

* From (): ``DataFrame()``
* From (Vector{Any}, Index): ``DataFrame({datazeros(3), dataones(3)}, Index(["A", "B"]))``
* From (Vector{Any}): ``DataFrame({datazeros(3), dataones(3)})``
* From (Expr): ``DataFrame(quote A = [1, 2, 3, 4] end)``
* From (Matrix, Vector{String}): ``DataFrame([1 2; 3 4], ["A", "B"])``
* From (Matrix): ``DataFrame([1 2; 3 4])``
* From (Tuple): ``DataFrame(dataones(2), datafalses(2))``
* From (Associative): ???
* From (Vector, Vector, Groupings): ???
* From (Dict of Vectors): ``DataFrame({"A" => [1, 3], "B" => [2, 4]})``
* From (Dict of Vectors, Vector{String}): ``DataFrame({"A" => [1, 3], "B" => [2, 4]}, ["A"])``
* From (Type, Int, Int): ``DataFrame(Int, 2, 2)``
* From (Int, Int): ``DataFrame(2, 2)``
* From (Vector{Types}, Vector{String}, Int): ``DataFrame({Int, Float64}, ["A", "B"], 2)``
* From (Vector{Types}, Int): ``DataFrame({Int, Float64}, 2)``
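
Several of the DataVector constructors listed above pair a vector of values with a Boolean mask that marks which entries are NA. As a rough, language-neutral illustration of that idea, the Python sketch below models such a container. It is not the package's API: the names ``MaskedVector`` and ``zeros``, and the use of ``None`` to stand in for NA, are assumptions made only for this example.

```python
from dataclasses import dataclass
from typing import Any, List, Optional

NA = None  # assumption: None stands in for the NA constant in this sketch


@dataclass
class MaskedVector:
    """Hypothetical model: values plus a parallel is-missing mask."""
    values: List[Any]
    na: Optional[List[bool]] = None

    def __post_init__(self):
        # Mirrors the one-argument constructors: no mask means nothing is missing.
        if self.na is None:
            self.na = [False] * len(self.values)
        if len(self.na) != len(self.values):
            raise ValueError("values and mask must have the same length")

    def __getitem__(self, i):
        # A masked slot reads as NA regardless of the stored value.
        return NA if self.na[i] else self.values[i]

    def __len__(self):
        return len(self.values)


def zeros(n, typ=float):
    """Hypothetical analogue of a typed zero initializer with no missing entries."""
    return MaskedVector([typ(0)] * n)


a = MaskedVector([1, 2, 3], [False, False, False])  # values and explicit mask
b = MaskedVector([1, 2, 3])                         # mask defaults to all False
c = MaskedVector([1, 2, 0], [False, False, True])   # third entry is missing
```

Under these assumptions ``a`` and ``b`` compare equal, and reading ``c[2]`` yields the NA stand-in rather than the stored ``0``.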

Indexing
========

Types on indices::

NA

dv = datazeros(10)

dv[1]

dv[1:2]

dv[:]

dv[[1, 2, 3]]

dv[[false, false, true, false, false]]

dmzeros(10)

Indexers: Int, Range, Colon, Vector{Int}, Vector{Bool}, String, Vector{String}

DataVector's and PooledDataVector's implement:

* Int
* Range
* Colon
* Vector{Int}
* Vector{Bool}

DataMatrix's implement the Cartesian product:

* Int, Int
* Int, Range
* Int, Colon
* Int, Vector{Int}
* Int, Vector{Bool}...
* Vector{Bool}, Int
* Vector{Bool}, Range
* Vector{Bool}, Colon
* Vector{Bool}, Vector{Int}
* Vector{Bool}, Vector{Bool}
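
The list above abbreviates the full set of row and column indexer pairs. A small Python sketch, assuming the five single-axis indexer kinds named for DataVectors are the complete set, enumerates every pairing. The kind names are plain strings used as labels here, not real types.

```python
from itertools import product

# Assumption: these five labels are the full set of single-axis indexer kinds.
KINDS = ["Int", "Range", "Colon", "Vector{Int}", "Vector{Bool}"]

# Cartesian product: every (row indexer, column indexer) combination.
pairs = list(product(KINDS, repeat=2))

for row, col in pairs:
    print(f"{row}, {col}")
```

With five kinds per axis this yields 25 combinations, running from ``Int, Int`` through ``Vector{Bool}, Vector{Bool}``.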

Single Int access?

DataFrame's add two new indexer types:

* String
* Vector{String}

These can only occur as (a) the only indexer or (b) in the second slot of a paired indexer
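
The placement rule for the two name-based indexers can be stated as a small check. The Python sketch below is only a model of the rule as written: ``is_allowed`` is a hypothetical helper, indexer kinds are represented as string labels, and it assumes a single positional indexer is accepted, which the specification still leaves as an open question.

```python
POSITIONAL = {"Int", "Range", "Colon", "Vector{Int}", "Vector{Bool}"}
BY_NAME = {"String", "Vector{String}"}


def is_allowed(*indexers):
    """Return True if the indexer kinds follow the placement rule (hypothetical helper)."""
    if len(indexers) == 1:
        # (a) a name-based indexer may be the only indexer
        # (assumption: a lone positional indexer is also accepted)
        return indexers[0] in (POSITIONAL | BY_NAME)
    if len(indexers) == 2:
        row, col = indexers
        # (b) a name-based indexer may appear only in the second slot
        return row in POSITIONAL and col in (POSITIONAL | BY_NAME)
    return False
```

For example, ``is_allowed("Int", "Vector{String}")`` is true, while ``is_allowed("String", "Int")`` is false because the name-based indexer sits in the first slot.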

Anything that can be getindex()'d can also be setindex!()'d

Where do we allow Expr indexing?
21 changes: 10 additions & 11 deletions sphinxdoc/source/formulas.rst
@@ -4,36 +4,35 @@ The Formula, ModelFrame and ModelMatrix Types
In regression models, we often want to describe the relationship between a
response variable and one or more input variables in terms of main effects
and interactions. To facilitate the specification of a regression model in
terms of the columns of a DataFrame, the DataFrames package provides a
`Formula` type, which is created by the `~` binary operator in Julia::
terms of the columns of a ``DataFrame``, the DataFrames package provides a
``Formula`` type, which is created by the ``~`` binary operator in Julia::

fm = Z ~ X + Y

A `Formula` object can be used to transform a DataFrame into a ModelFrame object::
A ``Formula`` object can be used to transform a ``DataFrame`` into a ``ModelFrame`` object::

df = DataFrame(X = randn(10), Y = randn(10), Z = randn(10))
mf = ModelFrame(Z ~ X + Y, df)

A `ModelFrame` object is just a simple wrapper around a `DataFrame`. For
modeling purposes, one generally wants to construct a `ModelMatrix`, which
constructs a `Matrix{Float64}` that can be used directly to fit a
A ``ModelFrame`` object is just a simple wrapper around a ``DataFrame``. For
modeling purposes, one generally wants to construct a ``ModelMatrix``, which
constructs a ``Matrix{Float64}`` that can be used directly to fit a
statistical model::

mm = ModelMatrix(ModelFrame(Z ~ X + Y, df))

Note that `mm` contains an additional column consisting entirely of `1.0`
Note that ``mm`` contains an additional column consisting entirely of ``1.0``
values. This is used to fit an intercept term in a regression model.

In addition to specifying main effects, it is possible to specify interactions
using the `&` operator inside a `Formula`::
using the ``&`` operator inside a ``Formula``::

mm = ModelMatrix(ModelFrame(Z ~ X + Y + X&Y, df))

If you would like to specify both main effects and an interaction term at once,
use the `*` operator inside a `Formula`::
use the ``*`` operator inside a ``Formula``::

mm = ModelMatrix(ModelFrame(Z ~ X*Y, df))
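
To make the formula operators concrete, the following Python sketch builds the rows such a model matrix would plausibly contain: an intercept column of ``1.0``, the two main effects, and optionally their elementwise product as the interaction term. This is an illustration only, not the package's implementation. The helper names ``model_matrix`` and ``expand_star`` are hypothetical, and the column order is an assumption.

```python
def model_matrix(x, y, interaction=False):
    """Build rows [1.0, x_i, y_i] and, if requested, append x_i * y_i."""
    rows = []
    for xi, yi in zip(x, y):
        row = [1.0, xi, yi]       # intercept column, then main effects
        if interaction:
            row.append(xi * yi)   # interaction term, as in X&Y
        rows.append(row)
    return rows


def expand_star(a, b):
    """Expand a*b into its main effects plus the interaction: a, b, a&b."""
    return [a, b, f"{a}&{b}"]


main_only = model_matrix([2.0, 3.0], [4.0, 5.0])
with_interaction = model_matrix([2.0, 3.0], [4.0, 5.0], interaction=True)
terms = expand_star("X", "Y")
```

In this model, ``Z ~ X*Y`` and ``Z ~ X + Y + X&Y`` describe the same four columns, which is why ``expand_star("X", "Y")`` returns the terms ``X``, ``Y`` and ``X&Y``.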

The construction of model matrices makes it easy to formulate complex
statistical models. These are used to good effect by the
[GLM package](https://github.com/JuliaStats/GLM.jl).
statistical models. These are used to good effect by the `GLM package <https://github.com/JuliaStats/GLM.jl>`_.