A Gaussian Processes package for Julia.
This package is still in the early stages of development. If you have any suggestions to improve the package, or if you've noticed a bug, then please post an [issue] (https://github.com/STOR-i/GaussianProcesses.jl/issues/new) for us and we'll get to it as quickly as we can. Pull requests are also welcome.
Gaussian processes are a family of stochastic processes which provide a flexible nonparametric tool for modelling data. A Gaussian Process places a prior over functions, and can be described as an infinite dimensional generalisation of a multivariate Normal distribution. Moreover, the joint distribution of any finite collection of points is a multivariate Normal. This process can be fully characterised by its mean and covariance functions, where the mean of any point in the process is described by the mean function and the covariance between any two observations is specified by the kernel. Given a set of observed real-valued points over a space, the Gaussian Process is used to make inference on the values at the remaining points in the space.
For an extensive review of Gaussian Processes there is an excellent book [Gaussian Processes for Machine Learning] (http://www.gaussianprocess.org/gpml/chapters/RW.pdf) by Rasmussen and Williams, (2006).
GaussianProcesses requires Julia version 0.3 or above. To install GaussianProcesses run the following command inside a Julia session:
julia> Pkg.add("GaussianProcesses")
Most of the functionality of GaussianProcesses has been documented using the Docile package. From Julia version 0.4 this functionality will form part of the Julia base. To view the documentation the user must install the Lexicon package and load it.
julia> Pkg.add("Lexicon")
julia> using Lexicon
Documentation is accessible in the Julia REPL in help mode. Help mode can be started by typing '?' at the prompt.
julia> ?
help?> GP
[type]
GaussianProcesses.GP
Description
-–––––––––––-
Fits a Gaussian process to a set of training points. The Gaussian process is
defined in terms of its mean and covaiance (kernel) functions, which are
user defined. As a default it is assumed that the observations are noise
free.
Arguments:
-––––––––––-
• `x::Matrix{Float64}`: Training inputs
• `y::Vector{Float64}`: Observations
• `m::Mean` : Mean function
• `k::kernel` : Covariance function
• `logNoise::Float64` : Log of the observation noise. The default is
-1e8, which is equivalent to assuming no observation noise.
Returns:
-––––––––-
• `gp::GP` : A Gaussian process fitted to the training data
Details:
source: (16,"/home/jamie/.julia/v0.3/GaussianProcesses/src/GP.jl")
The first step in modelling with Gaussian Processes is to choose mean functions and kernels which describe the process. GaussianProcesses can be optionally used with a plotting package. Currently the packages [Gadfly] (https://github.com/dcjones/Gadfly.jl) and [Winston] (https://github.com/nolta/Winston.jl) are supported.
using Winston, GaussianProcesses
# Training data
n = 10
x = 2π * rand(n)
y = sin(x) + 0.05*randn(n)
# Select mean and covariance function
mZero = MeanZero() # Zero mean function
kern = SE(0.0,0.0) # Squared exponential kernel with parameters
# log(ℓ) = 0.0, log(σ) = 0.0
Note that the parameters of the kernel are given on the log-scale. This is true
for all strictly positive hyperparameters. Gaussian Processes are represented
by objects of type GP
and constructed from observation data, a mean function and kernel, and optionally the amount of observation noise.
logObsNoise = -1.0 # log standard deviation of observation noise (this is optional)
gp = GP(x,y,mZero,kern, logObsNoise) # Fit the GP
Dim = 1
Number of observations = 10
Mean function:
Type: MeanConst, Params: [0.0]
Kernel:
Type: SEIso, Params: [0.0,0.0]
Input observations =
1x10 Array{Float64,2}:
2.20808 3.62094 4.10836 2.76144 … 5.60978 4.22383 0.289366 5.61989
Output observations = [0.863383,-0.578097,-0.746813,0.339261,0.279411,0.35201,-0.565207,-0.947441,0.317193,-0.68688]
Variance of observation noise = 0.36787944117144233
Marginal Log-Likelihood = -9.056
Plotting is straightforward to apply, but the display will depend on the package loaded at the start of the session (e.g. Winston or Gadfly). It's possible to modify the confidence bands (CI) in the 1D plot, which are set to 95% by default.
plot(gp)
The hyperparameters are optimized using the Optim package. This offers users a range of optimization algorithms which can be applied to estimate the hyperparameters using type II maximum likelihood estimation. Gradients are available for all mean and kernel functions used in the package and therefore it is recommended that the user utilizes gradient based optimization techniques. As a default, the optimize!
function uses the bfgs
solver, however, alternative solvers can be applied (see 2D example below).
optimize!(gp) #Optimise the hyperparameters
plot(gp) #Plot the GP after the hyperparameters have been optimised
This is a simple 2-D regression example.
using Gadfly, GaussianProcesses
#Training data
d, n = 2, 50 # Dimension and number of observations
x = 2π * rand(d, n)
y = vec(sin(x[1,:]).*sin(x[2,:])) + 0.05*rand(n)
For problems of dimension>1 we can use isotropic (Iso) kernels or automatic relevance determination (ARD) kernels. These are implemented automatically by the user based on the choice of hyperparameters. For example, below we use the Matern 5/2 ARD kernel, if we wanted to use the Iso alternative then we would set the kernel as kern=Mat(5/2,0.0,0.0)
.
In this example we use a composite kernel represented as the sum of a Matern 5/2 ARD kernel and a Squared Exponential Iso kernel. This is easily implemented using the +
symbol, or in the case of a product kernel, using the *
symbol (i.e. kern = Mat(5/2,[0.0,0.0],0.0) \* SE(0.0,0.0)
).
#Choose mean and covariance function
mZero = MeanZero() # Zero mean function
kern = Mat(5/2,[0.0,0.0],0.0) + SE(0.0,0.0) # Sum kernel with Matern 5/2 ARD kernel
# with parameters [log(ℓ₁), log(ℓ₂)] = [0,0] and log(σ) = 0
# and Squared Exponential Iso kernel with
# parameters log(ℓ) = 0 and log(σ) = 0
Fit the Gaussian process to the data using the prespecfied mean and covariance functions.
logObsNoise = -2.0 # log standard deviation of observation noise (this is optional)
gp = GP(x,y,mZero,kern,logObsNoise) # Fit the GP
Using the Optim package we have the option to choose from a range of optimize functions including conjugate gradients. It is also possible to fix the hyperparameters in either the mean, kernel or observation noise, by settting them to false in optimize!
(e.g. optimize!(...,mean=false)
).
optimize!(gp,method=:cg) # Optimize the hyperparameters
Results of Optimization Algorithm
* Algorithm: Conjugate Gradient
* Starting Point: [-2.0,0.0,0.0,0.0,0.0,0.0,0.0]
* Minimum: [-2.4508978539051807,-0.47461801552770505,0.43590368883158864,0.3401816678461438,-0.768005238815202,0.0788474114802276,-1.2824416535170349]
* Value of Function at Minimum: 15.307393
* Iterations: 3
* Convergence: true
* |x - x'| < 1.0e-32: false
* |f(x) - f(x')| / |f(x)| < 1.0e-08: true
* |g(x)| < 1.0e-08: false
* Exceeded Maximum Number of Iterations: false
* Objective Function Calls: 83
* Gradient Call: 80
Notice that the syntax for plotting the GP is the same for Gadfly as for Winston.
plot(gp; clim=(-10.0, 10.0,-10.0,10.0)) # Plot the GP over range clim
This package also supports the ScikitLearn interface. ScikitLearn provides many tools for machine learning such as hyperparameter tuning and cross-validation. See here for an example of its usage with this package.