Statistics for high-dimensional data (homogeneity, sphericity, independence, spherical uniformity)

brian-lau/highdim

highdim

A Matlab library for statistical testing of high-dimensional data, including one- and two-sample tests for homogeneity, uniformity, sphericity and independence. Of note are implementations of modern tests suited to data whose dimensionality grows with sample size and may exceed the number of samples.

Installation

Download highdim and add the resulting folder to your Matlab path. Folders prefixed by a + are packages that should not be explicitly added to your path, although their parent folder should be.

The Statistics Toolbox (now the Statistics and Machine Learning Toolbox) is required.
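
A minimal sketch of the path setup, assuming the download was unpacked into a folder named `highdim` inside the current working folder (a hypothetical location; edit `highdimRoot` to match yours). Only the top-level folder is added, as described above.

```matlab
% Hypothetical install location: edit to wherever highdim was unpacked
highdimRoot = fullfile(pwd, 'highdim');

if exist(highdimRoot, 'dir') == 7
   % Add only the top-level folder; package folders (names starting with +)
   % are resolved through their parent and must not be added themselves
   addpath(highdimRoot);
else
   warning('highdim folder not found at %s', highdimRoot);
end
```

Running `savepath` afterwards keeps the change for future sessions, provided you can write to the path definition file.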

Examples

The various tests are most easily accessed through three interfaces: DepTest1, DepTest2 and UniSphereTest for one-sample tests, two-sample tests and one-sample tests on the sphere, respectively.

Detailed simulations of size, power and comparisons between tests are available in the wiki. The examples below give an idea of what's available.

Multivariate (In)dependence, Sphericity and Homogeneity

% Independent, but non-spherical data
sigma = diag([ones(1,25),0.5*ones(1,5)]);
x = (sigma*randn(50,30)')';

% Independence tests (Han & Liu, 2014)
DepTest1(x,'test','spearman') 
DepTest1(x,'test','kendall') 

% Sphericity tests (Ledoit & Wolf, 2002; Wang & Yao, 2013; Zou et al., 2014)
DepTest1(x,'test','john')
DepTest1(x,'test','wang')
DepTest1(x,'test','sign')
DepTest1(x,'test','bcs')
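
For intuition about what these one-sample tests measure, the sketch below computes two raw ingredients by hand. The first is John's U statistic, which the 'john' option presumably refers to (as analysed by Ledoit & Wolf, 2002): it is zero exactly when the sample covariance is a multiple of the identity. The second is the largest absolute off-diagonal Spearman correlation, an example of the kind of rank-based max-type quantity that high-dimensional independence tests can be built on. Neither number is a p-value; how DepTest1 centers, scales and calibrates its statistics is not reproduced, and the names `johnU` and `maxRho` are hypothetical.

```matlab
% Same construction as the example above: independent, non-spherical data
sigma = diag([ones(1,25), 0.5*ones(1,5)]);
x = (sigma*randn(50,30)')';

% John's U: (1/p) * trace((S/(trace(S)/p) - I)^2), zero iff S is proportional to I
johnU = @(S) trace((S/(trace(S)/size(S,1)) - eye(size(S,1)))^2) / size(S,1);
U = johnU(cov(x));

% Spearman correlations via ranks (assumes no ties, true for continuous data)
[~, order] = sort(x);          % sorts each column
[~, ranks] = sort(order);      % rank of each observation within its column
R = corrcoef(ranks);           % Pearson correlation of ranks = Spearman's rho
maxRho = max(abs(R(~eye(size(R)))));   % largest off-diagonal |rho|

fprintf('John''s U = %.3f, max |Spearman rho| = %.3f\n', U, maxRho);
```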

% Dependent data with ~0 correlation; xx and yy have the same distribution
x = rand(200,1); y = rand(200,1);
xx = 0.5*(x+y)-0.5; yy = 0.5*(x-y);
corr(xx,yy)

% Two-sample independence tests (Gretton et al., 2008; Szekely & Rizzo, 2013)
DepTest2(xx,yy,'test','dcorr') % Distance correlation t-test
DepTest2(xx,yy,'test','hsic') % Hilbert Schmidt Independence Criterion

% Do the samples come from the same distribution? (Gretton et al., 2012; Szekely et al., 2007)
DepTest2(xx,yy,'test','mmd') % Maximum mean discrepancy
DepTest2(xx,yy,'test','energy') % Energy statistic
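
Why are xx and yy dependent when `corr(xx,yy)` is near zero? Writing a = x - 0.5 and b = y - 0.5 gives |xx| + |yy| = max(|a|,|b|) <= 0.5, so a large |xx| forces a small |yy|, a nonlinear relation that Pearson correlation misses. The sketch below checks this and computes the plain (V-statistic) distance correlation by hand. It is for intuition only: the 'dcorr' option is described above as a distance correlation t-test (Szekely & Rizzo, 2013), which relies on a bias-corrected statistic, and the anonymous function names are hypothetical.

```matlab
% Regenerate the example data: ~zero correlation but dependent
x = rand(200,1); y = rand(200,1);
xx = 0.5*(x+y) - 0.5; yy = 0.5*(x-y);

% Dependence shows up in the support: |xx| + |yy| = max(|x-0.5|, |y-0.5|) <= 0.5
fprintf('max(|xx|+|yy|) = %.3f, corr(|xx|,|yy|) = %.3f\n', ...
   max(abs(xx) + abs(yy)), corr(abs(xx), abs(yy)));

% n-by-n Euclidean distance matrix of an n-by-d sample (rows = observations)
pairDist = @(z) sqrt(sum((permute(z,[1 3 2]) - permute(z,[3 1 2])).^2, 3));
% Double centering: subtract row and column means, add back the grand mean
dblCenter = @(D) D - mean(D,1) - mean(D,2) + mean(D(:));
% Distance correlation (V-statistic form); max(.,0) guards against rounding
dCor = @(A,B) sqrt(max(mean(A(:).*B(:)), 0) / sqrt(mean(A(:).^2)*mean(B(:).^2)));

A = dblCenter(pairDist(xx));
B = dblCenter(pairDist(yy));
fprintf('Pearson r = %.3f, distance correlation = %.3f\n', corr(xx,yy), dCor(A,B));
```

Distance correlation is invariant to shifting and rescaling either variable, and in the population it is zero only under independence, which is why it picks up the diamond-shaped dependence here.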

% Independent data, different distributions
x = randn(200,1); y = rand(200,1);

% Two-sample independence tests
DepTest2(x,y,'test','dcorr')
DepTest2(x,y,'test','hsic')

% Do the samples come from the same distribution?
DepTest2(x,y,'test','mmd')
DepTest2(x,y,'test','energy')
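
The 'energy' option compares distributions through the energy distance. A minimal sketch of the plain V-statistic, assuming Euclidean distances: E = 2*mean|X-Y| - mean|X-X'| - mean|Y-Y'|, which is zero when the two empirical distributions coincide and nonnegative in general. Calibrating it by permuting the pooled sample is one common choice and is an assumption here, not necessarily what DepTest2 does; `meanDist` and `energyDist` are hypothetical names.

```matlab
% Same data as above: normal vs uniform samples (rows = observations)
x = randn(200,1); y = rand(200,1);

% Mean Euclidean distance over all pairs (one row of a, one row of b)
meanDist = @(a,b) mean(reshape(sqrt(sum((permute(a,[1 3 2]) - permute(b,[3 1 2])).^2, 3)), [], 1));
% Energy distance (V-statistic form)
energyDist = @(a,b) 2*meanDist(a,b) - meanDist(a,a) - meanDist(b,b);

% Permutation p-value: reshuffle the pooled sample into two groups of the original sizes
eObs = energyDist(x, y);
pooled = [x; y]; n1 = size(x,1);
nPerm = 199; count = 0;
for b = 1:nPerm
   idx = randperm(size(pooled,1));
   ePerm = energyDist(pooled(idx(1:n1),:), pooled(idx(n1+1:end),:));
   count = count + (ePerm >= eObs);
end
pval = (count + 1)/(nPerm + 1);

fprintf('energy distance = %.3f, permutation p = %.3f\n', eObs, pval);
```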

Differences in multivariate means and covariances

% Two high-dimensional samples with sparse difference in covariance matrix (4 entries)
p = 50; n = 100;
% Base covariance with sigma(i,j) = 0.5^|i-j| (this also replaces the
% 30x30 sigma left over from the first example)
sigma = toeplitz(0.5.^(0:p-1));
D = diag(unifrnd(0.5,2.5,p,1));
S = D^.5*sigma*D^.5; U = zeros(p,p);
[~,~,k] = utils.tri2sqind(p);
r = randperm(numel(k));
U(k(r(1:4))) = unifrnd(0,4,4,1)*max(diag(S));
U = U + U';
% Shift both covariances by d*I so that S and S+U are positive definite
[~,da] = eig(S); [~,db] = eig(S+U);
d = abs(min([diag(da);diag(db)])) + 0.05;

x = mvnrnd(zeros(1,p),S+d*eye(p),n);
y = mvnrnd(zeros(1,p),S+U+d*eye(p),n);

DepTest2(x,y,'test','covdiff')

% Directly calling the test returns M, a matrix indicating where covariance 
% elements are significantly different (FWER controlled at alpha)
[pval,stat,M] = diff.covtest(x,y);
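
The idea behind a sparse covariance-difference test can be sketched in a few lines: estimate each covariance entry in both samples, standardize the squared difference by an estimate of its variance, and take the maximum over entries, which reacts to a few large differences rather than many small ones. This is a sketch under assumptions: the toy data and the names `outerProds` and `entryStats` are hypothetical, and the exact standardization, null distribution and FWER-controlled support estimate M used by diff.covtest are not reproduced.

```matlab
% Hypothetical toy data: y differs from x only through cov(y1,y2) = 0.6
n1 = 100; n2 = 100; p = 5;
x = randn(n1, p);
y = randn(n2, p);
y(:,2) = 0.6*y(:,1) + 0.8*y(:,2);   % var(y(:,2)) stays 1

% p-by-p-by-n array whose (i,j,k) entry is zc(k,i)*zc(k,j), zc = centered data
outerProds = @(z) permute(z - mean(z,1), [2 3 1]) .* permute(z - mean(z,1), [3 2 1]);
% Entrywise covariance estimate (1/n normalization) and variance of the products
entryStats = @(P) deal(mean(P,3), mean((P - mean(P,3)).^2, 3));

[s1, th1] = entryStats(outerProds(x));
[s2, th2] = entryStats(outerProds(y));

% Standardized squared differences; their maximum is the max-type statistic
T = (s1 - s2).^2 ./ (th1/n1 + th2/n2);
[Tmax, linIdx] = max(T(:));
[iMax, jMax] = ind2sub(size(T), linIdx);
fprintf('max standardized difference %.1f at entry (%d,%d)\n', Tmax, iMax, jMax);
```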

Uniformity on hypersphere

% Non-uniform samples, antipodally distributed on the sphere
sigma = diag([1 5 1]);
x = (sigma*randn(50,3)')';

% Is projection onto unit hypersphere uniformly distributed?
UniSphereTest(x,'test','rayleigh') % Rayleigh test has little power: the resultant is ~0 for antipodal data
UniSphereTest(x,'test','gine-ajne') % Weighted Gine-Ajne
UniSphereTest(x,'test','randproj') % random projection
UniSphereTest(x,'test','bingham') % Bingham
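
The comment on the Rayleigh test can be checked directly. After projecting each row onto the unit sphere, the Rayleigh statistic n*p*Rbar^2 (Rbar is the length of the mean unit vector) is approximately chi-square with p degrees of freedom under uniformity, and it stays small for antipodally symmetric data because opposite directions cancel. Bingham's statistic instead uses the scatter matrix T = (1/n)*sum(u*u'), whose expectation is I/p under uniformity and which ignores the sign of u. The sketch below uses these standard asymptotic forms; whether UniSphereTest uses the same scaling or asymptotic rather than exact p-values is an assumption.

```matlab
% Same construction as the example above: antipodally symmetric, non-uniform directions
sigma = diag([1 5 1]);
x = (sigma*randn(50,3)')';
[n, p] = size(x);

u = x ./ sqrt(sum(x.^2, 2));     % project each observation onto the unit sphere

% Rayleigh: statistic n*p*Rbar^2 ~ chi2(p) under uniformity
Rbar = norm(mean(u, 1));
rayleighStat = n*p*Rbar^2;
rayleighP = gammainc(rayleighStat/2, p/2, 'upper');   % chi-square upper tail

% Bingham: statistic ~ chi2((p-1)(p+2)/2) under uniformity
T = (u'*u)/n;
binghamStat = n*p*(p+2)/2 * (trace(T^2) - 1/p);
binghamP = gammainc(binghamStat/2, (p-1)*(p+2)/4, 'upper');

fprintf('Rayleigh p = %.3f, Bingham p = %.3g\n', rayleighP, binghamP);
```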

Contributions

Copyright (c) 2017 Brian Lau brian.lau@upmc.fr, see LICENSE

Please feel free to fork and contribute!