Implementation of Regression Splines from scratch to predict the cosmic microwave background (CMB) angular power spectrum. First homework for the "Statistical Learning" course at La Sapienza University of Rome (Kaggle competition link)
We are looking at a snapshot of our universe in its infancy, roughly 379,000 years after the Big Bang, a blink compared to the estimated age of the universe.
The map below was taken by the Wilkinson Microwave Anisotropy Probe (WMAP) and shows differences across the sky in the temperature of the cosmic microwave background (CMB), the radiant heat remaining from the Big Bang. The average temperature is 2.73 degrees above absolute zero, but the temperature is not constant across the sky, and the fluctuations in the temperature map provide information about the early universe. Indeed, as the universe expanded, there was a tug of war between the force of expansion and the contraction due to gravity. This caused acoustic waves in the hot gas, which is why there are temperature fluctuations.

The strength of the temperature fluctuations $f(x)$ at each frequency (or multipole) $x$ is called the power spectrum, and this power spectrum can be used by cosmologists to answer cosmological questions. For example, the relative abundance of different constituents of the universe (such as baryons and dark matter) corresponds to peaks in the power spectrum. The temperature map can therefore be reduced to a scatterplot of power versus frequency:
In a nutshell:
- The training set consists of 675 CMB angular power spectrum observations estimated from the latest WMAP data release.
- The goal is to predict the angular power spectrum at 224 additional frequencies.
- RMSE is the adopted metric.
Any d-th-order spline $f(\cdot)$ with knots $\xi_1 < \cdots < \xi_q$ satisfies:

- $f(\cdot)$ is a polynomial of degree $d$ on each of the intervals $(-\infty, \xi_1], [\xi_1, \xi_2], [\xi_2, \xi_3], \ldots, [\xi_q, +\infty)$;
- its $j$-th derivative $f^{(j)}(\cdot)$ is continuous at $\xi_1, \ldots, \xi_q$ for each $j \in \{0, 1, \ldots, d-1\}$.
Given a set of knots $\{\xi_1, \ldots, \xi_q\}$:

- Start from the truncated power functions $G_{d,q} = \{ g_1(x), \ldots, g_{d+1}(x), g_{(d+1)+1}(x), \ldots, g_{(d+1)+q}(x) \}$, defined as $g_1(x) = 1,\ g_2(x) = x,\ \ldots,\ g_{d+1}(x) = x^d$ and $g_{(d+1)+j}(x) = (x - \xi_j)_{+}^{d}$ for $j = 1, \ldots, q$, where $(x)_{+} = \max(0, x)$.
- Then, if $f(\cdot)$ is a d-th-order spline with knots $\{\xi_1, \ldots, \xi_q\}$, you can show it can be written as a linear combination over $G_{d,q}$: $f(x) = \sum_{j=1}^{d+1+q} \beta_j g_j(x)$, for some set of coefficients $\beta = [\beta_1, \ldots, \beta_{d+1}, \beta_{(d+1)+1}, \ldots, \beta_{(d+1)+q}]^{T}$.
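The truncated power basis above translates directly into a design matrix, after which the spline coefficients reduce to ordinary least squares. A minimal sketch (the knot values and toy data here are illustrative, not the WMAP setup):

```python
import numpy as np

def truncated_power_basis(x, knots, d=3):
    """Design matrix G_{d,q}: columns 1, x, ..., x^d, then (x - xi_j)_+^d per knot."""
    x = np.asarray(x, dtype=float)
    poly = np.column_stack([x**k for k in range(d + 1)])            # (n, d+1)
    trunc = np.column_stack([np.maximum(x - xi, 0.0)**d for xi in knots])  # (n, q)
    return np.hstack([poly, trunc])                                 # (n, d+1+q)

# Fit beta = argmin ||y - G beta||^2 by least squares on toy data
x = np.linspace(0, 1, 50)
y = np.sin(2 * np.pi * x)
knots = np.linspace(0, 1, 7)[1:-1]        # q = 5 equispaced interior knots
G = truncated_power_basis(x, knots, d=3)  # shape (50, 4 + 5)
beta, *_ = np.linalg.lstsq(G, y, rcond=None)
y_hat = G @ beta
```

Numerically, the truncated power basis can be ill-conditioned for large degrees or many knots; a B-spline basis spans the same function space with better conditioning.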
With the knots placed at q equispaced locations, we tune the hyperparameters (number of knots, maximum degree of the truncated power functions, etc.) with different cross-validation techniques: Grid Search CV, vanilla CV, and the Nested CV from the Bates et al. article. We used Repeated CV to select the best degree and number of knots.
Degree | Knots |
---|---|
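The repeated-CV selection of degree and knot count can be sketched as a small grid search; the candidate grids and toy data below are assumptions for illustration, not the values used in the actual homework:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import RepeatedKFold, cross_val_score

def design(x, q, d):
    """Truncated power basis with q equispaced interior knots."""
    knots = np.linspace(x.min(), x.max(), q + 2)[1:-1]
    cols = [x**k for k in range(d + 1)]
    cols += [np.maximum(x - xi, 0.0)**d for xi in knots]
    return np.column_stack(cols)

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 200))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, 200)

# Repeated K-fold CV: average RMSE over repeats for each (degree, knots) pair
cv = RepeatedKFold(n_splits=5, n_repeats=10, random_state=0)
best = None
for d in (1, 2, 3):
    for q in (5, 10, 20):
        rmse = -cross_val_score(
            LinearRegression(fit_intercept=False),  # basis already has a constant
            design(x, q, d), y, cv=cv,
            scoring="neg_root_mean_squared_error",
        ).mean()
        if best is None or rmse < best[0]:
            best = (rmse, d, q)
```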
Then we implemented Elastic Net regularization and tuned the related hyperparameters, again via cross-validation.
Shrinkage type | Shrinkage weight |
---|---|
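A possible way to tune both the shrinkage weight and the l1/l2 mix is scikit-learn's `ElasticNetCV`; the knot placement and toy data below are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 300))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, 300)

# Truncated power basis without the constant column
# (ElasticNet fits its own intercept by default)
d, knots = 3, np.linspace(0, 1, 12)[1:-1]
G = np.column_stack([x**k for k in range(1, d + 1)]
                    + [np.maximum(x - xi, 0.0)**d for xi in knots])

# ElasticNetCV selects both alpha (shrinkage weight) and l1_ratio
# (shrinkage type: 1.0 = pure lasso, -> 0 = ridge-like) by internal CV.
model = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9, 1.0], cv=5, max_iter=100_000)
model.fit(StandardScaler().fit_transform(G), y)
```

Standardizing the columns matters here: the truncated power features live on very different scales, and the penalty is not scale-invariant.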
Finally, we can fit the obtained splines to our WMAP data.
Due to the heteroscedasticity of our training data, our predictions may be affected by the large noise in the final part of the spectrum. To mitigate this, we use the Box-Cox transformation.
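In SciPy the transformation is a one-liner: `scipy.stats.boxcox` picks the exponent by maximum likelihood, and `scipy.special.inv_boxcox` maps predictions back to the original scale. A sketch on toy heteroscedastic data (the data-generating process is an assumption for illustration):

```python
import numpy as np
from scipy import stats, special

rng = np.random.default_rng(2)
x = np.linspace(0.1, 10, 200)
# toy target with variance growing with the mean; Box-Cox requires y > 0
y = np.exp(0.3 * x) * np.exp(rng.normal(0, 0.1, x.size))

y_t, lam = stats.boxcox(y)   # lambda chosen by maximizing the log-likelihood
# ... fit the splines on (x, y_t) instead of (x, y) ...
y_back = special.inv_boxcox(y_t, lam)   # invert predictions to original scale
```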
Then we refit the splines and obtain our final results!