- search for solar axions, hypothetical pseudoscalar particles solving the strong $\mathcal{CP}$ problem
- potential dark matter candidate
- coupling to transverse $B$ fields, production in the Sun!
- expected signal rates: $≤ \num{0.1}$ $γ$ \si{\per\hour}
- background rate: $∼ \SI{0.1}{\per\s}$
- need very good background suppression
- a type of multivariate analysis object providing highly non-linear, multidimensional representations of the input data
- simplest type: the feed-forward multilayer perceptron
Neuron output:
\[
y_k = \varphi\left( ∑_{j = 0}^m w_{kj} x_j \right)
\]
\( \varphi \): activation function, \( \mathbf{w}_k \): weight vector of neuron \( k \)
Training minimizes the error function
\[ E(\mathbf{x}_1, …, \mathbf{x}_N | \mathbf{w}) = ∑_{a=1}^N \frac{1}{2}\left(y_{\text{ANN},a} - \hat{y}_a\right)^2 \]
using gradient descent
\[ \mathbf{w}^{n+1} = \mathbf{w}^n - η ∇_{\mathbf{w}} E \]
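A minimal numpy sketch of the two formulas above: one neuron output \( y_k = \varphi(∑_j w_{kj} x_j) \) and a single gradient-descent weight update. The sizes, the \( \tanh \) activation, and the learning rate are illustrative assumptions, not values from the talk.

import numpy as np

def phi(z):
    # activation function, here tanh (an assumption for this sketch)
    return np.tanh(z)

rng = np.random.default_rng(0)
x = rng.normal(size=5)        # one input vector (x_0 ... x_m)
w = rng.normal(size=5)        # weight vector w_k
y_hat = 0.5                   # desired output

y = phi(w @ x)                # neuron output y_k
E = 0.5 * (y - y_hat)**2      # error function for a single example

# gradient descent step: w <- w - eta * dE/dw (chain rule, tanh' = 1 - tanh^2)
eta = 0.1
w = w - eta * (y - y_hat) * (1 - y**2) * x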
- convolutional and pooling layers alternating
- a convolutional layer performs a discrete 2D convolution of its input, e.g.:
import numpy as np
from scipy.signal import convolve2d

# convolving with a scaled delta kernel returns the scaled input
A = np.identity(6)
B = np.array([[0,0,0],[0,5,0],[0,0,0]])
C = convolve2d(A, B, 'same')
print(C)

[[5. 0. 0. 0. 0. 0.]
 [0. 5. 0. 0. 0. 0.]
 [0. 0. 5. 0. 0. 0.]
 [0. 0. 0. 5. 0. 0.]
 [0. 0. 0. 0. 5. 0.]
 [0. 0. 0. 0. 0. 5.]]
import numpy as np
from scipy.signal import convolve2d

# an X-shaped kernel sums each pixel with its four diagonal neighbors
A = np.identity(6)
B = np.array([[1,0,1],[0,1,0],[1,0,1]])
C = convolve2d(A, B, 'same')
print(C)

[[2. 0. 1. 0. 0. 0.]
 [0. 3. 0. 1. 0. 0.]
 [1. 0. 3. 0. 1. 0.]
 [0. 1. 0. 3. 0. 1.]
 [0. 0. 1. 0. 3. 0.]
 [0. 0. 0. 1. 0. 2.]]
\tiny source: http://www.songho.ca/dsp/convolution/convolution2d_example.html
- MNIST: a dataset of \num{70000} handwritten digits, size normalized to $\num{28}×\num{28}$ pixels and centered
- in the past used to benchmark image classification; nowadays good accuracies $\geq\SI{90}{\percent}$ are fast to achieve
- network layout (a numpy sketch follows after this list):
  - input layer: $\num{28}×\num{28}$ neurons (note: flattened to \num{1}D!)
  - 1 hidden layer: \num{1000} neurons
  - output layer: \num{10} neurons (\num{1} for each digit)
  - activation function: rectified linear unit (ReLU):
    \[ f(x) = \max(0, x) \]
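The sketch mentioned above: a plain numpy forward pass through the described layout, with the $\num{28}×\num{28}$ input flattened to \num{784} values, \num{1000} hidden ReLU neurons, and \num{10} outputs. The weight initialization and the random input digit are stand-ins; the actual demo code is written in Nim using Arraymancer.

import numpy as np

def relu(x):
    return np.maximum(0, x)   # f(x) = max(0, x)

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.01, size=(784, 1000))   # input -> hidden weights
W2 = rng.normal(scale=0.01, size=(1000, 10))    # hidden -> output weights

digit = rng.random((28, 28))            # stand-in for one MNIST digit
hidden = relu(digit.reshape(784) @ W1)  # hidden layer activations
scores = hidden @ W2                    # one score per digit class 0..9
print(scores.argmax())                  # predicted digit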
- Program 1: trains the multilayer perceptron (MLP)
  - written in Nim (C backend), using Arraymancer, a linear algebra + neural network library
  - trains on \num{60000} digits, performs validation on \num{10000} digits
  - after every 10 batches (1 batch: 64 digits) sends to Program 2 (a sketch of such a message follows below):
    - a random test digit
    - the predicted output
    - the current error
- Program 2 plots the data live: written in Nim (JS backend), plots using plotly.js
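For illustration, a hypothetical sketch of such an update message in Python; the real demo is written in Nim, and the JSON encoding and field names here are assumptions, only the three contents follow the list above.

import json
import random

def make_update(test_digits, predictions, current_error, n_batches):
    # build one update for the plotting program (hypothetical format)
    i = random.randrange(len(test_digits))
    return json.dumps({
        "batches": n_batches,          # batches seen so far (64 digits each)
        "digit": test_digits[i],       # a random test digit (pixel values)
        "prediction": predictions[i],  # predicted output for that digit
        "error": current_error,        # current value of the error function
    })

msg = make_update([[0.0] * 784], [7], 0.42, 10)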
- CAST is a very low rate experiment!
- detectors should reach: $f_{\text{Background}} ≤ \SI{e-6}{\per\keV\per\cm\squared\per\s}$
- background-to-signal ratio: $\frac{f_{\text{Background}}}{f_{\text{Signal}}} > \num{e5}$
- need very good signal / background classification!
- events (as on previous slides) can be interpreted as images
- Convolutional Neural Networks are extremely good at image classification
- comparing background to X-ray events shows that their geometric shapes are very different
- utilize that to remove as much background as possible
- energy range: \SIrange{0}{10}{\kilo \electronvolt}
- split into 8 unequal bins of distinct event properties
- based only on properties of X-rays
- set a cut on the likelihood distribution such that \SI{80}{\percent} of X-rays are recovered (a sketch follows below)
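A minimal sketch of such a cut, assuming the likelihood values of calibration X-ray events are available as an array; the variable names, the toy data, and the direction of the cut (keep events below the cut value) are illustrative, not taken from the CAST analysis code.

import numpy as np

def likelihood_cut(xray_likelihoods, efficiency=0.80):
    # cut value such that `efficiency` of X-ray events pass, assuming
    # more X-ray-like events have *smaller* likelihood values
    return np.percentile(xray_likelihoods, efficiency * 100)

cut = likelihood_cut(np.random.exponential(size=10000))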
- now: use an artificial neural network to classify events as X-ray or background
- approach 1: calculate properties of each event, use the properties as input neurons
- approach 2: use whole events (\(\num{256} × \num{256}\) pixels) as the input layer
- approach 1:
  - small layout \( ⇒ \) fast to train
  - potentially biased, not all information usable
- approach 2:
  - huge layout \( ⇒ \) only trainable on GPU
  - all information available
- input size: $\num{256}×\num{256}$ neurons
- 3 convolutional and pooling layers alternating w/ 30, 70, 100 kernels using $\num{15} × \num{15}$ filters
- pooling layers perform $\num{2}×\num{2}$ max pooling
- $\tanh$ activation function
- 1 fully connected feed-forward layer: (1800, 30) neurons
- logistic regression layer: \num{2} output neurons (see the sketch after this list)
- training w/ \num{12000} events per type on Nvidia GTX 1080
- training time: $∼$ \SIrange{1}{10}{\hour}
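A hedged sketch of this layout in PyTorch, purely for illustration: the actual network is implemented in Nim with Arraymancer, the padding here is an assumption, and the flattened size (quoted as 1800 in the layout above) is computed from a dummy input instead of hard-coded.

import torch
import torch.nn as nn

# three conv/pool blocks with 30, 70, 100 kernels of 15x15, 2x2 max pooling
# and tanh activations, as listed above; unpadded convolutions are an assumption
features = nn.Sequential(
    nn.Conv2d(1, 30, kernel_size=15), nn.Tanh(), nn.MaxPool2d(2),
    nn.Conv2d(30, 70, kernel_size=15), nn.Tanh(), nn.MaxPool2d(2),
    nn.Conv2d(70, 100, kernel_size=15), nn.Tanh(), nn.MaxPool2d(2),
)

# determine the flattened size from a dummy 256x256 single-channel event
with torch.no_grad():
    n_flat = features(torch.zeros(1, 1, 256, 256)).numel()

model = nn.Sequential(
    features,
    nn.Flatten(),
    nn.Linear(n_flat, 30), nn.Tanh(),  # fully connected feed-forward layer
    nn.Linear(30, 2),                  # logistic regression layer: 2 outputs
)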
- I hope I could teach you something new, or that it was still interesting regardless :)
- if you’re interested: this talk and the code for the live demo can be found on my GitHub: https://github.com/vindaar/NeuralNetworkLiveDemo