NASA SpaceAppsChallenge: Develop the Oracle of DSCOVR
- python >= 3.10
- npm >= 9.2.0
- Create virtual environment:
python3 -m venv ./venv
-
Activate the environment
- for linux:
source ./venv/bin/activate
- for windows:
.\venv\activate
-
Install the dependencies:
- python:
pip install -r requirements.txt
- npm (open the second terminal):
cd ./client && npm i
-
Download data and place it inside
/data
directory (create it beforehand) in the project root. -
Download model and place it inside
/models
directory (create it beforehand) in the project root. -
Run server:
python run.py
- Run client in the 2nd terminal:
npm start
- To train: specify data in
/nn/conf/config.yaml
and run:
python ./nn/train.py
- To validate: specify data in
/nn/main.py
and run:
python ./nn/main.py
This project encorporates:
/app
- Backend/nn
- Deep Learning model
Backend is written in Flask and the Deep Learning Model that we called DSCOVR(Y)
was developed in PyTorch.
Our Deep Learning model had been trained on data from 2 datasets that we merged together:
- raw data from the satellite - German Research Center for Geosciences
- planetary k-index - NASA
Short names for datasets:
RDS_D
- raw data from the satellite datasetKP_D
- planetary k-index dataset
RDS_D
has data for each minute of a year, whereas KP_D
has data for each 3 hour period. Moreover KP_D
data entry is not for singe discrete hour, it is a range of 1 hour and 50 minutes.
This gives as:
RDS_D
- data for each minuteKP_D
- aggregated? data for 1 hour and 50 minutes of each 3 hour period (1 hour and 10 minutes thereby is a dark spot)
Basically, we have checked each RDS_D
entry if it's in range of time (1 hour and 50 minutes) in any KP_D
period of observations.
We have used 2 scripts to clean the data, they are located in the /scripts
directory.
kp_dataset_clean.py
used to clean the Kp indices data that contained a lot of unrelevant data to the problem in the first place. It also produces well structured file with the delimeters, makeing it possible forpandas
to open it.datasets_merge.py
iterates over the satellite raw data, for each entry it then iterates over the Kp indices data and if the entry is in range of time measurement of any Kp index, then it adds the Kp index column to the entry.
DSCIVR(Y) model has 53 inputs and 1 output. It has 1,175,681 parameters.
The architecture:
- linear (53, 128)
- batch_norm (128)
- linear1 (128, 256)
- batch_norm (256)
- linear2 (256, 256)
- batch_norm (256)
- lstm (256, 256, 2 layers)
- linear3 (256, 64)
- linear4 (64, 1)
Plantary k-index data is subjected to CC BY 4.0
license.
The license of our project is Apache 2.0
.