ML Training
Tyler M edited this page May 12, 2024
·
35 revisions
- We train in GitHub Actions
- We have a somewhat small dataset, so we do not use CUDA + cuDNN (we train on the CPU)
- We train a single LSTM
- The model uses multi-step (multiple time steps ahead), multivariate (multiple features) time series forecasting
- Historical Daily Flow data (Min + Max in CFS)
- day of year + year
- Site Monitoring ID
- Snow Water Equivalent (SWE) data
- temperature data (min + max)
Station ID | Basin ID | Year (Static) | Day of Year (Static) | Current SWE % | Year Max SWE % | Flow Day 0 (Min) | Flow Day 0 (Max) | Temp Day 0 (Min) | Temp Day 0 (Max) |
---|---|---|---|---|---|---|---|---|---|
ST001 | B1 | 2023 | 45 | 45% | 80% | 140 cfs | 160 cfs | 66.2°F | 68.0°F |
ST002 | B1 | 2020 | 120 | 50% | 85% | 150 cfs | 170 cfs | 69.8°F | 71.6°F |
ST003 | B2 | 2024 | 300 | 55% | 90% | 160 cfs | 180 cfs | 73.4°F | 75.2°F |
- Flow Lags
- Future temperature windows
- Sin/Cos Day of the Year (for seasonal patterns)
- Calc handles leap years
- Drop Columns: Static data (year, day of year)
Flow Lag -1 Day | Flow Lag -3 Day | Flow Lag -7 Day | Day of Year (Sin) | Day of Year (Cos) | Temp Day +1 (Min) | Temp Day +1 (Max) | ... | Temp Day +14 (Min) | Temp Day +14 (Max) |
---|---|---|---|---|---|---|---|---|---|
+140 cfs | -130 cfs | +120 cfs | 0.6995 | 0.7147 | 66.2°F | 69.8°F | ... | 69.8°F | 73.4°F |
+150 cfs | +140 cfs | +130 cfs | 0.8827 | -0.4700 | 68.0°F | 71.6°F | ... | 71.6°F | 75.2°F |
-160 cfs | -150 cfs | -140 cfs | -0.9057 | 0.4239 | 71.6°F | 75.2°F | ... | 75.2°F | 78.8°F |
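The two engineered feature groups in the table above can be sketched in plain Python. This is a minimal sketch, not the project's actual code: the function names `day_of_year_encoding` and `flow_lags` are hypothetical, and it assumes a flow lag is simply the flow value observed that many days earlier.

```python
import calendar
import math
from typing import Dict, List, Optional, Sequence, Tuple


def day_of_year_encoding(year: int, day_of_year: int) -> Tuple[float, float]:
    """Return (sin, cos) of the day's position within its year."""
    # Divide by the real year length so leap years (366 days) wrap correctly.
    days_in_year = 366 if calendar.isleap(year) else 365
    angle = 2.0 * math.pi * day_of_year / days_in_year
    return math.sin(angle), math.cos(angle)


def flow_lags(
    flows: List[float], index: int, lags: Sequence[int] = (1, 3, 7)
) -> Dict[int, Optional[float]]:
    """Return the flow seen `lag` days before `index` (None if history is too short).

    Assumption: a lag feature is the raw flow value from `lag` days earlier.
    """
    return {lag: flows[index - lag] if index - lag >= 0 else None for lag in lags}
```

With the Year and Day of Year values from the first table, `day_of_year_encoding(2023, 45)` rounds to (0.6995, 0.7147), and `day_of_year_encoding(2020, 120)` rounds to (0.8827, -0.4700) because 2020 is divided by 366 rather than 365.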
- Historical Flow lag trend (1, 3, and 7 day)
- Current Flow
- Site ID (Site specific characteristics)
- current + seasonal maximum SWE for the sub-watershed (HUC8; geolocation/region characteristics)
- Day of year (seasonal)
- current daily temp (min/max)
- Future Temperature Forecast for 14+ days (min/max)
- hidden layers: 2
- dense layer units: 5-10
- Dropout: ~20%
- Dropout layer: on every layer
- Weight Initialization: Glorot uniform initialization
- weight decay: 0.97
- activation functions: recurrent activation: tanh; gate activations: sigmoid; output layer: linear
- Learning rate: 0.00001-0.1
- momentum: 0.5-0.9
- Epochs: employ the early stopping method to find number
- batch size: 32-512
- Output units: 28 (14+ days of Flow predictions, min + max values)
- Regularization: consider L1 regularization ONLY if/when overfitting is an issue
- Adaptive learning rate: Adam
- Feature importance: SHAP's DeepExplainer (IMDB Sentiment Classification)
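The early stopping rule mentioned for choosing the number of epochs can be illustrated without any ML framework. This is a minimal sketch under assumptions: `patience` and `min_delta` are hypothetical settings not given in the list above, and the helper only replays a recorded list of validation losses. In a Keras training run the same idea is normally handled by the `tf.keras.callbacks.EarlyStopping` callback.

```python
from typing import Iterable, Tuple


def early_stopping_epoch(
    val_losses: Iterable[float], patience: int = 5, min_delta: float = 0.0
) -> Tuple[int, int]:
    """Return (best_epoch, stop_epoch), both 1-indexed.

    Training stops once the validation loss has failed to improve by more
    than `min_delta` for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = 0
    stop_epoch = 0
    epochs_without_improvement = 0
    for stop_epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            # New best validation loss: remember it and reset the counter.
            best_loss, best_epoch = loss, stop_epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return best_epoch, stop_epoch
```

The first returned value is the epoch whose weights would be kept. The second is how long training actually ran before stopping.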
- produce validation curves (train vs validation):
- plot the loss over time
- plot accuracy over time
- visualize the predictions made by the model (actual vs predicted)
- feature importance with SHAP summary plot
- Using a hierarchical structure we embed the broader watershed data first, then specialize with individual Station ID embedding.
- Basin Fallback Mechanism: In training, we simulate unseen Station IDs by holding out some stations from batches. This forces the model to rely on Watershed/Basin input only. We record the trained stations in stations.txt. Then, at inference, we implement the fallback by attempting to look up the given station in stations.txt
- 'mask_zero=True' for the station_embedding layer, which helps omit the station embedding gracefully
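The inference-time lookup in stations.txt could look roughly like the sketch below. It is not the project's actual code and rests on several assumptions: stations.txt holds one Station ID per line, trained stations are numbered from 1, and index 0 is reserved for unknown stations so that it lines up with 'mask_zero=True' on the station embedding. The helper names are hypothetical.

```python
from pathlib import Path
from typing import Dict

# Assumption: index 0 means "no station embedding", so the model falls back to basin input.
UNKNOWN_STATION_INDEX = 0


def load_station_index(path: str) -> Dict[str, int]:
    """Map each trained Station ID to a positive integer index (1, 2, ...)."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    station_ids = [line.strip() for line in lines if line.strip()]
    return {station_id: i for i, station_id in enumerate(station_ids, start=1)}


def station_to_index(station_id: str, station_index: Dict[str, int]) -> int:
    """Return the station's index, or 0 when the station was never trained on."""
    return station_index.get(station_id, UNKNOWN_STATION_INDEX)
```

A station that is missing from stations.txt maps to 0, so only the Basin ID carries information for that prediction.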
Understanding LSTM Networks
- RNNs that avoid the Long-term Dependency problem
- "The long term dependency problem is that, when you have larger network through time, the gradient decays quickly during back propagation. So training a RNN having long unfolding in time becomes impossible. But LSTM avoids this decay of gradient problem by allowing you to make a super highway (cell states) through time, these highways allow the gradient to freely flow backward in time."
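The "super highway" in the quote corresponds to the additive cell-state update of a standard LSTM. The symbols below follow the usual textbook notation and do not come from this page: $f_t$, $i_t$, and $o_t$ are the forget, input, and output gates, $\tilde{c}_t$ is the candidate cell state, and $\odot$ is element-wise multiplication.

$$
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \qquad h_t = o_t \odot \tanh(c_t)
$$

Along the direct cell-state path (ignoring the gates' own dependence on $h_{t-1}$), the backward step is

$$
\frac{\partial c_t}{\partial c_{t-1}} \approx \operatorname{diag}(f_t)
$$

so when the forget gate stays close to 1 the gradient passes back through many time steps almost unchanged, instead of being repeatedly multiplied by a weight matrix and a squashing derivative as in a plain RNN.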
- Groundwater and/or soil moisture measurement data (predominant for Winter flow forecasting)
- Precipitation data (historical + forecasted; more predominant in Northwest and East Coast regions, not Colorado)
- Dam release data (numeric outflow or a binary value; requires identifying the dams upstream of a Station, then a lookup)