Using generative AI for eddy-resolving multi-modal surface ocean state estimation

smartin98/GenDA

GenDA

GenDA - Generative Data Assimilation.

⚠️ Work In Progress. ⚠️

Experiments in generative neural data assimilation for multi-modal surface ocean state estimation. These experiments will be described more thoroughly in a pre-print (in preparation), to be accompanied by a full code release.

The problem: Estimate the multi-modal dynamical state of the surface ocean (sea surface height, temperature, salinity, and surface currents) from sparse satellite observations of sea surface height and temperature, together with low-resolution objective analysis products of sea surface height, temperature, and salinity.
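To make the inputs and outputs of this problem concrete, here is a minimal sketch of an observation operator. It assumes a hypothetical channel order (`CHANNELS`), separate boolean masks for satellite SSH and SST coverage, and simple block-averaging as a stand-in for the low-resolution objective analysis products; none of these names or choices come from the GenDA code.

```python
import numpy as np

# Hypothetical channel order for a surface state x of shape (C, H, W).
CHANNELS = ("ssh", "sst", "sss", "u", "v")

def coarsen(field, factor):
    """Block-average a 2D field by an integer factor (a stand-in for a low-resolution analysis)."""
    h, w = field.shape
    if h % factor or w % factor:
        raise ValueError("grid size must be divisible by the coarsening factor")
    return field.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def observe(x, sat_masks, factor):
    """Map a full surface state to the two kinds of input described above.

    sat_masks: dict of boolean (H, W) arrays for "ssh" and "sst"
               (e.g. altimeter tracks, cloud-free pixels); assumed for illustration.
    Returns (sparse satellite values, coarse gridded products).
    The surface currents u and v are never observed directly."""
    idx = {name: i for i, name in enumerate(CHANNELS)}
    # Sparse satellite observations: only SSH and SST, only at observed pixels.
    satellite = {name: x[idx[name]][mask] for name, mask in sat_masks.items()}
    # Low-resolution products: SSH, SST and SSS everywhere, but coarse.
    low_res = {name: coarsen(x[idx[name]], factor) for name in ("ssh", "sst", "sss")}
    return satellite, low_res
```

The estimation task is then to invert `observe`: recover all five channels, including the unobserved `u` and `v`, from its two outputs.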

The approach: Given high-resolution training data from eddy-resolving numerical simulations, train a generative model to produce realistic multi-modal surface snapshots that resemble the model output (e.g. sea surface height, temperature, salinity, and surface currents). Can we then use this generative model to estimate poorly observed quantities (e.g. surface currents and salinity) from satellite observables (e.g. sea surface height and temperature)?

Motivations for a generative approach rather than a regression approach:

  1. Predicting a single value with a regression approach smooths out small-scale features, which degrades higher-order dynamical diagnostics. A generative approach can potentially produce an ensemble of high-resolution reconstructions, each of which preserves the fine-scale features.
  2. A regression approach provides no robust way to transfer from the training environment (simulation data) to real-world observations: subtle differences between real observations at inference and simulated observations during training propagate through the network with no well-defined behaviour. A generative approach should ensure that fields generated from observations 'look like' the simulated data, i.e. that they hopefully preserve the simulation's physics.
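The smoothing argument in point 1 can be illustrated with a toy example. A regressor trained with a mean-squared-error loss tends toward the conditional mean of plausible states; if the observations do not pin down the exact position (phase) of a small-scale feature, averaging over the plausible positions cancels it. The sketch below uses random-phase sine waves as hypothetical "plausible states", not ocean data.

```python
import numpy as np

def random_phase_ensemble(n_members, n_points=256, wavenumber=8, seed=0):
    """Toy 'plausible states': the same small-scale wave with an unknown, random phase."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=(n_members, 1))
    return np.sin(wavenumber * x[None, :] + phases)

members = random_phase_ensemble(500)
# An MSE-optimal single prediction is (approximately) the ensemble mean.
mean_field = members.mean(axis=0)

print(np.abs(members).max(axis=1).min())  # every member keeps roughly unit amplitude
print(np.abs(mean_field).max())           # the mean field is nearly flat
```

Each ensemble member, like each sample from a generative model, keeps the full-amplitude wave, while the single "best" prediction loses it.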

The Method: Score-Based Data Assimilation (referred to here as 'generative data assimilation' or 'GenDA')

Step 1: Train an unconditional diffusion model to produce realistic multi-modal samples. NB: this training uses the full model fields only; no simulated observations are generated.
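As an illustration of what Step 1 optimizes, here is a minimal numpy sketch of a weighted denoising score-matching loss in the style of Karras et al. (2022, "EDM"), the formulation that CorrDiff builds on. The default values `p_mean=-1.2`, `p_std=1.2` and `sigma_data=0.5` are the EDM paper's defaults and are assumptions here; the actual training is done in PyTorch with the adapted Modulus code.

```python
import numpy as np

def edm_denoising_loss(denoiser, x, rng, p_mean=-1.2, p_std=1.2, sigma_data=0.5):
    """Weighted denoising loss on a batch of full model fields x with shape (B, C, H, W).

    denoiser(x_noisy, sigma) should return an estimate of the clean field.
    Only the model fields enter the loss: no observations are used in Step 1."""
    batch = x.shape[0]
    # Sample one log-normally distributed noise level per training example.
    sigma = np.exp(p_mean + p_std * rng.standard_normal((batch, 1, 1, 1)))
    # EDM loss weighting, which balances the contribution of different noise levels.
    weight = (sigma**2 + sigma_data**2) / (sigma * sigma_data) ** 2
    # Corrupt the clean fields with Gaussian noise of standard deviation sigma.
    x_noisy = x + sigma * rng.standard_normal(x.shape)
    return float(np.mean(weight * (denoiser(x_noisy, sigma) - x) ** 2))
```

Minimizing this loss over a neural `denoiser` yields, via Tweedie's formula, an estimate of the score of the noised data distribution, score(x, sigma) = (denoiser(x, sigma) - x) / sigma**2, which Step 2 relies on.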
Step 2: Guide the generation from the trained model using sparse observations by taking gradient steps with respect to the state estimate, x, while keeping the diffusion model parameters fixed so as to preserve the qualitative nature of the model outputs. (The method was proposed by Rozet & Louppe (2023) and recently applied to atmospheric reanalysis by Manshausen et al.)
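The guidance in Step 2 can be sketched on a toy problem where the denoiser is known analytically. Assumptions (this is not the repository's ./sda code): the prior is Gaussian, N(0, prior_std² I), so the optimal denoiser is linear; the observation operator keeps the pixels where `mask` is True; and, in the spirit of the Rozet & Louppe (2023) approximation, the likelihood is a Gaussian centred on the observed part of the denoised estimate with variance obs_std² + gamma·sigma². The parameter values are hypothetical, and the Appendix B modification of Manshausen et al. is not reproduced.

```python
import numpy as np

def guided_score(x, sigma, y, mask, prior_std=1.0, obs_std=0.1, gamma=1e-2):
    """Prior score plus observation-likelihood gradient, both with respect to the state x.

    y is given on the full grid; its values where mask is False are ignored.
    The 'model' (here the analytic prior) stays fixed; only x would be updated."""
    # Optimal denoiser for a Gaussian prior: D(x, sigma) = c * x.
    c = prior_std**2 / (prior_std**2 + sigma**2)
    denoised = c * x
    # Prior score via Tweedie's formula: (D(x) - x) / sigma^2.
    prior_score = (denoised - x) / sigma**2
    # Gaussian log-likelihood gradient; dD/dx = c for this linear denoiser.
    # With a neural denoiser this term needs a vector-Jacobian product (autodiff).
    var = obs_std**2 + gamma * sigma**2
    residual = np.where(mask, y - denoised, 0.0)
    likelihood_score = c * residual / var
    return prior_score + likelihood_score
```

In a sampler, this guided score replaces the unconditional score at each noise level, nudging observed pixels toward `y` while the prior fills in the unobserved ones.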

Training data: simulation data from the 1/12-degree global reanalysis product GLORYS 12, subset to a region surrounding the Gulf Stream.
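A minimal sketch of the regional subsetting step, assuming a hypothetical regular 1/12-degree grid and a hypothetical Gulf Stream box (the actual GLORYS 12 coordinates should be read from the product files, and the region bounds used by GenDA are not stated here):

```python
import numpy as np

# Hypothetical regular 1/12-degree grid, built from integer indices to avoid float drift.
RES = 12  # grid points per degree
lon = -180.0 + np.arange(360 * RES) / RES      # -180 ... 179.917
lat = -80.0 + np.arange(170 * RES + 1) / RES   # -80 ... 90

def subset_box(field, lon, lat, lon_min, lon_max, lat_min, lat_max):
    """Return the part of a (lat, lon) field inside [lat_min, lat_max) x [lon_min, lon_max)."""
    lon_sel = (lon >= lon_min) & (lon < lon_max)
    lat_sel = (lat >= lat_min) & (lat < lat_max)
    # np.ix_ turns the two 1D boolean masks into an open mesh for 2D indexing.
    return field[np.ix_(lat_sel, lon_sel)]

# Hypothetical box: 80W-50W, 25N-45N.
sst_global = np.zeros((lat.size, lon.size), dtype=np.float32)
sst_gulf = subset_box(sst_global, lon, lat, -80.0, -50.0, 25.0, 45.0)
```

A 30° by 20° box at 1/12-degree resolution gives a 240 × 360 grid of points per variable.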

Experiments:

  1. Observing System Simulation Experiment (OSSE): estimate the state from simulated satellite observations and compare with the known 2D ground truth.
  2. Observing System Experiment (OSE): estimate the state from real-world satellite observations and compare with independent, withheld observations.
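The difference between the two experiments lies mainly in what the reconstruction is scored against. A minimal sketch, assuming gridded fields with boolean observation masks, additive Gaussian observation noise in the OSSE, and a random withheld fraction in the OSE (all hypothetical choices, not the repository's inference scripts):

```python
import numpy as np

def rmse(a, b):
    """Root-mean-square error between two arrays of the same shape."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

def osse_observations(truth, obs_mask, noise_std, rng):
    """OSSE: sample synthetic observations from a known truth field and add noise."""
    return truth[obs_mask] + noise_std * rng.standard_normal(int(obs_mask.sum()))

def ose_split(obs_mask, withheld_fraction, rng):
    """OSE: split real observation locations into assimilated and withheld sets."""
    withheld = obs_mask & (rng.random(obs_mask.shape) < withheld_fraction)
    return obs_mask & ~withheld, withheld

def ose_score(estimate, obs_field, withheld):
    """OSE: score the estimate only against observations it never saw."""
    return rmse(estimate[withheld], obs_field[withheld])
```

In an OSSE the error can be computed everywhere, e.g. `rmse(estimate, truth)`, because the truth is known; in an OSE only `ose_score` on the withheld points is available.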

Structure of the code:

  1. ./pre-processing contains code for preparing the desired target fields from publicly available datasets. For example, we subtract geostrophic currents and Ekman currents (derived using a linear regression model) from the surface current variable we seek to reconstruct.
  2. ./src contains utility code (e.g. dataloaders and the neural network architecture for a baseline UNet regression approach).
  3. The GenDA diffusion model code is adapted from NVIDIA Modulus CorrDiff (installed from the upstream repository on 07/21/2024; the upstream code appears to have been refactored since).
  4. ./conf contains the Hydra config files used for model training.
  5. ./sda contains the code for the score-based data assimilation method (i.e. observation-guided inference given a diffusion model trained using the CorrDiff code). This is a minor adaptation of the original implementation, incorporating the modification described in Appendix B of Manshausen et al.
  6. ./training contains training scripts.
  7. ./inference contains inference scripts for both the OSE and OSSE.
  8. ./viz.ipynb visualizes the reconstructions.
  9. More quantitative evaluation metrics coming soon...
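The pre-processing described in item 1 can be sketched as follows. Geostrophic currents follow from SSH via u_g = -(g/f) ∂η/∂y and v_g = (g/f) ∂η/∂x with f = 2Ω sin(lat). The Ekman part is shown here as one plausible linear regression, assumed for illustration: a single complex coefficient `beta` mapping wind stress (taux + i·tauy) to the ageostrophic current, which allows a rotation between stress and current. The predictors and form of the regression actually used in ./pre-processing may differ, and the function names are hypothetical.

```python
import numpy as np

G = 9.81            # gravitational acceleration [m s^-2]
OMEGA = 7.2921e-5   # Earth's rotation rate [s^-1]
R_EARTH = 6.371e6   # Earth's radius [m]

def geostrophic_currents(ssh, lat, lon):
    """Surface geostrophic velocities from SSH [m] on a regular (lat, lon) grid in degrees."""
    lat2d, _ = np.meshgrid(lat, lon, indexing="ij")
    f = 2.0 * OMEGA * np.sin(np.deg2rad(lat2d))  # Coriolis parameter (not valid at the equator)
    # Derivatives with respect to latitude/longitude in radians, then metric factors.
    dssh_dlat, dssh_dlon = np.gradient(ssh, np.deg2rad(lat), np.deg2rad(lon))
    dssh_dy = dssh_dlat / R_EARTH
    dssh_dx = dssh_dlon / (R_EARTH * np.cos(np.deg2rad(lat2d)))
    return -(G / f) * dssh_dy, (G / f) * dssh_dx

def fit_ekman(u_ageo, v_ageo, taux, tauy):
    """Least-squares complex coefficient beta in (u + i v) ~ beta * (taux + i tauy)."""
    w = (u_ageo + 1j * v_ageo).ravel()
    tau = (taux + 1j * tauy).ravel()
    # np.vdot conjugates its first argument: beta = sum(conj(tau) w) / sum(|tau|^2).
    return np.vdot(tau, w) / np.vdot(tau, tau).real

def residual_currents(u, v, ssh, taux, tauy, lat, lon):
    """Remove the geostrophic and (regressed) Ekman parts from the total surface current."""
    ug, vg = geostrophic_currents(ssh, lat, lon)
    u_ageo, v_ageo = u - ug, v - vg
    beta = fit_ekman(u_ageo, v_ageo, taux, tauy)
    ekman = beta * (taux + 1j * tauy)
    return u_ageo - ekman.real, v_ageo - ekman.imag, beta
```

For simplicity the regression is fitted and applied on the same data here; in practice the coefficient would typically be fitted on a training period and then applied unchanged.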
