In `examples/weather/dataset_download/start_mirror.py`, the `global_means` and `global_stds` files (used later for normalization) are computed over the entire dataset rather than only the training set, so data from the held-out years leaks into the normalization statistics.
Current implementation
```python
if cfg.compute_mean_std:
    stats_path = os.path.join(cfg.hdf5_store_path, "stats")
    print(f"Saving global mean and std at {stats_path}")
    if not os.path.exists(stats_path):
        os.makedirs(stats_path)
    # Per-channel statistics reduced over *all* timesteps, including held-out years
    era5_mean = np.array(
        era5_xarray.mean(dim=("time", "latitude", "longitude")).values
    )
    np.save(
        os.path.join(stats_path, "global_means.npy"), era5_mean.reshape(1, -1, 1, 1)
    )
    era5_std = np.array(
        era5_xarray.std(dim=("time", "latitude", "longitude")).values
    )
    np.save(
        os.path.join(stats_path, "global_stds.npy"), era5_std.reshape(1, -1, 1, 1)
    )
    print(f"Finished saving global mean and std at {stats_path}")
```
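For context on how these files are consumed: the `(1, -1, 1, 1)` reshape stores one mean and one std per channel in a shape that broadcasts against an array laid out as (batch, channel, latitude, longitude). The sketch below assumes that layout for the downstream loader (not confirmed here) and, purely for illustration, computes the statistics from a random batch instead of loading them with `np.load`.

```python
import numpy as np

# Hypothetical batch: 2 samples, 3 channels, 4 x 8 latitude/longitude grid.
rng = np.random.default_rng(0)
batch = rng.normal(loc=5.0, scale=2.0, size=(2, 3, 4, 8))

# Per-channel statistics shaped like the saved files: (1, C, 1, 1).
means = batch.mean(axis=(0, 2, 3)).reshape(1, -1, 1, 1)
stds = batch.std(axis=(0, 2, 3)).reshape(1, -1, 1, 1)

# Broadcasting applies each channel's mean/std across batch, latitude and longitude.
normalized = (batch - means) / stds
print(normalized.shape)  # (2, 3, 4, 8)
```

Whatever years feed `means` and `stds` end up baked into every normalized sample, which is why the choice of years matters.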
Proposed modification
```python
if cfg.compute_mean_std:
    # Compute stats only on training data.
    # `train_years` is assumed to be the list of training years defined earlier in the script.
    train_era5_xarray = era5_xarray.sel(
        time=era5_xarray.time.dt.year.isin(train_years)
    )
    stats_path = os.path.join(cfg.hdf5_store_path, "stats")
    print(f"Saving global mean and std at {stats_path}")
    if not os.path.exists(stats_path):
        os.makedirs(stats_path)
    era5_mean = np.array(
        train_era5_xarray.mean(dim=("time", "latitude", "longitude")).values
    )
    np.save(
        os.path.join(stats_path, "global_means.npy"), era5_mean.reshape(1, -1, 1, 1)
    )
    era5_std = np.array(
        train_era5_xarray.std(dim=("time", "latitude", "longitude")).values
    )
    np.save(
        os.path.join(stats_path, "global_stds.npy"), era5_std.reshape(1, -1, 1, 1)
    )
    print(f"Finished saving global mean and std at {stats_path}")
```
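A minimal, self-contained check of the proposed selection, using a synthetic `DataArray` in place of `era5_xarray` and a hypothetical `train_years = [2016, 2017]` split (the real script's dimension names and split may differ). Offsetting the held-out years by +10 makes the leakage visible: the full-dataset mean moves toward 5, while the training-only mean stays near 0.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for era5_xarray: daily data for 2016-2019, 2 channels, 3 x 4 grid.
time = pd.date_range("2016-01-01", "2019-12-31", freq="D")
rng = np.random.default_rng(0)
values = rng.normal(size=(time.size, 2, 3, 4))
# Shift the held-out years (2018-2019) so leakage changes the statistics noticeably.
values[time.year >= 2018] += 10.0
era5_xarray = xr.DataArray(
    values,
    dims=("time", "channel", "latitude", "longitude"),
    coords={"time": time},
)

train_years = [2016, 2017]  # hypothetical training split
train_era5_xarray = era5_xarray.sel(time=era5_xarray.time.dt.year.isin(train_years))

dims = ("time", "latitude", "longitude")
train_mean = train_era5_xarray.mean(dim=dims).values.reshape(1, -1, 1, 1)
train_std = train_era5_xarray.std(dim=dims).values.reshape(1, -1, 1, 1)
full_mean = era5_xarray.mean(dim=dims).values.reshape(1, -1, 1, 1)

print(train_mean.ravel(), full_mean.ravel())
```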
Minimum reproducible example
No response
Relevant log output
No response
Environment details
Modulus Docker container version 24.04
This seems fine to me, @albertocarpentieri, if you want to submit a PR. I'll mention that, in the unified recipe for training global weather models, we compute the statistics on the fly with a moving average.
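As a rough illustration of computing statistics on the fly, here is one possible approach: a running per-channel mean/std that merges each batch's moments with Chan et al.'s parallel update, so the final values equal the exact statistics over every batch seen. This is a swapped-in technique, not the unified recipe's code; the recipe may use a different update, such as an exponential moving average. `RunningChannelStats` is a hypothetical name, and batches are assumed to be shaped (batch, channel, latitude, longitude).

```python
import numpy as np


class RunningChannelStats:
    """Accumulates exact per-channel mean/std over batches shaped (B, C, H, W)."""

    def __init__(self, num_channels: int):
        self.count = 0
        self.mean = np.zeros(num_channels)
        self.m2 = np.zeros(num_channels)  # sum of squared deviations from the mean

    def update(self, batch: np.ndarray) -> None:
        axes = (0, 2, 3)
        n_b = batch.shape[0] * batch.shape[2] * batch.shape[3]
        mean_b = batch.mean(axis=axes)
        m2_b = ((batch - mean_b.reshape(1, -1, 1, 1)) ** 2).sum(axis=axes)
        total = self.count + n_b
        delta = mean_b - self.mean
        # Chan et al. parallel combination of the running and batch moments.
        self.mean = self.mean + delta * n_b / total
        self.m2 = self.m2 + m2_b + delta**2 * self.count * n_b / total
        self.count = total

    @property
    def std(self) -> np.ndarray:
        # Population std (ddof=0), matching xarray's default .std().
        return np.sqrt(self.m2 / self.count)


rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=2.0, size=(10, 3, 4, 8))
stats = RunningChannelStats(num_channels=3)
for start in range(0, 10, 4):  # batches of 4, 4 and 2 samples
    stats.update(data[start:start + 4])
```

If only training batches are fed to `update`, the resulting statistics come from the training split alone, the same property the proposed fix enforces.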
Version
0.6.0
On which installation method(s) does this occur?
Docker