connect to myint cmems reanalysis dataset also #861

Closed
veenstrajelmer opened this issue May 22, 2024 · 0 comments · Fixed by #883

Comments

Collaborator

veenstrajelmer commented May 22, 2024

CMEMS data is split between reanalysis and forecast periods. dfm_tools can only download from one of them, so a request like
dfmt.download.copernicusmarine_get_product(date_min=pd.Timestamp("2020-01-01"), date_max=pd.Timestamp("2023-12-31")) currently raises "ValueError: Requested timerange (2020-01-01 00:00:00 to 2023-12-31 00:00:00) is not fully within timerange of reanalysis product (1993-01-01 00:00:00 to 2021-06-30 00:00:00) or forecast product (2020-11-01 00:00:00 to 2024-05-31 00:00:00)."

This seems valid, but there are two reanalysis datasets available. When printing the time ranges per my/myint/anfc dataset with this code:

import copernicusmarine
import logging
logging.getLogger("copernicus_marine_root_logger").setLevel(level="CRITICAL")

dataset_id_list = ["cmems_mod_glo_phy_my_0.083deg_P1D-m",
                   "cmems_mod_glo_phy_myint_0.083deg_P1D-m",
                   "cmems_mod_glo_phy_anfc_0.083deg_P1D-m"]

for dataset_id in dataset_id_list:
    ds = copernicusmarine.open_dataset(dataset_id=dataset_id)
    print(f"time range {dataset_id}:", ds.time.to_series().index[0], "to", ds.time.to_series().index[-1])

We see that the myint dataset covers the period directly after the regular my dataset, up to a fairly recent date:

time range cmems_mod_glo_phy_my_0.083deg_P1D-m: 1993-01-01 00:00:00 to 2021-06-30 00:00:00
time range cmems_mod_glo_phy_myint_0.083deg_P1D-m: 2021-07-01 00:00:00 to 2024-02-27 00:00:00
time range cmems_mod_glo_phy_anfc_0.083deg_P1D-m: 2020-11-01 00:00:00 to 2024-05-31 00:00:00

This would be very valuable to include in the data retrieval.
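
To make the continuity concrete, the sketch below checks that my and myint join up without a gap at the daily time step. The names `MY`, `MYINT` and `is_contiguous` are hypothetical, and the ranges are hard-coded from the output printed above purely for illustration; they change whenever CMEMS updates the datasets, so in practice they would be read from the datasets themselves.

```python
from datetime import datetime, timedelta

# Ranges copied from the printed output above (assumption: still valid).
MY = (datetime(1993, 1, 1), datetime(2021, 6, 30))
MYINT = (datetime(2021, 7, 1), datetime(2024, 2, 27))


def is_contiguous(first, second, step=timedelta(days=1)):
    """True if `second` starts at most one time step after `first` ends."""
    return second[0] - first[1] <= step


# If both ranges join up, my + myint form one uninterrupted daily series.
combined = (MY[0], MYINT[1]) if is_contiguous(MY, MYINT) else None
print(combined)
```

With these ranges, my and myint together would behave like a single reanalysis series from the start of my to the end of myint.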

The user manual does not describe many differences between my and myint, but does state that this feature was added on 2023-06-16 in version 1.5 of the product/dataset. According to copernicusmarine support: "The dataset cmems_mod_glo_phy_myint_0.083deg_P1D-m is an interim dataset. Interim datasets bridge the time series continuity between NRT and associated MY products. You will find more information in this article"

Also note that the time ranges for bgc and phy are different, so this distinction should be made in dfmt.download.copernicusmarine_get_product().

An overview of all datasets:

import copernicusmarine
import logging
logging.getLogger("copernicus_marine_root_logger").setLevel(level="CRITICAL")

dataset_id_list = ["cmems_mod_glo_phy-cur_anfc_0.083deg_P1D-m", # 2020-11-01 onwards
                   "cmems_mod_glo_phy-so_anfc_0.083deg_P1D-m", # 2020-11-01 onwards
                   "cmems_mod_glo_phy-thetao_anfc_0.083deg_P1D-m", # 2020-11-01 onwards
                   "cmems_mod_glo_phy_anfc_0.083deg_P1D-m", # 2020-11-01 onwards
                   "cmems_mod_glo_phy_my_0.083deg_P1D-m", # my
                   "cmems_mod_glo_phy_myint_0.083deg_P1D-m", # my interim
                   "cmems_mod_glo_bgc-bio_anfc_0.25deg_P1D-m", # 2021-11-01 onwards
                   "cmems_mod_glo_bgc-car_anfc_0.25deg_P1D-m", # 2021-11-01 onwards
                   "cmems_mod_glo_bgc-co2_anfc_0.25deg_P1D-m", # 2021-11-01 onwards
                   "cmems_mod_glo_bgc-nut_anfc_0.25deg_P1D-m", # 2021-11-01 onwards
                   "cmems_mod_glo_bgc-pft_anfc_0.25deg_P1D-m", # 2021-11-01 onwards
                   "cmems_mod_glo_bgc_my_0.25_P1D-m", # my
                   ]

for dataset_id in dataset_id_list:
    ds = copernicusmarine.open_dataset(dataset_id=dataset_id)
    print(f"time range {dataset_id}:")
    print(ds.time.to_series().index[0], "to", ds.time.to_series().index[-1])

Gives:

time range cmems_mod_glo_phy-cur_anfc_0.083deg_P1D-m:
2020-11-01 00:00:00 to 2024-05-31 00:00:00
time range cmems_mod_glo_phy-so_anfc_0.083deg_P1D-m:
2020-11-01 00:00:00 to 2024-05-31 00:00:00
time range cmems_mod_glo_phy-thetao_anfc_0.083deg_P1D-m:
2020-11-01 00:00:00 to 2024-05-31 00:00:00
time range cmems_mod_glo_phy_anfc_0.083deg_P1D-m:
2020-11-01 00:00:00 to 2024-05-31 00:00:00
time range cmems_mod_glo_phy_my_0.083deg_P1D-m:
1993-01-01 00:00:00 to 2021-06-30 00:00:00
time range cmems_mod_glo_phy_myint_0.083deg_P1D-m:
2021-07-01 00:00:00 to 2024-02-27 00:00:00
time range cmems_mod_glo_bgc-bio_anfc_0.25deg_P1D-m:
2021-11-01 00:00:00 to 2024-05-31 00:00:00
time range cmems_mod_glo_bgc-car_anfc_0.25deg_P1D-m:
2021-11-01 00:00:00 to 2024-05-31 00:00:00
time range cmems_mod_glo_bgc-co2_anfc_0.25deg_P1D-m:
2021-11-01 00:00:00 to 2024-05-31 00:00:00
time range cmems_mod_glo_bgc-nut_anfc_0.25deg_P1D-m:
2021-11-01 00:00:00 to 2024-05-31 00:00:00
time range cmems_mod_glo_bgc-pft_anfc_0.25deg_P1D-m:
2021-11-01 00:00:00 to 2024-05-31 00:00:00
time range cmems_mod_glo_bgc_my_0.25_P1D-m:
1993-01-01 00:00:00 to 2022-12-31 00:00:00
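
Since the phy and bgc ranges differ, it can help to group the dataset ids by family (phy or bgc) and kind (my, myint or anfc) before comparing time ranges. The sketch below does this by splitting the id string. The function name `split_dataset_id` is hypothetical, and the assumed position of the fields is only inferred from the ids listed above, not from an official naming specification.

```python
from collections import defaultdict

# A few ids from the overview above.
dataset_id_list = [
    "cmems_mod_glo_phy-cur_anfc_0.083deg_P1D-m",
    "cmems_mod_glo_phy_anfc_0.083deg_P1D-m",
    "cmems_mod_glo_phy_my_0.083deg_P1D-m",
    "cmems_mod_glo_phy_myint_0.083deg_P1D-m",
    "cmems_mod_glo_bgc-bio_anfc_0.25deg_P1D-m",
    "cmems_mod_glo_bgc_my_0.25_P1D-m",
]


def split_dataset_id(dataset_id):
    """Return (family, kind), e.g. ("phy", "anfc").

    Assumes the layout cmems_mod_glo_<family[-var]>_<kind>_..., which is
    inferred from the ids above and may not hold for other products.
    """
    parts = dataset_id.split("_")
    family = parts[3].split("-")[0]  # "phy-cur" -> "phy"
    kind = parts[4]                  # "my", "myint" or "anfc"
    return family, kind


# Group ids so that time ranges can be compared per family and kind.
grouped = defaultdict(list)
for dataset_id in dataset_id_list:
    grouped[split_dataset_id(dataset_id)].append(dataset_id)

for key, ids in grouped.items():
    print(key, ids)
```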

Update 27-6-2024: 1.5 years of forecast data were removed, so there is now an 11-month gap between my and anfc: "ValueError: Requested timerange (2020-12-17 00:00:00 to 2021-07-05 00:00:00) is not fully within timerange of reanalysis product (1993-01-01 00:00:00 to 2021-06-30 00:00:00) or forecast product (2022-06-01 00:00:00 to 2024-06-29 00:00:00)."
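
The size of this gap follows directly from the two dates in the error message. The sketch below works it out with the standard library and checks whether the myint range listed earlier would cover it. All variable names are hypothetical, and the myint range is the one printed on May 22, which may have changed since.

```python
from datetime import datetime, timedelta

# Dates taken from the error message above.
my_end = datetime(2021, 6, 30)
anfc_start = datetime(2022, 6, 1)

# myint range as printed earlier in this issue (assumption: still valid).
myint = (datetime(2021, 7, 1), datetime(2024, 2, 27))

# First and last day that neither my nor anfc provides.
first_missing = my_end + timedelta(days=1)
last_missing = anfc_start - timedelta(days=1)

missing_days = (last_missing - first_missing).days + 1
missing_months = ((anfc_start.year - first_missing.year) * 12
                  + (anfc_start.month - first_missing.month))

# True if myint alone would fill the gap.
gap_covered_by_myint = myint[0] <= first_missing and last_missing <= myint[1]

print(first_missing.date(), "to", last_missing.date())
print(missing_days, "days,", missing_months, "months, covered:", gap_covered_by_myint)
```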

Update 4-7-2024: bgc datasets were renamed recently (#879) and there is also a myint version available for that one.

Reproducible code:

import dfm_tools as dfmt
dfmt.download_CMEMS(varkey='zos', 
                    longitude_min=-1, longitude_max=1,
                    latitude_min=52, latitude_max=53, 
                    date_min="2021-06-29", date_max="2021-07-02")

Raises: "ValueError: Requested timerange (2021-06-29 00:00:00 to 2021-07-02 00:00:00) is not fully within timerange of reanalysis product (1993-01-01 00:00:00 to 2021-06-30 00:00:00) or forecast product (2022-06-01 00:00:00 to 2024-07-16 00:00:00)."
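
One possible way to serve a request like this is to split it at the dataset boundaries and retrieve each piece from the dataset that covers it. The sketch below is only an illustration of that idea and not necessarily what dfm_tools does or will do: `split_request` and `RANGES` are hypothetical names, the my and myint ranges are the ones printed earlier in this issue, and the anfc range is taken from the error message above.

```python
from datetime import datetime, timedelta

# (dataset_id, start, end) in order of preference; hard-coded assumptions.
RANGES = [
    ("cmems_mod_glo_phy_my_0.083deg_P1D-m",
     datetime(1993, 1, 1), datetime(2021, 6, 30)),
    ("cmems_mod_glo_phy_myint_0.083deg_P1D-m",
     datetime(2021, 7, 1), datetime(2024, 2, 27)),
    ("cmems_mod_glo_phy_anfc_0.083deg_P1D-m",
     datetime(2022, 6, 1), datetime(2024, 7, 16)),
]


def split_request(date_min, date_max, ranges=RANGES, step=timedelta(days=1)):
    """Split [date_min, date_max] into pieces, each covered by one dataset."""
    pieces = []
    cursor = date_min  # first day that still needs a dataset
    for dataset_id, start, end in ranges:
        if cursor > date_max:
            break
        if start <= cursor <= end:
            piece_end = min(end, date_max)
            pieces.append((dataset_id, cursor, piece_end))
            cursor = piece_end + step
    if cursor <= date_max:
        raise ValueError(f"No dataset covers {cursor} to {date_max}")
    return pieces


pieces = split_request(datetime(2021, 6, 29), datetime(2021, 7, 2))
for dataset_id, start, end in pieces:
    print(dataset_id, start.date(), "to", end.date())
```

For the request above, this yields one piece from my up to 2021-06-30 and one from myint starting 2021-07-01. Without the myint entry the same request still fails, which matches the reported error.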
