[DATA REQUEST] Add COSIMA Panantarctic / GFDL_OM4 Builder & Data #175
Following on from COSIMA/cosima-recipes#369, I suggest adding OM4_025.JRA_RYF to the Intake catalog. @dougiesquire - as this is a different model configuration, I guess it would require a new datastore "builder", so maybe it's not worth the effort? The runs are used in the COSIMA recipes to show examples of handling MOM6 data. @adele-morrison - are there companion runs to OM4_025.JRA_RYF that should also be added? Could you please help with the "Description of the data product" and "Location of the data product on Gadi" sections? I will then edit the original post.
I tried using the access-om3 builder, and got these errors when using builder.parser:
Ah, yet another permutation of file naming. It might be safest just to write a dedicated builder, which is straightforward. I guess it would be an Is this output structured in a similar way to the regional MOM6 output? If so, it may be worth thinking about writing a builder that handles both.
Apologies for being slow. Yes, let's add the panan experiments to Intake. We'd still like to delete a bunch of the daily data for the 1/20th panan; is that OK to do after it's added to Intake? Once that frees up space on ol01, ideally I'd also like to move the 1/10th panan from ik11 to ol01. But the current locations are as follows:
I think if we know this is going to happen, then it would be better to wait until it is done. We can get a Builder set up and ready to go, though.
@anton-seaice could you please add the precise location(s) of the data on Gadi?
@adele-morrison is more on top of it than I am. Noting the comments above about possibly moving it.
Yes, that's the right location. It would be great to get this in the catalog so we can keep switching all the COSIMA recipes over. What do you need in terms of documentation?
I don't think there are any plans to move OM4_025.JRA_RYF. The panan data location is still in flux; I will try to keep that moving forward.
OK, I'll start taking a look at the current data structure and builders to see what needs to happen to get these data ingested. Stay tuned...
The filenames all look pretty coherent, but there are a couple of things I haven't been able to work out on my own:
It contains fields that do not vary in time, such as grid-related data. It is saved once per run.
Contains annually averaged 2D fields.
Contains annually averaged 0D (scalar) fields.
I think we want all of those files. There is a frequency = 'fx' for static files, which exists in the OM2 and OM3 datastores (and maybe others).
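A minimal sketch of how a builder might recognise the static file described above and give it the 'fx' frequency, assuming the only feature relied on is the absence of a time dimension named "time". The helper name static_frequency and the toy variable and dimension names are hypothetical; this is not the actual builder code.

```python
from typing import Optional

import numpy as np
import xarray as xr


def static_frequency(ds: xr.Dataset, time_dim: str = "time") -> Optional[str]:
    """Return 'fx' if the dataset has no time dimension, else None.

    Hypothetical helper: time-varying files would still need their
    frequency inferred from the time axis, which is not sketched here.
    """
    return "fx" if time_dim not in ds.dims else None


# A grid-like static file versus a file with a time axis (toy data).
static = xr.Dataset({"area_t": (("yh", "xh"), np.ones((2, 3)))})
daily = xr.Dataset({"sst": (("time", "yh", "xh"), np.zeros((1, 2, 3)))})

print(static_frequency(static))  # fx
print(static_frequency(daily))   # None
```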
Ah yes, I've found the
Dumping this here so I can find it later (for building workable test data): https://stackoverflow.com/questions/15141563/python-netcdf-making-a-copy-of-all-variables-and-attributes-but-one |
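The linked answer copies all netCDF variables and attributes except one using the netCDF4 library. A minimal sketch of the same idea using xarray instead (a swapped-in approach, not the one in the link), which also trims the time axis so test files stay small. The function name make_test_dataset and its defaults are hypothetical.

```python
from typing import Iterable

import numpy as np
import xarray as xr


def make_test_dataset(
    ds: xr.Dataset,
    drop: Iterable[str],
    n_times: int = 1,
    time_dim: str = "time",
) -> xr.Dataset:
    """Copy `ds` without the variables in `drop`, keeping the first `n_times` steps.

    Dataset-level and per-variable attributes are carried over by
    drop_vars and isel, so the result still looks like the original file.
    """
    out = ds.drop_vars(list(drop))
    if time_dim in out.dims:
        out = out.isel({time_dim: slice(0, n_times)})
    return out


src = xr.Dataset(
    {
        "siconc": (("time", "xT"), np.zeros((3, 2)), {"units": "0-1"}),
        "unwanted": (("time", "xT"), np.zeros((3, 2))),
    },
    coords={"time": [0.5, 1.5, 2.5], "xT": [0.5, 1.5]},
    attrs={"title": "toy ice output"},
)
small = make_test_dataset(src, drop=["unwanted"])
print(list(small.data_vars), small.sizes["time"])  # ['siconc'] 1
```

Writing the result out with small.to_netcdf(path) needs a netCDF backend such as netCDF4 or scipy to be installed.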
I now have what I think is a functional
@marc-white, we definitely don't want to call this I'd suggest seeing if the data mentioned in this comment can use the same builder. If so, then we could possibly call the builder
I've updated the Builder to be able to read the filenames found in these directories. However, I've come across an interesting conundrum whilst trying to test the resulting catalog: the data in those three directories are, when ingested into the catalog, pretty much identical, to the point where I can't figure out how to, say, get only the data from For the uninitiated like myself, what is the difference between these three runs, and how can I differentiate between them in an
@marc-white, each of the experiments should be a separate intake-esm datastore within the catalog.
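Why identical filenames stop being a problem once each experiment gets its own datastore can be sketched by grouping files by experiment directory before building, so that each group feeds a separate datastore. The directory layout, paths and the helper name group_by_experiment are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, Iterable, List


def group_by_experiment(paths: Iterable[str], root: str) -> Dict[str, List[str]]:
    """Group file paths by the first directory component below `root`.

    Assumes a hypothetical layout <root>/<experiment>/<outputNNN>/<file>.nc.
    """
    root_path = PurePosixPath(root)
    groups: Dict[str, List[str]] = defaultdict(list)
    for path in map(PurePosixPath, paths):
        experiment = path.relative_to(root_path).parts[0]
        groups[experiment].append(str(path))
    return dict(groups)


files = [
    "/data/exp_a/output000/19000101.ice_daily.nc",
    "/data/exp_b/output000/19000101.ice_daily.nc",
    "/data/exp_a/output001/19010101.ice_daily.nc",
]
for experiment, members in sorted(group_by_experiment(files, "/data").items()):
    print(experiment, len(members))  # exp_a 2, then exp_b 1
```

The same filename can then appear in several groups without ambiguity, because each group is catalogued on its own.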
Hi @anton-seaice and @adele-morrison, I'm now at the point where I'm ready to try an all-up ingest of the data. However, the
I've updated the We're not quite ready to add the panan simulations ending in But we could add
@AndyHoggANU any chance you want to create the
I've confirmed with @AndyHoggANU and
I'll escalate this to the NCI help desk to see if they have any ideas of what's going on.
@rbeucher turns out the issue was the
We're getting there! Just one more request of @julia-neme and @AndyHoggANU - please ensure your experiment model:
- MOM6
- SIS2 (I know that model isn't listed as required, but it turns out it is if the data doesn't contain this information - see #223)
Also, could I please confirm that this list of experiments that we want to ingest is the correct one (I know there's been a lot of back-and-forth above about this):
@marc-white confirming that your list of 3 experiments above to ingest is correct. Thanks!
OK, I added the
@julia-neme have you been able to make this update for the zstar data?
Yes, done now. Sorry, I was on leave!
@charles-turner-1 I'm having some trouble with the tests on the branch for this issue (see here, specifically Each of the new MOM6 test files is missing one or more coordinates when doing the
Interesting - I don't remember this being an issue in I think that the I think the call to
@charles-turner-1, having stepped through the code line by line, I can see where, e.g.,
ds = ds[variables]
Before this line,
And then, after that line:
I'm presuming the
The question here is, should these
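A minimal reproduction of the behaviour described above, using toy data whose names are borrowed from the ice file (values and layout are made up). Selecting a list of variables with ds[variables] keeps only the coordinates whose dimensions are all used by the selected variables, so an edge axis on its own dimension, like xTe here, is dropped.

```python
import numpy as np
import xarray as xr

# Toy dataset loosely modelled on the ice file: 'xT' is the cell-centre
# axis used by the data, 'xTe' is an edge axis on its own dimension.
ds = xr.Dataset(
    {"siconc": (("time", "xT"), np.zeros((1, 3)))},
    coords={
        "time": [0.5],
        "xT": [0.5, 1.5, 2.5],
        "xTe": [0.0, 1.0, 2.0, 3.0],
    },
)

subset = ds[["siconc"]]

print(sorted(ds.coords))      # ['time', 'xT', 'xTe']
print(sorted(subset.coords))  # ['time', 'xT']
```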
Okay, a couple of things I can think of here.
I think that in the context of the test, I think we might want to add additional tests which don't apply this logic?
Sorry, I could have been clearer: I'd be curious to see whether the starred lines below make a difference:
xr_ds = xr.open_dataset(file, **xarray_open_kwargs)

* scalar_variables = [v for v in xr_ds.data_vars if len(xr_ds[v].dims) == 0]
* xr_ds = xr_ds.set_coords(scalar_variables)

xr_ds = xr_ds[expected.variable]
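To illustrate what the starred lines change, here is a toy example with a hypothetical scalar variable. As we understand xarray's list selection, a coordinate is carried along whenever all of its dimensions are used by the selected variables; a scalar coordinate has no dimensions, so it always survives, whereas a scalar left as a data variable is dropped unless it is requested by name.

```python
import numpy as np
import xarray as xr

# Toy dataset: one gridded data variable plus a hypothetical scalar one.
ds = xr.Dataset(
    {
        "siconc": (("time", "xT"), np.zeros((1, 3))),
        "scalar_var": ((), 1.0),
    },
    coords={"time": [0.5], "xT": [0.5, 1.5, 2.5]},
)

# Selecting only "siconc" drops the scalar, an unrequested data variable.
plain = ds[["siconc"]]

# The starred lines: promote every 0-d data variable to a coordinate first.
scalar_variables = [v for v in ds.data_vars if len(ds[v].dims) == 0]
promoted = ds.set_coords(scalar_variables)[["siconc"]]

print("scalar_var" in plain.variables)  # False
print("scalar_var" in promoted.coords)  # True
```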
I've already removed my stack-tracing so I can have a go at de-scalaring the scalar variables, but I didn't think so? From memory, the reason I dived down into the
I think you might have this backwards -
Interestingly,
This is the case both before and after your starred lines.
Yeah, I think this is what I would expect - am I correctly understanding that the subsequent operations being applied to
I think xarray stores scalars internally as zero-dimensional arrays (shape (), holding a single element), and then handles scalars by checking for the dimensions that these variables depend on? I figure this is to keep the internal data structures consistent, e.g. this issue.
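A quick way to check how xarray represents a scalar, using a hypothetical variable name: its dims is an empty tuple and its data is a 0-dimensional array holding one element, which is exactly what the len(...dims) == 0 test in the starred lines picks up.

```python
import xarray as xr

# A scalar (0-d) variable; the name is hypothetical.
ds = xr.Dataset({"scalar_var": ((), 1.0)})
da = ds["scalar_var"]

print(da.dims)          # ()
print(da.ndim)          # 0
print(da.size)          # 1
print(da.values.shape)  # ()
```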
I don't think the operations in these lines would do anything to the I think I've misunderstood the source of the error: are
If I'm understanding correctly now and it's case 2, what have you requested as variables? E.g., in your test parametrisation, what is in
_AccessNCFileInfo(
    path=None,  # type: ignore
    ...
    variable=VARS,
    ...
)
@charles-turner-1 it's case #2:
The test parametrization is as follows (I worked this out by adding what was required to make the first half of the test pass):
(
    builders.Mom6Builder,
    "mom6/output000/19000101.ice_daily.nc",
    _AccessNCFileInfo(
        path=None,  # type: ignore
        filename="19000101.ice_daily.nc",
        file_id="XXXXXXXX_ice_daily",
        filename_timestamp="19000101",
        frequency="subhr",
        start_date="1900-01-01, 00:00:00",
        end_date="1900-01-01, 00:00:00",
        variable=[
            "xT",
            "xTe",
            "yT",
            "yTe",
            "time",
            "nv",
            "siconc",
            "sithick",
            "average_T1",
            "average_T2",
            "average_DT",
            "time_bnds",
        ],
        variable_long_name=[
            "T point nominal longitude",
            "T-cell edge nominal longitude",
            "T point nominal latitude",
            "T-cell edge nominal latitude",
            "time",
            "vertex number",
            "ice concentration",
            "ice thickness",
            "Start time for average period",
            "End time for average period",
            "Length of average period",
            "time axis boundaries",
        ],
        variable_standard_name=[
            "",
            "",
            "",
            "",
            "",
            "",
            "",
            "",
            "",
            "",
            "",
            "",
        ],
        variable_cell_methods=[
            "",
            "",
            "",
            "",
            "",
            "",
            "time: mean",
            "time: mean",
            "",
            "",
            "",
            "",
        ],
        variable_units=[
            "degrees_E",
            "degrees_E",
            "degrees_N",
            "degrees_N",
            "days since 1900-01-01 00:00:00",
            "",
            "0-1",
            "m-ice",
            "days since 1900-01-01 00:00:00",
            "days since 1900-01-01 00:00:00",
            "days",
            "days",
        ],
    ),
),
This should all be committed in the 175 branch.
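The expected file_id and filename_timestamp in this parametrisation follow a simple pattern: the date prefix is masked with X characters and the dot becomes an underscore. A minimal, hypothetical parser for names of the form YYYYMMDD.stream.nc that reproduces those two values is sketched below; it is not the actual Mom6Builder parsing logic, which may handle other naming patterns.

```python
import re
from typing import NamedTuple

# Hypothetical pattern: an 8-digit date, a dot, the output stream name, ".nc".
_MOM6_NAME = re.compile(r"^(?P<timestamp>\d{8})\.(?P<stream>[^.]+)\.nc$")


class ParsedName(NamedTuple):
    file_id: str
    filename_timestamp: str


def parse_mom6_filename(filename: str) -> ParsedName:
    """Split a MOM6-style filename into a generic file_id and its timestamp.

    Digits in the timestamp are masked with 'X' so that files from different
    dates share one file_id, and the dot separator becomes an underscore.
    """
    match = _MOM6_NAME.match(filename)
    if match is None:
        raise ValueError(f"unrecognised MOM6 filename: {filename!r}")
    timestamp = match["timestamp"]
    masked = "X" * len(timestamp)
    return ParsedName(f"{masked}_{match['stream']}", timestamp)


print(parse_mom6_filename("19000101.ice_daily.nc"))
# ParsedName(file_id='XXXXXXXX_ice_daily', filename_timestamp='19000101')
```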
Cool, I'll check out the branch and see if I can figure out what's up.
@marc-white what version of intake-esm are you using to test against? I started digging into the tests, only to find they all mysteriously started passing. In Purely my cockup here - I really should have made reference to the
I've pushed a commit to the head of 175 that fixes the test failures, and the tests are all passing now. If you can see a way of improving how these dynamic xfails are handled, that would be a great shout - I'd rather not cause any more of these painful issues if we can avoid it. Once we get a new release of intake-esm out it shouldn't be an issue, but in the interim I can see this causing more problems if we're not careful.
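One way to make the dynamic xfails less fragile is to gate them explicitly on the installed intake-esm version. A minimal sketch under stated assumptions: the helper names are hypothetical, and the version in which the behaviour changed is left as a placeholder (CHANGED_IN) because it is not stated in this thread.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def release_tuple(version: str) -> Tuple[int, ...]:
    """Parse the leading numeric release, e.g. '2024.2.1.dev3' -> (2024, 2, 1).

    Trailing zeros are stripped so '1.0' and '1.0.0' compare equal.
    Pre-release and dev suffixes are ignored (a simplification).
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"cannot parse version {version!r}")
    parts = [int(p) for p in match.group(0).split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def installed_release(dist: str) -> Optional[Tuple[int, ...]]:
    """Return the installed release tuple for `dist`, or None if not installed."""
    try:
        return release_tuple(metadata.version(dist))
    except metadata.PackageNotFoundError:
        return None


def behaviour_predates(dist: str, changed_in: str) -> bool:
    """True if `dist` is installed at a release older than `changed_in`."""
    installed = installed_release(dist)
    return installed is not None and installed < release_tuple(changed_in)


# Intended use in a test module (CHANGED_IN is a placeholder, not a real version):
# @pytest.mark.xfail(
#     condition=behaviour_predates("intake-esm", CHANGED_IN),
#     reason="coordinate handling differs between intake-esm versions",
#     strict=True,
# )
```

If packaging is available, packaging.version.Version would handle pre-release ordering properly; the hand-rolled parser above only keeps the dependency footprint small.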
Ah right, so we are actually expecting to see unused coordinates thrown away. Gotcha. I think there is a wider thing to consider here, though. From what I can tell, the coordinates that have been thrown away in this instance are 'edge' coordinates, which seem to define the edge behaviour of the main coordinates. However, the main coordinates are used. Therefore, should these 'second-order' coordinates be something we preserve? (Also, looks like I'm currently on
Yeah, I think we probably do want to keep these hanging around. Are you able to open an issue about this on intake-esm? I'll start looking into it more closely once I get done with this E2E test.
Cool, so I think if you were to create a new environment and install the Once we get a new release of But to answer the question more directly, yes, I think the desired behaviour is probably some sort of 'coordinate tree' traversal, rather than just going up a single level and including those coordinates.
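A minimal sketch of the 'coordinate tree' idea as a transitive closure over a dependency map. How the links would be discovered from a file is an assumption here: the CF bounds attribute (e.g. time having bounds = "time_bnds") is one real source, but whether the edge axes such as xTe are linked to xT by an attribute in these files is not established in this thread, so those entries are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def coordinate_closure(
    requested: Iterable[str], depends_on: Dict[str, List[str]]
) -> Set[str]:
    """Return the requested variables plus everything they depend on, transitively.

    `depends_on` maps a variable name to the names it references (via its
    dimensions, a CF 'bounds' attribute, or similar). Cycles are harmless
    because each name is visited only once.
    """
    keep: Set[str] = set()
    queue = deque(requested)
    while queue:
        name = queue.popleft()
        if name in keep:
            continue
        keep.add(name)
        queue.extend(depends_on.get(name, []))
    return keep


deps = {
    "siconc": ["time", "yT", "xT"],
    "time": ["time_bnds"],        # CF-style bounds link
    "time_bnds": ["time", "nv"],  # cycle back to time is fine
    "xT": ["xTe"],                # hypothetical centre-to-edge link
    "yT": ["yTe"],                # hypothetical centre-to-edge link
}
print(sorted(coordinate_closure(["siconc"], deps)))
# ['nv', 'siconc', 'time', 'time_bnds', 'xT', 'xTe', 'yT', 'yTe']
```

A single-level lookup would stop at siconc's direct coordinates (time, yT, xT) and lose the edge and bounds variables, which is the behaviour observed in the tests.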
Interesting, I'm now getting two test failures on the Gadi
Description of the data product
<Please replace this text with a description of the data product to add to the ACCESS-NRI catalog. What data does it contain? What format is it in? Who is it useful for?>
Location of the data product on Gadi
Checklist
Add a "x" between the brackets to all that apply