-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
E3SM-2-0's source string is too long for CMOR #377
Comments
@mauzey1 thanks for catching this - how many characters over 1024 are we at? |
The current string has a length of 1128 characters. |
@mauzey1 is there a PR or viewable example that displays the 1128 character string? |
The character limit is a long-standing issue. There is some discussion at: #259 . The problem is that CMOR defines the fixed length of all string variables as 1024 characters, which turns out to occupy rather a lot of memory. Most character strings could be 64 characters, and if we did that, then the remaining variables could have a string length of say 2048 without taking up any more memory. |
It would seem that the resolution for this issue is relatively straightforward, if https://www.geeksforgeeks.org/dynamic-memory-allocation-in-c-using-malloc-calloc-free-and-realloc/ is anything to go by. But obviously, we'd have to test to ensure that such changes don't break things |
Not sure the FORTRAN interface to the C would work. |
Also PCMDI/cmor#530 is a relevant issue to refer to - this should be part of the CMOR4 spec |
Yeah, the best solution to the 1024-character limit for strings would be to dynamically allocate string memory and remove the limit. I'm also not sure how FORTRAN would handle dynamically-sized strings. As for the memory usage concern, I don't think strings would be the biggest issue. When was it decided that the max string length should be 1024? I don't think bigger strings would tax modern hardware all that much compared to CMORizing several gigabytes of data. @durack1 I agree that this should be a CMOR4 consideration as it would change CMOR from the ground up. |
@taylor13 wouldn't the proposed changes be oblivious to other software that calls it? In the case that we have a FORTRAN interface that provides input data for CMOR to write, then how CMOR manages memory (and strings) internally will not be a problem, or am I missing something? Wouldn't this just mean that even FORTRAN to provide input strings >1024 chars and CMOR would magically deal with them? |
A quick chat with some local Fortran experts suggests that the interaction between dynamically allocated arrays passed from C "should just work", but I'd be keen to see this tested. I can get more advice from them if useful.
Agreed. |
Regarding whether we must wait for CMOR4, I think there is a single scalar variable (CMOR_MAX_STRING), which is currently set to 1024 and could be made larger (2048?) without disrupting things other than possibly the FORTRAN. The fortran interface (cmor_fortran_interface.f90) has some statements that hardwire string length at: I do recall that someone at the metoffice was complaining about the size of the CMOR executable and traced it to a bunch of memory occupied by mostly empty strings, but I'm not positive about that. Regarding the Fortran just "working", we would need to check on the hardwired string lengths in cmor_fortran_interface.f90 and determine whether any conflicts might arise. It looks to me that those hardwired strings are used to temporarily hold results from "trimming" blank characters at the end of string variables. It's likely that at least one of these (the |
Thanks for the code trawl @taylor13, from a quick scan there are 83 instances of real, parameter:: CMOR_VERSION = 2.7
real, parameter:: CMOR_CF_VERSION = 1.4 @matthew-mizielinski any idea of MetOffice complaints about memory allocation/size, if memory serves, this may have been @ehogan that raised queries many moons ago |
I have tried building CMOR with CMOR_MAX_STRING increased from 1024 to 2048. However, it has resulted in the following error.
There might be a compiler flag that could resolve this error, but I would prefer trying to make CMOR not rely on a global string length. I think replacing a lot of the how strings get composed inside of functions with dynamically-sized strings will be simple albeit tedious. However, there is also the issue of CMOR using many strings in global variables that are currently set to the maximum length. https://github.com/PCMDI/cmor/blob/4f2dc8962e6aabff3845a41e6808cfe06b22e94d/include/cmor.h#L271-L579 We could make CMOR throw an error if a string exceeds the maximum length of strings used in the global variables, or display a warning that the string will get truncated to the max length.
@durack1 We should change those parameters for the CMOR 3.7 release. |
No complaints at this time, we tend to use CMOR on batch clusters where we can simply throw a lot of memory at the problem. This doesn't mean that there isn't an issue we could look at, but I don't see it as a development priority at this time. |
It looks like the immediate issue as been resolved by a user shortening the length of their "source" string, so I agree we can put this off to CMOR4 (if that's the suggestion). When addressed, I would define a few different "maximum string length" parameters (perhaps 16, 64, 256, 1024, and 4096?), and depending on the string usage, assign an appropriate maximum length. This would provide adequate flexibility, and would limit the size of memory needed for the table containing the string variables. |
@mauzey1 WCRP-CMIP/CMIP6_CVs@51d9b63 should now have solved this issue for the |
Unfortunately, the string for cmip6-cmor-tables/Tables/CMIP6_CV.json Line 1126 in cadeb95
I'm not sure how much more you could shave off from the string without losing a lot of info. If it helps, here is the code that creates the string from the WCRP files. Removing the release year in parentheses would shorten the string enough to fit but I am not sure if you want that. cmip6-cmor-tables/scripts/createCMIP6CV.py Lines 37 to 50 in 3591446
|
Hi Paul @durack1 I was trying to CMORize E3SM 2.0 data and reported the issue (PCMDI/cmor#660). Thanks a lot for your effort to rectify the E3SM 2.0 registration. I'm a little unclear about the license updates, do you advise that instead of relying on cmor to add license, we should handle adding license internally when writing out files? |
hi @chengzhuzhang. The license info is provided in the cmor_input.json file that CMOR3 takes as input at run time. If you are updating this info for E3SM 2.0, then I suggest also updating the license info to reflect the CC BY 4.0 (rather than CC BY-SA 4.0) default license, some demos can be found here and the license choices are listed, with the recommended license |
@durack1
The addition of E3SM-2-0 to CMIP6_CV.json doesn't work with CMOR since the
source
attribute's value is a string that exceeds the CMOR string length limit of 1024. Please consider shortening the string.The text was updated successfully, but these errors were encountered: