-
Notifications
You must be signed in to change notification settings - Fork 283
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable factory references to create new dimensions on load. #6168
base: main
Are you sure you want to change the base?
Conversation
Now updated with prototype loading-control object
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good, just a couple comments.
lib/iris/__init__.py
Outdated
return policy | ||
|
||
|
||
def _apply_loading_policy(cubes, policy=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There could be a case for making something like this public, perhaps as a method of LoadPolicy. There could concievably be a situation where a user would want an equivalent combination, but it's not appropriate to do on load. Perhaps in cases where there needs to be some processing of cubes in order to make them mergeable which is not possible to do as a callback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree this could be a useful general tool in its own right.
In that case it would be useful to expose the "POLICY" settings as simple arguments here
(especially if they get simpler, as described in the "new settings strategy" below).
Going forward, I'll try + make that workable.
self.found_multiple_refs = False | ||
|
||
|
||
# A single global object (per thread) to record whether multiple reference fields |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is an instance per thread, is there any risk that behaviour is dependant on how multiprocessing is done? Or is the relevant loading done entirely within a single thread?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the loading code is all single-threaded in operation. It has to be since, while it has the input file open, it might not be safe to run parallel loading tasks since the file system interface may not be thread-safe (e.g. as for netcdf4-python) .
# the latest load operation. | ||
# This is used purely to implement the iris.LOAD_POLICY.multiref_triggers_concatenate | ||
# functionality. | ||
_MULTIREF_DETECTION = MultipleReferenceFieldDetector() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why this would be instantiated at the module level rather than during a load call. Would this not change your loading behaviour depending on what files you have already loaded during your session?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's explained here, that all the iris.load_xxx
load functions call via _load_collection
, which resets this. It is admittedly a weak contract (!).
We need it to behave like the settings controls. i.e. to exist as a common (but per-thread) global reference, so it can affect low-level behaviour without the need to pass the control down through all functions in the call chain.
lib/iris/__init__.py
Outdated
def __init__( | ||
self, | ||
support_multiple_references: bool = False, | ||
multiref_triggers_concatenate: bool = False, | ||
use_concatenate: bool = False, | ||
use_merge: bool = True, | ||
cat_before_merge: bool = False, | ||
repeat_until_done: bool = False, | ||
): | ||
"""Container for loading controls.""" | ||
self.support_multiple_references = support_multiple_references | ||
self.multiref_triggers_concatenate = multiref_triggers_concatenate | ||
self.use_concatenate = use_concatenate | ||
self.use_merge = use_merge | ||
self.cat_before_merge = cat_before_merge | ||
self.repeat_until_done = repeat_until_done | ||
|
||
def __repr__(self): | ||
msg = ( | ||
"LoadPolicy(" | ||
f"support_multiple_references={self.support_multiple_references}, " | ||
f"multiref_triggers_concatenate={self.multiref_triggers_concatenate}, " | ||
f"use_concatenate={self.use_concatenate}, " | ||
f"use_merge={self.use_merge}, " | ||
f"cat_before_merge={self.cat_before_merge}, " | ||
f"repeat_until_done={self.repeat_until_done}" | ||
")" | ||
) | ||
return msg | ||
|
||
def copy(self): | ||
return LoadPolicy(**{key: getattr(self, key) for key in self._allkeys}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think data classes give you some/all of this, and you can still add your own methods in addition.
lib/iris/__init__.py
Outdated
"support_multiple_references", | ||
"multiref_triggers_concatenate", | ||
"use_concatenate", | ||
"use_merge", | ||
"cat_before_merge", | ||
"repeat_until_done", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of giving users more controls to play with, but having 6 booleans is very confusing, so I'm keen to look for ways it could be simpler to understand.
- It doesn't look like
support_multiple_references
is necessary use_concatenate
,use_merge
andcat_before_merge
could become some sort of controllable sequence e.g.{"c", "m"}
/{"m"}
/{"m", "c"}
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, as discussed offline, I think we can simplify this with benefits.
See new strategy proposals, below.
lib/iris/__init__.py
Outdated
LOAD_POLICY_LEGACY = LoadPolicy() | ||
LOAD_POLICY_RECOMMENDED = LoadPolicy( | ||
support_multiple_references=True, multiref_triggers_concatenate=True | ||
) | ||
LOAD_POLICY_COMPREHENSIVE = LoadPolicy( | ||
support_multiple_references=True, use_concatenate=True, repeat_until_done=True | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be simpler to follow if these 'presets' were all part of another object. Could they be class attributes e.g. LoadPolicy.recommended
? It would be good if the only global constant were LOAD_POLICY
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, I don't like this much.
Under review, watch this space!
lib/iris/__init__.py
Outdated
"""Return context manager for temporary options. | ||
|
||
Modifies the given parameters within a context, for the active thread. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could do with more detail and examples - I'm struggling to picture use cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. This is not intended to be complete yet -- still just draft ideas
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've had my rough look around.
I think the user experience is almost there. Seems like a good compromise that still holds the user's hand as much as possible (they don't write the code themselves), while still offering more customisation than currently.
The developer experience feels very complicated. I think this is worth some attention, since the developer experience around loading is already too complicated and I have noticed can be a big barrier for less experienced people. Afraid I don't have any immediate suggestions, only that it's worth some time.
Thanks!
I'm not providing additional docstrings yet, but it's clearly needed (if not a userguide section?) |
You're totally right, and I'm currently planning to remove all the pieces here to a single sub-module "iris.io.loading", which I think will help a bit. However, I want to add some proper testcases before I attempt a lift-and-shift, as otherwise some breakages might go unnoticed. |
Current best ideas for a "new control strategy"Following offline discussion wuith @trexfeathers Just 3 control keywords:
Plus...
|
Progress Update :For now I'm deferring the user-API improvements detailed above.
|
3942968
to
e135573
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #6168 +/- ##
==========================================
- Coverage 89.81% 89.64% -0.18%
==========================================
Files 88 88
Lines 23178 23277 +99
Branches 4313 4333 +20
==========================================
+ Hits 20818 20867 +49
- Misses 1628 1663 +35
- Partials 732 747 +15 ☔ View full report in Codecov by Sentry. |
… in load;load_cube/load_cubes.
f7ddb65
to
0f1bae3
Compare
0f1bae3
to
801f9e2
Compare
Relates to #5369 , #6165