-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(draft): add report=
argument for uproot.dask
; trigger report collection (take 2!)
#1058
feat(draft): add report=
argument for uproot.dask
; trigger report collection (take 2!)
#1058
Conversation
In this PR the failure and success reports are completely defined and used here in uproot, specifically in the As a POC I've made the fields
|
Here's a quick script that I've been using to check things! from coffea.nanoevents import NanoEventsFactory, NanoAODSchema
#from distributed import Client
import dask
#import dask_awkward as dak
if __name__ == "__main__":
#client = Client()
#dask.config.set({"awkward.optimization.enabled": True, "awkward.raise-failed-meta": True, "awkward.optimization.on-fail": "raise"})
events, report = NanoEventsFactory.from_root(
["https://github.com/CoffeaTeam/coffea/raw/master/tests/samples/nano_dy.root:Events", "/not/actually/a/root/file.root:Events"],
metadata={"dataset": "nano_dy"},
schemaclass=NanoAODSchema,
uproot_options={"allow_read_errors_with_report": True}
).events()
pt, creport = dask.compute(events.Muon.pt, report) (edited to follow @jpivarski's review comments) |
Though with the latest report and the above script I get:
Whereas the last report version completed just fine. |
src/uproot/_dask.py
Outdated
def time_it(f): | ||
@functools.wraps(f) | ||
def wrapper(*args, **kwargs): | ||
start = time.time() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use time.monotonic() for start and stop!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the time package has some odd aspects. time.time() can't be used with time.monotonic(), they measure from different starting points, the latter to ensure consistency.
So need time.time() for the start time since epoch and then a consistent duration is formed from pairs of time.monotonic().
Annoying, right?
The latest whoops commits gives me:
So somehow it's not catching the FileNotFound/OSError correctly? |
I think the exception is getting raised before dask-awkward gets to the place where it allows exceptions (at the I'm still digging/working on this! |
Some good news here though: In [11]: import uproot
...: import dask
...: files = ["/Users/ddavis/software/repos/coffea/tests/samples/nano_dy.root:Events"]
...: thing, report = uproot.dask(files, report=True)
...: a, b = dask.compute(thing, report)
...:
...:
In [12]: a
Out[12]: <Array [{run: 1, ...}, ..., {run: 1, ...}] type='40 * {run: uint32, luminos...'>
In [13]: b.tolist()
Out[13]:
[{'duration': -1701769030.9317346,
'args': ["<TTree 'Events' (1499 branches) at 0x000176803c10>", '0', '40'],
'kwargs': [],
'exception': None,
'message': None,
'fqdn': None,
'hostname': None}] |
I use open_files=False in uproot.dask so it shouldn't be trying to open anything! |
Ah you're right. I didnt catch the whole traceback :) there is some uproot code getting used in |
import uproot
import dask
if __name__ == "__main__":
events, report = uproot.dask(
["tests/samples/nano_dy.root:Events", "/not/actually/a/root/file.root:Events"],
allow_read_errors_with_report=True,
open_files=False,
)
pt, creport = dask.compute(events.Muon_pt, report) Here's a non-coffea script. (edit for comments from @jpivarski's review) |
OK just got the coffea-using script to work. |
Co-authored-by: Lindsey Gray <lindsey.gray@gmail.com>
commented in that review thread but putting it here: call_time = time.time_ns() # integer time since epoch, convertible to a datetime
start = time.monotonic() # since start of program for calibration purposes
call(stuff)
stop = time.monotonic()
return call_time, stop - start |
ad45af6
to
582119b
Compare
and, more or less, we care about call time if it fails and duration if it succeeds! |
very nice:
|
OK I think this is in a pretty good place now |
Yep - this seems to be in the right place! |
Need to polish the PR in dask-awkward and that'll be in the next dask-awkward release |
@btovar could you try this with taskvine to make sure the report works with very different backends? |
@lgray will do! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@douglasdavis gave me a walkthrough of this PR in our meeting, and I think it looks good, with just a few changes before merging.
- For the
uproot.dask
user, let's change the name fromreport: bool = False
toallow_read_errors_with_report: bool = False
. Internally, the argument can be passed around asreport: bool
. This is a long argument name, but I think that extra context is needed, and not just in a docstring. This makes user scripts more self-documenting. - The
args
andkwargs
fields in the report are too generic: the arguments in question here are internal within Uproot, but "args
andkwargs
" suggests that they're user-visible arguments, which can be confusing. They're generic arguments to dask-awkward, but in Uproot, when making the report, they can be namedfile_path
,object_path
,entry_start
,entry_stop
, etc., which are the user-visible names for these things.
Other than that, we're just waiting for @btovar's test, and then this is done and can be merged. It will go into 5.2.0 on Wednesday.
Currently I'm getting:
versions: awkward-2.5.1rc1 coffea-2023.12.0rc0 dask-awkward-2023.12.0 I get that error even with the example above without coffea. Let me try with an env just with this uproot commit. import uproot
import dask
if __name__ == "__main__":
events, report = uproot.dask(
["tests/samples/nano_dy.root:Events", "/not/actually/a/root/file.root:Events"],
report=True,
open_files=False,
)
pt, creport = dask.compute(events.Muon_pt, report) |
You'll need to checkout the branch @ dask-contrib/dask-awkward#433 |
I'm still getting the same error from a clean env. I'm checking out the given commits for uproot5 (cbad36e) and dask-awkward (d64f8a5aacf1f995c7aad5a380b9b7e406fd0194), and doing a
|
Ah, latest uproot commit requires - report=True,
+ allow_read_errors_with_report=True, in So coffea would need |
Yeah my bad for not updating the stuff in the comments above, have been traveling today - done now. |
Thanks for your help, it seems to be working now. I tried both with local and remote taskvine workers. |
I've added a simple test and the upstream feature is now available in the latest version of dask-awkward ( |
Supersedes #1050
This again boils down to adding a
report=
argument, but now the implementation to actually build the report is significant changed and based on PR dask-contrib/dask-awkward#433 in dask-awkward instead of the combination of #1050 + dask-contrib/dask-awkward#415@lgray if you could take this for a spin on something that you know will potentially have
OSError
raised, that would be great!