Dask improvements #208
I have also noticed some unexpected failures with Dask TimeOut/Heartbeat errors. In some instances this is actually after the final chunk appears to have been written to file.
job.err.txt
@rosepearson is it making the whole code crash? Dask can be a bit verbose with errors during the tear-down procedure of the Dask cluster... although in my experience this doesn't cause any real issue (apart from the logs reporting errors).
@jennan it is making the whole Cylc task crash. I have another one below where we hit the time limit - but again the file had been completely written out. It could be a coincidence that the file just finished writing before the time limit was reached and it didn't have time to execute the print "Job Succeeded"... but it seems to me like there might be something slightly fishy. Just attaching as a reference for now - not expecting any action on this ticket right now :)
job.err.txt
Noting another repeated error that occurs for a particular roughness run. It repeatedly fails when launched on Maui through Cylc, but it has completed and produced the expected output, i.e. it appears to be a false failure. I have checked running on the NIWA Maui nodes without SLURM, accessed through https://jupyter.maui.niwa.co.nz/, and it runs fine there. This might be a good first place to look regarding seemingly random errors after the job has run successfully. Tile 2728, roughness stage, takes ~15 min.
One more note about the failures I'm experiencing; they are often related, and one is copied below. I am wondering if we should restructure how we deal with upsampling the coarse DEM so that it is done directly, instead of breaking it into chunks and processing each chunk individually. This would mean many fewer explicit dask.delayed calls and would leave it up to Dask exactly how it chooses to manage the compute load. It would also mean the upsampling is all done with linear interpolation, but that seems sensible anyway. A rough sketch of this idea is below.
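A minimal sketch of what that restructure could look like, assuming the coarse DEM and the dense target grid are available as rasters (the file names and chunk sizes here are hypothetical, not the current GeoFabrics code):

```python
import rioxarray  # registers the .rio accessor on xarray objects

# Hypothetical inputs: the coarse DEM and a raster defining the dense target grid.
coarse_dem = rioxarray.open_rasterio("coarse_dem.tif", chunks={"x": 1024, "y": 1024})
dense_grid = rioxarray.open_rasterio("dense_dem.tif", chunks={"x": 1024, "y": 1024})

# One lazy interp call over the whole extent, instead of looping over explicit
# dask.delayed chunks; Dask decides how to schedule the work.
upsampled = coarse_dem.interp(x=dense_grid.x, y=dense_grid.y, method="linear")
upsampled = upsampled.rio.write_crs(coarse_dem.rio.crs)

# The compute only happens when the result is written out.
upsampled.rio.to_raster("upsampled_dem.tif")
```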
Summary
Looking back through my notes/comments:
I've also been thinking about how I break up the coarse DEM stage into a bunch of explicit chunks, and wonder if that would be best to leave up to Dask.
Old relevant issues
Update after Consultancy 1
Most recent commit associated with this comment: ae30061
I've been looking into xarray.interp with chunking. I've come across the following cases:
I've also tried various ways to force/encourage dask in the interp call:
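For context, a minimal sketch of one way to keep the interp call dask-backed (the array shapes, chunk sizes, and names are made up; whether interp stays lazy depends on the xarray/Dask versions in use):

```python
import dask
import numpy as np
import xarray

# Hypothetical coarse DEM as a chunked (dask-backed) DataArray.
coarse = xarray.DataArray(
    np.random.rand(500, 500),
    dims=("y", "x"),
    coords={"y": np.arange(500.0), "x": np.arange(500.0)},
).chunk({"y": 250, "x": 250})

fine_x = np.linspace(0.0, 499.0, 2000)
fine_y = np.linspace(0.0, 499.0, 2000)

# Some xarray versions do not support interpolating along a chunked dimension;
# rechunking those dimensions to a single chunk first avoids that limitation.
# coarse = coarse.chunk({"y": -1, "x": -1})

upsampled = coarse.interp(x=fine_x, y=fine_y, method="linear")
print(type(upsampled.data))  # confirm the result is still a dask array

with dask.config.set(scheduler="threads"):
    result = upsampled.compute()
```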
Questions
@jennan thanks for your notebook. I've implemented it with a few minor tweaks to deal with the standard map layout (y, x with y decreasing). There was also a seemingly odd issue where dask.array.map_blocks needs the second array to have a chunk size of its length or smaller - odd, as the first array is automatically given a chunk of its size if the specified chunk size is greater. No action required - just an update.
Test profiling for tests/test_dem_generation_westport_4/test_dem_generation_westport_4.py
Testing on a previously failing larger example: it ran successfully and produced the netCDF file, which is new. It did fail with a
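To illustrate the map_blocks chunking point with a toy example (the arrays and the per-block function below are made up, not the notebook code):

```python
import numpy as np
import dask.array as da

values = np.random.rand(1000)
weights = np.random.rand(1000)

# Keep the requested chunk size no larger than each array's length so both
# inputs end up with matching block structures for map_blocks.
values_da = da.from_array(values, chunks=250)
weights_da = da.from_array(weights, chunks=250)

def per_block(values_block, weights_block):
    # Applied independently to each pair of corresponding blocks.
    return values_block * weights_block

result = da.map_blocks(per_block, values_da, weights_da, dtype=float)
print(result.compute().shape)
```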
Update from Consultancy 2
Identified test cases
Separate bugfix to fix regression errors
Next consultancy
Error logs
Consultancy 3
Tasks for Rose
This is an issue for an upcoming NeSI consultancy with @jennan. The focus will be on improving the performance and stability of GeoFabrics for larger scale problems.
Focus on making better use of Dask throughout the GeoFabrics stages. Two identified areas are:
RasterArray.interpolate_na
Profiling has shown that the pinch points are quite different for larger-scale problems than for smaller ones. Take the two profiles below:
1. A 6 min problem with all geofabrics stages (small_2m_res.html).
2. A 4 hr problem with all geofabrics stages (small_1m_res.html). The only difference from 1 is that it is at 1 m resolution instead of 2 m.
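For reference, a minimal example of the sort of interpolate_na call identified above (the file name and chunk sizes are hypothetical; exact behaviour depends on the rioxarray version installed):

```python
import rioxarray  # registers the .rio accessor

# Hypothetical dense DEM with nodata gaps to fill.
dem = rioxarray.open_rasterio("dense_dem.tif", chunks={"x": 1024, "y": 1024})

# rioxarray's interpolate_na fills nodata cells; for large rasters this call
# can be expensive, which is why it is flagged as an area of focus here.
filled = dem.rio.interpolate_na(method="linear")
```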
RasterArray.clip
Another area of focus (although it hasn't shown up as an issue in the 1 m profiling, it is visible in the 2 m) is RasterArray.clip.
This could make use of either pandas or dask-geopandas. A weak attempt that I didn't get off the ground can be seen as a comment in processor.py.
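A hedged sketch of what the dask-geopandas route might look like (the file names, layer contents, and partition count are hypothetical; this is not the processor.py attempt):

```python
import dask_geopandas
import geopandas as gpd
import rioxarray

# Hypothetical inputs: the dense DEM and the geometries used for clipping.
dem = rioxarray.open_rasterio("dense_dem.tif", chunks={"x": 1024, "y": 1024})
geometries = gpd.read_file("clip_geometries.gpkg")

# The rioxarray-style clip presumably behind RasterArray.clip.
clipped = dem.rio.clip(geometries.geometry, crs=geometries.crs)

# If the vector side is the slow part, dask-geopandas can partition it so
# per-geometry preprocessing (e.g. buffering) runs in parallel.
geometries_dask = dask_geopandas.from_geopandas(geometries, npartitions=8)
buffered = geometries_dask.geometry.buffer(10.0).compute()
```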
It is also worth noting that we may be able to do this more directly using a rolling(...).min() call on the xarray DataArray with an appropriately sized window.
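A quick sketch of that rolling-minimum idea (the window size is a placeholder and would need to be chosen to match what the clip currently achieves):

```python
import numpy as np
import xarray

# Hypothetical DEM-like DataArray.
dem = xarray.DataArray(
    np.random.rand(400, 400),
    dims=("y", "x"),
    coords={"y": np.arange(400.0), "x": np.arange(400.0)},
)

# Rolling minimum over a square window; min_periods=1 keeps the edges populated.
window = 5  # placeholder window size
local_min = dem.rolling(y=window, x=window, center=True, min_periods=1).min()
```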