Maximum amount of pairs and batch processing #235
The behavior remains the same when the short circuits are not included. Could it be related to the number of habitat patches/points?
Parallel processing issue likely related to #165
I would strongly recommend looking into Omniscape for this problem. You might consider reading up about it here, and check out the docs for the Omniscape.jl Julia package here. You will likely still need a supercomputer for a problem of your size.
As I pointed out in email, this is a huge amount of compute on extremely large grids. I think looking at Omniscape is certainly a good idea.
Thank you for the recommendations! The issues are not related to grid size, as the same issues persist on small subsamples (as in the linked partial_raster.zip). However, I am testing Omniscape on the supercomputer cluster at the moment!
Are you seeing a specific pair being written that should not be? Note that Circuitscape states the total number of pairs regardless of "include" status, and AFAIK it skips any pairs that are not in the include file, so it might say solving pair 3 of 5000000, but that does not mean it will solve all 5000000 pairs. Is that right @ranjanan?
You also have
If I remember correctly, I tested this with only 5 points and it was not restricting the computation to the given points only.
What OS, Julia version, and Circuitscape version are you using?
Linux, Julia 1.4.0, and Circuitscape 5.5.5 (however, in the .out file it states version = 5.0.0). I update Circuitscape each time in my script on the supercomputer. I tested again and, as stated before, even with only 5 pairs in the pairs file it keeps computing all pairs, overwriting output, and not running in parallel. The pairs file starts with mode include, and the log begins with:
Info: 2020-06-02 20:08:46 : Logs will recorded to file: log_file
Found a small mistake in the test case. I have to try some things again to be sure of the issue.
Okay, good to know. Some additional info below:
- To update the package you will need to run `Pkg.update("Circuitscape")` if an old version of Circuitscape is already installed.
- It looks like you still have multiple pixels with the same value in your pairs file, so that will prevent parallel processing from working; see #232 (comment). I'm not sure if this also affects whether or not you can use included pairs. Will need @ranjanan to confirm.
- As for the overwriting output issue, I'm not sure what is going on there. To clarify for the thread, essentially output files like
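For reference, a minimal sketch of that update step in the Julia REPL (assuming Circuitscape was already added to the active environment):

```julia
using Pkg
Pkg.update("Circuitscape")   # pull the latest registered release
Pkg.status()                 # confirm which Circuitscape version is now installed
```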
@vlandau Thank you for all the assistance. I got a more or less working example at 1000 m resolution and with a subset of pairs. However, there are still a lot of things puzzling me.
Any guidance would be helpful because I have a lot more scenarios to test!
Ok, so I made this work by doing the following:
Happy it worked out, but it seems some things could be improved in the functionality for large pair datasets, which is a necessity for analysing which patches will be reachable in SDM modelling of future climates.
The next step for Circuitscape's parallel processing (which is currently being worked on) is to use multithreading instead of distributed parallel processing, which could help with the overhead problems you're experiencing. But because this is in the works, we don't want to put too much effort into the current parallel processing framework, since it is going to be changed soon. I think the idea of processing in batches is a good one, and I'm glad that worked out. You should be able to do single solves on grids much larger than 15 million pixels fairly easily, but if you have tens of thousands of pairs to solve, yeah, it would take a lot longer.
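For readers who want to reproduce the batching approach, here is a minimal sketch (not an official Circuitscape feature): split a "mode include" pairs file into chunks, write one ini per chunk, and solve each chunk separately so results land in separate output files. The template.ini, file names, and default batch size are assumptions; included_pairs_file and output_file are the standard Circuitscape option names.

```julia
using Circuitscape   # provides compute(ini_path)

function run_in_batches(template_ini::AbstractString, pairs_file::AbstractString;
                        batch_size::Integer = 400)
    lines  = readlines(pairs_file)
    header = lines[1]                  # the "mode include" line
    pairs  = lines[2:end]              # one "id id" pair per line
    template = read(template_ini, String)

    for (i, chunk) in enumerate(Iterators.partition(pairs, batch_size))
        # Write this batch's pairs file
        batch_pairs = "pairs_batch_$(i).txt"
        write(batch_pairs, join(vcat(header, collect(chunk)), '\n'))

        # Point this batch at its own pairs file and its own output prefix
        ini = replace(template, r"included_pairs_file\s*=.*" =>
                                "included_pairs_file = $(batch_pairs)")
        ini = replace(ini, r"output_file\s*=.*" =>
                           "output_file = output_batch_$(i)")
        batch_ini = "batch_$(i).ini"
        write(batch_ini, ini)

        compute(batch_ini)             # solve this batch
    end
end

run_in_batches("template.ini", "pairs.txt"; batch_size = 500)
```

Because each batch writes to its own output_file, this also sidesteps the overwriting problem described elsewhere in this thread.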
Glad to hear about the multithreading! Also looking forward to the .tif functionality. A simple statement in the ini file to set the batch size of solves would fix a lot of the other problems I was experiencing. It would be good to know whether Circuitscape is able to handle Int64 centroids, because due to the interacting problems I am currently unsure. You can close the issue if you wish! Thank you! With kind regards,
There is a
I was using `use_64_bit_indexing`, so I am not sure why I get the following error when using more than 400 pairs (the same error occurs using CG+AMG), and thus when the number of possible combinations exceeds 32767 (the Int16 maximum) or when more than 32767 different values are present in the raster.
ERROR: ArgumentError: dense matrix construction failed for unknown reasons. Please submit a bug report.
When I use tif I get the following error:
ERROR: Base.InvalidCharError{Char}('\xac')
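Side note: 32767 is the maximum value of a signed 16-bit integer, which can be checked directly in the Julia REPL (whether that limit is actually what triggers the error above is an assumption):

```julia
julia> typemax(Int16)   # the 32767 threshold observed above
32767

julia> typemax(Int32)
2147483647
```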
The first error seems to be triggered by the SuiteSparse package (cc @ranjanan). The second error suggests that you're not using the latest master. Running
Dear @vlandau, you are right. Installing the master version solved the .tif problem!
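For anyone following along, installing the development version from the master branch can be done like this (a minimal sketch):

```julia
using Pkg
# Replace the registered release with the current master branch
Pkg.add(PackageSpec(name = "Circuitscape", rev = "master"))
```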
Dear @ViralBShah, thank you for the information! I will try to make it work like that. Thank you for all the help!
[ Info: 2020-06-16 19:54:46 : Logs will recorded to file: log_file
ERROR: LoadError: ArgumentError: dense matrix construction failed for unknown reasons. Please submit a bug report.
I am going to open a new ticket with just this error, because this bug is at the root of all my problems.
It would be best if you could provide the smallest example files that can reproduce the bug.
Dear Julia and Circuitscape associates,
When using a pairs file with (and without) short circuits, the pairs (and parallel arguments) are ignored. When the pairs file is included, Circuitscape starts to solve all possible pairs (5601952476) based on 105848 habitat patches (or centroids). An example subset to reproduce the problem can be found below.
The pairs file is formatted as:
mode include
4 2
7 2
6 3
1 5
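For reference, a pairs file in this format only takes effect when included pairs are enabled in the run configuration. A minimal sketch in Julia is shown below; config.ini is a placeholder, and use_included_pairs / included_pairs_file are the standard Circuitscape option names.

```julia
using Circuitscape

# Relevant lines in the (placeholder) config.ini:
#   scenario            = pairwise
#   use_included_pairs  = True
#   included_pairs_file = pairs.txt   # the "mode include" file shown above
compute("config.ini")   # Circuitscape.jl's main entry point
```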
Not running in parallel
Also, when I use the short circuit file combined with the centroid point file, the parallel processing doesn't seem to work:
[ Info: 2020-05-26 12:07:38 : Logs will recorded to file: log_file
[ Info: 2020-05-26 12:07:38 : Precision used: Double
[ Info: 2020-05-26 12:07:43 : Reading maps
[ Info: 2020-05-26 12:07:49 : Resistance/Conductance map has 10893750 nodes
[ Info: 2020-05-26 12:08:01 : Total number of pair solves = 10743930
[ Info: 2020-05-26 12:08:01 : Solving pair 1 of 10743930
[ Info: 2020-05-26 12:08:12 : Solver used: CHOLMOD
[ Info: 2020-05-26 12:08:12 : Graph has 10893705 nodes, 2 focal points and 1 connected components
[ Info: 2020-05-26 12:09:04 : Time taken to construct cholesky factor = 50.789335301
[ Info: 2020-05-26 12:09:06 : Time taken to construct local nodemap = 1.766986406 seconds
[ Info: 2020-05-26 12:09:06 : Solving points 1 to 1
[ Info: 2020-05-26 12:09:10 : Solving pair 2 of 10743930
Output overwrites itself
While it is solving sequentially, it is also overwriting its own output, so only the resistances for one pair stay visible.
Output after 5 min:
0.0 5.0 14.0
5.0 0.0 0.0011597292571570218
14.0 0.0011597292571570218 0.0
Output after 10 min:
0.0 5.0 17.0
5.0 0.0 0.0011460580515374435
17.0 0.0011460580515374435 0.0
So, currently it only solves pairwise comparisons involving habitat patch 5 and overwrites its own output.
Extra info:
I successfully calibrated this resistance map with a genetic optimisation algorithm (400 scenarios and 400 pairs), but now I want to measure resistances between all habitat patches in Europe. I made a simplified problem (downsampled to 400 m resolution) with 127093750 nodes and 105848 habitat patches. If possible, I would like to calculate my final problem on 2033500000 nodes and approx. 436945 habitat patches for 16 scenarios.
I want to use polygons (asc) as short circuits with a centroid point file (asc) and an additional pairs file (txt). I tested the script with and without asc/tif, CG+AMG/CHOLMOD, single/parallel, with and without short circuits, and 32/64-bit indexing. Normally I write my patches to an asc grid with int2s, but currently there are more patch IDs than this integer type can handle. Therefore the asc grid was created using the int4s integer type (https://www.rdocumentation.org/packages/raster/versions/3.1-5/topics/dataType) and NODATA values were forced to -9999. The final habitat file has 105848 habitat patches, each with a unique ID, and the pairs file has 79829 unique pairs. Am I missing something that could cause this behavior?
All asc rasters line up with the following dimensions: dimensions: 10375, 8750, 90781250 (nrow, ncol, ncell); extent: 2500000, 6000000, 1350000, 5500000 (xmin, xmax, ymin, ymax).
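As a quick sanity check (a sketch) that the patch-ID raster survived the trip from R with the int4s data type, one can read the .asc in Julia, count the unique IDs, and confirm the maximum fits the intended integer type. The file name patches.asc and the standard 6-line ESRI ASCII header are assumptions here.

```julia
using DelimitedFiles

vals = readdlm("patches.asc", skipstart = 6)        # skip the ESRI ASCII header
ids  = filter(v -> v != -9999, unique(vec(vals)))   # drop NODATA cells
println("unique patch IDs: ", length(ids))
println("max patch ID: ", maximum(ids), " (Int16 max = ", typemax(Int16), ")")
```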
Thank you for your insights!