I've been trying to send a steady stream of up to 50 concurrent jobs through a large all-purpose driver node (in Azure Databricks). However, the jobs come back with unexpected durations (overall time from start to finish). I'm wondering if .NET for Spark is synchronizing them in some artificial way, or imposing some concurrency restriction. I'm using v1.1.1. I noticed the following message, which doesn't make much sense to me. I will investigate the code as well, but thought I would ask in case anyone knows what it means.
This message may be the reason for the unexpected durations of my jobs. Any help would be appreciated.
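For reference, the submission pattern I'm using looks roughly like the sketch below. The `Range`/`Count` body is only a placeholder for my real workload; the point is that up to 50 jobs are fanned out onto thread-pool tasks against one shared session:

```csharp
using System.Threading.Tasks;
using Microsoft.Spark.Sql;

class ConcurrentJobs
{
    static void Main()
    {
        // One shared session on the Databricks driver.
        SparkSession spark = SparkSession.Builder().GetOrCreate();

        // Fan out up to 50 jobs; each task triggers a Spark action.
        var tasks = new Task[50];
        for (int i = 0; i < tasks.Length; i++)
        {
            tasks[i] = Task.Run(() =>
                // Placeholder action standing in for the real job body.
                spark.Range(0, 1_000_000).Count());
        }
        Task.WaitAll(tasks);
    }
}
```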
-
I found the environment variable that controls the message above: it is "DOTNET_SPARK_NUM_BACKEND_THREADS", read in ConfigurationService. Oddly enough, a different environment variable is used on the Scala side:
Is this intentional? Here are logs from the .NET side and the JVM side, with seemingly contradictory information:
Any pointers would be appreciated. I think I will start setting both environment variables for good measure (sketched below).
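In case it helps anyone, this is roughly what I mean by setting both. It's a sketch only: on Databricks these values would more likely be set at the cluster level (Advanced Options > Environment Variables) so that both processes see them, and the Scala-side name below is a placeholder since I haven't confirmed what the JVM actually reads:

```csharp
using System;

// Set as early as possible in the driver process, before the SparkSession
// is created. The .NET-side name is the one found in ConfigurationService.
Environment.SetEnvironmentVariable("DOTNET_SPARK_NUM_BACKEND_THREADS", "50");

// Placeholder: substitute the variable name the Scala side actually reads.
// Environment.SetEnvironmentVariable("<SCALA_SIDE_VARIABLE>", "50");
```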