[cuebot] Jobs without os set, will not dispatch #1591

Closed
lithorus opened this issue Nov 18, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@lithorus
Contributor

lithorus commented Nov 18, 2024

Describe the bug
If the os parameter is not set on a job, cuebot will not dispatch frames from that job.

Setting the str_os field in the database to a non-null value makes cuebot dispatch frames to rqd.

@lithorus lithorus added the bug Something isn't working label Nov 18, 2024
@lithorus lithorus changed the title Jobs without os set, will not dispatch [cuebot] Jobs without os set, will not dispatch Nov 18, 2024
@DiegoTavares
Collaborator

Fixed by #1590

@lithorus
Contributor Author

I have to disagree: this is not fixed by #1590. I already tested with that fix in place.

@lithorus
Contributor Author

This is what I get when cuebot tries to dispatch a job:

2024-11-20 22:11:47.945  INFO 16748 --- [pool-1-thread-1] c.i.spcue.dispatcher.CoreUnitDispatcher  : Frames found: 1 for host 192.168.31.160 652/10801152 on job testing-test-jimmy_samurai
2024-11-20 22:11:47.961  INFO 16748 --- [pool-1-thread-1] c.i.s.dispatcher.DispatchSupportService  : creating proc 192.168.31.160 for 0001-layer1
2024-11-20 22:11:47.978  INFO 16748 --- [pool-1-thread-1] c.i.spcue.dispatcher.CoreUnitDispatcher  : dispatchProcToJob failed booking proc 192.168.31.160/39c75ff3-df93-4e25-9203-03b3f91e392f on job testing-test-jimmy_samurai/94baa341-401a-4aaf-bce1-7dab31258b8c

com.imageworks.spcue.dispatcher.DispatcherException: 192.168.31.160 could not be booked on 0001-layer1, java.lang.NullPointerException
	at com.imageworks.spcue.dispatcher.DispatchSupportService.runFrame(DispatchSupportService.java:214) ~[main/:na]
	at com.imageworks.spcue.dispatcher.DispatchSupportService$$FastClassBySpringCGLIB$$39539eb5.invoke(<generated>) ~[main/:na]
	at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218) ~[spring-core-5.2.1.RELEASE.jar:5.2.1.RELEASE]
	at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:769) ~[spring-aop-5.2.1.RELEASE.jar:5.2.1.RELEASE]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163) ~[spring-aop-5.2.1.RELEASE.jar:5.2.1.RELEASE]
	at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:747) ~[spring-aop-5.2.1.RELEASE.jar:5.2.1.RELEASE]
	at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:366) ~[spring-tx-5.2.1.RELEASE.jar:5.2.1.RELEASE]
	at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:99) ~[spring-tx-5.2.1.RELEASE.jar:5.2.1.RELEASE]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) ~[spring-aop-5.2.1.RELEASE.jar:5.2.1.RELEASE]
	at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:747) ~[spring-aop-5.2.1.RELEASE.jar:5.2.1.RELEASE]
	at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:689) ~[spring-aop-5.2.1.RELEASE.jar:5.2.1.RELEASE]
	at com.imageworks.spcue.dispatcher.DispatchSupportService$$EnhancerBySpringCGLIB$$c48bb835.runFrame(<generated>) ~[main/:na]
	at com.imageworks.spcue.dispatcher.CoreUnitDispatcher.dispatch(CoreUnitDispatcher.java:392) ~[main/:na]
	at com.imageworks.spcue.dispatcher.CoreUnitDispatcher$1.wrapDispatchFrame(CoreUnitDispatcher.java:310) ~[main/:na]
	at com.imageworks.spcue.dispatcher.CoreUnitDispatcher$DispatchFrameTemplate.execute(CoreUnitDispatcher.java:483) ~[main/:na]
	at com.imageworks.spcue.dispatcher.CoreUnitDispatcher.dispatchHost(CoreUnitDispatcher.java:314) ~[main/:na]
	at com.imageworks.spcue.dispatcher.CoreUnitDispatcher.dispatchJobs(CoreUnitDispatcher.java:176) ~[main/:na]
	at com.imageworks.spcue.dispatcher.CoreUnitDispatcher.dispatchHost(CoreUnitDispatcher.java:235) ~[main/:na]
	at com.imageworks.spcue.dispatcher.commands.DispatchBookHost$1.wrapDispatchCommand(DispatchBookHost.java:106) ~[main/:na]
	at com.imageworks.spcue.dispatcher.commands.DispatchCommandTemplate.execute(DispatchCommandTemplate.java:36) ~[main/:na]
	at com.imageworks.spcue.dispatcher.commands.DispatchBookHost.run(DispatchBookHost.java:117) ~[main/:na]
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[na:na]
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[na:na]
	at java.base/java.lang.Thread.run(Thread.java:829) ~[na:na]

@lithorus
Contributor Author

lithorus commented Nov 20, 2024

I traced the call, and this is what I get:
cuebot/src/main/java/com/imageworks/spcue/dispatcher/DispatchSupportService.java
DispatchSupportService > runFrame > rqdClient.launchFrame(prepareRqdRunFrame(proc, frame), proc);

param_1 = {VirtualProc@10068} "192.168.31.160/7c133ad0-91bb-4a96-992e-90f4709bcdfb"
 hostId = "fcc88160-7cad-49de-997d-445dda14f1a3"
 allocationId = "00000000-0000-0000-0000-000000000000"
 frameId = "c84b22e3-bf1b-4ce9-af2f-0a3a205e26a9"
 hostName = "192.168.31.160"
 os = null
 childProcesses = null
 canHandleNegativeCoresRequest = true
 coresReserved = 100
 memoryReserved = 3354624
 memoryUsed = 0
 memoryMax = 0
 virtualMemoryUsed = 0
 virtualMemoryMax = 0
 gpusReserved = 0
 gpuMemoryReserved = 0
 gpuMemoryUsed = 0
 gpuMemoryMax = 0
 unbooked = false
 usageRecorded = false
 isLocalDispatch = false
 layerId = "86bf147f-3709-4398-80ef-d1c0f604a430"
 version = 0
 showId = "00000000-0000-0000-0000-000000000000"
 facilityId = "AAAAAAAA-AAAA-AAAA-AAAA-AAAAAAAAAAA1"
 jobId = "94baa341-401a-4aaf-bce1-7dab31258b8c"
 id = "7c133ad0-91bb-4a96-992e-90f4709bcdfb"
 name = "unknown"
param_2 = {DispatchFrame@10069} "0001-layer1/c84b22e3-bf1b-4ce9-af2f-0a3a205e26a9"
 retries = 0
 state = {FrameState@10086} "WAITING"
 show = "testing"
 shot = "test"
 owner = "jimmy"
 uid = {Optional@10090} "Optional[1000]"
 logDir = "/var/tmp//testing/test/logs/testing-test-jimmy_samurai--94baa341-401a-4aaf-bce1-7dab31258b8c"
 command = "python3 -c "import os;print(os.path.expanduser('~/test'))""
 range = "1-1"
 chunkSize = 1
 layerName = "layer1"
 jobName = "testing-test-jimmy_samurai"
 minCores = 100
 maxCores = 100
 threadable = false
 minGpus = 0
 maxGpus = 0
 minGpuMemory = 0
 services = "blender"
 os = null
 minMemory = 3354624
 softMemoryLimit = 3690086
 hardMemoryLimit = 4696473
 layerId = "86bf147f-3709-4398-80ef-d1c0f604a430"
 version = 8
 showId = "00000000-0000-0000-0000-000000000000"
 facilityId = "AAAAAAAA-AAAA-AAAA-AAAA-AAAAAAAAAAA1"
 jobId = "94baa341-401a-4aaf-bce1-7dab31258b8c"
 id = "c84b22e3-bf1b-4ce9-af2f-0a3a205e26a9"
 name = "0001-layer1"

Notice that os is null in both the VirtualProc and the DispatchFrame, while code later in the call expects it not to be null.
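
A check along these lines before the launch call would at least turn the bare NullPointerException into a readable message. This is only a sketch: `FrameOsCheck`, `resolveOs` and `requireOs` are hypothetical names, not cuebot classes, and the fallback order (frame first, then proc) is an assumption made for illustration.

```java
import java.util.Optional;

public class FrameOsCheck {

    // Hypothetical helper: pick the first non-null os, preferring the frame's
    // value over the proc's. The fallback order is an assumption for this sketch.
    static Optional<String> resolveOs(String frameOs, String procOs) {
        if (frameOs != null) {
            return Optional.of(frameOs);
        }
        return Optional.ofNullable(procOs);
    }

    // Fails with a descriptive message instead of a bare NullPointerException
    // when neither the frame nor the proc carries an os.
    static String requireOs(String frameName, String frameOs, String procOs) {
        return resolveOs(frameOs, procOs).orElseThrow(() ->
                new IllegalStateException("no os set for frame " + frameName));
    }

    public static void main(String[] args) {
        // One side has an os: the value is returned.
        System.out.println(requireOs("0001-layer1", null, "Linux"));
        // Both sides null, as in the dump above: a clear error is raised.
        try {
            requireOs("0001-layer1", null, null);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Whether a job without an os should be rejected like this or dispatched anyway is a separate decision; the sketch only makes the failure visible.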

@lithorus
Contributor Author

lithorus commented Nov 20, 2024

It fails in cuebot/src/compiled_protobuf/main/java/com/imageworks/spcue/grpc/rqd/RunFrame.java, in RunFrame > Builder:

    /**
     * <code>string os = 25;</code>
     * @param value The os to set.
     * @return This builder for chaining.
     */
    public Builder setOs(
        java.lang.String value) {
      if (value == null) {
        throw new NullPointerException();
      }

The setter expects a non-null string and throws a NullPointerException otherwise.
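
The failure mode can be reproduced outside cuebot with a small stand-in for the generated builder. This is a sketch under stated assumptions: `FakeRunFrameBuilder` and `setOsIfPresent` are hypothetical names, not cuebot or protobuf classes, and the only thing taken from the snippet above is the null check in `setOs`.

```java
public class SetOsNullSketch {

    // Hypothetical stand-in for the generated RunFrame.Builder; it copies only
    // the null check quoted above, nothing else from the real class.
    static final class FakeRunFrameBuilder {
        // Protobuf string fields read back as the empty string when unset.
        private String os = "";

        FakeRunFrameBuilder setOs(String value) {
            if (value == null) {
                throw new NullPointerException();
            }
            this.os = value;
            return this;
        }

        String getOs() {
            return os;
        }
    }

    // Hypothetical guard: only call setOs when a value is present, so a job
    // without an os leaves the field at its default instead of throwing.
    static FakeRunFrameBuilder setOsIfPresent(FakeRunFrameBuilder builder, String os) {
        if (os != null) {
            builder.setOs(os);
        }
        return builder;
    }

    // Returns true when passing the value straight to setOs throws an NPE.
    static boolean throwsOnDirectSet(String os) {
        try {
            new FakeRunFrameBuilder().setOs(os);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsOnDirectSet(null));
        System.out.println(setOsIfPresent(new FakeRunFrameBuilder(), null).getOs().isEmpty());
        System.out.println(setOsIfPresent(new FakeRunFrameBuilder(), "Linux").getOs());
    }
}
```

Skipping the setter is just one way to avoid the exception; it says nothing about how rqd would treat a frame with an empty os, so it is not necessarily the right fix.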

@DiegoTavares
Collaborator

(face palm) I'm sorry, I confused this issue with another one that was fixed by the mentioned PR. I'm reopening this.

@DiegoTavares DiegoTavares reopened this Nov 20, 2024
DiegoTavares pushed a commit that referenced this issue Nov 28, 2024
…1600)

**Link the Issue(s) this Pull Request is related to.**
#1591  

**Summarize your change.**
This is a different take on the MR #1594 which only changes the SQL
queries instead of the Java code.

2 participants