This repository has been archived by the owner on Mar 20, 2023. It is now read-only.
Problem Description
This is a recent issue that I had not faced before. When I try to create a pool with the NC6 vm_size, pool creation fails with a "start task failed" error. The error from stderr.txt is included under Additional Logs below.
Batch Shipyard Version
Installed using git clone and install script
Steps to Reproduce
Create a pool of Standard NC6 size using the pool.yaml specified below.
Expected Results
The pool is created successfully.
Actual Results
Pool creation fails at the start task. stderr.txt contains the error shown in the Additional Logs section below.
Redacted Configuration
I had not originally included the gpu param in the pool.yaml configuration above, and I was still getting the same error.
Additional Logs
Warning: apt-key output should not be parsed (stdout is not a terminal)
Synchronizing state of docker.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install disable docker
WARNING: API is accessible on http://127.0.0.1:2375 without encryption.
Access to the remote API is equivalent to root access on the host. Refer
to the 'Docker daemon attack surface' section in the documentation for
more information: https://docs.docker.com/engine/security/security/#docker-daemon-attack-surface
WARNING: No swap limit support
rmmod: ERROR: Module nouveau is not currently loaded
WARNING: nvidia-installer was forced to guess the X library path '/usr/lib' and X module path '/usr/lib/xorg/modules'; these paths were not queryable from the system. If X fails to find the NVIDIA X driver module, please install the `pkg-config` utility and the X.Org SDK/development package for your distribution and reinstall the driver.
WARNING: Unable to find a suitable destination to install 32-bit compatibility libraries. Your system may not be set up for 32-bit compatibility. 32-bit compatibility files will not be installed; if you wish to install them, re-run the installation and set a valid directory with the --compat32-libdir option.
ERROR: Failed to run `/usr/sbin/dkms build -m nvidia -v 418.87.01 -k 5.3.0-1020-azure`:
Kernel preparation unnecessary for this kernel. Skipping...
Building module:
cleaning build area...
'make' -j6 NV_EXCLUDE_BUILD_MODULES='' KERNEL_UNAME=5.3.0-1020-azure IGNORE_CC_MISMATCH='' modules.....(bad exit status: 2)
ERROR (dkms apport): binary package for nvidia: 418.87.01 not found
Error! Bad return status for module build on kernel: 5.3.0-1020-azure (x86_64)
Consult /var/lib/dkms/nvidia/418.87.01/build/make.log for more information.
ERROR: Failed to install the kernel module through DKMS. No kernel module was installed; please try installing again without DKMS, or check the DKMS logs for more information.
ERROR: Installation has failed. Please see the file '/var/log/nvidia-installer.log' for details. You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.
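For triage, the driver version and kernel release that DKMS tried to pair can be pulled out of the failing command line in the log above. The following is only a minimal sketch: the helper name `parse_dkms_failure` and the regular expression are illustrative assumptions, not part of Batch Shipyard or DKMS.

```python
import re

# Hypothetical helper: matches the "dkms build -m <module> -v <version> -k <kernel>"
# command echoed by nvidia-installer in the error line above.
DKMS_BUILD_RE = re.compile(
    r"dkms build -m (?P<module>\S+) -v (?P<version>\S+) -k (?P<kernel>[^\s`]+)"
)

def parse_dkms_failure(line):
    """Return the module, driver version, and kernel release from a DKMS build error line, or None."""
    match = DKMS_BUILD_RE.search(line)
    return match.groupdict() if match else None

if __name__ == "__main__":
    log_line = (
        "ERROR: Failed to run `/usr/sbin/dkms build -m nvidia "
        "-v 418.87.01 -k 5.3.0-1020-azure`:"
    )
    print(parse_dkms_failure(log_line))
```

With the line from this report, the sketch yields module `nvidia`, version `418.87.01`, and kernel `5.3.0-1020-azure`, which is the pair whose module build failed.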
Some more important logs from the startup directory:
Warning: apt-key output should not be parsed (stdout is not a terminal)
Synchronizing state of docker.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install disable docker
WARNING: API is accessible on http://127.0.0.1:2375 without encryption.
Access to the remote API is equivalent to root access on the host. Refer
to the 'Docker daemon attack surface' section in the documentation for
more information: https://docs.docker.com/engine/security/security/#docker-daemon-attack-surface
WARNING: No swap limit support
rmmod: ERROR: Module nouveau is not currently loaded
./nvidia-driver_cc37.run: line 1: syntax error near unexpected token `newline'
./nvidia-driver_cc37.run: line 1: `!<arch>'
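The `!<arch>` token in the last two lines is the signature that begins a Unix `ar` archive (the container format used by `.deb` packages), which suggests that the file saved as `nvidia-driver_cc37.run` in this second log may not be a shell self-extractor at all. A minimal sketch for checking a downloaded file's leading bytes follows; the function name `classify_installer` is hypothetical and is not part of Batch Shipyard.

```python
# Signature that begins every Unix ar archive, including Debian .deb packages.
AR_MAGIC = b"!<arch>\n"

def classify_installer(path):
    """Return 'ar-archive', 'script', or 'unknown' based on the file's first bytes."""
    with open(path, "rb") as fh:
        head = fh.read(len(AR_MAGIC))
    if head == AR_MAGIC:
        return "ar-archive"
    if head.startswith(b"#!"):
        # A shebang line, as expected for a shell-based .run self-extractor.
        return "script"
    return "unknown"
```

On the affected node, calling `classify_installer("nvidia-driver_cc37.run")` would be expected to return `"ar-archive"` if the file really starts with `!<arch>`, as the shell error indicates.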
Additional Comments