remote-desktop module should provision remote chromoting tools #1019

Closed
proppy opened this issue Mar 13, 2023 · 7 comments

@proppy
Member

proppy commented Mar 13, 2023

Describe the bug

After deploying https://github.com/GoogleCloudPlatform/hpc-toolkit/tree/main/community/modules/remote-desktop/chrome-remote-desktop, ssh'ing into the VM instance, and running the chromoting setup command snippet from the provided instructions, the chromoting tools do not appear to be installed on the system.

Steps to reproduce

https://github.com/GoogleCloudPlatform/hpc-toolkit/blob/main/community/modules/remote-desktop/chrome-remote-desktop/README.md#setting-up-the-remote-desktop

Expected behavior

The setup process continues and asks the user to enter a PIN.

Actual behavior

The chromoting setup command snippet fails with the following error:

-bash: /opt/google/chrome-remote-desktop/start-host: No such file or directory

Version (ghpc --version)

hpc-toolkit 👺 ./ghpc --version
ghpc version - not built from official release
Built from 'develop' branch.
Commit info: v1.14.0-84-g907d0e7d

Blueprint

blueprint_name: remote-desktop

vars:
  project_id: catx-demo-radlab
  deployment_name: radlab-remote-desktop
  region: us-central1
  zone: us-central1-c

deployment_groups:
- group: primary
  modules:
  - id: network1
    source: modules/network/vpc

  - id: remote-desktop
    source: community/modules/remote-desktop/chrome-remote-desktop
    use: [network1]
    settings:
      install_nvidia_driver: true

Expanded Blueprint

If applicable, please attach or paste the expanded blueprint. The expanded blueprint can be obtained by running ghpc expand your-blueprint.yaml.

Disregard if the bug occurs when running ghpc expand ... as well.

blueprint_name: remote-desktop
validators:
  - validator: test_module_not_used
    inputs: {}
  - validator: test_project_exists
    inputs:
      project_id: ((var.project_id))
  - validator: test_apis_enabled
    inputs: {}
  - validator: test_region_exists
    inputs:
      project_id: ((var.project_id))
      region: ((var.region))
  - validator: test_zone_exists
    inputs:
      project_id: ((var.project_id))
      zone: ((var.zone))
  - validator: test_zone_in_region
    inputs:
      project_id: ((var.project_id))
      region: ((var.region))
      zone: ((var.zone))
validation_level: 1
vars:
  deployment_name: radlab-remote-desktop
  labels:
    ghpc_blueprint: remote-desktop
    ghpc_deployment: radlab-remote-desktop
  project_id: catx-demo-radlab
  region: us-central1
  zone: us-central1-c
deployment_groups:
  - group: primary
    terraform_backend:
      type: ""
      configuration: {}
    modules:
      - source: modules/network/vpc
        kind: terraform
        id: network1
        modulename: ""
        use: []
        wrapsettingswith: {}
        settings:
          deployment_name: ((var.deployment_name))
          project_id: ((var.project_id))
          region: ((var.region))
        required_apis:
          ((var.project_id)):
            - compute.googleapis.com
      - source: community/modules/remote-desktop/chrome-remote-desktop
        kind: terraform
        id: remote-desktop
        modulename: ""
        use:
          - network1
        wrapsettingswith: {}
        settings:
          deployment_name: ((var.deployment_name))
          install_nvidia_driver: true
          labels:
            ghpc_role: remote-desktop
          network_self_link: ((module.network1.network_self_link))
          project_id: ((var.project_id))
          region: ((var.region))
          subnetwork_self_link: ((module.network1.subnetwork_self_link))
          zone: ((var.zone))
        required_apis:
          ((var.project_id)): []
    kind: terraform
terraform_backend_defaults:
  type: ""
  configuration: {}

Execution environment

  • OS: Linux proppy0 5.19.11-1rodete1-amd64 #1 SMP PREEMPT_DYNAMIC Debian 5.19.11-1rodete1 (2022-10-31) x86_64 GNU/Linux
  • Shell (To find this, run ps -p $$): 82709 pts/2 00:00:00 bash
  • go version: go version go1.20.1 linux/amd64

Additional context

This can be easily worked around by installing https://dl.google.com/linux/direct/chrome-remote-desktop_current_amd64.deb after ssh'ing into the VM and before running the chromoting setup command snippet.
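
The workaround can be written as a small shell function. This is a minimal sketch, assuming a Debian or Ubuntu image where `wget`, `sudo`, and GNU `mktemp` are available; the helper name `install_chrome_remote_desktop` and the variable `CRD_DEB_URL` are illustrative and not part of the module.

```shell
# Package URL taken from this issue.
CRD_DEB_URL="https://dl.google.com/linux/direct/chrome-remote-desktop_current_amd64.deb"

# Hypothetical helper: download the package to a temporary file and
# install it with apt-get, which also pulls in the package's dependencies.
install_chrome_remote_desktop() {
  local deb
  deb="$(mktemp --suffix=.deb)" || return 1
  wget --quiet --output-document="$deb" "$CRD_DEB_URL" || return 1
  # Make the file readable so apt's unprivileged download user can access it.
  chmod 644 "$deb"
  sudo apt-get install --assume-yes "$deb"
}

# Usage on the VM (not run here):
# install_chrome_remote_desktop
```

Once the install succeeds, /opt/google/chrome-remote-desktop/start-host should be present and the setup snippet from the README can be run again.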

@proppy added the bug label Mar 13, 2023
@proppy
Member Author

proppy commented Mar 13, 2023

This seems like a bug in the setup process since installing the chromoting tool seems to be part of the startup scripts of the module:
https://github.com/GoogleCloudPlatform/hpc-toolkit/blob/e7c0c242ddcaf5359c28a74b4a44e69fa07b42a7/community/modules/remote-desktop/chrome-remote-desktop/scripts/configure-chrome-desktop.yml#L28-L38

@proppy
Member Author

proppy commented Mar 13, 2023

There seems to be an apt/dpkg locking conflict in the startup script:

Mar 13 14:03:27 radlab-remote-desktop-0 google_metadata_script_runner[1570]: startup-script: Mon Mar 13 14:03:27 +0000 2023 Info [1778]: === start executing runner: configure-grid-drivers.yml ===
Mar 13 14:03:27 radlab-remote-desktop-0 systemd[1]: Started Daemon for generating UUIDs.
Mar 13 14:03:28 radlab-remote-desktop-0 google_metadata_script_runner[1570]: startup-script:
Mar 13 14:03:28 radlab-remote-desktop-0 google_metadata_script_runner[1570]: startup-script: PLAY [Ensure nvidia grid drivers and other binaries are installed] *************
Mar 13 14:03:28 radlab-remote-desktop-0 google_metadata_script_runner[1570]: startup-script:
Mar 13 14:03:28 radlab-remote-desktop-0 google_metadata_script_runner[1570]: startup-script: TASK [Gathering Facts] *********************************************************
Mar 13 14:03:28 radlab-remote-desktop-0 dbus-daemon[635]: [system] Reloaded configuration
Mar 13 14:03:28 radlab-remote-desktop-0 dbus-daemon[635]: message repeated 4 times: [ [system] Reloaded configuration]
Mar 13 14:03:29 radlab-remote-desktop-0 google_metadata_script_runner[1570]: startup-script: ok: [localhost]
Mar 13 14:03:29 radlab-remote-desktop-0 google_metadata_script_runner[1570]: startup-script:
Mar 13 14:03:29 radlab-remote-desktop-0 google_metadata_script_runner[1570]: startup-script: TASK [Get kernel release] ******************************************************
Mar 13 14:03:29 radlab-remote-desktop-0 google_metadata_script_runner[1570]: startup-script: ok: [localhost]
Mar 13 14:03:29 radlab-remote-desktop-0 google_metadata_script_runner[1570]: startup-script:
Mar 13 14:03:29 radlab-remote-desktop-0 google_metadata_script_runner[1570]: startup-script: TASK [Install binaries for GRID drivers] ***************************************
Mar 13 14:03:31 radlab-remote-desktop-0 systemd[1]: Starting Update APT News...
Mar 13 14:03:31 radlab-remote-desktop-0 systemd[1]: Starting Update the local ESM caches...
Mar 13 14:03:31 radlab-remote-desktop-0 systemd[1]: apt-news.service: Deactivated successfully.
Mar 13 14:03:31 radlab-remote-desktop-0 systemd[1]: Finished Update APT News.
Mar 13 14:03:31 radlab-remote-desktop-0 systemd[1]: esm-cache.service: Deactivated successfully.
Mar 13 14:03:31 radlab-remote-desktop-0 systemd[1]: Finished Update the local ESM caches.
Mar 13 14:03:32 radlab-remote-desktop-0 dbus-daemon[635]: [system] Reloaded configuration
Mar 13 14:03:32 radlab-remote-desktop-0 dbus-daemon[635]: message repeated 2 times: [ [system] Reloaded configuration]
Mar 13 14:03:32 radlab-remote-desktop-0 dbus-daemon[635]: Unknown username "rtkit" in message bus configuration file
Mar 13 14:03:32 radlab-remote-desktop-0 dbus-daemon[635]: [system] Reloaded configuration
Mar 13 14:03:32 radlab-remote-desktop-0 dbus-daemon[635]: Unknown username "rtkit" in message bus configuration file
Mar 13 14:03:32 radlab-remote-desktop-0 dbus-daemon[635]: [system] Reloaded configuration
Mar 13 14:03:33 radlab-remote-desktop-0 google_metadata_script_runner[1570]: startup-script: fatal: [localhost]: FAILED! => {"cache_update_time": 1678716211, "cache_updated": true, "changed": false, "msg": "'/usr/bin/apt-get -y -o \"Dpkg::Options::=--force-confdef\" -o \"Dpkg::Options::=--force-confold\"       install 'gdebi-core' 'mesa-utils' 'gdm3'' failed: E: Could not get lock /var/lib/dpkg/lock-frontend. It is held by process 3339 (apt-get)\nE: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), is another process using it?\n", "rc": 100, "stderr": "E: Could not get lock /var/lib/dpkg/lock-frontend. It is held by process 3339 (apt-get)\nE: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), is another process using it?\n", "stderr_lines": ["E: Could not get lock /var/lib/dpkg/lock-frontend. It is held by process 3339 (apt-get)", "E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), is another process using it?"], "stdout": "", "stdout_lines": []}
Mar 13 14:03:33 radlab-remote-desktop-0 google_metadata_script_runner[1570]: startup-script:
Mar 13 14:03:33 radlab-remote-desktop-0 google_metadata_script_runner[1570]: startup-script: PLAY RECAP *********************************************************************
Mar 13 14:03:33 radlab-remote-desktop-0 google_metadata_script_runner[1570]: startup-script: localhost                  : ok=2    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0
Mar 13 14:03:33 radlab-remote-desktop-0 google_metadata_script_runner[1570]: startup-script:
Mar 13 14:03:33 radlab-remote-desktop-0 google_metadata_script_runner[1570]: startup-script: Mon Mar 13 14:03:33 +0000 2023 Info [1778]: === configure-grid-drivers.yml finished with exit_code=2 ===
Mar 13 14:03:33 radlab-remote-desktop-0 google_metadata_script_runner[1570]: startup-script: Mon Mar 13 14:03:33 +0000 2023 Error [1778]: === execution of configure-grid-drivers.yml failed, exiting ===
Mar 13 14:03:33 radlab-remote-desktop-0 google_metadata_script_runner[1570]: startup-script: Mon Mar 13 14:03:33 +0000 2023 Info [1576]: === passed_startup_script.sh finished with exit_code=2 ===
Mar 13 14:03:33 radlab-remote-desktop-0 google_metadata_script_runner[1570]: startup-script: Mon Mar 13 14:03:33 +0000 2023 Error [1576]: === execution of passed_startup_script.sh failed, exiting ===
Mar 13 14:03:33 radlab-remote-desktop-0 google_metadata_script_runner[1570]: startup-script exit status 2
Mar 13 14:03:33 radlab-remote-desktop-0 google_metadata_script_runner[1570]: Finished running startup scripts.
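
The error in the log above names the process that holds the lock. A minimal sketch for extracting that PID from log text follows; `lock_holder_pid` is a hypothetical helper name, and the pattern assumes apt's "It is held by process N" wording as it appears in this log.

```shell
# Hypothetical helper: read log text on stdin and print the first PID
# that apt reports as holding the dpkg lock (prints nothing if none).
lock_holder_pid() {
  grep -o 'held by process [0-9][0-9]*' | head -n 1 | awk '{ print $4 }'
}

# Example using the message from the log above.
sample='E: Could not get lock /var/lib/dpkg/lock-frontend. It is held by process 3339 (apt-get)'
pid="$(printf '%s\n' "$sample" | lock_holder_pid)"
echo "$pid"   # prints 3339
```

While that process is still running on the VM, `ps -p "$pid" -o pid,ppid,args` shows what it is and which parent started it, which helps confirm whether a background package update is the source of the conflict.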

@proppy
Member Author

proppy commented Mar 13, 2023

Maybe there is an option to have Ansible try to acquire the dpkg lock before running the recipe?

@proppy
Member Author

proppy commented Mar 13, 2023

/cc @nick-stroud

@nick-stroud
Collaborator

I suspect this comes from a conflict with unattended-upgrades holding the lock. We have seen similar failures before with startup scripts on Debian-based images. Historically, our approach has been to add retries.
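
The retry idea can be illustrated in shell. This is only a sketch of the general technique, not the change made in the module (whose startup scripts are Ansible playbooks); `retry` is a hypothetical helper, and the attempt count and delay in the usage line are arbitrary.

```shell
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between failed attempts. Returns 0 on the first success and 1 if every
# attempt fails.
retry() {
  local attempts="$1" delay="$2" n=1
  shift 2
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    echo "attempt $n failed, retrying in ${delay}s: $*" >&2
    n=$((n + 1))
    sleep "$delay"
  done
}

# Usage on the VM (not run here): retry the install that failed in the log.
# retry 10 30 sudo apt-get install -y gdebi-core mesa-utils gdm3
```

Retrying tolerates a lock that is held briefly by another package operation. If the image's apt version supports it, the `DPkg::Lock::Timeout` option (for example `apt-get -o DPkg::Lock::Timeout=300 install ...`) is an alternative that makes apt itself wait for the lock; whether it is available depends on the apt release on the image.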

@nick-stroud
Collaborator

I have added retries that will hopefully prevent this failure in the future. I have also added an integration test for the chrome-remote-desktop module, which will help keep this installation robust to changes. I am going to consider this bug fixed. Please re-open if you feel the fix does not address the bug.

@nick-stroud added the "fixed; not released" label (issues that have been fixed on the develop branch but have not yet been part of a tagged release) Mar 16, 2023
@nick-stroud self-assigned this Mar 16, 2023
@nick-stroud
Collaborator

Released in v1.16.0.

@nick-stroud removed the "fixed; not released" label Apr 11, 2023