fail to assign pci device at start of usbvm #1544

Rudd-O · 2015-12-26T12:33:03Z

I'm experiencing a weird error starting a usbvm:

[user@dom0 ~]$ qvm-start usbvm
--> Creating volatile image: /var/lib/qubes/appvms/usbvm/volatile.img...
--> Loading the VM (type = AppVM)...
Traceback (most recent call last):
  File "/usr/bin/qvm-start", line 125, in <module>
    main()
  File "/usr/bin/qvm-start", line 109, in main
    xid = vm.start(verbose=options.verbose, preparing_dvm=options.preparing_dvm, start_guid=not options.noguid, notify_function=tray_notify_generic if options.tray else None)
  File "/usr/lib64/python2.7/site-packages/qubes/modules/000QubesVm.py", line 1849, in start
    nd.dettach()
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 5249, in dettach
    if ret == -1: raise libvirtError ('virNodeDeviceDettach() failed')
libvirt.libvirtError: Requested operation is not valid: PCI device 0000:00:1a.0 is in use by driver xenlight, domain usbvm

Restarting libvirtd only aggravates the issue:

[user@dom0 ~]$ qvm-start usbvm
--> Creating volatile image: /var/lib/qubes/appvms/usbvm/volatile.img...
--> Loading the VM (type = AppVM)...
Traceback (most recent call last):
  File "/usr/bin/qvm-start", line 125, in <module>
    main()
  File "/usr/bin/qvm-start", line 109, in main
    xid = vm.start(verbose=options.verbose, preparing_dvm=options.preparing_dvm, start_guid=not options.noguid, notify_function=tray_notify_generic if options.tray else None)
  File "/usr/lib64/python2.7/site-packages/qubes/modules/000QubesVm.py", line 1857, in start
    self.libvirt_domain.createWithFlags(libvirt.VIR_DOMAIN_START_PAUSED)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1059, in createWithFlags
    if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
libvirt.libvirtError: internal error: libxenlight failed to create new domain 'usbvm'

Weird errors in libxl log:

2015-12-26 x:31:47 TZ libxl: error: libxl_pci.c:1000:do_pci_add: xc_assign_device failed: Operation not permitted
2015-12-26 x:31:47 TZ libxl: error: libxl_create.c:1422:domcreate_attach_pci: libxl_device_pci_add failed: -3

The hypervisor log says:

(XEN) [VT-D] It's disallowed to assign 0000:00:1a.0 with shared RMRR at dbe9a000 for Dom19.
(XEN) XEN_DOMCTL_assign_device: assign 0000:00:1a.0 to dom19 failed (-1)


Rudd-O · 2015-12-26T12:37:24Z

What's that about the RMRR?

Rudd-O · 2015-12-26T12:39:59Z

Appears to be some new shit:

http://www.gossamer-threads.com/lists/xen/devel/391684

marmarek · 2015-12-26T12:59:46Z

> The VM has never been started.

Not even using autostart at boot?

> What's that about the RMRR?
> Appears to be some new shit:
> http://www.gossamer-threads.com/lists/xen/devel/391684

We have a way to set rdm_policy=relaxed, bundled with pci_strictreset=false. The default salt formula for sys-usb should set it, exactly for this reason.
My understanding is that those devices do in fact share some resources, so they can't be safely isolated from each other. Xen doesn't support group assignment (at least for now), so it doesn't know that you are going to assign all such devices to the same VM (which should be safe).

Rudd-O · 2015-12-26T13:06:34Z

Yes, the VM had autostart at boot and the systemd service had failed for this reason.

How do I determine which devices share the RMRR? I couldn't find anything in my logs.

Rudd-O · 2015-12-26T13:09:19Z

Holy shit, setting pci_strictreset to False actually let me start that stupid VM!

marmarek · 2015-12-26T13:10:21Z

I don't know, but I guess it is the other USB controller (or USB 2.0/USB 3.0). If you assign both/all of them to the same VM, you'll see the same address in the Xen log (assuming you set pci_strictreset=False first; otherwise VM start will fail at the first device...). Yes, it's a kinda ugly way to determine that...
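A rough Python sketch of that log-reading approach, in case it saves someone the manual grepping. It assumes the hypervisor log has been saved as text and that every relevant line follows the layout of the "disallowed" message quoted at the top of this issue; the "risky" variant is assumed to use the same layout. The function name and the second sample line are made up for illustration.

```python
import re
from collections import defaultdict

# Matches Xen VT-d messages such as:
# (XEN) [VT-D] It's disallowed to assign 0000:00:1a.0 with shared RMRR at dbe9a000 for Dom19.
# Assumption: the "risky" message uses the same layout.
RMRR_LINE = re.compile(
    r"It's (?:disallowed|risky) to assign "
    r"(?P<bdf>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]) "
    r"with shared RMRR at (?P<addr>[0-9a-fA-F]+)"
)


def group_devices_by_rmrr(log_text):
    """Return {rmrr_address: sorted list of PCI devices reported with it}."""
    groups = defaultdict(set)
    for line in log_text.splitlines():
        match = RMRR_LINE.search(line)
        if match:
            groups[match.group("addr").lower()].add(match.group("bdf").lower())
    return {addr: sorted(devs) for addr, devs in groups.items()}


# The first line is from this issue; the second is a hypothetical example.
sample_log = """\
(XEN) [VT-D] It's disallowed to assign 0000:00:1a.0 with shared RMRR at dbe9a000 for Dom19.
(XEN) [VT-D] It's risky to assign 0000:00:1d.0 with shared RMRR at dbe9a000 for Dom24.
"""

for addr, devices in group_devices_by_rmrr(sample_log).items():
    print(addr, devices)
```

Devices listed under the same address are the ones that would have to go to the same VM. This only sees devices for which an assignment was actually attempted, so it inherits the limitation described above.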

Rudd-O · 2015-12-26T13:10:46Z

Wait, spoke too soon. The VM never ran qrexec-daemon.

Rudd-O · 2015-12-26T13:11:35Z

It now says in the hypervisor log "It's risky to assign blah blah".

Rudd-O · 2015-12-26T13:12:41Z

libxl log:

<date> libxl: error: libxl_device.c:1215:libxl__wait_for_backend: Backend /local/domain/0/backend/pci/24/0 not ready

marmarek · 2015-12-26T13:15:06Z

Did it crash just after startup (state "c" in xl list)? If so, there is probably not enough contiguous memory available (take a look at #1038 for details). You can try to free some with xl mem-set 0 <some-number-in-MB> to reduce dom0 memory drastically, for example down to 1500. Sometimes it helps. Otherwise, reboot...
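For checking the crashed state without eyeballing the table, here is a small Python sketch. It assumes the usual six-column xl list layout (Name, ID, Mem, VCPUs, State, Time(s)) and that VM names contain no spaces; the function name and the sample values are invented for illustration.

```python
def crashed_domains(xl_list_output):
    """Return names of domains whose state column contains the 'c' (crashed) flag."""
    crashed = []
    for line in xl_list_output.splitlines():
        fields = line.split()
        # Expected columns: Name, ID, Mem, VCPUs, State, Time(s).
        # The header row is skipped because its ID column is not a number.
        if len(fields) < 6 or not fields[1].isdigit():
            continue
        name, state = fields[0], fields[4]
        if "c" in state:
            crashed.append(name)
    return crashed


# Invented sample output, for illustration only.
sample = """\
Name                                        ID   Mem VCPUs      State   Time(s)
dom0                                         0  4096     4     r-----    1234.5
usbvm                                       24   300     2     ----c-       0.4
"""

print(crashed_domains(sample))
```

On a real system the input would be the captured output of xl list in dom0.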

Rudd-O · 2015-12-26T13:16:10Z

Yes. It's state crashed. I just checked.

Assigning all USB devices to the same VM worked to fix the problem.

This sucks. Now I don't have my mouse.

Rudd-O · 2015-12-26T13:17:20Z

Note that assigning all USB PCI devices did NOT help start the VM. Even with pci_strictreset set to false. It just killed my mouse.

Rudd-O · 2015-12-26T13:18:18Z

I will try rebooting now. BRB.

marmarek · 2015-12-26T13:18:22Z

The VM crash at startup is a general problem with starting a VM with PCI devices after some system uptime, because memory is heavily fragmented by then. It is independent of the previous problem (which is solved with pci_strictreset).

Rudd-O · 2015-12-26T13:30:57Z

Alright, excellent. My USB VM has now received the assignment of the two USB PCI devices I intended to isolate (the Bluetooth and camera devices). I still keep the ability to use my mouse. This is GREAT.

Thanks for the pci_strictreset trick.

Improvement: it really should be somehow autodetected whether it is necessary or not.

marmarek · 2015-12-26T13:34:05Z

It is set for USB VM by salt formula by default.

Rudd-O · 2015-12-26T13:35:18Z

Yes, that's true, but it would not be the default for a manually created USB VM, which was my case, and I bet that's the case for many people. A smart default lower in the stack would reduce the support load.

marmarek · 2015-12-26T13:35:37Z

The proper solution would be to have PCI group assignment supported by Xen. This way it would detect whether it is really risky to assign particular devices to the VM.

Rudd-O · 2015-12-26T13:36:04Z

Agreed.

marmarek · 2016-01-06T23:09:45Z

Summary:

Xen is missing the PCI device group assignment feature.
libvirt has a bug in tracking which PCI device is used where (libvirt.libvirtError: Requested operation is not valid: PCI device 0000:00:1a.0 is in use by driver xenlight, domain usbvm while starting usbvm).

Rudd-O · 2016-01-16T20:44:40Z

Quick update: I assigned two of my USB PCI devices (out of three) to a USB VM. That caused hangs and reboots to happen around once each day. They stopped happening as soon as I decided never to power on that VM again. I have yet to try adding all three USB PCI devices to the USB VM (doing so would disable all USB ports on this machine). We'll see if that causes hangs.

andrewclausen · 2016-03-14T22:08:36Z

The pci_strictreset option didn't have any effect for me. (Exactly the same error messages, etc.)

Rudd-O · 2016-03-15T05:09:15Z

I found my problem. It was a mouse whose receiver stopped working properly and started causing lockups and hard reboots whenever it was plugged in, irrespective of which VMs it was assigned to. The mouse is now in the trash. PCI strict reset did work for starting the VM though.

marmarek · 2016-03-15T12:38:05Z

There is a strange issue related to some Logitech receivers: #1689. I can confirm it indeed happens, but I have no idea why. I'd rather blame some kernel driver, not the device itself.

nothingmuch · 2016-10-23T01:05:15Z

I am getting this error too, with pci_strictreset set to false, on a clean install of R3.2 on a Lenovo Yoga 12 which previously had R3.1 working with a usbvm. Disabling USB3 in the BIOS seemed to work. Upgrading the BIOS, as mentioned in this thread https://groups.google.com/forum/#!msg/qubes-users/Z6bEMZTjiz4/FbV6T-l_AQAJ, did not seem to make a difference.

Rudd-O · 2016-10-23T19:59:38Z

The Logitech device issue is no longer a problem in modern kernels.

nothingmuch · 2016-10-24T06:35:27Z

I'm seeing this with no external USB devices connected.

marmarek · 2016-10-28T20:38:54Z

This is probably the well-known memory fragmentation issue: a PV VM with a PCI device needs a few megabytes of physically contiguous memory for DMA. You can free some by taking it away from dom0: xl mem-set 0 <some-memory-size-in-MB>, where the size is smaller than the current one (for example 500MB smaller). If that does not help, try shutting down some VMs. If still nothing, reboot...

Rudd-O · 2016-10-28T20:54:47Z

Let me try. But if this works, this really should be documented somewhere!

Rudd-O · 2016-10-28T20:55:54Z

Nope, it did not work at all.

marmarek · 2016-10-28T22:40:44Z

OK, let's try harder: touch /var/run/qubes/do-not-membalance. Then try xl mem-set and qvm-start again. If it doesn't work, repeat (just one more time).
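The sequence can be sketched as a dry-run planner in Python that only builds the commands and never runs them. The 500 MB step and the 1500 MB floor are taken from the example numbers earlier in this thread, not from any official recommendation, and the function name and the floor_mb safety limit are hypothetical.

```python
def plan_dom0_shrink(current_mb, step_mb=500, floor_mb=1500, vm_name="usbvm"):
    """Build (but do not run) the commands for shrinking dom0 step by step.

    Returns a list of argument lists. floor_mb is a hypothetical safety
    limit so that dom0 is never planned below that size.
    """
    if step_mb <= 0:
        raise ValueError("step_mb must be positive")
    # Stop the memory balancer first so it does not undo xl mem-set.
    commands = [["touch", "/var/run/qubes/do-not-membalance"]]
    target = current_mb - step_mb
    while target >= floor_mb:
        commands.append(["xl", "mem-set", "0", str(target)])
        commands.append(["qvm-start", vm_name])
        target -= step_mb
    return commands


# Example: dom0 currently at 3000 MB (an invented figure).
for cmd in plan_dom0_shrink(current_mb=3000):
    print(" ".join(cmd))
```

In practice you would stop at the first qvm-start that succeeds, and reboot if the plan runs out. The thread does not say when to remove the do-not-membalance file again, so that step is left out here.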

Rudd-O · 2016-10-30T19:06:20Z

On 10/28/2016 10:40 PM, Marek Marczykowski-Górecki wrote:

Ok, lets try harder: |touch /var/run/qubes/do-not-membalance|. Then
try again |xl mem-set| and |qvm-start|. And if it doesn't work, repeat
(just one more time).

A reboot fixed it.

Rudd-O
http://rudd-o.com/

xloem · 2017-01-01T08:04:44Z

Same experience. I needed to touch /var/run/qubes/do-not-membalance to get xl mem-set to do anything at all. I kept dropping the dom0 RAM in 512MB increments, and qvm-start kept failing, until the system stopped responding. Then things worked after a reboot.

Maybe some file to review to determine memory fragmentation, and where the VM memory is getting allocated, for next time? Or some way to determine what made the VM crash?

andrewdavidwong · 2021-05-08T07:59:14Z

This bug report has seen no activity in a very long time, and it is not assigned to any current release milestone. It looks like it was left open by mistake, so I'm closing it now. However, if anyone is still affected by this bug on a currently-supported release, please leave a comment, and we'll be happy to reopen this. Thank you.

brendanhoar · 2022-05-15T01:29:45Z

R4.0 kernel-latest=5.17.7 current-testing:

Just ran into the "failed to get contiguous memory for dma from xen" in sys-net-dm after shutting down all networking VMs and trying to start them again.

Several retries failed.

I saved all my work, shut down everything but dom0 and sys-net started w/o issue.

Pretty sure this happened one other time recently as well, can't prove it though.

Next time it happens I'll try the xl mem-set 0 (smaller size) approach.

B

xloem · 2022-05-15T01:40:06Z

I'm no longer using Qubes, but it looks like a workable next step here would be to take the effort to find what device file (and possibly kernel parameters) display the physical memory mapping of the system. Then this can be reviewed on next occurrence to verify that the instance is memory fragmentation, see if shrinking dom0 resolves it, and possibly discern a minimum contiguous block needed by the device.

It's likely possible to remap memory to resolve this confidently, but might need implementation by a dev.

andrewdavidwong · 2023-04-07T22:06:25Z

Just a reminder that, for bug reports, the milestone designates the earliest supported release in which the bug is known to exist, not when we plan to fix it.

Rudd-O · 2023-04-19T13:31:43Z

This is from 2015 and I have not been able to repro this since.

andrewdavidwong · 2023-04-19T14:20:11Z

> This is from 2015 and I have not been able to repro this since.

Closing as "cannot reproduce" (we were unable to reproduce this issue). If anyone believes this is a mistake, or if anyone can reproduce the issue, please leave a comment, and we'll be happy to reopen this. Thank you.

marmarek added the T: bug and C: Xen labels Jan 6, 2016

marmarek added this to the Far in the future milestone Jan 6, 2016

schnurentwickler mentioned this issue Feb 19, 2018

sys-usb does not start anymore - libxl: no permission to assign #3608

Closed

andrewdavidwong closed this as completed May 8, 2021

andrewdavidwong reopened this May 15, 2022

andrewdavidwong added the P: default and needs diagnosis labels May 15, 2022

andrewdavidwong modified the milestones: Release TBD, Release 4.0 updates May 15, 2022

DemiMarie modified the milestones: Release 4.0 updates, Release TBD Mar 14, 2023

andrewdavidwong modified the milestones: Release TBD, Release 4.1 updates Apr 6, 2023

andrewdavidwong closed this as not planned Apr 19, 2023

andrewdavidwong added the R: cannot reproduce label and removed the S: blocked label Apr 19, 2023

andrewdavidwong removed this from the Release 4.1 updates milestone Aug 25, 2023
