virtio-queue: Error out early on invalid available ring index #196

Merged

Conversation

likebreath
Contributor

@likebreath likebreath commented Sep 27, 2022

The number of descriptor chain heads to process should always be smaller than or equal to the queue size, as the driver should never ask the VMM to process an available ring entry more than once. Checking and reporting such incorrect driver behavior can prevent potential hangs and Denial-of-Service on the VMM side.

Signed-off-by: Bo Chen <chen.bo@intel.com>

Summary of the PR

Please summarize here why the changes in this PR are needed.

Requirements

Before submitting your PR, please make sure you addressed the following
requirements:

  • All commits in this PR are signed (with git commit -s), and the commit
    message has max 60 characters for the summary and max 75 characters for each
    description line.
  • All added/changed functionality has a corresponding unit/integration
    test.
  • Any newly added unsafe code is properly documented.

@likebreath
Contributor Author

/cc @rbradford @sboeuf

@likebreath likebreath force-pushed the 0927/virt_queue/report_abnormal_index branch from d1a8982 to 0195667 on September 27, 2022 21:28
// be smaller than or equal to the queue size, as the driver should
// never ask the VMM to process an available ring entry more than
// once. Checking and reporting such incorrect driver behavior
// can avoid potential hanging and Denial-of-Service from VMMs.
Member

Do you have a reproduction for this DoS? When does this happen?

When you call next on the returned object from this function we already check that it is less than the size, because we're doing a modulo.

According to the spec the driver is allowed to pass a number that is larger than QUEUE_SIZE: https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-380006
"idx field indicates where the driver would put the next descriptor entry in the ring (modulo the queue size)."

Contributor

Not sure the logic in the if block is correct (because of wrapping), but the goal is to avoid iterating over and processing the same descriptors over and over again.

e.g. if the queue size was n and the guest set the next available index to current + 1000 * n, it would force the virtio device to consider each descriptor in the array 1000 times. This would effectively DoS the thread processing the queue, especially if the computation on the descriptor is costly. This was found with fuzzing.
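As a rough, self-contained illustration of the scale involved (assumed numbers, not the crate's code): the avail index counters are u16, so the largest backlog a hostile driver can claim in one shot is 65,535 heads, which for a 256-entry queue still means hundreds of passes over the same descriptors:

```rust
use std::num::Wrapping;

fn main() {
    let queue_size: u16 = 256; // assumed queue size for illustration
    let next_avail = Wrapping(10u16); // where the device left off

    // A hostile driver publishes an avail idx far ahead of anything the
    // 256-entry ring could actually hold.
    let idx = next_avail + Wrapping(60_000);

    // Wrapping distance the device would try to walk before catching up.
    let pending = (idx - next_avail).0;
    assert_eq!(pending, 60_000);

    // A naive iterator would revisit each of the 256 slots ~234 times.
    println!("passes over the ring: {}", pending / queue_size);

    // A check like the one in this PR rejects anything larger than the queue size.
    assert!(pending > queue_size);
}
```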

Contributor

This manifests itself by .next() continuing to return descriptor chains until next_avail and last_index are equal: https://github.com/rust-vmm/vm-virtio/blob/main/crates/virtio-queue/src/queue.rs#L743

Member

@rbradford @likebreath do you think you can work on a unit test that exemplifies this scenario? We can use it as a regression test for the fix. I am trying to do that myself, but I am still not sure how this can happen. The guest can flood the VMM with requests regardless, so some sort of rate limiter needs to be implemented on the VMM side anyway. If we can also improve the situation at the device level, that would be great.

Contributor

The guest can flood the VMM with requests regardless, so some sort of rate limiter needs to be implemented on the VMM side anyway. If we can also improve the situation at the device level, that would be great.

A well-behaved guest would never do this because it wouldn't want the same descriptors handled multiple times. I don't know if we could also use the used ring to identify this troublesome behaviour.

Member

Looks like QEMU has the same check in place: https://github.com/qemu/qemu/blob/master/hw/virtio/virtio.c#L966

Contributor

Looks like QEMU has the same check in place: https://github.com/qemu/qemu/blob/master/hw/virtio/virtio.c#L966

Cool. Do you think this is the right place or should it be in the implementation of .next() ?

Member

We can't directly do idx - self.next_avail because that operation might underflow and cause a panic.

Both of these are Wrapping<u16>, so I think it is safe.

Right, that is safe because the subtraction will wrap around to a value near u16::MAX, which is going to be > the queue size irrespective of the queue size set by the driver.
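A standalone check of that claim (illustrative numbers, not the crate's code): Wrapping<u16> subtraction never panics, and when idx is behind next_avail the result wraps to a value near u16::MAX, which exceeds any valid queue size (at most 32768), so the bounds check still rejects it:

```rust
use std::num::Wrapping;

fn main() {
    let next_avail = Wrapping(5u16);
    let idx = Wrapping(2u16); // idx "behind" next_avail, e.g. from a buggy driver

    // Plain u16 subtraction would panic in debug builds; Wrapping never does.
    let distance = (idx - next_avail).0;
    assert_eq!(distance, u16::MAX - 2); // 65533, wrapped around

    let max_queue_size: u16 = 32768; // largest queue size virtio allows
    assert!(distance > max_queue_size); // so the check still errors out
}
```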

Member

Cool. Do you think this is the right place or should it be in the implementation of .next() ?

I am checking to see if there is any other code path that could trigger this behavior outside of iter. It might make sense to have it in the constructor of AvailIter.

Contributor

So @likebreath, it looks like the actions you need to take on this PR are to address the comment I made about the commit message/comment and to add a unit test ensuring that invalid, and only invalid, indexes generate an error.

.map(move |idx| AvailIter::new(mem, idx, self))
let idx = self.avail_idx(mem.deref(), Ordering::Acquire)?;

// The number of descriptor chain heads to process should always
Contributor

I don't think this comment, or the one in the commit message, is quite right. The VMM isn't the part that can cause the DoS; it's the guest.

Contributor Author

Thank you for letting me know the message was misleading. That was not what I intended to say.

Both the comments and commit message are updated. PTAL.

crates/virtio-queue/src/queue.rs (resolved)

@likebreath likebreath force-pushed the 0927/virt_queue/report_abnormal_index branch from 0195667 to d37d46d on September 28, 2022 21:05
@likebreath
Contributor Author

Thank you all for the feedback. All comments should be addressed. PTAL.

A summary of changes:

  1. Moved the check for an invalid available ring index to the constructor of AvailIter (see the sketch after this list);
  2. Clarified the commit message and comments;
  3. Added a related unit test.
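A minimal sketch of the kind of check item 1 describes (names, struct shape, and error variant are illustrative, not the crate's exact API):

```rust
use std::num::Wrapping;

struct Queue {
    size: u16,
    next_avail: Wrapping<u16>,
}

#[derive(Debug)]
enum Error {
    InvalidAvailRingIndex,
}

impl Queue {
    /// Validate the driver-supplied avail ring index before building an iterator.
    fn check_avail_idx(&self, idx: Wrapping<u16>) -> Result<(), Error> {
        // The driver may never publish more entries than the ring can hold,
        // so the wrapping distance must not exceed the queue size.
        if (idx - self.next_avail).0 > self.size {
            return Err(Error::InvalidAvailRingIndex);
        }
        Ok(())
    }
}

fn main() {
    let q = Queue { size: 256, next_avail: Wrapping(0) };
    assert!(q.check_avail_idx(Wrapping(256)).is_ok());  // exactly a full ring: fine
    assert!(q.check_avail_idx(Wrapping(257)).is_err()); // more heads than slots: error
}
```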

@likebreath likebreath force-pushed the 0927/virt_queue/report_abnormal_index branch from d37d46d to 7391d16 on September 28, 2022 21:19
The number of descriptor chain heads to process should always be smaller
than or equal to the queue size, as the driver should never ask the VMM to
process an available ring entry more than once. Checking and reporting
such incorrect driver behavior can prevent potential hangs and
Denial-of-Service on the VMM side.

Signed-off-by: Bo Chen <chen.bo@intel.com>
This test ensures that constructing a descriptor chain iterator succeeds
with valid available ring indexes and produces an error with invalid
indexes.

Signed-off-by: Bo Chen <chen.bo@intel.com>
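A hedged sketch of what such a test could look like (the helper below stands in for the crate's real constructor-side validation; it is not the actual API):

```rust
use std::num::Wrapping;

// Hypothetical stand-in for the queue-side validation under test; the real
// crate performs an equivalent check when building its avail-ring iterator.
fn validate_avail_idx(next_avail: Wrapping<u16>, idx: Wrapping<u16>, size: u16) -> Result<(), ()> {
    if (idx - next_avail).0 > size {
        Err(())
    } else {
        Ok(())
    }
}

#[test]
fn test_avail_ring_index_bounds() {
    let size = 16u16;
    let next_avail = Wrapping(0u16);
    // Every index claiming at most `size` pending heads is valid...
    for i in 0..=size {
        assert!(validate_avail_idx(next_avail, Wrapping(i), size).is_ok());
    }
    // ...and anything beyond that claims more heads than the ring holds.
    assert!(validate_avail_idx(next_avail, Wrapping(size + 1), size).is_err());
}
```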
@andreeaflorescu andreeaflorescu merged commit 74e516d into rust-vmm:main Sep 29, 2022
dianpopa added a commit to dianpopa/firecracker that referenced this pull request Sep 29, 2022
The number of descriptor chain heads to process should always be smaller
than or equal to the queue size, as the driver should never ask the VMM to
process an available ring entry more than once. Checking and reporting
such incorrect driver behavior can prevent potential hangs and
Denial-of-Service on the VMM side.

Issue reported in rust-vmm/vm-virtio and fixed in
rust-vmm/vm-virtio#196.

Signed-off-by: Diana Popa <dpopa@amazon.com>
Co-authored-by: Bo Chen <chen.bo@intel.com>
dianpopa added a commit to dianpopa/firecracker that referenced this pull request Oct 3, 2022
dianpopa added a commit to dianpopa/firecracker that referenced this pull request Oct 3, 2022
dianpopa added a commit to dianpopa/firecracker that referenced this pull request Oct 3, 2022
dianpopa added a commit to dianpopa/firecracker that referenced this pull request Oct 3, 2022
dianpopa added a commit to dianpopa/firecracker that referenced this pull request Oct 4, 2022
dianpopa added a commit to firecracker-microvm/firecracker that referenced this pull request Oct 4, 2022
dianpopa added a commit to dianpopa/firecracker that referenced this pull request Oct 4, 2022
dianpopa added a commit to dianpopa/firecracker that referenced this pull request Oct 4, 2022
dianpopa added a commit to firecracker-microvm/firecracker that referenced this pull request Oct 5, 2022
dianpopa added a commit to firecracker-microvm/firecracker that referenced this pull request Oct 5, 2022
@andreeaflorescu andreeaflorescu mentioned this pull request Feb 16, 2023