devices: virtio: ignore driver interrupt suppression

The arcvm guest has been observed to fail early in boot with a low
probability; the root cause seems to be a missing interrupt from the
virtio-block device causing the guest to wait forever for a notification
that does not happen.

In testing, it seems that ignoring the VIRTQ_AVAIL_F_NO_INTERRUPT flag
set by the guest avoids the issue; possibly there is a race in the
publishing of used descriptors versus the checking of this flag, but I
was not able to find a definitive solution.

Ignoring the flag means that we may occasionally trigger an interrupt
when the guest has indicated that one is not necessary; this is, at
worst, a slight performance degradation, not a correctness issue, since
the guest kernel will ignore spurious interrupts.

BUG=b:172975852
TEST=Restart arcvm in a loop without failures 1000+ iterations

Change-Id: I4c91ccc5dadc03dde43008ff20fae1c4592e19da
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/2648009
Tested-by: Daniel Verkamp <dverkamp@chromium.org>
Commit-Queue: Daniel Verkamp <dverkamp@chromium.org>
Reviewed-by: Zach Reizner <zachr@chromium.org>
Reviewed-by: Dylan Reid <dgreid@chromium.org>
This commit is contained in:
Daniel Verkamp 2021-01-20 16:25:05 -08:00 committed by Commit Bot
parent 486cffc41d
commit 66f975eec7

View file

@ -21,6 +21,7 @@ const VIRTQ_DESC_F_WRITE: u16 = 0x2;
const VIRTQ_DESC_F_INDIRECT: u16 = 0x4;
const VIRTQ_USED_F_NO_NOTIFY: u16 = 0x1;
#[allow(dead_code)]
const VIRTQ_AVAIL_F_NO_INTERRUPT: u16 = 0x1;
/// An iterator over a single descriptor chain. Not to be confused with AvailIter,
@ -356,6 +357,7 @@ impl Queue {
// Query the value of a single-bit flag in the available ring.
//
// Returns `true` if `flag` is currently set (by the driver) in the available ring flags.
#[allow(dead_code)]
fn get_avail_flag(&self, mem: &GuestMemory, flag: u16) -> bool {
let avail_flags: u16 = mem.read_obj_from_addr(self.avail_ring).unwrap();
avail_flags & flag == flag
@ -518,7 +520,13 @@ impl Queue {
// so no need to inject new interrupt.
self.next_used - used_event - Wrapping(1) < self.next_used - self.last_used
} else {
!self.get_avail_flag(mem, VIRTQ_AVAIL_F_NO_INTERRUPT)
// TODO(b/172975852): This branch should check the flag that requests interrupt
// supression:
// ```
// !self.get_avail_flag(mem, VIRTQ_AVAIL_F_NO_INTERRUPT)
// ```
// Re-enable the flag check once the missing interrupt issue is debugged.
true
}
}