If MSI-X table entry 0 is mapped to queue 0, queue 0 is offloaded to
a vhost backend, and the guest later maps table entry 1 to queue 0,
we need to look up the irqfd of table entry 1 and send a message to
the vhost backend to update its irqfd. This is much more complicated
than updating the contents of table entry 0, which only involves
updating the GSI table on the VMM side.
For now we simply prevent such changes if a queue is already mapped
to an MSI-X entry that is linked to an irqfd.
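A minimal sketch of such a guard, under stated assumptions: the names
`QueueVectors`, `RemapError`, `link_irqfd` and `set_vector` are
hypothetical and not taken from the Alioth source, and rewriting the
same entry index is assumed to stay allowed.

```rust
use std::collections::HashMap;

/// Hypothetical error type; not taken from the Alioth source.
#[derive(Debug, PartialEq, Eq)]
pub enum RemapError {
    /// The queue is bound to an entry that is already linked to an irqfd.
    LinkedToIrqfd { queue: u16, entry: u16 },
}

/// Hypothetical bookkeeping for queue-to-MSI-X-entry mappings.
#[derive(Debug, Default)]
pub struct QueueVectors {
    /// has_irqfd[i] is true once table entry i is linked to an irqfd.
    has_irqfd: Vec<bool>,
    /// Queue index -> MSI-X table entry index.
    map: HashMap<u16, u16>,
}

impl QueueVectors {
    pub fn new(num_entries: usize) -> Self {
        QueueVectors {
            has_irqfd: vec![false; num_entries],
            map: HashMap::new(),
        }
    }

    /// Record that an irqfd was created for `entry`.
    /// Panics if `entry` is out of range (kept simple for the sketch).
    pub fn link_irqfd(&mut self, entry: u16) {
        self.has_irqfd[entry as usize] = true;
    }

    /// Map `queue` to `entry`, refusing to move a queue away from an
    /// entry that is linked to an irqfd.
    pub fn set_vector(&mut self, queue: u16, entry: u16) -> Result<(), RemapError> {
        if let Some(&old) = self.map.get(&queue) {
            if old != entry && self.has_irqfd[old as usize] {
                return Err(RemapError::LinkedToIrqfd { queue: queue, entry: old });
            }
        }
        self.map.insert(queue, entry);
        Ok(())
    }
}
```

With this shape, a guest write that tries to move queue 0 from entry 0
to entry 1 after entry 0 got an irqfd is rejected, while unrelated
queues can still be mapped.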
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
Private anonymous memory is useful when a device does not want other
devices to access its memory.
Fixes: a536818653 ("feat(mem)!: create anonymous mem with memfd_create")
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
The bit VIRTIO_F_ACCESS_PLATFORM is a little confusing:
- When a Linux guest driver sees this bit, it uses the DMA API for
  virtq transactions.
  - For a confidential VM, the DMA layer uses swiotlb to copy data
    from/to the shared memory.
  - In the non-CoCo case, I observed a 5% regression in virtio-net
    throughput.
- When an in-kernel vhost device sees this bit, addresses in the
  virtq are treated as I/O virtual addresses and it expects the
  userspace VMM to set up the IOTLB.
We do not have an emulated IOMMU, so we should not set
ACCESS_PLATFORM on backend devices. On the other hand, in the CoCo
case, we must advertise this bit to the guest.
So for now, the middle layer turns on ACCESS_PLATFORM if necessary
but never activates the device backend with this bit. We will need
to find a better solution in the future.
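The masking described above can be sketched as follows. The bit
positions come from the virtio spec (VIRTIO_F_VERSION_1 is bit 32,
VIRTIO_F_ACCESS_PLATFORM is bit 33); the function names and the `coco`
parameter are hypothetical and only illustrate the idea.

```rust
/// VIRTIO_F_VERSION_1, feature bit 32 in the virtio spec.
pub const VIRTIO_F_VERSION_1: u64 = 1 << 32;
/// VIRTIO_F_ACCESS_PLATFORM, feature bit 33 in the virtio spec.
pub const VIRTIO_F_ACCESS_PLATFORM: u64 = 1 << 33;

/// Features the middle layer offers to the guest driver.
/// `backend` is what the device backend offers; `coco` tells whether
/// the guest is a confidential VM (hypothetical parameter).
pub fn guest_visible_features(backend: u64, coco: bool) -> u64 {
    if coco {
        backend | VIRTIO_F_ACCESS_PLATFORM
    } else {
        backend
    }
}

/// Features used to activate the backend: whatever the guest
/// acknowledged, with ACCESS_PLATFORM always stripped.
pub fn backend_features(acked_by_guest: u64) -> u64 {
    acked_by_guest & !VIRTIO_F_ACCESS_PLATFORM
}
```

The guest of a CoCo VM then sees the bit and goes through swiotlb,
while the backend keeps treating virtq addresses as guest physical
addresses.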
Fixes: 5ad2ea658c ("feat(virtio): enable ACCESS_PLATFORM bits")
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
A host directory can be shared with the guest by passing:
1. virtiofsd[1] flag: `--shared-dir /path/to/dir --socket-path /tmp/virtiofsd`
2. Alioth flag: `--fs vu,socket=/tmp/virtiofsd,tag=host-dir`
[1]: https://gitlab.com/virtio-fs/virtiofsd
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
Two new methods are added to the trait `IrqSender` for converting a
data entry in the MSI-X table into an irqfd at a device's request.
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
When a guest writes to the MSI-X table, it can either just update the
table data (which will be accessed when KVM_SIGNAL_MSI is called),
or trigger an update of the GSI routing table of KVM, so that later
writes to the irqfd can inject the correct MSI.
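A rough sketch of the two outcomes, assuming that the deciding factor
is whether the entry already has a GSI backing an irqfd. All type and
function names here are hypothetical; the actual KVM calls are left
out and only the decision is modeled.

```rust
/// Address/data pair of one MSI-X table entry.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct MsixData {
    pub addr: u64,
    pub data: u32,
}

/// Hypothetical per-entry state kept by the VMM.
#[derive(Debug, Default)]
pub struct MsixSlot {
    pub msg: MsixData,
    /// GSI backing an irqfd for this entry, if one was created.
    pub gsi: Option<u32>,
}

/// What the VMM has to do after a guest write.
#[derive(Debug, PartialEq, Eq)]
pub enum WriteEffect {
    /// Only the cached table data changed; it is read again when
    /// KVM_SIGNAL_MSI is issued.
    TableOnly,
    /// The KVM GSI routing entry for `gsi` must be rewritten with `msg`
    /// so that later writes to the irqfd inject the new MSI.
    UpdateGsiRoute { gsi: u32, msg: MsixData },
}

/// Store the new entry and report which follow-up is required.
pub fn write_entry(slot: &mut MsixSlot, msg: MsixData) -> WriteEffect {
    slot.msg = msg;
    match slot.gsi {
        Some(gsi) => WriteEffect::UpdateGsiRoute { gsi: gsi, msg: msg },
        None => WriteEffect::TableOnly,
    }
}
```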
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
An irqfd can be shared with the vfio module or a vhost-user backend
for injecting interrupts directly based on the pre-defined GSI
routing table.
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
The fd from memfd_create() enables Alioth to share the guest memory
with another process by sending the fd to the target process, which
is necessary for supporting vhost-user backends.
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
Add the associated type `Feature` to the trait `Virtio` so that the
feature bits can be pretty-printed uniformly with the help of the
bitflags crate.
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
Previously, each device returned only the device-specific features
and VirtioDevice::new() added the general virtio feature bits.
This causes trouble for supporting vhost-user backends, which may
implement a different set of general virtio feature bits.
This commit lets individual devices return their full feature sets.
The general virtio feature bits implemented by the module virtio
are moved to `FEATURE_BUILT_IN`, which implicitly turns on
`EVENT_IDX` for the entropy and block devices.
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
KVM_IOEVENTFD avoids VM exits from kernel space to user space on
VCPU threads.
Furthermore, we use a non-zero notify_off_multiplier [1] in the
virtio device configs. By looking at the MMIO address alone we can
tell which queue is sending the notification; the value written to
the MMIO address is not needed. Thus instruction decoding in KVM is
avoided.
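The address arithmetic can be sketched as below. Per the cited spec
section, a queue's notify address is cap.offset + queue_notify_off *
notify_off_multiplier; the sketch works with offsets relative to
cap.offset and assumes queue_notify_off equals the queue index. The
function names and the multiplier value in the usage are hypothetical.

```rust
/// Offset of a queue's notify register, relative to the start of the
/// notification area (cap.offset in the spec).
pub fn notify_offset(queue_notify_off: u16, multiplier: u32) -> u64 {
    queue_notify_off as u64 * multiplier as u64
}

/// Recover the queue from the offset that was written to.
/// Returns None when the multiplier is zero (all queues share one
/// address, so the address alone cannot identify the queue) or when
/// the offset does not fall on a queue's register.
pub fn queue_from_offset(offset: u64, multiplier: u32) -> Option<u16> {
    if multiplier == 0 {
        return None;
    }
    let m = multiplier as u64;
    if offset % m != 0 {
        return None;
    }
    let q = offset / m;
    if q <= u16::MAX as u64 {
        Some(q as u16)
    } else {
        None
    }
}
```

With a multiplier of 4, for example, queue 3 is notified at offset 12,
so one ioeventfd per queue can be registered at a distinct address
without matching on the written value.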
Test setup:
- Host CPU: AMD Ryzen 9 5950X
- VM: memory size 1G, 1 VCPU
virtio-net throughput by iperf3:
- VM -> host
  - without KVM_IOEVENTFD: 30.6 Gbits/sec
  - with KVM_IOEVENTFD: 33.5 Gbits/sec
- Host -> VM
  - without KVM_IOEVENTFD: 19.5 Gbits/sec
  - with KVM_IOEVENTFD: 25.4 Gbits/sec
[1] Virtio Spec 1.2, Sec 4.1.4.4.
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
Add kernel, initramfs, and cmdline to the FwCfg device when a
firmware image is provided at the same time. This enables Alioth to
boot SEV-enabled guests with a compressed bzImage file.
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
TUN_F_USO4/TUN_F_USO6 were added in Linux 6.2.
There is no easy way to query the supported features from the tap
device, so similar to QEMU (tap_fd_set_offload() in net/tap-linux.c),
we try tun_set_offload() until success.
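The retry idea can be sketched as below. The TUN_F_* values are the
ones from linux/if_tun.h; the function `negotiate_offload`, the
candidate list, and the closure standing in for the TUNSETOFFLOAD
ioctl are assumptions for illustration, not the actual Alioth or QEMU
code.

```rust
use std::io;

// Flag values from linux/if_tun.h.
pub const TUN_F_CSUM: u32 = 0x01;
pub const TUN_F_TSO4: u32 = 0x02;
pub const TUN_F_TSO6: u32 = 0x04;
pub const TUN_F_TSO_ECN: u32 = 0x08;
pub const TUN_F_USO4: u32 = 0x20;
pub const TUN_F_USO6: u32 = 0x40;

/// Try progressively smaller offload sets until one is accepted.
/// `set_offload` stands in for the TUNSETOFFLOAD ioctl on the tap fd.
/// Returns the flag set that was accepted.
pub fn negotiate_offload<F>(mut set_offload: F) -> io::Result<u32>
where
    F: FnMut(u32) -> io::Result<()>,
{
    let base = TUN_F_CSUM | TUN_F_TSO4 | TUN_F_TSO6 | TUN_F_TSO_ECN;
    // Hypothetical candidate order: first with USO, then without.
    let candidates = [base | TUN_F_USO4 | TUN_F_USO6, base];
    let mut last = io::Error::new(io::ErrorKind::InvalidInput, "no offload candidates");
    for &flags in candidates.iter() {
        match set_offload(flags) {
            Ok(()) => return Ok(flags),
            Err(e) => last = e,
        }
    }
    Err(last)
}
```

On a kernel that rejects the USO flags, the first attempt fails and
the second one succeeds, so the caller learns that USO must not be
offered to the guest.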
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
For now we do not need the new features of
KVM_GET_SREGS2/KVM_SET_SREGS2. Use the old ioctls for better
compatibility.
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
With all the preparation, SEV guests are ready to go:
* SEV guests: --coco sev,policy=0x1
* SEV-ES guests: --coco sev,policy=0x5
We still need to make virtio devices work with SEV guests.
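For reference, the two policy values above differ only in the ES bit.
A small sketch, assuming the guest policy layout of the AMD SEV API
(bit 0 is NODBG, bit 2 is ES); the constant and function names are
hypothetical.

```rust
/// Guest policy bit 0: debugging of the guest is disallowed.
pub const SEV_POLICY_NODBG: u32 = 1 << 0;
/// Guest policy bit 2: SEV-ES is required.
pub const SEV_POLICY_ES: u32 = 1 << 2;

/// True when the policy asks for an SEV-ES guest.
pub fn is_sev_es(policy: u32) -> bool {
    policy & SEV_POLICY_ES != 0
}

/// True when the policy forbids debugging the guest.
pub fn debug_disabled(policy: u32) -> bool {
    policy & SEV_POLICY_NODBG != 0
}
```

So policy=0x1 is a plain SEV guest with debugging disabled, and
policy=0x5 additionally requires SEV-ES.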
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
This includes
* parse the firmware blob to get the AP EIP value
* set up AP registers based on the parsed EIP
* call sev_launch_update_vmsa before booting CPUs
Ref:
[1] QEMU hw/i386/pc_sysfw_ovmf.c
[2] QEMU docs/specs/sev-guest-firmware.rst
[3] https://github.com/project-oak/oak snp_measurement/src/stage0.rs
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
For now PhysAddrReduction and CbitPosition are hardcoded to 1 and 51,
which is good for Milan CPUs.
Ref: AMD64 Architecture Programmer's Manual Vol. 3, section E.4.17.
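A sketch of how the two values are packed, assuming the EBX layout of
CPUID Fn8000_001F from the reference above (bits 5:0 hold the C-bit
position, bits 11:6 hold the physical address bit reduction). The
function names are hypothetical, and the remaining EBX fields are left
as zero here.

```rust
/// Pack CbitPosition and PhysAddrReduction into the low 12 bits of EBX.
pub fn encode_ebx(cbit_position: u32, phys_addr_reduction: u32) -> u32 {
    (cbit_position & 0x3f) | ((phys_addr_reduction & 0x3f) << 6)
}

/// Extract (CbitPosition, PhysAddrReduction) from EBX.
pub fn decode_ebx(ebx: u32) -> (u32, u32) {
    (ebx & 0x3f, (ebx >> 6) & 0x3f)
}

/// Page table entry mask selecting the C-bit.
pub fn cbit_mask(cbit_position: u32) -> u64 {
    1u64 << cbit_position
}
```

With the hardcoded values, encode_ebx(51, 1) gives 0x73 and the C-bit
mask is bit 51 of a page table entry.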
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
This includes opening the SEV char device file and issuing the
KVM_SEV_INIT or KVM_SEV_ES_INIT command.
Signed-off-by: Changyuan Lyu <changyuanl@google.com>