Commit graph

138 commits

Author SHA1 Message Date
Changyuan Lyu
1a93102f9f build(deps): bump mio from 0.8.11 to 1
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
2024-06-14 22:43:49 -07:00
dependabot[bot]
a2dd2936bf ci: Bump actions/checkout from 4.1.6 to 4.1.7
Bumps [actions/checkout](https://github.com/actions/checkout) from 4.1.6 to 4.1.7.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](a5ac7e51b4...692973e3d9)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-06-14 13:21:44 -07:00
Changyuan Lyu
84387e0a53 refactor(kvm): move check_extension() to VmInner
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
2024-06-09 10:11:22 -07:00
Changyuan Lyu
f10a1367e2 feat(cli): track error sources with snafu (2/n)
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
2024-06-09 10:11:22 -07:00
Changyuan Lyu
089a7a2e67 feat(hv)!: track error sources with snafu (1/n)
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
2024-06-09 10:11:22 -07:00
Changyuan Lyu
71de77a793 perf(kvm): remove unneeded entries in GSI table
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
2024-06-04 23:11:35 -07:00
Changyuan Lyu
7fe508c17f refactor(kvm): move shared data to a single Arc
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
2024-06-04 23:11:35 -07:00
Changyuan Lyu
8fa22300de feat(kvm): log the GSI routing table for debugging
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
2024-06-04 23:11:35 -07:00
Changyuan Lyu
4367fa5ec7 fix(fs): deregister the backend channel upon reset
Fixes: 359ab26fb2 ("feat(fs): support virtio-fs DAX mapping")

Signed-off-by: Changyuan Lyu <changyuanl@google.com>
2024-06-02 21:49:14 -07:00
Changyuan Lyu
9fa390dcc2 fix(virtio): reset MSI-X entries upon device reset
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
2024-06-02 21:44:01 -07:00
Changyuan Lyu
b36af492fb fix(virtio): prevent MSI-X change if irqfd linked
If the MSI-X table entry 0 is mapped to queue 0, and queue 0 is
offloaded to a vhost backend, and later the guest maps table entry
1 to queue 0, we will need to find out the irqfd of table entry 1,
and send a message to the vhost backend to update the irqfd. This is
much more complicated than just updating table entry 0, which only
involves updating the GSI table at the VMM side.

For now we simply prevent such changes if a queue is already mapped
to an MSI-X entry that is linked to an irqfd.

Signed-off-by: Changyuan Lyu <changyuanl@google.com>
2024-06-02 21:38:43 -07:00
Changyuan Lyu
d1baa6e0ce fix: reset memory and devices before shutdown
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
2024-06-02 21:23:18 -07:00
Changyuan Lyu
359ab26fb2 feat(fs): support virtio-fs DAX mapping
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
2024-06-02 13:44:47 -07:00
Changyuan Lyu
dc0c8c6271 fix(mem): add private anonymous memory back
Private anonymous memory is useful when a device does not want other
devices to access its memory.

Fixes: a536818653 ("feat(mem)!: create anonymous mem with memfd_create")

Signed-off-by: Changyuan Lyu <changyuanl@google.com>
2024-06-02 13:44:47 -07:00
Changyuan Lyu
0005413ff2 feat(vsock): add the vsock device flag
usage: --vsock vhost,cid=$CID,dev=/dev/vhost-vsock

Signed-off-by: Changyuan Lyu <changyuanl@google.com>
2024-06-01 15:23:33 -07:00
Changyuan Lyu
d9f42d9bfe feat(vsock): front end for in-kernel vhost-vsock
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
2024-06-01 15:23:33 -07:00
Changyuan Lyu
d7b639efc9 feat(vhost): add bindings for in-kernel vhost dev
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
2024-06-01 15:23:33 -07:00
Changyuan Lyu
7525f4563c fix(virtio)!: enable ACCESS_PLATFORM for CoCo only
The bit VIRTIO_F_ACCESS_PLATFORM is a little bit confusing:

- when a Linux guest driver sees this bit, it uses DMA API for virtq
  transactions.
  - For a confidential VM, the DMA layer will use swiotlb to copy
    data from/to the shared memory.
  - In the non-CoCo case, I observed a 5% regression in virtio-net
    throughput.
- when an in-kernel vhost device sees this bit, addresses in the
  virtq are viewed as I/O virtual addresses and it expects the
  userspace VMM to set up the IOTLB.

We do not have an emulated IOMMU, so we should not set
ACCESS_PLATFORM on backend devices. On the other hand, in the CoCo
case, we must advertise this bit to the guest.

So for now, the middle layer turns on ACCESS_PLATFORM if necessary
but never activates the device backend with this bit. We will need
to find a better solution in the future.
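
A minimal sketch of that masking, assuming a 64-bit feature word.
The helper names features_for_guest() and features_for_backend() are
hypothetical and are not taken from the Alioth source; only the bit
position (33) comes from the virtio specification.

```rust
/// VIRTIO_F_ACCESS_PLATFORM is feature bit 33 in the virtio specification.
const VIRTIO_F_ACCESS_PLATFORM: u64 = 1 << 33;

/// Hypothetical helper: the feature set the middle layer offers to the guest.
/// The bit is advertised only for confidential (CoCo) guests.
fn features_for_guest(backend_features: u64, coco: bool) -> u64 {
    if coco {
        backend_features | VIRTIO_F_ACCESS_PLATFORM
    } else {
        backend_features & !VIRTIO_F_ACCESS_PLATFORM
    }
}

/// Hypothetical helper: the feature set used to activate the backend.
/// The bit is always cleared because there is no emulated IOMMU.
fn features_for_backend(guest_acked: u64) -> u64 {
    guest_acked & !VIRTIO_F_ACCESS_PLATFORM
}
```

With this split, a vhost backend never sees the bit and therefore
never expects the VMM to program an IOTLB.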

Fixes: 5ad2ea658c ("feat(virtio): enable ACCESS_PLATFORM bits")

Signed-off-by: Changyuan Lyu <changyuanl@google.com>
2024-06-01 15:23:33 -07:00
Changyuan Lyu
7d2b04aa51 feat(fs): add the virtio-fs device flag
A host directory can be shared with the guest by
1. virtiofsd[1] flag: `--shared-dir /path/to/dir --socket-path /tmp/virtiofsd`
2. Alioth flag: `--fs vu,socket=/tmp/virtiofsd,tag=host-dir`

[1]: https://gitlab.com/virtio-fs/virtiofsd

Signed-off-by: Changyuan Lyu <changyuanl@google.com>
2024-05-31 15:55:34 -07:00
Changyuan Lyu
679fce382f feat(fs): front end of a vhost-user virtio-fs dev
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
2024-05-31 15:55:34 -07:00
Changyuan Lyu
597162d281 feat(virtio): add bindings for vhost-user backends
Ref: https://qemu-project.gitlab.io/qemu/interop/vhost-user.html

Signed-off-by: Changyuan Lyu <changyuanl@google.com>
2024-05-31 15:55:34 -07:00
Changyuan Lyu
cb13f5cd6a feat(virtio)!: create MSI irqfds for dev queues
Two new methods are added to the trait `IrqSender` for converting a
data entry in the MSI-X table into an irqfd upon device request.

Signed-off-by: Changyuan Lyu <changyuanl@google.com>
2024-05-31 15:55:34 -07:00
Changyuan Lyu
634d139f78 feat(pci)!: use irqfds in the msix table
When a guest writes to the msix table, it can either just update the
table data (which will be accessed when KVM_SIGNAL_MSI is called),
or trigger an update of the GSI routing table of KVM, so that later
writes to the irqfd can inject the correct MSI.

Signed-off-by: Changyuan Lyu <changyuanl@google.com>
2024-05-31 15:55:34 -07:00
Changyuan Lyu
ea079b9294 feat(kvm)!: create irqfd and the GSI routing table
An irqfd can be shared with the vfio module or a vhost-user backend
for injecting interrupts directly based on the pre-defined GSI
routing table.

Signed-off-by: Changyuan Lyu <changyuanl@google.com>
2024-05-31 15:55:34 -07:00
Changyuan Lyu
5bd521ba40 feat(kvm): add bindings for KVM_SET_GSI_ROUTING
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
2024-05-31 15:55:34 -07:00
Changyuan Lyu
a536818653 feat(mem)!: create anonymous mem with memfd_create
The fd from memfd_create() enables Alioth to share the guest memory
with another process by sending the fd to the target process, which
is necessary for supporting vhost-user backends.

Signed-off-by: Changyuan Lyu <changyuanl@google.com>
2024-05-31 15:55:34 -07:00
Changyuan Lyu
fd7e56b7b5 fix: reset PCI devices before resetting memory
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
2024-05-31 15:55:34 -07:00
Changyuan Lyu
0dc74e164a style: work around the borrow_deref_ref warnings
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
2024-05-31 14:47:39 -07:00
Changyuan Lyu
faff93b726 style: remove needless borrow
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
2024-05-31 14:47:39 -07:00
Changyuan Lyu
0f410d78ac style: remove unneeded return statement
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
2024-05-31 14:47:39 -07:00
Changyuan Lyu
60fedf3858 feat(virtio)!: associated type Feature for Virtio
Add the associated type `Feature` to the trait `Virtio` so that the
feature bits can be pretty-printed uniformly, thanks to the
bitflags crate.

Signed-off-by: Changyuan Lyu <changyuanl@google.com>
2024-05-30 22:26:59 -07:00
Changyuan Lyu
e2f11df347 fix(virtio): devices return the full feature sets
Previously, each device returned only the device-specific features
and VirtioDevice::new() added the general virtio feature bits.
This caused trouble for supporting vhost-user backends, which may
implement a different set of general virtio feature bits.

This commit lets individual devices return the full feature sets.
The general virtio feature bits implemented by the virtio module
are moved to `FEATURE_BUILT_IN`, which implicitly turns on
`EVENT_IDX` for the entropy and block devices.

Signed-off-by: Changyuan Lyu <changyuanl@google.com>
2024-05-30 22:26:59 -07:00
Changyuan Lyu
8c9494c2f5 perf(virtio)!: use KVM_IOEVENTFD for queue notify
KVM_IOEVENTFD avoids the VM exits of VCPU threads from kernel space
to user space.

Further we use a non-zero notify_off_multiplier [1] in virtio device
configs. By just looking at the MMIO address we are able to tell
which queue is sending the notification. The value written to the
MMIO address is not needed, so KVM can skip the instruction
decoding.
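
A minimal sketch of the address decoding, following the formula in
Virtio Spec 1.2, Sec 4.1.4.4 (offset = queue_notify_off *
notify_off_multiplier). The multiplier value of 4 and the assumption
that queue_notify_off equals the queue index are illustrative only;
the values Alioth actually uses are not stated here.

```rust
/// Hypothetical multiplier; any non-zero value gives each queue its own address.
const NOTIFY_OFF_MULTIPLIER: u64 = 4;

/// Offset inside the notification region that the driver writes to,
/// assuming queue_notify_off is the queue index.
fn notify_offset(queue_index: u16) -> u64 {
    u64::from(queue_index) * NOTIFY_OFF_MULTIPLIER
}

/// Recover the queue index from the offset alone; the written value is ignored.
fn queue_from_offset(offset: u64, num_queues: u16) -> Option<u16> {
    if offset % NOTIFY_OFF_MULTIPLIER != 0 {
        return None; // not aligned to any queue's notify address
    }
    let index = offset / NOTIFY_OFF_MULTIPLIER;
    if index < u64::from(num_queues) {
        Some(index as u16) // lossless: index is below a u16 bound
    } else {
        None
    }
}
```

Because every queue has a distinct address, one ioeventfd can be
registered per queue without matching on the data written.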

Test setup:

- Host CPU: AMD Ryzen 9 5950X
- VM: memory size 1G, 1 VCPU

virtio-net throughput measured by iperf3:

- VM -> host
  - without KVM_IOEVENTFD: 30.6 Gbits/sec
  - with KVM_IOEVENTFD: 33.5 Gbits/sec

- Host -> VM
  - without KVM_IOEVENTFD: 19.5 Gbits/sec
  - with KVM_IOEVENTFD: 25.4 Gbits/sec

[1] Virtio Spec 1.2, Sec 4.1.4.4.

Signed-off-by: Changyuan Lyu <changyuanl@google.com>
2024-05-26 11:13:09 -07:00
Changyuan Lyu
6272ff5a5e feat(kvm)!: add bindings for KVM_IOEVENTFD
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
2024-05-26 11:13:09 -07:00
Changyuan Lyu
7ca26a8781 feat(sev): boot Oak/Stage0 with Linux bzImage
Add kernel, initramfs, and cmdline to the FwCfg device when a
firmware image is provided at the same time. This enables Alioth to
boot SEV enabled guests with a compressed bzImage file.

Signed-off-by: Changyuan Lyu <changyuanl@google.com>
2024-05-25 22:03:34 -07:00
Changyuan Lyu
3c241aa63e chore: bump version to 0.2.0
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
2024-05-24 13:56:06 -07:00
Changyuan Lyu
82e2167873 fix(net): set dev feature bits based on the tap
TUN_F_USO4/TUN_F_USO6 were added in Linux 6.2.

There is no easy way to query the supported features from the tap
device, so similar to QEMU (tap_fd_set_offload() in net/tap-linux.c),
we keep trying tun_set_offload() until it succeeds.
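
A minimal sketch of such a probing loop. The flag values are the
TUN_F_* constants from linux/if_tun.h; the candidate order and the
function negotiate_offload() are assumptions for illustration, and
the ioctl is replaced by a closure so the logic can run without a
tap device.

```rust
// Offload flags from linux/if_tun.h.
const TUN_F_CSUM: u32 = 0x01;
const TUN_F_TSO4: u32 = 0x02;
const TUN_F_TSO6: u32 = 0x04;
const TUN_F_TSO_ECN: u32 = 0x08;
const TUN_F_UFO: u32 = 0x10;
const TUN_F_USO4: u32 = 0x20;
const TUN_F_USO6: u32 = 0x40;

/// Hypothetical helper: try the richest flag set first and fall back to
/// smaller ones. `try_set` stands in for the TUNSETOFFLOAD ioctl and
/// returns true on success. Returns the first accepted flag set.
fn negotiate_offload(mut try_set: impl FnMut(u32) -> bool) -> Option<u32> {
    let full = TUN_F_CSUM | TUN_F_TSO4 | TUN_F_TSO6 | TUN_F_TSO_ECN
        | TUN_F_UFO | TUN_F_USO4 | TUN_F_USO6;
    let no_uso = full & !(TUN_F_USO4 | TUN_F_USO6);
    let no_uso_ufo = no_uso & !TUN_F_UFO;
    let candidates = [full, no_uso, no_uso_ufo];
    candidates.iter().copied().find(|&flags| try_set(flags))
}
```

The accepted flag set can then be translated into the virtio-net
feature bits offered to the guest.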

Signed-off-by: Changyuan Lyu <changyuanl@google.com>
2024-05-24 13:46:01 -07:00
Changyuan Lyu
3571e91452 fix(kvm): use old KVM_GET_SREGS/KVM_SET_SREGS
For now we do not need the new features of
KVM_GET_SREGS2/KVM_SET_SREGS2. Use the old ioctls for better
compatibility.

Signed-off-by: Changyuan Lyu <changyuanl@google.com>
2024-05-24 13:46:01 -07:00
Changyuan Lyu
17f33e6b68 feat(kvm)!: allow specifying char dev file paths
This enables Alioth to work in environments where devtmpfs is not
mounted at /dev.

Signed-off-by: Changyuan Lyu <changyuanl@google.com>
2024-05-24 13:46:01 -07:00
Changyuan Lyu
f88f290ab9 docs(sev): boot AMD-SEV guests with Oak/Stage0
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
2024-05-22 23:10:49 -07:00
Changyuan Lyu
5ad2ea658c feat(virtio): enable ACCESS_PLATFORM bits
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
2024-05-22 23:10:49 -07:00
Changyuan Lyu
04f01b350e feat(sev): add a flag for launching SEV guests
With all the preparation done, SEV guests are ready to go:

* SEV guests: --coco sev,policy=0x1
* SEV-ES guests: --coco sev,policy=0x5
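
The two policy values differ only in the ES bit. A minimal sketch of
how they decode, using the guest policy bit layout from the AMD SEV
API specification (bit 0 NODBG, bit 2 ES); the helper names are
hypothetical:

```rust
/// Guest policy bits from the AMD SEV API specification.
const POLICY_NODBG: u32 = 1 << 0; // debugging of the guest is disallowed
const POLICY_ES: u32 = 1 << 2; // SEV-ES is required

/// Hypothetical helper: does the policy require an SEV-ES guest?
fn requires_es(policy: u32) -> bool {
    policy & POLICY_ES != 0
}

/// Hypothetical helper: does the policy forbid debugging the guest?
fn debug_disallowed(policy: u32) -> bool {
    policy & POLICY_NODBG != 0
}
```

So policy=0x1 is NODBG alone, and policy=0x5 is NODBG plus ES.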

We still need to make virtio devices work with SEV guests.

Signed-off-by: Changyuan Lyu <changyuanl@google.com>
2024-05-22 23:10:49 -07:00
Changyuan Lyu
f968fcb0a8 feat(sev): set up AP registers for SEV-ES guests
This includes

* parse the firmware blob to get the AP EIP value
* set up AP registers based on the parsed EIP
* call sev_launch_update_vmsa before booting CPUs

Ref:
[1] QEMU hw/i386/pc_sysfw_ovmf.c
[2] QEMU docs/specs/sev-guest-firmware.rst
[3] https://github.com/project-oak/oak snp_measurement/src/stage0.rs

Signed-off-by: Changyuan Lyu <changyuanl@google.com>
2024-05-22 23:10:49 -07:00
Changyuan Lyu
709829beb7 feat(sev): set up CPUID bits for SEV guests
For now PhysAddrReduction and CbitPosition are hardcoded to 1 and 51,
respectively, which is good for Milan CPUs.

Ref: AMD64 Architecture Programmer's Manual Vol. 3, section E.4.17.
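
A minimal sketch of how the two fields pack into EBX of CPUID
Fn8000_001F, where CbitPosition occupies bits 5:0 and
PhysAddrReduction occupies bits 11:6. The function name is
hypothetical, and the remaining EBX bits are left as zero here:

```rust
/// Hypothetical helper: pack the C-bit position (bits 5:0) and the
/// physical address reduction (bits 11:6) into a CPUID EBX value.
fn sev_cpuid_ebx(cbit_position: u32, phys_addr_reduction: u32) -> u32 {
    (cbit_position & 0x3f) | ((phys_addr_reduction & 0x3f) << 6)
}
```

With the hardcoded values, sev_cpuid_ebx(51, 1) evaluates to 0x73.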

Signed-off-by: Changyuan Lyu <changyuanl@google.com>
2024-05-22 23:10:49 -07:00
Changyuan Lyu
d14a30be31 feat(sev): call SEV launch OPs in BSP thread
This includes

* sev_launch_start
* sev_launch_update_data (called in firmware setup)
* sev_launch_measure
* sev_launch_finish

Ref:
[1] QEMU target/i386/sev.c
[2] AMD Secure Encrypted Virtualization API 0.24, 1.3 Guest Lifecycle

Signed-off-by: Changyuan Lyu <changyuanl@google.com>
2024-05-22 23:10:49 -07:00
Changyuan Lyu
c8d9fb0833 feat(sev): update the firmware bytes with AMD PSP
This allows the guest to see the correct firmware blob instead
of some random bytes.

Signed-off-by: Changyuan Lyu <changyuanl@google.com>
2024-05-22 23:10:49 -07:00
Changyuan Lyu
9458fce313 feat(sev): register fw and RAM as encrypted
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
2024-05-22 23:10:49 -07:00
Changyuan Lyu
34135d3c43 feat(sev): add wrappers for SEV-related KVM ops
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
2024-05-22 23:10:49 -07:00
Changyuan Lyu
1dd5849d30 feat(sev)!: initialize SEV for confidential guest
This includes opening the SEV char device file and issuing the
KVM_SEV_INIT or KVM_SEV_ES_INIT command.

Signed-off-by: Changyuan Lyu <changyuanl@google.com>
2024-05-22 23:10:49 -07:00
Changyuan Lyu
ac106e00ff feat(sev): add AMD-SEV related bindings
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
2024-05-22 23:10:49 -07:00