Chirantan Ekbote 448516e3f9 balloon: Implement device policy
Implement a policy for the balloon device so that it starts taking
memory away from the VM when the system is under low memory conditions.
There are a few pieces here:

* Change the madvise call in MemoryMapping::dont_need_range to use
  MADV_REMOVE instead of MADV_DONTNEED.  The latter does nothing when
  the memory mapping is shared across multiple processes while the
  former immediately gives the pages in the specified range back to the
  kernel.  Subsequent accesses to memory in that range return
  zero-filled pages.
* Change the protocol between the balloon device process and the main
  crosvm process.  Previously, the device process expected the main
  process to send it increments in the amount of memory consumed by the
  balloon device.  Now, it instead just expects the absolute value of
  the memory that should be consumed.  To properly implement the policy
  the main process needs to keep track of the total memory consumed by
  the balloon device so this makes it easier to handle all the policy in
  one place.
* Add a policy for dealing with low memory situations.  When the VM
  starts up, we determine the maximum amount of memory that the balloon
  device should consume:

    * If the VM has more than 1.5GB of memory, the balloon device max is
      the size of the VM memory minus 1GB.
    * Otherwise, if the VM has at least 500MB, the balloon device max is
      50% of the size of the VM memory.
    * Otherwise, the max is 0.

  The increment used to change the size of the balloon is defined as
  1/16 of the max memory that the balloon device will consume.  When the
  crosvm main process detects that the system is low on memory, it
  immediately increases the balloon size by the increment (unless it has
  already reached the max).  It then starts 2 timers: one to check for
  low memory conditions again in 1 second (+ jitter) and another to
  check if the system is no longer low on memory in 1 minute (+ jitter)
  with a subsequent interval of 30 seconds (+ jitter).

  Under persistent low memory conditions the balloon device will consume
  the maximum memory after 16 seconds.  Once there is enough available
  memory the balloon size will shrink back down to 0 after at most 9
  minutes.
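
The following minimal Rust sketch makes the sizing arithmetic concrete. It is not the crosvm implementation. BalloonPolicy, balloon_max, and the method names are hypothetical. MB/GB are assumed to mean MiB/GiB, and the timers and jitter are omitted. Each method call stands in for one timer firing that moves the balloon by a single increment. Because the main process keeps the running total, it only ever hands the device an absolute target, which is the protocol change described above.

```rust
// Hypothetical sketch of the balloon sizing policy; names are illustrative,
// not the actual crosvm code. MB/GB are assumed to be MiB/GiB.
const MIB: u64 = 1024 * 1024;
const GIB: u64 = 1024 * MIB;

/// Maximum number of bytes the balloon may take from a VM with `vm_mem` bytes.
fn balloon_max(vm_mem: u64) -> u64 {
    if vm_mem > GIB + GIB / 2 {
        vm_mem - GIB // leave 1 GiB for the guest
    } else if vm_mem >= 500 * MIB {
        vm_mem / 2 // leave half of the memory for the guest
    } else {
        0 // too small to balloon at all
    }
}

/// Tracks the balloon size in the main process so the device can be sent
/// an absolute target instead of a delta.
struct BalloonPolicy {
    max: u64,
    increment: u64,
    current: u64,
}

impl BalloonPolicy {
    fn new(vm_mem: u64) -> Self {
        let max = balloon_max(vm_mem);
        BalloonPolicy { max, increment: max / 16, current: 0 }
    }

    /// Low-memory check fired: grow by one increment, capped at `max`.
    /// Returns the new absolute size to send to the device, if it changed.
    fn on_low_memory(&mut self) -> Option<u64> {
        let target = self.current.saturating_add(self.increment).min(self.max);
        self.set_target(target)
    }

    /// Periodic check found enough free memory: shrink by one increment,
    /// stopping at 0.
    fn on_memory_available(&mut self) -> Option<u64> {
        let target = self.current.saturating_sub(self.increment);
        self.set_target(target)
    }

    fn set_target(&mut self, target: u64) -> Option<u64> {
        if target == self.current {
            None
        } else {
            self.current = target;
            Some(target)
        }
    }
}

fn main() {
    // A 2 GiB VM: max = 1 GiB, increment = 64 MiB.
    let mut policy = BalloonPolicy::new(2 * GIB);

    let mut grow_steps = 0;
    while let Some(size) = policy.on_low_memory() {
        grow_steps += 1;
        println!("low memory: balloon target -> {} MiB", size / MIB);
    }
    println!("reached max after {} steps", grow_steps);

    let mut shrink_steps = 0;
    while policy.on_memory_available().is_some() {
        shrink_steps += 1;
    }
    println!("shrunk to 0 after {} steps", shrink_steps);
}
```

For a 2 GiB VM the balloon max is 1 GiB and the increment is 64 MiB, so growth takes 16 steps, or about 16 seconds at one check per second. Shrinking also takes 16 steps. With 1 minute before the first check and 15 more checks at 30 seconds each, that comes to 8.5 minutes before jitter, which is in line with the "at most 9 minutes" figure.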

BUG=chromium:866193
TEST=manual
Start 2 VMs and write out a large file (size > system RAM) in each.
Observe /sys/kernel/mm/chromeos-low_mem/available and see that the
available memory steadily decreases until it goes under the low memory
margin at which point the available memory bounces back up as crosvm
frees up pages.
CQ-DEPEND=CL:1152214

Change-Id: I2046729683aa081c9d7ed039d902ad11737c1d52
Signed-off-by: Chirantan Ekbote <chirantan@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/1149155
Reviewed-by: Sonny Rao <sonnyrao@chromium.org>
2018-07-27 15:29:07 -07:00
aarch64 mptable: Add ability to allocate pci interrupts 2018-07-23 21:05:03 -07:00
arch mptable: Add ability to allocate pci interrupts 2018-07-23 21:05:03 -07:00
crosvm_plugin plugin: allow retrieving and setting VCPU events 2018-07-11 18:48:50 -07:00
data_model data_model: add offset, copy_to_volatile_slice, Copy to VolatileSlice 2018-07-20 05:30:56 -07:00
devices balloon: Implement device policy 2018-07-27 15:29:07 -07:00
fuzz Add kernel_loader fuzzing 2018-01-12 22:37:48 -08:00
gpu_buffer gpu_buffer: fix reading and writing to GPU buffers 2018-07-25 00:14:24 -07:00
gpu_display gpu_display: provides wayland based output for virtio-gpu 2018-07-09 15:48:21 -07:00
gpu_renderer gpu_renderer: add virglrenderer bindings 2018-07-20 05:30:54 -07:00
io_jail io_jail: fix missing null terminator for close_fds test 2018-05-11 23:22:19 -07:00
kernel_cmdline crosvm: move kernel_cmdline to it's own crate 2018-02-02 23:53:42 -08:00
kernel_loader kernel_loader: implement error trait 2018-02-27 22:26:08 -08:00
kvm plugin: allow retrieving and setting VCPU events 2018-07-11 18:48:50 -07:00
kvm_sys kvm: fix definition of KVM_SET_XCRS ioctl 2018-05-16 05:08:21 -07:00
net_sys net_util: add tap support for mac address 2018-02-21 01:06:42 -08:00
net_util net: Allow passing in a configured tap fd on the command line 2018-06-12 00:36:27 -07:00
p9 p9: Fix file and directory creation mode 2018-06-27 22:07:22 -07:00
plugin_proto plugin: allow retrieving and setting VCPU events 2018-07-11 18:48:50 -07:00
qcow qcow: Set refcounts for initial clusters. 2018-07-16 03:42:07 -07:00
qcow_utils qcow: Set refcounts for initial clusters. 2018-07-16 03:42:07 -07:00
resources Move gpu allocator to resources 2018-07-09 17:59:23 -07:00
seccomp balloon: Implement device policy 2018-07-27 15:29:07 -07:00
src balloon: Implement device policy 2018-07-27 15:29:07 -07:00
sys_util balloon: Implement device policy 2018-07-27 15:29:07 -07:00
syscall_defines fix armv7a and aarch64 build errors and warnings 2017-09-01 12:39:18 -07:00
tests plugin: allow retrieving and setting VCPU events 2018-07-11 18:48:50 -07:00
vhost hw/virtio/vhost: Add simple tests backed by fakes 2018-02-02 16:32:12 -08:00
virtio_sys Implement virtio-vsock 2017-09-18 16:48:43 -07:00
vm_control Move gpu allocator to resources 2018-07-09 17:59:23 -07:00
x86_64 mptable: Add ability to allocate pci interrupts 2018-07-23 21:05:03 -07:00
.gitignore gitignore: Remove Cargo.lock 2017-06-17 01:12:44 -07:00
build_test add build_test script to automate crosvm test running 2017-09-01 12:39:19 -07:00
build_test.py Remove the device manager and use the new resource allocator 2018-06-29 17:50:17 -07:00
Cargo.lock gpu: implement virtio-gpu 2018-07-20 05:30:54 -07:00
Cargo.toml balloon: Implement device policy 2018-07-27 15:29:07 -07:00
LICENSE add LICENSE and README 2017-04-17 14:06:21 -07:00
README.md README: use /run paths 2017-10-23 18:22:24 -07:00

crosvm - The Chrome OS Virtual Machine Monitor

This component, known as crosvm, runs untrusted operating systems along with virtualized devices. No actual hardware is emulated; crosvm only runs VMs through the Linux KVM interface. What makes crosvm unique is a focus on safety within the programming language and a sandbox around the virtual devices to protect the kernel from attack in case of an exploit in the devices.

Usage

To see the usage information for your version of crosvm, run crosvm or crosvm run --help.

Boot a Kernel

To run a very basic VM with just a kernel and default devices:

$ crosvm run "${KERNEL_PATH}"

For x86, the uncompressed kernel image, also known as vmlinux, can be found in your kernel build directory at arch/x86/boot/compressed/vmlinux.

Rootfs

In most cases, you will want to give the VM a virtual block device to use as a root file system:

$ crosvm run -r "${ROOT_IMAGE}" "${KERNEL_PATH}"

The root image must be a path to a disk image formatted in a way that the kernel can read. Typically this is a squashfs image made with mksquashfs or an ext4 image made with mkfs.ext4. The -r argument automatically tells the kernel to use that image as the root, so it can only be given once. Additional disks can be given with -d, or with --rwdisk if a writable disk is desired.

To run crosvm with a writable rootfs:

WARNING: Writable disks are at risk of corruption by a malicious or malfunctioning guest OS.

$ crosvm run --rwdisk "${ROOT_IMAGE}" -p "root=/dev/vda" vmlinux

NOTE: If more disk arguments are given before the desired rootfs image, root=/dev/vda must be adjusted to the appropriate drive letter.
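
Picking the right letter comes down to counting disks. The following is a minimal sketch. It assumes, as the note implies, that disks are attached in the order their -r, -d, and --rwdisk arguments appear. It also assumes the guest kernel uses the usual Linux virtio-blk naming: vda through vdz, then vdaa, vdab, and so on. virtio_blk_name is a hypothetical helper, not part of crosvm.

```rust
/// Hypothetical helper: guest device path for the virtio block device at
/// `index` (0 = first disk), assuming disks are attached in command-line order.
fn virtio_blk_name(index: usize) -> String {
    // Linux names virtio disks vda..vdz, then vdaa, vdab, ... which is a
    // bijective base-26 numbering using the letters a..z.
    let mut letters = Vec::new();
    let mut i = index;
    loop {
        letters.push((b'a' + (i % 26) as u8) as char);
        if i < 26 {
            break;
        }
        i = i / 26 - 1;
    }
    letters.reverse();
    format!("/dev/vd{}", letters.into_iter().collect::<String>())
}

fn main() {
    // e.g. `-d data.img --rwdisk root.img`: assuming command-line order,
    // the rootfs is the second disk (index 1).
    println!("root={}", virtio_blk_name(1));
}
```

This prints root=/dev/vdb, which is the value to pass with -p in that case.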

Control Socket

If the control socket was enabled with -s, the main process can be controlled while crosvm is running.

NOTE: If the socket path given is a directory, a socket name underneath that path will be generated based on crosvm's PID.

For example, to tell crosvm to stop and exit:

$ crosvm run -s /run/crosvm.sock ${USUAL_CROSVM_ARGS}
    <in another shell>
$ crosvm stop /run/crosvm.sock

WARNING: The guest OS will not be notified or gracefully shut down.

This will cause the original crosvm process to exit in an orderly fashion, allowing it to clean up any OS resources that might have stuck around if crosvm were terminated early.

Multiprocess Mode

By default crosvm runs in multiprocess mode. Each device that supports running inside of a sandbox will run in a jailed child process of crosvm. The appropriate minijail seccomp policy files must be present either in /usr/share/policy/crosvm or in the path specified by the --seccomp-policy-dir argument. The sandbox can be disabled for testing with the --disable-sandbox option.
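
The following sketch shows the policy lookup described above. It can be used to check that policies are in place before launching. It is illustrative only. The <device>.policy file naming, the device names, and the helper functions are assumptions, not crosvm's actual code. Only the default directory comes from the text.

```rust
use std::path::{Path, PathBuf};

// Default location named in the text above.
const DEFAULT_POLICY_DIR: &str = "/usr/share/policy/crosvm";

/// Hypothetical: policy file for `device`, preferring a directory passed via
/// --seccomp-policy-dir over the default. The `.policy` suffix is assumed.
fn seccomp_policy_path(policy_dir: Option<&Path>, device: &str) -> PathBuf {
    let dir = policy_dir.unwrap_or(Path::new(DEFAULT_POLICY_DIR));
    dir.join(format!("{}.policy", device))
}

/// Lists the policy files that do not exist for the given devices.
fn missing_policies(policy_dir: Option<&Path>, devices: &[&str]) -> Vec<PathBuf> {
    devices
        .iter()
        .map(|device| seccomp_policy_path(policy_dir, device))
        .filter(|path| !path.exists())
        .collect()
}

fn main() {
    // Hypothetical device names, for illustration only.
    let devices = ["balloon_device", "rng_device"];
    for path in missing_policies(None, &devices) {
        eprintln!("missing seccomp policy: {}", path.display());
    }
}
```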

Virtio Wayland

Virtio Wayland support requires special support on the part of the guest and as such is unlikely to work out of the box unless you are using a Chrome OS kernel along with a termina rootfs.

To use it, ensure that the XDG_RUNTIME_DIR environment variable is set and that the path $XDG_RUNTIME_DIR/wayland-0 points to the socket of the Wayland compositor you would like the guest to use.
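
A quick way to check this setup before launching crosvm is to resolve the path as the text describes. This is a small sketch, not crosvm's own lookup code. wayland_socket_path is a hypothetical helper that takes the variable's value as a parameter, so it can be tested without touching the real environment.

```rust
use std::ffi::OsString;
use std::path::PathBuf;

/// Hypothetical helper: the compositor socket the guest would use, or None
/// when XDG_RUNTIME_DIR is unset (wayland support is then off by default).
fn wayland_socket_path(xdg_runtime_dir: Option<OsString>) -> Option<PathBuf> {
    xdg_runtime_dir.map(|dir| PathBuf::from(dir).join("wayland-0"))
}

fn main() {
    match wayland_socket_path(std::env::var_os("XDG_RUNTIME_DIR")) {
        Some(path) if path.exists() => println!("wayland socket: {}", path.display()),
        Some(path) => println!("{} does not exist; is a compositor running?", path.display()),
        None => println!("XDG_RUNTIME_DIR is not set"),
    }
}
```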

Defaults

The following are crosvm's default arguments and how to override them.

  • 256MB of memory (set with -m)
  • 1 virtual CPU (set with -c)
  • no block devices (set with -r, -d, or --rwdisk)
  • no network (set with --host_ip, --netmask, and --mac)
  • virtio wayland support if the XDG_RUNTIME_DIR environment variable is set (disable with --no-wl)
  • only the kernel arguments necessary to run with the supported devices (add more with -p)
  • run in multiprocess mode (run in single process mode with --disable-sandbox)
  • no control socket (set with -s)

System Requirements

A Linux kernel with KVM support (check for /dev/kvm) is required to run crosvm. In order to run certain devices, there are additional system requirements:

  • virtio-wayland - The memfd_create syscall, introduced in Linux 3.17, and a Wayland compositor.
  • vsock - Host Linux kernel with vhost-vsock support, introduced in Linux 4.8.
  • multiprocess - Host Linux kernel with seccomp-bpf and Linux namespacing support.
  • virtio-net - Host Linux kernel with TUN/TAP support (check for /dev/net/tun) and running with CAP_NET_ADMIN privileges.
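
The device-node checks mentioned above can be scripted before a run. The following is a minimal sketch. It only tests that /dev/kvm and /dev/net/tun exist. It does not check that the current user may open them or holds CAP_NET_ADMIN, and it does not cover kernel features such as vhost-vsock or seccomp-bpf. missing_nodes is a hypothetical helper.

```rust
use std::path::Path;

/// Device nodes named above, paired with what needs them.
const REQUIRED_NODES: &[(&str, &str)] = &[
    ("/dev/kvm", "crosvm (KVM support)"),
    ("/dev/net/tun", "virtio-net (TUN/TAP support)"),
];

/// Hypothetical helper: describes each node in `nodes` missing on this host.
/// Existence only; permissions and capabilities are not checked.
fn missing_nodes(nodes: &[(&str, &str)]) -> Vec<String> {
    nodes
        .iter()
        .filter(|(path, _)| !Path::new(path).exists())
        .map(|(path, needed_by)| format!("{} not found (needed by {})", path, needed_by))
        .collect()
}

fn main() {
    let missing = missing_nodes(REQUIRED_NODES);
    if missing.is_empty() {
        println!("all checked device nodes are present");
    }
    for msg in &missing {
        println!("{}", msg);
    }
}
```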

Emulated Devices

Device          Description
CMOS/RTC        Used to get the current calendar time.
i8042           Used by the guest kernel to exit crosvm.
serial          x86 I/O port driven serial devices that print to stdout and take input from stdin.
virtio-block    Basic read/write block device.
virtio-net      Device to interface the host and guest networks.
virtio-rng      Entropy source used to seed the guest OS's entropy pool.
virtio-vsock    Enables VSOCKs for the guests.
virtio-wayland  Allows the guest to use the host Wayland socket.

Contributing

Code Health

build_test

There are no automated tests run before code is committed to crosvm. In order to maintain sanity, please execute build_test before submitting code for review. All tests should be passing or ignored and there should be no compiler warnings or errors. All supported architectures are built, but only tests for x86_64 are run. In order to build everything without failures, sysroots must be supplied for each architecture. See build_test -h for more information.

rustfmt

New code should be formatted with rustfmt, but not all currently checked-in code has been autoformatted yet. If running rustfmt causes a lot of churn for a file, do not check in lines unrelated to your change.

Dependencies

With a few exceptions, external dependencies inside of the Cargo.toml files are not allowed, because community-made crates tend to bloat the binary size by pulling in dozens of transitive dependencies. All these dependencies must also be reviewed to ensure their suitability for the crosvm project. Currently allowed crates are:

  • byteorder - A very small library used for endian swaps.
  • gcc - Build time dependency needed to build C source code used in crosvm.
  • libc - Required to use the standard library, this crate is a simple wrapper around libc's symbols.

Code Overview

The crosvm source code is written in Rust and C. To build, crosvm requires rustc v1.20 or later.

Source code is organized into crates, each with its own unit tests. These crates are:

  • crosvm - The top-level binary front-end for using crosvm.
  • devices - Virtual devices exposed to the guest OS.
  • io_jail - Creates jailed processes using libminijail.
  • kernel_loader - Loads elf64 kernel files into a slice of memory.
  • kvm_sys - Low-level (mostly) auto-generated structures and constants for using KVM.
  • kvm - Unsafe, low-level wrapper code for using kvm_sys.
  • net_sys - Low-level (mostly) auto-generated structures and constants for creating TUN/TAP devices.
  • net_util - Wrapper for creating TUN/TAP devices.
  • sys_util - Mostly safe wrappers for small system facilities such as eventfd or syslog.
  • syscall_defines - Lists of syscall numbers in each architecture used to make syscalls not supported in libc.
  • vhost - Wrappers for creating vhost based devices.
  • virtio_sys - Low-level (mostly) auto-generated structures and constants for interfacing with kernel vhost support.
  • vm_control - IPC for the VM.
  • x86_64 - Support code specific to 64-bit x86 machines.

The seccomp folder contains minijail seccomp policy files for each sandboxed device. Because some syscalls vary by architecture, the seccomp policies are split by architecture.