Linux KVM set-up
Ubuntu:
Fedora / RHEL:
openSUSE:
zypper install -y -t pattern kvm_server
zypper install -y spice-client virt-utils
systemctl start libvirtd.service
systemctl enable libvirtd.service
Fedora and SUSE (some versions): may need to add the user to PolicyKit.
virsh
virt-manager
When creating virtual machines:
- Select spice/QXL as video + add channel = yes for clipboard
- TAP-style network connection (plugs at MAC level): macvtap, virtio or rtl, Bridge
- TUN-style network connection (plugs at IP level):
- NAT network
- routed network (aka public bridge)
- isolated network
- Note: NFS requires routed network (public bridge) or isolated network to connect local VM to the host
virsh --connect qemu:///system
virsh# list --all
virt-top
Remote connection (assuming listener is enabled):
virt-viewer --connect qemu+ssh://user@host/system domain
virt-viewer --connect qemu+tcp://host/system domain
To pass alt-right-click, need to disable it locally:
ccsm -> General -> General Options -> Key bindings -> change Window Menu to <Super>Button3
Disk image utilities:
To install:
Ubuntu:
apt-get update
apt-get install libguestfs-tools
update-guestfs-appliance
RHEL/Fedora:
dnf install -y libguestfs guestfish libguestfs-tools libguestfs-mount libguestfs-winsupport
openSUSE:
zypper install -y guestfs-tools
Utilities:
virt-filesystems --domain vmname --all -l
virt-filesystems --add disk.img --all -l
virt-inspector --domain vmname
virt-inspector --add disk.img
guestmount [--ro] --add disk.img [-m /dev/sda1] /mnt
guestfish
virt-df --add q.img
virt-df --domain <vmname>
virt-ls
virt-cat
virt-diff
virt-edit
virt-log
virt-make-fs
virt-rescue
virt-resize
virt-sparsify
virt-tar-in
virt-tar-out
virt-win-reg
Attaching PCI device to a virtual machine:
- Enable VT-d or AMD-Vi in BIOS
- boot with intel_iommu=on iommu=pt
"pt" limits IOMMU device remapping to KVM only (not the host kernel itself)
- to make it permanent: edit /etc/default/grub
- GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_iommu=on iommu=pt"
- Ubuntu: update-grub
- Fedora: grub2-mkconfig -o /boot/grub2/grub.cfg
- sync; reboot
- dmesg | grep -iE "dmar|iommu|VT-d|AMD-Vi"
- lspci -tv
lspci -D
virsh nodedev-list --cap pci
virsh nodedev-dumpxml pci_0000_04_00_0
virsh nodedev-dumpxml pci_0000_04_00_1
lspci [-s 04:00.0] -k
lspci [-s 04:00.1] -k
- add device in virt-manager
if the card has multiple physical functions, then all of the functions must be added
that's it: virt-manager will unbind the device from the host driver automatically when the VM starts, and rebind it to the host driver when the VM exits
- example of created entries:
<devices>
<hostdev mode='subsystem' type='pci' managed='yes'>
<source>
<address domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
</source>
<address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='yes'>
<source>
<address domain='0x0000' bus='0x04' slot='0x00' function='0x1'/>
</source>
<address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
</hostdev>
</devices>
The guest-side <address type='pci' .../> element is optional:
<devices>
<hostdev mode='subsystem' type='pci' managed='yes'>
<source>
<address domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
</source>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='yes'>
<source>
<address domain='0x0000' bus='0x04' slot='0x00' function='0x1'/>
</source>
</hostdev>
</devices>
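The shorter <hostdev> form above can be generated from a host PCI address. A minimal sketch, assuming the address is given in the lspci -D format (domain:bus:slot.function); hostdev_xml is a hypothetical helper name, not part of libvirt:

```shell
# Print a <hostdev> element (without the optional guest-side address)
# for a host PCI address such as 0000:04:00.1.
# hostdev_xml is a hypothetical helper name, not a libvirt tool.
hostdev_xml() {
    addr=$1
    # Split "dddd:bb:ss.f" into domain, bus, slot and function.
    domain=${addr%%:*}
    rest=${addr#*:}
    bus=${rest%%:*}
    rest=${rest#*:}
    slot=${rest%%.*}
    func=${rest#*.}
    cat <<EOF
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x$domain' bus='0x$bus' slot='0x$slot' function='0x$func'/>
  </source>
</hostdev>
EOF
}

# Usage (file name is an example):
#   hostdev_xml 0000:04:00.1 > hostdev-04-00-1.xml
#   virsh attach-device <vmname> hostdev-04-00-1.xml --config
```

The output can also be pasted under <devices> with virsh edit instead of using attach-device.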
Note: some devices can be attached to the VM only as a group, collectively.
They cannot be split with some assigned to the VM and some left in use by the host, although it is possible to assign only some to the VM and detach the rest from the host.
These are devices that belong to the same IOMMU group.
Typically, the PFs of a card belong to a single IOMMU group, and each VF has its own IOMMU group.
To check which devices belong to the same IOMMU group, execute
ls /sys/bus/pci/devices/0000:04:00.0/iommu_group/devices
The output will be something like
0000:04:00.0
0000:04:00.1
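To see all IOMMU groups at once rather than one device at a time, the same information can be read from /sys/kernel/iommu_groups. A small sketch; list_iommu_groups is a hypothetical helper name, and its optional argument (an alternative directory) exists only to make it easy to try on a test tree:

```shell
# Print "group <n>: <pci addresses...>" for every IOMMU group.
# list_iommu_groups is a hypothetical helper; the optional argument
# overrides the sysfs directory to scan.
list_iommu_groups() {
    root=${1:-/sys/kernel/iommu_groups}
    for group in "$root"/*; do
        [ -d "$group/devices" ] || continue
        printf 'group %s:' "${group##*/}"
        for dev in "$group"/devices/*; do
            [ -e "$dev" ] || continue
            printf ' %s' "${dev##*/}"
        done
        printf '\n'
    done
}

# Usage:
#   list_iommu_groups
```

Devices printed on the same line have to be assigned to the VM (or detached from the host) together.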
To assign only one to the guest, the unused endpoint should be detached from the host before starting the guest:
virsh nodedev-detach pci_0000_04_00_1
To reattach it to the host:
virsh nodedev-reattach pci_0000_04_00_1
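The node device names used by virsh nodedev-* follow directly from the PCI address printed by lspci -D: prefix with pci_ and replace ':' and '.' with '_'. A minimal sketch of that mapping; pci_to_nodedev is a hypothetical helper name:

```shell
# Convert a PCI address (as printed by "lspci -D") into the node device
# name used by "virsh nodedev-*" commands.
# pci_to_nodedev is a hypothetical helper name.
pci_to_nodedev() {
    printf 'pci_%s\n' "$1" | tr ':.' '__'
}

# Usage:
#   virsh nodedev-detach "$(pci_to_nodedev 0000:04:00.1)"
```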
Attaching SR-IOV device to a virtual machine:
- before doing anything else, make network interface names predictable/consistent, or they are going to jump around and create a mess
Ubuntu (depending on version): here and here
RHEL/Fedora: here (ch. 8)
/etc/sysconfig/network-scripts/ifcfg-xxx, specify DEVICE and HWADDR
BIOS naming:
Often, BIOS device naming is best:
Install (or build) package biosdevname.
Verify that BIOS provides enumeration data.
At least one of the following is required, and together they must cover all of the devices, otherwise BIOS naming is not possible:
dmidecode -t 41 // on-board devices
dmidecode -t 9 // slot devices
biosdecode // interrupt routing table
SMBIOS 2.6 or later is required.
If biosdevname -i xxx cannot display a name, BIOS naming won't work.
BIOS naming of SR-IOV functions located on the motherboard (as opposed to slots) is not supported.
If BIOS naming is supported:
- Delete (after backing up a copy) /etc/udev/rules.d/70-persistent-net.rules
UDEV way:
Edit NAME fields in /etc/udev/rules.d/70-persistent-net.rules
To regenerate /etc/udev/rules.d/70-persistent-net.rules from scratch:
rm /etc/udev/rules.d/70-persistent-net.rules
#for each interface
export INTERFACE=eth1
export MATCHADDR=`ip addr show $INTERFACE | grep ether | awk '{print $2}'`
/lib/udev/write_net_rules the file
# now edit NAME fields in /etc/udev/rules.d/70-persistent-net.rules
Reboot
Change references to old interface names to their new names (e.g. eth1 -> em1).
To change existing virtual machine definitions, use virsh edit <vmid>.
- to find the number of possible virtual functions (VFs), use lspci -vv or
cat /sys/bus/pci/devices/0000:04:00.0/sriov_totalvfs
cat /sys/bus/pci/devices/0000:04:00.1/sriov_totalvfs
the number of currently created virtual functions can be seen with lspci or via
cat /sys/bus/pci/devices/0000:04:00.0/sriov_numvfs
cat /sys/bus/pci/devices/0000:04:00.1/sriov_numvfs
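The two sysfs files can be read for every SR-IOV capable device in one pass. A small sketch; sriov_vf_counts is a hypothetical helper name, and its optional argument (an alternative devices directory) is there only for trying it on a test tree:

```shell
# Print "<pci address> <numvfs>/<totalvfs>" for every device that
# exposes sriov_totalvfs. sriov_vf_counts is a hypothetical helper;
# the optional argument overrides the sysfs directory to scan.
sriov_vf_counts() {
    root=${1:-/sys/bus/pci/devices}
    for dev in "$root"/*; do
        # Devices without SR-IOV capability have no sriov_totalvfs file.
        [ -r "$dev/sriov_totalvfs" ] || continue
        printf '%s %s/%s\n' "${dev##*/}" \
            "$(cat "$dev/sriov_numvfs")" "$(cat "$dev/sriov_totalvfs")"
    done
}

# Usage:
#   sriov_vf_counts
```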
- create device functions (example for the igb driver, Intel 82576):
modprobe -rv igb
modprobe igb max_vfs=7
- if dmesg says "SR-IOV: bus number out of range", this is a BIOS bug;
to work around it, let Linux enumerate the buses (ignoring firmware enumeration) by adding the boot parameter pci=assign-busses
make sure the kernel is built with
CONFIG_PCI_IOV=y
CONFIG_PCI_REALLOC_ENABLE_AUTO=y
add pci=assign-busses to /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_iommu=on iommu=pt pci=assign-busses"
and rebuild grub.cfg
sync; reboot
- to make the number of VFs permanent
- create or edit /etc/modprobe.d/igb.conf (or local.conf etc.)
options igb max_vfs=7
- sync; reboot (not just yet!!)
- however, specifically for igb this alone is not enough: the igb module is loaded before the root fs is mounted, so the initramfs also needs to be updated:
Ubuntu: update-initramfs -u -k `uname -r` [-v]
Fedora: dracut --force /boot/initramfs-`uname -r`.img `uname -r`
sync; reboot
- disable auto-DHCP on created virtual interfaces
however, links should be enabled on PFs, otherwise the physical port's resources may be disabled and the corresponding VFs may show up non-functional (no carrier, random MAC) inside the VM
On Ubuntu:
Edit /etc/network/interfaces like this:
# interfaces(5) file used by ifup(8) and ifdown(8)
auto lo
iface lo inet loopback
auto p4p1
iface p4p1 inet manual
up ifconfig $IFACE 0.0.0.0 up
up sysctl -w net.ipv6.conf.$IFACE.disable_ipv6=1
up ip link set $IFACE up
down ifconfig $IFACE down
auto p4p2
iface p4p2 inet manual
up ifconfig $IFACE 0.0.0.0 up
up sysctl -w net.ipv6.conf.$IFACE.disable_ipv6=1
up ip link set $IFACE up
down ifconfig $IFACE down
iface p4p1_0 inet manual
iface p4p1_1 inet manual
iface p4p1_2 inet manual
iface p4p1_3 inet manual
iface p4p1_4 inet manual
iface p4p1_5 inet manual
iface p4p1_6 inet manual
iface p4p2_0 inet manual
iface p4p2_1 inet manual
iface p4p2_2 inet manual
iface p4p2_3 inet manual
iface p4p2_4 inet manual
iface p4p2_5 inet manual
iface p4p2_6 inet manual
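The per-VF "inet manual" stanzas above are repetitive and can be generated rather than typed. A minimal sketch, assuming VF interfaces are named <pf>_<index> as in the listing; gen_vf_stanzas is a hypothetical helper name:

```shell
# Print one "iface <pf>_<n> inet manual" line per VF, for n = 0..count-1.
# gen_vf_stanzas is a hypothetical helper name.
gen_vf_stanzas() {
    pf=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        printf 'iface %s_%s inet manual\n' "$pf" "$i"
        i=$((i + 1))
    done
}

# Usage (review the output before appending it to the real file):
#   gen_vf_stanzas p4p1 7
#   gen_vf_stanzas p4p2 7
```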
May also want to disable these interfaces in Ubuntu NetworkManager GUI
("Automatically connect to this network" = off)
or in /etc/NetworkManager/system-connections. See
https://wiki.debian.org/NetworkConfiguration.
Reboot
On RHEL/Fedora/SUSE:
Edit /etc/sysconfig/network-scripts/ifcfg-xxx
DEVICE=xxx
BOOTPROTO=none
Reboot
Make sure (ifconfig -a) that all interfaces show up and do not have DHCP-assigned IPv4 or IPv6 addresses
- it is possible to simply attach a particular predefined VF to a VM via virt-manager (with <hostdev>)
this will cause a random MAC address to be assigned to the interface inside the VM,
and it does not allow specifying <vlan> and <virtualport> properties (however, vlan can be configured inside the VM)
if this is OK, use this route, otherwise read on ...
- to assign particular VFs (with fixed ids) to a VM, run:
export EDITOR=`which vi`
virsh edit <vmid>
and change hostdev descriptors -> interface type='hostdev'
<devices>
<interface type='hostdev' managed='yes'>
<source>
<address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x2'/>
</source>
<mac address='52:54:00:6d:90:02' />
<driver name='vfio' />
</interface>
<interface type='hostdev' managed='yes'>
<source>
<address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x3'/>
</source>
<mac address='52:54:00:6d:90:03' />
<driver name='vfio' />
</interface>
</devices>
<mac> elements are optional.
Other optional tags are <vlan> and <virtualport>.
<driver> elements are optional, but desirable for performance, or if <mac> needs to be specified, and in other cases (see 19.1.6).
It defaults to vfio when VFIO is available, to kvm otherwise.
To generate a unique MAC address, use http://www.marsching.com/2009/mac-address-generator or use this script.
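A random MAC address can also be produced locally. A minimal sketch, assuming the 52:54:00 prefix used in the examples here and three random bytes from /dev/urandom; gen_kvm_mac is a hypothetical helper name, and uniqueness is only probabilistic, so check against MACs already in use:

```shell
# Print a random MAC address with the 52:54:00 prefix used in the
# examples above. gen_kvm_mac is a hypothetical helper name.
gen_kvm_mac() {
    # od prints three random bytes as hex, e.g. " 6d 90 02".
    od -An -N3 -tx1 /dev/urandom |
        awk '{ printf "52:54:00:%s:%s:%s\n", $1, $2, $3 }'
}

# Usage:
#   gen_kvm_mac
```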
To find the relationship of PCI addresses to network interfaces:
lspci
systool -c net -v | grep --color=never -E "Device =|address"
lshw -C network -businfo
run netif-enum.py
- alternatively, to automatically assign SR-IOV VFs from a pool of VFs
create a passthrough network definition file referencing the interface providing the pool (e.g. sriov-p4p1.xml)
<network>
<name>sriov-p4p1</name>
<forward mode='hostdev' managed='yes'>
<pf dev='p4p1'/>
</forward>
</network>
define the passthrough network:
virsh net-define sriov-p4p1.xml
virsh net-autostart sriov-p4p1
virsh net-start sriov-p4p1
execute virsh edit <vmid> and add a section to the VM definition under the <devices> tag
<devices>
<interface type='network'>
<source network='sriov-p4p1' />
<mac address='52:54:a8:2e:18:c8' />
</interface>
</devices>
after starting the first guest, execute virsh net-dumpxml sriov-p4p1; it should display approximately the following:
<network connections='1'>
<name>sriov-p4p1</name>
<uuid>a6b49429-d353-d7ad-3185-4451cc786437</uuid>
<forward mode='hostdev' managed='yes'>
<pf dev='p4p1'/>
<address type='pci' domain='0x0000' bus='0x04' slot='0x0' function='0x1'/>
<address type='pci' domain='0x0000' bus='0x04' slot='0x0' function='0x3'/>
<address type='pci' domain='0x0000' bus='0x04' slot='0x0' function='0x5'/>
<address type='pci' domain='0x0000' bus='0x04' slot='0x0' function='0x7'/>
<address type='pci' domain='0x0000' bus='0x04' slot='0x1' function='0x1'/>
<address type='pci' domain='0x0000' bus='0x04' slot='0x1' function='0x3'/>
<address type='pci' domain='0x0000' bus='0x04' slot='0x1' function='0x5'/>
</forward>
</network>
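With more than one PF, the passthrough network definition can be generated per interface. A minimal sketch that follows the sriov-<pf> naming used above; sriov_net_xml is a hypothetical helper name:

```shell
# Print a hostdev-mode network definition for the given PF interface,
# named sriov-<pf> as in the example above.
# sriov_net_xml is a hypothetical helper name.
sriov_net_xml() {
    pf=$1
    cat <<EOF
<network>
  <name>sriov-$pf</name>
  <forward mode='hostdev' managed='yes'>
    <pf dev='$pf'/>
  </forward>
</network>
EOF
}

# Usage:
#   sriov_net_xml p4p2 > sriov-p4p2.xml
#   virsh net-define sriov-p4p2.xml
#   virsh net-autostart sriov-p4p2
#   virsh net-start sriov-p4p2
```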
- alternatively, can also attach an SR-IOV VF without losing migration capability by using macvtap in passthrough mode;
however, that would be a virtio or emulated device, not SR-IOV PCI passthrough
use virt-manager GUI or:
<devices>
<interface type='direct'>
<source dev='p4p1_6' mode='passthrough' />
<mac address='52:54:00:6d:90:02' />
</interface>
</devices>
Set up fixed DHCP addresses in libvirt:
virsh net-edit <netname>
e.g. virsh net-edit default
Add <host> elements inside <dhcp>:
<network>
<name>default</name>
<uuid>5c0448c7-4240-4325-9a80-eb9f575c962e</uuid>
<forward mode='nat'/>
<bridge name='virbr0' stp='on' delay='0' />
<mac address='52:54:00:BE:58:83'/>
<ip address='192.168.122.1' netmask='255.255.255.0'>
<dhcp>
<range start='192.168.122.100' end='192.168.122.254' />
<host mac='52:54:00:c9:06:c5' name='foo' ip='192.168.122.10' />
<host mac='52:54:00:c9:06:c6' name='bar' ip='192.168.122.11' />
</dhcp>
</ip>
</network>
May also need to restart libvirtd and dnsmasq processes:
# caution!
# stopping libvirt-bin will kill all running VMs
stop libvirt-bin
killall -9 dnsmasq
start libvirt-bin
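A fixed DHCP entry can usually also be added to a running network with virsh net-update, which avoids the restart. A minimal sketch; dhcp_host_xml is a hypothetical helper name, and the MAC, name and IP in the usage line are made-up example values:

```shell
# Print a <host> element for a fixed DHCP lease.
# Arguments: MAC address, host name, IP address.
# dhcp_host_xml is a hypothetical helper name.
dhcp_host_xml() {
    printf "<host mac='%s' name='%s' ip='%s'/>\n" "$1" "$2" "$3"
}

# Usage (example values, not taken from a real setup):
#   virsh net-update default add ip-dhcp-host \
#       "$(dhcp_host_xml 52:54:00:c9:06:c7 baz 192.168.122.12)" \
#       --live --config
```

--live applies the change to the running network and --config makes it persistent.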
AppArmor:
# disable apparmor (Ubuntu pre-systemd)
sudo invoke-rc.d apparmor kill [or stop]
sudo update-rc.d -f apparmor remove
# enable apparmor (Ubuntu pre-systemd)
sudo invoke-rc.d apparmor start
sudo update-rc.d apparmor start 37 S .
# reload all profiles (Ubuntu pre-systemd)
sudo invoke-rc.d apparmor reload
/etc/apparmor.d/libvirt/libvirt-<guid>
/etc/apparmor.d/libvirt/libvirt-<guid>.files
# if profile does not exist:
export VM=foo
virsh dumpxml $VM | sudo /usr/lib/libvirt/virt-aa-helper -c -u libvirt-`virsh domuuid $VM`
# if profile already exists:
export VM=foo
virsh dumpxml $VM | sudo /usr/lib/libvirt/virt-aa-helper -r -u libvirt-`virsh domuuid $VM`
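The choice between -c and -r can be made by checking whether the profile file already exists. A minimal sketch, assuming profiles live in /etc/apparmor.d/libvirt as listed above; aa_helper_mode is a hypothetical helper name, and the optional second argument overrides the directory:

```shell
# Print "-r" when a profile for the given domain UUID already exists,
# "-c" otherwise. aa_helper_mode is a hypothetical helper name.
aa_helper_mode() {
    uuid=$1
    dir=${2:-/etc/apparmor.d/libvirt}
    if [ -e "$dir/libvirt-$uuid" ]; then
        printf '%s\n' -r
    else
        printf '%s\n' -c
    fi
}

# Usage:
#   VM=foo
#   uuid=$(virsh domuuid "$VM")
#   virsh dumpxml "$VM" | sudo /usr/lib/libvirt/virt-aa-helper \
#       "$(aa_helper_mode "$uuid")" -u "libvirt-$uuid"
```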
/var/log/libvirt/...
/var/run/libvirt/...
/var/lib/libvirt/...
For example:
libvirt-e83aba02-6a56-e4b7-5f4f-f64f4171aa98
#
# This profile is for the domain whose UUID matches this file.
#
#include <tunables/global>
profile libvirt-e83aba02-6a56-e4b7-5f4f-f64f4171aa98 {
#include <abstractions/libvirt-qemu>
#include <libvirt/libvirt-e83aba02-6a56-e4b7-5f4f-f64f4171aa98.files>
}
libvirt-bfa46efa-96d0-e063-05bb-1ecbffb19216.files
# DO NOT EDIT THIS FILE DIRECTLY. IT IS MANAGED BY LIBVIRT.
"/var/log/libvirt/**/w7a.log" w,
"/var/lib/libvirt/**/w7a.monitor" rw,
"/var/run/libvirt/**/w7a.pid" rwk,
"/run/libvirt/**/w7a.pid" rwk,
"/var/run/libvirt/**/*.tunnelmigrate.dest.w7a" rw,
"/run/libvirt/**/*.tunnelmigrate.dest.w7a" rw,
"/home/sergey/vms/w7a/disk1" rw,
"/home/sergey/iso/en_windows_7_ultimate_with_sp1_x64_dvd_u_677332.iso" r,
# don't audit writes to readonly files
deny "/home/sergey/iso/en_windows_7_ultimate_with_sp1_x64_dvd_u_677332.iso" w,