On March 5th the Xen Project announced 10 new security advisories. The previous Saturday we live-patched the ones that applied to us.

Of these, XSA-293 is probably the most interesting. Rather than a denial of service or a privilege escalation between guests, it is an in-guest privilege escalation that applies only to 64-bit paravirtualized guests.

The x86 architecture has different modes of operation that it passes through during the boot process, somewhat like a developing organism that goes through stages resembling earlier evolutionary steps. There’s 16-bit “real” mode, 32-bit “protected” mode, and 64-bit “long” mode.

x86 also has the concept of memory segments, which were originally used to access more memory than could be directly addressed but later came to carry permission information as well. Memory segments are selected using segment registers.

The 286 had four 16-bit segment registers: the code segment (CS), data segment (DS), stack segment (SS), and the extra segment (ES) used for string operations. These memory segment registers can be used either implicitly or explicitly.

The 386 architecture added the FS and GS segment registers. The FS and GS segment registers are used only when explicitly referenced by an instruction. These FS/GS registers are typically used for thread-local storage.

In real mode, the segment register contents are used directly to modify the address of a memory reference before it’s sent to hardware. In protected mode, the register contents are “segment selectors” that index into a table of segment descriptors holding the permission and size information. In long mode, the “base address” (the offset applied to a memory reference) is ignored for all segment registers except FS and GS.

While the FS/GS registers were originally 16 bits, 64-bit processors add “shadow” FS/GS base registers, fsbase and gsbase, which are 64 bits wide. They are accessed with the model-specific register instructions wrmsr and rdmsr. Those are privileged instructions and can’t be used from user space, so on Linux a user program instead uses the arch_prctl system call with ARCH_SET_FS or ARCH_SET_GS to set these registers.

On newer processors, user space can set these registers directly using instructions added specifically to read and write the FS/GS base registers. This is faster for the user process than performing a system call.

The operating system must set the FSGSBASE flag in a CPU control register to enable these new instructions. It should not set this bit without also saving and restoring these registers across context switches. Currently Linux does not do this, though it looks as if support may be merged this year.

The XSA is that for paravirtualized guests, where Xen manages the control registers on the guest’s behalf, Xen always left the FSGSBASE bit on. It did not report the bit as set when the guest read the control register state, and it ignored the guest’s writes to it. This meant that user-space processes could write the FS/GS base registers directly even though the guest kernel was probably not saving and restoring them across context switches. The fix is to start honoring and managing the FSGSBASE setting for each virtual machine.

This vulnerability does not affect hardware-virtualized (HVM) guests, as Xen is not responsible in the same way for managing their control register state.