standards based, low-cost commodity hosting services for
the technically adept. At prgmr.com, we don't assume
you're stupid.
===Tuning CPU usage===
Some of these servers, of course, require more resources than others.
Memory and disk size are easy to tune (just specify them in the
config file, although disk bandwidth can complicate things), but
allocating CPU requires you to adjust the scheduler.
One way to allocate CPU usage is to tie CPU usage to RAM size -- a
customer with 25% of the RAM also has a minimum share of 25% of the
CPU cycles.
Xen treats this more as a target than as an inflexible dictum. The
scheduling isn't perfect by any stretch of the imagination. In
particular, cycles spent servicing I/O by domain 0 are not charged to
the domain responsible, leading to situations where I/O intensive
clients get a disproportionate share of CPU usage. Nonetheless, you
can get pretty good allocation in non-pathological cases, and we give
some suggestions for dealing with deliberate attacks at the end of
this chapter. (Also, in our experience, the CPU sits idle most of the
time anyway.)
===Scheduler basics===
So let's talk about the scheduler. The scheduler acts as a referee
between the running domains. In some ways it's a lot like the
Linux scheduler: it can pre-empt processes as needed, it tries its
best to ensure fair allocation, and it ensures that the CPU wastes as
few cycles as possible. It schedules domUs to run on the physical
CPU; these domUs in turn schedule and run processes from their
internal queues.
Xen can use a variety of scheduling algorithms, ranging from the
simple to the baroque. Although Xen has shipped with a number of
schedulers in the past, we're going to concentrate on the credit
scheduler; it's the current default and recommended choice, and the
only one the Xen team has indicated any interest in keeping around.
The xm dmesg command will tell you, among other things, what scheduler
Xen is using.
# xm dmesg | grep scheduler
(XEN) Using scheduler: SMP Credit Scheduler (credit)
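If you want to check the scheduler from a script rather than by eye, the line above is easy to pick apart. This is only a sketch: the function name parse_scheduler is ours, and it assumes the message keeps the exact format shown in the listing.

```python
import re

def parse_scheduler(dmesg_text):
    """Return (description, short_name) from a 'Using scheduler' line, or None."""
    match = re.search(r"Using scheduler:\s*(.+?)\s*\((\w+)\)", dmesg_text)
    if match is None:
        return None
    return match.group(1), match.group(2)

# Sample text copied from the listing above; a real script would read
# the output of `xm dmesg` instead.
sample = "(XEN) Using scheduler: SMP Credit Scheduler (credit)"
print(parse_scheduler(sample))
```

The short name in parentheses is the part of the message that identifies the scheduler by its abbreviated name.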
If you want to change the scheduler, you can set it as a boot
parameter -- for example, append sched=sedf to the kernel line in
GRUB. (That's the Xen kernel, not the dom0 kernel loaded by the first
"module" line.)
====VCPUs and physical CPUs====
For convenience, we consider each Xen domain to have one or more
virtual CPUs (VCPUs), which periodically run on the physical CPUs.
These are the entities that consume credits when run. To examine
VCPUs, use "xm vcpu-list [domain]":
# xm vcpu-list cent1
Name ID VCPUs CPU State Time(s) CPU Affinity
cent1 16 0 0 --- 140005.6 any cpu
cent1 16 1 2 r-- 139968.3 any cpu
In this case, the domain has two VCPUs, 0 and 1. VCPU 1 is in the
"running" state on (physical) CPU 2. Note that Xen will try to spread
VCPUs across CPUs as much as possible. Unless you've pinned them
manually, VCPUs can occasionally switch CPUs.
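To read that listing from a script, you can split each row on whitespace. The following is a rough sketch, not part of the xm tools: parse_vcpu_list is a name we made up, and it assumes the seven-column layout shown above, with the affinity as the last (possibly multi-word) column.

```python
def parse_vcpu_list(text):
    """Parse `xm vcpu-list` output into a list of dicts, one per VCPU."""
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        name, dom_id, vcpu, cpu, state, seconds, affinity = line.split(None, 6)
        rows.append({
            "name": name,
            "id": int(dom_id),
            "vcpu": int(vcpu),
            # Assumption: a VCPU that is not on any CPU shows a non-numeric value.
            "cpu": int(cpu) if cpu.isdigit() else None,
            "running": state.startswith("r"),
            "seconds": float(seconds),
            "affinity": affinity.strip(),
        })
    return rows

# Sample text copied from the listing above.
sample = """\
Name ID VCPUs CPU State Time(s) CPU Affinity
cent1 16 0 0 --- 140005.6 any cpu
cent1 16 1 2 r-- 139968.3 any cpu
"""
for row in parse_vcpu_list(sample):
    print(row["vcpu"], row["cpu"], row["running"])
```

Run against the sample, this reports VCPU 0 idle on CPU 0 and VCPU 1 running on CPU 2.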
You can also change the number of VCPUs while a domain is running
using xm vcpu-set. However, much like RAM allocation, you can
decrease the number of VCPUs but you can't increase it beyond the
initial count.
To set the CPU affinity, use xm vcpu-pin, giving it the domain, the
VCPU, and the physical CPU to pin that VCPU to. For
example, to switch the CPU assignment in the domain cent1:
# xm vcpu-pin cent1 0 2
# xm vcpu-pin cent1 1 0
Equivalently, you can pin VCPUs in the xm config file like this:
vcpus=2
cpus=[0,2]
This gives the domain 2 vcpus, pins the first vcpu to the first
physical cpu, and pins the second vcpu to the third physical cpu.
====Credit Scheduler====
The Xen team designed the credit scheduler to minimize wasted CPU
time. This makes it a "work-conserving" scheduler, in that it tries
to ensure that the CPU will always be working, whenever there is work
for it to do.
As a consequence, if there is more real CPU available than the domUs
are demanding, all domUs get all the CPU they want. When there is
contention -- that is, when the domUs in aggregate want more CPU than
actually exists -- then the scheduler arbitrates fairly between the
domains that want CPU.
The credit scheduler assigns each domain a /weight/, and optionally
a /cap/. The weight indicates the relative CPU allocation of a
domain -- if the CPU is scarce, a domain with a weight of 512 will
receive twice as much CPU time as a domain with a weight of 256 (the
default). The cap sets an absolute limit on the amount of time a
domain can receive, expressed in hundredths of a CPU. (Note that this
number can exceed 100 on multiprocessor hosts.)
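The arithmetic behind weights and caps is simple enough to work through by hand, but here it is as a short sketch. The domain names are hypothetical, and the share calculation assumes every domain is busy and ignores limits such as a domain having too few VCPUs to use its share.

```python
def contended_shares(weights):
    """Map each domain to its fraction of CPU time when every domain is busy."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}

def cap_in_cpus(cap):
    """Convert a cap (hundredths of a CPU) to a CPU count; a cap of 0 means no cap."""
    return None if cap == 0 else cap / 100

weights = {"web": 512, "mail": 256, "dns": 256}  # hypothetical domains
print(contended_shares(weights))  # web gets 0.5; mail and dns get 0.25 each
print(cap_in_cpus(150))           # 1.5, that is, one and a half physical CPUs
```

Note that the domain with weight 512 gets twice the share of each weight-256 domain, whatever the other weights happen to be.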
The scheduler transforms the weight into a /credit/ allocation for
each VCPU, using a separate accounting thread. As a VCPU runs, it
consumes credits. Once the VCPU runs out of credits, it only runs
when other, more thrifty VCPUs have finished executing. Periodically,
the accounting thread goes through and gives everybody more credits.
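To get a feel for how credits turn weights into CPU time, here is a toy model of the idea. It is emphatically not Xen's actual algorithm: it assumes a single physical CPU, VCPUs that always want to run, and a made-up rule that the VCPU with the most credits goes next. The function name and all of the numbers are ours.

```python
def simulate(weights, periods=100, ticks_per_period=30, credits_per_period=300):
    """Toy single-CPU model: return how many ticks each VCPU ran."""
    total_weight = sum(weights.values())
    credits = {name: 0.0 for name in weights}
    ran = {name: 0 for name in weights}
    cost = credits_per_period / ticks_per_period  # credits burned per tick
    for _ in range(periods):
        # Accounting pass: hand out fresh credits in proportion to weight.
        for name, weight in weights.items():
            credits[name] += credits_per_period * weight / total_weight
        for _ in range(ticks_per_period):
            # Run whichever VCPU has the most credits left.
            name = max(credits, key=credits.get)
            credits[name] -= cost
            ran[name] += 1
    return ran

print(simulate({"a": 512, "b": 256}))
```

With these weights the model gives "a" 2000 ticks and "b" 1000, the same 2:1 split the weights promise.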
In this case, the details are probably less important than the
practical application. For example, here we'll increase a domain's
CPU allocation. First, to list the weight and cap for a domain:
# xm sched-credit -d domain
{'cap': 0, 'weight': 256}
Then, to modify it:
# xm sched-credit -d domain -w 512
# xm sched-credit -d domain
{'cap': 0, 'weight': 512}
Of course the value "512" only has meaning relative to the other
domains running on the machine. Make sure to set all the domains'
weights appropriately.
To set the cap for a domain:
# xm sched-credit -d domain -c cap
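If you set weights and caps from a provisioning script, it helps to build the command in one place. This sketch only assembles the argument list, using the -d, -w, and -c flags shown above; the helper name sched_credit_command is hypothetical, and actually running the command is left to the caller.

```python
def sched_credit_command(domain, weight=None, cap=None):
    """Build the argument list for `xm sched-credit` from the flags shown above."""
    args = ["xm", "sched-credit", "-d", domain]
    if weight is not None:
        args += ["-w", str(weight)]
    if cap is not None:
        args += ["-c", str(cap)]
    return args

# Limit the (example) domain cent1 to half of one physical CPU.
print(" ".join(sched_credit_command("cent1", cap=50)))
```

With neither a weight nor a cap, the result is the query form of the command used earlier to list the current values.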
====Scheduling as a provider====
We decided to divide the CPU along the same lines as the available RAM
-- it stands to reason that a user paying for half the RAM in a box
will want more CPU than someone with a 64 MB domain.
The simple way to do this is to assign each domain a weight equal to
the number of megabytes of memory it has, and leave the cap empty. The
scheduler will then handle converting that into fair proportions -- so
that our aforementioned user with half the RAM will get about as much
CPU time as the rest of the users put together.
Of course, that's the worst case; it is what the user will get in an
environment of constant struggle for the CPU. If all domains but one
are idle, that one can have the entire CPU to itself.
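As a sketch of that policy, the following turns a table of memory sizes into xm sched-credit commands and works out each domain's worst-case share. The domain names and sizes are invented for the example, and the helper names are ours.

```python
def weight_commands(memory_mb):
    """Return one `xm sched-credit` command per domain, with weight = memory in MB."""
    return [f"xm sched-credit -d {name} -w {mb}" for name, mb in memory_mb.items()]

def worst_case_share(memory_mb, name):
    """Fraction of CPU a domain gets when every domain is competing for it."""
    return memory_mb[name] / sum(memory_mb.values())

# Hypothetical box with 2048 MB handed out to four domains.
memory_mb = {"big": 1024, "small1": 512, "small2": 448, "tiny": 64}

for command in weight_commands(memory_mb):
    print(command)
print(worst_case_share(memory_mb, "big"))   # 0.5: half the RAM, half the CPU
print(worst_case_share(memory_mb, "tiny"))  # 0.03125
```

Here "big" holds half the RAM, so under full contention it gets as much CPU as the other three domains combined.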