mediate disk access with ionice(1).

When you've got a bunch of people running virtual machines on your hardware, there's a good chance you'll see contention for disk I/O.  (Disks are slow.  Everybody knows that.) Although you can't set hard limits and partitions as you can with network QoS, you can use the ionice command to assign the different domains to scheduling classes and priorities within those classes, with a syntax like:

# ionice -p <PID> -c <class> -n <priority within class>

where -n ranges from 0 to 7, with lower numbers taking precedence.  We recommend always specifying class 2, best-effort.  Other classes exist -- 3 is idle and 1 is realtime -- but idle is extremely conservative, while realtime is so aggressive that it has a good chance of locking up the system.
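To get a feel for the syntax before touching any domains, here's a quick sketch against a throwaway workload (the dd job and the temp file path are just placeholders):

```shell
# Start a disk-bound writer in the background (placeholder workload)
dd if=/dev/zero of=/tmp/ionice-test bs=1M count=256 &
pid=$!

# Class 2 (best-effort), lowest priority within the class
ionice -c 2 -n 7 -p "$pid"

# Read back the current class and priority for that PID
ionice -p "$pid"

wait "$pid"
rm -f /tmp/ionice-test
```

Running ionice with just -p and no class prints the process's current class and priority, which is handy for checking that your settings took.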

Here we'll test ionice with two different domains, one with the highest "normal" priority, the other with the lowest.

First, ionice only works with the CFQ I/O scheduler.  To check that you're using the CFQ scheduler, run this command in the dom0:

# cat /sys/block/[sh]d[a-z]*/queue/scheduler
noop anticipatory deadline [cfq]
noop anticipatory deadline [cfq]

The word in brackets is the selected scheduler.  If it's not [cfq], reboot with the kernel parameter elevator=cfq.
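Alternatively, if your kernel has CFQ built in or loaded as a module, you can switch a device's scheduler at runtime through sysfs and skip the reboot (run as root in the dom0; sda here is just an example device):

```shell
# Select CFQ for sda without rebooting
echo cfq > /sys/block/sda/queue/scheduler

# Verify -- cfq should now appear in brackets
cat /sys/block/sda/queue/scheduler
```

Note that a runtime switch doesn't persist across reboots, so you'll still want elevator=cfq on the kernel command line for a permanent setup.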

Next we find the processes we want to ionice.  Since I'm using tap:aio devices, the dom0 process is tapdisk.  If I were using phy: devices, it'd be [xvd <domain id> <device specifier>].  (You can see that phy: devices give you a bunch more information.)

# ps aux | grep tapdisk
root      1054   0.5   0.0   13588   556   ?   Sl   05:45   0:10   tapdisk /dev/xen/tapctrlwrite1 /dev/xen/tapctrlread1
root      1172   0.6   0.0   13592   560   ?   Sl   05:45   0:10   tapdisk /dev/xen/tapctrlwrite2 /dev/xen/tapctrlread2

Now we can ionice our domains.  Note that the numbers correspond to the order the domains were started in, not the domain id.

# ionice -p 1054 -c 2 -n 7
# ionice -p 1172 -c 2 -n 0
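With more than a couple of domains, picking PIDs out of ps by hand gets tedious.  A hedged sketch of one way to scale it -- demote everything, then promote the one domain you favor (pgrep and the example PID 1172 are assumptions; adjust for your setup):

```shell
# De-prioritize every tapdisk process to the lowest best-effort level
for pid in $(pgrep tapdisk); do
    ionice -c 2 -n 7 -p "$pid"
done

# Then promote the favored domain's tapdisk (PID taken from ps, as above)
ionice -c 2 -n 0 -p 1172
```

The loop is safe to re-run; ionice just overwrites the previous setting each time.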

To test ionice, I'll run a couple of bonnie++ processes and time them.  (After bonnie++ finishes, I run dd so that the I/O load from the other domain stays constant.)

prio 7 domU tmp # /usr/bin/time -v  bonnie++  -u 1 && dd if=/dev/urandom of=load
prio 0 domU tmp # /usr/bin/time -v  bonnie++  -u 1 && dd if=/dev/urandom of=load

Results?  Well, wall-clock-wise, the domU with prio 0 took 3:32.33 to finish, while the prio 7 domU needed 5:07.98.  The bonnie++ results themselves were a bit confusing -- some stuff showed a great difference, others not so much.  Try it for yourself.

(Of course, this is another thing to integrate into our increasingly baroque domain config files.)

1 Comment

Hi, I'm playing around with doing exactly this. Do you know of any way to reliably map the disktap process to the VM? Using the order they were launched in makes me cringe.

Also, have you played around with tap:ram or tap:qcow? I'm going to give them a fiddle. Let me know if you're interested in what I turn up.



About this Entry

This page contains a single entry by chris t published on June 18, 2008 5:59 AM.
