mediate disk access with ionice(1).

When you've got a bunch of people running virtual machines on your hardware, there's a good chance you'll see contention for disk I/O.  (Disks are slow.  Everybody knows that.) Although you can't set hard limits and partitions as you can with network QoS, you can use the ionice command to assign the different domains to scheduling classes and priorities within those classes, with a syntax like:

# ionice -p <PID> -c <class> -n <priority within class>

where -n ranges from 0 to 7, with lower numbers taking precedence.  We recommend always specifying class 2, best-effort.  Other classes exist -- 3 is idle and 1 is realtime -- but idle is extremely conservative, while realtime is so aggressive that it has a good chance of locking up the system.
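To get a feel for the syntax before touching any domains, here's a quick sketch against a throwaway workload (the dd job and the temp file path are just placeholders):

```shell
# Start a disk-bound writer in the background (placeholder workload)
dd if=/dev/zero of=/tmp/ionice-test bs=1M count=256 &
pid=$!

# Class 2 (best-effort), lowest priority within the class
ionice -c 2 -n 7 -p "$pid"

# Read back the current class and priority for that PID
ionice -p "$pid"

wait "$pid"
rm -f /tmp/ionice-test
```

Running ionice with just -p and no class prints the process's current class and priority, which is handy for checking that your settings took.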

Here we'll test ionice with two different domains, one with the highest "normal" priority, the other with the lowest.

First, ionice only works with the CFQ I/O scheduler.  To check that you're using the CFQ scheduler, run this command in the dom0:

# cat /sys/block/[sh]d[a-z]*/queue/scheduler
noop anticipatory deadline [cfq]
noop anticipatory deadline [cfq]

The word in brackets is the selected scheduler.  If it's not [cfq], reboot with the kernel parameter elevator=cfq.
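Alternatively, if your kernel has CFQ built in or loaded as a module, you can switch a device's scheduler at runtime through sysfs and skip the reboot (run as root in the dom0; sda here is just an example device):

```shell
# Select CFQ for sda without rebooting
echo cfq > /sys/block/sda/queue/scheduler

# Verify -- cfq should now appear in brackets
cat /sys/block/sda/queue/scheduler
```

Note that a runtime switch doesn't persist across reboots, so you'll still want elevator=cfq on the kernel command line for a permanent setup.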

Next we find the processes we want to ionice.  Since I'm using tap:aio devices, the dom0 process is tapdisk.  If I were using phy: devices, it'd be [xvd <domain id> <device specifier>].  (You can see that phy: devices give you a bunch more information.)

# ps aux | grep tapdisk
root      1054   0.5   0.0   13588   556   ?   Sl   05:45   0:10   tapdisk /dev/xen/tapctrlwrite1 /dev/xen/tapctrlread1
root      1172   0.6   0.0   13592   560   ?   Sl   05:45   0:10   tapdisk /dev/xen/tapctrlwrite2 /dev/xen/tapctrlread2

Now we can ionice our domains.  Note that the numbers correspond to the order the domains were started in, not the domain id.

# ionice -p 1054 -c 2 -n 7
# ionice -p 1172 -c 2 -n 0
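With more than a couple of domains, picking PIDs out of ps by hand gets tedious.  A hedged sketch of one way to scale it -- demote everything, then promote the one domain you favor (pgrep and the example PID 1172 are assumptions; adjust for your setup):

```shell
# De-prioritize every tapdisk process to the lowest best-effort level
for pid in $(pgrep tapdisk); do
    ionice -c 2 -n 7 -p "$pid"
done

# Then promote the favored domain's tapdisk (PID taken from ps, as above)
ionice -c 2 -n 0 -p 1172
```

The loop is safe to re-run; ionice just overwrites the previous setting each time.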

To test ionice, I'll run a couple of bonnie++ processes and time them.  (After bonnie++ finishes, I run dd so that the I/O load from the other domain stays constant.)

prio 7 domU tmp # /usr/bin/time -v  bonnie++  -u 1 && dd if=/dev/urandom of=load
prio 0 domU tmp # /usr/bin/time -v  bonnie++  -u 1 && dd if=/dev/urandom of=load

Results?  Well, wall-clock-wise, the domU with prio 0 took 3:32.33 to finish, while the prio 7 domU needed 5:07.98.  The bonnie++ results themselves were a bit confusing -- some stuff showed a great difference, others not so much.  Try it for yourself.

(Of course, this is another thing to integrate into our increasingly baroque domain config files.)

1 Comment

Hi, I'm playing around with doing exactly this. Do you know of any way to reliably map the disktap process to the VM? Using the order they were launched in makes me cringe.

Also, have you played around with tap:ram or tap:qcow? I'm going to give them a fiddle. Let me know if you're interested in what I turn up.



About this Entry

This page contains a single entry by chris t published on June 18, 2008 5:59 AM.
