From Friday August 11th through Monday August 14th we upgraded our customer-hosting systems and patched the following XSAs:

All but XSA-230 were privilege escalation vulnerabilities. At least one was probably found during review of the previous round of XSAs, and XSA-230 was found during updates to XSA-226.

We were unable to apply live patching because of XSA-228. Its fix added a lock to persistent data structures that would have had a number of outstanding references. Generally, live patching only works if only code is changed, or if any changed data structures don’t persist across hypercalls.

Furthermore, an earlier version of XSA-226 prevented live migration for some Linux VPSs, though there were no details on which Linux versions were affected. We minimized downtime for new signups where we weren’t able to give sufficient notice, but because live migration wasn’t safe, they were subject to an extra reboot.

The average delay from the beginning of the window to a given VPS coming back up was 34 minutes. The longest delay was two and a half hours, due to a low-level software issue, while the shortest delay was 12 minutes.

When we shut down one of our servers, half of the drives were kicked from RAID on reboot, which significantly delayed bringing the server back up. The issue appears to be related to mpt3sas. To prevent this from occurring on other similar systems, we manually shut down our RAID volumes before rebooting.


init: Re-executing /sbin/init
[4780009.765708] EXT4-fs (md0): re-mounted. Opts: (null)
Please stand by while rebooting the system...
[4780010.980443] sd 8:0:2:0: [sdf] Synchronizing SCSI cache
[4780010.980788] sd 8:0:1:0: [sde] Synchronizing SCSI cache
[4780010.981089] sd 8:0:0:0: [sdd] Synchronizing SCSI cache
[4780010.981374] sd 7:0:2:0: [sdc] Synchronizing SCSI cache
[4780010.981614] sd 7:0:1:0: [sdb] Synchronizing SCSI cache
[4780010.981840] sd 7:0:0:0: [sda] Synchronizing SCSI cache
[4780010.982281] mpt3sas_cm0: sending message unit reset !!
[4780014.248128] drbd resource0: Discarding network configuration.
[4780014.248511] sd 8:0:2:0: [sdf] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[4780014.248750] sd 8:0:2:0: [sdf] tag#0 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00
[4780014.248967] blk_update_request: I/O error, dev sdf, sector 20973584
[4780014.249123] md: super_written gets error=-5
[4780014.249231] md/raid10:md127: Disk failure on sdf2, disabling device.
[4780014.249231] md/raid10:md127: Operation continuing on 5 devices.
[4780014.249573] sd 8:0:1:0: [sde] tag#1 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[4780014.249800] sd 8:0:1:0: [sde] tag#1 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00
[4780014.250028] blk_update_request: I/O error, dev sde, sector 20973584
[4780014.250188] md: super_written gets error=-5
[4780014.250294] md/raid10:md127: Disk failure on sde2, disabling device.
[4780014.250294] md/raid10:md127: Operation continuing on 4 devices.
[4780014.250596] sd 8:0:0:0: [sdd] tag#2 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[4780014.250808] sd 8:0:0:0: [sdd] tag#2 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00
[4780014.251033] blk_update_request: I/O error, dev sdd, sector 20973584
[4780014.251201] md: super_written gets error=-5
[4780014.251307] md/raid10:md127: Disk failure on sdd2, disabling device.
[4780014.251307] md/raid10:md127: Operation continuing on 3 devices.
[4780014.253458] drbd resource0: Connection closed
[4780014.253816] drbd resource0: conn( Disconnecting -> StandAlone )
[4780014.254057] drbd resource0: receiver terminated
[4780014.254210] drbd resource0: Terminating drbd_r_resource
[4780034.718113] mpt3sas_cm0: _base_wait_for_doorbell_ack: failed due to timeout count(15000), int_status(c0000000)!
[4780034.718369] mpt3sas_cm0: message unit reset: FAILED
[4780034.718491] mpt3sas_cm0: sending diag reset !!
[4780035.690689] mpt3sas_cm0: diag reset: SUCCESS
[4780036.746631] mpt2sas_cm0: sending message unit reset !!
[4780036.747997] mpt2sas_cm0: message unit reset: SUCCESS
[4780037.019600] reboot: Restarting system
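
For anyone who wants to check their own console logs for the same pattern, here is a minimal sketch that pulls the md "Disk failure" lines out of kernel output like the above. It is not tooling we used during the maintenance. The function name `find_kicked_members` and the regular expression are illustrative assumptions, written against the message format that appears in this log only; other kernel versions may word these messages differently.

```python
import re

# Matches lines such as:
# [4780014.249231] md/raid10:md127: Disk failure on sdf2, disabling device.
# The pattern is an assumption based only on the log shown in this post.
FAILURE_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+md/(?P<level>\w+):(?P<array>\w+): "
    r"Disk failure on (?P<member>\w+), disabling device\."
)


def find_kicked_members(log_text):
    """Return (timestamp, array, member) for each md disk-failure line."""
    kicked = []
    for line in log_text.splitlines():
        match = FAILURE_RE.match(line.strip())
        if match:
            kicked.append((float(match["ts"]), match["array"], match["member"]))
    return kicked


# A few lines copied from the log above.
sample = """\
[4780010.982281] mpt3sas_cm0: sending message unit reset !!
[4780014.249231] md/raid10:md127: Disk failure on sdf2, disabling device.
[4780014.249231] md/raid10:md127: Operation continuing on 5 devices.
[4780014.250294] md/raid10:md127: Disk failure on sde2, disabling device.
[4780014.251307] md/raid10:md127: Disk failure on sdd2, disabling device.
"""

for ts, array, member in find_kicked_members(sample):
    print(f"{ts:.6f} {array} lost {member}")
```

Run against the full log, this reports the three members (sdf2, sde2, sdd2) that md127 dropped after the mpt3sas message unit reset was sent.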