• PSA: Please update Jenkins

    Tue, 24 Nov 2015 15:20:00 -0800 - Sarah Newman

    If you are running Jenkins, please update it to the latest version and check your logs (including your mail log) for any suspicious activity. A zero-day in Jenkins was patched on November 11. If you believe you have been compromised and want help reinstalling, please contact support.

  • Recent Xen Security Advisories

    Thu, 29 Oct 2015 10:30:00 -0700 - Sarah Newman

    Update 16:01 PDT: In case it wasn’t clear, we are on the pre-disclosure list, and all of the reboots to apply patches to public-facing machines happened before public disclosure.

    Here is a rundown of our exposure and response to the Xen security advisories released today:

    • xsa-145 - Arm only, not affected
    • xsa-146 - Arm only, not affected
    • xsa-147 - Arm only, not affected
    • xsa-148 - Patched in the affected public-facing systems. This is high impact: “malicious PV guest administrators can escalate privilege so as to control the whole system.” Although the minimum vulnerable version is specified as Xen 3.4, I still verified that the remainder of our systems were not affected. To do so I reviewed the commit identified in the attached patches as the source of the vulnerability, the patches supplied with the original XSA (which do not apply to Xen 3.4), and additional patches against 3.4 that someone sent out later.
    • xsa-149 - Not vulnerable because we use xl.
    • xsa-150 - All public-facing systems with HVM guests have been patched, though the only such systems are test systems wholly used by prgmr.com.
    • xsa-151 - Patched in the affected public-facing systems. This is a denial of service attack that we might have caught thanks to a job that runs each night to find, shut down, and notify us of rebooting guests.
    • xsa-152 - Patched in the affected subset of the systems also patched for xsa-148 and xsa-151, not patched in the remainder of the systems. Since the result is a denial of service attack and not a privilege escalation, we will address this as needed, as patching it would have led to loss of service as well. Systems that exploit this vulnerability will be apparent from the log messages.
    • xsa-153 - Not vulnerable because we do not use HVM guests. Even if we did, we would still not be vulnerable: we would not use memory populate-on-demand, as we do not oversubscribe RAM.
  • Maintenance for prgmr.com services Tues. 27th 22:00 -0700

    Mon, 26 Oct 2015 11:18:00 -0700 - Sarah Newman

    The following services will be taken down for maintenance on Tuesday the 27th at 22:00 PDT (-0700):

    • billing.prgmr.com
    • prgmr.com
    • mirror.prgmr.com
    • Our ticketing system

    I’m allowing 3 hours for the maintenance to complete.

  • High inbound packet loss for lefanu

    Sat, 24 Oct 2015 10:58:00 -0700 - Sarah Newman

    Update 16:34 -0700: This happened because whoever set up the Nagios monitoring used default values from an example, so we were not paging until packet loss reached 60%. Warning at 2% loss and paging at 6% loss is way more reasonable. Luke has set the Nagios thresholds to those new, lower values.
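    As a rough illustration of what the threshold change means, here is a minimal Python sketch of the warn/page decision. It is not our Nagios configuration: the function name, the "OK"/"WARN"/"PAGE" labels, and the choice to treat a loss exactly at a threshold as crossing it are all assumptions made for the example. Only the 2%, 6%, and 60% figures come from the update above.

```python
# Hypothetical sketch of the warn/page decision; not the real Nagios config.
WARN_LOSS_PCT = 2.0   # warn threshold from the update above
PAGE_LOSS_PCT = 6.0   # page threshold from the update above


def classify_packet_loss(sent, received,
                         warn_pct=WARN_LOSS_PCT, page_pct=PAGE_LOSS_PCT):
    """Return "OK", "WARN", or "PAGE" for one batch of probe packets.

    Assumption: a loss percentage equal to a threshold counts as crossing it.
    """
    if sent <= 0:
        raise ValueError("sent must be positive")
    if not 0 <= received <= sent:
        raise ValueError("received must be between 0 and sent")
    loss_pct = 100.0 * (sent - received) / sent
    if loss_pct >= page_pct:
        return "PAGE"
    if loss_pct >= warn_pct:
        return "WARN"
    return "OK"


if __name__ == "__main__":
    # 10% loss: pages with the new threshold, but not with the old 60% one.
    print(classify_packet_loss(100, 90))
    print(classify_packet_loss(100, 90, page_pct=60.0))
```

    With the old 60% page threshold, the same 10% loss never gets past a warning, which is why nobody was paged.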

    Update 13:25 -0700: IPv6 was disrupted when we switched ports. IPv6 connectivity should now be restored.

    Update: The immediate problem is fixed. Luke changed the physical ports on both sides of the connection and it appears that there’s a problem with the original port in use on lefanu. While there might be a hardware problem, that’s not 100% clear. We haven’t decided what to do about it long term. We’ll put some effort today into figuring out how to tweak our monitoring such that we get paged for a problem like this.

    - Lefanu is experiencing high inbound packet loss. We are investigating potential physical issues, since this machine should have the same software and hardware configuration as another, identical machine. Affected customers will receive a credit for the downtime.

    The larger problem is why our monitoring tools did not alert us; we will be looking into how to add or adjust the thresholds.

  • Downtime on cattle/girdle proceeding as scheduled

    Sun, 18 Oct 2015 21:50:00 -0700 - Sarah Newman

    UPDATE 2015-10-18 22:15 -0700 PDT: All affected instances should be back up.

    We will begin shutting down domains in about 10 minutes.