Update 2019-02-11 23:30 UTC: We discovered that for non-standard installs with accept_ra enabled, the router advertisement we sent to invalidate autoconfigured IPv6 addresses also invalidated statically configured addresses identical to the autoconfigured ones. We are emailing everyone whom we believe to be affected based on traffic counters and current address reachability.

At about 21:40 UTC 2019-02-10, we accidentally enabled IPv6 stateless autoconfiguration for some of our network. This resulted in some hosts, including our mail system, using the wrong IPv6 address, which caused email to support to bounce. Our monitoring system didn’t alert us because the desired IPv6 address still worked; it just wasn’t used for outbound connections.

At 2:20 UTC a customer wrote to a technical contact for prgmr.com (me) at an address that doesn’t go through our support system, and that is when we discovered the issue. Otherwise we would have noticed it by the following morning, since daily emails are sent when our internal backups run.

About an hour after being notified, we had resolved the issue as well as we could. Hosts that were originally affected but also had IPv6 enabled should be ok. Systems that did not have an IPv6 address by default, but didn’t disable IPv6 or router advertisements, will still have an IPv6 address for up to 2 hours afterward. The extra address can be removed manually.

We looked up which emails to our support system bounced and have written to the senders. We have also resent the emails we sent during that time period, since the IPv6 address used to send them was not white-listed for sending mail by SPF.

IPv6 stateless autoconfiguration is a method for hosts to automatically assign themselves an IPv6 address, and it can be used in place of DHCP for IPv6. Three settings are important here:

  • The autonomous flag, which declares whether the advertised subnet can be used for stateless autoconfiguration at all.
  • The preferred lifetime setting, which is how many seconds the autoconfigured address should remain the preferred (default) address for new connections.
  • The valid lifetime setting, which is how long the address can be used.
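
As a rough illustration of how these three settings interact, here is a minimal Python sketch. The `PrefixInfo` class, the `host_may_autoconfigure` function, and the `2001:db8::/64` documentation prefix are hypothetical names chosen for this example, not anything from our router configuration. The checks loosely follow the host-side rules in RFC 4862: ignore the prefix if the autonomous flag is off, or if the preferred lifetime is longer than the valid lifetime.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrefixInfo:
    """Hypothetical model of the prefix settings carried in a router advertisement."""
    prefix: str
    autonomous: bool          # may hosts use this prefix for autoconfiguration?
    preferred_lifetime: int   # seconds the address stays preferred
    valid_lifetime: int       # seconds the address stays usable at all


def host_may_autoconfigure(info: PrefixInfo) -> bool:
    """Return True if a host would form a new address from this prefix."""
    if not info.autonomous:
        return False          # autonomous flag off: prefix is ignored
    if info.preferred_lifetime > info.valid_lifetime:
        return False          # inconsistent lifetimes: prefix is ignored
    return info.valid_lifetime > 0


# Intended state: prefix advertised, but autoconfiguration not allowed.
intended = PrefixInfo("2001:db8::/64", False, 604800, 2592000)
# Accidental state: the same advertisement with the flag left at "on".
accidental = PrefixInfo("2001:db8::/64", True, 604800, 2592000)

print(host_may_autoconfigure(intended))
print(host_may_autoconfigure(accidental))
```

Only the autonomous flag differs between the two cases, which is why removing that one setting was enough to change host behavior.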

We were moving these router advertisements between routers. On the old router we deleted the autonomous-flag false setting but didn’t delete the router advertisement entirely until about an hour later. During that time this router was advertising that autonomously configuring an IP address was OK. It was also advertising a very long “preferred lifetime” of 604800 seconds (7 days) and a “valid lifetime” of 30 days. These are presumably the default settings. Addresses automatically configured using these settings would not go away after the autonomous flag was turned off; they would remain until those lifetimes expired.
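
To give a sense of how long those lifetimes are, here is a small Python sketch that works out when an autoconfigured address would have stopped being preferred and stopped being valid if we had done nothing. The `last_advert` timestamp is an assumption for illustration only (roughly an hour after the change); the real time of the last advertisement seen by any given host would differ.

```python
from datetime import datetime, timedelta, timezone

# Lifetimes the old router was advertising.
PREFERRED_LIFETIME = timedelta(seconds=604800)  # 7 days
VALID_LIFETIME = timedelta(days=30)             # 2592000 seconds


def expiry_times(last_advert: datetime) -> tuple[datetime, datetime]:
    """Return (deprecated_at, invalid_at) for an address refreshed at last_advert."""
    return last_advert + PREFERRED_LIFETIME, last_advert + VALID_LIFETIME


# Hypothetical time of the last advertisement a host received.
last_advert = datetime(2019, 2, 10, 22, 40, tzinfo=timezone.utc)
deprecated_at, invalid_at = expiry_times(last_advert)

print("no longer preferred after:", deprecated_at.isoformat())
print("no longer valid after:   ", invalid_at.isoformat())
```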

To mitigate the issue, we temporarily re-enabled the autonomous flag and set the preferred lifetime and valid lifetime for the address to 10 seconds. As expected, the preferred lifetime expired after 10 seconds. But for reasons explained in RFC 4862 (section 5.5.3), the valid lifetime was reset to 2 hours rather than 10 seconds.
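
The 2 hour behavior comes from the rule hosts apply when an advertisement tries to shorten the valid lifetime of an existing address. The following Python sketch is our reading of that rule in RFC 4862 section 5.5.3, not code taken from any operating system; the function name and parameters are made up for this example.

```python
TWO_HOURS = 2 * 60 * 60  # seconds


def updated_valid_lifetime(advertised: int, remaining: int,
                           authenticated: bool = False) -> int:
    """Valid lifetime (seconds) a host keeps after receiving an advertisement.

    advertised:    valid lifetime in the received advertisement
    remaining:     valid lifetime currently left on the existing address
    authenticated: whether the advertisement was authenticated (e.g. SEND)
    """
    # Long lifetimes, or ones that extend the address, are accepted as-is.
    if advertised > TWO_HOURS or advertised > remaining:
        return advertised
    # An address already within 2 hours of expiring is left alone,
    # unless the advertisement was authenticated.
    if remaining <= TWO_HOURS:
        return advertised if authenticated else remaining
    # Otherwise an attempt to shorten the lifetime is clamped to 2 hours.
    return TWO_HOURS


# Our mitigation: advertise 10 seconds against roughly 30 days remaining.
print(updated_valid_lifetime(advertised=10, remaining=30 * 24 * 60 * 60))
```

The rule exists so that a forged advertisement with a tiny lifetime cannot immediately knock addresses off a host, which is also why our own 10 second advertisement could not.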

After the preferred lifetime expires, if there’s a static IPv6 address, the static IPv6 should be used instead. However, systems without another IPv6 address would continue to use this address until the valid lifetime expires.

To detect this problem in the future, we could set up a canary system and alert if it starts to respond on a given IPv6 address. At some point we may also switch to IPv6 router advertisements on a per-host basis, in which case this sort of issue would no longer be relevant.
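
A canary check along those lines might look like the Python sketch below. This is only an outline under assumptions: the function name, the address, and the port are placeholders, and we have not settled on how such a check would actually be deployed. It treats a refused connection as a response, since a refusal normally means something at that address answered.

```python
import socket


def responds_on(address: str, port: int = 22, timeout: float = 3.0,
                connect=socket.create_connection) -> bool:
    """Return True if something answers at (address, port).

    `connect` is injectable so the check can be exercised without a network.
    """
    try:
        conn = connect((address, port), timeout)
    except ConnectionRefusedError:
        return True   # refused: the address is live, the port is just closed
    except OSError:
        return False  # timeout, no route, unreachable: nothing answered
    conn.close()
    return True


if __name__ == "__main__":
    # Placeholder: the address the canary would get if autoconfiguration
    # were accidentally enabled (documentation prefix used here).
    canary_address = "2001:db8::1"
    if responds_on(canary_address):
        print("ALERT: canary is answering on", canary_address)
```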