2009-02-12: 00:44 UTC     New SSL Certificates

Over the next several days we will be replacing the SSL certificates on all web, SMTP, IMAP, and POP3 servers. This is being done in response to the recent publication of a possible attack on MD5-signed SSL certificates. The short story is that these researchers have created a CA (Certificate Authority) signing certificate that can be used to sign end-entity SSL certificates that will appear to have been issued by the real CA. The gory details are here.

Exploiting this MD5 vulnerability requires considerable cryptography knowledge and a significant amount of computing power to create the fake CA signing certificate. The attacker then has to convince the victim to connect to the fake server via DNS hijacking, social engineering, or phishing techniques. Financial institutions would be the likely target should generating the fake CA certificate actually be accomplished outside of the laboratory.
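
For anyone who wants to check whether a certificate they hold is MD5-signed, here is a rough Python sketch. It is a heuristic, not a real X.509 parser: it only scans the DER encoding for the object identifier of md5WithRSAEncryption (1.2.840.113549.1.1.4). The function names are made up for illustration, and a match only tells you that the OID appears somewhere in the certificate.

```python
import ssl

# DER encoding of the OID 1.2.840.113549.1.1.4 (md5WithRSAEncryption).
MD5_WITH_RSA_OID = bytes.fromhex("06092a864886f70d010104")


def der_mentions_md5_rsa(der_bytes: bytes) -> bool:
    """Heuristic: True if the md5WithRSAEncryption OID appears in the DER bytes."""
    return MD5_WITH_RSA_OID in der_bytes


def pem_mentions_md5_rsa(pem_text: str) -> bool:
    """Same heuristic for a PEM-encoded certificate (hypothetical helper)."""
    return der_mentions_md5_rsa(ssl.PEM_cert_to_DER_cert(pem_text))
```

For a server that speaks TLS directly on its port (HTTPS on 443, for example), the standard library call ssl.get_server_certificate((host, port)) returns PEM text that can be passed to pem_mentions_md5_rsa.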

2009-02-09: 15:19 UTC     Internal routing problem

Apologies for the delay; it's been a trying day.

We use the OSPF routing protocol internally to advertise the IP addresses of each service to the border routers, providing load balancing and failover. The routers were losing OSPF adjacency, and the assumption was that this was an OSPF bug in the routers or in the routing daemons running on the physical servers. OSPF bugs are not unheard of. It appeared that the OSPF processes in the routers were consuming most of the router CPU.

Much time was wasted shutting down all OSPF daemons and adding static routes to provide access to the IMAP and SMTP servers when the real problem was elsewhere. With OSPF shut down, the routers were still seeing bursts of 100% CPU, causing periods of total packet loss.

The problem was isolated to a switch in our first floor rack by disconnecting all trunks to the first floor and to our upstreams and reconnecting them one by one. Everything was then disconnected from the first floor switches and reconnected one machine at a time and tested. This was a time-consuming process.
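
The one-at-a-time procedure described above boils down to the loop below. This is only an illustrative Python sketch: the connect and network_healthy callbacks are hypothetical stand-ins for plugging in a cable and watching router CPU and packet loss, which in our case were manual steps.

```python
from typing import Callable, Iterable, Optional


def find_culprit(
    devices: Iterable[str],
    connect: Callable[[str], None],
    network_healthy: Callable[[], bool],
) -> Optional[str]:
    """Reconnect devices one at a time and test after each one.

    Returns the first device whose reconnection makes the network
    unhealthy, or None if everything comes back cleanly.
    """
    for device in devices:
        connect(device)
        if not network_healthy():
            return device
    return None
```

The cost is one test per device, which is why this took so long. It also assumes a single misbehaving device whose effect shows up promptly after it is reconnected.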

The culprit was a machine in our first floor rack that was spewing packets of some sort that were driving the routers to 100% CPU. Counters on the switches and on the machines themselves were not out of the ordinary, which hid the real problem.

We have redundant routing, trunks, and switches, with two Ethernet interfaces on each server. With this configuration, the network will survive total hardware failures, but not what we experienced today. We are not new to routing, and this is the first time we have seen a failure like this.

No mail was lost. The network being down will not cause mail to be lost (unless it's an Exchange server, but that's not our problem). SMTP is a robust queue-and-retry protocol. Mail is queued until it can be delivered to the next hop and a positive acknowledgement of receipt is received. It's worked that way for 20 years.
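
To illustrate the queue-and-retry idea, here is a minimal Python sketch. It is not how any particular mail server is implemented: run_queue and the send callback are hypothetical names, the queue lives in memory rather than on disk, and real servers wait between retry passes. The point it shows is that a message leaves the queue only after the next hop has positively acknowledged it.

```python
from collections import deque
from typing import Callable, Deque, List


def run_queue(
    queue: Deque[str],
    send: Callable[[str], bool],
    max_passes: int,
) -> List[str]:
    """Attempt delivery of every queued message, for up to max_passes passes.

    send() returns True on a positive acknowledgement from the next hop.
    A message is removed from the queue only after that acknowledgement.
    """
    delivered: List[str] = []
    for _ in range(max_passes):
        if not queue:
            break
        for _ in range(len(queue)):
            message = queue[0]  # peek; do not remove yet
            try:
                acknowledged = send(message)
            except OSError:
                acknowledged = False  # network down: counts as not delivered
            if acknowledged:
                queue.popleft()
                delivered.append(message)
            else:
                queue.rotate(-1)  # keep it queued, move on to the next one
    return delivered
```

If the network is down for every pass, the function returns an empty list and the queue is left exactly as it started, ready for the next attempt.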

Webmail is now working.

Mail is back up and beginning to flow - for those using IMAP desktop clients. All other processes should be coming online in the next few hours, if not before. Static routes will be put in place within 40 minutes and should fix the problem. We apologize for this extremely unusual interruption. It is a routing problem - no mail will be lost.

Router reload did not fix the problem.

Reloading routers now.
