2006-03-23: 08:20 UTC     Server down

Server is back up at 03:55 EST (08:55 UTC).

Web services have been moved to a machine running a new kernel that fixes a bug which most likely caused the file system corruption on the web services machine on the 21st. The problem on the 21st was made worse because the RAID system did not take a failing SCSI drive offline and rebuild onto a hot spare drive.

The crash this morning was caused by a hardware problem. Replacement hardware will arrive tomorrow.

The IMAP server panic on the 10th was most likely caused by the same kernel bug that caused the file system corruption on the 21st. The kernel on this IMAP server will be updated this weekend or next, when system usage is low.

As of 03:15 EST (08:15 UTC), web services are unavailable due to a server crash. IMAP and SMTP services are not affected.

2006-03-21: 15:30 UTC     RAID array issues

Update: 20:20 EST (01:20 UTC) Due to a configuration error, the standard HTTP port (port 80) was not permitted through the firewall. The secure HTTPS port (port 443) was allowed, so customers using HTTPS were able to access the web services.

Update: 20:05 EST (01:05 UTC) Web services were restored at 20:02 EST (01:02 UTC).

This morning at 10:07 EST (15:07 UTC), a machine developed a disk space problem that was resolved by 10:20 EST (15:20 UTC). There is an ongoing issue with the RAID system that will require shutting down this machine.

The affected machine runs two customer-visible services: a server in the SMTP cluster, and a web server for the Manager and the production web clients. Taking this machine offline will be transparent to customers except for web services, which will be unavailable for approximately two minutes between 20:00 EST and 20:05 EST (01:00 and 01:05 UTC). Some customers may lose their login session when web services are restored. Our apologies in advance for this disruption.

2006-03-10: 22:50 UTC     IMAP server problem

Update: 18:56 EST (23:56 UTC) The IMAP server holding most of our customers' mailboxes panicked, damaging the file system containing the Cyrus metadata. The file system has been repaired, but some metadata files have been lost. These files can be reconstructed, but until they are, some customers will not be able to receive new mail or access their mailboxes. No existing mail was lost, and new mail is queued on the MX servers.

We have elected not to switch to a replica IMAP server, and we will continue to investigate the cause of the panic.

The affected IMAP server should be back online by 19:10 EST (00:10 UTC). We apologize for this outage.

We are having trouble with one IMAP server. We are working on the problem, but there is no ETR at this time.

2006-03-10: 15:13 UTC     Changes in forwarding update

Effective today at 12:00 EST (17:00 UTC), all forwarded mail scoring 6.0 or higher will be discarded.