Rackspace explains what went wrong



Let me start off by saying I have been using Rackspace for our hosting for some time and have been VERY impressed by the support they give us. Still, yesterday’s downtime made for a very frustrating day for the many folks who weathered the outage. Below is a copy of the report Rackspace offered its customers less than 30 minutes ago.


 

INCIDENT REPORT

 


06/29/2009

Grapevine, DFW Power Interruption





At approximately 3:15 CDT, a portion of our DFW Datacenter experienced a power outage.
 
The breaker on the primary utility feeder tripped, initiating a sequence of events that ultimately caused a power interruption in Phase I and Phase II of the data center. All systems initially came up on generator power without customer impact. The ‘A’ bank of generators, which support UPS clusters A and B in Phase I and UPS cluster E in Phase II, then experienced excitation failure which escalated to the point where the generators were no longer able to maintain the electrical load. Rackspace then attempted to switch to our secondary utility feeder, but was unable to do so due to an issue in the Pad Mounted Switch (PMS). At approximately 3:15pm CDT, power supply through UPS clusters A, B and E was lost when the batteries in those clusters discharged, and equipment receiving power through those clusters experienced an interruption in service.

 

Once the primary utility feed was restored, Rackspace brought cluster E up on utility power. Devices supported by clusters A and B were brought up on generator power, as the generators were able to hold the reduced electrical load. During this transition, the batteries in UPS clusters A, B and E were recharged.

 

Rackspace then initiated steps to bring UPS clusters A and B online and complete the transition back to utility power. Cluster B was moved to utility power with battery protection. Cluster A required repair to module 2 of the UPS, and remained on generator power. The generators experienced a subsequent excitation failure, forcing transition of cluster A back to the primary utility feed prior to the completion of UPS repairs.

 

Once repairs were completed, that module was re-introduced into the A cluster for redundancy. As of the writing of this Incident Report, the infrastructure behind UPS clusters A, B, and E is being fed via the primary utility feed with UPS protection. The generator vendor and UPS/battery vendor will continue to troubleshoot issues and conduct further root cause analysis.

 

Conclusion:


Rackspace will provide you with more information as it becomes available. Please let us know if you have any further questions or comments about this incident.