[outage] UPS Failure (May 18)

Historical archive of system events. These events are posted for reference purposes only.

Postby RollerNetSupport » Sun May 18, 2008 8:16 pm

We just had one of our APC Smart-UPS RT 7500 XLs literally explode and trip every breaker near it; we're working to recover as quickly as possible. Our initial impression is that the unit went into bypass mode and either shorted line to ground or the bypass relay didn't fully throw. Our emergency bypass 50-amp breaker also tripped, as did the 100-amp three-phase breaker on the redundant feed, resulting in a full power outage in the datacenter.
Last edited by RollerNetSupport on Fri Aug 01, 2008 7:15 pm, edited 3 times in total.
Technical Support support@rollernet.us
Roller Network LLC
RollerNetSupport
Site Admin
 
Posts: 850
Joined: Wed Nov 17, 2004 11:05 pm
Location: Nevada

Postby RollerNetSupport » Sun May 18, 2008 10:23 pm

We're fully online using our 'B' power feed; however, the APC UPS that caused this whole debacle is completely destroyed. Here's what we know:

* A routine transfer test was run on the UPS to verify its functionality.

* When the unit came back to normal after the test, it was making an abnormal buzzing sound.

* The unit was only reporting 11% load, much lower than the expected value.

* In an attempt to rectify the abnormal condition, we commanded the unit to cycle through internal bypass and back online. (It's a double-conversion online unit, so this would cycle the inverter/rectifier.)

* As soon as the unit attempted to go into bypass mode, it made a very loud popping sound, followed by several cascading breaker trips that disabled the emergency 'B' feed.

* After executing an emergency power off on the unit, we cycled the breakers for the 'B' feed. Our network successfully recovered from a cold start on its own.

After all of this, we tried to bring the unit back online from a complete cold start, but it began making a repetitive clicking sound and started to emit a strong electrical fire smell. We immediately cut power once again, and removed the unit from the rack.

The unit in question is an APC SURT7500XLT.

Our total outage time, covering the UPS failure, the emergency power cut, and the cold start after cycling the 'B' feed, was about 15 minutes.

Postby RollerNetSupport » Mon May 19, 2008 6:58 am

APC's support suspects the failure was caused by a fault in the battery charger circuitry.

Postby RollerNetSupport » Fri May 23, 2008 10:58 am

We've discovered that our transfer switch (which handles the A and B feeds to the main distribution bus) was damaged and the relays are fused permanently to the "B" side. This damage wasn't apparent until we attempted to switch back to the newly repaired "A" side feed and the relays wouldn't throw.

Unfortunately, this means we need to replace the damaged switchgear, and this will require a full shutdown. We will be scheduling an emergency maintenance window soon. The replacement gear is already in place and waiting, so once we cut power, we will quickly rewire the output and bring the system back up. We estimate the total maintenance window to be no more than 15 minutes.

Postby RollerNetSupport » Mon May 26, 2008 7:08 pm

Through a carefully orchestrated web of extra PDUs and power cables, we completed the replacement without any downtime.

We're back on the primary "A" feed, which resolves the original issue. We've made a few changes in the power distribution layout since we had the opportunity, and we intend to install a secondary UPS system to reduce the effect a rogue UPS may have in the future.

If you have any questions regarding this incident, please feel free to contact us.