UPPMAX was shut down on Monday at 13:00 CEST due to loss of cooling (closed)
We are currently having an issue with cooling in the computer hall. If we do not make progress soon in restoring the cooling, we will be forced to shut down our systems. This affects all UPPMAX systems.
Final ticket report
All systems are up and running after the Ångström-wide cooling failure.
At around 12:48 our computer room lost cooling due to a failure of the Ångström cooling circuit. The cooling failed because the control system was disconnected by mistake by one of the contractors involved in building the new Ångström houses.
The temperature then started to rise by about 1 °C per minute, as you can see in the picture below, and the situation quickly became critical.
Here is a short outline of what happened:
- 12:48 Loss of cooling; SMS notifications are sent out to sysadmins.
- 12:54 Sysadmins arrive in the computer room
- 12:57 The campus manager is notified by UPPMAX about the problem.
- 12:59 Temperature becomes critical, emergency shutdown of all systems begins.
- 13:06 Service request to Akademiska Hus is submitted by UPPMAX
- 14:07 The campus manager notifies all personnel about the loss of cooling
- 14:20 The cooling returns
- 14:48 We decide to start putting systems back into production
- 14:50 Campus IT-manager notifies all personnel about return of cooling
- 16:15 Irma, Rackham and Dis are back in production
- 16:47 Campus IT-manager notifies all personnel that cooling is in full production
- 11:20 day 2 Bianca’s wharf is back online.
- 14:20 day 3 Bianca is back online.
Update 2018-06-20 14:20
Bianca is now back online.
Update 2018-06-19 11:20
Bianca’s wharf is online now, but the rest of Bianca is still down. Work in progress.
Update 2018-06-18 17:15
Bianca will unfortunately not be online again until tomorrow.
Update 2018-06-18 16:14
UPPMAX Cloud is back online. All VMs need to be powered on again.
Update 2018-06-18 15:30
Irma and Rackham are back online.
Update 2018-06-18 14:55
According to Akademiska Hus, they have staff on-site to help with the cooling issue, which, as far as we can tell, is still ongoing but under control. We have not yet received information about what might have caused the cooling to stop. We have begun restoring our systems, but we do not yet have an ETA.
Update 2018-06-18 14:41
All systems were shut down around 13:00. We are waiting for confirmation from Akademiska Hus that the problem has been fixed.