Cooling issues in the UPPMAX compute room closed
We have an ongoing issue with cooling in the server hall. All systems have been shut down to prevent hardware damage.
Final ticket report
All systems are up and running after the Ångström-wide cooling failure.
At around 08:09 our computer room lost cooling due to a failure of the Ångström cooling circuit.
According to Akademiska Hus, the cooling failed because two expansion tanks were empty. At this time they do not know why, and 800 L of water have mysteriously disappeared. This caused the pumps that drive the cooling circuit to stop.
The temperature then started to rise by about 1°C per minute, and the situation quickly became critical.
Here is a short outline of what happened:
- 08:09 Loss of cooling; SMS notifications are sent out to sysadmins
- 08:25 Sysadmins arrive in the computer room and a service request is sent to Akademiska Hus
- 08:33 Campus manager and Security and safety division are notified about the problem
- 08:35 Temperature becomes critical, emergency shutdown of all systems begins
- 08:52 Akademiska Hus reports that cooling is back
- 08:59 The campus manager notifies all personnel about the loss of cooling
- 09:50 Rackham is back up
- 09:58 Grus is back up
- 10:40 Irma is back up
- 14:29 Bianca is back up
- 2018-12-20 13:35 Dis is back up
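For readers who want the intervals rather than the clock times, the outline above can be turned into minutes elapsed since the loss of cooling. This is only an illustrative sketch: the event labels are shortened from the outline, the function name is made up for this example, and the date 2018-12-19 is an assumption taken from the update headings, since the outline gives clock times only for that day.

```python
from datetime import datetime

# Assumption: the undated outline entries all fall on 2018-12-19,
# the date used in the update headings of this report.
DAY = "2018-12-19"

# Clock times copied from the outline; labels are shortened for display.
events = [
    ("Loss of cooling", "08:09"),
    ("Emergency shutdown begins", "08:35"),
    ("Cooling reported back", "08:52"),
    ("Rackham back up", "09:50"),
    ("Grus back up", "09:58"),
    ("Irma back up", "10:40"),
    ("Bianca back up", "14:29"),
]

def minutes_since_start(events, day=DAY):
    """Return {label: whole minutes elapsed since the first event}."""
    fmt = "%Y-%m-%d %H:%M"
    start = datetime.strptime(f"{day} {events[0][1]}", fmt)
    result = {}
    for label, clock in events:
        t = datetime.strptime(f"{day} {clock}", fmt)
        result[label] = int((t - start).total_seconds() // 60)
    return result

elapsed = minutes_since_start(events)
for label, minutes in elapsed.items():
    print(f"{minutes:4d} min  {label}")
```

Under these assumptions, the emergency shutdown began 26 minutes after cooling was lost, cooling was reported back after 43 minutes, and the last system restored the same day (Bianca) was back after 380 minutes.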
Update 2018-12-20 13:30
The UPPMAX Cloud is back up.
Update 2018-12-19 14:30
Bianca is back up.
Update 2018-12-19 11:50
Rackham, Snowy and Irma are up and the queues are running.
Update 2018-12-19 09:30
The problem has been fixed according to Akademiska Hus. It was related to an expansion tank losing pressure, which made the circulation stop. We will begin restoring our systems.