Cooling issues in the UPPMAX compute room closed
The storm that passed over Uppsala around 15:00 knocked out the cooling pumps, and we were forced to perform an emergency shutdown of Rackham, Irma, Dis and Bianca. As soon as we have received confirmation from the Akademiska Hus technicians, we will try to restore as many systems as possible, but since this occurred on a Friday afternoon we cannot guarantee that all systems will be up before Monday.
Final ticket report
All systems are up and running after the Ångström-wide cooling failure.
At around 15:05 our computer room lost cooling due to a failure of the Ångström cooling circuit. The main pumps that circulate the cooling medium stopped because of the thunderstorm.
The temperature then started to rise by about 1°C per minute and the situation quickly became critical.
Here is a short outline of what happened:
- ~ 14:56 Storm hits Polacksbacken.
- 15:03 A short power outage is noticed by UPPMAX staff and equipment in the computer room.
- 15:08 Cooling is lost and SMS notifications are sent out to sysadmins.
- 15:23 Service request to Akademiska Hus is submitted by UPPMAX.
- 15:26 Temperature becomes critical, emergency shutdown of all systems begins.
- 15:55 Technician tells UPPMAX that pumps are up and running again.
- 16:02 Temperature of the cooling medium drops quickly.
- 16:30 We decide to start putting systems back into production.
Update 2018-08-20 13:00
All systems online.
Update 2018-08-17 17:34
Rackham is now online.
Update 2018-08-17 16:34
The pumps have started and the temperature is back to normal. We have started the recovery process.
Update 2018-08-19 15:57
Bianca is now available again. We reopened Irma for jobs on Friday night.