Cooling failure closed
We had a sudden cooling outage from about 3am to 4am. About 600 compute nodes went through an emergency shutdown, and many jobs were unfortunately lost.
Update 2023-10-28 07:00
Bianca is now back in production.
Update 2023-10-27 15:00
We initially thought that Bianca had only lost compute nodes in the recent incident, but upon further investigation, we discovered that one of the control plane nodes had shut down due to overheating. Regrettably, this necessitates the reinitialization of our virtual network infrastructure, a process that will take several hours. We anticipate that Bianca will be back in production later this evening. Thank you for your understanding.
Update 2023-10-27 09:00
Most parts of Rackham, Snowy and Miarka are back online and in production.
Affected systems: bianca, dis, snowy, miarka, and rackham
Written by Support Team on