October maintenance window closed

The service on Wednesday 2nd of October will begin at 09:00, and Rackham, Snowy, Irma, Bianca, Grus, Castor and the UPPMAX Cloud will receive the usual system bug fixes and security updates.

Our aim is to upgrade the OS to CentOS 7.7 and to upgrade Slurm to version 19.05.

All queues will be stopped.

Crex, the project storage on Rackham and Snowy, will be upgraded to a newer Lustre version. This work is expected to take two to three days, during which the Slurm queues on Rackham and Snowy will be stopped.

Login nodes will be up part of the time, but Crex will not be available and Slurm commands will be restricted.

Any disturbances while the service window remains open should be assumed to be related to the maintenance, so please wait before contacting support. If the problems remain after the service window has closed, you are of course welcome to contact support at support@uppmax.uu.se.

You can follow our progress on this page throughout the day.

Update 2019-10-02 09:00

Service has started. Next update at 12:00.

Update 2019-10-02 12:00

Service progressing without issues so far. Next update 15:00.

Update 2019-10-02 15:00

Service on Grus completed. Service on Irma expected to be completed next. We are awaiting updates from the vendor to complete the maintenance on Rackham and Snowy. No issues found from the Slurm upgrade. Service on Bianca progressing without issues.

Update 2019-10-02 17:00

Irma is back in production. Service on Rackham, Snowy and Bianca will continue tomorrow. The vendor is still working on the storage system. Other than that, we have not had any issues migrating to Slurm 19.05 and CentOS 7.7. Next update at 09:00 tomorrow.

Update 2019-10-03 09:00

Service continues on Rackham, Snowy and Bianca. Bianca expected to be back in production soon. The storage vendor reports that work on Crex (storage system for Rackham and Snowy) has progressed without issues, and a few things remain to be updated. Upgrade to CentOS 7.7 for Rackham and Snowy progressing without issues.

Update 2019-10-03 12:00

Service ongoing for Bianca, Rackham and Snowy. Next update at 15:00.

Update 2019-10-03 15:00

An issue with Slurm 19.05 on Bianca has surfaced, and it needs to be solved before we can return Bianca to production. For Rackham and Snowy, we are waiting for the storage vendor to complete its service before we can mount the project directories and release the queues.

Update 2019-10-03 18:00

Work is progressing on the Slurm 19.05 issue on Bianca. Rackham and Snowy are still waiting for the vendor to complete service on the storage system.

Update 2019-10-04 09:00

The Slurm issues have been resolved on Bianca and the vendor has completed the service on Crex. We are running the final tests on Bianca, Rackham and Snowy.

Update 2019-10-04 10:18

Maintenance on Bianca, Rackham, Snowy and the UPPMAX Cloud is completed. The queues on Rackham and Snowy will be released, and we will gradually add nodes back as we verify the correct behavior of Slurm 19.05 in production.

This concludes the October maintenance.