May maintenance window closed

The May maintenance window will start at 09:00 CEST on May 4.

On May 4 we will perform the final step in our storage expansion for spring 2021 and migrate metadata to a new and more performant flash storage system. The new system will provide faster data access as well as support for more files (inodes). The queues on Rackham and Snowy will be stopped and /proj will be unavailable during the expansion. We expect the work to take at least 24 hours, so /proj might be unavailable on May 5 (Thursday) and possibly part of May 6 (Friday).

Any disturbances on the accessible systems while the service window remains open should be assumed to be related to the service window. Please hold off on contacting support until the maintenance is announced as completed. You can follow our progress on this page throughout the day.

Update 2022-05-06 13:20

Maintenance completed. We have released the queues on a subset of nodes and will slowly scale towards full production as we confirm Crex is behaving as expected.

Update 2022-05-06 12:00

Maintenance almost completed for Crex.

Update 2022-05-06 09:00

Maintenance continues with Crex. Crex is expected to return to production soon, albeit without the planned optimizations, as explained in yesterday's status update. The optimizations will be added at a later time as we sort through the discovered issues with the vendor and discuss a new plan.

Update 2022-05-05 17:00

Maintenance on Bianca, including the Slurm upgrade, is completed. All clusters in Bianca should now be running Slurm 20.10. As for the project storage system Crex, we have migrated the metadata to the new flash controller, but not every planned optimization has been implemented, and we still have essentially the same number of inodes available. We will revisit these issues at a later time. The focus now is on getting Crex back into production. Work will continue during the evening, but it is unlikely that Rackham and Snowy will return to production today.

Update 2022-05-05 15:00

Bianca has been delayed, partly because we had to rebuild our base images after SchedMD (the maintainers of Slurm) released a new version this morning (…), which forced us to redo some of the steps from yesterday. Maintenance on Crex is unfortunately still ongoing and has stalled because parts of the migration plan did not work as intended. There is no risk to the metadata, as we are working on a copy and on a physically separate controller. We are working closely with the vendor to complete the migration as soon as possible.

Update 2022-05-05 12:00

Final testing of Bianca. Maintenance on Crex is still ongoing.

Update 2022-05-05 09:00

Maintenance continues.

Update 2022-05-04 17:00

Metadata from Crex has been migrated to the new flash controller. We are tuning and testing the configuration. Maintenance on Rackham and Snowy will continue tomorrow. Maintenance on Bianca is almost completed and Bianca is expected to return to production tomorrow morning.

Update 2022-05-04 15:00

Maintenance is ongoing without any issues.

Update 2022-05-04 12:00

Maintenance continues. The update of the home directory system Domus is completed, and the update of Slurm on Rackham and Miarka is completed. Metadata migration is in progress for the project storage system Crex. Maintenance is ongoing for Bianca.

Update 2022-05-04 09:00

Maintenance starts.