June maintenance window closed (/proj and Slurm NOT yet available on Rackham and Snowy!)

The June maintenance window will start at 09:00 CEST on June 1.

On June 1 we will complete the final step in the storage expansion that we started during the last maintenance window on May 4. Queues on Rackham and Snowy will be stopped and /proj will be unavailable. We estimate that the work will take approximately 30 hours, so /proj might also be unavailable on June 2 (Thursday).

Queues on Bianca will also be stopped while we perform another Slurm update and work on updating Castor, the project and home storage system.

This is the last major planned maintenance before the summer. Due to lower staff availability, maintenance in July and August is expected to be light, i.e. security and bug fixes at most.

Update 2022-06-07 09:00

Crex is still being investigated, but maintenance of all other systems is complete. Please follow our work on Crex in this separate news item. Thank you.

Update 2022-06-03 17:00

The filesystem is fully up; however, we have discovered an issue related to symlinks that unfortunately requires further inspection before we can allow access to Crex (and allow jobs to start). At this moment it is difficult to estimate whether Crex will be available in some capacity later today. This is the last scheduled update for today.

Update 2022-06-03 15:00

The tests so far all seem OK. A scrub of the metadata (>1B inodes) completed without issues. Barring any surprises, Crex is likely to be available on the login nodes before 17:00. If all goes well, the queues can be gradually released during the evening.

Update 2022-06-03 12:00

Continued testing and checking of Crex - no issues so far.

Update 2022-06-03 09:00

We consider the migration completed and are now preparing and testing the system for production.

Update 2022-06-02 17:00

Migration on Crex will continue tomorrow. The migration is at 100%, excluding a few dozen inodes that have been left for manual inspection.

Update 2022-06-02 16:00

Maintenance on Bianca is complete. The Slurm queues will be released shortly.

Update 2022-06-02 15:00

Migration of the last MDT on Crex is at 94%. However, we have discovered issues transferring some metadata that will require additional steps to correct, which unfortunately makes it unlikely that we will return to production today. On Bianca, all active projects have completed their upgrade and we are preparing Bianca for production.

Update 2022-06-02 12:00

Migration continues on Crex. One of two MDTs has been 100% migrated, the remaining one is ~70% completed. For Bianca, about 65% of projects are now running Slurm 20, the rest are queued for upgrade.

Update 2022-06-02 09:00

Maintenance continues on Crex and Bianca. We are still migrating metadata on Crex. Part of the reason this takes time is the large number of files in /crex/proj. Simply put, the metadata part of a file is called an inode, and we have close to 1 billion inodes on Crex. Each inode needs to be unpacked and installed correctly on the flash-based SFA200V controller so that Crex gains additional inode space and so that optimization for smaller files can be turned on. The work on the project storage system on Bianca was completed yesterday evening. Bianca will return to production when all clusters have been upgraded from Slurm 19 to Slurm 20 (made available on May 4), as this is a requirement for running Slurm 21 (installed yesterday). This is an automatic process and is expected to complete soon.

Update 2022-06-01 17:00

One part of Castor refused to come up correctly, which has caused delays. The problem now appears to have been solved. We will continue to work on Castor throughout the evening, and Bianca is expected to return to production on Thursday morning. Migration of metadata on Crex is still ongoing. Next update at 09:00 tomorrow.

Update 2022-06-01 15:00

Metadata migration is running on Crex. Rackham and Snowy are not expected to return to production today. The Slurm upgrade and the update of Castor (the Bianca storage system) are progressing without issues. Next update at 17:00.

Update 2022-06-01 12:00

Maintenance progressing well.

Update 2022-06-01 09:00

Maintenance begins.