Status with Crex (UPDATE: Rackham and Snowy /proj and /sw/data available again) closed

The project storage system Crex (/proj and /sw/data) for Rackham and Snowy is currently unavailable while we investigate a metadata issue. The issue is related to the work done during the maintenance on June 1. Since June 3 we have been working with DDN (and DDN with Whamcloud) to have this issue resolved as soon as possible.

The problem: After the migration, two symbolic links in the /crex/data directory were unexpectedly broken. The metadata is expected to be identical before and after the migration, so before we proceed we need to determine whether these symbolic links were already broken before the migration or whether something happened during it. We have not discovered any other broken links so far, but most of the data has not been checked because of the large number of files (>1 billion).
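For users who want to check their own directories for broken symbolic links, the following is a minimal sketch in Python. It is not the procedure used by the system administrators; the function name `find_broken_symlinks` is hypothetical, and on very large directory trees a scan like this puts load on the metadata servers, so it should be limited to directories you actually need to check.

```python
import os

def find_broken_symlinks(root):
    """Return paths of symbolic links under root whose targets do not exist."""
    broken = []
    # os.walk does not descend into symlinked directories by default, so every
    # link shows up in the dirnames or filenames list of its parent directory.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # islink() inspects the link itself; exists() follows it to the target.
            if os.path.islink(path) and not os.path.exists(path):
                broken.append(path)
    return broken
```

Calling `find_broken_symlinks` on a directory of your own returns a list of the links that no longer resolve; an empty list means none were found in that tree.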

Rolling back the migration is possible. However, the old metadata target is at 98% inode utilization, which means that relatively few regular files can be created. If we continue to allow Rackham and Snowy jobs to run at 98% utilization, it will not be long before no more files can be created inside /proj, and we will be forced to move files (or ask users to remove them). The high inode utilization, together with the move to supported hardware, is the reason we have been working extensively with Crex during the spring.

Update 2022-06-08 11:00

Crex is fully available. We had to revert to the previous metadata targets. Queues are running again on Rackham and Snowy.

One of the goals of the operation we reverted was to get more inodes, which would make it possible to store more files. We are now back to the old number of inodes. We will rebalance manually, but if you can reduce your number of files, either by removing files or by archiving many files into a single file, that is much appreciated.
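As an illustration of archiving many files into a single file, here is a minimal sketch in Python using the standard `tarfile` module. The function names `count_entries` and `archive_directory` are hypothetical and chosen for this example. The sketch assumes that the archive is written outside the directory being packed, and it deliberately does not delete anything: verify the archive before removing the original files.

```python
import os
import tarfile

def count_entries(root):
    """Count files and directories under root; each one occupies an inode."""
    total = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        total += len(dirnames) + len(filenames)
    return total

def archive_directory(src_dir, archive_path):
    """Pack src_dir into one gzip-compressed tar file; return the member count."""
    # Store paths relative to the directory name rather than the full path.
    top = os.path.basename(os.path.normpath(src_dir))
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(src_dir, arcname=top)
    # Reopen the archive to confirm it is readable and count what it holds.
    with tarfile.open(archive_path, "r:gz") as tar:
        return len(tar.getnames())
```

The member count returned by `archive_directory` includes the top-level directory itself, so it should be one higher than the value from `count_entries` for the same directory. Once the originals are removed, the whole tree occupies a single inode for the archive file.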

Update 2022-06-07 19:00

/crex is now available on the login nodes. Queues are expected to be released soon.

Update 2022-06-07 17:00

We are in the process of reverting to the previous metadata targets.