Please remove unneeded data in your project directory

During the weekend of 18-19 January, the project storage system on Rackham ran out of inodes, which prevents new files from being created inside the /proj directories. We have cleaned up and recovered many inodes, and the queues on both Rackham and Snowy are now running again. The situation is, however, still far from ideal, and freeing up more inodes would go much faster if all users removed old and unneeded data inside their /proj directories.

An inode is a data structure that describes a file or directory (e.g. owner, time of last change, etc.). Every file has a corresponding inode. When no more inodes are available, you will get a message similar to the one shown when the storage system is full, for example:

$ mkdir /proj/snic2020-1-23/my_new_directory
mkdir: cannot create directory ‘/proj/snic2020-1-23/my_new_directory’: No space left on device
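
Because the error message is the same in both cases, it can help to check whether it is disk space or inodes that have run out. The sketch below uses the standard df command; the variable proj_dir is only a placeholder introduced here and defaults to the current directory. If the storage is a Lustre file system (an assumption, suggested by the mention of metadata targets further down), lfs df -i should give the corresponding per-target figures.

```shell
# Placeholder path: set PROJ_DIR to your own project directory,
# otherwise the current directory is used.
proj_dir="${PROJ_DIR:-.}"

# Free disk space on the file system holding proj_dir.
df -h "$proj_dir"

# Inode usage on the same file system. If the free-inode column
# is 0, no new files can be created even though space remains.
df -i "$proj_dir"
```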

If you have jobs that ended during 18-19 January, please take extra care to check the validity and completeness of the output.

Unfortunately this is what happened during the weekend. We have stopped the queues to prevent more jobs from failing.

We will update this news when the problem is solved.

Update 2020-01-20 16:00

We have recovered roughly three million inodes and the acute problem is solved. Although three million inodes (i.e. three million potential files) may sound like a lot, for our use cases it is quite small, as we run many jobs that often generate a large number of small files. We would feel much safer with at least 10x this number, which we can achieve easily, but with some delay, by moving a few expired projects that contain very many files to another metadata target. As soon as we feel comfortable with the number of available inodes we will release the queues again.

Update 2020-01-22 11:00

We have now recovered more than 10 million inodes and will continue to recover many more in the coming days. However, this process will become much faster if users remove unneeded data inside their project directories, in particular data that is spread across many files. The queues on Rackham and Snowy have been running since yesterday morning.
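
To find out where most of your files are, one possible approach is to count the entries under each subdirectory of your project directory. The function below is a minimal sketch, not an official tool; the name count_inodes_per_subdir is made up for this example. Each file, directory and link it counts occupies one inode. File names containing newlines would be counted more than once.

```shell
# Print the number of entries (files, directories, links) under each
# immediate subdirectory of the given directory, largest first.
# count_inodes_per_subdir is a hypothetical helper name.
count_inodes_per_subdir() {
    top="${1:-.}"
    for dir in "$top"/*/; do
        # Skip the unexpanded pattern when there are no subdirectories.
        [ -d "$dir" ] || continue
        # -xdev keeps find on the same file system; the count
        # includes the subdirectory itself.
        n=$(find "$dir" -xdev | wc -l | tr -d ' ')
        printf '%s\t%s\n' "$n" "$dir"
    done | sort -rn
}

# Example call, using the project path from the error message above:
# count_inodes_per_subdir /proj/snic2020-1-23
```

The subdirectories at the top of the output hold the most files, so removing unneeded data there frees the most inodes.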