Please remove unneeded data in your project directory (closed)
During the weekend of 18-19 January, the project storage system on Rackham ran
out of inodes, which prevents new files from being created inside the /proj
directories. We have cleaned up and recovered many inodes, and the queues on
both Rackham and Snowy are now running again. The situation is, however, still
far from ideal, and freeing up more inodes would be much quicker if all users
removed old and unneeded data from their /proj directories.
An inode is a data structure that describes a file or directory (e.g. its owner, time of last change, etc.). Every file and directory has a corresponding inode. When no more inodes are available, you will get a message similar to the one you get when the storage system is full, for example:
$ mkdir /proj/snic2020-1-23/my_new_directory
mkdir: cannot create directory ‘/proj/snic2020-1-23/my_new_directory’: No space left on device
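One way to see how close a file system is to running out of inodes is "df -i", which reports inode counts instead of disk blocks. The sketch below wraps it in a small shell function. The name free_inodes is hypothetical (it is not an UPPMAX tool), and the sketch assumes GNU df on Linux, where the fourth column of "df -P -i" output is IFree.

```shell
# Hypothetical helper (not an UPPMAX tool): print the number of free
# inodes on the file system that holds PATH (default: current directory).
# Assumes GNU df, whose "df -P -i" columns are:
#   Filesystem  Inodes  IUsed  IFree  IUse%  Mounted-on
free_inodes() {
  # -P: one output line per file system; -i: inode counts instead of blocks.
  # Skip the header (row 1) and print the IFree column of row 2.
  df -P -i "${1:-.}" | awk 'NR == 2 { print $4 }'
}
```

For example, "free_inodes /proj/snic2020-1-23" prints a single number; a value close to zero means that creating new files will soon fail with the error above, even if plenty of disk space is left. GNU df may print a dash instead of a number for file systems that do not report inode counts.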
Unfortunately, this is what happened during the weekend. We have stopped the queues to prevent more jobs from failing.
If you have jobs that ended during 18-19 January, please take extra care to check the validity and completeness of their output.
We will update this news when the problem is solved.
Update 2020-01-20 16:00
We have recovered roughly three million inodes, and the acute problem is solved. Although three million inodes (i.e. three million potential files) may sound like a lot, it is quite small for our use cases, since we run many jobs that often generate a large number of small files. We would feel much safer with at least ten times this number, which we can achieve easily, though with some delay, by moving a few large (in terms of number of files) and expired projects to another metadata target. As soon as we are comfortable with the number of available inodes, we will release the queues again.
Update 2020-01-22 11:00
We have now recovered more than 10 million inodes and will continue to recover many more in the coming days. However, this process will go much faster if users remove unneeded data from their project directories, in particular data that is spread across many files. The queues on Rackham and Snowy have been running since yesterday morning.
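To find out where the many files in a project directory actually are, it can help to count the entries under each subdirectory, since every file, directory and symlink uses an inode. The following is a minimal sketch using find and wc. The function name count_inodes is hypothetical (not an UPPMAX tool), it relies on the -mindepth/-maxdepth options of GNU find (as on Linux), and the counts are approximate: hard-linked files are counted once per name, and file names containing newlines inflate the count.

```shell
# Hypothetical helper (not an UPPMAX tool): for each immediate
# subdirectory of DIR (default: current directory), hidden ones included,
# print how many entries it contains, i.e. roughly how many inodes it
# uses, sorted with the largest first.
count_inodes() {
  # List the immediate subdirectories of DIR, one per line.
  find "${1:-.}" -mindepth 1 -maxdepth 1 -type d |
    while IFS= read -r sub; do
      # find prints one line per entry, including "$sub" itself;
      # -xdev stops it from descending into other file systems.
      n=$(find "$sub" -xdev | wc -l)
      printf '%10d  %s\n' "$n" "$sub"
    done | sort -rn
}
```

For example, "count_inodes /proj/snic2020-1-23 | head" lists the ten subdirectories with the most entries, which are usually the best candidates for cleaning up. On a large project directory the scan can take a while.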