Issues with offload storage (/lutra) closed

There is currently an issue with the offload project storage system Lutra, mounted at /lutra. We are investigating this issue.

Update 2020-10-28 14:30

We worked on parts of Lutra during October, and our tests now reveal no further issues. All files in all projects should be working as expected. If you find any files that report a “stale file handle”, or you suspect that files are invisible, please let us know by contacting support@uppmax.uu.se.
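If you want to check a project directory yourself before contacting support, something like the following sketch may help. It is not an official UPPMAX tool, and the function name find_stale_paths and the example path are made up for illustration. It walks a directory tree and collects every path where the operating system answers with the ESTALE error ("Stale file handle"). It cannot detect invisible files, since those never show up in a directory listing.

```python
import errno
import os


def find_stale_paths(root):
    """Return paths under root where the OS reports ESTALE.

    Hypothetical helper for illustration only, not an UPPMAX tool.
    """
    stale = []

    def on_walk_error(err):
        # os.walk calls this when it fails to list a directory.
        if err.errno == errno.ESTALE:
            stale.append(err.filename)

    for dirpath, dirnames, filenames in os.walk(root, onerror=on_walk_error):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat does not follow symlinks, so the entry itself is checked.
                os.lstat(path)
            except OSError as err:
                if err.errno == errno.ESTALE:
                    stale.append(path)
    return stale


if __name__ == "__main__":
    # Assumed example path; replace with your own project directory.
    for path in find_stale_paths("/lutra/your_project"):
        print(path)
```

Running this on a large project can take a while, since every file is checked once. An empty result only means that no stale handles were seen at the time of the scan.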

Update 2020-10-05 18:30

We have identified two GlusterFS optimization settings that may be causing the stale files. We have temporarily turned these off, with promising initial results, and we are now counting the remaining stale files. Over the weekend we ran a repair script that should have corrected many of the stale files remaining from last week.

Update 2020-09-30 22:00

Unfortunately, we are still finding a large number of files that report a stale file handle. This indicates that the filesystem (Gluster) is not able to resolve the files using trusted.gfid (an extended file attribute that Gluster uses for bookkeeping). Currently we are able to repair stale files, but some of the repaired files become stale again after some time. We are investigating why this occurs. This is not a unique case of Gluster misbehaving; we have seen similar, albeit less stubborn, cases before.

Update 2020-09-17 17:33

Most of the issues have now been solved and we have mounted Lutra on the compute nodes at /lutra. Some projects still have stale files (“Stale file handle”), which we are continuing to work on.

Update 2020-09-11 09:45

The minor update to the latest 6.x version of Gluster (6.10) appears not to have solved this issue. We will try different kernel versions, as we see signs in the logs that this problem existed on a smaller scale in early fall. Unfortunately, /lutra will remain inaccessible while we continue our work.

Update 2020-09-10 11:10

We are planning an emergency update of the server version of Lutra. During this upgrade, access to /lutra is expected to be lost. Any processes or jobs attempting to access data on /lutra might die, and their results should be double-checked. We apologize for the inconvenience.