Crex - Transport endpoint shutdown (closed)
The following error is visible for some users when accessing files under /proj on Rackham and Snowy. It is an issue with the Lustre file system for the project storage system Crex.
$ cd /proj/<my_project_directory>
$ ls -l
ls: cannot access <file>: Cannot send after transport endpoint shutdown
We are investigating this issue.
Update 2021-08-30 16:30
The problem only affects the login node rackham2. We will restart rackham2 to try and resolve the problem.
Update 2021-08-30 23:30
Restarting rackham2 fixed the problem.
The problem on rackham2 started on Friday evening, 2021-08-27T20:37:06. Compute nodes and Rackham’s other login nodes were not affected.
Technical details: rackham2 lost its connection to crex-OST0036, one of the 84 OSTs (Object Storage Targets) in the file system. The Lustre client failed to reconnect and got stuck in an error state, so any attempt to read, write, or stat a file striped to this OST resulted in an error.
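Because only files striped to the failed OST were affected, a directory on the affected node could contain both readable and unreadable files. The following is a minimal sketch of how one might list the affected entries in a directory. The function name check_dir is hypothetical and not an UPPMAX tool, and the sketch assumes a POSIX shell, a stat command that accepts a path argument, and that the error text matches the message shown above.

```shell
# Hypothetical helper (not an UPPMAX tool): report directory entries whose
# metadata cannot be read because of the "transport endpoint shutdown" error.
check_dir() {
    dir=$1
    affected=0
    for f in "$dir"/*; do
        # Capture only stderr from stat; if stat succeeds, move on.
        # LC_ALL=C keeps the error message in English so it can be matched.
        err=$(LC_ALL=C stat -- "$f" 2>&1 >/dev/null) && continue
        case $err in
            *"transport endpoint shutdown"*)
                echo "affected: $f"
                affected=$((affected + 1))
                ;;
        esac
    done
    echo "affected entries: $affected"
}

# Example call (the path placeholder is the one used above):
# check_dir /proj/<my_project_directory>
```

On a healthy client the function prints only the summary line with a count of 0. Errors other than the one above, such as a missing file, are deliberately ignored.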