Explanation of server problem on 5/13


For those interested, we’ve written a more detailed explanation of yesterday’s server problem.

To be clear, nothing was actually wrong with the network – the problem was one specific server to which most people on campus have a connection of some kind. That server is Fileserver1. It houses the majority of H:, I:, and W: drives for employees, as well as all student login profiles, student H: and W: drives, and the temporary storage drive.

Yesterday morning, Fileserver1 began experiencing errors at a rate of about 5 per second. The error text was extremely vague and unhelpful, but the errors themselves caused serious performance issues. Fileserver1 normally runs at about 50% of its resource capacity; while these errors occurred, it was “pegged out” at 100% constantly. Because of this, the large volume of data being read from and written to the server had to wait, which made the server seem either extremely slow or even frozen.

Because so many of us interact with Fileserver1 on a regular basis, most people experienced this problem as general slowness. Computers continually attempted to connect to Fileserver1 and had to wait because of the 100% utilization, which effectively locked up those machines.

We were hesitant to reboot the server because it was the last Tuesday of the block and we didn’t want to disrupt people’s work if we could avoid it. We were also wary because there were likely many files open on Fileserver1 that may not have been saved while it was running so poorly, and forcing a reboot would cause any changes since the last save to be lost.

After we tried a few things, the server seemed to be behaving better: it had gone down to about 65% utilization, though the errors were still occurring. We decided to leave it be and delay the reboot until after the block was over. Unfortunately, utilization crept back up to 100%, so we had to reboot the server after all. Once we did so, the errors stopped, and the server has been running ever since.

Let us know if you have any questions, and we hope you find these explanations valuable!