Using keep-alive control for off-line ssh connections for avoiding dead-locks

Emin Gabrielyan

Switzernet

2008-03-26

When ssh connections are established in the background, we need a way to detect a dead server. Often, and especially when a script manages several simultaneous connections [http://4z.com/public/080212-remote-cpu-monitor/], we need to terminate dead ssh sessions as soon as possible so that they do not block the rest of the script.

By default, when an ssh session is established with a server and no data is exchanged, the server (or the connection) can go dead without the client noticing the outage. The scenario below shows a long outage that occurred immediately after the ssh session was established. Since no data was exchanged during the outage, the ssh session does not detect it and stays open.

In this example, the outage is simulated by unplugging the uplink of the user's switch. Later, when the connection comes back, we enter the exit command and close the ssh session ourselves. If a script expects data from a dead server, it can become dead-locked.

The ssh option ServerAliveInterval sets a timeout interval in seconds after which, if no data has been received from the server, ssh sends a message through the encrypted channel to request a response from the server. The default is 0, meaning that these messages are not sent to the server. Reference [http://drupal.star.bnl.gov/STAR/comp/sofi/facility-access/ssh-stable-con].
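How quickly a dead server is noticed also depends on a second OpenSSH client option that this article does not otherwise use, ServerAliveCountMax (default 3). It sets how many probes may go unanswered before ssh disconnects. The sketch below only estimates the detection time and prints the resulting command line; the variable names and user@host are placeholders chosen for illustration.

```shell
#!/bin/sh
# Assumed values: ServerAliveInterval=5 as used in this article;
# ServerAliveCountMax=3 is the OpenSSH default.
interval=5
count_max=3

# Approximate time before the client gives up on a silent server:
# one probe every $interval seconds, $count_max unanswered probes allowed.
detect_seconds=$((interval * count_max))
echo "dead server detected after about ${detect_seconds}s"

# The options would be passed to ssh like this (user@host is a placeholder).
ssh_opts="-o ServerAliveInterval=${interval} -o ServerAliveCountMax=${count_max}"
echo "ssh ${ssh_opts} user@host"
```

With these values, the client should drop the session roughly 15 seconds after the server stops responding.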

Below is a demonstration of the same outage scenario, but with the ssh session established using the ServerAliveInterval option.

date; ssh -o ServerAliveInterval=5 sona@fr1.youroute.net; date

With the ServerAliveInterval=5 option, the ssh session disconnects on its own shortly after the outage occurs. Even when no data is exchanged between the client and the server, the ssh client checks whether the server is alive (in a control channel) at the configured 5-second interval.
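For a script that manages several background connections, the same option can be wrapped in a small helper. This is a minimal sketch, not the script referenced earlier: run_remote and SSH_BIN are hypothetical names, the hosts are placeholders, and the BatchMode and ConnectTimeout options are additions beyond what this article tests.

```shell
#!/bin/sh
# SSH_BIN and run_remote are hypothetical names used for this sketch.
SSH_BIN="${SSH_BIN:-ssh}"

# Run a command on a remote host with keep-alive probing enabled.
# BatchMode=yes avoids an interactive password prompt inside a script;
# ConnectTimeout bounds the initial connection attempt (in seconds).
run_remote() {
    host="$1"
    shift
    "$SSH_BIN" -o BatchMode=yes \
               -o ConnectTimeout=10 \
               -o ServerAliveInterval=5 \
               -o ServerAliveCountMax=3 \
               "$host" "$@"
}

# Example usage (hosts are placeholders):
#   for h in user@host1 user@host2; do
#       run_remote "$h" uptime > "out.$h" 2>&1 &
#   done
#   wait    # returns once every session has finished or been dropped
```

Because each session gives up on an unresponsive server by itself, the final wait should not block indefinitely on a dead host.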

* * *