Date: Wed, 27 Jul 2011 07:50:41 -0400
From: Gary Palmer
To: Paul Keusemann
Cc: freebsd-net@freebsd.org
Subject: Re: Debugging dropped shell connections over a VPN
Message-ID: <20110727115041.GE1339@in-addr.com>

On Tue, Jul 26, 2011 at 01:35:16PM -0500, Paul Keusemann wrote:
> On 07/26/11 08:05, Gary Palmer wrote:
> >On Tue, Jul 26, 2011 at 06:53:59AM -0500, Paul Keusemann wrote:
> >>Again, sorry for the sluggish response.
> >>
> >>On 07/20/11 15:15, Gary Palmer wrote:
> >>>On Tue, Jul 12, 2011 at 02:26:34PM -0500, Paul Keusemann wrote:
> >>>>On 07/07/11 14:39, Chuck Swiger wrote:
> >>>>>On Jul 7, 2011, at 4:45 AM, Paul Keusemann wrote:
> >>>>>>My setup is something like this:
> >>>>>>- My local network is a mix of AIX, HP-UX, Linux, FreeBSD and Solaris
> >>>>>>  machines running various OS versions.
> >>>>>>- My gateway / firewall machine is running FreeBSD-8.1-RELEASE-p1
> >>>>>>  with ipfw, nat and racoon for the firewall and VPN.
> >>>>>>
> >>>>>>The problem is that rlogin, ssh and telnet connections over the VPN
> >>>>>>get dropped after some period of inactivity.
> >>>>>You're probably getting NAT timeouts against the VPN connection if it
> >>>>>is left idle.  racoon ought to have a config setting called
> >>>>>natt_keepalive which sends periodic keepalives-- see whether that's
> >>>>>disabled.
> >>>>>
> >>>>>Regards,
> >>>>Thanks for the suggestions Chuck, sorry it's taken so long to respond
> >>>>but I had to reconfigure and rebuild my kernel to enable IPSEC_NAT_T in
> >>>>order to try this out.
> >>>>
> >>>>One thing that I did not explicitly mention before is that I am routing
> >>>>a network over the VPN.
> >>>Hi Paul,
> >>>
> >>>Even if you are not being NAT'd on the VPN there may be a firewall (or
> >>>other active network component like a load balancer) with an
> >>>overflowing state table somewhere at the remote end.  We see this
> >>>frequently where I work with customer networks and the
> >>>firewall/VPN/network admin denies that its a time out issue so there
> >>>is likely some device in the network that has a state table and if the
> >>>connection is idle for a few minutes it gets dropped.
> >>Hmmm, this seems likely.  Have you had any luck in finding the culprit
> >>and resolving the problem?
> >Unfortunately no.
> >We know the problem exists but as a vendor we have
> >very little success in getting the customer to identify the problematic
> >device inside their network as it only seems to affect our connections
> >to them when we are helping them with problems, so there is almost
> >always something more important going on and the timeout issue gets put
> >on the back burner and forgotten.  We've worked around it in some
> >places by using the ssh 'ServerAliveInterval' directive to make ssh
> >send packets and keep the session open even if we're idle, but that
> >doesn't always work.
>
> OK, I found the ClientAliveInterval, and ClientAliveCountMax setting in
> the ssh_config man page.  I assume these are what you are referring to.
> I tried setting ClientAliveInterval to 15 seconds with
> ClientAliveCountMax set to 3 and this seems to help.  I've only tried
> this a couple of times but I have seen an ssh session stay alive for
> over an hour.  The bad news is that the sessions are still getting
> dropped, at least now I know when it happens.  Now I'm getting the
> following message:
>
> Received disconnect from 10.64.20.69: 2: Timeout, your session not
> responding.
>
> From a quick perusal of the openssh source, it is not obvious whether
> this message is coming from the client or the server side.  Initially,
> because the keep alive timer is a server side setting, I assumed the
> message was coming from the server side but if the session is not
> responding how is the message getting to the client?  If it is a client
> side problem, then I have much more flexibility to fix.  All I can do is
> whine about server side problems.

Hi Paul,

ServerAliveInterval is actually a client setting.  For example, putting
this in your ~/.ssh/config file

host *
    ServerAliveInterval 15

will set the client to ping the server every 15 seconds and try to keep
the connection alive.  You can replace '*' with a more specific host
pattern if you want to be more targeted in your configuration.
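
As a rough sketch of that more targeted setup (not something from the thread itself: the host name vpn-host.example.com is a made-up placeholder, and ServerAliveCountMax is an extra ssh_config option shown here at its documented default of 3), the entry might look something like this:

```
# ~/.ssh/config
# vpn-host.example.com is a hypothetical name; substitute the real host or pattern.
# ServerAliveInterval: ask the server for a reply after 15 seconds without traffic.
# ServerAliveCountMax: give up after 3 unanswered requests (the default).
Host vpn-host.example.com
    ServerAliveInterval 15
    ServerAliveCountMax 3
```

With those values the client should close the session on its own after roughly 45 seconds of unanswered probes, so a dead path at least shows up quickly on the client side.  For a one-off test, the same option can be passed on the command line, e.g. "ssh -o ServerAliveInterval=15 vpn-host.example.com".
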
I've never played with the server side settings for various reasons.

Regards,

Gary