From: Julian Elischer <julian@freebsd.org>
To: Christopher Penney
Cc: freebsd-net@freebsd.org
Subject: Re: NFS + FreeBSD TCP Behavior with Linux NAT
Date: Thu, 11 Nov 2010 12:39:44 -0800
Message-ID: <4CDC5490.7030109@freebsd.org>
List-Id: Networking and TCP/IP with FreeBSD

On 11/11/10 6:36 AM, Christopher Penney wrote:
> Hi,
>
> I have a curious problem I'm hoping someone can help with, or at least
> educate me on.
>
> I have several large Linux clusters, and for each one we hide the
> compute nodes behind a head node using NAT.
> Historically this has worked very well for us: any time a NAT gateway
> (the head node) reboots, everything recovers within a minute or two of
> it coming back up. This includes NFS mounts from Linux and Solaris NFS
> servers, license server connections, etc.
>
> Recently we added a FreeBSD-based NFS server to our cluster resources
> and have had significant issues with NFS mounts hanging if the head node
> reboots. We don't have this happen much, but it does occasionally
> happen. I've explored this, and it seems the behavior of FreeBSD differs
> a bit from at least Linux and Solaris with respect to TCP recovery. I'm
> curious if someone can explain this or offer any workarounds.
>
> Here are some specifics from a test I ran:
>
> Before the reboot, two Linux clients were mounting the FreeBSD server.
> They were both using port 903 locally. On the head node, clientA:903 was
> remapped to headnode:903 and clientB:903 was remapped to headnode:601.
> There is no activity when the reboot occurs. The head node takes a few
> minutes to come back up (we kept it down for several minutes).
>
> When it comes back up, clientA and clientB try to reconnect to the
> FreeBSD NFS server. They both use the same source port, but since the
> head node's conntrack table is cleared it's a race to see who gets what
> port, and this time clientA:903 appears as headnode:601 and clientB:903
> appears as headnode:903 (>>> they essentially switch places as far as
> the FreeBSD server would see <<<).
>
> The FreeBSD NFS server, since there were no outstanding ACKs it was
> waiting on, thinks things are OK, so when it gets a SYN from the two
> clients it responds with only an ACK.
> The ACK it replies with to each client is bogus (invalid sequence
> number), because it's using the return path the other client was using
> before the reboot, so the client sends a RST back. But the RST never
> reaches the FreeBSD system, since the head node's NAT hasn't yet seen
> the full handshake (which would allow return packets through). The end
> result is a "permanent" hang (at least until the server would otherwise
> clean up idle TCP connections).
>
> This is in stark contrast to the behavior of the other systems we have.
> Other systems respond to the SYN used to reconnect with a SYN/ACK. They
> appear to implicitly tear down the return path on getting a SYN for a
> seemingly already-established connection.
>
> I'm assuming this is one of the grey areas where no specific behavior is
> outlined in an RFC? Is there any way to make the FreeBSD system more
> reliable in this situation (like making it implicitly tear down the
> return path)? Or is there a way to adjust the NAT setup to allow the RST
> to return to the FreeBSD system? Currently, NAT is set up simply with:
>
> iptables -t nat -A POSTROUTING -s 10.1.0.0/16 -o bond0 -j SNAT --to 1.2.3.4
>
> where 1.2.3.4 is the intranet address and 10.1.0.0/16 is the cluster
> network.

I just added NFS to the subject because the NFS people are those you
need to connect with.

> Thanks!
>
> Chris
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
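[Editor's note] On the NAT-adjustment question, one possible avenue on a Linux head node is to relax netfilter's TCP connection tracking so that segments failing the sequence-number checks (such as the client's RST sent in reply to the stale ACK) are less likely to be dropped as INVALID before SNAT can translate them. This is an untested sketch, not a confirmed fix: the sysctl names are the standard nf_conntrack knobs, but whether they actually let this particular RST through depends on the kernel's conntrack TCP state machine and version in use.

```shell
# Untested sketch for the Linux NAT head node (assumes nf_conntrack).
# Be liberal about out-of-window TCP segments instead of marking them
# INVALID, so a RST answering a stale ACK has a chance of being SNATed
# back out to the FreeBSD server.
sysctl -w net.netfilter.nf_conntrack_tcp_be_liberal=1

# Allow conntrack to pick up already-established flows mid-stream
# (relevant right after a reboot clears the conntrack table).
sysctl -w net.netfilter.nf_conntrack_tcp_loose=1

# The existing SNAT rule from the original message, unchanged
# (1.2.3.4 = intranet address, 10.1.0.0/16 = cluster network):
iptables -t nat -A POSTROUTING -s 10.1.0.0/16 -o bond0 -j SNAT --to 1.2.3.4
```

Even if the RST does get through, note that it only converts the "permanent" hang into a faster reset; it does not change FreeBSD's choice to answer the reconnect SYN with a bare ACK.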