From owner-freebsd-fs@FreeBSD.ORG Wed Nov 21 15:27:37 2012
Subject: Re: nfsd hang in sosend_generic
From: Nikolay Denev <ndenev@gmail.com>
Date: Wed, 21 Nov 2012 17:27:32 +0200
To: Rick Macklem
Cc: "freebsd-fs@freebsd.org"
In-Reply-To: <1183657468.630412.1353506493075.JavaMail.root@erie.cs.uoguelph.ca>
Message-Id: <8C72CE97-6D19-4847-9A89-DF8A05B984DD@gmail.com>

On Nov 21, 2012, at 4:01 PM, Rick Macklem wrote:

> Nikolay Denev wrote:
>> Hello,
>>
>> First of all, I'm not sure if this is actually an nfsd issue and not
>> a network stack issue.
>>
>> I've just had nfsd hang in an unkillable state while doing some I/O
>> from a Linux host running an Oracle DB using Oracle's Direct NFS.
>>
>> I had been watching for some time how the Direct NFS client loads the
>> NFS server differently: with the Linux kernel NFS client I see a
>> single TCP session to port 2049 and all traffic goes there, while the
>> Direct NFS client is much more aggressive and creates multiple TCP
>> sessions, and was often able to generate pretty big Send/Recv-Qs on
>> FreeBSD's side. I'm mentioning this as it is probably related.
>>
> I don't know anything about the Oracle client, but it might be creating
> new TCP connections to try and recover from a "hung" state. Your netstat
> for the client below shows that there are several ESTABLISHED TCP
> connections with large receive queues. I wouldn't expect to see this and
> it suggests that the Oracle client isn't receiving/reading data off the
> TCP socket for some reason. Once it isn't receiving/reading an RPC reply
> off the TCP socket, it might create a new one to attempt a retry of the
> RPC. (NFSv4 requires that any retry of an RPC be done on a new TCP
> connection. Although that requirement doesn't exist for NFSv3, it would
> probably be considered "good practice" and will happen if NFSv3 and
> NFSv4 share the same RPC socket handling code.)
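(Something along these lines should make the stuck send queues visible
on the FreeBSD side while it is happening; a rough sketch only, assuming
the default nfsd port 2049 and the netstat column layout shown further
down:)

    #!/bin/sh
    # Rough sketch: sample the TCP send/receive queues of the nfsd
    # connections (local port 2049) every 10 seconds, so queue growth
    # can be seen over time.  Assumes the usual netstat column layout:
    # Proto Recv-Q Send-Q Local-Address Foreign-Address (state)
    while :; do
            date
            netstat -an -p tcp | awk '$4 ~ /\.2049$/'
            sleep 10
    done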
>
>> Here's the procstat -kk of the hung nfsd process:
>>
>> [... snipped huge procstat output ...]
>>
> It appears that all the nfsd threads are trying to send RPC replies
> back to the client and are stuck there. As you can see below, the
> send queues for the TCP sockets are big, so the data isn't getting
> through to the client. The large receive queue in the ESTABLISHED
> connections on the Linux client suggests that Oracle isn't taking
> data off the TCP socket for some reason, which would result in this
> once the send window is filled. At least that's my rusty old
> understanding of TCP. (That would hint at an Oracle client bug,
> but I don't know anything about the Oracle client.)
>
> Why? Well, I can't even guess, but a few things you might try are:
> - disabling TSO and rx/tx checksum offload on the FreeBSD server's
>   network interface(s).
> - try a different type of network card, if you have one handy.
>   I doubt these will make a difference, since the large receive queues
>   for the ESTABLISHED TCP connections on the Linux client suggest that
>   the data is getting through. Still, it might be worth a try, since
>   there might be one packet that isn't getting through and that is
>   causing issues for the Oracle client.
> - if you can do it, try switching the Oracle client mounts to UDP.
>   (For UDP, you want to start with an rsize/wsize no bigger than
>   16384 and then be prepared to make it smaller if the
>   "fragments dropped due to timeout" counter becomes non-zero for UDP
>   when you do a "netstat -s".)
> - There might be an NFS over TCP bug in the Oracle client.
> - when it is stuck again, do a "vmstat -z" and "vmstat -m" to
>   see if there is a large "InUse" for anything.
>   - in particular, check mbuf clusters.
>
> Also, you could try capturing packets when it happens and looking at
> them in wireshark to see what related traffic is going on the wire.
> Focus on the TCP layer as well as NFS.
>
Looking at it again, it really looks like a bug in the Oracle client, so
for now we've decided to disable the Direct NFS client and switch back to
the standard Linux kernel NFS client.
Unfortunately, testing with UDP won't be possible, as I think Oracle's
NFS client only supports TCP.
What is curious is why the kernel NFS mount from the Linux host was also
stuck because of the misbehaving userspace client. I should have tested
mounting from another host to see if the NFS server would still respond,
as this seems like a DoS attack on the NFS server :)
Anyways, I've started collecting and graphing the output of netstat -m
and vmstat -z in case something like this happens again.
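(Roughly along these lines; just a sketch, the output directory, file
names and cron interval are arbitrary:)

    #!/bin/sh
    # Sketch of the periodic collector: save timestamped netstat -m and
    # vmstat -z snapshots so mbuf/UMA zone usage can be graphed later.
    # /var/log/nfs-stats is an arbitrary location; run it from cron,
    # e.g. in /etc/crontab:
    #   */5 * * * * root /usr/local/sbin/nfs-stats.sh
    DIR=/var/log/nfs-stats
    TS=$(date +%Y%m%d-%H%M%S)
    mkdir -p "$DIR"
    netstat -m > "$DIR/netstat-m.$TS"
    vmstat -z > "$DIR/vmstat-z.$TS"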
>>
>> Here is the netstat output for the NFS sessions from the FreeBSD
>> server side:
>>
>> Proto Recv-Q Send-Q   Local Address    Foreign Address   (state)
>> tcp4  0      37215456 10.101.0.1.2049  10.101.0.2.42856  ESTABLISHED
>> tcp4  0      14561020 10.101.0.1.2049  10.101.0.2.62854  FIN_WAIT_1
>> tcp4  0      3068132  10.100.0.1.2049  10.100.0.2.9712   FIN_WAIT_1
>>
>> The Linux host sees this:
>>
>> tcp      1  0 10.101.0.2:9270   10.101.0.1:2049  CLOSE_WAIT
>> tcp 477940  0 10.100.0.2:9712   10.100.0.1:2049  ESTABLISHED
> ** These hint that the Oracle client isn't reading the socket
> for some reason. I'd guess that the send window is now full,
> so the data is backing up in the send queue on the server.
>> tcp      1  0 10.101.0.2:10588  10.101.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.101.0.2:12254  10.101.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.100.0.2:12438  10.100.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.101.0.2:17583  10.101.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.100.0.2:20285  10.100.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.101.0.2:20678  10.101.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.101.0.2:22892  10.101.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.101.0.2:28850  10.101.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.100.0.2:33851  10.100.0.1:2049  CLOSE_WAIT
>> tcp    165  0 10.100.0.2:34190  10.100.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.101.0.2:35643  10.101.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.101.0.2:39498  10.101.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.100.0.2:39724  10.100.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.100.0.2:40742  10.100.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.100.0.2:41674  10.100.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.101.0.2:42942  10.101.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.100.0.2:42956  10.100.0.1:2049  CLOSE_WAIT
>> tcp 477976  0 10.101.0.2:42856  10.101.0.1:2049  ESTABLISHED
>> tcp      1  0 10.100.0.2:42045  10.100.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.100.0.2:42048  10.100.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.100.0.2:43063  10.100.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.100.0.2:44771  10.100.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.100.0.2:49568  10.100.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.101.0.2:50813  10.101.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.101.0.2:51418  10.101.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.100.0.2:54507  10.100.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.101.0.2:57201  10.101.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.100.0.2:58553  10.100.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.101.0.2:59638  10.101.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.100.0.2:62289  10.100.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.101.0.2:61848  10.101.0.1:2049  CLOSE_WAIT
>> tcp 476952  0 10.101.0.2:62854  10.101.0.1:2049  ESTABLISHED
>>
>> Then I used "tcpdrop" on FreeBSD's side to drop the sessions, and the
>> nfsd was able to die and be restarted.
>> During the "hung" period, all NFS mounts from the Linux host were
>> inaccessible, and I/O hung.
>>
>> The nfsd is running with the drc2/drc3 and lkshared patches from Rick
>> Macklem.
>>
> These shouldn't have any effect on the above, unless you've exhausted
> your mbuf clusters. Once you are out of mbuf clusters, I'm not sure
> what might happen in the lower layers (TCP -> network interface).
>
> Good luck with it, rick
>

Thank you for the response!

Cheers,
Nikolay