From owner-freebsd-fs@FreeBSD.ORG Wed Nov 21 15:27:37 2012
Subject: Re: nfsd hang in sosend_generic
From: Nikolay Denev <ndenev@gmail.com>
Date: Wed, 21 Nov 2012 17:27:32 +0200
To: Rick Macklem
Cc: "freebsd-fs@freebsd.org"
In-Reply-To: <1183657468.630412.1353506493075.JavaMail.root@erie.cs.uoguelph.ca>
Message-Id: <8C72CE97-6D19-4847-9A89-DF8A05B984DD@gmail.com>

On Nov 21, 2012, at 4:01 PM, Rick Macklem wrote:

> Nikolay Denev wrote:
>> Hello,
>>
>> First of all, I'm not sure if this is actually an nfsd issue and not
>> a network stack issue.
>>
>> I've just had nfsd hang in an unkillable state while doing some I/O
>> from a Linux host running an Oracle DB using Oracle's Direct NFS.
>>
>> I had been watching for some time how the Direct NFS client loads the
>> NFS server differently: with the Linux kernel NFS client I see a
>> single TCP session to port 2049 and all traffic goes there, while the
>> Direct NFS client is much more aggressive and creates multiple TCP
>> sessions, and was often able to generate pretty big Send/Recv-Qs on
>> FreeBSD's side. I'm mentioning this as it is probably related.
>>
> I don't know anything about the Oracle client, but it might be creating
> new TCP connections to try and recover from a "hung" state. Your netstat
> for the client below shows that there are several ESTABLISHED TCP
> connections with large receive queues. I wouldn't expect to see this and
> it suggests that the Oracle client isn't receiving/reading data off the
> TCP socket for some reason. Once it isn't receiving/reading an RPC reply
> off the TCP socket, it might create a new one to attempt a retry of the
> RPC. (NFSv4 requires that any retry of an RPC be done on a new TCP
> connection. Although that requirement doesn't exist for NFSv3, it would
> probably be considered "good practice" and will happen if NFSv3 and
> NFSv4 share the same RPC socket handling code.)
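(Something along these lines should make the stuck send queues visible
on the FreeBSD side while it is happening; a rough sketch only, assuming
the default nfsd port 2049 and the netstat column layout shown further
down:)

    #!/bin/sh
    # Rough sketch: sample the TCP send/receive queues of the nfsd
    # connections (local port 2049) every 10 seconds, so queue growth
    # can be seen over time.  Assumes the usual netstat column layout:
    # Proto Recv-Q Send-Q Local-Address Foreign-Address (state)
    while :; do
            date
            netstat -an -p tcp | awk '$4 ~ /\.2049$/'
            sleep 10
    done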
>
>> Here's the procstat -kk of the hung nfsd process:
>>
>> [... snipped huge procstat output ...]
>>
> It appears that all the nfsd threads are trying to send RPC replies
> back to the client and are stuck there. As you can see below, the
> send queues for the TCP sockets are big, so the data isn't getting
> through to the client. The large receive queue in the ESTABLISHED
> connections on the Linux client suggests that Oracle isn't taking
> data off the TCP socket for some reason, which would result in this
> once the send window is filled. At least that's my rusty old
> understanding of TCP. (That would hint at an Oracle client bug,
> but I don't know anything about the Oracle client.)
>
> Why? Well, I can't even guess, but a few things you might try are:
> - disabling TSO and rx/tx checksum offload on the FreeBSD server's
>   network interface(s).
> - try a different type of network card, if you have one handy.
>   I doubt these will make a difference, since the large receive queues
>   for the ESTABLISHED TCP connections on the Linux client suggest that
>   the data is getting through. Still, it might be worth a try, since
>   there might be one packet that isn't getting through and that is
>   causing issues for the Oracle client.
> - if you can do it, try switching the Oracle client mounts to UDP.
>   (For UDP, you want to start with an rsize/wsize no bigger than
>   16384 and then be prepared to make it smaller if the
>   "fragments dropped due to timeout" counter becomes non-zero for UDP
>   when you do a "netstat -s".)
> - There might be an NFS over TCP bug in the Oracle client.
> - when it is stuck again, do a "vmstat -z" and "vmstat -m" to
>   see if there is a large "InUse" for anything.
>   - in particular, check mbuf clusters.
>
> Also, you could try capturing packets when it happens and looking at
> them in wireshark to see what related traffic is going on the wire.
> Focus on the TCP layer as well as NFS.
>
Looking at it again, it really looks like a bug in the Oracle client, so
for now we've decided to disable the Direct NFS client and switch back to
the standard Linux kernel NFS client.
Unfortunately, testing with UDP won't be possible, as I think Oracle's
NFS client only supports TCP.
What is curious is why the kernel NFS mount from the Linux host was also
stuck because of the misbehaving userspace client. I should have tested
mounting from another host to see if the NFS server would still respond,
as this seems like a DoS attack on the NFS server :)
Anyways, I've started collecting and graphing the output of netstat -m
and vmstat -z in case something like this happens again.
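(Roughly along these lines; just a sketch, the output directory, file
names and cron interval are arbitrary:)

    #!/bin/sh
    # Sketch of the periodic collector: save timestamped netstat -m and
    # vmstat -z snapshots so mbuf/UMA zone usage can be graphed later.
    # /var/log/nfs-stats is an arbitrary location; run it from cron,
    # e.g. in /etc/crontab:
    #   */5 * * * * root /usr/local/sbin/nfs-stats.sh
    DIR=/var/log/nfs-stats
    TS=$(date +%Y%m%d-%H%M%S)
    mkdir -p "$DIR"
    netstat -m > "$DIR/netstat-m.$TS"
    vmstat -z > "$DIR/vmstat-z.$TS"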
>>
>> Here is the netstat output for the NFS sessions from the FreeBSD
>> server side:
>>
>> Proto Recv-Q Send-Q   Local Address    Foreign Address   (state)
>> tcp4  0      37215456 10.101.0.1.2049  10.101.0.2.42856  ESTABLISHED
>> tcp4  0      14561020 10.101.0.1.2049  10.101.0.2.62854  FIN_WAIT_1
>> tcp4  0      3068132  10.100.0.1.2049  10.100.0.2.9712   FIN_WAIT_1
>>
>> The Linux host sees this:
>>
>> tcp      1  0 10.101.0.2:9270   10.101.0.1:2049  CLOSE_WAIT
>> tcp 477940  0 10.100.0.2:9712   10.100.0.1:2049  ESTABLISHED
> ** These hint that the Oracle client isn't reading the socket
> for some reason. I'd guess that the send window is now full,
> so the data is backing up in the send queue on the server.
>> tcp      1  0 10.101.0.2:10588  10.101.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.101.0.2:12254  10.101.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.100.0.2:12438  10.100.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.101.0.2:17583  10.101.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.100.0.2:20285  10.100.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.101.0.2:20678  10.101.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.101.0.2:22892  10.101.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.101.0.2:28850  10.101.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.100.0.2:33851  10.100.0.1:2049  CLOSE_WAIT
>> tcp    165  0 10.100.0.2:34190  10.100.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.101.0.2:35643  10.101.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.101.0.2:39498  10.101.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.100.0.2:39724  10.100.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.100.0.2:40742  10.100.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.100.0.2:41674  10.100.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.101.0.2:42942  10.101.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.100.0.2:42956  10.100.0.1:2049  CLOSE_WAIT
>> tcp 477976  0 10.101.0.2:42856  10.101.0.1:2049  ESTABLISHED
>> tcp      1  0 10.100.0.2:42045  10.100.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.100.0.2:42048  10.100.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.100.0.2:43063  10.100.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.100.0.2:44771  10.100.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.100.0.2:49568  10.100.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.101.0.2:50813  10.101.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.101.0.2:51418  10.101.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.100.0.2:54507  10.100.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.101.0.2:57201  10.101.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.100.0.2:58553  10.100.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.101.0.2:59638  10.101.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.100.0.2:62289  10.100.0.1:2049  CLOSE_WAIT
>> tcp      1  0 10.101.0.2:61848  10.101.0.1:2049  CLOSE_WAIT
>> tcp 476952  0 10.101.0.2:62854  10.101.0.1:2049  ESTABLISHED
>>
>> Then I used "tcpdrop" on FreeBSD's side to drop the sessions, and the
>> nfsd was able to die and be restarted.
>> During the "hung" period, all NFS mounts from the Linux host were
>> inaccessible, and I/O hung.
>>
>> The nfsd is running with the drc2/drc3 and lkshared patches from Rick
>> Macklem.
>>
> These shouldn't have any effect on the above, unless you've exhausted
> your mbuf clusters. Once you are out of mbuf clusters, I'm not sure
> what might happen in the lower layers (TCP -> network interface).
>
> Good luck with it, rick
>

Thank you for the response!

Cheers,
Nikolay