From: Daniel Braniss
To: Rick Macklem
Cc: FreeBSD stable
Subject: Re: on 9.2-stable nfs/zfs and 10g hang
Date: Sat, 18 Jan 2014 13:24:43 +0200

On Jan 17, 2014, at 4:47 PM, Rick Macklem wrote:

> Daniel Braniss wrote:
>> hi all,
>>
>> All was going ok till I decided to connect this host via a 10g nic
>> and very soon it started to hang. Running multiple make buildworlds
>> from other hosts connected via 10g and using both src and obj on the
>> server via tcp/nfs did ok.
>> but running
>> find … -exec md5 {} + (the find finds over 6M files)
>> from another host (at 10g) will hang it very quickly.
>>
>> If I wait a while (can’t be more specific) it sometimes recovers -
>> but my users are not very patient :-)
>>
> This suggests that an RPC request/reply gets dropped in a way that TCP
> doesn't recover. Eventually (after up to about 15min, I think?) the TCP
> connection will be shut down and a new TCP connection started, with a
> retry of outstanding RPCs.
>
>> I will soon try the same experiment using the old 1G nic, but in the
>> meantime, if someone could shed some light, that would be very helpful
>>
>> I’m attaching core.txt, but if it doesn’t make it, it’s also
>> available at:
>> ftp://ftp.cs.huji.ac.il/users/danny/freebsd/core.txt.16
>>
> You might try disabling TSO on the net interface. There have been issues
> with TSO for segments around 64K in the past (or use rsize=32768,wsize=32768
> options on the client mount, to avoid RPCs over about 32K in size).
>

BINGO! disabling tso did it. I’ll try reducing the packet size later.

some numbers:
there were some 7*10^6 files
doing it locally (the find + md5) took about 3 hrs,
via nfs at 1g took 11 hrs.
at 10g it took 4 hrs.

thanks!
	danny

> Beyond that, capturing a packet trace for the case that hangs easily and
> looking at what goes on near the end of it in wireshark might give you
> a hint about what is going on.
>
> rick
>
>> thanks,
>> danny
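
For anyone wanting to put the timings reported in this message (about 7*10^6 files; roughly 3 hrs locally, 11 hrs over nfs at 1g, 4 hrs at 10g) into per-second terms, here is a rough sketch in Python. The file count and hours are the approximate figures from the message; the function and variable names are made up for illustration, and the results are only averages over each whole run.

```python
# Approximate figures as reported in the message ("some 7*10^6 files").
FILES = 7_000_000

# Wall-clock hours for the find + md5 run in each setup, as reported.
HOURS = {
    "local": 3,
    "nfs at 1g": 11,
    "nfs at 10g": 4,
}


def files_per_second(files, hours):
    """Average number of files processed per second over the whole run."""
    return files / (hours * 3600)


def slowdown(hours, baseline_hours):
    """How many times longer a run took compared with the baseline run."""
    return hours / baseline_hours


if __name__ == "__main__":
    baseline = HOURS["local"]
    for name, hours in HOURS.items():
        rate = files_per_second(FILES, hours)
        factor = slowdown(hours, baseline)
        print(f"{name:12s} {hours:3d} h  {rate:6.0f} files/s  {factor:.2f}x local")
```

On these numbers the 10g run averages a bit under 500 files per second, roughly 1.3 times the local run time, while the 1g run takes close to 3.7 times as long as local.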