Date: Fri, 17 Jan 2014 09:47:56 -0500 (EST)
From: Rick Macklem
To: Daniel Braniss
Cc: FreeBSD stable
Subject: Re: on 9.2-stable nfs/zfs and 10g hang
Message-ID: <588564685.11730322.1389970076386.JavaMail.root@uoguelph.ca>
In-Reply-To: <012BE46A-DA0F-422F-85D0-8C1E71BC3C51@cs.huji.ac.il>

Daniel Braniss wrote:
> hi all,
>
> All was going ok till I decided to connect this host via a 10g nic
> and very soon it started
> to hang.
> Running multiple make buildworlds from other hosts connected
> via 10g and
> using both src and obj on the server via tcp/nfs did ok. but running
>     find … -exec md5 {} + (the find finds over 6M files)
> from another host (at 10g) will hang it very quickly.
>
> If I wait a while (can’t be more specific) it sometimes recovers -
> but my users are not very
> patient :-)
>
This suggests that an RPC request or reply is being dropped in a way that
TCP doesn't recover from. Eventually (after up to about 15 minutes, I think)
the TCP connection will be shut down and a new TCP connection started, with
a retry of the outstanding RPCs.

> I will soon try the same experiment using the old 1G nic, but in the
> meantime, if someone
> could shed some light would be very helpful
>
> I’m attaching core.txt, but if it doesn’t make it, it’s also
> available at:
>     ftp://ftp.cs.huji.ac.il/users/danny/freebsd/core.txt.16
>
You might try disabling TSO on the net interface. There have been issues
with TSO for segments around 64K in the past. (Alternatively, use the
rsize=32768,wsize=32768 options on the client mount, to avoid RPCs over
about 32K in size.)

Beyond that, capturing a packet trace for the case that hangs easily and
looking at what goes on near the end of it in wireshark might give you a
hint about what is going on.

rick

> thanks,
>     danny
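
For reference, the three suggestions above (disable TSO, cap the NFS transfer sizes on the client mount, capture a packet trace) might look roughly like the sketch below on FreeBSD. The interface name ix0, the server name nfsserver, the export path /export, the mount point /mnt and the capture file /tmp/nfs-hang.pcap are all placeholders and do not come from this thread. Port 2049 is the usual NFS port and is assumed here. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only. All names below are hypothetical placeholders, not values
# taken from the thread; adjust them for the machines involved.
IFACE="${IFACE:-ix0}"            # 10g interface (assumed name)
SERVER="${SERVER:-nfsserver}"    # NFS server host (assumed name)
EXPORT="${EXPORT:-/export}"      # exported path (assumed)
MNT="${MNT:-/mnt}"               # client mount point (assumed)
DRY_RUN="${DRY_RUN:-1}"          # 1 = print commands, 0 = run them (as root)

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Turn off TCP segmentation offload on the interface.
disable_tso() {
    run ifconfig "$IFACE" -tso
}

# Mount on the client with 32K read/write sizes, so RPCs stay near 32K.
mount_small_io() {
    run mount -t nfs -o tcp,rsize=32768,wsize=32768 "$SERVER:$EXPORT" "$MNT"
}

# Capture full packets for the NFS traffic to a file for wireshark.
# Assumes NFS on the standard port 2049.
capture_trace() {
    run tcpdump -i "$IFACE" -s 0 -w /tmp/nfs-hang.pcap host "$SERVER" and port 2049
}

disable_tso
mount_small_io
capture_trace
```

With DRY_RUN=0 the capture step keeps running until interrupted, so it is probably best started in a separate terminal just before reproducing the hang.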