From: Adrian Chadd
To: Daniel Braniss
Cc: Rick Macklem, FreeBSD stable
Date: Sat, 18 Jan 2014 08:13:22 -0800
Subject: Re: on 9.2-stable nfs/zfs and 10g hang

Hi!

Please try reducing the NFS rsize/wsize down to 32k but leave TSO enabled.

It's 9.2, so there may be some bugfixes that haven't been backported
from 10 or -HEAD. Would you be able to try a -HEAD snapshot here?

What's the NFS server and hosts? I saw the core.txt.16 that says
"ix0/ix1" so I can glean the basic chipset family but which NIC in
particular is it? What would people need to try and reproduce it?


-a

On 18 January 2014 03:24, Daniel Braniss wrote:
>
> On Jan 17, 2014, at 4:47 PM, Rick Macklem wrote:
>
>> Daniel Braniss wrote:
>>> hi all,
>>>
>>> All was going ok till I decided to connect this host via a 10g nic
>>> and very soon it started
>>> to hang. Running multiple make buildworlds from other hosts connected
>>> via 10g and
>>> using both src and obj on the server via tcp/nfs did ok. but running
>>> find … -exec md5 {} + (the find finds over 6M files)
>>> from another host (at 10g) will hang it very quickly.
>>>
>>> If I wait a while (can’t be more specific) it sometimes recovers -
>>> but my users are not very
>>> patient :-)
>>>
>> This suggests that an RPC request/reply gets dropped in a way that TCP
>> doesn't recover. Eventually (after up to about 15min, I think?) the TCP
>> connection will be shut down and a new TCP connection started, with a
>> retry of outstanding RPCs.
>>
>>> I will soon try the same experiment using the old 1G nic, but in the
>>> meantime, if someone
>>> could shed some light would be very helpful
>>>
>>> I’m attaching core.txt, but if it doesn’t make it, it’s also
>>> available at:
>>> ftp://ftp.cs.huji.ac.il/users/danny/freebsd/core.txt.16
>>>
>> You might try disabling TSO on the net interface. There have been issues
>> with TSO for segments around 64K in the past (or use rsize=32768,wsize=32768
>> options on the client mount, to avoid RPCs over about 32K in size).
>>
> BINGO! disabling tso did it. I’ll try reducing the packet size later.
> some numbers:
> there were some 7*10^6 files
> doing it locally (the find + md5) took about 3 hrs,
> via nfs at 1g took 11 hrs.
> at 10g it took 4 hrs.
>
> thanks!
> danny
>
>
>> Beyond that, capturing a packet trace for the case that hangs easily and
>> looking at what goes on near the end of it in wireshark might give you
>> a hint about what is going on.
>>
>> rick
>>
>>> thanks,
>>> danny
>>> _______________________________________________
>>> freebsd-stable@freebsd.org mailing list
>>> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
>>> To unsubscribe, send any mail to
>>> "freebsd-stable-unsubscribe@freebsd.org"
>>>
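
For reference, the timings Daniel quotes above can be turned into rough
throughput figures. What follows is only a minimal Python sketch and is not
something posted in the thread: the names TOTAL_FILES, RUN_HOURS,
files_per_second and speedup are hypothetical, and it assumes the quoted
values (about 7*10^6 files; about 3, 11 and 4 hours) are exact, which they
are not.

```python
# Rough throughput arithmetic for the timings quoted in the thread.
# Assumption: the file count and durations are treated as exact,
# although the thread gives them only approximately.

TOTAL_FILES = 7 * 10**6  # "some 7*10^6 files"

# Wall-clock hours reported for the find + md5 run in each setup.
RUN_HOURS = {
    "local": 3,
    "nfs at 1g": 11,
    "nfs at 10g": 4,
}


def files_per_second(total_files, hours):
    """Average number of files processed per second over the run."""
    return total_files / (hours * 3600)


def speedup(slow_hours, fast_hours):
    """How many times faster the shorter run is than the longer one."""
    return slow_hours / fast_hours


if __name__ == "__main__":
    for setup, hours in RUN_HOURS.items():
        rate = files_per_second(TOTAL_FILES, hours)
        print(f"{setup}: about {rate:.0f} files/s")
    print(f"10g vs 1g: {speedup(11, 4):.2f}x faster")
```

Under those assumptions the 10g run comes out 11/4 = 2.75 times faster than
the 1g run, while still being slower than the local run. Since these are
averages over a whole multi-hour run, they say nothing about where the time
went.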