From: David Adam
To: Tom Curry
Cc: FreeBSD Filesystems
Date: Fri, 19 Feb 2016 14:54:58 +0800 (AWST)
Subject: Re: Poor ZFS+NFSv3 read/write performance and panic

On Sun, 14 Feb 2016, Tom Curry wrote:
> On Sun, Feb 14, 2016 at 3:40 AM, David Adam wrote:
> > On Mon, 8 Feb 2016, Tom Curry wrote:
> > > On Sun, Feb 7, 2016 at 11:58 AM, David Adam wrote:
> > > > Just wondering if anyone has any idea how to identify which
> > > > devices are implicated in ZFS' vdev_deadman(). I have updated the
> > > > firmware on the mps(4) card that has our disks attached but that
> > > > hasn't helped.
> > >
> > > I too ran into this problem and spent quite some time
> > > troubleshooting hardware. For me it turns out it was not hardware
> > > at all, but software. Specifically the ZFS ARC. Looking at your
> > > stack I see some arc reclaim up top, it's possible you're running
> > > into the same issue. There is a monster of a PR that details this
> > > here
> > > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594
> > >
> > > If you would like to test this theory out, the fastest way is to
> > > limit the ARC by adding the following to /boot/loader.conf and
> > > rebooting
> > > vfs.zfs.arc_max="24G"
> >
> > Thanks Tom - this certainly did sound promising, but setting the ARC
> > to 11G of our 16G of RAM didn't help. `zfs-stats` confirmed that the
> > ARC was the expected size and that there was still 461 MB of RAM
> > free.
>
> Did the system still panic or did it merely degrade in performance?
> When performance heads south are you swapping?

I had booted back into a GENERIC kernel, so it slowed down and then
deadlocked - no network traffic and no response on the console. I've
never actually managed to capture the panic with a GENERIC kernel, only
with one built with DDB/WITNESS/DIAGNOSTIC. My colleagues tended to try
and reboot the server before it got to that stage (and then ask who was
going to install Linux).

It seems to be fixed now but I have committed a mortal sin and changed
two things at once - upgraded to 10.3-BETA1 (as suggested by jwd@
off-list) but also dropped the ARC size further to 10G.

If I can make it happen again, I'll certainly be asking for more help
and will see what the swap state is.

Thanks to everyone who replied on and off list.

David Adam
Wheel Group
University Computer Club, The University of Western Australia
zanchey@ucc.gu.uwa.edu.au