From owner-freebsd-fs@FreeBSD.ORG Mon Apr 23 17:33:37 2012
Sender: Devin Teske
To: "'Tim Bishop'" ,
References: <20120327181457.GC24787@carrick-users.bishnet.net>
 <20120423143810.GA95448@carrick-users.bishnet.net>
In-Reply-To: <20120423143810.GA95448@carrick-users.bishnet.net>
Date: Mon, 23 Apr 2012 10:33:19 -0700
Message-ID: <03ef01cd2177$2d2e5b50$878b11f0$@freebsd.org>
Subject: RE: ZFS: processes hanging when trying to access filesystems

> -----Original Message-----
> From: owner-freebsd-fs@freebsd.org [mailto:owner-freebsd-fs@freebsd.org]
> On Behalf Of Tim Bishop
> Sent: Monday, April 23, 2012 7:38 AM
> To: freebsd-fs@freebsd.org
> Subject: Re: ZFS: processes hanging when trying to access filesystems
>
> Here's a comparison of top output.  This shows the higher context switching.
> I'm not sure if this is part of the cause of the problems, or just an effect:
>
> "top -Sj -m io"
>
> last pid: 95277;  load averages:  0.04,  0.11,  0.13   up 20+05:31:54  15:29:52
> 186 processes: 2 running, 182 sleeping, 1 stopped, 1 waiting
> CPU:  4.1% user,  0.0% nice,  3.6% system,  0.0% interrupt, 92.3% idle
> Mem: 412M Active, 488M Inact, 4685M Wired, 52M Cache, 551M Buf, 288M Free
> Swap: 6144M Total, 316M Used, 5828M Free, 5% Inuse
>
>   PID JID USERNAME    VCSW  IVCSW  READ  WRITE  FAULT  TOTAL  PERCENT COMMAND
>    12   0 root         617      1     0      0      0      0    0.00% intr
>    11   0 root         584   1212     0      0      0      0    0.00% idle
>     0   0 root         322     46     0      0      0      0    0.00% kernel
>     3   0 root         257      1     0      0      0      0    0.00% g_up
>     4   0 root         175      3     0      0      0      0    0.00% g_down
>    13   0 root          20      0     0      0      0      0    0.00% yarrow
>     5   0 root          17      0     0     16      0     16   88.89% zfskern
>   641   0 _pflogd        4      0     0      0      0      0    0.00% pflogd
>
> last pid: 92079;  load averages:  0.39,  0.22,  0.18   up 20+05:22:39  15:20:37
> 197 processes: 2 running, 192 sleeping, 1 stopped, 1 zombie, 1 waiting
> CPU:  0.0% user,  0.0% nice,  5.3% system,  1.5% interrupt, 93.2% idle
> Mem: 484M Active, 478M Inact, 4655M Wired, 52M Cache, 551M Buf, 257M Free
> Swap: 6144M Total, 316M Used, 5828M Free, 5% Inuse
>
>   PID JID USERNAME    VCSW  IVCSW  READ  WRITE  FAULT  TOTAL  PERCENT COMMAND
>    11   0 root        3945   6837     0      0      0      0    0.00% idle
>    12   0 root        2130      1     0      0      0      0    0.00% intr
>     0   0 root        2008     99     0      0      0      0    0.00% kernel
>     3   0 root        1810      0     0      0      0      0    0.00% g_up
>     4   0 root        1486     12     0      0      0      0    0.00% g_down
>    13   0 root          20      2     0      0      0      0    0.00% yarrow
>     5   0 root          19      0     2     66      0     68   95.77% zfskern
>    20   0 root           9      0     0      0      0      0    0.00% g_mirror r
>
> The latter shows the machine when it's unresponsive and processes are
> starting to hang.
>
> Tim.
>
> On Tue, Mar 27, 2012 at 07:14:57PM +0100, Tim Bishop wrote:
> > I have a machine running 8-STABLE amd64 from the end of last week. I
> > have a problem where the machine starts to freeze up. Any process
> > accessing the ZFS filesystems hangs, which eventually causes more and
> > more processes to be spawned (cronjobs, etc, never complete). Although
> > the root filesystem is on UFS (the machine hosts jails on ZFS),
> > eventually I can't log in anymore.
> >
> > The problem occurs when the frequently used part of the ARC gets too
> > large. See this graph:
> >
> > http://dl.dropbox.com/u/318044/zfs_arc_utilization-day.png
> >
> > At the right of the graph things started to hang.
> >
> > At the same time I see a high amount of context switching.
> >
> > I picked a hanging process and procstat showed the following:
> >
> >   PID    TID COMM             TDNAME           KSTACK
> > 24787 100303 mutt             -                mi_switch+0x176
> > sleepq_wait+0x42 _cv_wait+0x129 txg_wait_open+0x85 dmu_tx_assign+0x170
> > zfs_inactive+0xf1 zfs_freebsd_inactive+0x1a vinactive+0x71 vputx+0x2d8
> > null_reclaim+0xb3 vgonel+0x119 vrecycle+0x7b null_inactive+0x1f
> > vinactive+0x71 vputx+0x2d8 vn_close+0xa1 vn_closefile+0x5a _fdrop+0x23
> >
> > I'm running a reduced amount of jails on the machine at the moment, which
> > is limiting the speed at which the machine freezes up completely. I'd
> > like to debug this problem further, so any advice on useful information
> > to collect would be appreciated.
> >
> > I've had this problem on the machine before[1] but adding more RAM
> > alleviated the issue.
> >
> > Tim.
> >
> > [1] http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/058541.html

We've seen this happen, and we were able not only to devise a way of
replicating it quickly but also to identify the root cause (and we have
since fixed the issue).
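(An aside, on the "what information to collect" question: a rough, untested
sketch along the lines of the loop below will periodically dump the kernel
stacks of anything stuck in disk wait, so you can confirm the hung processes
are parked in txg_wait_open/dmu_tx_assign like the mutt process above.)

#!/bin/sh
# Rough sketch: every 10 seconds, dump the kernel stacks of processes in
# disk-wait ("D") state.  Stacks showing txg_wait_open/dmu_tx_assign are
# threads blocked waiting for the next ZFS transaction group to open.
while :; do
        echo "############################## $(date)"
        for pid in $(ps -axo pid,state | awk '$2 ~ /^D/ { print $1 }'); do
                procstat -k "$pid"
        done
        sleep 10
done | tee procstat-kstack.log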
When this first started, it was occurring on one system only, and that system
was crashing maybe once per day (by "crash" I mean a hang exactly as described
by the OP). Then we started doing volume load testing, and that's when things
changed for the worse. By running the following shell script 7 times on local
vtys, we were able to replicate the problem within 15 minutes (even faster if
we threw in some more instances running via SSH -- but in diagnosing the issue
we first wanted to prove to ourselves that it was not network-related):

#!/bin/sh
while :; do
        echo "############################## $(date)"
        bonnie
done

Eventually (for us, after about 15 minutes -- with 7 concurrent instances
whose CWD was on the ZFS filesystem; YMMV) bonnie stops.

We tried lots of things. First we watched the stats using:

#!/bin/sh
while :; do
        echo "############################## $(date)"
        sysctl vfs.zfs kstat.zfs
        sleep 1
done | tee zfs-stats.${HOSTNAME%%.*}

And it *looked* like we were hitting some limit (kstat.zfs.misc.arcstats.p
appeared to be ever-climbing and then BOOM! -- hang). But after tuning the
following parameters in /boot/loader.conf:

# ZFS kernel tune
vm.kmem_size="2560M"
vfs.zfs.arc_min="1536M"
vfs.zfs.arc_max="1536M"
vfs.zfs.prefetch_disable="1"
vfs.zfs.txg.timeout="5"
vfs.zfs.vdev.min_pending="3"
vfs.zfs.vdev.max_pending="6"

things changed enough that we no longer thought we were hitting a limit
(arcstats.p would grow then drop, grow then drop, ad nauseam, several times
before the hang).

So... we finally resolved to dismember the array (something we had avoided
doing until we ran out of other ideas) and test each member disk with
"SpinRite" -- a FreeDOS-based bootable disc/diskette capable of performing
miracles on drives.

Well, it turned out that the 5th of the 6 drives was overheating under load.
We were able to reliably reproduce the overheating with SpinRite (which
actively monitors drive temperature during all disk operations). We saw the
drive (still in our possession) routinely reach 127 degrees Fahrenheit after
about 10-15 minutes under nominal load.

We replaced that drive with a new one, re-assembled the array with the new
drive, and started the rebuild operation. Within the hour we were back in
operation, AND... the system now survives multi-day heavy-volume loads
running continuously -- no more issues to report.

We've kept the drive! I brought this up at the last BAFUG meeting and a couple
of guys said that we should use that drive to improve ZFS. I'm game if anybody
else is.

--
Devin
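P.S. For anyone who'd rather not break up an array to hunt for this: a rough,
untested sketch (it assumes sysutils/smartmontools is installed and that the
pool members are ada0 through ada5 -- substitute your own device names, and
note some drives report temperature as attribute 190 rather than 194) that
logs each drive's SMART temperature while the bonnie loops are running. A
drive that climbs well above the others is a good candidate for the kind of
overheating we hit.

#!/bin/sh
# Rough sketch: once a minute, record each (assumed) pool member's SMART
# temperature (attribute 194, raw value, degrees Celsius).
while :; do
        echo "############################## $(date)"
        for disk in ada0 ada1 ada2 ada3 ada4 ada5; do
                temp=$(smartctl -A /dev/$disk | awk '$1 == 194 { print $10 }')
                echo "$disk: ${temp:-unknown} C"
        done
        sleep 60
done | tee drive-temps.log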