From owner-freebsd-stable@FreeBSD.ORG Mon Jul 19 03:01:19 2010
Return-Path:
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E14481065673
	for ; Mon, 19 Jul 2010 03:01:19 +0000 (UTC)
	(envelope-from mike@sentex.net)
Received: from lava.sentex.ca (pyroxene.sentex.ca [199.212.134.18])
	by mx1.freebsd.org (Postfix) with ESMTP id 4EBA28FC15
	for ; Mon, 19 Jul 2010 03:01:18 +0000 (UTC)
Received: from mdt-xp.sentex.net (simeon.sentex.ca [192.168.43.27])
	by lava.sentex.ca (8.14.4/8.14.3) with ESMTP id o6J31Hs1045607;
	Sun, 18 Jul 2010 23:01:17 -0400 (EDT)
	(envelope-from mike@sentex.net)
Message-Id: <201007190301.o6J31Hs1045607@lava.sentex.ca>
X-Mailer: QUALCOMM Windows Eudora Version 7.1.0.9
Date: Sun, 18 Jul 2010 23:01:03 -0400
To: Jeremy Chadwick
From: Mike Tancsa
In-Reply-To: <20100719023419.GA91006@icarus.home.lan>
References: <201007182108.o6IL88eG043887@lava.sentex.ca>
	<20100718211415.GA84127@icarus.home.lan>
	<201007182142.o6ILgDQW044046@lava.sentex.ca>
	<20100719023419.GA91006@icarus.home.lan>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
Cc: freebsd-stable@freebsd.org
Subject: Re: deadlock or bad disk ? RELENG_8
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 19 Jul 2010 03:01:20 -0000

At 10:34 PM 7/18/2010, Jeremy Chadwick wrote:
>On Sun, Jul 18, 2010 at 05:42:14PM -0400, Mike Tancsa wrote:
> > At 05:14 PM 7/18/2010, Jeremy Chadwick wrote:
> >
> > >Where exactly is your swap partition?
> >
> > On one of the areca raidsets.
> >
> > # swapctl -l
> > Device:       1024-blocks     Used:
> > /dev/da0s1b      10485760       108
>
>So is da0 actually a RAID volume "behind the scenes" on the Areca
>controller?  How many disks are involved in that set?

Yes, da0 is a RAID volume with 4 disks behind the scenes.

>Well, the thread I linked you stated that the problem has to do with a
>controller or disk "taking too long".  I have no idea what the threshold
>is.  I suppose it could also indicate that your system is (possibly)
>running low on resources (RAM); I would imagine swap_pager would get
>called if a processes needed to be offloaded to swap.  So maybe this is
>a system tuning thing more than a hardware thing.

Prior to someone rebooting it, it had been stuck in this state for a
good 90 minutes. Apart from upgrading to a later RELENG_8 to get the
security patches, the machine had been running a few versions of
RELENG_8 doing the same workloads every week without issue.

/boot/loader.conf has

ahci_load="YES"
siis_load="YES"

sysctl.conf has

net.inet.tcp.recvbuf_max=16777216
net.inet.tcp.recvspace=131072
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.sendspace=32768
net.inet.udp.recvspace=65536
kern.ipc.somaxconn=1024
kern.ipc.maxsockbuf=4194304
net.inet.ip.redirect=0
net.inet.ip.intr_queue_maxlen=4096
net.route.netisr_maxqlen=1024
kern.ipc.nmbclusters=131072

I do track some basic memory stats via rrd. Looking at the graphs up to
that period, nothing unusual was happening.

CPU: 16.6% user,  0.0% nice,  4.3% system,  0.2% interrupt, 78.8% idle
Mem: 443M Active, 5707M Inact, 1462M Wired, 147M Cache, 828M Buf, 166M Free
Swap: 10G Total, 124K Used, 10G Free

> > smartctl -a -d 3ware,1 /dev/twa0
>
>Now I'm confused -- this indicates twa(4) is involved, not arcmsr(4).

The other controllers (3ware and the onboard ICH in AHCI mode) provide
other storage on the same box. I only mentioned them because I checked
all of their disks for errors as well, and there were none. The dmesg
from the original post enumerates all the devices on the box.
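
For anyone who wants to sanity-check the memory and swap figures quoted above, here is a minimal Python sketch that parses the two top(1) summary lines and works out how much swap is in use. The function name `parse_top_line` is hypothetical, and adding Inact + Cache + Free together as "memory the VM could hand out without swapping" is only a rough assumption for illustration, not a statement of how the FreeBSD pager accounts for pages.

```python
import re

# Byte multipliers for the size suffixes that top(1) prints.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_top_line(line):
    """Parse a top(1) summary line like 'Mem: 443M Active, ...' into {label: bytes}.

    Hypothetical helper for illustration; it only understands
    '<integer><K|M|G|T> <Label>' fields.
    """
    _, _, fields = line.partition(":")
    result = {}
    for size, unit, label in re.findall(r"(\d+)([KMGT])\s+(\w+)", fields):
        result[label] = int(size) * UNITS[unit]
    return result


# The two lines exactly as quoted in the message.
mem = parse_top_line(
    "Mem: 443M Active, 5707M Inact, 1462M Wired, 147M Cache, 828M Buf, 166M Free"
)
swap = parse_top_line("Swap: 10G Total, 124K Used, 10G Free")

# Fraction of swap in use, as a percentage.
swap_used_pct = 100.0 * swap["Used"] / swap["Total"]

# Rough assumption: treat Inact + Cache + Free as reclaimable memory.
reclaimable_mib = (mem["Inact"] + mem["Cache"] + mem["Free"]) // 1024**2

print(f"swap used: {swap_used_pct:.4f}% of total")
print(f"Inact + Cache + Free: {reclaimable_mib} MiB")
```

With the numbers above, swap use comes out at roughly a thousandth of a percent and the Inact + Cache + Free sum is about 6 GB, which is consistent with the graphs showing nothing unusual on the memory side.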

        ---Mike

--------------------------------------------------------------------
Mike Tancsa,                                      tel +1 519 651 3400
Sentex Communications,                            mike@sentex.net
Providing Internet since 1994                     www.sentex.net
Cambridge, Ontario Canada                         www.sentex.net/mike