From: Jeremy Chadwick
To: Mike Tancsa
Cc: freebsd-stable@freebsd.org
Date: Mon, 19 Jul 2010 13:16:59 -0700
Subject: Re: deadlock or bad disk ? RELENG_8

On Mon, Jul 19, 2010 at 08:41:40AM -0400, Mike Tancsa wrote:
> At 11:58 PM 7/18/2010, Jeremy Chadwick wrote:
>
> >So I believe this indicates the message only gets printed during swapin,
> >not swapout. Meaning it's happening during an I/O read from da0.
>
> Yes, and from my existing ssh sessions, it would _seem_ no disk IO
> was completing. ie I tried a killall -9 watchdogd which would need
> to load killall from the disk, read whatever its linked against.
> However, after hitting enter it was just blocking on trying to read.
> So I would describe it as if the entire system was waiting from that
> "swapper Indefinite wait" to finish, or I could not read anything
> from drives associated with that controller.

Hmm, okay, so it sounds like the controller wedged or arcmsr(4) started
acting oddly. I would open up a case with Areca on the problem,
*especially* if it happens again.

> >So what's hz? Well, I want to assume it's kern.hz, which defaults to
> >1000. 1000*20 = 20000, so the timeout would be 20000/1000 = 20 seconds.
> >That's a pretty long time to be waiting for an I/O read to return.
>
> I think the messages were printing to the serial console faster than
> that, but I could be wrong. If it happens again, I will time it

Come to think of it, I'm betting you'd get large batches of these
messages if/when it happens.

That VM code isn't something I'm familiar with (nor msleep(9)); I just
happen to dig around and find what I can.

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |