From owner-freebsd-stable@FreeBSD.ORG Wed Apr 6 01:25:17 2005
Return-Path:
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP id EE4C616A4CF
    for ; Wed, 6 Apr 2005 01:25:16 +0000 (GMT)
Received: from FS.denninger.net (wsip-68-15-213-52.at.at.cox.net [68.15.213.52])
    by mx1.FreeBSD.org (Postfix) with ESMTP id AB16B43D46
    for ; Wed, 6 Apr 2005 01:25:15 +0000 (GMT)
    (envelope-from karl@FS.denninger.net)
Received: from fs.denninger.net (localhost [127.0.0.1])
    by FS.denninger.net (8.13.3/8.13.1) with SMTP id j361PEvW009009
    for ; Tue, 5 Apr 2005 20:25:14 -0500 (CDT)
    (envelope-from karl@FS.denninger.net)
Received: from fs.denninger.net [127.0.0.1] by Spamblock-sys;
    Tue Apr 5 20:25:14 2005
Received: (from karl@localhost)
    by FS.denninger.net (8.13.3/8.13.1/Submit) id j361P8Y1009006;
    Tue, 5 Apr 2005 20:25:08 -0500 (CDT)
    (envelope-from karl)
Message-ID: <20050405202504.A6216@denninger.net>
Date: Tue, 5 Apr 2005 20:25:04 -0500
From: Karl Denninger
To: "Matthew N. Dodd"
References: <20050329200841.A772@denninger.net>
    <20050329233843.L328@sasami.jurai.net>
    <20050329230830.A3222@denninger.net>
    <20050329234318.A3883@denninger.net>
    <20050330210830.A46956@denninger.net>
    <20050330230046.A68235@denninger.net>
    <20050331120100.P328@sasami.jurai.net>
    <20050331110608.A81295@denninger.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.93.2i
In-Reply-To: <20050331110608.A81295@denninger.net>; from Karl Denninger on Thu, Mar 31, 2005 at 11:06:08AM -0600
Organization: Karl's Sushi and Packet Smashers
X-Die-Spammers: Spammers cheerfully broiled for supper and served with ketchup!
cc: freebsd-stable@freebsd.org
Subject: Re: DANGER WILL ROBINSON!
 SERIOUS problem with current 5.4-PRERELEASE - FURTHER UPDATE
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Production branch of FreeBSD source code
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 06 Apr 2005 01:25:17 -0000

On Thu, Mar 31, 2005 at 11:06:08AM -0600, Karl Denninger wrote:
> On Thu, Mar 31, 2005 at 12:02:20PM -0500, Matthew N. Dodd wrote:
> > On Wed, 30 Mar 2005, Karl Denninger wrote:
> > > Removing the FIRST delta, which is:
> > >
> > > 218a219,221
> > >     if (!dumping)
> > >         callout_reset(&request->callout, request->timeout * hz,
> > >                       (timeout_t*)ata_timeout, request);
> > >
> > > appears to get rid of the crashes while not harming data integrity OR the
> > > requeueing.
> >
> > I'd be interested to know if the attached patch does anything.
> >
> > --
> > 10 40 80 C0 00 FF FF FF FF C0 00 00 00 00 10 AA AA 03 00 00 00 08 00
> > Index: ata-queue.c
> > ===================================================================
> > RCS file: /home/ncvs/src/sys/dev/ata/ata-queue.c,v
> > retrieving revision 1.32.2.6
> > diff -u -u -r1.32.2.6 ata-queue.c
> > --- ata-queue.c 23 Mar 2005 04:50:26 -0000 1.32.2.6
> > +++ ata-queue.c 31 Mar 2005 17:00:46 -0000
> > @@ -217,8 +217,7 @@
> >      }
> >      else {
> >          if (!dumping)
> > -            callout_reset(&request->callout, request->timeout * hz,
> > -                          (timeout_t*)ata_timeout, request);
> > +            callout_drain(&request->callout);
> >          if (request->bio && !(request->flags & ATA_R_TIMEOUT)) {
> >              ATA_DEBUG_RQ(request, "finish bio_taskqueue");
> >              bio_taskqueue(request->bio, (bio_task_t *)ata_completed, request);
> >
>
> It'll be a few hours before I will know on the production machine - the RAID
> array has to rebuild before I can trigger the problem, and we're scheduled
> for some power work here in an hour or so - which I suspect will get in the
> way.
>
> What do you expect the patch to do, given that removing the delta appears to
> fix the instability problem?

This patch appears to be "safe".

I have about 2 hours on the production machine right now post-rebuild (which
had to complete first) with the added "callout_drain" in, have taken two DMA
WRITE retries, and have not yet seen any evidence of destabilization.

This is good evidence but not proof - before I took out the original line the
FIRST write retry would immediately cause the system to become unstable.

--
--
Karl Denninger (karl@denninger.net) Internet Consultant & Kids Rights Activist
http://www.denninger.net     My home on the net - links to everything I do!
http://scubaforum.org        Your UNCENSORED place to talk about DIVING!
http://www.spamcuda.net      SPAM FREE mailboxes - FREE FOR A LIMITED TIME!
http://genesis3.blogspot.com Musings Of A Sentient Mind