Date: Tue, 5 Apr 2005 20:25:04 -0500
From: Karl Denninger <karl@denninger.net>
To: "Matthew N. Dodd" <mdodd@freebsd.org>
Cc: freebsd-stable@freebsd.org
Subject: Re: DANGER WILL ROBINSON! SERIOUS problem with current 5.4-PRERELEASE - FURTHER UPDATE
Message-ID: <20050405202504.A6216@denninger.net>
In-Reply-To: <20050331110608.A81295@denninger.net>; from Karl Denninger on Thu, Mar 31, 2005 at 11:06:08AM -0600
References: <20050329200841.A772@denninger.net> <20050329233843.L328@sasami.jurai.net> <20050329230830.A3222@denninger.net> <20050329234318.A3883@denninger.net> <20050330210830.A46956@denninger.net> <20050330230046.A68235@denninger.net> <20050331120100.P328@sasami.jurai.net> <20050331110608.A81295@denninger.net>

On Thu, Mar 31, 2005 at 11:06:08AM -0600, Karl Denninger wrote:
> On Thu, Mar 31, 2005 at 12:02:20PM -0500, Matthew N. Dodd wrote:
> > On Wed, 30 Mar 2005, Karl Denninger wrote:
> > > Removing the FIRST delta, which is:
> > >
> > > 218a219,221
> > >     if (!dumping)
> > >         callout_reset(&request->callout, request->timeout * hz,
> > >                       (timeout_t*)ata_timeout, request);
> > >
> > > appears to get rid of the crashes while not harming data integrity OR the
> > > reqeueing.
> >
> > I'd be interested to know if the attached patch does anything.
> >
> > --
> > 10 40 80 C0 00 FF FF FF FF C0 00 00 00 00 10 AA AA 03 00 00 00 08 00
> > Index: ata-queue.c
> > ===================================================================
> > RCS file: /home/ncvs/src/sys/dev/ata/ata-queue.c,v
> > retrieving revision 1.32.2.6
> > diff -u -u -r1.32.2.6 ata-queue.c
> > --- ata-queue.c 23 Mar 2005 04:50:26 -0000 1.32.2.6
> > +++ ata-queue.c 31 Mar 2005 17:00:46 -0000
> > @@ -217,8 +217,7 @@
> >      }
> >      else {
> >      if (!dumping)
> > -        callout_reset(&request->callout, request->timeout * hz,
> > -                      (timeout_t*)ata_timeout, request);
> > +        callout_drain(&request->callout);
> >      if (request->bio && !(request->flags & ATA_R_TIMEOUT)) {
> >          ATA_DEBUG_RQ(request, "finish bio_taskqueue");
> >          bio_taskqueue(request->bio, (bio_task_t *)ata_completed, request);
> >
>
> It'll be a few hours before I will know on the production machine - the RAID
> array has to rebuild before I can trigger the problem, and we're scheduled
> for some power work here in an hour or so - which I suspect will get in the
> way.
>
> What do you expect the patch to do, given that removing the delta appears to
> fix the instability problem?

This patch appears to be "safe". I have about 2 hours on the production
machine right now post-rebuild (which had to complete first) with the added
"callout_drain" in, have taken two DMA WRITE retries, and have not yet seen
any evidence of destabilization.

This is good evidence but not proof - before I took out the original line
the FIRST write retry would immediately cause the system to become unstable.

--
-- 
Karl Denninger (karl@denninger.net) Internet Consultant & Kids Rights Activist
http://www.denninger.net        My home on the net - links to everything I do!
http://scubaforum.org           Your UNCENSORED place to talk about DIVING!
http://www.spamcuda.net         SPAM FREE mailboxes - FREE FOR A LIMITED TIME!
http://genesis3.blogspot.com    Musings Of A Sentient Mind