Date:      Tue, 5 Apr 2005 20:25:04 -0500
From:      Karl Denninger <karl@denninger.net>
To:        "Matthew N. Dodd" <mdodd@freebsd.org>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: DANGER WILL ROBINSON! SERIOUS problem with current 5.4-PRERELEASE - FURTHER UPDATE
Message-ID:  <20050405202504.A6216@denninger.net>
In-Reply-To: <20050331110608.A81295@denninger.net>; from Karl Denninger on Thu, Mar 31, 2005 at 11:06:08AM -0600
References:  <20050329200841.A772@denninger.net> <20050329233843.L328@sasami.jurai.net> <20050329230830.A3222@denninger.net> <20050329234318.A3883@denninger.net> <20050330210830.A46956@denninger.net> <20050330230046.A68235@denninger.net> <20050331120100.P328@sasami.jurai.net> <20050331110608.A81295@denninger.net>

On Thu, Mar 31, 2005 at 11:06:08AM -0600, Karl Denninger wrote:
> On Thu, Mar 31, 2005 at 12:02:20PM -0500, Matthew N. Dodd wrote:
> > On Wed, 30 Mar 2005, Karl Denninger wrote:
> > > Removing the FIRST delta, which is:
> > >
> > > 218a219,221
> > >       if (!dumping)
> > >           callout_reset(&request->callout, request->timeout * hz,
> > >                         (timeout_t*)ata_timeout, request);
> > >
> > > appears to get rid of the crashes while not harming data integrity OR the
> > > requeueing.
> > 
> > I'd be interested to know if the attached patch does anything.
> > 
> > -- 
> > 10 40 80 C0 00 FF FF FF FF C0 00 00 00 00 10 AA AA 03 00 00 00 08 00
> > Index: ata-queue.c
> > ===================================================================
> > RCS file: /home/ncvs/src/sys/dev/ata/ata-queue.c,v
> > retrieving revision 1.32.2.6
> > diff -u -u -r1.32.2.6 ata-queue.c
> > --- ata-queue.c	23 Mar 2005 04:50:26 -0000	1.32.2.6
> > +++ ata-queue.c	31 Mar 2005 17:00:46 -0000
> > @@ -217,8 +217,7 @@
> >      }
> >      else {
> >  	if (!dumping)
> > -	    callout_reset(&request->callout, request->timeout * hz,
> > -			  (timeout_t*)ata_timeout, request);
> > +            callout_drain(&request->callout);
> >  	if (request->bio && !(request->flags & ATA_R_TIMEOUT)) {
> >  	    ATA_DEBUG_RQ(request, "finish bio_taskqueue");
> >  	    bio_taskqueue(request->bio, (bio_task_t *)ata_completed, request);
> > 
> 
> It'll be a few hours before I will know on the production machine - the RAID
> array has to rebuild before I can trigger the problem, and we're scheduled
> for some power work here in an hour or so - which I suspect will get in the
> way.
> 
> What do you expect the patch to do, given that removing the delta appears to
> fix the instability problem?

This patch appears to be "safe".

The production machine has now run for about 2 hours since the rebuild
(which had to complete first) with the added "callout_drain" in place. It
has taken two DMA WRITE retries and has not yet shown any evidence of
destabilization.

This is good evidence but not proof. Before I took out the original line,
the FIRST write retry would immediately cause the system to become unstable.

-- 
Karl Denninger (karl@denninger.net) Internet Consultant & Kids Rights Activist
http://www.denninger.net	My home on the net - links to everything I do!
http://scubaforum.org		Your UNCENSORED place to talk about DIVING!
http://www.spamcuda.net		SPAM FREE mailboxes - FREE FOR A LIMITED TIME!
http://genesis3.blogspot.com	Musings Of A Sentient Mind
