Date:      Wed, 30 Mar 2005 18:24:53 -0600
From:      Karl Denninger <karl@denninger.net>
To:        "Matthew N. Dodd" <mdodd@freebsd.org>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: DANGER WILL ROBINSON! SERIOUS problem with current 5.4-PRERELEASE
Message-ID:  <20050330182453.A44361@denninger.net>
In-Reply-To: <20050330000611.A29180@denninger.net>; from Karl Denninger on Wed, Mar 30, 2005 at 12:06:11AM -0600
References:  <20050329200841.A772@denninger.net> <20050329233843.L328@sasami.jurai.net> <20050329234318.A3883@denninger.net> <20050330004740.U328@sasami.jurai.net> <20050330000611.A29180@denninger.net>

On Wed, Mar 30, 2005 at 12:06:11AM -0600, Karl Denninger wrote:
> On Wed, Mar 30, 2005 at 12:50:31AM -0500, Matthew N. Dodd wrote:
> > On Tue, 29 Mar 2005, Karl Denninger wrote:
> > > 245a252
> > >>           request->donecount = 0;
> > >
> > > Without the last delta the requeue doesn't happen at all.
> > 
> > So you're saying that this change:
> > 
> >  	1.42: When resubmitting a timed out request, reset donecount.
> > 
> > produces the problem?
> 
> I believe so, yes, if my recollection from earlier in the month is correct.
> 
> Without it, the problem doesn't exist, but the requeueing doesn't happen
> either (well technically according to the code it does, but it doesn't do
> anything) - so whether that's the problem or whether it simply MASKS the
> problem I can't say without further investigation.
> 
> I am loading up my sandbox machine with an exact copy now, and expect to
> know more sometime tomorrow - I will have to go through a full
> buildworld/installworld/buildkernel/installkernel to bring the sandbox up 
> to date and then stuff a SATA disk and adapter in there to re-create the
> environment closely enough to be sure that I'm looking at the same issue.
> 
> More as soon as I know with certainty.

It appears that the change was backed out of the CVS tree late last night.

I've reproduced the original problem on my sandbox, and am now testing
removing the patch lines one at a time.  Hopefully I can isolate exactly
which line causes trouble.

BTW, it appears that the original problem (DMA write errors) ONLY occurs if
you have at least two SATA devices - at least in my system - and at least
one of them is on the SI chipset.

A single UDMA100 PATA disk and a single SATA150 disk DO NOT trigger retries,
no matter how high the load.  Had to diddle with things to get it to go
"bang" so I can isolate....

Hopefully more this evening - the first round (removing the "!" change) is
being run now.

-- 
Karl Denninger (karl@denninger.net) Internet Consultant & Kids Rights Activist
http://www.denninger.net	My home on the net - links to everything I do!
http://scubaforum.org		Your UNCENSORED place to talk about DIVING!
http://www.spamcuda.net		SPAM FREE mailboxes - FREE FOR A LIMITED TIME!
http://genesis3.blogspot.com	Musings Of A Sentient Mind



