Date:      Fri, 25 Apr 2003 23:03:21 +0100
From:      ian j hart <ianjhart@ntlworld.com>
To:        Sean Chittenden <sean@chittenden.org>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: ATA tag queuing broken...
Message-ID:  <200304252303.21515.ianjhart@ntlworld.com>
In-Reply-To: <20030425173622.GI79923@perrin.int.nxad.com>
References:  <20030424054114.GY79923@perrin.int.nxad.com> <200304251743.06239.ianjhart@ntlworld.com> <20030425173622.GI79923@perrin.int.nxad.com>

On Friday 25 April 2003 6:36 pm, Sean Chittenden wrote:
> > > > > Alright, well it's apparently no surprise to folks that ATA tag
> > > > > queuing is broken at the moment.  Are there any objections to me
> > > > > adding a few cautious words to ata(4) and tuning(7) that advise
> > > > > _against_ the use of ata tag queuing given that they're likely the
> > > > > fastest way to reboot a -STABLE box?
> > > > >
> > > > > Here's a PR that I tacked a tad bit of info into:
> > > > >
> > > > > http://www.FreeBSD.org/cgi/query-pr.cgi?pr=kern/42563
> > > >
> > > > That's news to me, works just fine here (4.8-R).
> > >
> > > That's what my box is as well.  See the bottom of the PR for details,
> > > but an egrep -r via NFS reboots the box consistently as well as a
> > > local CVSup + nice +20 buildworld.
> >
> > Does it die during the cvs or the buildworld? Buildworld is not very disk
> > intensive. If you nice +20, even more so.
>
> Buildworld + mild disk load as an NFS server and it does okay.  CVSup
> with mild disk load is also okay.  But if you toss the three together,
> you're sure to get the box to panic.
>

Numbers would be better :) systat -vm is probably good enough.

If the box is really that flakey you should be able to panic it with some
other dummy load. I'd bet real money (as high as $5) that NFS panics the box
when the underlying disk *goes away* so I'd thrash the living daylights out
of the disks. I'd also bet that you knew this.

Anyway I'm rather partial to
#ls -R / > /dev/null

and different device combinations of

#dd of=/dev/null bs='63*512' if=/dev/ad0

Anyway I have KDE running, typing this email. The ls in one xterm, dd'ing one
of the raw disks in another xterm, and systat -vm in a third xterm and it's
solid as a rock.

FYI, with just one dd I get 1500tps 45MB/s 97% usage.
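
The combined load described above can be sketched as a single script. This
is only an illustration, not something from the original test: the DISKS
variable and the throughput helper are hypothetical names, and the script
does nothing unless DISKS is set to one or more raw devices (for example
/dev/ad0). The helper assumes each transfer is one full 63*512-byte block,
which may not match how systat counts transfers.

```shell
#!/bin/sh
# Sketch only: run the ls and dd read loads from this message in parallel.
# DISKS is an assumption; set it to the raw devices on your own box.
DISKS=${DISKS:-}

# Block size used above: 63 sectors of 512 bytes each.
BS=$((63 * 512))

# Rough bytes/s for a given transfers-per-second figure, assuming every
# transfer moves exactly one block of BS bytes.
throughput() {
    echo $(( $1 * BS ))
}

run_load() {
    # Walk the whole tree in the background (metadata-heavy load).
    ls -R / > /dev/null 2>&1 &
    pids=$!
    # One sequential raw read per named disk, all running at once.
    for d in $DISKS; do
        dd if="$d" of=/dev/null bs="$BS" &
        pids="$pids $!"
    done
    # Block until every background job has finished.
    wait $pids
}

# Generate load only when at least one disk has been named.
if [ -n "$DISKS" ]; then
    run_load
fi
```

Run it as root with something like DISKS="/dev/ad0 /dev/ad2" and watch
systat -vm in another terminal. Under the one-block-per-transfer assumption,
throughput 1500 gives 48384000 bytes/s, or about 46 MiB/s, which is roughly
in line with the 45MB/s figure quoted above.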

> > > > What do you mean by "at the moment"? That pr is six months old.
> > >
> > > Agreed, but since there's no voting for bugs in gnats, I figured I'd
> > > "me too" the PR with an updated time/date and slightly more info.
> > >
> > > > Did you check the list first? I sent another "works for me" less
> > > > than a month ago. (Thread: Status of ATA tagging in Stable Kevin
> > > > Oberman 20030329)
> > >
> > > Yup.  It "works" in the sense that under low load, the box works.  As
> > > soon as I push it, however, it panics and resets.
> > >
> > > > I note that the pr originator also has the *known to be broken* DTLA
> > > > drives.
> > >
> > > Hrm, well, according to the man pages I've got the right stuff... or
> > > not, I don't remember the qualifications mentioned in tuning(7):
> > >
> > > atapci0: <VIA 8233 ATA100 controller> port 0xdc00-0xdc0f at device 17.1 on pci0
> > > ata0: at 0x1f0 irq 14 on atapci0
> > > ata1: at 0x170 irq 15 on atapci0
> > > ad0: 58644MB <IC35L060AVER07-0> [119150/16/63] at ata0-master tagged UDMA100
> > > ad2: 58644MB <IC35L060AVER07-0> [119150/16/63] at ata1-master tagged UDMA100
> >
> > I have very similar hardware. I should be able to reproduce any given
> > disk load. Perhaps we should take this off list and try a few things.
> >
> > Before I go, I should mention that I did have similar "tag" error
> > messages a few weeks ago. I also had a reproducible panic when starting
> > vinum from single user mode. This turned out to be one (or more) of the
> > following.
> >
> > o	1 bad RAM stick
> > o	1 marginal (on spec) RAM stick
> > o	Aggressive BIOS settings
> > o	Air filters clogged
> > o	Unseasonably warm weather (+10F)
> > o	Phase of the moon
>
> Of all of those, it could either be a bios setting or ram, but that's
> if that's a problem.  The machine has been running for a year and a
> half and the panics have only been recently (last 6-9mo) or so.

Well that's not an exhaustive list, you'd want to add

o	The disks
o	The cables (esp length)
o	The controller
o	The PSU
o	everything else

The point I was trying to make was that I didn't immediately post to the list
saying ATA tags are broken (or vinum for that matter). I devoted time, energy
and cash to narrowing down the problem, eliminating alternate causes etc.

From what you've posted the evidence is "anecdotal" and no-one else has come
forward to support it. IMHO that doesn't justify labeling ATA tags as broken.
My copy of man 7 tuning says that this is "new experimental". Isn't that
enough?

>
> > Find a display card and run memtest86 for an hour or so. Take a note
> > of the memory throughput (for BIOS tuning).
>
> Eh, not so wild about the prospect of the machine dumping given that
> I'm 700+mi away from this particular box, but next time I'm in the
> data center I will... but like I said, I don't think it's hardware.

Not sure what you mean here, re dumping.

So you'll have a serial console setup then? I've not tried it but memtest has
serial console support. Is there a floppy disk and someone on-site with two
brain cells?

>
> -sc

-- 
ian j hart

Quoth the raven, bite me!
	Salem Saberhagen (Episode LXXXI: The Phantom Menace)


