Date: Fri, 25 Apr 2003 23:03:21 +0100
From: ian j hart <ianjhart@ntlworld.com>
To: Sean Chittenden <sean@chittenden.org>
Cc: freebsd-stable@freebsd.org
Subject: Re: ATA tag queuing broken...
Message-ID: <200304252303.21515.ianjhart@ntlworld.com>
In-Reply-To: <20030425173622.GI79923@perrin.int.nxad.com>
References: <20030424054114.GY79923@perrin.int.nxad.com> <200304251743.06239.ianjhart@ntlworld.com> <20030425173622.GI79923@perrin.int.nxad.com>

On Friday 25 April 2003 6:36 pm, Sean Chittenden wrote:
> > > > > Alright, well it's apparently no surprise to folks that ATA tag
> > > > > queuing is broken at the moment. Are there any objections to me
> > > > > adding a few cautious words to ata(4) and tuning(7) that advise
> > > > > _against_ the use of ata tag queuing given that they're likely the
> > > > > fastest way to reboot a -STABLE box?
> > > > >
> > > > > Here's a PR that I tacked a tad bit of info into:
> > > > >
> > > > > http://www.FreeBSD.org/cgi/query-pr.cgi?pr=kern/42563
> > > >
> > > > That's news to me, works just fine here (4.8-R).
> > >
> > > That's what my box is as well. See the bottom of the PR for details,
> > > but an egrep -r via NFS reboots the box consistently as well as a
> > > local CVSup + nice +20 buildworld.
> >
> > Does it die during the cvs or the buildworld? Buildworld is not very disk
> > intensive. If you nice +20, even more so.
>
> Buildworld + mild disk load as an NFS server and it does okay. CVSup
> with mild disk load is also okay. But if you toss the three together,
> you're sure to get the box to panic.
>

Numbers would be better :) systat -vm is probably good enough.

If the box is really that flakey you should be able to panic it with some
other dummy load. I'd bet real money (as high as $5) that NFS panics the box
when the underlying disk *goes away* so I'd thrash the living daylights out
of the disks. I'd also bet that you knew this.

Anyway I'm rather partial to

#ls -R / > /dev/null

and different device combinations of

#dd of=/dev/null bs='63*512' if=/dev/ad0

Anyway I have KDE running, typing this email. The ls in one xterm, dd'ing one
of the raw disks in another xterm, and systat -vm in a third xterm and it's
solid as a rock.

FYI, with just one dd I get 1500tps 45MB/s 97% usage.

> > > > What do you mean by "at the moment"? That pr is six months old.
> > >
> > > Agreed, but since there's no voting for bugs in gnats, I figured I'd
> > > "me too" the PR with an updated time/date and slightly more info.
> > >
> > > > Did you check the list first? I sent another "works for me" less
> > > > than a month ago. (Thread: Status of ATA tagging in Stable Kevin
> > > > Oberman 20030329)
> > >
> > > Yup. It "works" in the sense that under low load, the box works. As
> > > soon as I push it, however, it panics and resets.
> > >
> > > > I note that the pr originator also has the *known to be broken* DTLA
> > > > drives.
> > >
> > > Hrm, well, according to the man pages I've got the right stuff... or
> > > not, I don't remember the qualifications mentioned in tuning(7):
> > >
> > > atapci0: <VIA 8233 ATA100 controller> port 0xdc00-0xdc0f at device 17.1 on pci0
> > > ata0: at 0x1f0 irq 14 on atapci0
> > > ata1: at 0x170 irq 15 on atapci0
> > > ad0: 58644MB <IC35L060AVER07-0> [119150/16/63] at ata0-master tagged UDMA100
> > > ad2: 58644MB <IC35L060AVER07-0> [119150/16/63] at ata1-master tagged UDMA100
> >
> > I have very similar hardware. I should be able to reproduce any given
> > disk load. Perhaps we should take this off list and try a few things.
> >
> > Before I go, I should mention that I did have similar "tag" error
> > messages a few weeks ago. I also had a reproducible panic when starting
> > vinum from single user mode. This turned out to be one (or more) of the
> > following.
> >
> > o 1 bad RAM stick
> > o 1 marginal (on spec) RAM stick
> > o Aggressive BIOS settings
> > o Air filters clogged
> > o Unseasonably warm weather (+10F)
> > o Phase of the moon
>
> Of all of those, it could either be a bios setting or ram, but that's
> if that's a problem. The machine has been running for a year and a
> half and the panics have only been recently (last 6-9mo) or so.

Well that's not an exhaustive list, you'd want to add

o The disks
o The cables (esp length)
o The controller
o The PSU
o everything else

The point I was trying to make was that I didn't immediately post to the list
saying ATA tags are broken (or vinum for that matter). I devoted time, energy
and cash to narrowing down the problem, eliminating alternate causes etc.

From what you've posted the evidence is "anecdotal" and no-one else has come
forward to support it. IMHO that doesn't justify labeling ATA tags as broken.
My copy of man 7 tuning says that this is "new experimental". Isn't that
enough?

>
> > Find a display card and run memtest86 for an hour or so. Take a note
> > of the memory throughput (for BIOS tuning).
>
> Eh, not so wild about the prospect of the machine dumping given that
> I'm 700+mi away from this particular box, but next time I'm in the
> data center I will... but like I said, I don't think it's hardware.

Not sure what you mean here, re dumping.

So you'll have a serial console setup then? I've not tried it but memtest has
serial console support. Is there a floppy disk and someone on-site with two
brain cells?

>
> -sc

-- 
ian j hart

Quoth the raven, bite me!
Salem Saberhagen (Episode LXXXI: The Phantom Menace)