From: ian j hart
To: Sean Chittenden
cc: freebsd-stable@freebsd.org
Date: Fri, 25 Apr 2003 23:03:21 +0100
Subject: Re: ATA tag queuing broken...

On Friday 25 April 2003 6:36 pm, Sean Chittenden wrote:
> > > > > Alright, well it's apparently no surprise to folks that ATA tag
> > > > > queuing is broken at the moment.
> > > > > Are there any objections to me
> > > > > adding a few cautious words to ata(4) and tuning(7) that advise
> > > > > _against_ the use of ata tag queuing given that they're likely the
> > > > > fastest way to reboot a -STABLE box?
> > > > >
> > > > > Here's a PR that I tacked a tad bit of info into:
> > > > >
> > > > > http://www.FreeBSD.org/cgi/query-pr.cgi?pr=kern/42563
> > > >
> > > > That's news to me, works just fine here (4.8-R).
> > >
> > > That's what my box is as well. See the bottom of the PR for details,
> > > but an egrep -r via NFS reboots the box consistently as well as a
> > > local CVSup + nice +20 buildworld.
> >
> > Does it die during the cvs or the buildworld? Buildworld is not very disk
> > intensive. If you nice +20, even more so.
>
> Buildworld + mild disk load as an NFS server and it does okay. CVSup
> with mild disk load is also okay. But if you toss the three together,
> you're sure to get the box to panic.
>

Numbers would be better :) systat -vm is probably good enough.

If the box is really that flakey you should be able to panic it with some
other dummy load. I'd bet real money (as high as $5) that NFS panics the box
when the underlying disk *goes away* so I'd thrash the living daylights out
of the disks. I'd also bet that you knew this.

Anyway I'm rather partial to

# ls -R / > /dev/null

and different device combinations of

# dd of=/dev/null bs='63*512' if=/dev/ad0

Anyway I have KDE running, typing this email. The ls in one xterm, dd'ing one
of the raw disks in another xterm, and systat -vm in a third xterm and it's
solid as a rock.

FYI, with just one dd I get 1500tps 45MB/s 97% usage.

> > > > What do you mean by "at the moment"? That pr is six months old.
> > >
> > > Agreed, but since there's no voting for bugs in gnats, I figured I'd
> > > "me too" the PR with an updated time/date and slightly more info.
> > >
> > > > Did you check the list first?
> > > > I sent another "works for me" less
> > > > than a month ago. (Thread: Status of ATA tagging in Stable Kevin
> > > > Oberman 20030329)
> > >
> > > Yup. It "works" in the sense that under low load, the box works. As
> > > soon as I push it, however, it panics and resets.
> > >
> > > > I note that the pr originator also has the *known to be broken* DTLA
> > > > drives.
> > >
> > > Hrm, well, according to the man pages I've got the right stuff... or
> > > not, I don't remember the qualifications mentioned in tuning(7):
> > >
> > > atapci0: port 0xdc00-0xdc0f at device 17.1 on pci0
> > > ata0: at 0x1f0 irq 14 on atapci0
> > > ata1: at 0x170 irq 15 on atapci0
> > > ad0: 58644MB [119150/16/63] at ata0-master tagged UDMA100
> > > ad2: 58644MB [119150/16/63] at ata1-master tagged UDMA100
> >
> > I have very similar hardware. I should be able to reproduce any given
> > disk load. Perhaps we should take this off list and try a few things.
> >
> > Before I go, I should mention that I did have similar "tag" error
> > messages a few weeks ago. I also had a reproducible panic when starting
> > vinum from single user mode. This turned out to be one (or more) of the
> > following.
> >
> > o 1 bad RAM stick
> > o 1 marginal (on spec) RAM stick
> > o Aggressive BIOS settings
> > o Air filters clogged
> > o Unseasonably warm weather (+10F)
> > o Phase of the moon
>
> Of all of those, it could either be a bios setting or ram, but that's
> if that's a problem. The machine has been running for a year and a
> half and the panics have only been recently (last 6-9mo) or so.

Well that's not an exhaustive list, you'd want to add

o The disks
o The cables (esp length)
o The controller
o The PSU
o everything else

The point I was trying to make was that I didn't immediately post to the list
saying ATA tags are broken (or vinum for that matter). I devoted time, energy
and cash to narrowing down the problem, eliminating alternate causes etc.
From what you've posted the evidence is "anecdotal" and no-one else has come
forward to support it. IMHO that doesn't justify labeling ATA tags as broken.
My copy of man 7 tuning says that this is "new experimental". Isn't that
enough?

>
> > Find a display card and run memtest86 for an hour or so. Take a note
> > of the memory throughput (for BIOS tuning).
>
> Eh, not so wild about the prospect of the machine dumping given that
> I'm 700+mi away from this particular box, but next time I'm in the
> data center I will... but like I said, I don't think it's hardware.

Not sure what you mean here, re dumping.

So you'll have a serial console setup then? I've not tried it but memtest has
serial console support. Is there a floppy disk and someone on-site with two
brain cells?

>
> -sc

-- 
ian j hart

Quoth the raven, bite me!
Salem Saberhagen (Episode LXXXI: The Phantom Menace)