From owner-freebsd-stable Tue Sep 19 2:26:45 2000
Delivered-To: freebsd-stable@freebsd.org
Received: from atrn.bpa.nu (CPE-144-132-209-248.nsw.bigpond.net.au [144.132.209.248])
        by hub.freebsd.org (Postfix) with ESMTP id 665DC37B423;
        Tue, 19 Sep 2000 02:26:35 -0700 (PDT)
Received: from juju.bsn (juju.bsn [192.168.1.5])
        by atrn.bpa.nu (8.9.3/8.9.3) with ESMTP id UAA85708;
        Tue, 19 Sep 2000 20:27:16 +1000 (EST)
        (envelope-from andy@ska.bsn)
Received: (from andy@localhost)
        by juju.bsn (8.9.3/8.9.3) id UAA03568;
        Tue, 19 Sep 2000 20:26:24 +1100 (EST)
        (envelope-from andy)
Message-Id: <200009190926.UAA03568@juju.bsn>
Date: Tue, 19 Sep 2000 20:26:24 +1100 (EST)
From: Andy Newman
Reply-To: atrn@zeta.org.au
Subject: Re: MFC of ahc driver updates (long-ish)
To: stable@FreeBSD.org
Cc: "Brandon D. Valentine" , gibbs@FreeBSD.org
In-Reply-To: <200009162021.OAA02336@pluto.plutotech.com>
MIME-Version: 1.0
Content-Type: TEXT/plain; CHARSET=US-ASCII
Sender: owner-freebsd-stable@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Justin Gibbs wrote:
> I'd be more than happy to make patches available relative to -stable
> (sys/dev/aic7xxx/... can simply be copied to a 4.X system and it should
> work with an added register define in sys/pci/pcireg.h and some minor
> changes to sys/conf/files), but I can only sanction the merge to stable
> if adequate testing occurs.

Count me in for testing then. I currently have a very unstable system
with a 29160. I'm not exactly sure if it's the 29160 or vinum that is
causing the problems. There's a new controller coming, but while the
29160 is there it may as well be used for some good.

System details are:

    Gigabyte 6vxe+ VIA chipset m/b, PIII 600E, 512MB, 29160,
    Seagate 9GB Cheetah + 4x 72GB Cheetahs
    FreeBSD 4.1-STABLE (several versions from release up to today)
    vinum RAID 5 over the 4x 72GB drives, system on the 9GB drive

Initial install hit the firmware problems in the Cheetahs. Quick (ha!)
Windows install on a spare IDE drive and a call to Seagate support (very
helpful) fixes that (the Windows install takes the longest time of course,
and the Seagate s/w crashes after upgrading firmware on a new 72GB disk
drive, which means I have to buy a new pair of underpants :) It actually
upgraded the firmware okay, just crashed in the process. Did the same
for all 72GB drives. Throw away IDE disk. Reboot.

The latest firmware appears to have cured any troubles with the 9GB
Cheetah (ST39204LW). The vinum'd ST173404LWs, however, panic when under
high random I/O load. Bulk sequential I/O seems fine: vinum can init the
array, dd can fill it with zeros, benchmarks run. But a
"find . >/dev/null" from the root of the RAID 5 file system will surely
panic it (gee, I get 2am reboots for nothing :)

I've tried various file system configurations with some differences in
behavior. It appeared that soft updates or async mounts were quicker to
panic than a noasync mount (didn't try sync, didn't seem to be much fun
with GBs of I/O). My guess is that it's the probably higher I/O op rate
causing different patterns of buffer and control block usage, interrupt
activity, etc.; a more complex pattern gives more chance of a mess-up.
I'm also suspicious of the buffer corruption problem Greg Lehey still
mentions on his vinum known bugs page. Is that still around? Or is the
page stale?

I'm yet to get a good crash dump of the machine. I followed the
instructions in the handbook (config -g or a makeoptions DEBUG=-g,
dumpon & dumpdev in rc.conf; everything has enough space) but get no
dumps. Many of the panics I've caused remotely, which isn't much use
either. One I did observe (just today) was curious ... multiple panics
in succession followed by a total reset (and I really wanted to catch
that one, sigh).

Building a debug kernel helped, interestingly. It was damned difficult
to make the machine fall over with the debug kernel (It just turns on
symbols, doesn't it? No code gen differences, are there?
Or is there #ifdef debug code to modify timing sufficiently?) Multiple
concurrent operations on the array (a make -j4 buildworld's /usr/obj on
it, copying the array out to another machine, copying 1GB of stuff to it
at the same time and multiple finds all running continuously) and it
stayed up (compared to dying with a single find previously). Of course
it died within five minutes of me doing things to it remotely (which
explains this mail :).

I'll try building a system with the newer aic7xxx stuff and let you know
how things go. It's currently building world to the 9GB drive with a
noasync mount of the RAID array and a 2.5GB copy going to it via (async)
NFS.....(waits)... Copy worked okay (but the buildworld continues).

Later.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message