Date: Mon, 9 Dec 2002 00:31:38 +0100
From: Francesco Casadei
To: freebsd-questions@freebsd.org
Subject: Re: ATA errors
Message-ID: <20021208233138.GA2252@goku.kasby>
References: <005401c29d44$b0e24130$c00c460a@pro.tl.thomcorp.net> <3DF0DF91.1050002@pantherdragon.org> <20021207073513.GB34099@dru.dn.ua>
In-Reply-To: <20021207073513.GB34099@dru.dn.ua>
User-Agent: Mutt/1.4i
X-Operating-System: FreeBSD 4.7-STABLE i386

On Sat, Dec 07, 2002 at 09:35:13AM +0200, Vladislav V. Zhuk wrote:
[snip]
>
> I don't think like you.
> I check my hardware and I consider that problem in new ATA driver.
> Under FreeBSD 4.1.1 my hardware work excellent.
> After 4.5 release I get more troubles with IDE devices.
> Some bugs was fixed and now (under 4.7s) I have no problem
> with IDE HDD (even softupdates work).
>
> After reboot my system work excellent 2-5 days, than I get
> "read timeout" problem with my CDROM and all system hang.
>
> I wrote about that troubles with ATA, but not get answer...
>
> Who have problem with ATA driver - write here about this
> and show /var/run/dmesg. Maybe we discover some dependences
> where trouble appeared....
>
> --
> Vladislav V. Zhuk (06267)3-60-03 admin@dru.dn.ua 2:465/197@FidoNet.org
>
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-questions" in the body of the message
>
> end of the original message

I had a lot of problems with tagged queuing enabled on IBM drives.

I have a server with a Promise FastTrak TX2 ATA RAID controller and 2 IBM 40G
drives attached to it. I have another IBM drive (identical to the other two)
attached to the mainboard's ATA controller.

# atacontrol list
ATA channel 0:
    Master:  ad0 ATA/ATAPI rev 5
    Slave:       no device present
ATA channel 1:
    Master: acd0 ATA/ATAPI rev 0
    Slave:       no device present
ATA channel 2:
    Master:  ad4 ATA/ATAPI rev 5
    Slave:       no device present
ATA channel 3:
    Master:  ad6 ATA/ATAPI rev 5
    Slave:       no device present

The filesystem layout is:

# mount
/dev/ar0s1a on / (ufs, local, soft-updates)
/dev/ar0s1f on /usr (ufs, local, soft-updates)
/dev/ar0s1d on /var (ufs, local, noatime, soft-updates)
/dev/ar0s1e on /var/tmp (ufs, local, soft-updates)
/dev/ar0s1g on /db (ufs, local, soft-updates)
/dev/ar0s1h on /home (ufs, local, noatime, soft-updates)
/dev/ad0s1a on /backup (ufs, local, soft-updates)
procfs on /proc (procfs, local)

The hw.ata sysctl tunables are set as follows:

# sysctl -a | grep 'hw\.ata'
hw.ata.ata_dma: 1
hw.ata.wc: 1
hw.ata.tags: 1
hw.ata.atapi_dma: 0

The server ran without problems from October 2001 until the summer of 2002,
when an MFC broke tagged queuing support.
I had to set hw.ata.tags to 0 to avoid kernel panics and keep the system up
and running. Finally, TQ support was (apparently) fixed and I re-enabled it.
The system ran fine only for a short time, though, because the drive on the
second channel of the Promise controller began to fall back to PIO mode.

I don't think it's a hardware problem, because I rebooted the system from the
live-system CD of the FreeBSD distribution set and ran dd on the faulty drive:
no error was reported.

I rebuilt the array using the Promise utility and rebooted the system, which
ran in UDMA100 mode for a couple of weeks. Then the problem appeared again:

Dec 4 05:40:02 zeus /kernel: ad6: SERVICE timeout tag=0 s=51 e=04
Dec 4 05:40:02 zeus /kernel: ad6: invalidating queued requests
Dec 4 05:40:02 zeus /kernel: ad6: no request for tag=0
Dec 4 05:40:02 zeus /kernel: ad6: invalidating queued requests
Dec 4 05:40:12 zeus /kernel: ad6: READ command timeout tag=0 serv=1 - resetting
Dec 4 05:40:22 zeus /kernel: ad6: invalidating queued requests
Dec 4 05:40:22 zeus /kernel: ata3: resetting devices .. ad6: invalidating queued requests
Dec 4 05:40:22 zeus /kernel: done
Dec 4 05:40:22 zeus /kernel: ad6: READ command timeout tag=0 serv=1 - resetting
Dec 4 05:40:22 zeus /kernel: ad6: invalidating queued requests
Dec 4 05:40:22 zeus /kernel: ata3: resetting devices .. ad6: invalidating queued requests
Dec 4 05:40:22 zeus /kernel: done
Dec 4 05:40:22 zeus /kernel: ad6: no request for tag=0
Dec 4 05:40:22 zeus /kernel: ad6: invalidating queued requests
Dec 4 05:40:32 zeus /kernel: ad6: READ command timeout tag=0 serv=1 - resetting
Dec 4 05:40:52 zeus /kernel: ad6: invalidating queued requests
Dec 4 05:40:52 zeus /kernel: ata3: resetting devices ..
ad6: invalidating queued requests
Dec 4 05:40:52 zeus /kernel: done
Dec 4 05:40:52 zeus /kernel: ad6: timeout waiting for READY
Dec 4 05:40:52 zeus /kernel: ad6: invalidating queued requests
Dec 4 05:40:52 zeus /kernel: ad6: timeout sending command=00 s=d0 e=04
Dec 4 05:40:52 zeus /kernel: ad6: flush queue failed
Dec 4 05:40:52 zeus /kernel: - resetting
Dec 4 05:40:52 zeus /kernel: ata3: resetting devices .. ad6: invalidating queued requests
Dec 4 05:40:52 zeus /kernel: done
Dec 4 05:40:52 zeus /kernel: ad6: READ command timeout tag=0 serv=1 - resetting
Dec 4 05:40:52 zeus /kernel: ad6: invalidating queued requests
Dec 4 05:40:52 zeus /kernel: ad6: trying fallback to PIO mode
Dec 4 05:40:52 zeus /kernel: ata3: resetting devices .. ad6: invalidating queued requests
Dec 4 05:40:52 zeus /kernel: done
Dec 4 05:40:52 zeus /kernel: ad6: WRITE command timeout tag=0 serv=0 - resetting
Dec 4 05:40:52 zeus /kernel: ad6: invalidating queued requests
Dec 4 05:40:52 zeus /kernel: ata3: resetting devices .. ad6: invalidating queued requests
Dec 4 05:40:52 zeus /kernel: done

(Only the most recent error report is shown.)

# atacontrol mode 3
Master = PIO4
Slave  = ???
# atacontrol mode 3 udma100 xxx
Master = UDMA100
Slave  = ???
# atacontrol mode 3
Master = UDMA100
Slave  = ???

If I run an I/O-intensive program, the system falls back to PIO mode 4:

# find /usr/ports/ -name nonexistent
# atacontrol mode 3
Master = PIO4
Slave  = ???

If I reboot the system, the Promise utility tells me that the array has a
critical status. If I rebuild the array and reboot the system, then everything
is fine for another 1-4 weeks before the problem appears again!

Note that the problem always appears before the backup activity completes.
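
For anyone comparing similar reports, here is a minimal sketch of how messages
like the ones above could be tallied per device. It assumes the syslog line
layout seen in this report ("... /kernel: adN: message"); the function name
summarize_ata_errors is hypothetical, and the sample lines are copied from the
log excerpt above rather than read from a real log file.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpt in this report, e.g.
# "Dec 4 05:40:52 zeus /kernel: ad6: trying fallback to PIO mode"
LINE_RE = re.compile(r"/kernel: (?P<dev>ad\d+): (?P<msg>.*)$")


def summarize_ata_errors(lines):
    """Count timeout messages per disk and collect disks that fell back to PIO."""
    timeouts = Counter()
    pio_fallback = set()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            # Not a per-disk message (e.g. "ata3: resetting devices .." or "done").
            continue
        dev, msg = match.group("dev"), match.group("msg")
        if "timeout" in msg:
            timeouts[dev] += 1
        if "fallback to PIO" in msg:
            pio_fallback.add(dev)
    return timeouts, pio_fallback


# Sample lines copied from the log excerpt in this message.
sample = [
    "Dec 4 05:40:02 zeus /kernel: ad6: SERVICE timeout tag=0 s=51 e=04",
    "Dec 4 05:40:12 zeus /kernel: ad6: READ command timeout tag=0 serv=1 - resetting",
    "Dec 4 05:40:22 zeus /kernel: ata3: resetting devices .. ad6: invalidating queued requests",
    "Dec 4 05:40:52 zeus /kernel: ad6: trying fallback to PIO mode",
]
timeouts, pio_fallback = summarize_ata_errors(sample)
print(dict(timeouts), sorted(pio_fallback))
```

Running the same function over a whole log file (for example by passing an
open file object) would show whether timeouts cluster on one disk or are
spread across channels.
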
From the daily run output before the drive failure:

Last dump(s) done (Dump '>' file systems):
> /dev/ar0s1a (       /) Last dump: Level 0, Date Tue Dec 3 05:30
  /dev/ar0s1d (    /var) Last dump: Level 0, Date Tue Dec 3 05:30
  /dev/ar0s1e (/var/tmp) Last dump: Level 0, Date Tue Dec 3 05:30
  /dev/ar0s1f (    /usr) Last dump: Level 0, Date Tue Dec 3 05:30
  /dev/ar0s1g (     /db) Last dump: Level 0, Date Tue Dec 3 05:40
  /dev/ar0s1h (   /home) Last dump: Level 0, Date Tue Dec 3 05:39

On Dec 4 at 05:40:02 the timeout problem appears:

Last dump(s) done (Dump '>' file systems):
> /dev/ar0s1a (       /) Last dump: Level 0, Date Wed Dec 4 05:30
  /dev/ar0s1d (    /var) Last dump: Level 0, Date Wed Dec 4 05:30
  /dev/ar0s1e (/var/tmp) Last dump: Level 0, Date Wed Dec 4 05:30
  /dev/ar0s1f (    /usr) Last dump: Level 0, Date Wed Dec 4 05:30
  /dev/ar0s1g (     /db) Last dump: Level 0, Date Wed Dec 4 05:41
  /dev/ar0s1h (   /home) Last dump: Level 0, Date Wed Dec 4 05:39

Note that the backup of /dev/ar0s1g took 1 minute longer than usual (with
exactly the same load, not shown). After the problem appeared:

Last dump(s) done (Dump '>' file systems):
> /dev/ar0s1a (       /) Last dump: Level 0, Date Thu Dec 5 05:30
  /dev/ar0s1d (    /var) Last dump: Level 0, Date Thu Dec 5 05:30
  /dev/ar0s1e (/var/tmp) Last dump: Level 0, Date Thu Dec 5 05:30
  /dev/ar0s1f (    /usr) Last dump: Level 0, Date Thu Dec 5 05:30
  /dev/ar0s1g (     /db) Last dump: Level 0, Date Thu Dec 5 05:47
  /dev/ar0s1h (   /home) Last dump: Level 0, Date Thu Dec 5 05:44

Obviously the system is slower, but it works.

I'm tired of rebooting and rebuilding the array each time. Can anybody help me
solve this problem?

    Francesco Casadei

P.S. Sorry for the long post, but I'm sure the information I gave you will
help you diagnose the problem.

--
You can download my public key from http://digilander.libero.it/fcasadei/
or retrieve it from a keyserver (pgpkeys.mit.edu, wwwkeys.pgp.net, ...)
Key fingerprint is: 1671 9A23 ACB4 520A E7EE 00B0 7EC3 375F 164E B17B