Date: Thu, 2 Jan 2003 17:38:12 +0100
From: Francesco Casadei <fcasadei@inwind.it>
To: Bruce Campbell <bruce@engmail.uwaterloo.ca>
Cc: freebsd-hardware@freebsd.org, freebsd-questions@freebsd.org
Subject: Re: ata "fallback to PIO mode" on dual processor AMD systems
Message-ID: <20030102163812.GA2350@goku.kasby>
In-Reply-To: <1041368236.3e1204ac45da5@www.nexusmail.uwaterloo.ca>
References: <1041368236.3e1204ac45da5@www.nexusmail.uwaterloo.ca>
On Tue, Dec 31, 2002 at 03:57:16PM -0500, Bruce Campbell wrote:
>
> I am seeing a problem with ata disks on 4 new systems, which
> I believe is either a bug in the ata driver, or a problem with
> the onboard IDE controller, or something else. Systems are as follows:
>
> Motherboard: ASUS A7M266-D
> CPUs       : 2 x 2000+ AMD MP
> Memory     : 2 x 512MB Crucial part: CT6472Y265
>
> Disks (all UDMA100):
>
>             Master         Slave
> System 1:   WDC WD400BB    WDC WD1000BB
> System 2:   WDC WD400BB    WDC WD1000BB
> System 3:   WDC WD400BB    WDC WD800BB
> System 4:   WDC WD400BB    Maxtor 98196H8
>
> Kernel : 4.7-RELEASE, custom kernel (compared to GENERIC):
>
> commented out:
>
> cpu I386_CPU
> cpu I486_CPU
>
> enabled
>
> options SMP       # Symmetric MultiProcessor Kernel
> options APIC_IO   # Symmetric (APIC) I/O
>
>
> I am running a test with "dbench" (/usr/ports/benchmarks/dbench)
> with a script which runs:
>
> dbench 1
> sleep for 5 minutes
> dbench 2
> sleep for 5 minutes
> dbench 3
> ...
>
> to simulate 1,2,3... clients.
>
> The following has happened on systems 2,3 and 4, after about 15 hours
> of running the test:
>
> Dec 30 23:26:59 ecserv13 /kernel: ad0: WRITE command timeout tag=0 serv=0 - resetting
> Dec 30 23:26:59 ecserv13 /kernel: ata0: resetting devices .. done
> Dec 30 23:26:59 ecserv13 /kernel: ad0: WRITE command timeout tag=0 serv=0 resetting
> Dec 30 23:27:00 ecserv13 /kernel: ata0: resetting devices .. done
> Dec 30 23:27:00 ecserv13 /kernel: ad0: WRITE command timeout tag=0 serv=0 resetting
> Dec 30 23:27:00 ecserv13 /kernel: ata0: resetting devices .. done
> Dec 30 23:27:00 ecserv13 /kernel: ad0: WRITE command timeout tag=0 serv=0 resetting
> Dec 30 23:27:00 ecserv13 /kernel: ad0: timeout waiting for cmd=ef s=d0 e=00
> Dec 30 23:27:00 ecserv13 /kernel: ad0: trying fallback to PIO mode
> Dec 30 23:27:00 ecserv13 /kernel: ata0: resetting devices .. done
>
> The test continues to run with the ata controller in PIO mode, with
> slower performance, and higher load average.
>
> Once the master drops to PIO, attempts to access the slave then cause
> it to drop to PIO.
>
> If I run:
>
> atacontrol mode 0 UDMA100 UDMA100
>
> attempts to access either drive result in a delay until the controller
> drops to PIO, and then operations resume. A soft reboot and things
> work in UDMA mode again. Also tried UDMA33 and UDMA66 with no change.
> I also tried "atacontrol reinit 0" with no help.
>
> Theories when I search the web for "fallback to PIO mode" include:
>
> - bad disks
> - something to do with thermal recalibration
>
> I don't believe the problems are bad disks, as the slave drops to PIO
> after the master does, and I can't get it back to UDMA, other than by
> soft reboot. Plus I see the problem on 6 of 8 disks.
>
> The problem is very repeatable.
>
> Can anyone offer any ideas, or suggest investigative steps? I have a system
> in PIO mode right now.
>
> Thanks,
>
> --
> Bruce Campbell
> Engineering Computing
> CPH-2374B
> University of Waterloo
> (519)888-4567 ext 5889

Same problem here, but with a slightly different configuration:

# atacontrol list
ATA channel 0:
    Master:  ad0 <IC35L040AVER07-0/ER4OA44A> ATA/ATAPI rev 5
    Slave:       no device present
ATA channel 1:
    Master: acd0 <LG CD-ROM CRD-8521B/1.03> ATA/ATAPI rev 0
    Slave:       no device present
ATA channel 2:
    Master:  ad4 <IC35L040AVER07-0/ER4OA44A> ATA/ATAPI rev 5
    Slave:       no device present
ATA channel 3:
    Master:  ad6 <IC35L040AVER07-0/ER4OA44A> ATA/ATAPI rev 5
    Slave:       no device present

ad4 and ad6 are attached to a Promise FastTrak 100 TX2 ATA RAID
controller.

# atacontrol mode 0
Master = UDMA100
Slave = ???
# atacontrol mode 1
Master = PIO4
Slave = ???
# atacontrol mode 2
Master = UDMA100
Slave = ???
# atacontrol mode 3
Master = PIO4
Slave = ???

ad6 falls back to PIO mode under heavy I/O activity, i.e. when the system
does a level 0 file system dump from the RAID 1 array (ad4, ad6) to the
backup disk ad0. Rebooting and rebuilding the array with the Promise BIOS
utility temporarily solve the problem. The system may be up and running
for 1-4 weeks doing a level 0 dump every morning at 5:30am, and then one
day ad6 falls back to PIO mode again (shortly before the fs dump
completes).

Do the hard drives you are using support ATA tagged queuing? And if so,
do you have TQ enabled?

Francesco Casadei

--
You can download my public key from http://digilander.libero.it/fcasadei/
or retrieve it from a keyserver (pgpkeys.mit.edu, wwwkeys.pgp.net, ...)
Key fingerprint is: 1671 9A23 ACB4 520A E7EE 00B0 7EC3 375F 164E B17B
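To see when the WRITE timeouts start and when each disk falls back to PIO, the kernel log can be scanned for the messages quoted in Bruce's report. The following is a minimal Python sketch, assuming the log lines look like the FreeBSD 4.7 messages quoted above and that they are written to /var/log/messages; `scan_ata_log` and `summarize` are hypothetical helpers, not part of any FreeBSD tool.

```python
import re
from collections import Counter

# Patterns modelled on the kernel messages quoted in this thread (assumed format).
TIMEOUT_RE = re.compile(r"/kernel: (ad\d+): (\w+) command timeout")
FALLBACK_RE = re.compile(r"/kernel: (ad\d+): trying fallback to PIO mode")


def scan_ata_log(lines):
    """Hypothetical helper: count command timeouts per disk and record the
    syslog timestamp of the first PIO fallback seen for each disk."""
    timeouts = Counter()
    first_fallback = {}
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            timeouts[m.group(1)] += 1
            continue
        m = FALLBACK_RE.search(line)
        if m and m.group(1) not in first_fallback:
            # The first 15 characters of a syslog line are "Mon dd hh:mm:ss".
            first_fallback[m.group(1)] = line[:15]
    return timeouts, first_fallback


def summarize(path="/var/log/messages"):
    """Hypothetical helper: one summary line per disk (log path is an assumption)."""
    with open(path) as log:
        counts, fallbacks = scan_ata_log(log)
    return [
        f"{disk}: {counts[disk]} command timeouts, "
        f"first PIO fallback: {fallbacks.get(disk, 'none')}"
        for disk in sorted(set(counts) | set(fallbacks))
    ]
```

Running `summarize()` on each of the test systems would show whether the fallbacks cluster at a similar point in the dbench run; the regexes deliberately ignore the `ata0: resetting devices` and `timeout waiting for cmd=...` lines, which are not per-command timeouts.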
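The check Francesco ran by hand, `atacontrol mode N` on each channel, can be scripted to flag any device reporting a PIO mode. The following is a rough Python sketch, assuming the output format shown in this thread (`Master = MODE` / `Slave = MODE`); the helper names are hypothetical, and `query_mode` only wraps the same `atacontrol mode N` command used above. Channel 1 holds the CD-ROM (acd0), which also reports PIO4, while only ad6 is reported as having fallen back, so the result still has to be read against the `atacontrol list` output.

```python
import re
import subprocess

# Matches lines such as "Master = UDMA100" in `atacontrol mode N` output
# (format assumed from the output quoted in this thread).
MODE_RE = re.compile(r"^\s*(Master|Slave)\s*=\s*(\S+)", re.MULTILINE)


def parse_mode_output(text):
    """Map 'Master'/'Slave' to the mode string atacontrol printed."""
    return dict(MODE_RE.findall(text))


def pio_devices(outputs):
    """outputs maps channel number -> `atacontrol mode` text.
    Returns (channel, position, mode) for every device reported in PIO."""
    hits = []
    for channel in sorted(outputs):
        for position, mode in parse_mode_output(outputs[channel]).items():
            if mode.startswith("PIO"):
                hits.append((channel, position, mode))
    return hits


def query_mode(channel):
    """Run `atacontrol mode <channel>` and return its stdout (FreeBSD only)."""
    result = subprocess.run(["atacontrol", "mode", str(channel)],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `pio_devices({ch: query_mode(ch) for ch in range(4)})` on Francesco's machine would be expected to report channels 1 and 3, going by the output above.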