Date: Fri, 24 Sep 2004 15:51:08 -0000
From: Francesco Casadei <fcasadei@inwind.it>
To: dwbear75@gmail.com
Cc: freebsd-hardware@FreeBSD.ORG
Subject: Re: ata "fallback to PIO mode" on dual processor AMD systems
Message-ID: <20030102163812.GA2350@goku.kasby>
In-Reply-To: <1041368236.3e1204ac45da5@www.nexusmail.uwaterloo.ca>
References: <1041368236.3e1204ac45da5@www.nexusmail.uwaterloo.ca>
On Tue, Dec 31, 2002 at 03:57:16PM -0500, Bruce Campbell wrote:
>
> I am seeing a problem with ata disks on 4 new systems, which
> I believe is either a bug in the ata driver, or a problem with
> the onboard IDE controller, or something else. Systems are as follows:
>
> Motherboard: ASUS A7M266-D
> CPUs : 2 x 2000+ AMD MP
> Memory : 2 x 512MB Crucial part: CT6472Y265
>
> Disks (all UDMA100):
>
>            Master        Slave
> System 1:  WDC WD400BB   WDC WD1000BB
> System 2:  WDC WD400BB   WDC WD1000BB
> System 3:  WDC WD400BB   WDC WD800BB
> System 4:  WDC WD400BB   Maxtor 98196H8
>
> Kernel : 4.7-RELEASE, custom kernel (compared to GENERIC):
>
> commented out:
>
> cpu I386_CPU
> cpu I486_CPU
>
> enabled
>
> options SMP # Symmetric MultiProcessor Kernel
> options APIC_IO # Symmetric (APIC) I/O
>
>
> I am running a test with "dbench" (/usr/ports/benchmarks/dbench)
> with a script which runs:
>=20
> dbench 1
> sleep for 5 minutes
> dbench 2
> sleep for 5 minutes
> dbench 3
> ...
>
> to simulate 1,2,3... clients.
>
> The following has happened on systems 2,3 and 4, after about 15 hours
> of running the test:
>
> Dec 30 23:26:59 ecserv13 /kernel: ad0: WRITE command timeout tag=0 serv=0 - resetting
> Dec 30 23:26:59 ecserv13 /kernel: ata0: resetting devices .. done
> Dec 30 23:26:59 ecserv13 /kernel: ad0: WRITE command timeout tag=0 serv=0 resetting
> Dec 30 23:27:00 ecserv13 /kernel: ata0: resetting devices .. done
> Dec 30 23:27:00 ecserv13 /kernel: ad0: WRITE command timeout tag=0 serv=0 resetting
> Dec 30 23:27:00 ecserv13 /kernel: ata0: resetting devices .. done
> Dec 30 23:27:00 ecserv13 /kernel: ad0: WRITE command timeout tag=0 serv=0 resetting
> Dec 30 23:27:00 ecserv13 /kernel: ad0: timeout waiting for cmd=ef s=d0 e=00
> Dec 30 23:27:00 ecserv13 /kernel: ad0: trying fallback to PIO mode
> Dec 30 23:27:00 ecserv13 /kernel: ata0: resetting devices .. done
>
> The test continues to run with the ata controller in PIO mode, with
> slower performance, and higher load average.
>
> Once the master drops to PIO, attempts to access the slave then cause
> it to drop to PIO.
>
> If I run:
>
> atacontrol mode 0 UDMA100 UDMA100
>
> attempts to access either drive result in a delay until the controller
> drops to PIO, and then operations resume. A soft reboot and things
> work in UDMA mode again. Also tried UDMA33 and UDMA66 with no change.
> I also tried "atacontrol reinit 0" with no help.
>
> Theories when I search the web for "fallback to PIO mode" include:
>
> - bad disks
> - something to do with thermal recalibration
>
> I don't believe the problems are bad disks, as the slave drops to PIO
> after the master does, and I can't get it back to UDMA, other than by
> soft reboot. Plus I see the problem on 6 of 8 disks.
>
> The problem is very repeatable.
>
> Can anyone offer any ideas, or suggest investigative steps? I have a system
> in PIO mode right now.
>
> Thanks,
>
> --
> Bruce Campbell
> Engineering Computing
> CPH-2374B
> University of Waterloo
> (519)888-4567 ext 5889
>
> ----------------------------------------
> This mail sent through www.mywaterloo.ca
>
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-questions" in the body of the message
>
> end of the original message
Same problem here, but slightly different configuration:
# atacontrol list
ATA channel 0:
Master: ad0 <IC35L040AVER07-0/ER4OA44A> ATA/ATAPI rev 5
Slave: no device present
ATA channel 1:
Master: acd0 <LG CD-ROM CRD-8521B/1.03> ATA/ATAPI rev 0
Slave: no device present
ATA channel 2:
Master: ad4 <IC35L040AVER07-0/ER4OA44A> ATA/ATAPI rev 5
Slave: no device present
ATA channel 3:
Master: ad6 <IC35L040AVER07-0/ER4OA44A> ATA/ATAPI rev 5
Slave: no device present
ad4 and ad6 are attached to a Promise FastTrak 100 TX2 ATA RAID controller.
# atacontrol mode 0
Master = UDMA100
Slave = ???
# atacontrol mode 1
Master = PIO4
Slave = ???
# atacontrol mode 2
Master = UDMA100
Slave = ???
# atacontrol mode 3
Master = PIO4
Slave = ???
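
For checking many channels at once, the `atacontrol mode` output shown above can be parsed mechanically. The following is only a minimal Python sketch: the helper names `parse_modes` and `in_pio` are hypothetical, and it assumes the output keeps the two-line "Master = ... / Slave = ..." form printed above.

```python
import re

# Matches lines such as "Master = UDMA100" or "Slave = ???", the form
# printed by `atacontrol mode <channel>` in the output above.
MODE_LINE = re.compile(r"^\s*(Master|Slave)\s*=\s*(\S+)\s*$")


def parse_modes(output):
    """Return a dict like {'Master': 'UDMA100', 'Slave': '???'}."""
    modes = {}
    for line in output.splitlines():
        m = MODE_LINE.match(line)
        if m:
            modes[m.group(1)] = m.group(2)
    return modes


def in_pio(mode):
    """True if the reported mode string names a PIO mode (e.g. 'PIO4')."""
    return mode.upper().startswith("PIO")


if __name__ == "__main__":
    # Sample text copied from the channel 3 output above.
    sample = "Master = PIO4\nSlave = ???\n"
    modes = parse_modes(sample)
    print(modes["Master"], in_pio(modes["Master"]))
```

Note that the CD-ROM on channel 1 also reports PIO4 above, so a PIO result on its own does not prove that a fallback happened; it is only meaningful for a disk that is expected to run in UDMA mode.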
ad6 falls back to PIO mode under heavy I/O activity, i.e. when the system does a
level 0 file system dump from the RAID 1 array (ad4, ad6) to the backup disk
ad0.
Rebooting and rebuilding the array with the Promise BIOS utility temporarily
solves the problem. The system may be up and running for 1-4 weeks, doing a
level 0 dump every morning at 5:30am, and then one day ad6 falls back
to PIO mode again (shortly before the dump completes).
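
Since the fallback can take weeks to show up, it may help to scan the kernel log for the message instead of waiting to notice slow I/O. This is a minimal Python sketch, assuming the log lines look like the "trying fallback to PIO mode" line quoted in the original message; the function name `fallback_events` is hypothetical, and reading the lines from a file such as /var/log/messages is an assumption left to the caller.

```python
import re

# Assumed line format, taken from the quoted log in the original message:
#   Dec 30 23:27:00 ecserv13 /kernel: ad0: trying fallback to PIO mode
FALLBACK = re.compile(r"/kernel: (ad\d+): trying fallback to PIO mode")


def fallback_events(lines):
    """Return (timestamp, device) pairs for each fallback message found."""
    events = []
    for line in lines:
        m = FALLBACK.search(line)
        if m:
            # A syslog timestamp such as "Dec 30 23:27:00" is 15 characters.
            events.append((line[:15], m.group(1)))
    return events


if __name__ == "__main__":
    sample = [
        "Dec 30 23:27:00 ecserv13 /kernel: ad0: timeout waiting for cmd=ef s=d0 e=00",
        "Dec 30 23:27:00 ecserv13 /kernel: ad0: trying fallback to PIO mode",
        "Dec 30 23:27:00 ecserv13 /kernel: ata0: resetting devices .. done",
    ]
    for when, dev in fallback_events(sample):
        print(when, dev)
```

Comparing the timestamps this returns with the dump schedule would show whether the fallback always lands near the end of the level 0 dump.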
Do the hard drives you are using support ATA tagged queuing? And if so, do
you have TQ enabled?
Francesco Casadei
--
You can download my public key from http://digilander.libero.it/fcasadei/
or retrieve it from a keyserver (pgpkeys.mit.edu, wwwkeys.pgp.net, ...)
Key fingerprint is: 1671 9A23 ACB4 520A E7EE 00B0 7EC3 375F 164E B17B
