Date:      Tue, 31 Dec 2002 15:57:16 -0500
From:      Bruce Campbell <bruce@engmail.uwaterloo.ca>
To:        freebsd-hardware@freebsd.org, freebsd-questions@freebsd.org
Subject:   ata "fallback to PIO mode" on dual processor AMD systems
Message-ID:  <1041368236.3e1204ac45da5@www.nexusmail.uwaterloo.ca>

I am seeing a problem with ATA disks on 4 new systems.  I believe it is
either a bug in the ata driver or a problem with the onboard IDE
controller, though it could be something else.  Systems are as follows:

Motherboard: ASUS A7M266-D
CPUs       : 2 x 2000+ AMD MP
Memory     : 2 x 512MB Crucial part: CT6472Y265

Disks (all UDMA100):

            Master                   Slave
System 1:  WDC WD400BB             WDC WD1000BB
System 2:  WDC WD400BB             WDC WD1000BB
System 3:  WDC WD400BB             WDC WD800BB
System 4:  WDC WD400BB             Maxtor 98196H8

Kernel : 4.7-RELEASE, custom kernel (compared to GENERIC):

commented out:

 cpu           I386_CPU
 cpu           I486_CPU

enabled 

 options       SMP                     # Symmetric MultiProcessor Kernel
 options       APIC_IO                 # Symmetric (APIC) I/O


I am running a test with "dbench" (/usr/ports/benchmarks/dbench)
with a script which runs:

  dbench 1
  sleep for 5 minutes
  dbench 2
  sleep for 5 minutes
  dbench 3
  ...

to simulate 1,2,3... clients.
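
A driver script along these lines could be sketched as follows.  This is
a reconstruction under assumptions, not the script actually used: the
names run_series, DBENCH, PAUSE and MAX_CLIENTS are hypothetical, and so
is the upper bound of 10 clients (the outline above only shows the
client count increasing).

```shell
#!/bin/sh
# Hypothetical sketch of the load test: dbench with 1, 2, 3, ... clients,
# pausing between runs. All names and defaults below are assumptions.
DBENCH=${DBENCH:-dbench}          # benchmark binary (benchmarks/dbench port)
PAUSE=${PAUSE:-300}               # seconds to sleep between runs (5 minutes)
MAX_CLIENTS=${MAX_CLIENTS:-10}    # assumed upper bound on the client count

run_series() {
    n=1
    while [ "$n" -le "$MAX_CLIENTS" ]; do
        echo "starting dbench with $n client(s)"
        # Report a failing run but keep going, so the series is not cut short.
        "$DBENCH" "$n" || echo "dbench $n exited with status $?"
        sleep "$PAUSE"
        n=$((n + 1))
    done
}
```

Calling run_series starts the loop; redirecting its output to a file
makes it easier to line up each run with the kernel messages later.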

The following has happened on systems 2,3 and 4, after about 15 hours
of running the test:

Dec 30 23:26:59 ecserv13 /kernel: ad0: WRITE command timeout tag=0 serv=0 - resetting
Dec 30 23:26:59 ecserv13 /kernel: ata0: resetting devices .. done
Dec 30 23:26:59 ecserv13 /kernel: ad0: WRITE command timeout tag=0 serv=0 - resetting
Dec 30 23:27:00 ecserv13 /kernel: ata0: resetting devices .. done
Dec 30 23:27:00 ecserv13 /kernel: ad0: WRITE command timeout tag=0 serv=0 - resetting
Dec 30 23:27:00 ecserv13 /kernel: ata0: resetting devices .. done
Dec 30 23:27:00 ecserv13 /kernel: ad0: WRITE command timeout tag=0 serv=0 - resetting
Dec 30 23:27:00 ecserv13 /kernel: ad0: timeout waiting for cmd=ef s=d0 e=00
Dec 30 23:27:00 ecserv13 /kernel: ad0: trying fallback to PIO mode
Dec 30 23:27:00 ecserv13 /kernel: ata0: resetting devices .. done

The test continues to run with the ata controller in PIO mode, with
slower performance, and higher load average.
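
To find when each disk fell back, the kernel messages can be filtered
for the fallback line.  A minimal sketch, assuming the syslog layout
shown in the excerpt above (month, day, time, host, "/kernel:", device)
and that kernel messages are logged to /var/log/messages; the function
name pio_fallbacks is hypothetical.

```shell
#!/bin/sh
# Reads syslog lines on stdin and prints one line per PIO fallback:
# timestamp, host, and the device name with its trailing colon removed.
# Assumes the device name is the 6th whitespace-separated field.
pio_fallbacks() {
    awk '/trying fallback to PIO mode/ {
        dev = $6
        sub(/:$/, "", dev)
        print $1, $2, $3, $4, dev
    }'
}
```

Usage would be something like "pio_fallbacks < /var/log/messages",
adjusted if syslog writes kernel messages elsewhere on the system.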

Once the master drops to PIO, any attempt to access the slave causes
the slave to drop to PIO as well.

If I run:

  atacontrol mode 0 UDMA100 UDMA100

attempts to access either drive result in a delay until the controller
drops to PIO, and then operations resume.  After a soft reboot, things
work in UDMA mode again.  I also tried UDMA33 and UDMA66, with no change,
and "atacontrol reinit 0", which did not help.
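
The mode-reset attempt can be wrapped in a small helper that records
the modes before and after the request.  This is only a sketch: the
names retry_udma and ATACONTROL are hypothetical, and it assumes that
"atacontrol mode <channel>" without mode arguments prints the current
modes, which should be checked against atacontrol(8) on the installed
release.

```shell
#!/bin/sh
# Hypothetical helper: show the transfer modes on a channel, request a
# new mode for both master and slave, then show the modes again.
ATACONTROL=${ATACONTROL:-atacontrol}

retry_udma() {
    channel=$1    # ATA channel number, e.g. 0
    mode=$2       # requested mode for both devices, e.g. UDMA100
    echo "modes before:"
    "$ATACONTROL" mode "$channel"      # assumed to print the current modes
    "$ATACONTROL" mode "$channel" "$mode" "$mode"
    echo "modes after:"
    "$ATACONTROL" mode "$channel"
}
```

For example, "retry_udma 0 UDMA100" repeats the command shown above for
channel 0 and makes it visible whether the requested mode was accepted.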

Theories when I search the web for "fallback to PIO mode" include:

 - bad disks
 - something to do with thermal recalibration

I don't believe the problem is bad disks, as the slave drops to PIO
after the master does, and I can't get it back to UDMA other than by a
soft reboot.  Plus I see the problem on 6 of 8 disks.

The problem is very repeatable.

Can anyone offer any ideas, or suggest investigative steps?  I have a system
in PIO mode right now.

Thanks,

-- 
Bruce Campbell
Engineering Computing
CPH-2374B
University of Waterloo
(519)888-4567 ext 5889
