Date:      Thu, 2 Sep 2004 20:56:36 -0400
From:      Louis LeBlanc <FreeBSD@keyslapper.org>
To:        Kendall Gifford <zettabyte@gmail.com>
Cc:        freebsd-questions@freebsd.org
Subject:   Re: AARRRGGHHH! (was Re: TIMEOUT - WRITE_DMA errors in security output)
Message-ID:  <20040903005636.GC46000@keyslapper.org>
In-Reply-To: <86ba954f040902122348f2b3a4@mail.gmail.com>
References:  <20040827131301.GA58030@keyslapper.org> <20040827224713.GA62316@keyslapper.org> <20040830031613.GA91827@keyslapper.org> <b2807d040408292039204d05a5@mail.gmail.com> <20040830124000.GA96013@keyslapper.org> <86ba954f040902122348f2b3a4@mail.gmail.com>

On 09/02/04 01:23 PM, Kendall Gifford sat at the `puter and typed:
> On Mon, 30 Aug 2004 08:40:00 -0400, Louis LeBlanc
> <freebsd@keyslapper.org> wrote:
> > Well, it's probably not an old BIOS, since the machine is less than 3
> > months old.  I checked the BIOS after the system locked up, and it was
> > enabled.  I disabled it, and still couldn't boot until I used the
> > generic kernel.
> > 
> > Right now, the BIOS DMA is off, but the ata_dma sysctl variable is
> > still set to 1.  I'll check this out for a while, and see if it
> > works.  If not, I'll call Dell for a new HD and cable.
> > 
> 
> Well, I read part of this thread a few days ago but haven't had time
> to respond until now -- sorry.

Hey, I'm just glad you responded :)

> I had a problem that seems to be similar if not identical to this
> one about three months ago -- I emailed both freebsd-questions and
> freebsd-hardware in that order and never got a response, though I've
> worked around the problem.
> 
> I had been running 4.9 in a system with a new motherboard and two
> 120 GB Maxtor ATA133 drives that were also pretty new. I had also
> purchased new, custom UDMA133 round cables for the drives and
> everything worked just peachy under 4.9. When I upgraded to 5.2.1,
> however, I had problems just like what you mentioned -- WRITE_DMA
> warnings and failures whenever there was significant disk activity.
> Eventually this caused one of my vinum raid plexes to go down.
> 
> Anyhow, from researching old mailings and such I noticed that I
> wasn't alone in this seemingly inexplicable problem under 5.1 and
> newer. It seems some of us with drives (someone mentioned that it
> seemed to be ones larger than or equal to 80 or 120 GB or something
> like that) on an ATA controller like the VIA 8235 have this problem
> unless you put the drive(s) into PIO mode.
> 
> I did this (using a custom script in /usr/local/etc/rc.d that
> executes the atacontrol command: /sbin/atacontrol mode 0 pio4 xxx)
> for both of my ATA controllers and everything works fine. I wonder
> if the ata driver just has an incompatibility with my specific VIA
> 8235 ata controller or something like that.
> 
> Anyhow, what kind of motherboard/ata-controller do you have?

The ata controller(s) are, from the /var/run/dmesg.boot:

atapci0: <Intel ICH5 UDMA100 controller> port 0xffa0-0xffaf,0x374-0x377,0x170-0x177,0x3f4-0x3f7,0x1f0-0x1f7 mem 0xfebffc00-0xfebfffff irq 18 at device 31.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata0: [MPSAFE]
ata1: at 0x170 irq 15 on atapci0
ata1: [MPSAFE]
atapci1: <Intel ICH5 SATA150 controller> port 0xfea0-0xfeaf,0xfe30-0xfe33,0xfe20-0xfe27,0xfe10-0xfe13,0xfe00-0xfe07 irq 18 at device 31.2 on pci0
atapci1: [MPSAFE]
ata2: at 0xfe00 on atapci1
ata2: [MPSAFE]
ata3: at 0xfe20 on atapci1
ata3: [MPSAFE]

I *think* it's an Intel MB, but I don't know which one.  The machine
is a Dell Dimension 8300 if that helps, and the drive is 160 Gig.

atacontrol gives the following info:
# atacontrol info 0 
Master:      no device present
Slave:       no device present
# atacontrol info 1
Master: acd0 <HL-DT-STDVD-ROM GDR8162B/0015> ATA/ATAPI rev 5
Slave:  acd1 <HL-DT-ST GCE-8483B/B105> ATA/ATAPI rev 0
# atacontrol info 2
Master:  ad4 <WDC WD1600JD-75HBB0/08.02D08> ATA/ATAPI rev 6
Slave:       no device present
# atacontrol info 3
Master:      no device present
Slave:       no device present

Looks like I only have to hit channel 2 for now.

More detail on the disk:
# atacontrol cap 2 0
ATA channel 2, Master, device ad4:

ATA/ATAPI revision    6
device model          WDC WD1600JD-75HBB0
serial number         WD-WMAL91191824
firmware revision     08.02D08
cylinders             16383
heads                 16
sectors/track         63
lba supported         268435455 sectors
lba48 supported         312500000 sectors
dma supported
overlap not supported

Feature                      Support  Enable    Value   Vendor
write cache                    yes      yes
read ahead                     yes      yes
dma queued                     no       no      0/0x00
SMART                          yes      yes
microcode download             yes      yes
security                       no       no
power management               yes      yes
advanced power management      no       no      0/0x00
automatic acoustic management  yes      no      128/0x80 128/0x80


> Also, I noticed that others also recommended turning DMA mode off in
> /boot/loader.conf and that you tried it and it didn't work. I also remember
> reading somewhere a reason why this won't/doesn't work and that is why
> I do it in a /usr/local/etc/rc.d script -- there never seems to be a problem
> booting in DMA mode. Here's my pretty standard script:

I did turn off DMA in the BIOS.  I still saw a couple of the timeouts
yesterday, but nothing so major it locked the system up.
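
A quick way to see whether the driver itself still has DMA turned on,
regardless of the BIOS setting.  This is only a sketch: it assumes the
"ata_dma" sysctl mentioned earlier in the thread is hw.ata.ata_dma, and
the helper name describe_ata_dma is made up for illustration.

```shell
#!/bin/sh
# Sketch: report the ATA driver's DMA setting.
# Assumption: the sysctl in question is hw.ata.ata_dma.

# Turn the raw sysctl value into a readable message.
describe_ata_dma() {
    case "$1" in
        0) echo "ATA DMA disabled" ;;
        1) echo "ATA DMA enabled" ;;
        *) echo "ATA DMA state unknown" ;;
    esac
}

# Fall back to "unknown" if the sysctl does not exist on this system.
value=$(sysctl -n hw.ata.ata_dma 2>/dev/null || echo unknown)
describe_ata_dma "$value"
```

If this still reports DMA as enabled with the BIOS option off, the BIOS
setting is evidently not what the driver goes by.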

Another thing that I just remembered: when I initially installed
5.2.1-R on this machine, softupdates was on by default.  I had a few
incidents while building the kernel and world from updated source that
caused the system to lock up even worse, and each time resulted in
lost data in the /usr partition.  I turned softupdates off for all
partitions, and though I've had a couple lockups, I haven't had any
lost data since.  Now I wonder if the lockups had anything to do with
softupdates.  It's starting to look like softupdates were only
responsible for the lost data because the lockup caused the updates to
fail.

> #!/bin/sh
> #
> 
> case "$1" in
> start|restart)
>     if [ -f /sbin/atacontrol ] && [ -x /sbin/atacontrol ]; then
>         /sbin/atacontrol mode 0 pio4 xxx
>         /sbin/atacontrol mode 1 pio4 xxx
>     fi
>     ;;
> stop)
>     ;;
> esac

Should I restart with this script, or just try the change without
rebooting?  As I interpret this, the command I should use is 

/sbin/atacontrol mode 2 pio4 xxx

Assuming pio4 is supported by my hardware.  Google's no help there.
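
A minimal sketch of how Kendall's script might be adapted for this
machine, assuming the disk stays on channel 2 as master (ad4) and that
atacontrol takes the same "mode <channel> <master> <slave>" form his
script uses.  The DRY_RUN variable and the function names are additions
for illustration only.  With DRY_RUN=1 (the default here) the script
just prints the command instead of running it.

```shell
#!/bin/sh
# Sketch: put ATA channel 2 (master ad4) into PIO4, leaving the slave alone.
# Channel 2 and the "xxx" placeholder for the unused slave slot come from
# the thread; DRY_RUN and the helper names are hypothetical.

ATACONTROL=/sbin/atacontrol
CHANNEL=2
DRY_RUN=${DRY_RUN:-1}   # print only by default; set DRY_RUN=0 to apply

# Print the command line that would switch a channel's master to PIO4.
pio_command() {
    echo "$ATACONTROL mode $1 pio4 xxx"
}

# Run (or, in dry-run mode, just print) the mode change for one channel.
set_pio() {
    if [ "$DRY_RUN" = "1" ]; then
        pio_command "$1"
    elif [ -x "$ATACONTROL" ]; then
        "$ATACONTROL" mode "$1" pio4 xxx
    else
        echo "atacontrol not found at $ATACONTROL" >&2
        return 1
    fi
}

case "${1:-}" in
start|restart)
    set_pio "$CHANNEL"
    ;;
stop)
    ;;
esac
```

Running it by hand with "start" and DRY_RUN=0 should apply the change
without a reboot.  As far as I can tell from atacontrol(8), running
"atacontrol mode 2" with no mode arguments afterwards prints the current
modes, which would show whether pio4 was accepted.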

> P.S. Let us know what ata controller you have.

If I interpreted everything right, that would be an Intel ICH5 SATA150.

> O.T. Also, does anyone know why disabling DMA in
> /boot/loader.conf doesn't work?

That would be useful to know, I'm sure.

Thank you very much for your pointers, Kendall.

Lou
-- 
Louis LeBlanc               FreeBSD@keyslapper.org
Fully Funded Hobbyist, KeySlapper Extrordinaire :)
http://www.keyslapper.org                     ԿԬ


