Date:      Sun, 14 May 2006 11:31:49 +0930
From:      Phil Kernick <Phil@Kernick.org>
To:        freebsd-stable <freebsd-stable@FreeBSD.ORG>
Subject:   ATA DMA corruption on 5.5-beta4
Message-ID:  <44668F8D.3090208@Kernick.org>

I have been having repeated filesystem corruption issues on 5.4 and 5.5-beta4.

The problems only occur on disks connected to the VIA 82C596B UDMA66
controller.  There have been no problems at all with the HighPoint controller.

The problems only occur under heavy load with lots of reads and writes, such
as a snapshot dump on a live filesystem or a large port compile.

I have replaced the disk, motherboard, cables, RAM and everything else, but
the problem does not go away.  The one commonality is FreeBSD.

I first noticed the problem when I moved from 4.11 to 5.4, and upgrading to
5.5-beta4 didn't fix it.

Adding these lines to loader.conf completely solves the corruption problem:
$ grep ata /boot/loader.conf
hw.ata.wc=0
hw.ata.ata_dma=0

Unfortunately it also cuts disk I/O throughput to 20% of what it was and
keeps the CPU busy with interrupt waits all the time.
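
For anyone trying the same workaround, here is a minimal sketch of a check
that both tunables are really present in the loader configuration.  The
function name check_ata_tunables is hypothetical, and the accepted forms
(a value of 0 with or without double quotes) are an assumption about how the
file is written:

```shell
#!/bin/sh
# check_ata_tunables [FILE]
# Hypothetical helper: reports whether the two workaround tunables from
# this message are set to 0 in FILE (default: /boot/loader.conf).
check_ata_tunables() {
    conf=${1:-/boot/loader.conf}
    for key in hw.ata.wc hw.ata.ata_dma; do
        # Accept key=0 or key="0", allowing trailing whitespace.
        if grep -Eq "^${key}=\"?0\"?[[:space:]]*\$" "$conf" 2>/dev/null; then
            echo "$key: set to 0"
        else
            echo "$key: not set to 0"
        fi
    done
}
```

Calling check_ata_tunables with no argument reads /boot/loader.conf.  Once
the settings are in effect after a reboot, the ad* lines in dmesg should
report PIO4, as they do in the logging at the end of this message.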

Looking through the freebsd-stable archives, this was reported last year, but
there does not seem to have been any resolution of the problem.

Since it is limited to a specific controller, and is definitely DMA related,
I assume that it is a bug in the controller handling code, but I don't know
how to track it down.

Any help would be greatly appreciated.


Thanks,
Phil.


Here is some relevant logging...

$ dmesg | grep ata
atapci0: <VIA 82C596B UDMA66 controller> port 0xc000-0xc00f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 7.1 on pci0
ata0: channel #0 on atapci0
ata1: channel #1 on atapci0
atapci1: <HighPoint HPT370 UDMA100 controller> port 0xe800-0xe8ff,0xe400-0xe403,0xe000-0xe007,0xdc00-0xdc03,0xd800-0xd807 irq 10 at device 11.0 on pci0
ata2: channel #0 on atapci1
ata3: channel #1 on atapci1
ad0: 76319MB <ST380011A/3.06> [155061/16/63] at ata0-master PIO4
acd0: DVDROM <LITE-ON DVD SOHD-16P9S/FS07> at ata1-master PIO4
ad3: 114473MB <ST3120026A/8.01> [232581/16/63] at ata1-slave PIO4
ad4: 152627MB <ST3160023A/8.01> [310101/16/63] at ata2-master PIO4
ad6: 152627MB <ST3160023A/3.06> [310101/16/63] at ata3-master PIO4
 disk0 READY on ad4 at ata2-master
 disk1 READY on ad6 at ata3-master
cd0 at ata1 bus 0 target 0 lun 0

$ bzcat /var/log/messages.0.bz2 | grep "bad block"
May  7 11:19:52 kernel: bad block -1, ino 1979583
May  7 11:19:52 kernel: pid 952 (rm), uid 0 inumber 1979583 on /: bad block
May  7 22:45:36 kernel: bad block 3472896748185596465, ino 9374210
May  7 22:45:36 kernel: pid 721 (nmbd), uid 0 inumber 9374210 on /: bad block
May  7 22:45:36 kernel: bad block 3533694269363075360, ino 9374210
May  7 22:45:36 kernel: pid 721 (nmbd), uid 0 inumber 9374210 on /: bad block
May  7 22:45:36 kernel: bad block 3691043183734567734, ino 9374210
May  7 22:45:36 kernel: pid 721 (nmbd), uid 0 inumber 9374210 on /: bad block
May  7 22:45:36 kernel: bad block 3900180173712011312, ino 9374210
May  7 22:45:36 kernel: pid 721 (nmbd), uid 0 inumber 9374210 on /: bad block
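
The very large block numbers above are easier to inspect in hexadecimal.  A
small sketch, assuming a printf that handles 64-bit integers (the function
name block_to_hex is hypothetical):

```shell
#!/bin/sh
# block_to_hex N...
# Hypothetical helper: prints each decimal block number as 16 hex digits
# so the individual bytes can be read off in pairs.
block_to_hex() {
    for n in "$@"; do
        printf '%016x\n' "$n"
    done
}

# The first large value from the log above:
block_to_hex 3472896748185596465
# prints 3032353139363231
```

Every byte of that value lies in the range 0x30-0x39, the ASCII codes for the
digits 0-9.  That would be consistent with text data sitting where a block
pointer was expected, but this is only a guess from the numbers, not a
confirmed diagnosis.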




