From: Phil Kernick
To: freebsd-stable
Date: Sun, 14 May 2006 11:31:49 +0930
Subject: ATA DMA corruption on 5.5-beta4
Message-ID: <44668F8D.3090208@Kernick.org>

I have been having repeated filesystem corruption issues on 5.4 and 5.5-beta4. The problems only occur on disks connected to the VIA 82C596B UDMA66 controller. There have been no problems at all with the HighPoint controller. The problems only occur under heavy load with lots of reads and writes, such as a snapshot dump on a live filesystem or a large port compile.

I have replaced the disk, motherboard, cables, RAM and everything else, but the problem does not go away. The one commonality is FreeBSD. I first noticed the problem when I moved from 4.11 to 5.4, and upgrading to 5.5-beta didn't fix it.

Adding these lines to loader.conf completely solves the corruption problem:

$ grep ata /boot/loader.conf
hw.ata.wc=0
hw.ata.ata_dma=0

Unfortunately it also slows disk I/O to 20% of its previous speed and keeps the CPU loaded with interrupt waits all the time.

Looking through the stable archives, this was reported last year, but there does not seem to be any resolution of the problem. Since it is limited to a specific controller, and is definitely DMA related, I assume that it is a bug in the controller handling code, but I don't know how to track it down.

Any help would be greatly appreciated.

Thanks,

Phil.

Here is some relevant logging...

$ dmesg | grep ata
atapci0: port 0xc000-0xc00f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 7.1 on pci0
ata0: channel #0 on atapci0
ata1: channel #1 on atapci0
atapci1: port 0xe800-0xe8ff,0xe400-0xe403,0xe000-0xe007,0xdc00-0xdc03,0xd800-0xd807 irq 10 at device 11.0 on pci0
ata2: channel #0 on atapci1
ata3: channel #1 on atapci1
ad0: 76319MB [155061/16/63] at ata0-master PIO4
acd0: DVDROM at ata1-master PIO4
ad3: 114473MB [232581/16/63] at ata1-slave PIO4
ad4: 152627MB [310101/16/63] at ata2-master PIO4
ad6: 152627MB [310101/16/63] at ata3-master PIO4
disk0 READY on ad4 at ata2-master
disk1 READY on ad6 at ata3-master
cd0 at ata1 bus 0 target 0 lun 0

$ bzcat /var/log/messages.0.bz2 | grep "bad block"
May 7 11:19:52 kernel: bad block -1, ino 1979583
May 7 11:19:52 kernel: pid 952 (rm), uid 0 inumber 1979583 on /: bad block
May 7 22:45:36 kernel: bad block 3472896748185596465, ino 9374210
May 7 22:45:36 kernel: pid 721 (nmbd), uid 0 inumber 9374210 on /: bad block
May 7 22:45:36 kernel: bad block 3533694269363075360, ino 9374210
May 7 22:45:36 kernel: pid 721 (nmbd), uid 0 inumber 9374210 on /: bad block
May 7 22:45:36 kernel: bad block 3691043183734567734, ino 9374210
May 7 22:45:36 kernel: pid 721 (nmbd), uid 0 inumber 9374210 on /: bad block
May 7 22:45:36 kernel: bad block 3900180173712011312, ino 9374210
May 7 22:45:36 kernel: pid 721 (nmbd), uid 0 inumber 9374210 on /: bad block
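
For anyone trying to reproduce this, one possible way to compare behaviour with DMA enabled and disabled is a simple write, copy and checksum loop on a filesystem behind the affected controller. The following is only a minimal sketch under stated assumptions: `check_roundtrip` is a hypothetical helper name that is not part of the report above, it relies only on standard `dd`, `cp` and `cksum`, and it checks file data only, whereas the errors logged above are in filesystem metadata. Unless the total amount written exceeds RAM, reads may be served from the buffer cache and never touch the disk, so a clean run does not prove the controller is fine.

```shell
#!/bin/sh
# Hypothetical helper, not from the original report.
# Usage: check_roundtrip DIR COUNT SIZE_MB
# Writes COUNT files of SIZE_MB megabytes of random data into DIR,
# copies each one, and compares checksums of original and copy.
check_roundtrip() {
    dir=$1
    count=$2
    size_mb=$3
    failures=0
    i=0
    while [ "$i" -lt "$count" ]; do
        src="$dir/src.$i"
        dst="$dir/dst.$i"
        # Write random data, then copy it so the disk sees reads and writes.
        dd if=/dev/urandom of="$src" bs=1048576 count="$size_mb" 2>/dev/null
        cp "$src" "$dst"
        # Read from stdin so the file name is not part of cksum's output.
        sum_src=$(cksum < "$src")
        sum_dst=$(cksum < "$dst")
        if [ "$sum_src" != "$sum_dst" ]; then
            echo "MISMATCH: $src vs $dst"
            failures=$((failures + 1))
        fi
        rm -f "$src" "$dst"
        i=$((i + 1))
    done
    echo "files=$count failures=$failures"
    # Return success only if every copy matched its original.
    [ "$failures" -eq 0 ]
}
```

It could be run as, for example, `check_roundtrip /some/dir/on/ad0 200 64` once with the default settings and once with the loader.conf tunables above. The directory, file count and size are placeholders to adjust for the machine under test.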