From owner-freebsd-hackers Wed Dec 26 12:15:43 2001
Delivered-To: freebsd-hackers@freebsd.org
Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2])
	by hub.freebsd.org (Postfix) with ESMTP
	id A88AA37B405; Wed, 26 Dec 2001 12:15:22 -0800 (PST)
Received: (from dillon@localhost)
	by apollo.backplane.com (8.11.6/8.9.1) id fBQKFHi47418;
	Wed, 26 Dec 2001 12:15:17 -0800 (PST)
	(envelope-from dillon)
Date: Wed, 26 Dec 2001 12:15:17 -0800 (PST)
From: Matthew Dillon
Message-Id: <200112262015.fBQKFHi47418@apollo.backplane.com>
To: Soren Schmidt
Subject: Need PCI/VIA chipset help (was Re: 4.4-STABLE crashes - suspects new ata-driver over wd-drivers)
References: <200112202333.fBKNXZ679605@apollo.backplane.com>
	<29650.193.88.88.10.1008956812.squirrel@webmail.jkkn.net>
	<200112211930.fBLJUs988388@apollo.backplane.com>
	<200112261924.fBQJORF47150@apollo.backplane.com>
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

I am trying to determine if our ATA/VIA setup code may have issues.
I did a Google search and came up with a Linux quirks patch which may
apply to the random corruption problems people have been reporting.
Here is the URL:

http://www.linuxhq.com/kernel/v2.4/patch/patch-2.4.8/linux/drivers/pci/quirks.c.html

My question is: Does any of this apply to our ATA driver code? The
problem they are solving looks fairly serious. I would also like to
know if there is any sort of generic 'safe' PCI configuration that we
can set for the ATA driver to try to rule in or rule out the driver as
the cause of the memory corruption.

So far nearly everyone reporting these weird panics is running under
heavy IDE drive loads. I've received about 5 kernel cores from one
person and a core from another and investigated them. The corruption
appears to occur 'out of the blue', during heavy IDE drive loads.
There are no warnings or errors in the dmesg output prior to the crash.
I do not believe the kernel code itself is causing the memory
corruption. The corruption appears to occur in both DMA and PIO modes.
The corruption does not appear to be a memory fault, and I find it
highly unlikely that it would be a power supply issue.

The people reporting the problem have different video and ethernet
setups. The only common denominator (for at least 6 out of 7 of the
people reporting problems) is the fact that they are doing heavy IDE
I/O and may also be doing heavy network I/O (i.e. samba, NFS, etc.).

-Matt

Copyright (c) 1992-2001 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD 4.5-PRERELEASE #1: Tue Dec 25 13:05:39 PST 2001
    root@beaker:/usr/obj/vol/src.stable/sys/BEAKER.debug
Timecounter "i8254" frequency 1193182 Hz
Timecounter "TSC" frequency 451025428 Hz
CPU: AMD-K6(tm) 3D processor (451.03-MHz 586-class CPU)
  Origin = "AuthenticAMD" Id = 0x58c Stepping = 12
  Features=0x8021bf
  AMD Features=0x80000800
real memory = 134152192 (131008K bytes)
config> di sn0
No such device: sn0
Invalid command or syntax. Type `?' for help.
config> di lnc0
No such device: lnc0
Invalid command or syntax. Type `?' for help.
config> di ie0
No such device: ie0
Invalid command or syntax. Type `?' for help.
config> di fe0
No such device: fe0
Invalid command or syntax. Type `?' for help.
config> di ed0
No such device: ed0
Invalid command or syntax. Type `?' for help.
config> di cs0
No such device: cs0
Invalid command or syntax. Type `?' for help.
config> di bt0
config> di aic0
config> di aha0
config> di adv0
config> q
avail memory = 126656512 (123688K bytes)
Preloaded elf kernel "kernel" at 0xc03df000.
Preloaded userconfig_script "/boot/kernel.conf" at 0xc03df09c.
VESA: v3.0, 4096k memory, flags:0x1, mode table:0xc036dca2 (1000022)
VESA: NVidia
K6-family MTRR support enabled (2 registers)
md0: Malloc disk
Using $PIR table, 7 entries at 0xc00fded0
npx0: on motherboard
npx0: INT 16 interface
pcib0: on motherboard
pci0: on pcib0
pcib1: at device 1.0 on pci0
pci1: on pcib1
isab0: at device 7.0 on pci0
isa0: on isab0
atapci0: port 0xe400-0xe40f at device 7.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata1: at 0x170 irq 15 on atapci0
uhci0: port 0xe000-0xe01f irq 5 at device 7.2 on pci0
usb0: on uhci0
usb0: USB revision 1.0
uhub0: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
pcib2: at device 7.3 on pci0
sym0: <810a> port 0xe800-0xe8ff mem 0xe5000000-0xe50000ff irq 10 at device 8.0 on pci0
sym0: No NVRAM, ID 7, Fast-10, SE, parity checking
xl0: <3Com 3c900-COMBO Etherlink XL> port 0xec00-0xec3f irq 11 at device 9.0 on pci0
xl0: Ethernet address: 00:a0:24:d2:c4:91
xl0: selecting 10baseT transceiver, half duplex
pci0: at 11.0 irq 5
orm0: