Date: Fri, 28 Dec 2001 00:39:00 +0100
From: "Kristian K. Nielsen" <jkkn@jkkn.dk>
To: "Matthew Dillon" <dillon@apollo.backplane.com>, "Nils Holland" <nils@tisys.org>
Cc: "Søren Schmidt" <sos@freebsd.dk>, "Matthew Gilbert" <agilbertm@earthlink.net>, <freebsd-stable@FreeBSD.ORG>, <freebsd-hackers@FreeBSD.ORG>
Subject: Re: 4.4-STABLE crashes - suspects new ata-driver over wd-drivers
Message-ID: <008d01c18f2f$ab983b40$bb5ca8c0@jkkn.net>
References: <200112262355.fBQNtfK48250@apollo.backplane.com> <200112270945.fBR9j1e97273@freebsd.dk> <20011227163252.A151@tisys.org> <200112271847.fBRIlxh52129@apollo.backplane.com>
Hey,

It is great that you are finding a solution for the VIA chipset. Do you have any idea whether it is a similar problem to the one I am experiencing? I am not deep enough into chipset code to have a clue what exactly the patch is doing, but maybe it just decreases the load on the kernel/system in a way that avoids the crashes, even though there is still a bug out there somewhere?!

I do not have a single VIA chip in my box that I know of. It is all Intel, running the latest BIOS version available for my motherboard, and it still crashes whenever I put any pressure on the box, like compiling, moving large files across filesystems, etc.:

Copyright (c) 1992-2001 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD 4.4-STABLE #0: Fri Dec 7 14:21:48 CET 2001
    jkkn@jkkn.jkkn.net:/usr/src/sys/compile/JKKN_KRNL
Timecounter "i8254" frequency 1193182 Hz
Timecounter "TSC" frequency 300683283 Hz
CPU: Pentium II/Pentium II Xeon/Celeron (300.68-MHz 686-class CPU)
  Origin = "GenuineIntel" Id = 0x634 Stepping = 4
  Features=0x80f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,MMX>
real memory = 402640896 (393204K bytes)
avail memory = 387928064 (378836K bytes)
Preloaded elf kernel "kernel" at 0xc02ff000.
Pentium Pro MTRR support enabled
Using $PIR table, 6 entries at 0xc00f0d10
npx0: <math processor> on motherboard
npx0: INT 16 interface
pcib0: <Intel 82443BX (440 BX) host to PCI bridge> on motherboard
pci0: <PCI bus> on pcib0
pcib1: <Intel 82443BX (440 BX) PCI-PCI (AGP) bridge> at device 1.0 on pci0
pci1: <PCI bus> on pcib1
isab0: <Intel 82371AB PCI to ISA bridge> at device 4.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel PIIX4 ATA33 controller> port 0xd800-0xd80f at device 4.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata1: at 0x170 irq 15 on atapci0
pci0: <Intel 82371AB/EB (PIIX4) USB controller> at 4.2 irq 12
chip1: <Intel 82371AB Power management controller> port 0xe800-0xe80f at device 4.3 on pci0
rl0: <RealTek 8139 10/100BaseTX> port 0xd000-0xd0ff mem 0xe3000000-0xe30000ff irq 10 at device 10.0 on pci0
rl0: Ethernet address: 00:40:95:30:2e:5e
miibus0: <MII bus> on rl0
rlphy0: <RealTek internal media interface> on miibus0
rlphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
pci0: <S3 ViRGE DX/GX graphics accelerator> at 12.0 irq 11
orm0: <Option ROM> at iomem 0xc0000-0xc7fff on isa0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> flags 0x1 irq 1 on atkbdc0
device_probe_and_attach: atkbd0 attach returned 6
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A
sio1 at port 0x2f8-0x2ff irq 3 on isa0
sio1: type 16550A
ppc0: <Parallel port> at port 0x378-0x37f irq 7 on isa0
ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/9 bytes threshold
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0
IPsec: Initialized Security Association Processing.
ad0: 39266MB <IC35L040AVER07-0> [79780/16/63] at ata0-master UDMA33
ad2: 9641MB <IBM-DTTA-371010> [19590/16/63] at ata1-master UDMA33
acd0: CD-RW <CD-RW CRX100E> at ata1-slave using PIO4
Mounting root from ufs:/dev/ad0s1a
WARNING: / was not properly dismounted
swapon: adding /dev/ad0s1b as swap device
Automatic boot in progress...
/dev/ad0s1a: 1312 files, 66500 used, 32691 free (355 frags, 4042 blocks, 0.4% fragmentation)
/dev/ad2s1a: FILESYSTEM CLEAN; SKIPPING CHECKS
/dev/ad2s1a: clean, 15373 free (197 frags, 1897 blocks, 0.5% fragmentation)
/dev/ad2s1f: FILESYSTEM CLEAN; SKIPPING CHECKS
/dev/ad2s1f: clean, 2784577 free (35865 frags, 343589 blocks, 0.4% fragmentation)
/dev/ad2s1e: FILESYSTEM CLEAN; SKIPPING CHECKS
/dev/ad2s1e: clean, 10574 free (1182 frags, 1174 blocks, 6.0% fragmentation)
/dev/ad0s1f: UNREF FILE I=8558312 OWNER=cyrus MODE=100600
/dev/ad0s1f: SIZE=676 MTIME=Dec 26 03:35 2001 (CLEARED)
/dev/ad0s1f: FREE BLK COUNT(S) WRONG IN SUPERBLK (SALVAGED)
/dev/ad0s1f: SUMMARY INFORMATION BAD (SALVAGED)
/dev/ad0s1f: BLK(S) MISSING IN BIT MAPS (SALVAGED)
/dev/ad0s1f: 237835 files, 5627465 used, 32454489 free (75065 frags, 4047428 blocks, 0.2% fragmentation)
/dev/ad0s1e: 135 files, 568 used, 19247 free (71 frags, 2397 blocks, 0.4% fragmentation)
Doing initial network setup: hostname .
rl0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        inet 192.168.1.2 netmask 0xffffff00 broadcast 192.168.1.255
        inet 192.168.1.3 netmask 0xffffffff broadcast 192.168.1.3
        inet 192.168.1.4 netmask 0xffffffff broadcast 192.168.1.4
        ether 00:40:95:30:2e:5e
        media: Ethernet autoselect (100baseTX)
        status: active
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
        inet 127.0.0.1 netmask 0xff000000
add net default: gateway 192.168.1.1
Additional routing options: TCP keepalive=YES .
Routing daemons: .
Additional daemons: syslogd .
dumpon: crash dumps to /dev/ad0s1b (116, 131073)
Checking for core dump: savecore: reboot after panic: page fault
Dec 26 04:11:11 jkkn savecore: reboot after panic: page fault
savecore: system went down at Wed Dec 26 04:07:57 2001
savecore: writing core to /var/crash/vmcore.4
.....snip end.....

Regards
Kristian

----- Original Message -----
From: "Matthew Dillon" <dillon@apollo.backplane.com>
To: "Nils Holland" <nils@tisys.org>
Cc: "Søren Schmidt" <sos@freebsd.dk>; "Matthew Gilbert" <agilbertm@earthlink.net>; <freebsd-stable@FreeBSD.ORG>; <freebsd-hackers@FreeBSD.ORG>
Sent: Thursday, December 27, 2001 7:47 PM
Subject: Re: 4.4-STABLE crashes - suspects new ata-driver over wd-drivers

> This is great news! I'm crossing my fingers and hoping that Nils can't
> reproduce the crash any more with Soren's fix.
>
> Just to let you all know, Nils has been working his ass off helping me
> track his crash down. I've been pulling my hair out... I gave him patch
> after patch to test various conditions & panic if the nfs_node's hash list
> somehow got broken, and for the last week not a single one of those tests
> detected the problem prior to the panic. The nfs_node's hash list
> was being corrupted seemingly out of nowhere.
>
> The last two days I've had Nils use hardware watchpoints in DDB> to
> try to track down what was modifying the memory location, with no
> success. The watchpoint was catching the (correct) write to the list
> head but then failed to catch the corrupted write prior to the system
> panicking, which is what makes me believe it is some sort of chipset
> issue.
>
> Another thing to note: One of the really weird things about Nils' crashes
> is that the same memory location was getting corrupted every time, five
> times in a row (which made it possible to use a hardware watchpoint).
> The corruption changed somewhat when he added the hardware watchpoint.
>
> Another similar set of crashes in the vm_page_list (that other people
> report, including a number of machines at Yahoo) has a similar M.O.:
> IDE drive, medium/heavy activity, but while the corrupted address always
> winds up in the (static) vm_page array, it tends to be slightly
> different each time. I'm hoping that it winds up being the same or similar
> issue. I'm not ruling out the possibility that chipsets other than
> the 686B have problems too.
>
> In any case, Nils' description makes a lot of sense. I've asked him to
> continue testing his system to make sure that this particular crash cannot
> be reproduced, and I am crossing my fingers.
>
> I'm also wondering how applicable this patch might be in regards to
> forcing a 'safe' mode for other PCI chipsets, to allow us to test
> it on non-686B machines that have similar problems.
>
> -Matt
> Matthew Dillon
> <dillon@backplane.com>
>
>
> :On Thu, Dec 27, 2001 at 10:45:01AM +0100, Søren Schmidt stood up and spoke:
> :>
> :> OK, here goes the VIA 686b patch, it is hand cut out from the bulk patches
> :> to go into 4.5 so beware :)
> :
> :Well, as Matt has said, I reported a crash that he's trying to debug. Since
> :I have the 686b in my machine, I applied the patch. Ever since then I have
> :not been able to reproduce the crash, although yesterday it was so easy
> :that I could do it twice an hour ;-)
> :
> :Anyway, you (Soren) said that the right way to fix this is a BIOS update.
> :Now, could it be that some mainboard manufacturers are incapable of
> :handling this? I'm using the latest BIOS for my board, and according to
> :http://www.chaintech.com.tw/DL/7xMB/7AJA0.HTM, this should already have
> :been fixed in their BIOS release from 2001-04-23...
> :
> :Second interesting thing: I was using a UDMA66 drive on my 686b until a few
> :weeks ago and never had any problems - the stuff Matt is looking at only
> :started to appear a short while after I exchanged that drive for a UDMA100
> :one.
> :So it seems as if the slower drive probably didn't produce a high
> :enough PCI workload for anything to actually happen.
> :
> :This fix will probably also have some influence on a few other similar
> :problems (I read Matt was working on many of them). In the end I hope that
> :this fix - or a variation thereof - will actually go into 4.5.
> :
> :Greetings
> :Nils
> :
> :--
> :Nils Holland