From owner-freebsd-current Sun Aug 29 12:43:19 1999
Delivered-To: freebsd-current@freebsd.org
Received: from pinhead.parag.codegen.com (207-44-235-154.CodeGen.COM [207.44.235.154])
    by hub.freebsd.org (Postfix) with ESMTP id 711F414E66
    for ; Sun, 29 Aug 1999 12:43:14 -0700 (PDT)
    (envelope-from parag@pinhead.parag.codegen.com)
Received: from pinhead.parag.codegen.com (parag@localhost.parag.codegen.com [127.0.0.1])
    by pinhead.parag.codegen.com (8.9.3/8.9.3) with ESMTP id MAA76839
    for ; Sun, 29 Aug 1999 12:43:13 -0700 (PDT)
    (envelope-from parag@pinhead.parag.codegen.com)
To: freebsd-current@freebsd.org
Subject: 4.0-CURRENT SMP crash with vinum raid-5 and softupdates
X-Face: =O'Kj74icvU|oS*<7gS/8'\Pbpm}okVj*@UC!IgkmZQAO!W[|iBiMs*|)n*`X
    ]pW%m>Oz_mK^Gdazsr.Z0/JsFS1uF8gBVIoChGwOy{EK=<6g?aHE`[\S]C]T0Wm
X-URL: http://www.codegen.com
Date: Sun, 29 Aug 1999 12:43:13 -0700
Message-ID: <76835.935955793@pinhead.parag.codegen.com>
From: Parag Patel
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Hello. I'm not sure which list this should go to as I'm not sure what
caused this fault. Unfortunately, I can't get a trace out of ddb - it
simply faults again, so advice on how to debug it or what to look at
would be much appreciated. The machine is dead right now, and I'll
leave it that way so I can run any ddb commands you like, or give you a
login on my machine here and let you cu to the PPro console.

A friend of mine left a quad-PentiumPro system at my house to play with
until he builds a new house with enough room to house it. It's a big
heavy rack-mount machine with dual power-supplies and such - a recently
retired NT server of some sort. Naturally I put FBSD SMP on it. :)

/ and /usr are on an IDE drive, which is the only disk the firmware can
see at the moment as my NCR cards don't have the appropriate BIOS code.
This leaves 7x4Gb UW SCSI disks for playing around with vinum.
I set up two RAID volumes on 6 disks, and a regular filesystem on the
7th for comparison and testing. Each disk has a single slice, 256Mb
swap, and the rest for either vinum or 4.2BSD filesystems. Swap is
enabled only on drives 0, 3, and 6 right now, although none of it was
actually touched. (The IDE drive is much slower for swap.)

The bootup messages from the kernel plus the vinum config file and the
crash output are appended below.

Anyway, I was copying the /usr/src tree (find|cpio) onto the raid5
volume when it died. The same command worked fine earlier for the
raid10 volume (striped and mirrored only) and the single "noraid"
vanilla FFS+softupdates volume on da0 (also in the same array). Both
RAID filesystems were also running with softupdates.

Earlier I had 3.2-STABLE (also as of Friday evening) installed, and the
raid5 volume also crashed the system. As I hadn't built DDB into the
kernel, I don't know why it died there, but it's probably the same as
whatever nuked 4.0-CURRENT. Assuming that 4.0 would have newer vinum
code, I installed that hoping things had improved. (The 3.2-STABLE
raid5 volume had also crashed without softupdates.)

There are no apparent SCSI errors, and access to the raid10 volume and
to the single vanilla FFS filesystem on the extra SCSI drive is fine.

Thoughts? Thanks!
--
Parag Patel


-----ddb crash output-----

Fatal trap 12: page fault while in kernel mode
mp_lock = 03000003; cpuid = 3; lapic.id = 02000000
fault virtual address   = 0x0
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0x0
stack pointer           = 0x10:0xd5730b08
frame pointer           = 0x10:0xd5730b4c
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 376 (cpio)
interrupt mask          = bio <- SMP: XXX
kernel: type 12 trap, code=0
Stopped at      0:

Fatal trap 12: page fault while in kernel mode
mp_lock = 03000004; cpuid = 3; lapic.id = 02000000
fault virtual address   = 0x0
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc022f168
stack pointer           = 0x10:0xd5730980
frame pointer           = 0x10:0xd5730984
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 376 (cpio)
interrupt mask          = bio <- SMP: XXX
kernel: type 12 trap, code=0
db>
db> trace

Fatal trap 12: page fault while in kernel mode
mp_lock = 03000005; cpuid = 3; lapic.id = 02000000
fault virtual address   = 0x0
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc022f168
stack pointer           = 0x10:0xd57308b0
frame pointer           = 0x10:0xd57308b4
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 376 (cpio)
interrupt mask          = bio <- SMP: XXX
kernel: type 12 trap, code=0
db>
db> ps
  pid   proc     addr      uid  ppid  pgrp  flag   stat wmesg  wchan    cmd
  376 d2f8dec0 d572f000      0   374   374  804006  2                   cpio
  375 d2f8ef40 d5700000      0   374   374  004086  3   pipdwt d56e8d40 find
  374 d2f8e180 d5721000      0   246   374  004086  3   wait   d2f8e180 sh
  246 d2f8e020 d572b000   1000   244   246  004086  3   pause  d572b108 ksh
  244 d2f8e440 d571a000      0   218   218  000084  2                   sshd1
  241 d2f8f780 d56e0000      0     1   241  004086  3   ttyin  c13dfa10 ksh
  218 d2f8e2e0 d571e000      0     1   218  000084  3   select c02d82ec sshd1
  177 d2f8e5a0 d5717000      0     1   177  000184  3   select c02d82ec sendmail
  173 d2f8ec80 d5706000      0     1   173  000084  3   nanslp c02c18a0 cron
  170 d2f8f620 d56e3000      0     1   170  000084  3   select c02d82ec inetd
  146 d2f8e700 d5713000      0     1   141  000084  3   nfsidl c02da68c nfsiod
  145 d2f8e860 d5710000      0     1   141  000084  3   nfsidl c02da688 nfsiod
  144 d2f8e9c0 d570c000      0     1   141  000084  3   nfsidl c02da684 nfsiod
  143 d2f8eb20 d5709000      0     1   141  000084  3   nfsidl c02da680 nfsiod
  130 d2f8ede0 d5703000      1     1   130  000184  3   select c02d82ec portmap
  125 d2f8f0a0 d56fc000      0     1   125  000084  3   select c02d82ec xntpd
  118 d2f8f4c0 d56ea000      0     1   118  000084  2                   syslogd
   33 d2f8f200 d56f1000      0     1    33  000084  3   mfsidl d2f87bc0 mount_mfs
   18 d2f8f360 d56ee000      0     1    18  000004  3   vinum  c142eb74 vinum
    5 d2f8f8e0 d2f9c000      0     0     0  500284  3   vrlock   460001 syncer
    4 d2f8fa40 d2f9a000      0     0     0  500204  3   psleep c02c19bc bufdaemon
    3 d2f8fba0 d2f98000      0     0     0  400204  3   psleep c02cdabc vmdaemon
    2 d2f8fd00 d2f96000      0     0     0  500204  3   psleep c02b1bf8 pagedaemon
    1 d2f8fe60 d2f94000      0     0     1  004284  3   wait   d2f8fe60 init
    0 c02d76c0 c0340000      0     0     0  000204  3   sched  c02d76c0 swapper
db>


-----dmesg/bootup-----

System: 4xPPro/200Mhz, 512Mb RAM, NCR 875 SCSI, 7x4Gb Fujitsu array,
        serial console, no VGA, no keyboard

FreeBSD: 4.0-CURRENT from Friday Aug 27 evening, and also 3.2-STABLE

kernel config: available upon request, but system is down right now
        essentially GENERIC + SOFTUPDATES + INVARIANTS + DDB - unused devices
        vinum loaded as module

bootup messages (cut-paste from tty console window):

Copyright (c) 1992-1999 The FreeBSD Project.
Copyright (c) 1982, 1986, 1989, 1991, 1993
        The Regents of the University of California. All rights reserved.
FreeBSD 4.0-CURRENT #3: Sat Aug 28 23:29:38 PDT 1999
    parag@quadhead.parag.codegen.com:/usr/src/sys/compile/PPRO
Timecounter "i8254" frequency 1193182 Hz
CPU: Pentium Pro (686-class CPU)
  Origin = "GenuineIntel"  Id = 0x619  Stepping = 9
  Features=0xfbff
real memory  = 536870912 (524288K bytes)
avail memory = 517791744 (505656K bytes)
Programming 16 pins in IOAPIC #0
EISA INTCONTROL = 00001e00
FreeBSD/SMP: Multiprocessor motherboard
 cpu0 (BSP): apic id:  3, version: 0x00040011, at 0xfee00000
 cpu1 (AP):  apic id:  0, version: 0x00040011, at 0xfee00000
 cpu2 (AP):  apic id:  1, version: 0x00040011, at 0xfee00000
 cpu3 (AP):  apic id:  2, version: 0x00040011, at 0xfee00000
 io0 (APIC): apic id:  4, version: 0x000f0011, at 0xfec00000
Preloaded elf kernel "kernel" at 0xc032d000.
Pentium Pro MTRR support enabled
npx0: on motherboard
npx0: INT 16 interface
pcib0: on motherboard
pci0: on pcib0
isab0: at device 2.0 on pci0
eisa0: on isab0
mainboard0: on eisa0 slot 0
isa0: on isab0
ide_pci0: irq 14 at device 3.0 on pci0
xl0: <3Com 3c905-TX Fast Etherlink XL> irq 9 at device 14.0 on pci0
xl0: Ethernet address: 00:60:97:a1:88:23
xl0: autoneg complete, link status good (half-duplex, 100Mbps)
ncr0: irq 11 at device 15.0 on pci0
chip0: at device 20.0 on pci0
pcib1: on motherboard
pci1: on pcib1
ncr1: irq 10 at device 12.0 on pci1
Probing for PnP devices:
fdc0: at port 0x3f0-0x3f7 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
wdc0 at port 0x1f0-0x1f7 irq 14 on isa0
wdc0: unit 0 (wd0):
wd0: 1628MB (3334464 sectors), 3308 cyls, 16 heads, 63 S/T, 512 B/S
wdc1: not probed (disabled)
atkbdc0: at port 0x60-0x6f on isa0
atkbd0: irq 1 on atkbdc0
cu: Got hangup signal

Disconnected.
$ cu-test
Connected.
sa0 at ncr1 bus 0 target 5 lun 0
sa0: Removable Sequential Access SCSI-2 device
sa0: 5.000MB/s transfers (5.000MHz, offset 8)
da6 at ncr0 bus 0 target 14 lun 0
da6: Fixed Direct Access SCSI-2 device
da6: 40.000MB/s transfers (20.000MHz, offset 15, 16bit), Tagged Queueing Enabled
da6: 4149MB (8498506 512 byte sectors: 255H 63S/T 529C)
da5 at ncr0 bus 0 target 13 lun 0
da5: Fixed Direct Access SCSI-2 device
da5: 40.000MB/s transfers (20.000MHz, offset 15, 16bit), Tagged Queueing Enabled
da5: 4149MB (8498506 512 byte sectors: 255H 63S/T 529C)
da4 at ncr0 bus 0 target 12 lun 0
da4: Fixed Direct Access SCSI-2 device
da4: 40.000MB/s transfers (20.000MHz, offset 15, 16bit), Tagged Queueing Enabled
da4: 4149MB (8498506 512 byte sectors: 255H 63S/T 529C)
da3 at ncr0 bus 0 target 11 lun 0
da3: Fixed Direct Access SCSI-2 device
da3: 40.000MB/s transfers (20.000MHz, offset 15, 16bit), Tagged Queueing Enabled
da3: 4149MB (8498506 512 byte sectors: 255H 63S/T 529C)
da2 at ncr0 bus 0 target 10 lun 0
da2: Fixed Direct Access SCSI-2 device
da2: 40.000MB/s transfers (20.000MHz, offset 15, 16bit), Tagged Queueing Enabled
da2: 4149MB (8498506 512 byte sectors: 255H 63S/T 529C)
da1 at ncr0 bus 0 target 9 lun 0
da1: Fixed Direct Access SCSI-2 device
da1: 40.000MB/s transfers (20.000MHz, offset 15, 16bit), Tagged Queueing Enabled
da1: 4149MB (8498506 512 byte sectors: 255H 63S/T 529C)
da0 at ncr0 bus 0 target 8 lun 0
da0: Fixed Direct Access SCSI-2 device
da0: 40.000MB/s transfers (20.000MHz, offset 15, 16bit), Tagged Queueing Enabled
da0: 4149MB (8498506 512 byte sectors: 255H 63S/T 529C)
changing root device to wd0s1a
cd0 at ncr1 bus 0 target 6 lun 0
cd0: Removable CD-ROM SCSI-2 device
cd0: 10.000MB/s transfers (10.000MHz, offset 8)
cd0: Attempt to query device size failed: NOT READY, Medium not present


-----vinum config-----

$ s vinum list
Configuration summary

Drives:         6 (8 configured)
Volumes:        2 (4 configured)
Plexes:         3 (8 configured)
Subdisks:       12 (16 configured)

D d1            State: up   Device /dev/da1a   Avail: 0/3892 MB (0%)
D d2            State: up   Device /dev/da2a   Avail: 0/3892 MB (0%)
D d3            State: up   Device /dev/da3a   Avail: 0/3892 MB (0%)
D d4            State: up   Device /dev/da4a   Avail: 0/3892 MB (0%)
D d5            State: up   Device /dev/da5a   Avail: 0/3892 MB (0%)
D d6            State: up   Device /dev/da6a   Avail: 0/3892 MB (0%)

V raid5         State: up   Plexes:   1   Size: 9731 MB
V raid10        State: up   Plexes:   2   Size: 5838 MB

P raid5.p0   R5 State: up   Subdisks: 6   Size: 9731 MB
P raid10.p0   S State: up   Subdisks: 3   Size: 5838 MB
P raid10.p1   S State: up   Subdisks: 3   Size: 5838 MB

S raid5.p0.s0   State: up   PO:    0  B   Size: 1946 MB
S raid5.p0.s1   State: up   PO:  256 kB   Size: 1946 MB
S raid5.p0.s2   State: up   PO:  512 kB   Size: 1946 MB
S raid5.p0.s3   State: up   PO:  768 kB   Size: 1946 MB
S raid5.p0.s4   State: up   PO: 1024 kB   Size: 1946 MB
S raid5.p0.s5   State: up   PO: 1280 kB   Size: 1946 MB
S raid10.p0.s0  State: up   PO:    0  B   Size: 1946 MB
S raid10.p0.s1  State: up   PO:  256 kB   Size: 1946 MB
S raid10.p0.s2  State: up   PO:  512 kB   Size: 1946 MB
S raid10.p1.s0  State: up   PO:    0  B   Size: 1946 MB
S raid10.p1.s1  State: up   PO:  256 kB   Size: 1946 MB
S raid10.p1.s2  State: up   PO:  512 kB   Size: 1946 MB

df :
Filesystem         1K-blocks  Used    Avail Capacity  Mounted on
/dev/da0a            3922958     1  3609121     0%    /noraid
/dev/vinum/raid5     9806179     1  9021684     0%    /raid5
/dev/vinum/raid10    5883701     1  5413004     0%    /raid10

mount :
/dev/da0a on /noraid (local, soft-updates, writes: sync 2 async 781)
/dev/vinum/raid5 on /raid5 (local, soft-updates, writes: sync 2 async 0)
/dev/vinum/raid10 on /raid10 (local, soft-updates, writes: sync 2 async 0)


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message
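
A quick cross-check of the sizes in the vinum listing above, offered as a
minimal sketch and not as part of the original report. It assumes every
subdisk is 1946 MB as listed, that a RAID-5 plex gives up one subdisk's
worth of space to parity, and that a striped plex uses all of its
subdisks for data. The function names are hypothetical helpers, not
vinum interfaces.

```python
# Cross-check the plex and drive sizes reported by `vinum list`.
# All figures are taken from the listing; helper names are hypothetical.

SUBDISK_MB = 1946  # size of every subdisk in the listing


def raid5_usable_mb(subdisks: int, subdisk_mb: int) -> int:
    """Usable space of a RAID-5 plex: one subdisk's worth goes to parity."""
    if subdisks < 3:
        raise ValueError("RAID-5 needs at least 3 subdisks")
    return (subdisks - 1) * subdisk_mb


def striped_mb(subdisks: int, subdisk_mb: int) -> int:
    """Usable space of a striped plex: every subdisk holds data."""
    return subdisks * subdisk_mb


if __name__ == "__main__":
    # raid5.p0 has 6 subdisks; the listing reports 9731 MB.
    print("raid5.p0 usable MB:", raid5_usable_mb(6, SUBDISK_MB))
    # raid10.p0 and raid10.p1 each have 3 subdisks; the listing reports 5838 MB.
    print("raid10 plex MB:", striped_mb(3, SUBDISK_MB))
    # Each drive carries one raid5 and one raid10 subdisk; the listing
    # reports 3892 MB per drive with 0 MB available.
    print("per-drive allocation MB:", 2 * SUBDISK_MB)
```

The striped plex and per-drive figures match the listing exactly. The
RAID-5 figure comes out 1 MB below the reported 9731 MB, which is
presumably rounding of the per-subdisk size in the listing.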