Date: 16 Jun 2003 17:49:41 -0000
From: Przemyslaw Frasunek <venglin@freebsd.lublin.pl>
To: FreeBSD-gnats-submit@FreeBSD.org
Subject: i386/53382: Repetable panics in ffs_vget() on Proliant ML350 with SMP/HTT enabled
Message-ID: <20030616174941.31554.qmail@lagoon.freebsd.lublin.pl>
Resent-Message-ID: <200306161800.h5GI0YSH032666@freefall.freebsd.org>

>Number:         53382
>Category:       i386
>Synopsis:       Repetable panics in ffs_vget() on Proliant ML350 with SMP/HTT enabled
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    freebsd-i386
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Jun 16 11:00:34 PDT 2003
>Closed-Date:
>Last-Modified:
>Originator:     Przemyslaw Frasunek
>Release:        FreeBSD 4.8-RELEASE i386
>Organization:
ATM S.A.
>Environment:
System: FreeBSD riot.atman.pl 4.8-RELEASE FreeBSD 4.8-RELEASE #0: Mon Jun 16 18:06:45 CEST 2003 root@riot.atman.pl:/usr/src/sys/compile/RIOT i386

Compaq Proliant ML350; the problem is repeatable on other ML350s with SMP/HTT enabled.

Copyright (c) 1992-2003 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD 4.8-RELEASE #0: Mon Jun 16 18:06:45 CEST 2003
    root@riot.atman.pl:/usr/src/sys/compile/RIOT
Timecounter "i8254" frequency 1193182 Hz
Timecounter "TSC" frequency 2392260632 Hz
CPU: Intel(R) Xeon(TM) CPU 2.40GHz (2392.26-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0xf27  Stepping = 7
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Hyperthreading: 2 logical CPUs
real memory  = 1073717248 (1048552K bytes)
avail memory = 1041403904 (1016996K bytes)
Preloaded elf kernel "kernel" at 0xc0308000.
Pentium Pro MTRR support enabled
npx0: <math processor> on motherboard
npx0: INT 16 interface
pcib0: <Host to PCI bridge> on motherboard
pci0: <PCI bus> on pcib0
ahc0: <Adaptec (Compaq OEM) 3960D Ultra160 SCSI adapter> port 0x2400-0x24ff mem 0xf7cf0000-0xf7cf0fff irq 10 at device 2.0 on pci0
aic7899: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs
ahc1: <Adaptec (Compaq OEM) 3960D Ultra160 SCSI adapter> port 0x2800-0x28ff mem 0xf7ce0000-0xf7ce0fff irq 10 at device 2.1 on pci0
aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/253 SCBs
pci0: <ATI Mach64-GR graphics accelerator> at 3.0
bge0: <Broadcom BCM5702X Gigabit Ethernet, ASIC rev. 0x1002> mem 0xf5fe0000-0xf5feffff irq 3 at device 4.0 on pci0
bge0: Ethernet address: 00:0b:cd:4e:17:f7
miibus0: <MII bus> on bge0
brgphy0: <BCM5703 10/100/1000baseTX PHY> on miibus0
brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
pci0: <unknown card> (vendor=0x0e11, dev=0xa0f0) at 5.0 irq 5
isab0: <PCI to ISA bridge (vendor=1166 device=0201)> at device 15.0 on pci0
isa0: <ISA bus> on isab0
pci0: <Unknown PCI ATA controller> at 15.1
pcib1: <Host to PCI bridge> on motherboard
pci1: <PCI bus> on pcib1
pcib2: <Host to PCI bridge> on motherboard
pci2: <PCI bus> on pcib2
ciss0: <Compaq Smart Array 532> port 0x3000-0x30ff mem 0xf7df0000-0xf7df3fff,0xf7ec0000-0xf7efffff irq 15 at device 1.0 on pci2
ciss0: using 256 of 1024 available commands
ciss0: 0 logical drives configured
ciss0: firmware 2.20
ciss0: 2 SCSI channels
ciss0: signature 'CISS'
ciss0: valence 1
ciss0: supported I/O methods 0xe<simple,performant,MEMQ>
ciss0: active I/O method 0x3<simple>
ciss0: 4G page base 0x00000000
ciss0: interrupt coalesce delay 1000us
ciss0: interrupt coalesce count 16
ciss0: max outstanding commands 1024
ciss0: bus types 0x2<ultra3>
ciss0: server name ''
ciss0: heartbeat 0x3000004a
ciss0: 0 logical drive
xl0: <3Com 3c905C-TX Fast Etherlink XL> port 0x3400-0x347f mem 0xf7eb0000-0xf7eb007f irq 11 at device 2.0 on pci2
xl0: reset didn't complete
xl0: Ethernet address: 00:04:75:f2:2b:e1
miibus1: <MII bus> on xl0
ukphy0: <Generic IEEE 802.3u media interface> on miibus1
ukphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
pcib3: <Host to PCI bridge> on motherboard
pci3: <PCI bus> on pcib3
pcib4: <ServerWorks host to PCI bridge(unknown chipset)> on motherboard
pci4: <PCI bus> on pcib4
pcib5: <ServerWorks host to PCI bridge(unknown chipset)> on motherboard
pci5: <PCI bus> on pcib5
xl1: <3Com 3c905C-TX Fast Etherlink XL> port 0x4000-0x407f mem 0xf7ff0000-0xf7ff007f irq 10 at device 1.0 on pci5
xl0: reset didn't complete
xl1: Ethernet address: 00:04:75:f2:2b:dd
miibus2: <MII bus> on xl1
ukphy1: <Generic IEEE 802.3u media interface> on miibus2
ukphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
xl2: <3Com 3c980C Fast Etherlink XL> port 0x4080-0x40ff mem 0xf7fe0000-0xf7fe007f irq 15 at device 2.0 on pci5
xl2: Ethernet address: 00:04:75:db:fa:9c
miibus3: <MII bus> on xl2
xlphy0: <3c905C 10/100 internal PHY> on miibus3
xlphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
orm0: <Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xcbfff,0xcc000-0xcc7ff,0xcc800-0xccfff,0xee000-0xeffff on isa0
fdc0: <NEC 72065B or clone> at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> flags 0x1 irq 1 on atkbdc0
kbd0 at atkbd0
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: model IntelliMouse Explorer, device ID 4
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
DUMMYNET initialized (011031)
IP packet filtering initialized, divert disabled, rule-based forwarding enabled, default to accept, logging disabled
IP Filter: v3.4.31 initialized.
Default = pass all, Logging = enabled
Waiting 15 seconds for SCSI devices to settle
pt0 at ahc0 bus 0 target 15 lun 0
pt0: <COMPAQ PROLIANT 4L6I 1.78> Fixed Processor SCSI-2 device
pt0: 3.300MB/s transfers
da2 at ahc0 bus 0 target 2 lun 0
da2: <COMPAQ BD03664545 B20B> Fixed Direct Access SCSI-2 device
da2: 160.000MB/s transfers (80.000MHz, offset 127, 16bit), Tagged Queueing Enabled
da2: 34732MB (71132000 512 byte sectors: 255H 63S/T 4427C)
da3 at ahc0 bus 0 target 3 lun 0
da3: <COMPAQ BD03664545 B20B> Fixed Direct Access SCSI-2 device
da3: 160.000MB/s transfers (80.000MHz, offset 127, 16bit), Tagged Queueing Enabled
da3: 34732MB (71132000 512 byte sectors: 255H 63S/T 4427C)
da1 at ahc0 bus 0 target 1 lun 0
da1: <COMPAQ BD0366349C 3B06> Fixed Direct Access SCSI-2 device
da1: 160.000MB/s transfers (80.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da1: 34732MB (71132000 512 byte sectors: 255H 63S/T 4427C)
da0 at ahc0 bus 0 target 0 lun 0
da0: <COMPAQ BD0186349B 3B11> Fixed Direct Access SCSI-2 device
da0: 160.000MB/s transfers (80.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da0: 17365MB (35565080 512 byte sectors: 255H 63S/T 2213C)
da5 at ahc0 bus 0 target 5 lun 0
da5: <COMPAQ BD03685A24 HPB3> Fixed Direct Access SCSI-3 device
da5: 160.000MB/s transfers (80.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da5: 34732MB (71132000 512 byte sectors: 255H 63S/T 4427C)
da4 at ahc0 bus 0 target 4 lun 0
da4: <COMPAQ BD03685A24 HPB3> Fixed Direct Access SCSI-3 device
da4: 160.000MB/s transfers (80.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da4: 34732MB (71132000 512 byte sectors: 255H 63S/T 4427C)
Mounting root from ufs:/dev/da0s1a

machine         i386
cpu             I686_CPU
ident           RIOT
maxusers        256

options         INET
options         INET6
options         FFS
options         FFS_ROOT
options         SOFTUPDATES
options         UFS_DIRHASH
options         COMPAT_43
options         SCSI_DELAY=15000
options         USERCONFIG
options         SYSVSHM
options         SYSVMSG
options         SYSVSEM
options         MAXDSIZ="(512*1024*1024)"
options         MAXSSIZ="(512*1024*1024)"
options         DFLDSIZ="(512*1024*1024)"
options         NMBCLUSTERS=131070
options         PMAP_SHPGPERPROC=400
options         SMP
options         APIC_IO
options         HTT
options         P1003_1B
options         _KPOSIX_PRIORITY_SCHEDULING
options         ICMP_BANDLIM
options         KBD_INSTALL_CDEV
options         IPFILTER
options         IPFILTER_LOG
options         IPFIREWALL
options         IPFIREWALL_DEFAULT_TO_ACCEPT
options         DUMMYNET

device          isa
device          pci
device          fdc0 at isa? port IO_FD1 irq 6 drq 2
device          fd0 at fdc0 drive 0
device          fd1 at fdc0 drive 1
device          scbus
device          da
device          sa
device          cd
device          pass
device          pt
device          ses
device          ahc
device          ciss
device          atkbdc0 at isa? port IO_KBD
device          atkbd0 at atkbdc? irq 1 flags 0x1
device          psm0 at atkbdc? irq 12
device          vga0 at isa?
device          sc0 at isa? flags 0x100
device          npx0 at nexus? port IO_NPX irq 13
device          miibus          # MII bus support
device          xl
device          bge

pseudo-device   loop            # Network loopback
pseudo-device   ether           # Ethernet support
pseudo-device   pty             # Pseudo-ttys (telnet etc)
pseudo-device   bpf             #Berkeley packet filter
pseudo-device   tun
pseudo-device   gif

>Description:
After a short period of heavy disk activity, most I/O operations fail with
EBADF. A page fault is then caught after no more than one minute:

SMP 2 cpus
IdlePTD at phsyical address 0x0033a000
initial pcb at physical address 0x002a54e0
panicstr: page fault
panic messages:
---
Fatal trap 12: page fault while in kernel mode
mp_lock = 01000002; cpuid = 1; lapic.id = 07000000
fault virtual address   = 0x0
fault code              = supervisor write, page not present
instruction pointer     = 0x8:0xc023ffb3
stack pointer           = 0x10:0xff6e8be0
frame pointer           = 0x10:0xff6e8c14
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 4837 (squid)
interrupt mask          = bio <- SMP: XXX
trap number             = 12
panic: page fault
mp_lock = 01000002; cpuid = 1; lapic.id = 07000000
boot() called on cpu#1

syncing disks...
109 109 109 109 109 109 109 32 32 32 32 32 32 32 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19
giving up on 19 buffers
Uptime: 5m31s
xl0: reset didn't complete
xl1: reset didn't complete

dumping to dev #da/0x20001, offset 2097200
[...]

(kgdb) bt
#0  0xc0175a26 in dumpsys ()
#1  0xc01757f7 in boot ()
#2  0xc0175c50 in poweroff_wait ()
#3  0xc02415e0 in trap_fatal ()
#4  0xc0241271 in trap_pfault ()
#5  0xc0240e0f in trap ()
#6  0xc023ffb3 in generic_bzero ()
#7  0xc0201ae3 in ffs_vget ()
#8  0xc01f6795 in ffs_valloc ()
#9  0xc0208fa3 in ufs_makeinode ()
#10 0xc02069a8 in ufs_create ()
#11 0xc02092d9 in ufs_vnoperate ()
#12 0xc01aa4d4 in vn_open ()
#13 0xc01a66d0 in open ()
#14 0xc02418b1 in syscall2 ()
#15 0xc022eefb in Xint0x80_syscall ()
cannot read proc at 0

(kgdb) info all
eax            0x0      0
ecx            0x0      0
edx            0x0      0
ebx            0x0      0
esp            0xff6e8ab0       0xff6e8ab0
ebp            0xff6e8abc       0xff6e8abc
esi            0x0      0
edi            0x68000040       1744830528
eip            0xc0175a26       0xc0175a26
eflags         0x0      0
cs             0x0      0
ss             0x0      0
ds             0x0      0
es             0x0      0
fs             cannot read u area ptr for proc at 0

Sometimes a panic in pmap-related functions also occurs:

Fatal trap 12: page fault while in kernel mode
mp_lock = 00000002; cpuid = 0; lapic.id = 06000000
fault virtual address   = 0xbfc00000
fault code              = supervisor write, page not present
instruction pointer     = 0x8:0xc023d461
stack pointer           = 0x10:0xff685e30
frame pointer           = 0x10:0xff685e3c
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 8523 (cpp0)
interrupt mask          = none <- SMP: XXX
trap number             = 12
panic: page fault
mp_lock = 00000002; cpuid = 0; lapic.id = 06000000
boot() called on cpu#0

syncing disks... 96 87 87 72 71 69 67 66 66 56 53 52 50 48 48 32 31 31 22 21 19 18 17 17 14 12 11 11 3 3
done
Uptime: 5m33s
xl0: reset didn't complete
xl1: reset didn't complete

dumping to dev #da/0x20001, offset 2097200
[...]

(kgdb) bt
#0  0xc0175a26 in dumpsys ()
#1  0xc01757f7 in boot ()
#2  0xc0175c50 in poweroff_wait ()
#3  0xc02415e0 in trap_fatal ()
#4  0xc0241271 in trap_pfault ()
#5  0xc0240e0f in trap ()
#6  0xc023d461 in pmap_qenter ()
#7  0xc0185d56 in pipe_build_write_buffer ()
#8  0xc0185f28 in pipe_direct_write ()
#9  0xc01862ca in pipe_write ()
#10 0xc0184723 in dofilewrite ()
#11 0xc018461a in write ()
#12 0xc02418b1 in syscall2 ()
#13 0xc022eefb in Xint0x80_syscall ()
#14 0x804e900 in ?? ()
#15 0x804a696 in ?? ()
#16 0x804813e in ?? ()

(kgdb) info all
eax            0x0      0
ecx            0x0      0
edx            0x0      0
ebx            0x0      0
esp            0xff685d00       0xff685d00
ebp            0xff685d0c       0xff685d0c
esi            0x0      0
edi            0x0      0
eip            0xc0175a26       0xc0175a26
eflags         0x0      0
cs             0x0      0
ss             0x0      0
ds             0x0      0
es             0x0      0
fs             0x0      0
gs             0x2f     47

>How-To-Repeat:
Heavy I/O activity on Proliant ML350.
>Fix:
Turn off SMP.
>Release-Note:
>Audit-Trail:
>Unformatted: