Date: Sat, 04 Jun 2005 03:48:04 +0200
From: Palle Girgensohn <girgen@pingpong.net>
To: freebsd-stable@freebsd.org
Cc: Brendan White <bmwt@caida.org>
Subject: Repeatable crash with 5.4-p1-RELEASE and SMP
Message-ID: <2032FF2A928A89651F1C7843@rambutan.pingpong.net>

Hi!

This is very similar to Brendan White's problem just reported here. My guess
is that it is the very same problem. I've reported the same problem on some
occasions before (although I use amd64, so my postings are to
amd64@freebsd.org).

My system is also a Dell 2850, dual CPUs, 3GB RAM, running amd64 FreeBSD
5.4-p1. It is quite stable (but slow) when running without SMP. When SMP is
on, it crashes within a few hours. High load, around 4. See my postings on
amd64@ for many more details.

Anyway, I have managed to get an automatic reboot and a core dump. Giant
leap for mankind :-) . It looks kind of partly overwritten, though.
According to the Developer's Handbook, the core should be saved *before*
the swap partition is added to the system. I can easily verify that this
is not the case; the swap is "mounted" first. I once again raise the
question of whether PR conf/73834 shouldn't be addressed. Or perhaps my core
dump is quite normal? Doesn't look like it.

In rc.conf, I have:

# kernel crash dumps
dumpdev="/dev/amrd0s2b"
dumpdir="/misc/crash"

Here's the dump. If there is anything else I should extract, please just ask.

# kgdb kernel.debug /misc/crash/vmcore.11
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd".
#0 doadump () at pcpu.h:167
167 __asm __volatile("movq %%gs:0,%0" : "=r" (td));
(kgdb) backtrace
#0 doadump () at pcpu.h:167
#1 0x0000000000000000 in ??
()
#2 0xffffffff80341267 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:410
#3 0xffffffff80341ac6 in panic (fmt=0xffffff007b76d000 " «x{") at /usr/src/sys/kern/kern_shutdown.c:566
#4 0xffffffff804f0f52 in trap_fatal (frame=0xc, eva=18446742976269307904) at /usr/src/sys/amd64/amd64/trap.c:639
#5 0xffffffff804f11ef in trap_pfault (frame=0xffffffffb1d229b0, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:562
#6 0xffffffff804f1457 in trap (frame=
      {tf_rdi = -1097427517200, tf_rsi = -1097440243712, tf_rdx = 1056,
       tf_rcx = 0, tf_r8 = 0, tf_r9 = 0, tf_rax = 1056, tf_rbx = 0,
       tf_rbp = -1098069766144, tf_r10 = 4503599627366400, tf_r11 = 3392,
       tf_r12 = 4, tf_r13 = 4, tf_r14 = -1099313881192,
       tf_r15 = -1097364452848, tf_trapno = 12, tf_addr = 136,
       tf_flags = -1099313881192, tf_err = 0, tf_rip = -2144020582,
       tf_cs = 8, tf_rflags = 66050, tf_rsp = -1311626640, tf_ss = 0})
    at /usr/src/sys/amd64/amd64/trap.c:341
#7 0xffffffff804deb0b in calltrap () at /usr/src/sys/amd64/amd64/exception.S:171
#8 0xffffff007c3900f0 in ?? ()
#9 0xffffff007b76d000 in ?? ()
#10 0x0000000000000420 in ?? ()
#11 0x0000000000000000 in ?? ()
#12 0x0000000000000000 in ?? ()
#13 0x0000000000000000 in ?? ()
#14 0x0000000000000420 in ?? ()
#15 0x0000000000000000 in ?? ()
#16 0xffffff0055f11000 in ?? ()
#17 0x000ffffffffff000 in ?? ()
#18 0x0000000000000d40 in ?? ()
#19 0x0000000000000004 in ?? ()
#20 0x0000000000000004 in ?? ()
#21 0xffffff000bc95f98 in ?? ()
#22 0xffffff007ffb4a10 in ?? ()
#23 0x000000000000000c in ?? ()
#24 0x0000000000000088 in ?? ()
#25 0xffffff000bc95f98 in ?? ()
#26 0x0000000000000000 in ?? ()
#27 0xffffffff8034d79a in thread_fini (mem=0x0, size=0) at /usr/src/sys/kern/kern_thread.c:271
#28 0x0000000000000000 in ?? ()
#29 0x0000000000000001 in ?? ()
#30 0xffffff007ffb4a00 in ?? ()
#31 0xffffff0055f11f98 in ??
()
#32 0xffffffff804d46ff in zone_drain (zone=0x8) at /usr/src/sys/vm/uma_core.c:749
#33 0xffffffff804d22b6 in zone_foreach (zfunc=0xffffffff804d4530 <zone_drain>) at /usr/src/sys/vm/uma_core.c:1494
#34 0xffffffff804d5ec9 in uma_reclaim () at /usr/src/sys/vm/uma_core.c:2623
#35 0xffffffff804cfcac in vm_pageout () at /usr/src/sys/vm/vm_pageout.c:674
#36 0xffffffff8032805c in fork_exit (callout=0xffffffff804cf6b0 <vm_pageout>, arg=0x0, frame=0xffffffffb1d22c50)
    at /usr/src/sys/kern/kern_fork.c:791
#37 0xffffffff804ded0e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:296
#38 0x0000000000000000 in ?? ()
#39 0x0000000000000000 in ?? ()
#40 0x0000000000000001 in ?? ()
#41 0x0000000000000000 in ?? ()
#42 0x0000000000000000 in ?? ()
#43 0x0000000000000000 in ?? ()
#44 0x0000000000000000 in ?? ()
#45 0x0000000000000000 in ?? ()
#46 0x0000000000000000 in ?? ()
#47 0x0000000000000000 in ?? ()
#48 0x0000000000000000 in ?? ()
---Type <return> to continue, or q <return> to quit---
#49 0x0000000000000000 in ?? ()
#50 0x0000000000000000 in ?? ()
#51 0x0000000000000000 in ?? ()
#52 0x0000000000000000 in ?? ()
#53 0x0000000000000000 in ?? ()
#54 0x0000000000000000 in ?? ()
#55 0x0000000000000000 in ?? ()
#56 0x0000000000000000 in ?? ()
#57 0x0000000000000000 in ?? ()
#58 0x0000000000000000 in ?? ()
#59 0x0000000000000000 in ?? ()
#60 0x0000000000000000 in ?? ()
#61 0x0000000000000000 in ?? ()
#62 0x0000000000000000 in ?? ()
#63 0x0000000000000000 in ?? ()
#64 0x0000000000000000 in ?? ()
#65 0x0000000000000000 in ?? ()
#66 0x0000000000000000 in ?? ()
#67 0x0000000000000000 in ?? ()
#68 0x0000000000000000 in ?? ()
#69 0x0000000000000000 in ?? ()
#70 0x000000000095d000 in ?? ()
#71 0xffffffffb1d229b0 in ?? ()
#72 0x0000000000000104 in ?? ()
#73 0x0000000000000000 in ?? ()
#74 0xffffff007b78aba0 in ?? ()
#75 0xffffff007b7af280 in ?? ()
#76 0xffffffffb1d226e8 in ?? ()
#77 0xffffff007b76d000 in ??
()
#78 0xffffffff80355d5c in sched_switch (td=0x0, newtd=0x0, flags=1) at /usr/src/sys/kern/sched_4bsd.c:881
#79 0x0000000000000000 in ?? ()
#80 0x0000000000000000 in ?? ()
#81 0x0000000000000000 in ?? ()
#82 0x0000000000000000 in ?? ()
#83 0x0000000000000000 in ?? ()
#84 0x0000000000000000 in ?? ()
#85 0x0000000000000000 in ?? ()
#86 0x0000000000000000 in ?? ()
#87 0x0000000000000000 in ?? ()
#88 0x0000000000000000 in ?? ()
#89 0x0000000000000000 in ?? ()
#90 0x0000000000000000 in ?? ()
#91 0x0000000000000000 in ?? ()
#92 0x0000000000000000 in ?? ()
#93 0x0000000000000000 in ?? ()
#94 0x0000000000000000 in ?? ()
#95 0x0000000000000000 in ?? ()
#96 0x0000000000000000 in ?? ()
#97 0x0000000000000000 in ?? ()
#98 0x0000000000000000 in ?? ()
#99 0x0000000000000000 in ?? ()
#100 0x0000000000000000 in ?? ()
#101 0x0000000000000000 in ?? ()
#102 0x0000000000000000 in ?? ()
#103 0x0000000000000000 in ?? ()
#104 0x0000000000000000 in ?? ()
#105 0x0000000000000000 in ?? ()
#106 0x0000000000000000 in ?? ()
---Type <return> to continue, or q <return> to quit---
#107 0x0000000000000000 in ?? ()
#108 0x0000000000000000 in ?? ()
#109 0x0000000000000000 in ?? ()
#110 0x0000000000000000 in ?? ()
#111 0x0000000000000000 in ?? ()
#112 0x0000000000000000 in ?? ()
#113 0x0000000000000000 in ?? ()
#114 0x0000000000000000 in ?? ()
#115 0x0000000000000000 in ?? ()
#116 0x0000000000000000 in ?? ()
#117 0x0000000000000000 in ?? ()
#118 0x0000000000000000 in ?? ()
#119 0x0000000000000000 in ?? ()
#120 0x0000000000000000 in ?? ()
#121 0x0000000000000000 in ?? ()
#122 0x0000000000000000 in ?? ()
#123 0x0000000000000000 in ?? ()
#124 0x0000000000000000 in ?? ()
#125 0x0000000000000000 in ?? ()
#126 0x0000000000000000 in ?? ()
#127 0x0000000000000000 in ?? ()
#128 0x0000000000000000 in ?? ()
#129 0x0000000000000000 in ?? ()
#130 0x0000000000000000 in ?? ()
#131 0x0000000000000000 in ?? ()
#132 0x0000000000000000 in ?? ()
#133 0x0000000000000000 in ?? ()
#134 0x0000000000000000 in ??
()
#135 0x0000000000000000 in ?? ()
#136 0x0000000000000000 in ?? ()
#137 0x0000000000000000 in ?? ()
#138 0x0000000000000000 in ?? ()
#139 0x0000000000000000 in ?? ()
#140 0x0000000000000000 in ?? ()
#141 0x0000000000000000 in ?? ()
#142 0x0000000000000000 in ?? ()
#143 0x0000000000000000 in ?? ()
#144 0x0000000000000000 in ?? ()
#145 0x0000000000000000 in ?? ()
#146 0x0000000000000000 in ?? ()
#147 0x0000000000000000 in ?? ()
#148 0x0000000000000000 in ?? ()
#149 0x0000000000000000 in ?? ()
#150 0x0000000000000000 in ?? ()
Cannot access memory at address 0xffffffffb1d23000

$ dmesg
Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD 5.4-RELEASE-p1 #9: Fri Jun 3 22:26:49 CEST 2005
    girgen@melon.pingpong.net:/usr/obj/usr/src/sys/MELON
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(TM) CPU 2.80GHz (2793.01-MHz K8-class CPU)
  Origin = "GenuineIntel" Id = 0xf41 Stepping = 1
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x641d<SSE3,RSVD2>,MON,DS_CPL,CNTX-ID,CX16,<b14>>
  AMD Features=0x20100800<SYSCALL,NX,LM>
real memory = 2147221504 (2047 MB)
avail memory = 2061885440 (1966 MB)
ACPI APIC Table: <DELL PE BKC >
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): APIC ID: 0
 cpu1 (AP): APIC ID: 6
ioapic0: Changing APIC ID to 7
ioapic1: Changing APIC ID to 8
ioapic1: WARNING: intbase 32 != expected base 24
ioapic2: Changing APIC ID to 9
ioapic2: WARNING: intbase 64 != expected base 56
ioapic0 <Version 2.0> irqs 0-23 on motherboard
ioapic1 <Version 2.0> irqs 32-55 on motherboard
ioapic2 <Version 2.0> irqs 64-87 on motherboard
acpi0: <DELL PE BKC> on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at
3.579545MHz> port 0x808-0x80b on acpi0
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI PCI-PCI bridge> at device 2.0 on pci0
pci1: <ACPI PCI bus> on pcib1
pcib2: <ACPI PCI-PCI bridge> at device 0.0 on pci1
pci2: <ACPI PCI bus> on pcib2
amr0: <LSILogic MegaRAID 1.51> mem 0xdfdc0000-0xdfdfffff,0xd80f0000-0xd80fffff irq 46 at device 14.0 on pci2
amr0: <LSILogic PERC 4e/Di> Firmware 516A, BIOS H418, 256MB RAM
pcib3: <ACPI PCI-PCI bridge> at device 0.2 on pci1
pci3: <ACPI PCI bus> on pcib3
pcib4: <ACPI PCI-PCI bridge> at device 4.0 on pci0
pci4: <ACPI PCI bus> on pcib4
pcib5: <ACPI PCI-PCI bridge> at device 5.0 on pci0
pci5: <ACPI PCI bus> on pcib5
pcib6: <ACPI PCI-PCI bridge> at device 0.0 on pci5
pci6: <ACPI PCI bus> on pcib6
em0: <Intel(R) PRO/1000 Network Connection, Version - 1.7.35> port 0xecc0-0xecff mem 0xdfae0000-0xdfafffff irq 64 at device 7.0 on pci6
em0: Ethernet address: 00:11:43:37:a4:9e
em0: Speed:N/A Duplex:N/A
pcib7: <ACPI PCI-PCI bridge> at device 0.2 on pci5
pci7: <ACPI PCI bus> on pcib7
em1: <Intel(R) PRO/1000 Network Connection, Version - 1.7.35> port 0xdcc0-0xdcff mem 0xdf8e0000-0xdf8fffff irq 65 at device 8.0 on pci7
em1: Ethernet address: 00:11:43:37:a4:9f
em1: Speed:N/A Duplex:N/A
pcib8: <ACPI PCI-PCI bridge> at device 6.0 on pci0
pci8: <ACPI PCI bus> on pcib8
pci0: <serial bus, USB> at device 29.0 (no driver attached)
pci0: <serial bus, USB> at device 29.1 (no driver attached)
pci0: <serial bus, USB> at device 29.2 (no driver attached)
pci0: <serial bus, USB> at device 29.7 (no driver attached)
pcib9: <ACPI PCI-PCI bridge> at device 30.0 on pci0
pci9: <ACPI PCI bus> on pcib9
pci9: <display, VGA> at device 13.0 (no driver attached)
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel ICH5 UDMA100 controller> port 0xfc00-0xfc0f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 31.1 on pci0
ata0: channel
#0 on atapci0
ata1: channel #1 on atapci0
fdc0: <floppy drive controller> port 0x3f7,0x3f0-0x3f5 irq 6 drq 2 on acpi0
atkbdc0: <Keyboard controller (i8042)> port 0x64,0x60 irq 1 on acpi0
atkbd0: <AT Keyboard> flags 0x1 irq 1 on atkbdc0
kbd0 at atkbd0
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
orm0: <ISA Option ROMs> at iomem 0xec000-0xeffff,0xce800-0xcf7ff,0xcb000-0xcbfff,0xc0000-0xcafff on isa0
ppc0: cannot reserve I/O port range
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounters tick every 1.000 msec
acd0: CDROM <TEAC CD-ROM CD-224E/K.9A> at ata0-master PIO4
amrd0: <LSILogic MegaRAID logical drive> on amr0
amrd0: 139760MB (286228480 sectors) RAID 5 (optimal)
ses0 at amr0 bus 0 target 6 lun 0
ses0: <PE/PV 1x6 SCSI BP 1.0> Fixed Processor SCSI-2 device
ses0: SAF-TE Compliant Device
SMP: AP CPU #1 Launched!
Mounting root from ufs:/dev/amrd0s2a
WARNING: / was not properly dismounted
WARNING: /misc was not properly dismounted
/misc: mount pending error: blocks 7368 files 5
WARNING: /usr was not properly dismounted
WARNING: /usr/local was not properly dismounted
/usr/local: mount pending error: blocks 204 files 1
WARNING: /var was not properly dismounted
/var: mount pending error: blocks 1344 files 86
WARNING: /var/spool/imap was not properly dismounted
em1: Link is up 100 Mbps Half Duplex
em0: Link is up 1000 Mbps Full Duplex

There is nothing at all in /etc/make.conf. The kernel is GENERIC with SMP
added; I removed USB since I got an interrupt storm and don't need it. I
also removed FireWire.

Diff against GENERIC:

$ diff -u GENERIC MELON
--- GENERIC Tue Apr 12 15:57:01 2005
+++ MELON Fri Jun 3 20:13:03 2005
@@ -20,7 +20,9 @@

 machine amd64
 cpu HAMMER
-ident GENERIC
+ident MELON
+
+makeoptions DEBUG=-g

 # To statically compile in device wiring instead of /boot/device.hints
 #hints "GENERIC.hints" # Default places to look for devices.
@@ -64,10 +66,10 @@

 # Enabling NO_MIXED_MODE gives a performance improvement on some motherboards
 # but does not work with some boards (mostly nVidia chipset based).
-#options NO_MIXED_MODE # Don't penalize working chipsets
+options NO_MIXED_MODE # Don't penalize working chipsets

 # Linux 32-bit ABI support
-options LINPROCFS # Cannot be a module yet.
+#options LINPROCFS # Cannot be a module yet.

 # Bus support. Do not remove isa, even if you have no isa slots
 device acpi
@@ -234,29 +236,23 @@
 # Note that 'bpf' is required for DHCP.
 device bpf # Berkeley packet filter

-# USB support
-device uhci # UHCI PCI->USB interface
-device ohci # OHCI PCI->USB interface
-#device ehci # EHCI PCI->USB interface (USB 2.0)
-device usb # USB Bus (required)
-#device udbp # USB Double Bulk Pipe devices
-device ugen # Generic
-device uhid # "Human Interface Devices"
-device ukbd # Keyboard
-device ulpt # Printer
-device umass # Disks/Mass storage - Requires scbus and da
-device ums # Mouse
-device urio # Diamond Rio 500 MP3 player
-device uscanner # Scanners
-# USB Ethernet, requires mii
-device aue # ADMtek USB Ethernet
-device axe # ASIX Electronics USB Ethernet
-device cdce # Generic USB over Ethernet
-device cue # CATC USB Ethernet
-device kue # Kawasaki LSI USB Ethernet
-device rue # RealTek RTL8150 USB Ethernet
-
-# FireWire support
-device firewire # FireWire bus code
-device sbp # SCSI over FireWire (Requires scbus and da)
-device fwe # Ethernet over FireWire (non-standard!)
+# SMP
+options SMP
+
+# SysV stuff
+# This provides support for System V shared memory.
+#
+options SYSVSHM
+options SYSVSEM
+options SYSVMSG
+options SHMMAXPGS=65536
+options SEMMNI=40
+options SEMMNS=240
+options SEMUME=40
+options SEMMNU=120
+
+# Debug stuff, temporary
+options KDB
+options KDB_TRACE
+options KDB_UNATTENDED
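
A note for anyone reading the backtrace in this message: most of its 150
frames are unresolved "?? ()" entries, so the few frames that carry a symbol
are easy to miss. The following is a minimal sketch, not part of the original
report, that keeps only the frames kgdb resolved to a function name. The
helper name resolved_frames and the file name backtrace.txt are hypothetical,
and it assumes the kgdb output was saved with one frame per line, each
starting with "#<number>".

```shell
#!/bin/sh
# Print only the backtrace frames that kgdb resolved to a symbol.
# Lines that do not start a frame ("#<number> ") are dropped, and so are
# frames reported as "in ??" (no symbol found for that address).
# Hypothetical helper; reads the named files, or stdin if none are given.
resolved_frames() {
    awk '/^#[0-9]+ / && !/ in \?\?/ { print }' "$@"
}

# Usage (backtrace.txt is an assumed file holding the saved kgdb output):
#   resolved_frames backtrace.txt
```

Only the first line of a multi-line frame is printed, so continuation lines
such as the trap frame register dump are left out; that keeps the output
short but means the full frame still has to be looked up in the saved file.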