Date: Mon, 8 Feb 2016 09:10:22 +0000 From: Paul Thornton <prt@prt.org> To: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org> Subject: Troubleshooting panic on 10.2-RELEASE Message-ID: <56B85B7E.9080705@prt.org>
next in thread | raw e-mail | index | archive | help
Hi My Kernel debugging isn't up to troubleshooting what's going on here. We have two out of 10 identical VMs that regularly (between 5 and 20 hours) panic. On one, I have recompiled GENERIC to get a kernel.debug and have crash dumps being recorded. They are running GENERIC kernels with ZFS-on-root (dmesg at the bottom of this mail). All of the other machines work fine, although the ones that are crashing have the highest use at the moment. The panic is lock-related, and going by pid 12 ([intr]) is interrupt related to network tx/rx. Unread portion of the kernel message buffer: Sleeping thread (tid 100024, pid 12) owns a non-sleepable lock KDB: stack backtrace of thread 100024: #0 0xffffffff80952061 at mi_switch+0xe1 #1 0xffffffff8099043a at sleepq_wait+0x3a #2 0xffffffff80951a87 at _sleep+0x287 #3 0xffffffff80a040c4 at filt_bpfread+0x94 #4 0xffffffff809079eb at knote+0xdb #5 0xffffffff80a00acb at catchpacket+0x67b #6 0xffffffff80a00d20 at bpf_mtap+0x1e0 #7 0xffffffff80a0f634 at ether_nh_input+0x174 #8 0xffffffff80a177d2 at netisr_dispatch_src+0x62 #9 0xffffffff80df8fcf at vmxnet3_rxq_eof+0x4ff #10 0xffffffff80df86f1 at vmxnet3_legacy_intr+0xe1 #11 0xffffffff8091482b at intr_event_execute_handlers+0xab #12 0xffffffff80914c76 at ithread_loop+0x96 #13 0xffffffff8091244a at fork_exit+0x9a #14 0xffffffff80d30cde at fork_trampoline+0xe panic: sleeping thread cpuid = 0 So far, out of 5 dumps, I see two patterns emerging. One, like the above, involves vmxnet3, the other (to my untrained eye) looks like a write on a Unix socket. They both share knote. (kgdb) backtrace #0 doadump (textdump=<value optimized out>) at pcpu.h:219 #1 0xffffffff80948642 in kern_reboot (howto=260) at ../../../kern/kern_shutdown.c:451 #2 0xffffffff80948a25 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at ../../../kern/kern_shutdown.c:758 #3 0xffffffff809488b3 in panic (fmt=0x0) at ../../../kern/kern_shutdown.c:687 #4 0xffffffff80995f89 in propagate_priority (td=<value optimized out>) at ../../../kern/subr_turnstile.c:227 #5 0xffffffff80996a0e in turnstile_wait (ts=<value optimized out>, owner=<value optimized out>, queue=<value optimized out>) at ../../../kern/subr_turnstile.c:743 #6 0xffffffff8092ea3b in __mtx_lock_sleep (c=0xfffff80027b63718, tid=18446735277657056416, opts=<value optimized out>, file=<value optimized out>, line=0) at ../../../kern/kern_mutex.c:522 #7 0xffffffff809079a5 in knote (list=0xfffff80027ca50a8, hint=0, lockflags=<value optimized out>) at ../../../kern/kern_event.c:1873 #8 0xffffffff80a00acb in catchpacket (d=<value optimized out>, pkt=<value optimized out>, pktlen=<value optimized out>, snaplen=<value optimized out>, cpfn=<value optimized out>, bt=<value optimized out>) at ../../../net/bpf.c:980 #9 0xffffffff80a00d20 in bpf_mtap (bp=0xfffff800027e3680, m=0xfffff800274d2b00) at ../../../net/bpf.c:2142 #10 0xffffffff80a0f634 in ether_nh_input (m=<value optimized out>) at ../../../net/if_ethersubr.c:520 #11 0xffffffff80a177d2 in netisr_dispatch_src (proto=<value optimized out>, source=<value optimized out>, m=0x0) at ../../../net/netisr.c:976 #12 0xffffffff80df8fcf in vmxnet3_rxq_eof (rxq=0xfffff80002797400) at ../../../dev/vmware/vmxnet3/if_vmx.c:2076 #13 0xffffffff80df86f1 in vmxnet3_legacy_intr (xsc=0xfffff800027c3800) at ../../../dev/vmware/vmxnet3/if_vmx.c:2245 #14 0xffffffff8091482b in intr_event_execute_handlers (p=<value optimized out>, ie=0xfffff800027ea100) at ../../../kern/kern_intr.c:1264 #15 0xffffffff80914c76 in ithread_loop (arg=0xfffff800027ca7a0) at ../../../kern/kern_intr.c:1277 #16 0xffffffff8091244a in fork_exit (callout=0xffffffff80914be0 <ithread_loop>, arg=0xfffff800027ca7a0, frame=0xfffffe00934ccac0) at ../../../kern/kern_fork.c:1018 #17 0xffffffff80d30cde in fork_trampoline () at ../../../amd64/amd64/exception.S:611 #18 0x0000000000000000 in ?? () [From a different panic and reboot] (kgdb) backtrace #0 doadump (textdump=<value optimized out>) at pcpu.h:219 #1 0xffffffff80948642 in kern_reboot (howto=260) at ../../../kern/kern_shutdown.c:451 #2 0xffffffff80948a25 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at ../../../kern/kern_shutdown.c:758 #3 0xffffffff809488b3 in panic (fmt=0x0) at ../../../kern/kern_shutdown.c:687 #4 0xffffffff80995f89 in propagate_priority (td=<value optimized out>) at ../../../kern/subr_turnstile.c:227 #5 0xffffffff80996a0e in turnstile_wait (ts=<value optimized out>, owner=<value optimized out>, queue=<value optimized out>) at ../../../kern/subr_turnstile.c:743 #6 0xffffffff8092ea3b in __mtx_lock_sleep (c=0xfffff8002770de18, tid=18446735278278059168, opts=<value optimized out>, file=<value optimized out>, line=0) at ../../../kern/kern_mutex.c:522 #7 0xffffffff809079a5 in knote (list=0xfffff80027866b80, hint=0, lockflags=<value optimized out>) at ../../../kern/kern_event.c:1873 #8 0xffffffff809bcc35 in sowakeup (so=0xfffff80027866ae0, sb=0xfffff80027866b70) at ../../../kern/uipc_sockbuf.c:189 #9 0xffffffff809cce46 in uipc_send (so=0xfffff8002786fae0, flags=<value optimized out>, m=<value optimized out>, nam=0x0, control=0x0, td=<value optimized out>) at ../../../kern/uipc_usrreq.c:902 #10 0xffffffff809c0526 in sosend_generic (so=0xfffff8002786fae0, addr=0x0, uio=0xfffffe00935fa960, top=<value optimized out>, control=<value optimized out>, flags=<value optimized out>, td=0x0) at ../../../kern/uipc_socket.c:1283 #11 0xffffffff809a4869 in soo_write (fp=<value optimized out>, uio=0xfffffe00935fa960, active_cred=<value optimized out>, flags=<value optimized out>, td=<value optimized out>) at ../../../kern/sys_socket.c:103 #12 0xffffffff8099c647 in dofilewrite (td=0xfffff800276e24a0, fd=7, fp=0xfffff80027871d20, auio=0xfffffe00935fa960, offset=<value optimized out>, flags=0) at file.h:304 #13 0xffffffff8099c378 in kern_writev (td=0xfffff800276e24a0, fd=7, auio=0xfffffe00935fa960) at ../../../kern/sys_generic.c:481 #14 0xffffffff8099c303 in sys_write (td=<value optimized out>, uap=<value optimized out>) at ../../../kern/sys_generic.c:396 #15 0xffffffff80d4b3a7 in amd64_syscall (td=0xfffff800276e24a0, traced=0) at subr_syscall.c:134 #16 0xffffffff80d30a8b in Xfast_syscall () at ../../../amd64/amd64/exception.S:396 #17 0x0000000801a1cc0a in ?? () We haven't seen this on any other 10.2-RELEASE install - either on a VM or physical machine. What's the best way forward to troubleshoot this? Thanks Paul. boot dmesg follows: Copyright (c) 1992-2015 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD is a registered trademark of The FreeBSD Foundation. FreeBSD 10.2-RELEASE-p8 #1: Fri Feb 5 16:22:56 UTC 2016 root@portal.dev:/usr/src/sys/amd64/compile/GENERIC amd64 FreeBSD clang version 3.4.1 (tags/RELEASE_34/dot1-final 208032) 20140512 CPU: Intel(R) Xeon(R) CPU E5-1620 v3 @ 3.50GHz (3500.00-MHz K8-class CPU) Origin="GenuineIntel" Id=0x306f2 Family=0x6 Model=0x3f Stepping=2 Features=0xfa3fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,DTS,MMX,FXSR,SSE,SSE2,SS> Features2=0xfdfa3203<SSE3,PCLMULQDQ,SSSE3,FMA,CX16,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSCDLT,XSAVE,OSXSAVE,AVX,F16C,RDRAND,HV> AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM> AMD Features2=0x21<LAHF,ABM> Structured Extended Features=0x272a<TSCADJ,BMI1,AVX2,BMI2,ERMS,INVPCID,NFPUSG> XSAVE Features=0x1<XSAVEOPT> TSC: P-state invariant Hypervisor: Origin = "VMwareVMware" real memory = 2147483648 (2048 MB) avail memory = 2050248704 (1955 MB) Event timer "LAPIC" quality 600 ACPI APIC Table: <PTLTD APIC > MADT: Forcing active-low polarity and level trigger for SCI ioapic0 <Version 1.1> irqs 0-23 on motherboard random: <Software, Yarrow> initialized kbd1 at kbdmux0 acpi0: <INTEL 440BX> on motherboard acpi0: Power Button (fixed) Timecounter "HPET" frequency 14318180 Hz quality 950 cpu0: <ACPI CPU> on acpi0 attimer0: <AT timer> port 0x40-0x43 irq 0 on acpi0 Timecounter "i8254" frequency 1193182 Hz quality 0 Event timer "i8254" frequency 1193182 Hz quality 100 atrtc0: <AT realtime clock> port 0x70-0x71 irq 8 on acpi0 Event timer "RTC" frequency 32768 Hz quality 0 Timecounter "ACPI-fast" frequency 3579545 Hz quality 900 acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0 pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0 pci0: <ACPI PCI bus> on pcib0 pcib1: <ACPI PCI-PCI bridge> at device 1.0 on pci0 pci1: <ACPI PCI bus> on pcib1 isab0: <PCI-ISA bridge> at device 7.0 on pci0 isa0: <ISA bus> on isab0 atapci0: <Intel PIIX4 UDMA33 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x1060-0x106f at device 7.1 on pci0 ata0: <ATA channel> at channel 0 on atapci0 ata1: <ATA channel> at channel 1 on atapci0 pci0: <bridge> at device 7.3 (no driver attached) vgapci0: <VGA-compatible display> port 0x1070-0x107f mem 0xe8000000-0xefffffff,0xfe000000-0xfe7fffff irq 16 at device 15.0 on pci0 vgapci0: Boot video device pcib2: <ACPI PCI-PCI bridge> at device 17.0 on pci0 pci2: <ACPI PCI bus> on pcib2 pcib3: <ACPI PCI-PCI bridge> at device 21.0 on pci0 pci3: <ACPI PCI bus> on pcib3 vmx0: <VMware VMXNET3 Ethernet Adapter> port 0x4000-0x400f mem 0xfd5fc000-0xfd5fcfff,0xfd5fd000-0xfd5fdfff,0xfd5fe000-0xfd5fffff irq 18 at device 0.0 on pci3 vmx0: Ethernet address: 00:0c:29:e5:ef:c4 pcib4: <ACPI PCI-PCI bridge> at device 21.1 on pci0 pci4: <ACPI PCI bus> on pcib4 pcib5: <ACPI PCI-PCI bridge> at device 21.2 on pci0 pci5: <ACPI PCI bus> on pcib5 pcib6: <ACPI PCI-PCI bridge> at device 21.3 on pci0 pci6: <ACPI PCI bus> on pcib6 pcib7: <ACPI PCI-PCI bridge> at device 21.4 on pci0 pci7: <ACPI PCI bus> on pcib7 pcib8: <ACPI PCI-PCI bridge> at device 21.5 on pci0 pci8: <ACPI PCI bus> on pcib8 pcib9: <ACPI PCI-PCI bridge> at device 21.6 on pci0 pci9: <ACPI PCI bus> on pcib9 pcib10: <ACPI PCI-PCI bridge> at device 21.7 on pci0 pci10: <ACPI PCI bus> on pcib10 pcib11: <ACPI PCI-PCI bridge> at device 22.0 on pci0 pci11: <ACPI PCI bus> on pcib11 vmx1: <VMware VMXNET3 Ethernet Adapter> port 0x5000-0x500f mem 0xfd4fc000-0xfd4fcfff,0xfd4fd000-0xfd4fdfff,0xfd4fe000-0xfd4fffff irq 19 at device 0.0 on pci11 vmx1: Ethernet address: 0a:d1:c0:01:03:c0 pcib12: <ACPI PCI-PCI bridge> at device 22.1 on pci0 pci12: <ACPI PCI bus> on pcib12 pcib13: <ACPI PCI-PCI bridge> at device 22.2 on pci0 pci13: <ACPI PCI bus> on pcib13 pcib14: <ACPI PCI-PCI bridge> at device 22.3 on pci0 pci14: <ACPI PCI bus> on pcib14 pcib15: <ACPI PCI-PCI bridge> at device 22.4 on pci0 pci15: <ACPI PCI bus> on pcib15 pcib16: <ACPI PCI-PCI bridge> at device 22.5 on pci0 pci16: <ACPI PCI bus> on pcib16 pcib17: <ACPI PCI-PCI bridge> at device 22.6 on pci0 pci17: <ACPI PCI bus> on pcib17 pcib18: <ACPI PCI-PCI bridge> at device 22.7 on pci0 pci18: <ACPI PCI bus> on pcib18 pcib19: <ACPI PCI-PCI bridge> at device 23.0 on pci0 pci19: <ACPI PCI bus> on pcib19 mpt0: <LSILogic SAS/SATA Adapter> port 0x6000-0x60ff mem 0xfd3ec000-0xfd3effff,0xfd3f0000-0xfd3fffff irq 16 at device 0.0 on pci19 mpt0: MPI Version=1.5.0.0 pcib20: <ACPI PCI-PCI bridge> at device 23.1 on pci0 pci20: <ACPI PCI bus> on pcib20 pcib21: <ACPI PCI-PCI bridge> at device 23.2 on pci0 pci21: <ACPI PCI bus> on pcib21 pcib22: <ACPI PCI-PCI bridge> at device 23.3 on pci0 pci22: <ACPI PCI bus> on pcib22 pcib23: <ACPI PCI-PCI bridge> at device 23.4 on pci0 pci23: <ACPI PCI bus> on pcib23 pcib24: <ACPI PCI-PCI bridge> at device 23.5 on pci0 pci24: <ACPI PCI bus> on pcib24 pcib25: <ACPI PCI-PCI bridge> at device 23.6 on pci0 pci25: <ACPI PCI bus> on pcib25 pcib26: <ACPI PCI-PCI bridge> at device 23.7 on pci0 pci26: <ACPI PCI bus> on pcib26 pcib27: <ACPI PCI-PCI bridge> at device 24.0 on pci0 pci27: <ACPI PCI bus> on pcib27 vmx2: <VMware VMXNET3 Ethernet Adapter> port 0x7000-0x700f mem 0xfd2fc000-0xfd2fcfff,0xfd2fd000-0xfd2fdfff,0xfd2fe000-0xfd2fffff irq 17 at device 0.0 on pci27 vmx2: Ethernet address: 00:0c:29:e5:ef:d8 pcib28: <ACPI PCI-PCI bridge> at device 24.1 on pci0 pci28: <ACPI PCI bus> on pcib28 pcib29: <ACPI PCI-PCI bridge> at device 24.2 on pci0 pci29: <ACPI PCI bus> on pcib29 pcib30: <ACPI PCI-PCI bridge> at device 24.3 on pci0 pci30: <ACPI PCI bus> on pcib30 pcib31: <ACPI PCI-PCI bridge> at device 24.4 on pci0 pci31: <ACPI PCI bus> on pcib31 pcib32: <ACPI PCI-PCI bridge> at device 24.5 on pci0 pci32: <ACPI PCI bus> on pcib32 pcib33: <ACPI PCI-PCI bridge> at device 24.6 on pci0 pci33: <ACPI PCI bus> on pcib33 pcib34: <ACPI PCI-PCI bridge> at device 24.7 on pci0 pci34: <ACPI PCI bus> on pcib34 acpi_acad0: <AC Adapter> on acpi0 atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0 atkbd0: <AT Keyboard> irq 1 on atkbdc0 kbd0 at atkbd0 atkbd0: [GIANT-LOCKED] psm0: <PS/2 Mouse> irq 12 on atkbdc0 psm0: [GIANT-LOCKED] psm0: model IntelliMouse, device ID 3 orm0: <ISA Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xc8fff,0xc9000-0xc9fff,0xca000-0xcbfff,0xcc000-0xccfff,0xdc000-0xdffff,0xe0000-0xe7fff on isa0 sc0: <System console> at flags 0x100 on isa0 sc0: VGA <16 virtual consoles, flags=0x300> vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0 ppc0: cannot reserve I/O port range acpi_throttle0: <ACPI CPU Throttling> on cpu0 Timecounters tick every 1.000 msec ipfw2 (+ipv6) initialized, divert loadable, nat loadable, default to deny, logging disabled random: unblocking device. da0 at mpt0 bus 0 scbus2 target 0 lun 0 da0: <VMware Virtual disk 1.0> Fixed Direct Access SCSI-2 device da0: 300.000MB/s transfers da0: Command Queueing enabled da0: 40960MB (83886080 512 byte sectors: 255H 63S/T 5221C) da0: quirks=0x40<RETRY_BUSY> Timecounter "TSC-low" frequency 1749998500 Hz quality 1000 Trying to mount root from ufs:/dev/da0p3 [rw]... WARNING: / was not properly dismounted ZFS NOTICE: Prefetch is disabled by default if less than 4GB of RAM is present; to enable, add "vfs.zfs.prefetch_disable=0" to /boot/loader.conf. ZFS filesystem version: 5 ZFS storage pool version: features support (5000)
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?56B85B7E.9080705>