Date: Thu, 1 Jun 2017 15:53:36 +0200
From: Raimo Niskanen
To: freebsd-questions@freebsd.org
Subject: Re: Advice on kernel panics
Message-ID: <20170601135336.GD2256@erix.ericsson.se>
References: <20170529092043.GA89682@erix.ericsson.se>
In-Reply-To: <20170529092043.GA89682@erix.ericsson.se>

Hello again.

I gave too little detail in my original post; this concerns a Dell
PowerEdge R320 server with the motherboard disk controller and a ZFS-only
install. The dmesg output is at the end of this mail.

On Mon, May 29, 2017 at 11:20:43AM +0200, Raimo Niskanen wrote:
> Hello list.
>
> I have a server that panics about every 3 days and need some advice on how
> to handle that.
>
> It currently has 7 dumps in /var/crash/, head of the latest core.txt.4
> looks like this:
>
>
> =======
> sasquatch.otp.ericsson.se dumped core - see /var/crash/vmcore.4
>
> Mon May 29 03:15:32 CEST 2017
>
> FreeBSD sasquatch.otp.ericsson.se 10.3-RELEASE-p18 FreeBSD 10.3-RELEASE-p18
> #0: Tue Apr 11 10:31:00 UTC 2017
> root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
>
> panic: page fault
>
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd"...
>
> Unread portion of the kernel message buffer:
>
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address = 0x0
> fault code = supervisor write data, page not present
> instruction pointer = 0x20:0xffffffff809fb017
> stack pointer = 0x28:0xfffffe04673a18c0
> frame pointer = 0x28:0xfffffe04673a1900
> code segment = base 0x0, limit 0xfffff, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 18 (syncer)
> trap number = 12
> panic: page fault
> cpuid = 0
> KDB: stack backtrace:
> #0 0xffffffff8098e7e0 at kdb_backtrace+0x60
> #1 0xffffffff809514b6 at vpanic+0x126
> #2 0xffffffff80951383 at panic+0x43
> #3 0xffffffff80d5646b at trap_fatal+0x36b
> #4 0xffffffff80d5676d at trap_pfault+0x2ed
> #5 0xffffffff80d55dea at trap+0x47a
> #6 0xffffffff80d3bdb2 at calltrap+0x8
> #7 0xffffffff809f9b23 at vfs_msync+0x203
> #8 0xffffffff809fb858 at sync_fsync+0x108
> #9 0xffffffff80e81ed7 at VOP_FSYNC_APV+0xa7
> #10 0xffffffff809fc27b at sched_sync+0x3ab
> #11 0xffffffff8091a93a at fork_exit+0x9a
> #12 0xffffffff80d3c2ee at fork_trampoline+0xe
> Uptime: 2d19h53m15s
> =======
>
>
> What sticks out later in core.txt.4 is the fstat section that contains a
> lot of errors, but I can not tell if that is just a secondary symptom...
>
> Looks like this:
> =======
> fstat
>
> fstat: can't read file 1 at 0x200007fffffffff
> fstat: can't read file 2 at 0x4000000001fffff
> fstat: can't read znode_phys at 0x1
> fstat: can't read znode_phys at 0x1
> fstat: can't read znode_phys at 0x1
> :
> USER  CMD   PID    FD    MOUNT  INUM  MODE   SZ|DV  R/W
> root  sed   78401  root  -      -     error  -
> root  sed   78401  wd    -      -     error  -
> root  sed   78401  text  -      -     error  -
> root  sed   78401  0*    pipe fffff8001800f000 <-> fffff8001800f160 0 rw
> root  grep  78400  root  -      -     error  -
> root  grep  78400  wd    -      -     error  -
> root  grep  78400  text  -      -     error  -
> :
> =======
>
> To me the other core.txt.? files does not look exactly the same. All have
> an fstat section with many errors, though.
>
> Does anyone have some advice on how to proceed?
> --

Copyright (c) 1992-2016 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 10.3-RELEASE-p18 #0: Tue Apr 11 10:31:00 UTC 2017
    root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
FreeBSD clang version 3.4.1 (tags/RELEASE_34/dot1-final 208032) 20140512
CPU: Intel(R) Xeon(R) CPU E5-2407 v2 @ 2.40GHz (2400.06-MHz K8-class CPU)
  Origin="GenuineIntel" Id=0x306e4 Family=0x6 Model=0x3e Stepping=4
  Features=0xbfebfbff
  Features2=0x7fbee3ff
  AMD Features=0x2c100800
  AMD Features2=0x1
  Structured Extended Features=0x281
  XSAVE Features=0x1
  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID,VID,PostIntr
  TSC: P-state invariant, performance statistics
real memory = 12884901888 (12288 MB)
avail memory = 12380942336 (11807 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table:
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s)
 cpu0 (BSP): APIC ID: 0
 cpu1 (AP): APIC ID: 2
 cpu2 (AP): APIC ID: 4
 cpu3 (AP): APIC ID: 6
random: initialized
ioapic1: Changing APIC ID to 1
ioapic0 irqs 0-23 on motherboard
ioapic1 irqs 32-55 on motherboard
kbd1 at kbdmux0
acpi0: on motherboard
acpi0: Power Button (fixed)
cpu0: on acpi0
cpu1: on acpi0
cpu2: on acpi0
cpu3: on acpi0
atrtc0: port 0x70-0x7f irq 8 on acpi0
Event timer "RTC" frequency 32768 Hz quality 0
attimer0: port 0x40-0x5f irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
hpet0: iomem 0xfed00000-0xfed003ff on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 950
Event timer "HPET" frequency 14318180 Hz quality 550
Event timer "HPET1" frequency 14318180 Hz quality 440
Event timer "HPET2" frequency 14318180 Hz quality 440
Event timer "HPET3" frequency 14318180 Hz quality 440
Event timer "HPET4" frequency 14318180 Hz quality 440
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
pcib0: port 0xcf8-0xcff on acpi0
pci0: on pcib0
pcib1: irq 53 at device 1.0 on pci0
pci1: on pcib1
pcib2: irq 53 at device 3.0 on pci0
pci8: on pcib2
bge0: mem 0xd90a0000-0xd90affff,0xd90b0000-0xd90bffff,0xd90c0000-0xd90cffff irq 48 at device 0.0 on pci8
bge0: APE FW version: NCSI v1.2.33.0
bge0: CHIP ID 0x05720000; ASIC REV 0x5720; CHIP REV 0x57200; PCI-E
miibus0: on bge0
brgphy0: PHY 1 on miibus0
brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
bge0: Using defaults for TSO: 65518/35/2048
bge0: Ethernet address: 00:0a:f7:52:b1:1a
bge1: mem 0xd90d0000-0xd90dffff,0xd90e0000-0xd90effff,0xd90f0000-0xd90fffff irq 52 at device 0.1 on pci8
bge1: APE FW version: NCSI v1.2.33.0
bge1: CHIP ID 0x05720000; ASIC REV 0x5720; CHIP REV 0x57200; PCI-E
miibus1: on bge1
brgphy1: PHY 2 on miibus1
brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
bge1: Using defaults for TSO: 65518/35/2048
bge1: Ethernet address: 00:0a:f7:52:b1:1b
pcib3: irq 16 at device 17.0 on pci0
pci9: on pcib3
pci0: at device 22.0 (no driver attached)
pci0: at device 22.1 (no driver attached)
ehci0: mem 0xde8fd000-0xde8fd3ff irq 23 at device 26.0 on pci0
usbus0: EHCI version 1.0
usbus0 on ehci0
pcib4: at device 28.0 on pci0
pci10: on pcib4
pcib5: irq 16 at device 28.4 on pci0
pci2: on pcib5
bge2: mem 0xd91a0000-0xd91affff,0xd91b0000-0xd91bffff,0xd91c0000-0xd91cffff irq 16 at device 0.0 on pci2
bge2: APE FW version: NCSI v1.2.33.0
bge2: CHIP ID 0x05720000; ASIC REV 0x5720; CHIP REV 0x57200; PCI-E
miibus2: on bge2
brgphy2: PHY 1 on miibus2
brgphy2: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
bge2: Using defaults for TSO: 65518/35/2048
bge2: Ethernet address: c8:1f:66:bc:10:cd
bge3: mem 0xd91d0000-0xd91dffff,0xd91e0000-0xd91effff,0xd91f0000-0xd91fffff irq 17 at device 0.1 on pci2
bge3: APE FW version: NCSI v1.2.33.0
bge3: CHIP ID 0x05720000; ASIC REV 0x5720; CHIP REV 0x57200; PCI-E
miibus3: on bge3
brgphy3: PHY 2 on miibus3
brgphy3: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
bge3: Using defaults for TSO: 65518/35/2048
bge3: Ethernet address: c8:1f:66:bc:10:ce
pcib6: irq 19 at device 28.7 on pci0
pci3: on pcib6
pcib7: at device 0.0 on pci3
pci4: on pcib7
pcib8: at device 0.0 on pci4
pci5: on pcib8
pcib9: at device 0.0 on pci5
pci6: on pcib9
vgapci0: mem 0xd8000000-0xd8ffffff,0xddffc000-0xddffffff,0xdd000000-0xdd7fffff irq 19 at device 0.0 on pci6
vgapci0: Boot video device
pcib10: at device 1.0 on pci4
pci7: on pcib10
ehci1: mem 0xde8fe000-0xde8fe3ff irq 22 at device 29.0 on pci0
usbus1: EHCI version 1.0
usbus1 on ehci1
pcib11: at device 30.0 on pci0
pci11: on pcib11
isab0: at device 31.0 on pci0
isa0: on isab0
ahci0: port 0xfce8-0xfcef,0xfcf8-0xfcfb,0xfcf0-0xfcf7,0xfcfc-0xfcff,0xfcc0-0xfcdf mem 0xde8ff000-0xde8ff7ff irq 20 at device 31.2 on pci0
ahci0: AHCI v1.30 with 6 3Gbps ports, Port Multiplier not supported
ahcich0: at channel 0 on ahci0
ahcich1: at channel 1 on ahci0
ahcich2: at channel 2 on ahci0
ahcich3: at channel 3 on ahci0
ahcich4: at channel 4 on ahci0
ahciem0: on ahci0
pcib12: on acpi0
pci63: on pcib12
pcib13: on acpi0
pci127: on pcib13
uart1: <16550 or compatible> port 0x2f8-0x2ff irq 3 on acpi0
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
orm0: at iomem 0xc0000-0xc7fff,0xec000-0xeffff on isa0
sc0: at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
ppc0: cannot reserve I/O port range
est0: on cpu0
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 1d4d00001800
device_attach: est0 attach returned 6
est1: on cpu1
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 1d4d00001800
device_attach: est1 attach returned 6
est2: on cpu2
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 1d4d00001800
device_attach: est2 attach returned 6
est3: on cpu3
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 1d4d00001800
device_attach: est3 attach returned 6
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
Timecounters tick every 1.000 msec
random: unblocking device.
usbus0: 480Mbps High Speed USB v2.0
usbus1: 480Mbps High Speed USB v2.0
ugen0.1: at usbus0
uhub0: on usbus0
ugen1.1: at usbus1
uhub1: on usbus1
ses0 at ahciem0 bus 0 scbus5 target 0 lun 0
ses0: SEMB S-E-S 2.00 device
ses0: SEMB SES Device
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: ATA8-ACS SATA 2.x device
ada0: Serial Number WD-WMAYP8034312
ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 476940MB (976773168 512 byte sectors)
ada0: Previously was known as ad4
ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
ada1: ACS-2 ATA SATA 3.x device
ada1: Serial Number WD-WCC4JDU1EVHN
ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 953869MB (1953525168 512 byte sectors)
ada1: quirks=0x1<4K>
ada1: Previously was known as ad6
cd0 at ahcich4 bus 0 scbus4 target 0 lun 0
cd0: Removable CD-ROM SCSI device
cd0: Serial Number S1596YBF3001M9
cd0: 150.000MB/s transfers (SATA 1.x, UDMA5, ATAPI 12bytes, PIO 8192bytes)
cd0: Attempt to query device size failed: NOT READY, Medium not present - tray closed
SMP: AP CPU #2 Launched!
SMP: AP CPU #1 Launched!
SMP: AP CPU #3 Launched!
Timecounter "TSC-low" frequency 1200028244 Hz quality 1000
GEOM_MIRROR: Device mirror/swap launched (2/2).
Root mount waiting for: usbus1 usbus0
uhub1: 2 ports with 2 removable, self powered
uhub0: 2 ports with 2 removable, self powered
Root mount waiting for: usbus1 usbus0
ugen1.2: at usbus1
uhub2: on usbus1
ugen0.2: at usbus0
uhub3: on usbus0
Root mount waiting for: usbus1 usbus0
uhub3: 6 ports with 6 removable, self powered
uhub2: 8 ports with 8 removable, self powered
Root mount waiting for: usbus1 usbus0
ugen0.3: at usbus0
uhub4: on usbus0
ugen1.3: at usbus1
uhub5: on usbus1
uhub5: 4 ports with 4 removable, self powered
uhub4: 6 ports with 6 removable, self powered
Root mount waiting for: usbus1 usbus0
ugen0.4: at usbus0
ukbd0: on usbus0
kbd0 at ukbd0
ugen1.4: at usbus1
ukbd1: on usbus1
kbd2 at ukbd1
Trying to mount root from zfs:zroot/ROOT/default []...

--

/ Raimo Niskanen, Erlang/OTP, Ericsson AB
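
One way to check whether the seven dumps share the same panic would be to
pull the panic string and the KDB backtrace out of each core.txt.N and
compare them. A minimal Python sketch, assuming every core.txt.N uses the
same layout as the core.txt.4 excerpt quoted above; the function names
summarize_core_txt and summarize_crash_dir are made up for illustration:

```python
import re
from pathlib import Path

# Matches KDB frames such as "#7 0xffffffff809f9b23 at vfs_msync+0x203".
# Assumption: all dumps use this frame format, as in the quoted excerpt.
FRAME_RE = re.compile(r"#\d+\s+0x[0-9a-f]+\s+at\s+(\S+)\+0x[0-9a-f]+")


def summarize_core_txt(text):
    """Return (panic message, function names of the first KDB backtrace)."""
    panic = None
    frames = []
    in_trace = False
    for line in text.splitlines():
        if panic is None and "panic: " in line:
            panic = line.split("panic: ", 1)[1].strip()
        if "KDB: stack backtrace:" in line:
            if frames:
                break  # only the first backtrace is of interest
            in_trace = True
            continue
        if in_trace:
            match = FRAME_RE.search(line)
            if match is None:
                break  # first non-frame line ends the backtrace
            frames.append(match.group(1))
    return panic, frames


def summarize_crash_dir(crash_dir="/var/crash"):
    """Map each core.txt.N file name to its (panic, frames) summary."""
    result = {}
    for path in sorted(Path(crash_dir).glob("core.txt.*")):
        result[path.name] = summarize_core_txt(path.read_text(errors="replace"))
    return result
```

Printing the result of summarize_crash_dir() side by side for all dumps
would show whether each panic goes through vfs_msync in the syncer, or
whether the faulting path differs from dump to dump.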