Date: Thu, 17 Aug 2017 13:03:40 +0100
From: Kaya Saman <kayasaman@gmail.com>
To: freebsd-questions <freebsd-questions@freebsd.org>
Subject: Upgrade to 11.1 from 10.3 causing complete system freeze / hang
Message-ID: <c6bd6cc3-8549-521e-f891-47a396a61475@gmail.com>
Hi,

I've just been upgrading my systems to 11.1-RELEASE from 10.3, and so far all seems to have gone well bar one system in particular. I will try to explain as clearly as possible to avoid confusion.

This issue seems to be restricted to two systems only. Both run identical SuperMicro Celeron Mini-ITX system boards, so from a hardware perspective they are the same, but their configurations are quite different.

System 1: ZFS on root, running 2x jails and 2x (non-root) hard drives configured as iSCSI targets. This machine uses a lagg interface and 2x VLANs in an 802.1q trunk.

<--- after the upgrade this system hung initially, but after a hard power-off/on it has now been up for a few days without any issues

System 2: UFS on root and 3x zpools configured over 4 hard drives, exported through NFS. This system also uses a lagg interface, but no VLANs.

<--- this system seems to hang quite a bit (every few hours or less, with tuning in /boot/loader.conf and /etc/sysctl.conf!)

An SSH session running 'top' shows this before the hang:

last pid: 40393;  load averages: 0.31, 0.22, 0.22  up 0+03:25:51  03:50:51
53 processes: 1 running, 52 sleeping
CPU: 0.0% user, 0.0% nice, 1.1% system, 0.1% interrupt, 98.8% idle
Mem: 2028K Active, 291M Inact, 11M Laundry, 7321M Wired, 249M Buf, 159M Free
ARC: 5654M Total, 664M MFU, 4568M MRU, 48K Anon, 99M Header, 322M Other
     4980M Compressed, 5237M Uncompressed, 1.05:1 Ratio
Swap: 2327M Total, 33M Used, 2293M Free, 1% Inuse

From the above, the system looks absolutely fine: no abnormal load, processor or memory usage.

The dmesg output of the system is as follows:

Copyright (c) 1992-2017 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 11.1-RELEASE-p1 #0: Wed Aug 9 11:55:48 UTC 2017
    root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
FreeBSD clang version 4.0.0 (tags/RELEASE_400/final 297347) (based on LLVM 4.0.0)
VT(vga): resolution 640x480
CPU: Intel(R) Celeron(R) CPU J1900 @ 1.99GHz (2000.05-MHz K8-class CPU)
  Origin="GenuineIntel" Id=0x30678 Family=0x6 Model=0x37 Stepping=8
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x41d8e3bf<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,MOVBE,POPCNT,TSCDLT,RDRAND>
  AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
  AMD Features2=0x101<LAHF,Prefetch>
  Structured Extended Features=0x2282<TSCADJ,SMEP,ERMS,NFPUSG>
  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
  TSC: P-state invariant, performance statistics
real memory = 8589934592 (8192 MB)
avail memory = 8106274816 (7730 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <SUPERM SMCI--MB>
WARNING: L1 data cache covers less APIC IDs than a core
0 < 1
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s)
random: unblocking device.
ACPI BIOS Warning (bug): 32/64X length mismatch in FADT/Gpe0Block: 128/32 (20170303/tbfadt-748)
ioapic0 <Version 2.0> irqs 0-86 on motherboard
SMP: AP CPU #2 Launched!
SMP: AP CPU #3 Launched!
SMP: AP CPU #1 Launched!
Timecounter "TSC" frequency 2000049240 Hz quality 1000
random: entropy device external interface
kbd1 at kbdmux0
netmap: loaded module
module_register_init: MOD_LOAD (vesa, 0xffffffff80f5b220, 0) error 19
random: registering fast source Intel Secure Key RNG
random: fast provider: "Intel Secure Key RNG"
nexus0
vtvga0: <VT VGA driver> on motherboard
cryptosoft0: <software crypto> on motherboard
acpi0: <SUPERM SMCI--MB> on motherboard
acpi0: Power Button (fixed)
unknown: I/O range not supported
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
cpu2: <ACPI CPU> on acpi0
cpu3: <ACPI CPU> on acpi0
atrtc0: <AT realtime clock> port 0x70-0x77 on acpi0
atrtc0: Warning: Couldn't map I/O.
Event timer "RTC" frequency 32768 Hz quality 0
hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff irq 8 on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 950
Event timer "HPET" frequency 14318180 Hz quality 450
Event timer "HPET1" frequency 14318180 Hz quality 440
Event timer "HPET2" frequency 14318180 Hz quality 440
attimer0: <AT timer> port 0x40-0x43,0x50-0x53 irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
Timecounter "ACPI-safe" frequency 3579545 Hz quality 850
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pcib0: _OSC returned error 0x10
pci0: <ACPI PCI bus> on pcib0
vgapci0: <VGA-compatible display> port 0xe080-0xe087 mem 0x90000000-0x903fffff,0x80000000-0x8fffffff irq 16 at device 2.0 on pci0
vgapci0: Boot video device
ahci0: <AHCI SATA controller> port 0xe070-0xe077,0xe060-0xe063,0xe050-0xe057,0xe040-0xe043,0xe020-0xe03f mem 0x90a06000-0x90a067ff irq 19 at device 19.0 on pci0
ahci0: AHCI v1.30 with 2 3Gbps ports, Port Multiplier not supported
ahcich1: <AHCI channel> at channel 1 on ahci0
pci0: <encrypt/decrypt> at device 26.0 (no driver attached)
hdac0: <Intel BayTrail HDA Controller> mem 0x90a00000-0x90a03fff irq 22 at device 27.0 on pci0
pcib1: <ACPI PCI-PCI bridge> irq 16 at device 28.0 on pci0
pcib1: [GIANT-LOCKED]
pcib2: <ACPI PCI-PCI bridge> irq 18 at device 28.2 on pci0
pcib2: [GIANT-LOCKED]
pci1: <ACPI PCI bus> on pcib2
igb0: <Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k> port 0xd000-0xd01f mem 0x90900000-0x9097ffff,0x90980000-0x90983fff irq 18 at device 0.0 on pci1
igb0: Using MSIX interrupts with 5 vectors
igb0: Ethernet address: 0c:c4:7a:b0:60:92
igb0: Bound queue 0 to cpu 0
igb0: Bound queue 1 to cpu 1
igb0: Bound queue 2 to cpu 2
igb0: Bound queue 3 to cpu 3
igb0: netmap queues/slots: TX 4/1024, RX 4/1024
pcib3: <ACPI PCI-PCI bridge> irq 19 at device 28.3 on pci0
pcib3: [GIANT-LOCKED]
pci2: <ACPI PCI bus> on pcib3
pcib4: <ACPI PCI-PCI bridge> mem 0x90800000-0x90803fff irq 19 at device 0.0 on pci2
pci3: <ACPI PCI bus> on pcib4
pcib5: <PCI-PCI bridge> irq 16 at device 1.0 on pci3
pci4: <PCI bus> on pcib5
igb1: <Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k> port 0xc000-0xc01f mem 0x90700000-0x9077ffff,0x90780000-0x90783fff irq 16 at device 0.0 on pci4
igb1: Using MSIX interrupts with 5 vectors
igb1: Ethernet address: 0c:c4:7a:b0:60:93
igb1: Bound queue 0 to cpu 0
igb1: Bound queue 1 to cpu 1
igb1: Bound queue 2 to cpu 2
igb1: Bound queue 3 to cpu 3
igb1: netmap queues/slots: TX 4/1024, RX 4/1024
pcib6: <PCI-PCI bridge> irq 17 at device 2.0 on pci3
pci5: <PCI bus> on pcib6
pcib7: <PCI-PCI bridge> irq 18 at device 3.0 on pci3
pci6: <PCI bus> on pcib7
ahci1: <Marvell 88SE9230 AHCI SATA controller> port 0xb050-0xb057,0xb040-0xb043,0xb030-0xb037,0xb020-0xb023,0xb000-0xb01f mem 0x90610000-0x906107ff irq 18 at device 0.0 on pci6
ahci1: AHCI v1.20 with 8 6Gbps ports, Port Multiplier not supported
ahci1: quirks=0x900<NOBSYRES,ALTSIG>
ahcich2: <AHCI channel> at channel 0 on ahci1
ahcich3: <AHCI channel> at channel 1 on ahci1
ahcich4: <AHCI channel> at channel 2 on ahci1
ahcich5: <AHCI channel> at channel 3 on ahci1
ahcich6: <AHCI channel> at channel 4 on ahci1
ahcich7: <AHCI channel> at channel 5 on ahci1
ahcich8: <AHCI channel> at channel 6 on ahci1
ahcich9: <AHCI channel> at channel 7 on ahci1
ehci0: <Intel BayTrail USB 2.0 controller> mem 0x90a05000-0x90a053ff irq 23 at device 29.0 on pci0
usbus0: EHCI version 1.0
usbus0 on ehci0
usbus0: 480Mbps High Speed USB v2.0
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
acpi_button0: <Power Button> on acpi0
acpi_button1: <Sleep Button> on acpi0
acpi_tz0: <Thermal Zone> on acpi0
uart0: <16950 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart2: <16950 or compatible> port 0x3e0-0x3e7 irq 3 on acpi0
uart3: <16950 or compatible> port 0x3e8-0x3ef irq 4 on acpi0
uart4: <16950 or compatible> port 0x2e0-0x2e7 irq 3 on acpi0
orm0: <ISA Option ROM> at iomem 0xd2000-0xd2fff on isa0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
fdc0: <Enhanced floppy controller> at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0
ppc0: cannot reserve I/O port range
est0: <Enhanced SpeedStep Frequency Control> on cpu0
est1: <Enhanced SpeedStep Frequency Control> on cpu1
est2: <Enhanced SpeedStep Frequency Control> on cpu2
est3: <Enhanced SpeedStep Frequency Control> on cpu3
Timecounters tick every 1.000 msec
nvme cam probe device init
hdacc0: <Realtek ALC888 HDA CODEC> at cad 0 on hdac0
hdaa0: <Realtek ALC888 Audio Function Group> at nid 1 on hdacc0
pcm0: <Realtek ALC888 (Front Analog)> at nid 27 and 25 on hdaa0
pcm1: <Realtek ALC888 (Internal Digital)> at nid 17 on hdaa0
hdacc1: <Intel (0x2882) HDA CODEC> at cad 2 on hdac0
hdaa1: <Intel (0x2882) Audio Function Group> at nid 1 on hdacc1
hdaa1: hdaa_audio_as_parse: Duplicate pin 0 (5) in association 1! Disabling association.
pcm2: <Intel (0x2882) (HDMI/DP 8ch)> at nid 6 on hdaa1
ugen0.1: <Intel EHCI root HUB> at usbus0
uhub0: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus0
ada0 at ahcich1 bus 0 scbus0 target 0 lun 0
ada0: <INTEL SSDSA2M040G2GC 2CV102HB> ATA-7 SATA 2.x device
ada0: Serial Number CVGB007000G3040GGN
ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 38166MB (78165360 512 byte sectors)
ada0: quirks=0x1<4K>
ada1 at ahcich2 bus 0 scbus1 target 0 lun 0
ada1: <WDC WD60EZRX-00MVLB1 80.00A80> ACS-2 ATA SATA 3.x device
ada1: Serial Number WD-WX21D947NY6S
ada1: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 5723166MB (11721045168 512 byte sectors)
ada1: quirks=0x1<4K>
ada2 at ahcich3 bus 0 scbus2 target 0 lun 0
ada2: <WDC WD2001FASS-00U0B0 01.00101> ATA8-ACS SATA 2.x device
ada2: Serial Number WD-WMAUR0169440
ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada2: Command Queueing enabled
ada2: 1907729MB (3907029168 512 byte sectors)
ada3 at ahcich4 bus 0 scbus3 target 0 lun 0
ada3: <HGST HDS724040ALE640 MJAOA580> ATA8-ACS SATA 3.x device
ada3: Serial Number PK2334PBG4GTDT
ada3: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada3: Command Queueing enabled
ada3: 3815447MB (7814037168 512 byte sectors)
ada4 at ahcich5 bus 0 scbus4 target 0 lun 0
ada4: <WDC WD2001FASS-00U0B0 01.00101> ATA8-ACS SATA 2.x device
ada4: Serial Number WD-WMAUR0169605
ada4: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada4: Command Queueing enabled
ada4: 1907729MB (3907029168 512 byte sectors)
pass5 at ahcich9 bus 0 scbus8 target 0 lun 0
pass5: <Marvell Console 1.01> Removable Processor SCSI device
pass5: Serial Number HKDP221516WL
pass5: 150.000MB/s transfers (SATA 1.x, UDMA4, ATAPI 12bytes, PIO 8192bytes)
Trying to mount root from ufs:/dev/ada0s1a [rw]...
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
uhub0: 8 ports with 8 removable, self powered
ugen0.2: <vendor 0x8087 product 0x07e6> at usbus0
uhub1 on uhub0
uhub1: <vendor 0x8087 product 0x07e6, class 9/0, rev 2.00/0.14, addr 2> on usbus0
uhub1: 4 ports with 4 removable, self powered
ugen0.3: <vendor 0x0409 product 0x005a> at usbus0
uhub2 on uhub1
uhub2: <vendor 0x0409 product 0x005a, class 9/0, rev 2.00/1.00, addr 3> on usbus0
uhub2: 4 ports with 4 removable, self powered
ugen0.4: <vendor 0x046a product 0x002f> at usbus0
ukbd0 on uhub2
ukbd0: <vendor 0x046a product 0x002f, class 0/0, rev 2.00/1.00, addr 4> on usbus0
kbd2 at ukbd0
lagg0: link state changed to DOWN
ums0 on uhub2
ums0: <vendor 0x046a product 0x002f, class 0/0, rev 2.00/1.00, addr 4> on usbus0
ums0: 3 buttons and [XY] coordinates ID=0
igb0: link state changed to UP
lagg0: link state changed to UP
igb1: link state changed to UP

uname -a output:

11.1-RELEASE-p1 FreeBSD 11.1-RELEASE-p1 #0: Wed Aug 9 11:55:48 UTC 2017 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64

I'm not sure what more information I could provide, but this issue only started after the upgrade, so I'm wondering if it's a bug, perhaps something that has already been reported?

Unfortunately, after the system becomes totally unresponsive I don't see any error messages on the local terminal, and after reboot there is no core dump, so I have no idea what's going on.

Would anyone be able to offer any advice?

Thanks.

Kaya