Date: Wed, 6 Sep 2017 21:30:07 -0700
From: Kári Hreinsson <karihre@gmail.com>
To: freebsd-virtualization@freebsd.org
Subject: (bhyve) Debian vm crashing with kernel panic
Message-ID: <CAJujt5f6QQWyiNPXTmr9p9qkGnyudTiEZyO1WUXaCu6fAQedaA@mail.gmail.com>
Dear all,

I have been experiencing random Linux kernel panics on a Debian virtual machine running under bhyve on FreeBSD 11.1, and believe it may be related to the virtualization environment. I am not an advanced FreeBSD user by any means, which is why I am turning to this mailing list for possible answers, realizing that I could be making some simple errors.

I have two similar (same version and kernel) Debian VMs running on the FreeBSD host. One of them is lightly loaded and runs without any issues; the other is more heavily loaded and experiences kernel panics a few days after booting.

CPU: Intel(R) Xeon(R) CPU E3-1275 v6
Host system: 11.1-RELEASE-p1
VM: Debian 9 (Stretch), kernel 4.9.0-3-amd64 #1 SMP Debian 4.9.30-2+deb9u3

On the FreeBSD side of things I find nothing in any logs under /var/log indicating any problem (perhaps I am not looking in the right places?).

On the Debian side of things an open ssh session got plenty of these leading up to the crash:

kernel:[489300.648296] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:0:14902]

The Debian kern.log file contains this just before the crash:

Sep 6 10:23:59 hostname kernel: [488456.219948] INFO: rcu_sched self-detected stall on CPU
Sep 6 10:23:59 hostname kernel: [488456.220007] 0-...: (5249 ticks this GP) idle=b45/140000000000001/0 softirq=27802459/27802459 fqs=2423
Sep 6 10:23:59 hostname kernel: [488456.220062] (t=5250 jiffies g=10449032 c=10449031 q=319)
Sep 6 10:23:59 hostname kernel: [488456.220093] Task dump for CPU 0:
Sep 6 10:23:59 hostname kernel: [488456.220094] kworker/0:0 R running task 0 14902 2 0x00000008
Sep 6 10:23:59 hostname kernel: [488456.220108] Workqueue: rpciod rpc_async_schedule [sunrpc]
Sep 6 10:23:59 hostname kernel: [488456.220109] ffffffff90713580 ffffffff8faa3bcb 0000000000000000 ffffffff90713580
Sep 6 10:23:59 hostname kernel: [488456.220111] ffffffff8fb7a4b6 ffff8a0bffc18fc0 ffffffff9064a6c0 0000000000000000
Sep 6 10:23:59 hostname kernel: [488456.220112] ffffffff90713580 00000000ffffffff ffffffff8fadee04 0000000000e746a9
Sep 6 10:23:59 hostname kernel: [488456.220113] Call Trace:
Sep 6 10:23:59 hostname kernel: [488456.220114] <IRQ>
Sep 6 10:23:59 hostname kernel: [488456.220116] [<ffffffff8faa3bcb>] ? sched_show_task+0xcb/0x130
Sep 6 10:23:59 hostname kernel: [488456.220118] [<ffffffff8fb7a4b6>] ? rcu_dump_cpu_stacks+0x92/0xb2
Sep 6 10:23:59 hostname kernel: [488456.220119] [<ffffffff8fadee04>] ? rcu_check_callbacks+0x754/0x8a0
Sep 6 10:23:59 hostname kernel: [488456.220121] [<ffffffff8faed0c3>] ? update_wall_time+0x473/0x790
Sep 6 10:23:59 hostname kernel: [488456.220122] [<ffffffff8faf48c0>] ? tick_sched_handle.isra.12+0x50/0x50
Sep 6 10:23:59 hostname kernel: [488456.220124] [<ffffffff8fae5718>] ? update_process_times+0x28/0x50
Sep 6 10:23:59 hostname kernel: [488456.220125] [<ffffffff8faf4890>] ? tick_sched_handle.isra.12+0x20/0x50
Sep 6 10:23:59 hostname kernel: [488456.220125] [<ffffffff8faf48f8>] ? tick_sched_timer+0x38/0x70
Sep 6 10:23:59 hostname kernel: [488456.220126] [<ffffffff8fae60fc>] ? __hrtimer_run_queues+0xdc/0x240
Sep 6 10:23:59 hostname kernel: [488456.220127] [<ffffffff8fae67cc>] ? hrtimer_interrupt+0x9c/0x1a0
Sep 6 10:23:59 hostname kernel: [488456.220128] [<ffffffff90008ba9>] ? smp_apic_timer_interrupt+0x39/0x50
Sep 6 10:23:59 hostname kernel: [488456.220129] [<ffffffff90007ec2>] ? apic_timer_interrupt+0x82/0x90
Sep 6 10:23:59 hostname kernel: [488456.220130] <EOI>
Sep 6 10:23:59 hostname kernel: [488456.220131] [<ffffffff8fac0e11>] ? native_queued_spin_lock_slowpath+0x21/0x190
Sep 6 10:23:59 hostname kernel: [488456.220132] [<ffffffff9000613d>] ? _raw_spin_lock+0x1d/0x20
Sep 6 10:23:59 hostname kernel: [488456.220141] [<ffffffffc047e87a>] ? nfs4_close_done+0xfa/0x400 [nfsv4]
Sep 6 10:23:59 hostname kernel: [488456.220145] [<ffffffffc0493280>] ? nfs4_xdr_dec_open_downgrade+0xf0/0xf0 [nfsv4]
Sep 6 10:23:59 hostname kernel: [488456.220151] [<ffffffffc02fb5f0>] ? __rpc_sleep_on_priority+0x340/0x340 [sunrpc]
Sep 6 10:23:59 hostname kernel: [488456.220155] [<ffffffffc02fb5f0>] ? __rpc_sleep_on_priority+0x340/0x340 [sunrpc]
Sep 6 10:23:59 hostname kernel: [488456.220159] [<ffffffffc02fb61a>] ? rpc_exit_task+0x2a/0x90 [sunrpc]
Sep 6 10:23:59 hostname kernel: [488456.220163] [<ffffffffc02fbf86>] ? __rpc_execute+0x86/0x420 [sunrpc]
Sep 6 10:23:59 hostname kernel: [488456.220164] [<ffffffff8fa90384>] ? process_one_work+0x184/0x410
Sep 6 10:23:59 hostname kernel: [488456.220165] [<ffffffff8fa9065d>] ? worker_thread+0x4d/0x480
Sep 6 10:23:59 hostname kernel: [488456.220166] [<ffffffff8fa90610>] ? process_one_work+0x410/0x410
Sep 6 10:23:59 hostname kernel: [488456.220167] [<ffffffff8fa7bb0a>] ? do_group_exit+0x3a/0xa0
Sep 6 10:23:59 hostname kernel: [488456.220168] [<ffffffff8fa965d7>] ? kthread+0xd7/0xf0
Sep 6 10:23:59 hostname kernel: [488456.220169] [<ffffffff8fa96500>] ? kthread_park+0x60/0x60
Sep 6 10:23:59 hostname kernel: [488456.220170] [<ffffffff900064f5>] ? ret_from_fork+0x25/0x30

This seems to be all I have to go on. This is the first panic I have experienced since upgrading to 11.1. In the past I was experiencing similar panics on 11.0, but the log file output from those seemed different, as the kernel spat out hundreds of errors in the hours leading up to finally crashing. I'm not sure those are relevant, as I was running 11.0 and saw similar but not identical errors this time around, but I can attach that log file if anyone is interested.

The VM startup command is:

bhyve -AHP \
    -s 0:0,hostbridge \
    -s 1:0,lpc \
    -s 2:0,virtio-net,tap0 \
    -s 3:0,virtio-net,tap1 \
    -s 4:0,virtio-blk,/dev/zvol/tank/vms/hostname-root \
    -s 5:0,virtio-blk,/dev/zvol/tank/vms/hostname-scratch \
    -s 6:0,virtio-blk,/dev/zvol/tank/vms/hostname-temp \
    -s 29,fbuf,tcp=127.0.0.1:5900,w=800,h=600 \
    -l com1,/dev/nmdm0A \
    -l bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI.fd \
    -c 2 \
    -m 32G hostname

Anything that could shed some light on this issue would be much appreciated. If I can provide any additional information, please let me know.

Thank you,
Kari Hreinsson