Date: Thu, 19 Dec 2019 13:04:00 +0000
From: bugzilla-noreply@freebsd.org
To: virtualization@FreeBSD.org
Subject: [Bug 242724] bhyve: Unkillable processes stuck in 'STOP' state (vmm::vm_handle_suspend())
Message-ID: <bug-242724-27103@https.bugs.freebsd.org/bugzilla/>
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=242724

            Bug ID: 242724
           Summary: bhyve: Unkillable processes stuck in 'STOP' state
                    (vmm::vm_handle_suspend())
           Product: Base System
           Version: CURRENT
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Many People
          Priority: ---
         Component: bhyve
          Assignee: virtualization@FreeBSD.org
          Reporter: aleksandr.fedorov@itglobal.com

On one of our servers there are three unkillable bhyve processes stuck in
the 'STOP' state. All have the same symptoms:

# procstat -kk 9277 | sort -k 4
  PID    TID COMM      TDNAME        KSTACK
 9277 102504 bhyve     blk-1:0:0-0   mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102514 bhyve     blk-1:0:0-1   mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102752 bhyve     blk-1:0:0-2   mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102760 bhyve     blk-1:0:0-3   mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102761 bhyve     blk-1:0:0-4   mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102763 bhyve     blk-1:0:0-5   mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102764 bhyve     blk-1:0:0-6   mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102766 bhyve     blk-1:0:0-7   mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102767 bhyve     blk-3:0-0     mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102768 bhyve     blk-3:0-1     mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102769 bhyve     blk-3:0-2     mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102770 bhyve     blk-3:0-3     mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102771 bhyve     blk-3:0-4     mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102772 bhyve     blk-3:0-5     mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102773 bhyve     blk-3:0-6     mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102774 bhyve     blk-3:0-7     mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102775 bhyve     blk-4:0:0-0   mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102780 bhyve     blk-4:0:0-1   mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102887 bhyve     blk-4:0:0-2   mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102888 bhyve     blk-4:0:0-3   mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102889 bhyve     blk-4:0:0-4   mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102890 bhyve     blk-4:0:0-5   mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102891 bhyve     blk-4:0:0-6   mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102892 bhyve     blk-4:0:0-7   mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102687 bhyve     mevent        mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102941 bhyve     pci-devstat   mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102983 bhyve     pci-reconfig  mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102937 bhyve     rfb           mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
 9277 102991 bhyve     vcpu 0        mi_switch+0xe2 sleepq_timedwait+0x2f msleep_spin_sbt+0xd8 vm_run+0x6c7 vmmdev_ioctl+0x9b2 devfs_ioctl+0xc7 VOP_IOCTL_APV+0x56 vn_ioctl+0x16a devfs_ioctl_f+0x1f kern_ioctl+0x27d sys_ioctl+0x15d amd64_syscall+0x3b0 fast_syscall_common+0x101
 9277 100782 bhyve     vcpu 1        mi_switch+0xe2 thread_suspend_switch+0xd4 thread_single+0x47b sigexit+0x57 postsig+0x2f8 ast+0x327 fast_syscall_common+0x198
 9277 100788 bhyve     vcpu 2        mi_switch+0xe2 sleepq_timedwait+0x2f msleep_spin_sbt+0xd8 vm_run+0x6c7 vmmdev_ioctl+0x9b2 devfs_ioctl+0xc7 VOP_IOCTL_APV+0x56 vn_ioctl+0x16a devfs_ioctl_f+0x1f kern_ioctl+0x27d sys_ioctl+0x15d amd64_syscall+0x3b0 fast_syscall_common+0x101
 9277 101385 bhyve     vcpu 3        mi_switch+0xe2 sleepq_timedwait+0x2f msleep_spin_sbt+0xd8 vm_run+0x6c7 vmmdev_ioctl+0x9b2 devfs_ioctl+0xc7 VOP_IOCTL_APV+0x56 vn_ioctl+0x16a devfs_ioctl_f+0x1f kern_ioctl+0x27d sys_ioctl+0x15d amd64_syscall+0x3b0 fast_syscall_common+0x101
 9277 103402 bhyve     vcpu 4        mi_switch+0xe2 sleepq_timedwait+0x2f msleep_spin_sbt+0xd8 vm_run+0x6c7 vmmdev_ioctl+0x9b2 devfs_ioctl+0xc7 VOP_IOCTL_APV+0x56 vn_ioctl+0x16a devfs_ioctl_f+0x1f kern_ioctl+0x27d sys_ioctl+0x15d amd64_syscall+0x3b0 fast_syscall_common+0x101
 9277 102206 bhyve     vcpu 5        mi_switch+0xe2 sleepq_timedwait+0x2f msleep_spin_sbt+0xd8 vm_run+0x6c7 vmmdev_ioctl+0x9b2 devfs_ioctl+0xc7 VOP_IOCTL_APV+0x56 vn_ioctl+0x16a devfs_ioctl_f+0x1f kern_ioctl+0x27d sys_ioctl+0x15d amd64_syscall+0x3b0 fast_syscall_common+0x101
 9277 102932 bhyve     vtnet-5:0 tx  mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f

# procstat threads 9277 | grep vcpu
 9277 100782 bhyve     vcpu 1    -1  124 stop    -
 9277 100788 bhyve     vcpu 2    -1  123 stop    vmsusp
 9277 101385 bhyve     vcpu 3    -1  123 stop    vmsusp
 9277 102206 bhyve     vcpu 5    -1  122 stop    vmsusp
 9277 102991 bhyve     vcpu 0    -1  124 stop    vmsusp
 9277 103402 bhyve     vcpu 4    -1  123 stop    vmsusp

As you can see, one of the vCPU threads has exited while the rest are
sleeping in kernel mode:

https://svnweb.freebsd.org/base/head/sys/amd64/vmm/vmm.c?view=markup#l1515

The main problem is that once the bhyve process is killed, the threads
driving those vCPUs are gone, so the remaining threads wait in kernel
space forever:

https://svnweb.freebsd.org/base/head/sys/amd64/vmm/vmm.c?view=markup#l1507

I found that the same bug was fixed in SmartOS:

https://github.com/joyent/illumos-joyent/commit/dce228e4331f185347c3e0325cab8a3af72d6410#diff-721b4b2a86bbb8d11c6e01a2995868e6

Unfortunately the site https://smartos.org/bugview/OS-6888 is not
available, so it is impossible to see the detailed description.

Can we apply a similar fix to our bhyve code?

-- 
You are receiving this mail because:
You are the assignee for the bug.