Date: Thu, 21 Jul 2022 08:31:38 -0700
From: Chuck Tuffli <ctuffli@gmail.com>
To: FreeBSD-Current <freebsd-current@freebsd.org>
Subject: bhyve core dump related to llvm 14
Message-ID: <CAKAYmML=qRvGQdbW7cZP_vRBDP99ZkQk9yA9H2N+agfcK-RN1A@mail.gmail.com>
I have a virtual machine used to test the NVMe emulation in bhyve. All of the tests in the VM pass running under FreeBSD 13.1-R, but the same VM running under -current causes bhyve(8) to dump core because of a segmentation fault. git bisect identified the last "good" commit on main as

    cb2ae6163174 sysvsem: Fix a typo

After this commit, there are a half-dozen commits related to merging the llvm project release/14.x.

The core dump is repeatable and consistent. Back traces under lldb look similar to this:

  * thread #22, name = 'vcpu 2', stop reason = signal SIGSEGV: invalid address (fault address: 0xb8)
    * frame #0: 0x0000383eb9fc916b bhyve`pci_nvme_read(ctx=0x000038483ad2d700, vcpu=0, pi=0x0000000000000000, baridx=-188391150, offset=0, size=0) at pci_nvme.c:3035:34
      frame #1: 0x0000384834616280
      frame #2: 0x0000383eb9fc1f7a bhyve`pci_emul_mem_handler(ctx=<unavailable>, vcpu=<unavailable>, dir=<unavailable>, addr=<unavailable>, size=<unavailable>, val=<unavailable>, arg1=0x00003846e5b71600, arg2=0) at pci_emul.c:498:4

In frame 0, pi being NULL causes the core dump, but most of the arguments are invalid / garbage. Looking earlier in the stack, the vcpu value should be 2, the ctx pointer doesn't match, and the value passed as pi isn't NULL.

Poking around in frame 2, I can see that the "direction" is a memory write (dir == MEM_F_WRITE) and the statement being executed is this:

    (*pe->pe_barwrite)(ctx, vcpu, pdi, bidx, offset, size, *val);

Confusingly, the function pointer pe_barwrite is pci_nvme_write(), not pci_nvme_read() where the crash occurs. I've confirmed the fault is in pci_nvme_read() by adding an assert for pi != NULL. This is especially odd because pci_emul_mem_handler() directly calls pci_nvme_read() and pci_nvme_write(). So why does frame 1 exist at all?
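The dispatch described above can be modeled with a short, self-contained C sketch. It is not bhyve's code: struct devinst, struct devemu, mem_handler(), dev_read(), dev_write() and the DIR_* flags are hypothetical stand-ins chosen to mirror the (*pe->pe_barwrite)(...) call in frame 2 and the pi != NULL assert mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a device instance; not bhyve's struct pci_devinst. */
struct devinst {
	uint64_t reg;	/* a single emulated register */
};

/* Hypothetical direction flags, standing in for the MEM_F_* values. */
enum { DIR_READ = 1, DIR_WRITE = 2 };

/* Hypothetical per-device-model callbacks, similar in spirit to pe_barwrite. */
struct devemu {
	uint64_t (*barread)(struct devinst *pi, int baridx, uint64_t offset, int size);
	void (*barwrite)(struct devinst *pi, int baridx, uint64_t offset, int size,
	    uint64_t value);
};

uint64_t
dev_read(struct devinst *pi, int baridx, uint64_t offset, int size)
{
	assert(pi != NULL);	/* same kind of sanity check used to confirm the fault */
	(void)baridx; (void)offset; (void)size;
	return pi->reg;
}

void
dev_write(struct devinst *pi, int baridx, uint64_t offset, int size, uint64_t value)
{
	assert(pi != NULL);
	(void)baridx; (void)offset; (void)size;
	pi->reg = value;
}

/*
 * Dispatch on direction through the per-device function pointers.
 * A write reaches barwrite only; barread is never called for it.
 */
int
mem_handler(const struct devemu *pe, struct devinst *pi, int dir,
    int baridx, uint64_t offset, int size, uint64_t *val)
{
	if (dir == DIR_WRITE)
		(*pe->barwrite)(pi, baridx, offset, size, *val);
	else
		*val = (*pe->barread)(pi, baridx, offset, size);
	return 0;
}
```

For example, calling mem_handler() with DIR_WRITE and *val == 0x1234, then again with DIR_READ, leaves 0x1234 in *val. In this model a write access can never land in dev_read(), which is what makes a SIGSEGV inside pci_nvme_read() during a MEM_F_WRITE access so puzzling.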
Using gdb, the back traces either don't decode at all or look similar to this:

    (gdb) bt
    #0  pci_nvme_read (ctx=0x944c1168700, vcpu=0, pi=0x0, baridx=-1835053270, offset=0, size=0)
        at /poudriere/jails/14-current-amd64/usr/src/usr.sbin/bhyve/pci_nvme.c:3035
    #1  0x000009436891d8e8 in _CurrentRuneLocale () from /lib/libc.so.7
    #2  0x000009436a73ca28 in ?? ()
    #3  0x000009436a73e1c0 in ?? ()
    ...
    #34 0x000009436a747600 in ?? ()
    #35 0x0000093b3e76b088 in pci_de_lpc ()
    #36 0x000009436a716500 in ?? ()
    #37 0x00000944c3196d10 in ?? ()
    #38 0x0000093b3e74501a in pci_emul_mem_handler (ctx=0x9436a7bd670, vcpu=0, dir=<optimized out>, addr=<optimized out>, size=0, val=0x646165725f657469, arg1=0x1, arg2=10185153275136)
        at /poudriere/jails/14-current-amd64/usr/src/usr.sbin/bhyve/pci_emul.c:498

Other random tidbits:
 - disabling compiler optimization (i.e. -O0) for the two files in question (pci_nvme.c and pci_emul.c) makes the core dump go away
 - using the default optimization level but generously sprinkling debug printf everywhere makes the core dump go away

I'm not sure where to go from here and could use some help.

--chuck