Date: Sat, 26 Mar 2022 15:38:29 +0100
From: Roger Pau Monné <roger.pau@citrix.com>
To: Ze Dupsys <zedupsys@gmail.com>
Cc: <freebsd-xen@freebsd.org>, <buhrow@nfbcal.org>
Subject: Re: ZFS + FreeBSD XEN dom0 panic
Message-ID: <Yj8lZWqeHbD%2BkfOQ@Air-de-Roger>
In-Reply-To: <4da2302b-0745-ea1d-c868-5a8a5fc66b18@gmail.com>
References: <YjipQwBQ/JTo4S6G@Air-de-Roger> <Yji8NZePmovLFhk2@Air-de-Roger>
 <YjxuPF80Z8z0V58t@Air-de-Roger> <abcdae23-eea9-93c3-04da-61b7f79a99e9@gmail.com>
 <YjybrgeORadwBmjP@Air-de-Roger> <088c8222-063a-1db5-da83-a5a0168d66c6@gmail.com>
 <Yj16hdrxawD61mAL@Air-de-Roger> <639f7ce0-8a07-884c-c1cf-8257b9f3d9e8@gmail.com>
 <Yj7YrW9CG2aXT%2BiC@Air-de-Roger> <4da2302b-0745-ea1d-c868-5a8a5fc66b18@gmail.com>

On Sat, Mar 26, 2022 at 02:08:06PM +0200, Ze Dupsys wrote:
> On 2022.03.26. 11:11, Roger Pau Monné wrote:
> >
> > Hm, do you think you could upload (or attach) your
> > /usr/lib/debug/boot/kernel/kernel.debug and provide an updated panic
> > trace using that same exact kernel?
>
> Yes, it is just too big for email attachment.
> Uploaded at: https://files.fm/f/mp3v3qa22
>
> This time I starved Dom0 of RAM (2G) to speed panic up. Panic trace is the
> same.
>
> Trace:
> Fatal trap 12: page fault while in kernel mode
> cpuid = 2; apic id = 04
> fault virtual address   = 0x22710028
> fault code              = supervisor read data, page not present
> instruction pointer     = 0x20:0xffffffff80c6a2b2
> stack pointer           = 0x28:0xfffffe009e486b30
> frame pointer           = 0x28:0xfffffe009e486b30
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 3995 (devmatch)
> trap number             = 12
> panic: page fault
> cpuid = 2
> time = 1648293768
> KDB: stack backtrace:
> #0 0xffffffff80c7c285 at kdb_backtrace+0x65
> #1 0xffffffff80c2e2e1 at vpanic+0x181
> #2 0xffffffff80c2e153 at panic+0x43
> #3 0xffffffff810c8b97 at trap+0xba7
> #4 0xffffffff810c8bef at trap+0xbff
> #5 0xffffffff810c8243 at trap+0x253
> #6 0xffffffff810a0848 at calltrap+0x8
> #7 0xffffffff80c86ed1 at rman_is_region_manager+0x241
> #8 0xffffffff80c3eb41 at sbuf_new_for_sysctl+0x101
> #9 0xffffffff80c3df8c at kernel_sysctl+0x3ec
> #10 0xffffffff80c3e603 at userland_sysctl+0x173
> #11 0xffffffff80c3e44f at sys___sysctl+0x5f
> #12 0xffffffff810c949c at amd64_syscall+0x10c
> #13 0xffffffff810a115b at Xfast_syscall+0xfb
> Uptime: 10m19s

It's weird, because here you get a page fault, but there are also
traces with:

general protection fault while in kernel mode
cpuid = 3; a(d8) Scan for VGA option rom
pic id = 06
instruction pointer     = 0x20:0xffffffff810c5d64
stack pointer           = 0x28:0xfffffe00a20fe990
frame pointer           = 0x28:0xfffffe00a20fe990
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 8998 (devmatch)
trap number             = 9
panic: general protection fault
cpuid = 3
time = 1647416577
KDB: stack backtrace:
#0 0xffffffff80c7ca05 at kdb_backtrace+0x65
#1 0xffffffff80c2ea11 at vpanic+0x181
#2 0xffffffff80c2e883 at panic+0x43
#3 0xffffffff810c9b97 at trap+0xba7
#4 0xffffffff810c907b at trap+0x8b
#5 0xffffffff810a0dc8 at calltrap+0x8
#6 0xffffffff80c83067 at kvprintf+0x1007
#7 0xffffffff80c83df9 at snprintf+0x59
#8 0xffffffff80c8768b at rman_is_region_manager+0x27b
#9 0xffffffff80c3f271 at sbuf_new_for_sysctl+0x101
#10 0xffffffff80c3e6bc at kernel_sysctl+0x3ec
#11 0xffffffff80c3ed33 at userland_sysctl+0x173
#12 0xffffffff80c3eb7f at sys___sysctl+0x5f
#13 0xffffffff810ca49c at amd64_syscall+0x10c
#14 0xffffffff810a16db at Xfast_syscall+0xfb

That shows a general protection fault instead of a page fault.

I've built a hypervisor with debug enabled for you, it's at:

https://people.freebsd.org/~royger/xen-debug

This is the same as the one in ports, just built with debug=y. If you
can place it in /boot/ and change your xen_kernel to:

xen_kernel="/boot/xen-debug"

It might provide some additional info.

I've also noticed that it always seems to be the 'devmatch' process
that triggers the panic.

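As a rough way to compare the two traces above, the frames can be lined up from the syscall entry inwards to see where they diverge. This is only a sketch: the helper names (parse_frames, common_callers) are made up for illustration, and the frames are copied from the two backtraces in this thread.

```python
import re

# Matches KDB backtrace lines such as:
#   #7 0xffffffff80c86ed1 at rman_is_region_manager+0x241
FRAME_RE = re.compile(
    r"#(\d+)\s+(0x[0-9a-fA-F]+)\s+at\s+(\w+)\+(0x[0-9a-fA-F]+)"
)

# Frames from calltrap outwards, copied from the page fault trace.
PAGE_FAULT = """
#6 0xffffffff810a0848 at calltrap+0x8
#7 0xffffffff80c86ed1 at rman_is_region_manager+0x241
#8 0xffffffff80c3eb41 at sbuf_new_for_sysctl+0x101
#9 0xffffffff80c3df8c at kernel_sysctl+0x3ec
#10 0xffffffff80c3e603 at userland_sysctl+0x173
#11 0xffffffff80c3e44f at sys___sysctl+0x5f
#12 0xffffffff810c949c at amd64_syscall+0x10c
#13 0xffffffff810a115b at Xfast_syscall+0xfb
"""

# Frames from calltrap outwards, copied from the general protection fault trace.
GP_FAULT = """
#5 0xffffffff810a0dc8 at calltrap+0x8
#6 0xffffffff80c83067 at kvprintf+0x1007
#7 0xffffffff80c83df9 at snprintf+0x59
#8 0xffffffff80c8768b at rman_is_region_manager+0x27b
#9 0xffffffff80c3f271 at sbuf_new_for_sysctl+0x101
#10 0xffffffff80c3e6bc at kernel_sysctl+0x3ec
#11 0xffffffff80c3ed33 at userland_sysctl+0x173
#12 0xffffffff80c3eb7f at sys___sysctl+0x5f
#13 0xffffffff810ca49c at amd64_syscall+0x10c
#14 0xffffffff810a16db at Xfast_syscall+0xfb
"""


def parse_frames(trace):
    """Return a list of (index, address, symbol, offset) tuples."""
    return [
        (int(m.group(1)), int(m.group(2), 16), m.group(3), int(m.group(4), 16))
        for m in FRAME_RE.finditer(trace)
    ]


def common_callers(trace_a, trace_b):
    """Return the symbols both traces share, starting at the outermost frame.

    Walks both backtraces from the syscall entry inwards and stops at the
    first symbol that differs.
    """
    syms_a = [frame[2] for frame in reversed(parse_frames(trace_a))]
    syms_b = [frame[2] for frame in reversed(parse_frames(trace_b))]
    shared = []
    for a, b in zip(syms_a, syms_b):
        if a != b:
            break
        shared.append(a)
    return shared


if __name__ == "__main__":
    for symbol in common_callers(PAGE_FAULT, GP_FAULT):
        print(symbol)
```

Both traces share the sysctl path down to the frame reported as rman_is_region_manager. They only differ below it: the page fault trace traps right there, while the general protection fault trace traps further down, in kvprintf called from snprintf.
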
>
> cat /tmp/panic.log| sed -Ee 's/^#[0-9]* //' -e 's/ .*//' | xargs addr2line
> -e /usr/lib/debug/boot/kernel/kernel.debug
> /usr/src/sys/kern/subr_kdb.c:443
> /usr/src/sys/kern/kern_shutdown.c:0
> /usr/src/sys/kern/kern_shutdown.c:844
> /usr/src/sys/amd64/amd64/trap.c:944
> /usr/src/sys/amd64/amd64/trap.c:0
> /usr/src/sys/amd64/amd64/trap.c:0
> /usr/src/sys/amd64/amd64/exception.S:292
> /usr/src/sys/kern/subr_rman.c:0

I've been able to get a better trace with gdb and your debug symbols,
which gives:

(gdb) info line *0xffffffff80c6a2b2
Line 1386 of "/usr/src/sys/kern/subr_bus.c" starts at address 0xffffffff80c6a2b2 <device_get_name+18>
   and ends at 0xffffffff80c6a2b6 <device_get_name+22>.
(gdb) info line *0xffffffff80c86ed1
Line 1052 of "/usr/src/sys/kern/subr_rman.c" starts at address 0xffffffff80c86ecc <sysctl_rman+540>
   and ends at 0xffffffff80c86ed5 <sysctl_rman+549>.

So the faulting instruction is in device_get_name(), and frame #7,
which the panic trace labels rman_is_region_manager+0x241, actually
resolves to sysctl_rman().

The page fault happens exactly at:

https://cgit.freebsd.org/src/tree/sys/kern/subr_bus.c?h=stable/13#n1386

Which is called from:

https://cgit.freebsd.org/src/tree/sys/kern/subr_rman.c?h=stable/13#n1052

I'm trying to figure out how the device could be removed or
disconnected from the rman. I will try to create a patch to catch the
device that leaves rman regions when destroyed/removed.

Thanks, Roger.
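
PS: in case it helps with future traces, the gdb lookups above can be scripted instead of typed by hand. The following is a minimal sketch, assuming the panic output is saved unquoted in a text file in the same format as the traces in this thread. The function names (fault_address, frame_addresses, gdb_command) are hypothetical. It only builds the command line and does not run gdb.

```python
import re
import shlex

# Faulting instruction pointer from the trap frame, e.g.
#   instruction pointer = 0x20:0xffffffff80c6a2b2
IP_RE = re.compile(r"instruction pointer\s*=\s*0x[0-9a-fA-F]+:(0x[0-9a-fA-F]+)")

# Return addresses from the KDB backtrace, e.g.
#   #7 0xffffffff80c86ed1 at rman_is_region_manager+0x241
ADDR_RE = re.compile(r"^#\d+\s+(0x[0-9a-fA-F]+)\s+at\s", re.MULTILINE)

# Short excerpt of the page fault trace from this thread.
SAMPLE = """\
instruction pointer = 0x20:0xffffffff80c6a2b2
#6 0xffffffff810a0848 at calltrap+0x8
#7 0xffffffff80c86ed1 at rman_is_region_manager+0x241
"""


def fault_address(panic_log):
    """Return the faulting instruction pointer, or None if not present."""
    match = IP_RE.search(panic_log)
    return match.group(1) if match else None


def frame_addresses(panic_log):
    """Return the backtrace addresses in the order they were printed."""
    return ADDR_RE.findall(panic_log)


def gdb_command(addresses, debug_file):
    """Build a gdb batch invocation running 'info line *ADDR' per address."""
    cmd = ["gdb", "-batch"]
    for addr in addresses:
        cmd += ["-ex", f"info line *{addr}"]
    cmd.append(debug_file)
    return cmd


if __name__ == "__main__":
    addrs = [fault_address(SAMPLE)] + frame_addresses(SAMPLE)
    cmd = gdb_command(addrs, "/usr/lib/debug/boot/kernel/kernel.debug")
    # Print a shell-quoted command line that can be pasted into a terminal.
    print(shlex.join(cmd))
```

The instruction pointer is looked up separately because the faulting function itself (device_get_name here) does not appear as a frame in the KDB backtrace, only in the trap frame.
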