From: Andriy Gapon
To: John Baldwin
Cc: Ryan Stone, "freebsd-hackers@freebsd.org"
Subject: Re: How to get anything useful out of kgdb?
Date: Sat, 3 Oct 2015 00:29:30 +0300
Message-ID: <560EF73A.8050505@FreeBSD.org>
In-Reply-To: <1595419.L0rkNTMkPe@ralph.baldwin.cx>
References: <554E41EE.2010202@ignoranthack.me> <560E238F.9050609@FreeBSD.org> <1595419.L0rkNTMkPe@ralph.baldwin.cx>
List-Id: Technical Discussions relating to FreeBSD

On 02/10/2015 19:12, John Baldwin wrote:
> On Friday, October 02, 2015 09:26:23 AM Andriy Gapon wrote:
>> On 15/05/2015 20:57, Ryan Stone wrote:
>>> *Sigh*, kgdb isn't unwinding the trap frame properly.
>>> You can try this to
>>> figure out where it was running:
>>
>> I wonder, what is a reason for this?
>> Can that be fixed in kgdb itself?
>> It seems that usually kgdb handles trap frames just fine, but not always?
>
> It should be fixable.  If this doesn't work under newer kgdb let me know and I'll
> try to fix it.

Okay, letting you know :-)
The backtraces from the in-tree kgdb and the newer kgdb both abort at the same
frame (output from the newer kgdb is in my message in another kgdb related thread).

> I did fix a few edge cases with special frame handling in the
> newer kgdb though those mostly had to do with fork_trampoline and possibly
> Xtimerint (and aside from fork_trampoline I think the fixes were mostly for i386
> where different handlers setup trapframes differently)
>
>>> That gives you the top of the callstack at the time that the core was
>>> taken.  To get the rest of it, try:
>>>
>>> define trace_stack
>>>   set $frame_ptr=$arg0
>>>   set $iters=0
>>>   while $frame_ptr != 0 && $iters < $arg1
>>>     set $ret_addr=((char*)$frame_ptr) + sizeof(void*)
>>>     printf "frameptr=%p, ret_addr=%p\n", (void*)$frame_ptr, *(void**)$ret_addr
>>>     printf " "
>>>     info line **(void***)$ret_addr
>>>     set $frame_ptr=*(void**)$frame_ptr
>>>     set $iters=$iters+1
>>>   end
>>> end
>>>
>>> trace_stack frame->tf_rbp 20
>>
>> Thank you for this script.
>> Here is an example from my practice.
>>
>> (kgdb) bt
>> #0  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:291
>> #1  0xffffffff8063453f in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:359
>> #2  0xffffffff80634ba4 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:635
>> #3  0xffffffff806348a3 in panic (fmt=0x0) at /usr/src/sys/kern/kern_shutdown.c:568
>> #4  0xffffffff8041bba7 in db_panic (addr=<value optimized out>, have_addr=false, count=0, modif=0x0) at /usr/src/sys/ddb/db_command.c:473
>> #5  0xffffffff8041b67b in db_command (cmd_table=0x0) at /usr/src/sys/ddb/db_command.c:440
>> #6  0xffffffff8041b524 in db_command_loop () at /usr/src/sys/ddb/db_command.c:493
>> #7  0xffffffff8041de0b in db_trap (type=<value optimized out>, code=0) at /usr/src/sys/ddb/db_main.c:251
>> #8  0xffffffff80669de8 in kdb_trap (type=19, code=0, tf=0xffffffff80f976d0) at /usr/src/sys/kern/subr_kdb.c:653
>> #9  0xffffffff80820d26 in trap (frame=0xffffffff80f976d0) at /usr/src/sys/amd64/amd64/trap.c:381
>> #10 0xffffffff80809623 in nmi_calltrap () at /usr/src/sys/libkern/explicit_bzero.c:28
>
> This may be part of the problem.  The trapframe unwinder depends on function names
> to know when it is crossing a trapframe.  nmi_calltrap() is not the function at
> explicit_bzero.c:28.  Usually debugging this sort of thing starts by going to frame 11
> and comparing its registers with the values in the trapframe.  They should match, but
> sometimes you will find them shifted by one or two, etc.

And it seems that nmi_calltrap being a label within an assembler-defined
procedure confuses the in-tree kgdb quite a lot:

(kgdb) list *0xffffffff80809623
0xffffffff80809623 is at /usr/src/sys/libkern/explicit_bzero.c:28.
23      void
24      explicit_bzero(void *buf, size_t len)
25      {
26              memset(buf, 0, len);
27              __explicit_bzero_hook(buf, len);
28      }
(kgdb) list nmi_calltrap
23      void
24      explicit_bzero(void *buf, size_t len)
25      {
26              memset(buf, 0, len);
27              __explicit_bzero_hook(buf, len);
28      }
(kgdb) disassemble nmi_calltrap
Dump of assembler code for function nmi_calltrap:
0xffffffff8080961b :    mov    %rsp,%rdi
0xffffffff8080961e :    callq  0xffffffff80820670
0xffffffff80809623 :    test   %ebx,%ebx
0xffffffff80809625 :    je     0xffffffff80809695
0xffffffff80809627 :    mov    %gs:0x0,%rax
0xffffffff80809630 :    or     %rax,%rax
0xffffffff80809633 :    je     0xffffffff80809695
0xffffffff80809635 :    testl  $0x400000,0xec(%rax)
0xffffffff8080963f :    je     0xffffffff80809695
0xffffffff80809641 :    mov    %rsp,%rsi
0xffffffff80809644 :    mov    $0xc0,%rcx
0xffffffff8080964b :    mov    %gs:0x220,%rdx
0xffffffff80809654 :    sub    %rcx,%rdx
0xffffffff80809657 :    mov    %rdx,%rdi
0xffffffff8080965a :    shr    $0x3,%rcx
0xffffffff8080965e :    cld
0xffffffff8080965f :    rep movsq %ds:(%rsi),%es:(%rdi)
0xffffffff80809662 :    mov    %ss,%eax
0xffffffff80809664 :    push   %rax
0xffffffff80809665 :    push   %rdx
0xffffffff80809666 :    pushfq
0xffffffff80809667 :    mov    %cs,%eax
0xffffffff80809669 :    push   %rax
0xffffffff8080966a :    pushq  $0xffffffff80809671
0xffffffff8080966f :    iretq
End of assembler dump.
(kgdb) disassemble explicit_bzero
Dump of assembler code for function explicit_bzero:
0xffffffff806e74c0 :    push   %rbp
0xffffffff806e74c1 :    mov    %rsp,%rbp
0xffffffff806e74c4 :    push   %r14
0xffffffff806e74c6 :    push   %rbx
0xffffffff806e74c7 :    mov    %rsi,%r14
0xffffffff806e74ca :    mov    %rdi,%rbx
0xffffffff806e74cd :    callq  0xffffffff806e74f0
0xffffffff806e74d2 :    mov    %rbx,%rdi
0xffffffff806e74d5 :    mov    %r14,%rsi
0xffffffff806e74d8 :    callq  0xffffffff8088a2d0 <__explicit_bzero_hook>
0xffffffff806e74dd :    pop    %rbx
0xffffffff806e74de :    pop    %r14
0xffffffff806e74e0 :    pop    %rbp
0xffffffff806e74e1 :    retq
End of assembler dump.
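
To make the mismatch concrete, here is a small Python sketch that only does arithmetic on the addresses from the session above. It is an illustration, not how kgdb resolves symbols: the nearest-symbol rule in nearest_symbol() is a simplifying assumption of mine, and the helper names are hypothetical. The 5-byte callq length is read off the dump (the next instruction starts 5 bytes later).

```python
# Addresses copied from the kgdb session; nothing here talks to a debugger.
SYMBOLS = {
    "explicit_bzero": 0xffffffff806e74c0,
    "nmi_calltrap": 0xffffffff8080961b,
}
EXPLICIT_BZERO_END = 0xffffffff806e74e2  # one byte past the retq at ...74e1
CALLQ_ADDR = 0xffffffff8080961e          # the callq inside nmi_calltrap
FRAME10_PC = 0xffffffff80809623          # pc shown for frame #10
CALLQ_LEN = 5                            # per the dump: next instruction is 5 bytes on


def nearest_symbol(symbols, addr):
    """Return (name, offset) of the closest symbol at or below addr.

    Simplified stand-in for a symbol lookup (an assumption for this
    sketch), not kgdb's actual algorithm.
    """
    below = {name: start for name, start in symbols.items() if start <= addr}
    if not below:
        raise ValueError("no symbol at or below %#x" % addr)
    name = max(below, key=below.get)
    return name, addr - below[name]


def inside_explicit_bzero(addr):
    """True if addr falls within the disassembled body of explicit_bzero."""
    return SYMBOLS["explicit_bzero"] <= addr < EXPLICIT_BZERO_END


if __name__ == "__main__":
    name, offset = nearest_symbol(SYMBOLS, FRAME10_PC)
    print("%#x = %s+%d" % (FRAME10_PC, name, offset))
    print("return address follows the callq:",
          CALLQ_ADDR + CALLQ_LEN == FRAME10_PC)
    print("inside explicit_bzero:", inside_explicit_bzero(FRAME10_PC))
```

So by the symbol table the frame #10 pc is simply the return address of the callq in nmi_calltrap and lies nowhere near the body of explicit_bzero; only the line lookup goes astray.
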

The newer kgdb is smarter about this situation:

(kgdb) list *0xffffffff80809623
0xffffffff80809623 is at /usr/src/sys/amd64/amd64/exception.S:527.
522      *  - Check if the thread requires a user call chain to be
523      *    captured.
524      *
525      * We are still in NMI mode at this point.
526      */
527             testl   %ebx,%ebx
528             jz      nocallchain     /* not from userspace */
529             movq    PCPU(CURTHREAD),%rax
530             orq     %rax,%rax       /* curthread present? */
531             jz      nocallchain

However, that does not seem to help with stack unwinding.

-- 
Andriy Gapon