Date:      Sat, 3 Oct 2015 00:29:30 +0300
From:      Andriy Gapon <avg@FreeBSD.org>
To:        John Baldwin <jhb@FreeBSD.org>
Cc:        Ryan Stone <rysto32@gmail.com>, "freebsd-hackers@freebsd.org" <freebsd-hackers@FreeBSD.org>
Subject:   Re: How to get anything useful out of kgdb?
Message-ID:  <560EF73A.8050505@FreeBSD.org>
In-Reply-To: <1595419.L0rkNTMkPe@ralph.baldwin.cx>
References:  <554E41EE.2010202@ignoranthack.me> <CAFMmRNyM6Tc7P8rLJmMSVXOFkK4Tc0OCOtc=E9dLEtzKrEtjLg@mail.gmail.com> <560E238F.9050609@FreeBSD.org> <1595419.L0rkNTMkPe@ralph.baldwin.cx>

On 02/10/2015 19:12, John Baldwin wrote:
> On Friday, October 02, 2015 09:26:23 AM Andriy Gapon wrote:
>> On 15/05/2015 20:57, Ryan Stone wrote:
>>> *Sigh*,  kgdb isn't unwinding the trap frame properly.  You can try this to
>>> figure out where it was running:
>>
>> I wonder, what is the reason for this?
>> Can that be fixed in kgdb itself?
>> It seems that usually kgdb handles trap frames just fine, but not always?
> 
> It should be fixable.  If this doesn't work under newer kgdb let me know and I'll
> try to fix it.

Okay, letting you know :-)
The backtraces from the in-tree kgdb and the newer kgdb both abort at the same
frame (the output from the newer kgdb is in my message in another kgdb-related
thread).

> I did fix a few edge cases with special frame handling in the
> newer kgdb though those mostly had to do with fork_trampoline and possibly
> Xtimerint (and aside from fork_trampoline I think the fixes were mostly for i386
> where different handlers set up trapframes differently)
> 
>>> That gives you the top of the callstack at the time that the core was
>>> taken.  To get the rest of it, try:
>>>
>>> define trace_stack
>>>   set $frame_ptr=$arg0
>>>   set $iters=0
>>>   while $frame_ptr != 0 && $iters < $arg1
>>>     set $ret_addr=((char*)$frame_ptr) + sizeof(void*)
>>>     printf "frameptr=%p, ret_addr=%p\n", (void*)$frame_ptr, *(void**)$ret_addr
>>>     printf "    "
>>>     info line **(void***)$ret_addr
>>>     set $frame_ptr=*(void**)$frame_ptr
>>>     set $iters=$iters+1
>>>   end
>>> end
>>>
>>> trace_stack frame->tf_rbp 20
>>
>> Thank you for this script.
>> Here is an example from my practice.
>>
>> (kgdb) bt
>> #0  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:291
>> #1  0xffffffff8063453f in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:359
>> #2  0xffffffff80634ba4 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:635
>> #3  0xffffffff806348a3 in panic (fmt=0x0) at /usr/src/sys/kern/kern_shutdown.c:568
>> #4  0xffffffff8041bba7 in db_panic (addr=<value optimized out>, have_addr=false, count=0, modif=0x0) at /usr/src/sys/ddb/db_command.c:473
>> #5  0xffffffff8041b67b in db_command (cmd_table=0x0) at /usr/src/sys/ddb/db_command.c:440
>> #6  0xffffffff8041b524 in db_command_loop () at /usr/src/sys/ddb/db_command.c:493
>> #7  0xffffffff8041de0b in db_trap (type=<value optimized out>, code=0) at /usr/src/sys/ddb/db_main.c:251
>> #8  0xffffffff80669de8 in kdb_trap (type=19, code=0, tf=0xffffffff80f976d0) at /usr/src/sys/kern/subr_kdb.c:653
>> #9  0xffffffff80820d26 in trap (frame=0xffffffff80f976d0) at /usr/src/sys/amd64/amd64/trap.c:381
>> #10 0xffffffff80809623 in nmi_calltrap () at /usr/src/sys/libkern/explicit_bzero.c:28
> 
> This may be part of the problem.  The trapframe unwinder depends on function names
> to know when it is crossing a trapframe.  nmi_calltrap() is not the function at
> explicit_bzero.c:28.  Usually debugging this sort of thing starts by going to frame 11
> and comparing its registers with the values in the trapframe.  They should match, but
> sometimes you will find them shifted by one or two, etc.

And it seems that nmi_calltrap being a label within an assembler-defined
procedure confuses the in-tree kgdb quite a lot:

(kgdb) list  *0xffffffff80809623
0xffffffff80809623 is at /usr/src/sys/libkern/explicit_bzero.c:28.
23      void
24      explicit_bzero(void *buf, size_t len)
25      {
26              memset(buf, 0, len);
27              __explicit_bzero_hook(buf, len);
28      }
(kgdb) list nmi_calltrap
23      void
24      explicit_bzero(void *buf, size_t len)
25      {
26              memset(buf, 0, len);
27              __explicit_bzero_hook(buf, len);
28      }
(kgdb) disassemble nmi_calltrap
Dump of assembler code for function nmi_calltrap:
0xffffffff8080961b <nmi_calltrap+0>:    mov    %rsp,%rdi
0xffffffff8080961e <nmi_calltrap+3>:    callq  0xffffffff80820670 <trap>
0xffffffff80809623 <nmi_calltrap+8>:    test   %ebx,%ebx
0xffffffff80809625 <nmi_calltrap+10>:   je     0xffffffff80809695 <nocallchain>
0xffffffff80809627 <nmi_calltrap+12>:   mov    %gs:0x0,%rax
0xffffffff80809630 <nmi_calltrap+21>:   or     %rax,%rax
0xffffffff80809633 <nmi_calltrap+24>:   je     0xffffffff80809695 <nocallchain>
0xffffffff80809635 <nmi_calltrap+26>:   testl  $0x400000,0xec(%rax)
0xffffffff8080963f <nmi_calltrap+36>:   je     0xffffffff80809695 <nocallchain>
0xffffffff80809641 <nmi_calltrap+38>:   mov    %rsp,%rsi
0xffffffff80809644 <nmi_calltrap+41>:   mov    $0xc0,%rcx
0xffffffff8080964b <nmi_calltrap+48>:   mov    %gs:0x220,%rdx
0xffffffff80809654 <nmi_calltrap+57>:   sub    %rcx,%rdx
0xffffffff80809657 <nmi_calltrap+60>:   mov    %rdx,%rdi
0xffffffff8080965a <nmi_calltrap+63>:   shr    $0x3,%rcx
0xffffffff8080965e <nmi_calltrap+67>:   cld
0xffffffff8080965f <nmi_calltrap+68>:   rep movsq %ds:(%rsi),%es:(%rdi)
0xffffffff80809662 <nmi_calltrap+71>:   mov    %ss,%eax
0xffffffff80809664 <nmi_calltrap+73>:   push   %rax
0xffffffff80809665 <nmi_calltrap+74>:   push   %rdx
0xffffffff80809666 <nmi_calltrap+75>:   pushfq
0xffffffff80809667 <nmi_calltrap+76>:   mov    %cs,%eax
0xffffffff80809669 <nmi_calltrap+78>:   push   %rax
0xffffffff8080966a <nmi_calltrap+79>:   pushq  $0xffffffff80809671
0xffffffff8080966f <nmi_calltrap+84>:   iretq
End of assembler dump.
(kgdb) disassemble explicit_bzero
Dump of assembler code for function explicit_bzero:
0xffffffff806e74c0 <explicit_bzero+0>:  push   %rbp
0xffffffff806e74c1 <explicit_bzero+1>:  mov    %rsp,%rbp
0xffffffff806e74c4 <explicit_bzero+4>:  push   %r14
0xffffffff806e74c6 <explicit_bzero+6>:  push   %rbx
0xffffffff806e74c7 <explicit_bzero+7>:  mov    %rsi,%r14
0xffffffff806e74ca <explicit_bzero+10>: mov    %rdi,%rbx
0xffffffff806e74cd <explicit_bzero+13>: callq  0xffffffff806e74f0 <memset>
0xffffffff806e74d2 <explicit_bzero+18>: mov    %rbx,%rdi
0xffffffff806e74d5 <explicit_bzero+21>: mov    %r14,%rsi
0xffffffff806e74d8 <explicit_bzero+24>: callq  0xffffffff8088a2d0 <__explicit_bzero_hook>
0xffffffff806e74dd <explicit_bzero+29>: pop    %rbx
0xffffffff806e74de <explicit_bzero+30>: pop    %r14
0xffffffff806e74e0 <explicit_bzero+32>: pop    %rbp
0xffffffff806e74e1 <explicit_bzero+33>: retq
End of assembler dump.
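
Note that the symbol lookup itself looks right in the dumps above (the address
is reported as nmi_calltrap+8); it is only the source-line mapping that points
at explicit_bzero.c.  To illustrate the symbol half, here is a minimal Python
sketch of a "nearest preceding symbol" lookup using the addresses from the
disassembly.  The table and the resolve() helper are made up for illustration;
this is not how kgdb is actually implemented.

```python
import bisect

# Symbol start addresses taken from the kgdb output in this message.
SYMBOLS = sorted([
    (0xffffffff806e74c0, "explicit_bzero"),
    (0xffffffff806e74f0, "memset"),
    (0xffffffff8080961b, "nmi_calltrap"),
    (0xffffffff80809695, "nocallchain"),
    (0xffffffff80820670, "trap"),
    (0xffffffff8088a2d0, "__explicit_bzero_hook"),
])
ADDRS = [addr for addr, _ in SYMBOLS]


def resolve(addr):
    """Return (symbol, offset) for the closest symbol at or below addr."""
    i = bisect.bisect_right(ADDRS, addr) - 1
    if i < 0:
        return None  # address is below every known symbol
    base, name = SYMBOLS[i]
    return name, addr - base


name, off = resolve(0xffffffff80809623)
print("%s+%d" % (name, off))  # prints nmi_calltrap+8
```

So a lookup keyed on symbol addresses alone would attribute the return address
to nmi_calltrap, which suggests the bogus file and line come from a separate
line-table lookup.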


The newer kgdb is smarter about this situation:

(kgdb) list  *0xffffffff80809623
0xffffffff80809623 is at /usr/src/sys/amd64/amd64/exception.S:527.
522              * - Check if the thread requires a user call chain to be
523              *   captured.
524              *
525              * We are still in NMI mode at this point.
526              */
527             testl   %ebx,%ebx
528             jz      nocallchain     /* not from userspace */
529             movq    PCPU(CURTHREAD),%rax
530             orq     %rax,%rax       /* curthread present? */
531             jz      nocallchain

However, that does not seem to help with stack unwinding.
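
For anyone reading this in the archives: the trace_stack macro quoted earlier
just follows the chain of saved frame pointers, reading the return address one
word above each saved %rbp.  Below is a rough Python sketch of the same walk
over a simulated stack.  The memory contents and addresses are invented for
illustration, and it assumes 8-byte pointers and code built with frame
pointers.

```python
WORD = 8  # sizeof(void *) on amd64 (assumption for this sketch)


def trace_stack(memory, frame_ptr, max_iters):
    """Follow saved frame pointers; return (frame_ptr, ret_addr) pairs."""
    frames = []
    iters = 0
    while frame_ptr != 0 and iters < max_iters:
        # The return address is stored one word above the saved frame pointer.
        ret_addr = memory[frame_ptr + WORD]
        frames.append((frame_ptr, ret_addr))
        # The word at frame_ptr is the caller's saved frame pointer.
        frame_ptr = memory[frame_ptr]
        iters += 1
    return frames


# Hypothetical stack image: address -> 8-byte word stored there.
memory = {
    0x1000: 0x1040, 0x1008: 0xAAAA,
    0x1040: 0x1080, 0x1048: 0xBBBB,
    0x1080: 0x0,    0x1088: 0xCCCC,
}

for fp, ra in trace_stack(memory, 0x1000, 20):
    print("frameptr=%#x, ret_addr=%#x" % (fp, ra))
```

The iteration limit plays the same role as $arg1 in the macro: it keeps a
corrupted or cyclic frame chain from looping forever.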

-- 
Andriy Gapon


