Date: Tue, 8 Dec 2015 11:39:56 -0800 (PST) From: Don Lewis <truckman@FreeBSD.org> To: jhb@freebsd.org Cc: freebsd-current@freebsd.org Subject: Re: panic: sbuf_vprintf called with a NULL sbuf pointer Message-ID: <201512081940.tB8JduPg003988@gw.catspoiler.org> In-Reply-To: <2133429.08nAJynLee@ralph.baldwin.cx>
next in thread | previous in thread | raw e-mail | index | archive | help
On 8 Dec, John Baldwin wrote: > On Monday, December 07, 2015 10:10:51 PM Don Lewis wrote: >> On 2 Dec, John Baldwin wrote: >> > On Wednesday, December 02, 2015 01:25:56 PM Don Lewis wrote: >> >> > If you want to look at this further, try going to frame 16 and dissassembling the >> >> > instructions before the call to see if you can spot which register the first >> >> > parameter (saved in %rdi IIRC) comes from. >> >> >> >> Dump of assembler code for function sbuf_printf: >> >> 0xffffffff80a673e0 <+0>: push %rbp >> >> 0xffffffff80a673e1 <+1>: mov %rsp,%rbp >> >> 0xffffffff80a673e4 <+4>: push %r14 >> >> 0xffffffff80a673e6 <+6>: push %rbx >> >> 0xffffffff80a673e7 <+7>: sub $0x50,%rsp >> >> 0xffffffff80a673eb <+11>: mov %rsi,%r14 >> >> 0xffffffff80a673ee <+14>: mov %rdi,%rbx >> >> 0xffffffff80a673f1 <+17>: mov %r9,-0x38(%rbp) >> >> 0xffffffff80a673f5 <+21>: mov %r8,-0x40(%rbp) >> >> 0xffffffff80a673f9 <+25>: mov %rcx,-0x48(%rbp) >> >> 0xffffffff80a673fd <+29>: mov %rdx,-0x50(%rbp) >> >> 0xffffffff80a67401 <+33>: lea -0x60(%rbp),%rax >> >> 0xffffffff80a67405 <+37>: mov %rax,-0x20(%rbp) >> >> 0xffffffff80a67409 <+41>: lea 0x10(%rbp),%rax >> >> 0xffffffff80a6740d <+45>: mov %rax,-0x28(%rbp) >> >> 0xffffffff80a67411 <+49>: movl $0x30,-0x2c(%rbp) >> >> 0xffffffff80a67418 <+56>: movl $0x10,-0x30(%rbp) >> >> 0xffffffff80a6741f <+63>: mov $0xffffffff8137bdf8,%rdi >> >> 0xffffffff80a67426 <+70>: mov %rbx,%rsi >> >> 0xffffffff80a67429 <+73>: callq 0xffffffff80a66c00 <_assert_sbuf_integrity> >> >> >> >> >> >> 0xffffffff80a237b9 <+825>: jmpq 0xffffffff80a236fd <sigexit+637> >> >> 0xffffffff80a237be <+830>: mov $0xffffffff80fd8ad3,%rsi >> >> 0xffffffff80a237c5 <+837>: xor %eax,%eax >> >> 0xffffffff80a237c7 <+839>: mov %r12,%rdi >> >> 0xffffffff80a237ca <+842>: mov -0x228(%rbp),%rdx >> >> 0xffffffff80a237d1 <+849>: callq 0xffffffff80a673e0 <sbuf_printf> >> >> => 0xffffffff80a237d6 <+854>: inc %r14d >> >> 0xffffffff80a237d9 <+857>: jmpq 0xffffffff80a236fd <sigexit+637> >> > >> > So maybe try 'p $r12' in the corefile_open() frame. >> >> #17 0xffffffff80a237d6 in corefile_open (compress=0, comm=<optimized out>, >> uid=<optimized out>, pid=<optimized out>, td=<optimized out>, >> vpp=<optimized out>, namep=<optimized out>) >> at /usr/src/sys/kern/kern_sig.c:3188 >> 3188 sbuf_printf(&sb, "%s", comm); >> (kgdb) p $r12 >> $1 = 0 > > So it's definitely zero. :( The next step is probably to disassemble the > corefile_open function (ugh) and walk backwards to find where %r12 is read > from. It might be from a local variable on the stack, so then you would > want to examine that memory in the stack and the surrounding memory to see > if there is memory corruption and perhaps if there is anything recognizable > about it (e.g. if the corruption contains some sort of data you recognize, > or if the corruption is bounded by a certain length, etc.). It's a bit of > a shot in the dark though. > > Is this reproducible? No it's not. The only time it happened was when there was a swap timeout, probably because of a lengthy deep recovery on one of the mirrored swap devices. The code in question is: struct sbuf sb; [snip] (void)sbuf_new(&sb, name, MAXPATHLEN, SBUF_FIXEDLEN); [snip] for (i = 0; format[i] != '\0'; i++) { switch (format[i]) { case '%': /* Format character */ i++; switch (format[i]) { [snip] case 'N': /* process name */ sbuf_printf(&sb, "%s", comm); break; &sb is used in a bunch of places, so the compiler probably computes its value once by adding the proper offset to the stack pointer and stashing the result in a register. Since kern.corefile is "%N.core", the failure is happening on the first interation of the loop, so there isn't much opportunity for things to get corrupted. Also, the control flow in this function only depends on the format, so there shouldn't be anything special about a swap timeout vs. a segfault generated core. How is gdb able to print the register contents for an arbitrary stack frame? It's not like this is a SPARC with register windows. Aren't only the final register values when the core dump was generated saved along with the memory contents? Unless a register contents are pushed onto the stack, I would think that if a callee overwrites a register, its contents are gone.
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201512081940.tB8JduPg003988>