From owner-freebsd-current@freebsd.org Wed Dec 9 18:23:45 2015 Return-Path: Delivered-To: freebsd-current@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id CD6999D68F8 for ; Wed, 9 Dec 2015 18:23:45 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 8DC6B100F; Wed, 9 Dec 2015 18:23:45 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from ralph.baldwin.cx (c-73-231-226-104.hsd1.ca.comcast.net [73.231.226.104]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 8F65BB94E; Wed, 9 Dec 2015 13:23:43 -0500 (EST) From: John Baldwin To: Don Lewis Cc: freebsd-current@freebsd.org Subject: Re: panic: sbuf_vprintf called with a NULL sbuf pointer Date: Tue, 08 Dec 2015 12:27:48 -0800 Message-ID: <2246380.LId0aa2Jcy@ralph.baldwin.cx> User-Agent: KMail/4.14.3 (FreeBSD/10.2-STABLE; KDE/4.14.3; amd64; ; ) In-Reply-To: <201512081940.tB8JduPg003988@gw.catspoiler.org> References: <201512081940.tB8JduPg003988@gw.catspoiler.org> MIME-Version: 1.0 Content-Transfer-Encoding: 7Bit Content-Type: text/plain; charset="us-ascii" X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Wed, 09 Dec 2015 13:23:43 -0500 (EST) X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 09 Dec 2015 18:23:45 -0000 On Tuesday, December 08, 2015 11:39:56 AM Don Lewis wrote: > On 8 Dec, John Baldwin wrote: > > On Monday, December 07, 2015 10:10:51 PM Don Lewis wrote: > >> On 2 Dec, John Baldwin wrote: > >> > On Wednesday, December 02, 2015 01:25:56 PM Don Lewis wrote: > >> >> > If you want to look at this further, try going to frame 16 and dissassembling the > >> >> > instructions before the call to see if you can spot which register the first > >> >> > parameter (saved in %rdi IIRC) comes from. > >> >> > >> >> Dump of assembler code for function sbuf_printf: > >> >> 0xffffffff80a673e0 <+0>: push %rbp > >> >> 0xffffffff80a673e1 <+1>: mov %rsp,%rbp > >> >> 0xffffffff80a673e4 <+4>: push %r14 > >> >> 0xffffffff80a673e6 <+6>: push %rbx > >> >> 0xffffffff80a673e7 <+7>: sub $0x50,%rsp > >> >> 0xffffffff80a673eb <+11>: mov %rsi,%r14 > >> >> 0xffffffff80a673ee <+14>: mov %rdi,%rbx > >> >> 0xffffffff80a673f1 <+17>: mov %r9,-0x38(%rbp) > >> >> 0xffffffff80a673f5 <+21>: mov %r8,-0x40(%rbp) > >> >> 0xffffffff80a673f9 <+25>: mov %rcx,-0x48(%rbp) > >> >> 0xffffffff80a673fd <+29>: mov %rdx,-0x50(%rbp) > >> >> 0xffffffff80a67401 <+33>: lea -0x60(%rbp),%rax > >> >> 0xffffffff80a67405 <+37>: mov %rax,-0x20(%rbp) > >> >> 0xffffffff80a67409 <+41>: lea 0x10(%rbp),%rax > >> >> 0xffffffff80a6740d <+45>: mov %rax,-0x28(%rbp) > >> >> 0xffffffff80a67411 <+49>: movl $0x30,-0x2c(%rbp) > >> >> 0xffffffff80a67418 <+56>: movl $0x10,-0x30(%rbp) > >> >> 0xffffffff80a6741f <+63>: mov $0xffffffff8137bdf8,%rdi > >> >> 0xffffffff80a67426 <+70>: mov %rbx,%rsi > >> >> 0xffffffff80a67429 <+73>: callq 0xffffffff80a66c00 <_assert_sbuf_integrity> > >> >> > >> >> > >> >> 0xffffffff80a237b9 <+825>: jmpq 0xffffffff80a236fd > >> >> 0xffffffff80a237be <+830>: mov $0xffffffff80fd8ad3,%rsi > >> >> 0xffffffff80a237c5 <+837>: xor %eax,%eax > >> >> 0xffffffff80a237c7 <+839>: mov %r12,%rdi > >> >> 0xffffffff80a237ca <+842>: mov -0x228(%rbp),%rdx > >> >> 0xffffffff80a237d1 <+849>: callq 0xffffffff80a673e0 > >> >> => 0xffffffff80a237d6 <+854>: inc %r14d > >> >> 0xffffffff80a237d9 <+857>: jmpq 0xffffffff80a236fd > >> > > >> > So maybe try 'p $r12' in the corefile_open() frame. > >> > >> #17 0xffffffff80a237d6 in corefile_open (compress=0, comm=, > >> uid=, pid=, td=, > >> vpp=, namep=) > >> at /usr/src/sys/kern/kern_sig.c:3188 > >> 3188 sbuf_printf(&sb, "%s", comm); > >> (kgdb) p $r12 > >> $1 = 0 > > > > So it's definitely zero. :( The next step is probably to disassemble the > > corefile_open function (ugh) and walk backwards to find where %r12 is read > > from. It might be from a local variable on the stack, so then you would > > want to examine that memory in the stack and the surrounding memory to see > > if there is memory corruption and perhaps if there is anything recognizable > > about it (e.g. if the corruption contains some sort of data you recognize, > > or if the corruption is bounded by a certain length, etc.). It's a bit of > > a shot in the dark though. > > > > Is this reproducible? > > No it's not. The only time it happened was when there was a swap > timeout, probably because of a lengthy deep recovery on one of the > mirrored swap devices. > > The code in question is: > struct sbuf sb; > [snip] > (void)sbuf_new(&sb, name, MAXPATHLEN, SBUF_FIXEDLEN); > [snip] > for (i = 0; format[i] != '\0'; i++) { > switch (format[i]) { > case '%': /* Format character */ > i++; > switch (format[i]) { > [snip] > case 'N': /* process name */ > sbuf_printf(&sb, "%s", comm); > break; > > > &sb is used in a bunch of places, so the compiler probably computes its > value once by adding the proper offset to the stack pointer and stashing > the result in a register. Since kern.corefile is "%N.core", the failure > is happening on the first interation of the loop, so there isn't much > opportunity for things to get corrupted. Also, the control flow in this > function only depends on the format, so there shouldn't be anything > special about a swap timeout vs. a segfault generated core. Yes, r12 is call-safe (IIRC), so I expect it only computes it once as well, but I've sometimes seen the compiler spill local vars onto the stack due to register pressure. That said, I think it is unlikely it would have to spill &sb during the early part of the function. :( > How is gdb able to print the register contents for an arbitrary stack > frame? It's not like this is a SPARC with register windows. Aren't > only the final register values when the core dump was generated saved > along with the memory contents? Unless a register contents are pushed > onto the stack, I would think that if a callee overwrites a register, > its contents are gone. My understanding is that when the compiler saves a register (e.g. on the stack during a function prologue) it also emits DWARF records that indicate where it saved each register. As the stack unwinder walks back up the stack it populates the current "register view" based on the debug info at that frame (e.g. register X is stored at address Y, register Z is gone ( in gdb parlance)). -- John Baldwin