From owner-freebsd-current@freebsd.org Mon Nov 30 00:16:19 2015 Return-Path: Delivered-To: freebsd-current@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B9889A3CA1E for ; Mon, 30 Nov 2015 00:16:19 +0000 (UTC) (envelope-from truckman@FreeBSD.org) Received: from gw.catspoiler.org (unknown [IPv6:2602:304:b010:ef20::f2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "gw.catspoiler.org", Issuer "gw.catspoiler.org" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 83AC51D8F for ; Mon, 30 Nov 2015 00:16:19 +0000 (UTC) (envelope-from truckman@FreeBSD.org) Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.15.2/8.15.2) with ESMTP id tAU0GB2c093099 for ; Sun, 29 Nov 2015 16:16:15 -0800 (PST) (envelope-from truckman@FreeBSD.org) Message-Id: <201511300016.tAU0GB2c093099@gw.catspoiler.org> Date: Sun, 29 Nov 2015 16:16:11 -0800 (PST) From: Don Lewis Subject: panic: sbuf_vprintf called with a NULL sbuf pointer To: freebsd-current@FreeBSD.org MIME-Version: 1.0 Content-Type: TEXT/plain; charset=us-ascii X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 30 Nov 2015 00:16:19 -0000 I got this strange panic the other day: panic: sbuf_vprintf called with a NULL sbuf pointer The machine was running this: FreeBSD zipper.catspoiler.org 11.0-CURRENT FreeBSD 11.0-CURRENT #31 r290224: Sat Oct 31 02:56:36 PDT 2015 dl@zipper.catspoiler.org:/usr/obj/usr/src/sys/GENERIC amd64 The trigger was: swap_pager: indefinite wait buffer: bufobj: 0 blkno: 1737153, size 4096 That triggered an attempt to kill a process and create a core file for it. Where things get strange is at #16 and #17 in the backtrace: #13 0xffffffff80a20676 in kassert_panic ( fmt=0xffffffff8137bf84 "%s called with a NULL sbuf pointer") at /usr/src/sys/kern/kern_shutdown.c:647 #14 0xffffffff80a66c51 in _assert_sbuf_integrity ( fun=0xffffffff8137bdf8 "sbuf_vprintf", s=0x0) at /usr/src/sys/kern/subr_sbuf.c:103 #15 0xffffffff80a6742e in sbuf_vprintf (s=0x0, fmt=, ap=) at /usr/src/sys/kern/subr_sbuf.c:583 #16 sbuf_printf (s=0x0, fmt=0xffffffff80fd8ad3 "%s") at /usr/src/sys/kern/subr_sbuf.c:677 #17 0xffffffff80a237d6 in corefile_open (compress=0, comm=, uid=, pid=, td=, vpp=, namep=) at /usr/src/sys/kern/kern_sig.c:3188 #18 coredump (td=0x0) at /usr/src/sys/kern/kern_sig.c:3337 #19 sigexit (td=0x0, sig=40) at /usr/src/sys/kern/kern_sig.c:2976 #20 0xffffffff80a6fcb8 in userret (td=0xfffff8057b0a59a0, frame=) at /usr/src/sys/kern/subr_trap.c:147 #21 0xffffffff80e68619 in syscallret (td=0xfffff8057b0a59a0, error=, sa=) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:187 #22 amd64_syscall (td=0xfffff8057b0a59a0, traced=0) at /usr/src/sys/amd64/amd64/trap.c:956 #23 0xffffffff80e47a3b in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:394 #24 0x000000080cc9f12a in ?? () In #16, s is 0x0, which is strange because kern_sig.c:3188 is: #17 0xffffffff80a237d6 in corefile_open (compress=0, comm=, uid=, pid=, td=, vpp=, namep=) at /usr/src/sys/kern/kern_sig.c:3188 3188 sbuf_printf(&sb, "%s", comm); I don't see how &sb can be NULL ... Also interesting is: (kgdb) print sb $1 = I don't see how sb can be optimized out since its address is passed to sbuf_printf(). While format is optimized out, the corefilename isn't and format=corefilename (kgdb) print format $2 = (kgdb) print corefilename $3 = "%N.core", '\000' Since this is the first iteration of the loop, and the crash happens on the entry into sbuf_printf()->sbuf_vprintf(), it doesn't look like it could be caused by stack smash. Under other circumstances I've seen other processes dump core, and I can trigger core dumps by sending SIGQUIT to processes ... %sleep 10 & [1] 2456 %kill -QUIT %1 % [1] Quit sleep 10 (core dumped) Another other oddity is td=0x0 in frames 18 and 19. I'm guessing that kgdb is wrong about this because that should cause a NULL pointer dereference panic in sigexit(). Also sig=40 in sigexit() is strange since the highest numbered signal is 33. Lastly, I don't see how we get from userret() to sigexit(). Line 147 in subr_trap.c (userret()) is: WITNESS_WARN(WARN_PANIC, NULL, "userret: returning");