Date: Sun, 23 Jul 2017 18:44:46 -0700
From: Mark Johnston
To: Eugene Grosbein
Cc: FreeBSD Stable
Subject: Re: stable/11 debugging kernel unable to produce crashdump again
Message-ID: <20170724014445.GA20872@raichu>
In-Reply-To: <59746BD5.5010301@grosbein.net>
List-Id: Production branch of FreeBSD source code

On Sun, Jul 23, 2017 at 04:26:45PM +0700, Eugene Grosbein wrote:
> On 14.01.2017 18:40, Eugene Grosbein wrote:
> >
> >> I suspect that this is because we only stop the scheduler upon a panic
> >> if SMP is configured. Can you retest with the patch below applied?
> >>
> >> Index: sys/kern/kern_shutdown.c
> >> ===================================================================
> >> --- sys/kern/kern_shutdown.c	(revision 312082)
> >> +++ sys/kern/kern_shutdown.c	(working copy)
> >> @@ -713,6 +713,7 @@
> >>  		CPU_CLR(PCPU_GET(cpuid), &other_cpus);
> >>  		stop_cpus_hard(other_cpus);
> >>  	}
> >> +#endif
> >>
> >>  	/*
> >>  	 * Ensure that the scheduler is stopped while panicking, even if panic
> >> @@ -719,7 +720,6 @@
> >>  	 * has been entered from kdb.
> >>  	 */
> >>  	td->td_stopsched = 1;
> >> -#endif
> >>
> >>  	bootopt = RB_AUTOBOOT;
> >>  	newpanic = 0;
> >>
> >>
> >
> > Indeed, my router is uniprocessor system and your patch really solves the problem.
> > Now kernel generates crashdump just fine in case of panic. Please commit the fix, thanks!
>
> Sadly, this time 11.1-STABLE r321371 SMP hangs instead of doing crashdump:

Is this amd64 GENERIC, or something else?

>
> - "call doadump" from DDB prompt works just fine;
> - "shutdown -r now" reboots the system without problems;
> - "sysctl debug.kdb.panic=1" triggers a panic just fine but system hangs just after showing uptime
> instead of continuing with crashdump generation; same if "real" panic occurs.
>
> Same for debug.minidump set to 1 or 0. How do I debug this?

I'm not able to reproduce the problem in bhyve using r321401. Looking
at the code, the culprits might be cngrab(), or one of the
shutdown_post_sync eventhandlers. Since you're apparently able to see
the console output at the time of the panic, I guess it's probably the
latter. Could you try your test with the patch below applied? It'll
print a bunch of "entering post_sync"/"leaving post_sync" messages with
addresses that can be resolved using kgdb. That'll help determine where
we're getting stuck.

Index: sys/sys/eventhandler.h
===================================================================
--- sys/sys/eventhandler.h	(revision 321401)
+++ sys/sys/eventhandler.h	(working copy)
@@ -85,7 +85,11 @@
 			_t = (struct eventhandler_entry_ ## name *)_ep;	\
 			CTR1(KTR_EVH, "eventhandler_invoke: executing %p", \
 			    (void *)_t->eh_func);			\
+			if (strcmp(__STRING(name), "shutdown_post_sync") == 0) \
+				printf("entering post_sync %p\n", (void *)_t->eh_func); \
 			_t->eh_func(_ep->ee_arg , ## __VA_ARGS__);	\
+			if (strcmp(__STRING(name), "shutdown_post_sync") == 0) \
+				printf("leaving post_sync %p\n", (void *)_t->eh_func); \
 			EHL_LOCK((list));				\
 		}							\
 	}								\
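
For anyone who wants to see the tracing idea outside the kernel, here is
a rough userland sketch of what the patch does: compare the stringified
list name and print the handler address on either side of the call. It
is only an illustration, not the kernel code. The names handler_entry,
STRINGIFY, INVOKE_ALL, flush_buffers, stop_devices and trace_demo are
all made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a kernel eventhandler entry. */
typedef void (*handler_fn)(void *arg);

struct handler_entry {
	handler_fn	 func;
	void		*arg;
};

#define STRINGIFY(x)	#x

/*
 * Call every handler in "list".  If the list name is "post_sync",
 * print the handler address before and after the call, in the same
 * spirit as the eventhandler.h patch.  Other names are not traced.
 */
#define INVOKE_ALL(name, list, n) do {					\
	for (size_t _i = 0; _i < (n); _i++) {				\
		const struct handler_entry *_h = &(list)[_i];		\
		assert(_h->func != NULL);				\
		if (strcmp(STRINGIFY(name), "post_sync") == 0)		\
			printf("entering post_sync %p\n",		\
			    (void *)(uintptr_t)_h->func);		\
		_h->func(_h->arg);					\
		if (strcmp(STRINGIFY(name), "post_sync") == 0)		\
			printf("leaving post_sync %p\n",		\
			    (void *)(uintptr_t)_h->func);		\
	}								\
} while (0)

static int calls;

/* Two dummy handlers that only count how often they were called. */
static void
flush_buffers(void *arg)
{
	(void)arg;
	calls++;
}

static void
stop_devices(void *arg)
{
	(void)arg;
	calls++;
}

/* Run both handlers with tracing on; return the number of calls made. */
int
trace_demo(void)
{
	const struct handler_entry handlers[] = {
		{ flush_buffers, NULL },
		{ stop_devices, NULL },
	};

	calls = 0;
	INVOKE_ALL(post_sync, handlers, 2);
	return (calls);
}
```

Calling trace_demo() from a main() prints one entering/leaving pair per
handler. On the real system, a handler that hangs should show up as an
"entering post_sync" line with no matching "leaving" line. Assuming kgdb
behaves like stock gdb here, "info symbol <address>" should map the
printed address back to a function name.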