Date: Sat, 31 Mar 2018 11:27:46 +1100
From: Andrew Reilly
To: Jonathan Looney
Cc: FreeBSD Current, Warner Losh, jtl@freebsd.org
Subject: Re: 12-Current panics on boot (didn't a week ago.)
Message-ID: <20180331002746.GA2466@Zen.ac-r.nu>
References: <20180324035653.GA3411@Zen.ac-r.nu> <20180324232206.GA2457@Zen.ac-r.nu> <20180325032110.GA10881@Zen.ac-r.nu>
List-Id: Discussions about the use of FreeBSD-current

Hi Jonathan, all,

I've just compiled and booted a kernel derived from current-GENERIC
but with nooptions TCP_BLACKBOX, and much to my surprise it boots.

Possible link to network-related activities is that the next line of
boot output that was not being displayed during the crash is:

[ath_hal] loaded

That's vaguely network-shaped: could it be an issue?

Please let me know if there's anything else that I could test or
poke, in order to find the real culprit.

My make.conf says:

KERNCONF=ZEN
WRKDIRPREFIX=/usr/obj/ports
MALLOC_PRODUCTION=yes

My /usr/src/sys/amd64/conf/ZEN says:

include GENERIC
nooptions TCP_BLACKBOX

Uname -a says:

FreeBSD Zen.ac-r.nu 12.0-CURRENT FreeBSD 12.0-CURRENT #0 r331768M: Sat Mar 31 10:47:52 AEDT 2018 root@Zen:/usr/obj/usr/src/amd64.amd64/sys/ZEN amd64

Cheers,

Andrew

Here's the top part of the new dmesg.boot, FYI:

Copyright (c) 1992-2018 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 12.0-CURRENT #0 r331768M: Sat Mar 31 10:47:52 AEDT 2018
    root@Zen:/usr/obj/usr/src/amd64.amd64/sys/ZEN amd64
FreeBSD clang version 6.0.0 (tags/RELEASE_600/final 326565) (based on LLVM 6.0.0)
WARNING: WITNESS option enabled, expect reduced performance.
VT(vga): resolution 640x480
CPU: AMD Ryzen 7 1700 Eight-Core Processor (2994.45-MHz K8-class CPU)
  Origin="AuthenticAMD" Id=0x800f11 Family=0x17 Model=0x1 Stepping=1
  Features=0x178bfbff
  Features2=0x7ed8320b
  AMD Features=0x2e500800
  AMD Features2=0x35c233ff
  Structured Extended Features=0x209c01a9
  XSAVE Features=0xf
  AMD Extended Feature Extensions ID EBX=0x7
  SVM: (disabled in BIOS) NP,NRIP,VClean,AFlush,DAssist,NAsids=32768
  TSC: P-state invariant, performance statistics
real memory = 34359738368 (32768 MB)
avail memory = 33271214080 (31729 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table:
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
FreeBSD/SMP: 1 package(s) x 2 cache groups x 4 core(s)
random: unblocking device.
Firmware Warning (ACPI): Optional FADT field Pm2ControlBlock has valid Length but zero Address: 0x0000000000000000/0x1 (20180313/tbfadt-796)
ioapic0 irqs 0-23 on motherboard
ioapic1 irqs 24-55 on motherboard
SMP: AP CPU #7 Launched!
SMP: AP CPU #3 Launched!
SMP: AP CPU #2 Launched!
SMP: AP CPU #6 Launched!
SMP: AP CPU #5 Launched!
SMP: AP CPU #4 Launched!
SMP: AP CPU #1 Launched!
Timecounter "TSC-low" frequency 1497224985 Hz quality 1000
random: entropy device external interface
[ath_hal] loaded
module_register_init: MOD_LOAD (vesa, 0xffffffff8109f600, 0) error 19
random: registering fast source Intel Secure Key RNG
random: fast provider: "Intel Secure Key RNG"
kbd1 at kbdmux0
netmap: loaded module
nexus0
vtvga0: on motherboard
cryptosoft0: on motherboard
aesni0: on motherboard
acpi0: on motherboard
acpi0: Power Button (fixed)
cpu0: on acpi0
cpu1: on acpi0
cpu2: on acpi0
cpu3: on acpi0
cpu4: on acpi0
cpu5: on acpi0
cpu6: on acpi0
cpu7: on acpi0
attimer0: port 0x40-0x43 irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
atrtc0: port 0x70-0x71 on acpi0
atrtc0: registered as a time-of-day clock, resolution 1.000000s
Event timer "RTC" frequency 32768 Hz quality 0
hpet0: iomem 0xfed00000-0xfed003ff irq 0,8 on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 950
Event timer "HPET" frequency 14318180 Hz quality 350
Event timer "HPET1" frequency 14318180 Hz quality 350
Event timer "HPET2" frequency 14318180 Hz quality 350
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <32-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
pcib0: port 0xcf8-0xcff on acpi0
pci0: on pcib0
amdsmn0: on hostb0
amdtemp0: on hostb0

On Sun, Mar 25, 2018 at 04:35:31AM +0000, Jonathan Looney wrote:
> For now, you can update through r331485 and then take TCP_BLACKBOX out of
> your kernel config file. That won’t really “fix” anything, but should at
> least get you a booting system (assuming the new code from r331347 is
> really triggering a problem).
>
>
> I’ll take another look to see if I missed something in the commit. But, at
> the moment, I’m hard-pressed to see how r331347 would cause the problem you
> describe.
>
>
> Jonathan
>
> On Sat, Mar 24, 2018 at 9:17 PM Andrew Reilly wrote:
>
> > OK, I've completed the search: r331346 works, r331347 panics
> > somewhere in the initialization of random.
> >
> > In the 331347 change (Add the "TCP Blackbox Recorder") I can't see
> > anything obvious to tweak, unfortunately. It's a fair chunk of new
> > code but it's all network-stack related, and my kernel is panicking
> > long before any network activity happens.
> >
> > Any suggestions?
> >
> > Cheers,
> >
> > Andrew
> >
> > On Sat, Mar 24, 2018 at 05:23:18PM -0600, Warner Losh wrote:
> > > Thanks Andrew... I can't recreate this on my VM nor my real hardware.
> > >
> > > Warner
> > >
> > > On Sat, Mar 24, 2018 at 5:22 PM, Andrew Reilly wrote:
> > >
> > > > So, r331464 crashes in the same place, on my system. r331064 still
> > > > boots OK. I'll keep searching.
> > > >
> > > > One week ago there was a change to randomdev to poll for signals
> > > > every so often, as a defence against very large reads. That
> > > > wouldn't have introduced a race somewhere, or left things in an
> > > > unexpected state, perhaps? That change (r331070) by cem@ is just a
> > > > few revisions after the one that is working for me. I'll start
> > > > looking there...
> > > >
> > > > Cheers,
> > > >
> > > > Andrew
> > > >
> > > > On Sun, Mar 25, 2018 at 07:49:17AM +1100, Andrew Reilly wrote:
> > > > > Hi Warner,
> > > > >
> > > > > The breakage was in 331470, and at least one version earlier,
> > > > > that I updated past when it panicked.
> > > > >
> > > > > I'm guessing that kdb's inability to dump would be down to it
> > > > > not having found any disk devices yet, right? So yes, bisecting
> > > > > to narrow down the issue is probably the best bet. I'll try
> > > > > your r331464: if that works that leaves only four or five
> > > > > revisions. Of course the breakage could be hardware specific.
> > > > >
> > > > > Cheers,
> > > > > --
> > > > > Andrew