Date: Sat, 31 Mar 2018 11:27:46 +1100 From: Andrew Reilly <areilly@bigpond.net.au> To: Jonathan Looney <jonlooney@gmail.com> Cc: FreeBSD Current <current@freebsd.org>, Warner Losh <imp@bsdimp.com>, jtl@freebsd.org Subject: Re: 12-Current panics on boot (didn't a week ago.) Message-ID: <20180331002746.GA2466@Zen.ac-r.nu> In-Reply-To: <CADrOrmstNpFFK%2BoobR5yWELSmTz4_C_CV1JYiVRMpDwZ6cr_yw@mail.gmail.com> References: <20180324035653.GA3411@Zen.ac-r.nu> <CANCZdfozmyxC5MuNS8Tu_LD1bbAYNTnTcPe52-6sz9KPCQou_Q@mail.gmail.com> <B70A5BB4-CC2B-4503-8998-2A360D24E0CF@bigpond.net.au> <20180324232206.GA2457@Zen.ac-r.nu> <CANCZdfovA1MiWhp6ueSaWTCtKv31wHW6B5-pS2rCLmspDuHeTw@mail.gmail.com> <20180325032110.GA10881@Zen.ac-r.nu> <CADrOrmstNpFFK%2BoobR5yWELSmTz4_C_CV1JYiVRMpDwZ6cr_yw@mail.gmail.com>
next in thread | previous in thread | raw e-mail | index | archive | help
Hi Jonathan, all, I've just compiled and booted a kernel derived from current-GENERIC but with nooptions TCP_BLACKBOX, and much to my surprise it boots. Possible link to network-related activities is that the next line of boot output that was not being displayed during the crash is: [ath_hal] loaded That's vaguely network-shaped: could it be an issue? Please let me know if there's anything else that I could test or poke, in order to find the real culprit. My make.conf says: KERNCONF=ZEN WRKDIRPREFIX=/usr/obj/ports MALLOC_PRODUCTION=yes My /usr/src/sys/amd64/conf/ZEN says: include GENERIC nooptions TCP_BLACKBOX Uname -a says: FreeBSD Zen.ac-r.nu 12.0-CURRENT FreeBSD 12.0-CURRENT #0 r331768M: Sat Mar 31 10:47:52 AEDT 2018 root@Zen:/usr/obj/usr/src/amd64.amd64/sys/ZEN amd64 Cheers, Andrew Here's the top part of the new dmesg.boot, FYI: Copyright (c) 1992-2018 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD is a registered trademark of The FreeBSD Foundation. FreeBSD 12.0-CURRENT #0 r331768M: Sat Mar 31 10:47:52 AEDT 2018 root@Zen:/usr/obj/usr/src/amd64.amd64/sys/ZEN amd64 FreeBSD clang version 6.0.0 (tags/RELEASE_600/final 326565) (based on LLVM 6.0.0) WARNING: WITNESS option enabled, expect reduced performance. VT(vga): resolution 640x480 CPU: AMD Ryzen 7 1700 Eight-Core Processor (2994.45-MHz K8-class CPU) Origin="AuthenticAMD" Id=0x800f11 Family=0x17 Model=0x1 Stepping=1 Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT> Features2=0x7ed8320b<SSE3,PCLMULQDQ,MON,SSSE3,FMA,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND> AMD Features=0x2e500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM> AMD Features2=0x35c233ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,SKINIT,WDT,TCE,Topology,PCXC,PNXC,DBE,PL2I,MWAITX> Structured Extended Features=0x209c01a9<FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA> XSAVE Features=0xf<XSAVEOPT,XSAVEC,XINUSE,XSAVES> AMD Extended Feature Extensions ID EBX=0x7<CLZERO,IRPerf,XSaveErPtr> SVM: (disabled in BIOS) NP,NRIP,VClean,AFlush,DAssist,NAsids=32768 TSC: P-state invariant, performance statistics real memory = 34359738368 (32768 MB) avail memory = 33271214080 (31729 MB) Event timer "LAPIC" quality 600 ACPI APIC Table: <ALASKA A M I > FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs FreeBSD/SMP: 1 package(s) x 2 cache groups x 4 core(s) random: unblocking device. Firmware Warning (ACPI): Optional FADT field Pm2ControlBlock has valid Length but zero Address: 0x0000000000000000/0x1 (20180313/tbfadt-796) ioapic0 <Version 2.1> irqs 0-23 on motherboard ioapic1 <Version 2.1> irqs 24-55 on motherboard SMP: AP CPU #7 Launched! SMP: AP CPU #3 Launched! SMP: AP CPU #2 Launched! SMP: AP CPU #6 Launched! SMP: AP CPU #5 Launched! SMP: AP CPU #4 Launched! SMP: AP CPU #1 Launched! Timecounter "TSC-low" frequency 1497224985 Hz quality 1000 random: entropy device external interface [ath_hal] loaded module_register_init: MOD_LOAD (vesa, 0xffffffff8109f600, 0) error 19 random: registering fast source Intel Secure Key RNG random: fast provider: "Intel Secure Key RNG" kbd1 at kbdmux0 netmap: loaded module nexus0 vtvga0: <VT VGA driver> on motherboard cryptosoft0: <software crypto> on motherboard aesni0: <AES-CBC,AES-XTS,AES-GCM,AES-ICM,SHA1,SHA256> on motherboard acpi0: <ALASKA A M I > on motherboard acpi0: Power Button (fixed) cpu0: <ACPI CPU> on acpi0 cpu1: <ACPI CPU> on acpi0 cpu2: <ACPI CPU> on acpi0 cpu3: <ACPI CPU> on acpi0 cpu4: <ACPI CPU> on acpi0 cpu5: <ACPI CPU> on acpi0 cpu6: <ACPI CPU> on acpi0 cpu7: <ACPI CPU> on acpi0 attimer0: <AT timer> port 0x40-0x43 irq 0 on acpi0 Timecounter "i8254" frequency 1193182 Hz quality 0 Event timer "i8254" frequency 1193182 Hz quality 100 atrtc0: <AT realtime clock> port 0x70-0x71 on acpi0 atrtc0: registered as a time-of-day clock, resolution 1.000000s Event timer "RTC" frequency 32768 Hz quality 0 hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff irq 0,8 on acpi0 Timecounter "HPET" frequency 14318180 Hz quality 950 Event timer "HPET" frequency 14318180 Hz quality 350 Event timer "HPET1" frequency 14318180 Hz quality 350 Event timer "HPET2" frequency 14318180 Hz quality 350 Timecounter "ACPI-fast" frequency 3579545 Hz quality 900 acpi_timer0: <32-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0 pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0 pci0: <ACPI PCI bus> on pcib0 amdsmn0: <AMD Family 17h System Management Network> on hostb0 amdtemp0: <AMD CPU On-Die Thermal Sensors> on hostb0 On Sun, Mar 25, 2018 at 04:35:31AM +0000, Jonathan Looney wrote: > For now, you can update through r331485 and then take TCP_BLACKBOX out of > your kernel config file. That won’t really “fix” anything, but should at > least get you a booting system (assuming the new code from r331347 is > really triggering a problem). > > > I’ll take another look to see if I missed something in the commit. But, at > the moment, I’m hard-pressed to see how r331347 would cause the problem you > describe. > > > Jonathan > > On Sat, Mar 24, 2018 at 9:17 PM Andrew Reilly <areilly@bigpond.net.au> > wrote: > > > OK, I've completed the search: r331346 works, r331347 panics > > somewhere in the initialization of random. > > > > In the 331347 change (Add the "TCP Blackbox Recorder") I can't see > > anything obvious to tweak, unfortunately. It's a fair chunk of new > > code but it's all network-stack related, and my kernel is panicking > > long before any network activity happens. > > > > Any suggestions? > > > > Cheers, > > > > Andrew > > > > On Sat, Mar 24, 2018 at 05:23:18PM -0600, Warner Losh wrote: > > > Thanks Andrew... I can't recreate this on my VM nor my real hardware. > > > > > > Warner > > > > > > On Sat, Mar 24, 2018 at 5:22 PM, Andrew Reilly <areilly@bigpond.net.au> > > > wrote: > > > > > > > So, r331464 crashes in the same place, on my system. r331064 still > > boots > > > > OK. I'll keep searching. > > > > > > > > One week ago there was a change to randomdev to poll for signals every > > so > > > > often, as a defence against very large reads. That wouldn't have > > > > introduced a race somewhere, > > > > or left things in an unexpected state, perhaps? That change (r331070) > > by > > > > cem@ is just a few revisions after the one that is working for me. > > I'll > > > > start looking there... > > > > > > > > Cheers, > > > > > > > > Andrew > > > > > > > > On Sun, Mar 25, 2018 at 07:49:17AM +1100, Andrew Reilly wrote: > > > > > Hi Warner, > > > > > > > > > > The breakage was in 331470, and at least one version earlier, that I > > > > updated past when it panicked. > > > > > > > > > > I'm guessing that kdb's inability to dump would be down to it not > > having > > > > found any disk devices yet, right? So yes, bisecting to narrow down > > the > > > > issue is probably the best bet. I'll try your r331464: if that works > > that > > > > leaves only four or five revisions. Of course the breakage could be > > > > hardware specific. > > > > > > > > > > Cheers, > > > > > -- > > > > > Andrew > > > > > > > _______________________________________________ > freebsd-current@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-current > To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20180331002746.GA2466>