Date: Mon, 20 Aug 2018 00:45:12 +0200
From: Michael Gmelin <freebsd@grem.de>
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: Michael Gmelin <freebsd@grem.de>, John Baldwin <jhb@FreeBSD.org>, "freebsd-current@freebsd.org" <freebsd-current@freebsd.org>, Matthias Apitz <guru@unixarea.de>
Subject: Re: Fatal trap 12: page fault on Acer Chromebook 720 (peppy)
Message-ID: <20180820004512.5171fa75@bsd64.grem.de>
In-Reply-To: <20180819161642.GP2340@kib.kiev.ua>
References: <20180606010625.62632920@bsd64.grem.de> <20180815005106.69402d23@bsd64.grem.de> <20180815130447.GZ2340@kib.kiev.ua> <C26CD25D-3CB0-4F7E-8B50-F7E95E16B776@grem.de> <20180815135531.GA2340@kib.kiev.ua> <FAEA5B0A-5302-4A48-B322-21CB0D97C8CC@grem.de> <e82ed552-83b0-5331-3117-6750b8c205f7@FreeBSD.org> <07E28AC5-EBE6-4893-810A-6C03F07925C8@grem.de> <8726bc32-6023-bfe1-7600-5b2c706236f8@FreeBSD.org> <20180819165951.274d61b0@bsd64.grem.de> <20180819161642.GP2340@kib.kiev.ua>

On Sun, 19 Aug 2018 19:16:42 +0300
Konstantin Belousov <kostikbel@gmail.com> wrote:
> On Sun, Aug 19, 2018 at 04:59:51PM +0200, Michael Gmelin wrote:
> >
> >
> > On Fri, 17 Aug 2018 10:02:08 +0100
> > John Baldwin <jhb@FreeBSD.org> wrote:
> >
> > > On 8/17/18 9:54 AM, Michael Gmelin wrote:
> > > >
> > > >
> > > >> On 17. Aug 2018, at 08:17, John Baldwin <jhb@FreeBSD.org>
> > > >> wrote:
> > > >>> On 8/16/18 1:58 PM, Michael Gmelin wrote:
> > > >>>
> > > >>>
> > > >>>> On 15. Aug 2018, at 15:55, Konstantin Belousov
> > > >>>> <kostikbel@gmail.com> wrote:
> > > >>>>> On Wed, Aug 15, 2018 at 03:52:37PM +0200, Michael Gmelin
> > > >>>>> wrote:
> > > >>>>>
> > > >>>>>
> > > >>>>>>> On 15. Aug 2018, at 15:04, Konstantin Belousov
> > > >>>>>>> <kostikbel@gmail.com> wrote:
> > > >>>>>>>
> > > >>>>>>> On Wed, Aug 15, 2018 at 12:51:06AM +0200, Michael Gmelin
> > > >>>>>>> wrote:
> > > >>>>>>> Reviving this old thread, since I just updated to
> > > >>>>>>> r337818 and a similar problem is happening again. Since
> > > >>>>>>> the fix in r334799 (review
> > > >>>>>>> https://reviews.freebsd.org/D15675) (mp_)machdep.c have
> > > >>>>>>> been touched, so maybe this is related
> > > >>>>>>> (https://svnweb.freebsd.org/base?view=revision&revision=334799).
> > > >>>>>>>
> > > >>>>>>> Please see the screenshot of the panic below:
> > > >>>>>>> https://gist.github.com/grembo/78d0f2a100dd4f16775b85a118769658
> > > >>>>>>>
> > > >>>>>>> This is me not digging any deeper, hoping that this is
> > > >>>>>>> something obvious. Please let me know if you need more
> > > >>>>>>> input.
> > > >>>>>>
> > > >>>>>> I do not see how recent mp_machdep.c changes could affect
> > > >>>>>> this. Can you try newest kernel but old loader?
> > > >>>>>
> > > >>>>> I will try (but that will take a while). Oh, also, it still
> > > >>>>> boots in safe mode/with SMP disabled.
> > > >>>>
> > > >>>> Right, this is because the access to that address through
> > > >>>> DMAP is only needed when configuring AP startup resources.
> > > >>>>
> > > >>>> Also, I think it is safe to suggest that the bisect is
> > > >>>> needed.
> > > >>>
> > > >>> Using an older loader didn't help, but I identified the
> > > >>> problem:
> > > >>>
> > > >>> https://svnweb.freebsd.org/base?view=revision&revision=334952
> > > >>>
> > > >>> modified the code you introduced in
> > > >>>
> > > >>> https://svnweb.freebsd.org/base?view=revision&revision=334799
> > > >>>
> > > >>> By correcting units to pages it also broke booting the
> > > >>> Chromebook as a side effect - so the previous fix just worked
> > > >>> due to a bug it seems.
> > > >>>
> > > >>> Is there an easy way to output the content of physmap at that
> > > >>> point (debug.late_console=0 doesn't work) - like an existing
> > > >>> buffer I could use, or would this be more elaborate (I did
> > > >>> something complicated last time but didn't save it, so any
> > > >>> simple solution would be preferred).
> > > >>
> > > >> How about reverting the commit for now so you get a working
> > > >> console and print out the physmap array values along with
> > > >> Maxmem later in the boot (or just use kgdb to examine them
> > > >> once the system is running)?
> > > >
> > > > This is before the system has a working console (part of calling
> > > > getmem...), disabling late console makes it hang, physmap
> > > > changes afterwards, so running kgdb later doesn't help. Last
> > > > time I kept a copy of physmap and logged it later to know the
> > > > original content. I can do that again, I just thought maybe
> > > > there is a simple mechanism I'm not aware of that would save
> > > > me some time.
> > >
> > > I thought we only modified phys_avail[], but saving a copy of
> > > physmap[] and dumping it from kgdb is probably the simplest thing
> > > to do.
> > >
> > >
> >
> > Okay, so I had some time to investigate a bit more:
> >
> > Before calling init_ops.mp_bootaddress in getmemsize (machdep.c),
> > physmap looks like this:
> >
> > physmap_idx: 8
> >  i  mem          atop
> >  0  0x0          0x0
> >  1  0x30000      0x30
> >  2  0x40000      0x40
> >  3  0x9e400      0x9e
> >  4  0x100000     0x100
> >  5  0xf00000     0xf00
> >  6  0x1000000    0x1000
> >  7  0x7bf7a000   0x7bf7a
> >  8  0x100000000  0x100000
> >  9  0x100600000  0x100600
> > 10  0x0          0x0
> > Maxmem: 0x100600000 0x100600
> >
> > Without using atop (the "buggy" version that actually boots without
> > crashing), the loop in mp_bootaddress looks like this:
> >
> >  i  physmap[i]   physmap[i + 1]  atop(physmap[i + 1])  Maxmem
> >  8  0x100000000  0x100600000     0x100600              0x100600
> >  6  0x1000000    0x7bf7a000      0x7bf7a               0x100600
> >  4  0x100000     0xf00000        0xf00                 0x100600
> >  2  0x40000      0x9e400         0x9e                  0x100600
> >
> > And physmap looks like this afterwards:
> >
> > physmap_idx: 8
> >  i  mem          atop
> >  0  0x0          0x0
> >  1  0x30000      0x30
> >  2  0x43000      0x43      <-- here
> >  3  0x9e400      0x9e
> >  4  0x100000     0x100
> >  5  0xf00000     0xf00
> >  6  0x1000000    0x1000
> >  7  0x7bf7a000   0x7bf7a
> >  8  0x100000000  0x100000
> >  9  0x100600000  0x100600
> > 10  0x0          0x0
> > mptramp_pagetables is 0x40000
> >
> > So a three page gap was made at 0x40000 (atop(idx 2) is now 0x43
> > instead of 0x40)
> >
> > In the current version (using atop), the loop in mp_bootaddress
> > looks like this:
> >
> >  i  physmap[i]   physmap[i + 1]  atop(physmap[i + 1])  Maxmem
> >  8  0x100000000  0x100600000     0x100600              0x100600
> >  6  0x1000000    0x7bf7a000      0x7bf7a               0x100600
> >
> > And physmap looks like this afterwards:
> >
> > physmap_idx: 8
> >  i  mem          atop
> >  0  0x0          0x0
> >  1  0x30000      0x30
> >  2  0x40000      0x40
> >  3  0x9e400      0x9e
> >  4  0x100000     0x100
> >  5  0xf00000     0xf00
> >  6  0x1003000    0x1003    <-- here
> >  7  0x7bf7a000   0x7bf7a
> >  8  0x100000000  0x100000
> >  9  0x100600000  0x100600
> > 10  0x0          0x0
> > mptramp_pagetables: 0x1000000
> >
> > So a three page gap was made at 0x1000000 (atop(idx 6) is now
> > 0x1003 instead of 0x1000)
> >
> > When changing the code to require a page below 0x1000:
> >
> >     if (physmap[i] >= GiB(4) || physmap[i + 1] -
> >         round_page(physmap[i]) < PAGE_SIZE * 3 ||
> >         atop(physmap[i + 1]) > Maxmem
> >         || atop(physmap[i + 1]) > 0x1000) // <--- this
> >             continue;
> >
> > The system boots just fine. It uses page 0x100
> > for the bootstrap code in this case:
> >
> >  i  physmap[i]   physmap[i + 1]  atop(physmap[i + 1])  Maxmem
> >  8  0x100000000  0x100600000     0x100600              0x100600
> >  6  0x1000000    0x7bf7a000      0x7bf7a               0x100600
> >  4  0x100000     0xf00000        0xf00                 0x100600
> >
> > Physmap looks like this:
> > physmap_idx: 8
> >  i  mem          atop
> >  0  0x0          0x0
> >  1  0x30000      0x30
> >  2  0x40000      0x40
> >  3  0x9e400      0x9e
> >  4  0x103000     0x103     <-- here
> >  5  0xf00000     0xf00
> >  6  0x1000000    0x1000
> >  7  0x7bf7a000   0x7bf7a
> >  8  0x100000000  0x100000
> >  9  0x100600000  0x100600
> > 10  0x0          0x0
> > mptramp_pagetables: 0x100000
> >
> > So for some reason it's crashing when using pages 0x1000 - 0x1003
> > for the bootstrap code, while it boots okay when using 0x40 - 0x43
> > and 0x100 - 0x103.
> >
> > Any ideas?
> I in fact misread the page fault state decoding in your photo.
> It is curiously protection violation on write, instead of non-present
> page access.
>
> Compile ddb into your kernel, then on fault do
> db> x/x dmaplimit
> db> x/x dmaplimit+4
> db> show pte <fault virtual address>

This was a bit more complicated, as the keyboard doesn't work in ddb at
that point (neither internal, nor USB). I ended up hacking
sys/ddb/db_script.c to execute these commands on kdb.enter.trap (tunable
support for scripting would be cool). Anyway, dmaplimit is 40000000,
dmaplimit+4 is 1

See here for a screenshot (also including the output of "show pte
0xfffff80001000000"):
https://gist.github.com/grembo/78d0f2a100dd4f16775b85a118769658#file-ddb1-png

>
> Also show me the verbose dmesg lines with CPU features identification.
>

CPU: Intel(R) Celeron(R) 2955U @ 1.40GHz (1396.80-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x40651  Family=0x6  Model=0x45  Stepping=1
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x4ddaebbf<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,SDBG,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,MOVBE,POPCNT,TSCDLT,XSAVE,OSXSAVE,RDRAND>
  AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
  AMD Features2=0x21<LAHF,ABM>
  Structured Extended Features=0x2603<FSGSBASE,TSCADJ,ERMS,INVPCID,NFPUSG>
  XSAVE Features=0x1<XSAVEOPT>
  VT-x: (disabled in BIOS) PAT,HLT,MTF,PAUSE,EPT,UG,VPID
  TSC: P-state invariant, performance statistics
real memory  = 4301258752 (4102 MB)
avail memory = 1907445760 (1819 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <CORE COREBOOT>
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s)
random: unblocking device.
ioapic0 <Version 2.0> irqs 0-39 on motherboard
Launching APs: 1
Timecounter "TSC" frequency 1396795536 Hz quality 1000

-m

-- 
Michael Gmelin