Date: Mon, 20 Aug 2018 00:45:12 +0200
From: Michael Gmelin
To: Konstantin Belousov
Cc: Michael Gmelin, John Baldwin, "freebsd-current@freebsd.org", Matthias Apitz
Subject: Re: Fatal trap 12: page fault on Acer Chromebook 720 (peppy)
Message-ID: <20180820004512.5171fa75@bsd64.grem.de>
In-Reply-To: <20180819161642.GP2340@kib.kiev.ua>
List-Id: Discussions about the use of FreeBSD-current

On Sun, 19 Aug 2018 19:16:42 +0300
Konstantin Belousov wrote:

> On Sun, Aug 19, 2018 at 04:59:51PM +0200, Michael Gmelin wrote:
> >
> >
> > On Fri, 17 Aug 2018 10:02:08 +0100
> > John Baldwin wrote:
> >
> > > On 8/17/18 9:54 AM, Michael Gmelin wrote:
> > > >
> > > >
> > > >> On 17. Aug 2018, at 08:17, John Baldwin
> > > >> wrote:
> > > >>> On 8/16/18 1:58 PM, Michael Gmelin wrote:
> > > >>>
> > > >>>
> > > >>>> On 15. Aug 2018, at 15:55, Konstantin Belousov
> > > >>>> wrote:
> > > >>>>> On Wed, Aug 15, 2018 at 03:52:37PM +0200, Michael Gmelin
> > > >>>>> wrote:
> > > >>>>>
> > > >>>>>
> > > >>>>>>> On 15. Aug 2018, at 15:04, Konstantin Belousov
> > > >>>>>>> wrote:
> > > >>>>>>>
> > > >>>>>>> On Wed, Aug 15, 2018 at 12:51:06AM +0200, Michael Gmelin
> > > >>>>>>> wrote: Reviving this old thread, since I just updated to
> > > >>>>>>> r337818 and a similar problem is happening again.
> > > >>>>>>> Since the fix in r334799 (review
> > > >>>>>>> https://reviews.freebsd.org/D15675) (mp_)machdep.c have
> > > >>>>>>> been touched, so maybe this is related
> > > >>>>>>> (https://svnweb.freebsd.org/base?view=revision&revision=334799).
> > > >>>>>>>
> > > >>>>>>> Please see the screenshot of the panic below:
> > > >>>>>>> https://gist.github.com/grembo/78d0f2a100dd4f16775b85a118769658
> > > >>>>>>>
> > > >>>>>>> This is me not digging any deeper, hoping that this is
> > > >>>>>>> something obvious. Please let me know if you need more
> > > >>>>>>> input.
> > > >>>>>>
> > > >>>>>> I do not see how recent mp_machdep.c changes could affect
> > > >>>>>> this. Can you try newest kernel but old loader?
> > > >>>>>
> > > >>>>> I will try (but that will take a while). Oh, also, it still
> > > >>>>> boots in safe mode/with smp disabled.
> > > >>>>
> > > >>>> Right, this is because the access to that address through
> > > >>>> DMAP is only needed when configuring AP startup resources.
> > > >>>>
> > > >>>> Also, I think it is safe to suggest that the bisect is
> > > >>>> needed.
> > > >>>
> > > >>> Using an older loader didn't help, but I identified the
> > > >>> problem:
> > > >>>
> > > >>> https://svnweb.freebsd.org/base?view=revision&revision=334952
> > > >>>
> > > >>> modified the code you introduced in
> > > >>>
> > > >>> https://svnweb.freebsd.org/base?view=revision&revision=334799
> > > >>>
> > > >>> By correcting units to pages it also broke booting the
> > > >>> Chromebook as a side effect - so the previous fix just worked
> > > >>> due to a bug it seems.
> > > >>>
> > > >>> Is there an easy way to output the content of physmap at that
> > > >>> point (debug.late_console=0 doesn't work) - like an existing
> > > >>> buffer I could use, or would this be more elaborate (I did
> > > >>> something complicated last time but didn't save it, so any
> > > >>> simple solution would be preferred).
> > > >>
> > > >> How about reverting the commit for now so you get a working
> > > >> console and print out the physmap array values along with
> > > >> Maxmem later in the boot (or just use kgdb to examine them
> > > >> once the system is running)?
> > > >
> > > > This is before the system has a working console (part of calling
> > > > getmem...), disabling late console makes it hang, physmap
> > > > changes afterwards, so running kgdb later doesn't help. Last
> > > > time I kept a copy of physmap and logged it later to know the
> > > > original content. I can do that again, I just thought maybe
> > > > there is a simple mechanism I'm not aware of that would save
> > > > me some time.
> > >
> > > I thought we only modified phys_avail[], but saving a copy of
> > > physmap[] and dumping it from kgdb is probably the simplest thing
> > > to do.
> > >
> >
> > Okay, so I had some time to investigate a bit more:
> >
> > Before calling init_ops.mp_bootaddress in getmemsize (machdep.c),
> > physmap looks like this:
> >
> > physmap_idx: 8
> > i   mem          atop
> > 0   0x0          0x0
> > 1   0x30000      0x30
> > 2   0x40000      0x40
> > 3   0x9e400      0x9e
> > 4   0x100000     0x100
> > 5   0xf00000     0xf00
> > 6   0x1000000    0x1000
> > 7   0x7bf7a000   0x7bf7a
> > 8   0x100000000  0x100000
> > 9   0x100600000  0x100600
> > 10  0x0          0x0
> > Maxmem: 0x100600000 0x100600
> >
> > Without using atop (the "buggy" version that actually boots without
> > crashing), the loop in mp_bootaddress looks like this:
> >
> > i, physmap[i], physmap[i + 1], atop(physmap[i + 1]), Maxmem
> > 8  0x100000000  0x100600000  0x100600  0x100600
> > 6  0x1000000    0x7bf7a000   0x7bf7a   0x100600
> > 4  0x100000     0xf00000     0xf00     0x100600
> > 2  0x40000      0x9e400      0x9e      0x100600
> >
> > And physmap looks like this afterwards:
> >
> > physmap_idx: 8
> > i   mem          atop
> > 0   0x0          0x0
> > 1   0x30000      0x30
> > 2   0x43000      0x43       <-- here
> > 3   0x9e400      0x9e
> > 4   0x100000     0x100
> > 5   0xf00000     0xf00
> > 6   0x1000000    0x1000
> > 7   0x7bf7a000   0x7bf7a
> > 8   0x100000000  0x100000
> > 9   0x100600000  0x100600
> > 10  0x0          0x0
> > mptramp_pagetables is 0x40000
> >
> > So a three page gap was made at 0x40000 (atop(idx 2) is now 0x43
> > instead of 0x40)
> >
> > In the current version (using atop), the loop in mp_bootaddress
> > looks like this:
> >
> > i, physmap[i], physmap[i + 1], atop(physmap[i + 1]), Maxmem
> > 8  0x100000000  0x100600000  0x100600  0x100600
> > 6  0x1000000    0x7bf7a000   0x7bf7a   0x100600
> >
> > And physmap looks like this afterwards:
> >
> > physmap_idx: 8
> > i   mem          atop
> > 0   0x0          0x0
> > 1   0x30000      0x30
> > 2   0x40000      0x40
> > 3   0x9e400      0x9e
> > 4   0x100000     0x100
> > 5   0xf00000     0xf00
> > 6   0x1003000    0x1003     <-- here
> > 7   0x7bf7a000   0x7bf7a
> > 8   0x100000000  0x100000
> > 9   0x100600000  0x100600
> > 10  0x0          0x0
> > mptramp_pagetables: 0x1000000
> >
> > So a three page gap was made at 0x1000000 (atop(idx 6) is now
> > 0x1003 instead of 0x1000)
> >
> > When changing the code to require a page below 0x1000:
> >
> >         if (physmap[i] >= GiB(4) || physmap[i + 1] -
> >             round_page(physmap[i]) < PAGE_SIZE * 3 ||
> >             atop(physmap[i + 1]) > Maxmem
> >             || atop(physmap[i + 1]) > 0x1000) // <--- this
> >                 continue;
> >
> > The system boots just fine. It uses page 0x100
> > for the bootstrap code in this case:
> >
> > i, physmap[i], physmap[i + 1], atop(physmap[i + 1]), Maxmem
> > 8  0x100000000  0x100600000  0x100600  0x100600
> > 6  0x1000000    0x7bf7a000   0x7bf7a   0x100600
> > 4  0x100000     0xf00000     0xf00     0x100600
> >
> > Physmap looks like this:
> > physmap_idx: 8
> > i   mem          atop
> > 0   0x0          0x0
> > 1   0x30000      0x30
> > 2   0x40000      0x40
> > 3   0x9e400      0x9e
> > 4   0x103000     0x103      <-- here
> > 5   0xf00000     0xf00
> > 6   0x1000000    0x1000
> > 7   0x7bf7a000   0x7bf7a
> > 8   0x100000000  0x100000
> > 9   0x100600000  0x100600
> > 10  0x0          0x0
> > mptramp_pagetables: 0x100000
> >
> > So for some reason it's crashing when using pages 0x1000 - 0x1003
> > for the bootstrap code, while it boots okay when using 0x40 - 0x43
> > and 0x100 - 0x103.
> >
> > Any ideas?
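
For anyone who wants to replay the three loop traces quoted above
without booting a kernel, here is a small userland sketch in C. It is
not the kernel code: the macros are local stand-ins for the kernel
ones, pick_pagetables() and the enum are hypothetical names, the loop
bounds are assumed, and the old check is assumed to be the same
comparison without atop(). The physmap and Maxmem values are the ones
logged in the quoted message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userland stand-ins for the kernel macros (4 KiB pages assumed). */
#define PAGE_SIZE      4096ULL
#define ATOP(x)        ((uint64_t)(x) >> 12)
#define ROUND_PAGE(x)  (((uint64_t)(x) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))
#define GIB(x)         ((uint64_t)(x) << 30)

/* Which form of the upper-bound check to model (names are hypothetical). */
enum limit_check {
	CHECK_BYTES,     /* assumed old form: byte address compared with Maxmem */
	CHECK_PAGES,     /* current form: atop(end) > Maxmem */
	CHECK_PAGES_CAP  /* experiment: additionally atop(end) > 0x1000 */
};

/*
 * Walk the (start, end) pairs from the top pair downwards and return the
 * page-aligned start of the first range that passes the checks, or
 * UINT64_MAX if none does. The array itself is not modified.
 */
static uint64_t
pick_pagetables(const uint64_t *physmap, int physmap_idx, uint64_t maxmem,
    enum limit_check mode)
{
	assert(physmap_idx >= 0 && physmap_idx % 2 == 0);

	for (int i = physmap_idx; i >= 0; i -= 2) {
		uint64_t start = ROUND_PAGE(physmap[i]);
		uint64_t end = physmap[i + 1];
		bool over;

		if (mode == CHECK_BYTES)
			over = end > maxmem;
		else
			over = ATOP(end) > maxmem;
		if (mode == CHECK_PAGES_CAP && ATOP(end) > 0x1000)
			over = true;

		/* The end < start guard is an addition, not in the quoted test. */
		if (physmap[i] >= GIB(4) || end < start ||
		    end - start < PAGE_SIZE * 3 || over)
			continue;
		return start;
	}
	return UINT64_MAX;
}

/* physmap pairs and Maxmem (in pages) as logged in the message. */
static const uint64_t logged_physmap[] = {
	0x0, 0x30000, 0x40000, 0x9e400, 0x100000, 0xf00000,
	0x1000000, 0x7bf7a000, 0x100000000, 0x100600000
};
static const uint64_t logged_maxmem = 0x100600;
```

Calling pick_pagetables(logged_physmap, 8, logged_maxmem, mode) with
the three modes in order selects 0x40000, 0x1000000 and 0x100000, which
agrees with the mptramp_pagetables values reported in the traces.
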
> I in fact misread the page fault state decoding in your photo.
> It is curiously protection violation on write, instead of non-present
> page access.
>
> Compile ddb into your kernel, then on fault do
> db> x/x dmaplimit
> db> x/x dmaplimit+4
> db> show pte

This was a bit more complicated, as the keyboard doesn't work in ddb at
that point (neither internal, nor USB). I ended up hacking
sys/ddb/db_script.c to execute these commands on kdb.enter.trap
(tunable support for scripting would be cool).

Anyway, dmaplimit is 40000000, dmaplimit+4 is 1

See here for a screenshot (also including the output of "show pte
0xfffff80001000000"):

https://gist.github.com/grembo/78d0f2a100dd4f16775b85a118769658#file-ddb1-png

>
> Also show me the verbose dmesg lines with CPU features identification.
>

CPU: Intel(R) Celeron(R) 2955U @ 1.40GHz (1396.80-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x40651  Family=0x6  Model=0x45  Stepping=1
  Features=0xbfebfbff
  Features2=0x4ddaebbf
  AMD Features=0x2c100800
  AMD Features2=0x21
  Structured Extended Features=0x2603
  XSAVE Features=0x1
  VT-x: (disabled in BIOS) PAT,HLT,MTF,PAUSE,EPT,UG,VPID
  TSC: P-state invariant, performance statistics
real memory  = 4301258752 (4102 MB)
avail memory = 1907445760 (1819 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table:
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s)
random: unblocking device.
ioapic0 irqs 0-39 on motherboard
Launching APs: 1
Timecounter "TSC" frequency 1396795536 Hz quality 1000

-m

-- 
Michael Gmelin
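
The two ddb words and the address passed to "show pte" can be related
with a little arithmetic. The sketch below assumes that x/x printed
32-bit words in little-endian order and that the amd64 direct map
starts at 0xfffff80000000000; neither is stated in the message, and the
helper names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed amd64 direct-map base; not taken from the message. */
#define DMAP_BASE 0xfffff80000000000ULL

/* Join the two 32-bit words printed by "x/x sym" and "x/x sym+4". */
static uint64_t
join_words(uint32_t low, uint32_t high)
{
	return ((uint64_t)high << 32) | low;
}

/* Physical address behind a direct-map virtual address. */
static uint64_t
dmap_to_phys(uint64_t va)
{
	assert(va >= DMAP_BASE);
	return va - DMAP_BASE;
}

/* True if va falls inside the direct map bounded by dmaplimit. */
static bool
covered_by_dmap(uint64_t va, uint64_t dmaplimit)
{
	return va >= DMAP_BASE && dmap_to_phys(va) < dmaplimit;
}
```

Under those assumptions join_words(0x40000000, 1) gives a dmaplimit of
0x140000000, and 0xfffff80001000000 corresponds to physical 0x1000000,
which lies below that limit. That would fit the observation that the
fault is a protection violation on write rather than an access to a
non-present page.
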