From: Michael Gmelin
Date: Sun, 19 Aug 2018 16:59:51 +0200
To: John Baldwin
Cc: Michael Gmelin, Konstantin Belousov, "freebsd-current@freebsd.org", Matthias Apitz
Subject: Re: Fatal trap 12: page fault on Acer Chromebook 720 (peppy)
Message-ID: <20180819165951.274d61b0@bsd64.grem.de>
In-Reply-To: <8726bc32-6023-bfe1-7600-5b2c706236f8@FreeBSD.org>
List-Id: Discussions about the use of FreeBSD-current

On Fri, 17 Aug 2018 10:02:08 +0100
John Baldwin wrote:

> On 8/17/18 9:54 AM, Michael Gmelin wrote:
> >
> >> On 17. Aug 2018, at 08:17, John Baldwin wrote:
> >>
> >>> On 8/16/18 1:58 PM, Michael Gmelin wrote:
> >>>
> >>>> On 15. Aug 2018, at 15:55, Konstantin Belousov wrote:
> >>>>> On Wed, Aug 15, 2018 at 03:52:37PM +0200, Michael Gmelin wrote:
> >>>>>
> >>>>>>> On 15. Aug 2018, at 15:04, Konstantin Belousov wrote:
> >>>>>>>
> >>>>>>> On Wed, Aug 15, 2018 at 12:51:06AM +0200, Michael Gmelin wrote:
> >>>>>>> Reviving this old thread, since I just updated to r337818 and
> >>>>>>> a similar problem is happening again.
> >>>>>>> Since the fix in r334799 (review https://reviews.freebsd.org/D15675)
> >>>>>>> (mp_)machdep.c have been touched, so maybe this is related
> >>>>>>> (https://svnweb.freebsd.org/base?view=revision&revision=334799).
> >>>>>>>
> >>>>>>> Please see the screenshot of the panic below:
> >>>>>>> https://gist.github.com/grembo/78d0f2a100dd4f16775b85a118769658
> >>>>>>>
> >>>>>>> This is me not digging any deeper, hoping that this is
> >>>>>>> something obvious. Please let me know if you need more
> >>>>>>> input.
> >>>>>>
> >>>>>> I do not see how recent mp_machdep.c changes could affect this.
> >>>>>> Can you try newest kernel but old loader?
> >>>>>
> >>>>> I will try (but that will take a while). Oh, also, it still
> >>>>> boots in safe mode/with smp disabled.
> >>>>
> >>>> Right, this is because the access to that address through DMAP
> >>>> is only needed when configuring AP startup resources.
> >>>>
> >>>> Also, I think it is safe to suggest that the bisect is needed.
> >>>
> >>> Using an older loader didn’t help, but I identified the problem:
> >>>
> >>> https://svnweb.freebsd.org/base?view=revision&revision=334952
> >>>
> >>> modified the code you introduced in
> >>>
> >>> https://svnweb.freebsd.org/base?view=revision&revision=334799
> >>>
> >>> By correcting units to pages it also broke booting the Chromebook
> >>> as a side effect - so the previous fix just worked due to a bug
> >>> it seems.
> >>>
> >>> Is there an easy way to output the content of physmap at that
> >>> point (debug.late_console=0 doesn’t work) - like an existing
> >>> buffer I could use, or would this be more elaborate (I did
> >>> something complicated last time but didn’t save it, so any simple
> >>> solution would be preferred).
> >>
> >> How about reverting the commit for now so you get a working console
> >> and print out the physmap array values along with Maxmem later in
> >> the boot (or just use kgdb to examine them once the system is
> >> running)?
> >
> > This is before the system has a working console (part of calling
> > getmem...), disabling late console makes it hang, physmap changes
> > afterwards, so running kgdb later doesn’t help. Last time I kept a
> > copy of physmap and logged it later to know the original content. I
> > can do that again, I just thought maybe there is a simple mechanism
> > I’m not aware of that would save me some time.
>
> I thought we only modified phys_avail[], but saving a copy of
> physmap[] and dumping it from kgdb is probably the simplest thing to
> do.
>

Okay, so I had some time to investigate a bit more:

Before calling init_ops.mp_bootaddress in getmemsize (machdep.c),
physmap looks like this:

  physmap_idx: 8
  i    mem          atop
  0    0x0          0x0
  1    0x30000      0x30
  2    0x40000      0x40
  3    0x9e400      0x9e
  4    0x100000     0x100
  5    0xf00000     0xf00
  6    0x1000000    0x1000
  7    0x7bf7a000   0x7bf7a
  8    0x100000000  0x100000
  9    0x100600000  0x100600
  10   0x0          0x0
  Maxmem: 0x100600000 0x100600

Without using atop (the "buggy" version that actually boots without
crashing), the loop in mp_bootaddress looks like this:

  i, physmap[i], physmap[i + 1], atop(physmap[i + 1]), Maxmem
  8  0x100000000  0x100600000  0x100600  0x100600
  6  0x1000000    0x7bf7a000   0x7bf7a   0x100600
  4  0x100000     0xf00000     0xf00     0x100600
  2  0x40000      0x9e400      0x9e      0x100600

And physmap looks like this afterwards:

  physmap_idx: 8
  i    mem          atop
  0    0x0          0x0
  1    0x30000      0x30
  2    0x43000      0x43      <-- here
  3    0x9e400      0x9e
  4    0x100000     0x100
  5    0xf00000     0xf00
  6    0x1000000    0x1000
  7    0x7bf7a000   0x7bf7a
  8    0x100000000  0x100000
  9    0x100600000  0x100600
  10   0x0          0x0
  mptramp_pagetables is 0x40000

So a three page gap was made at 0x40000 (atop(idx 2) is now 0x43
instead of 0x40)

In the current version (using atop), the loop in mp_bootaddress looks
like this:

  i, physmap[i], physmap[i + 1], atop(physmap[i + 1]), Maxmem
  8  0x100000000  0x100600000  0x100600  0x100600
  6  0x1000000    0x7bf7a000   0x7bf7a   0x100600

And physmap looks like this afterwards:

  physmap_idx: 8
  i    mem          atop
  0    0x0          0x0
  1    0x30000      0x30
  2    0x40000      0x40
  3    0x9e400      0x9e
  4    0x100000     0x100
  5    0xf00000     0xf00
  6    0x1003000    0x1003    <-- here
  7    0x7bf7a000   0x7bf7a
  8    0x100000000  0x100000
  9    0x100600000  0x100600
  10   0x0          0x0
  mptramp_pagetables: 0x1000000

So a three page gap was made at 0x1000000 (atop(idx 6) is now 0x1003
instead of 0x1000)

When changing the code to require a page below 0x1000:

    if (physmap[i] >= GiB(4) ||
        physmap[i + 1] - round_page(physmap[i]) < PAGE_SIZE * 3 ||
        atop(physmap[i + 1]) > Maxmem ||
        atop(physmap[i + 1]) > 0x1000) // <--- this
            continue;

The system boots just fine. It uses page 0x100 for the bootstrap code
in this case:

  i, physmap[i], physmap[i + 1], atop(physmap[i + 1]), Maxmem
  8  0x100000000  0x100600000  0x100600  0x100600
  6  0x1000000    0x7bf7a000   0x7bf7a   0x100600
  4  0x100000     0xf00000     0xf00     0x100600

Physmap looks like this:

  physmap_idx: 8
  i    mem          atop
  0    0x0          0x0
  1    0x30000      0x30
  2    0x40000      0x40
  3    0x9e400      0x9e
  4    0x103000     0x103     <-- here
  5    0xf00000     0xf00
  6    0x1000000    0x1000
  7    0x7bf7a000   0x7bf7a
  8    0x100000000  0x100000
  9    0x100600000  0x100600
  10   0x0          0x0
  mptramp_pagetables: 0x100000

So for some reason it's crashing when using pages 0x1000 - 0x1003 for
the bootstrap code, while it boots okay when using 0x40 - 0x43 and
0x100 - 0x103.

Any ideas?

Best,
Michael

p.s. This is what biosmem looks like

  Type '?' for a list of command, 'help' for more detailed help.
  OK biosmem
  bios_basemem: 0x9e400
  bios_extmem: 0x3ff00000
  memtop: 0x3c000000
  high_heap_base: 0x3c000000
  high_heap_size: 0x4000000
  bios_quirks: 0x01 BQ_DISTRUST_820_EXTMEM
  b_bios_probed: 0x0a B_BASEMEM_12 B_EXTMEM_E801

-- 
Michael Gmelin
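
For reference, the three variants of the check described in this mail can
be replayed in user space. What follows is only a sketch under stated
assumptions, not the kernel code: the SK_* macros, sk_atop(),
sk_round_page(), pick_pagetables() and the variant enum are hypothetical
stand-ins written for illustration, a 4 KiB page size is assumed, and
sample_physmap[] simply copies the "before" table above. It only shows
which region each variant of the condition selects.

```c
#include <stdint.h>

/* Hypothetical user-space stand-ins for the kernel macros (4 KiB pages assumed). */
#define SK_PAGE_SHIFT 12
#define SK_PAGE_SIZE  ((uint64_t)1 << SK_PAGE_SHIFT)
#define SK_GIB(x)     ((uint64_t)(x) << 30)
#define SK_NONE       UINT64_MAX   /* sentinel: no region matched */

static uint64_t sk_atop(uint64_t addr) { return addr >> SK_PAGE_SHIFT; }

static uint64_t sk_round_page(uint64_t addr)
{
    return (addr + SK_PAGE_SIZE - 1) & ~(SK_PAGE_SIZE - 1);
}

/* The three variants of the Maxmem comparison discussed in the mail. */
enum variant {
    NO_ATOP,          /* compares a byte address against Maxmem (pages) */
    WITH_ATOP,        /* compares pages against pages */
    WITH_ATOP_CAPPED  /* additionally skips regions ending above page 0x1000 */
};

/* physmap contents as reported in the mail, before mp_bootaddress runs. */
static const uint64_t sample_physmap[] = {
    0x0, 0x30000, 0x40000, 0x9e400, 0x100000, 0xf00000,
    0x1000000, 0x7bf7a000, 0x100000000, 0x100600000, 0x0
};

/*
 * Walk the (start, end) pairs from the top down and return the page-aligned
 * start of the first region that passes the check, or SK_NONE.
 */
static uint64_t pick_pagetables(const uint64_t *physmap, unsigned int idx,
    uint64_t maxmem, enum variant v)
{
    /* i is unsigned: after i == 0 it wraps around, which ends the loop. */
    for (unsigned int i = idx; i <= idx; i -= 2) {
        uint64_t start = sk_round_page(physmap[i]);
        uint64_t end = physmap[i + 1];
        uint64_t cmp = (v == NO_ATOP) ? end : sk_atop(end);

        if (physmap[i] >= SK_GIB(4) || end - start < SK_PAGE_SIZE * 3 ||
            cmp > maxmem)
            continue;
        if (v == WITH_ATOP_CAPPED && sk_atop(end) > 0x1000)
            continue;
        return start;
    }
    return SK_NONE;
}
```

Calling pick_pagetables(sample_physmap, 8, 0x100600, v) for each of the
three variants selects the regions starting at 0x40000, 0x1000000 and
0x100000, matching the mptramp_pagetables values listed above.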