Date: Mon, 4 Jun 2018 14:06:55 +0300
From: Konstantin Belousov
To: Michael Gmelin
Cc: "freebsd-current@freebsd.org", Matthias Apitz, jhb@freebsd.org
Subject: Re: Fatal trap 12: page fault on Acer Chromebook 720 (peppy)
Message-ID: <20180604110654.GA2450@kib.kiev.ua>
In-Reply-To: <20180604004632.56ca6afa@bsd64.grem.de>

On Mon, Jun 04, 2018 at 12:46:32AM +0200, Michael Gmelin wrote:
>
> On Sun, 3 Jun 2018 23:53:40 +0300
> Konstantin Belousov wrote:
>
> > On Sun, Jun 03, 2018 at 09:50:20PM +0200, Michael Gmelin wrote:
> > >
> > > On Sun, 3 Jun 2018 18:04:23 +0300
> > > Konstantin Belousov wrote:
> > >
> > > > On Sun, Jun 03, 2018 at 04:55:00PM +0200, Michael Gmelin wrote:
> > > > >
> > > > > On Sun, 3 Jun 2018 16:21:10 +0300
> > > > > Konstantin Belousov wrote:
> > > > >
> > > > > > On Sun, Jun 03, 2018 at 02:48:40PM +0200, Michael Gmelin wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > After upgrading CURRENT to r333992 (from something at least a
> > > > > > > year old, quite some changes in mp_machdep.c since), this
> > > > > > > machine crashes on boot:
> > > > > > >
> > > > > > > Copyright (c) 1992-2018 The FreeBSD Project.
> > > > > > > Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
> > > > > > >         The Regents of the University of California. All rights reserved.
> > > > > > > FreeBSD is a registered trademark of The FreeBSD Foundation.
> > > > > > > FreeBSD 12.0-CURRENT #1 r333992: Tue May 22 00:31:04 CEST 2018
> > > > > > >     root@flimsy:/usr/obj/usr/src/amd64.amd64/sys/flimsy amd64
> > > > > > > FreeBSD clang version 6.0.0 (tags/RELEASE_600/final 326565) (based on LLVM 6.0.0)
> > > > > > > WARNING: WITNESS option enabled, expect reduced performance.
> > > > > > > VT(vga): resolution 640x480
> > > > > > > CPU: Intel(R) Celeron(R) 2955U @ 1.40GHz (1396.80-MHz K8-class CPU)
> > > > > > >   Origin="GenuineIntel"  Id=0x40651  Family=0x6  Model=0x45  Stepping=1
> > > > > > >   Features=0xbfebfbff
> > > > > > >   CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
> > > > > > >   Features2=0x4ddaebbf
> > > > > > >   xTPR,PDCM,PCID,SSE4.1,SSE4.2,MOVBE,POPCNT,TSCDLT,XSAVE,OSXSAVE,RDRAND>
> > > > > > >   AMD Features=0x2c100800
> > > > > > >   AMD Features2=0x21
> > > > > > >   Structured Extended Features=0x2603
> > > > > > >   XSAVE Features=0x1
> > > > > > >   VT-x: (disabled in BIOS) PAT,HLT,MTF,PAUSE,EPT,UG,VPID
> > > > > > >   TSC: P-state invariant, performance statistics
> > > > > > > real memory  = 4301258752 (4102 MB)
> > > > > > > avail memory = 1907572736 (1819 MB)
> > > > > > > Event timer "LAPIC" quality 600
> > > > > > > ACPI APIC Table:
> > > > > >
> > > > > > What does this mean? Did you flash coreboot?
> > > > >
> > > > > This machine comes with it by default (my model was delivered
> > > > > with SeaBIOS 20131018_145217-build121-m2). So I didn't flash
> > > > > anything (didn't feel like bricking it).
> > > > >
> > > > > > > kernel trap 12 with interrupts disabled
> > > > > > >
> > > > > > > Fatal trap 12: page fault while in kernel mode
> > > > > > > cpuid = 0; apic id = 00
> > > > > > > fault virtual address = 0xfffff80001000000
> > > > > > > fault code = supervisor write data, protection violation
> > > > > > > instruction pointer = 0x20:0xffffffff8102955f
> > > > > > > stack pointer = 0x28:0xffffffff82a79be0
> > > > > > > frame pointer = 0x28:0xffffffff82a79c10
> > > > > > > code segment = base 0x0, limit 0xfffff, type 0x1b
> > > > > > >              = DPL 0, pres 1, long 1, def32 0, gran 1
> > > > > > > processor eflags = resume, IOPL = 0
> > > > > > > current process = 0 ()
> > > > > > > [ thread pid 0 tid 0 ]
> > > > > > > Stopped at native_start_all_aps+0x08f: movq %rax,(%rsi)
> > > > > >
> > > > > > Look up the source line number for this address.
> > > > >
> > > > > I guess that's sys/amd64/amd64/support.S line 854 (in rdmsr),
> > > > > called by native_start_all_aps. Any additional hints on how I can
> > > > > track it down?
> > > >
> > > > Why did you decide that this is rdmsr_safe()? First,
> > > > native_start_all_aps() does not call rdmsr; second, the ddb
> > > > report clearly indicates that the fault occurred while accessing the
> > > > DMAP in native_start_all_aps().
> > > >
> > > > Just look up the source line for the address
> > > > native_start_all_aps+0x08f.
> > >
> > > Okay, according to kgdb this should be here:
> > >
> > > https://svnweb.freebsd.org/base/head/sys/amd64/amd64/mp_machdep.c?revision=333368&view=markup#l369
> > >
> > > 364
> > > 365     /* Create the initial 1GB replicated page tables */
> > > 366     for (i = 0; i < 512; i++) {
> > > 367         /* Each slot of the level 4 pages points to the same level 3 page */
> > > 368         pt4[i] = (u_int64_t)(uintptr_t)(mptramp_pagetables + PAGE_SIZE);
> > > 369         pt4[i] |= PG_V | PG_RW | PG_U;
> > > 370
> > > 371         /* Each slot of the level 3 pages points to the same level 2 page */
> > > 372         pt3[i] = (u_int64_t)(uintptr_t)(mptramp_pagetables + (2 * PAGE_SIZE));
> > > 373         pt3[i] |= PG_V | PG_RW | PG_U;
> > > 374
> > > 375         /* The level 2 page slots are mapped with 2MB pages for 1GB. */
> > > 376         pt2[i] = i * (2 * 1024 * 1024);
> > > 377         pt2[i] |= PG_V | PG_RW | PG_PS | PG_U;
> > > 378     }
> > >
> > > -m
> >
> > You have a fault on write due to the read-only mapping of the portion
> > of the direct map that maps the kernel text. This is consistent with
> > the faulting address. It is not clear whether this is something new on
> > your machine, or whether the kernel text was already being silently
> > corrupted before, since the read-only protection is somewhat recent.
> >
> > It seems that mp_bootaddress() selected a bad place for the
> > bootstrap page tables. Moreover, we do not include the kernel text
> > in the physmem[] array, so it is not clear how this happened. This
> > code was also changed recently.
> >
> > Can you add a print of the physmap[] array somewhere before the
> > panic, to see what the kernel's idea of the available memory is? It
> > should already be done if you have a serial console and set the
> > debug.late_console tunable to 0.
>
> This is a sad little machine without any kind of serial console.
>
> Physmap looks like this after calling getmemsize():
>
> [0]: 0x10000
> [1]: 0x30000
> [2]: 0x40000
> [3]: 0x9e000
> [4]: 0x100000
> [5]: 0xf00000
> [6]: 0x1003000
> [7]: 0x7bf7a000
>
> Physical memory chunks logged in cpu_startup are:
>
> 0x0000000000010000 - 0x000000000002ffff, 131072 bytes (32 pages)
> 0x0000000000040000 - 0x000000000009dfff, 385024 bytes (94 pages)

These two chunk reports are consistent with physmap[0-1, 2-3].

> 0x0000000000100000 - 0x00000000001fffff, 1048576 bytes (256 pages)
> 0x0000000002c00000 - 0x0000000075467fff, 1921417216 bytes (469096 pages)
> 0x0000000100000000 - 0x00000001005e7fff, 6193152 bytes (1512 pages)

But these three look completely unrelated to the rest of the physmap,
except perhaps physmap[4]. We allocate boot pages from the top of the
last physmap chunk, but I am certain that we do not consume enough
memory during boot to account for the gap between physmap[7] and the
last reported address. Are you sure that there are no typos in the
values above?

>
> -m
>
> > > p.s. This machine uses quirks in biosmem.c, see
> > >
> > > Type '?' for a list of commands, 'help' for more detailed help.
> > > OK biosmem
> > > bios_basemem: 0x9e400
> > > bios_extmem: 0x3ff00000
> > > memtop: 0x3c000000
> > > high_heap_base: 0x3c000000
> > > high_heap_size: 0x4000000
> > > bios_quirks: 0x01 BQ_DISTRUST_820_EXTMEM
> > > b_bios_probed: 0x0a B_BASEMEM_12 B_EXTMEM_E801
> > >
> > > --
> > > Michael Gmelin