From owner-freebsd-stable@freebsd.org Sun Sep 4 08:40:00 2016
Date: Sun, 4 Sep 2016 11:39:58 +0300
From: Slawa Olhovchenkov
To: Andriy Gapon
Cc: Konstantin Belousov, stable@FreeBSD.org
Subject: Re: X2APIC support
Message-ID: <20160904083958.GD34394@zxy.spb.ru>
References: <20151212130615.GE70867@zxy.spb.ru>
 <20151212133513.GL82577@kib.kiev.ua>
 <20160901112724.GX88122@zxy.spb.ru>
 <20160901114500.GJ83214@kib.kiev.ua>
 <20160901121300.GZ88122@zxy.spb.ru>
 <4ba05c00-f737-f562-553d-a7fa59145768@FreeBSD.org>
In-Reply-To: <4ba05c00-f737-f562-553d-a7fa59145768@FreeBSD.org>
List-Id: Production branch of FreeBSD source code

On Sun, Sep 04, 2016 at 11:19:16AM +0300, Andriy Gapon wrote:

> On 01/09/2016 15:13, Slawa Olhovchenkov wrote:
> > DMAR: Found table at 0x79b32798
> > x2APIC available but disabled by DMAR table
>
> > Event timer "LAPIC" quality 600
> > LAPIC: ipi_wait() us multiplier 1 (r 116268019 tsc 2200043851)
> > ACPI APIC Table:
> > Package ID shift: 5
> > L3 cache ID shift: 5
> > L2 cache ID shift: 1
> > L1 cache ID shift: 1
> > Core ID shift: 1
> > kernel trap 12 with interrupts disabled
> >
> >
> > Fatal trap 12: page fault while in kernel mode
> > cpuid = 0; apic id = ff
>
> > fault virtual address   = 0x0
> > fault code              = supervisor read data, page not present
> > instruction pointer     = 0x20:0xffffffff80537e74
> > stack pointer           = 0x28:0xffffffff814b4a60
> > frame pointer           = 0x28:0xffffffff814b4a70
> > code segment            = base 0x0, limit 0xfffff, type 0x1b
> >                         = DPL 0, pres 1, long 1, def32 0, gran 1
> > processor eflags        = resume, IOPL = 0
> > current process         = 0 ()
> > trap number             = 12
> > panic: page fault
> > cpuid = 0
> > KDB: stack backtrace:
> > #0 0xffffffff805272e7 at kdb_backtrace+0x67
> > #1 0xffffffff804dd662 at vpanic+0x182
> > #2 0xffffffff804dd4d3 at panic+0x43
> > #3 0xffffffff807a3791 at trap_fatal+0x351
> > #4 0xffffffff807a3983 at trap_pfault+0x1e3
> > #5 0xffffffff807a2f0c at trap+0x26c
> > #6 0xffffffff80787ca1 at calltrap+0x8
> > #7 0xffffffff8083b52a at topo_probe+0x61a
>
> Interesting. Could you please do 'list *topo_probe+0x61a' in kgdb, so that I

(kgdb) list *topo_probe+0x61a
0xffffffff8083b52a is in topo_probe (/usr/src/sys/x86/x86/mp_x86.c:540).
535                                 topo_layers[layer].subtype);
536                     }
537             }
538
539             parent = &topo_root;
540             for (layer = 0; layer < nlayers; ++layer) {
541                     node_id = boot_cpu_id >> topo_layers[layer].id_shift;
542                     node = topo_find_node_by_hwid(parent, node_id,
543                         topo_layers[layer].type,
544                         topo_layers[layer].subtype);
Current language: auto; currently minimal

> can see what code is being executed when the trap happens? Also, disassembly of
> the function could be useful as well.

(kgdb) x/40i *topo_probe+0x600
0xffffffff8083b510:     and    $0xf8,%al
0xffffffff8083b512:     movslq -0x4(%r12),%rcx
0xffffffff8083b517:     mov    %rbx,%rdi
0xffffffff8083b51a:     callq  0xffffffff80537e30
0xffffffff8083b51f:     mov    %rax,%rbx
0xffffffff8083b522:     mov    %rbx,%rdi
0xffffffff8083b525:     callq  0xffffffff80537e70
0xffffffff8083b52a:     add    $0xc,%r12
0xffffffff8083b52e:     dec    %r14d
0xffffffff8083b531:     jne    0xffffffff8083b500
0xffffffff8083b533:     movb   $0x1,0xffffffff80dfa664
0xffffffff8083b53b:     add    $0x68,%rsp
0xffffffff8083b53f:     pop    %rbx
0xffffffff8083b540:     pop    %r12
0xffffffff8083b542:     pop    %r13
0xffffffff8083b544:     pop    %r14
0xffffffff8083b546:     pop    %r15
0xffffffff8083b548:     pop    %rbp
0xffffffff8083b549:     retq
0xffffffff8083b54a:     nopw   0x0(%rax,%rax,1)

> Wait...
> Kostik, I see one strange thing which is common to both successful and
> unsuccessful configurations. All "SMP: Added CPU..." lines have "AP" in them.

for #1..#23
no line 'SMP: AP CPU #0 Launched!'

> It seems like the platform does not explicitly tell which CPU is the BSP,
> see cpu_add() function. This can break quite a few assumptions. And I am not
> even sure how the successful scenario works.
# mptable

===============================================================================

MPTable

-------------------------------------------------------------------------------

MP Floating Pointer Structure:

  location:                     BIOS
  physical address:             0x000fd050
  signature:                    '_MP_'
  length:                       16 bytes
  version:                      1.4
  checksum:                     0x27
  mode:                         Virtual Wire

-------------------------------------------------------------------------------

MP Config Table Header:

  physical address:             0x000fcaa0
  signature:                    'PCMP'
  base table length:            1228
  version:                      1.4
  checksum:                     0x95
  OEM ID:                       'A M I'
  Product ID:                   'ALASKA'
  OEM table pointer:            0x00000000
  OEM table size:               0
  entry count:                  112
  local APIC address:           0xfee00000
  extended table length:        220
  extended table checksum:      72

-------------------------------------------------------------------------------

MP Config Base Table Entries:

--
Processors:     APIC ID Version State           Family  Model   Step    Flags
                 0       0x15    BSP, usable     6       15      1       0xbfebfbff
                 2       0x15    AP, usable      6       15      1       0xbfebfbff
                 4       0x15    AP, usable      6       15      1       0xbfebfbff
                 6       0x15    AP, usable      6       15      1       0xbfebfbff
                 8       0x15    AP, usable      6       15      1       0xbfebfbff
                10       0x15    AP, usable      6       15      1       0xbfebfbff
                16       0x15    AP, usable      6       15      1       0xbfebfbff
                18       0x15    AP, usable      6       15      1       0xbfebfbff
                20       0x15    AP, usable      6       15      1       0xbfebfbff
                22       0x15    AP, usable      6       15      1       0xbfebfbff
                24       0x15    AP, usable      6       15      1       0xbfebfbff
                26       0x15    AP, usable      6       15      1       0xbfebfbff
                32       0x15    AP, usable      6       15      1       0xbfebfbff
                34       0x15    AP, usable      6       15      1       0xbfebfbff
                36       0x15    AP, usable      6       15      1       0xbfebfbff
                38       0x15    AP, usable      6       15      1       0xbfebfbff
                40       0x15    AP, usable      6       15      1       0xbfebfbff
                42       0x15    AP, usable      6       15      1       0xbfebfbff
                48       0x15    AP, usable      6       15      1       0xbfebfbff
                50       0x15    AP, usable      6       15      1       0xbfebfbff
                52       0x15    AP, usable      6       15      1       0xbfebfbff
                54       0x15    AP, usable      6       15      1       0xbfebfbff
                56       0x15    AP, usable      6       15      1       0xbfebfbff
                58       0x15    AP, usable      6       15      1       0xbfebfbff

> Ah... I see that there is a backup code in cpu_mp_start() where boot_cpu_id is
> set based on the current CPU's Local APIC ID. I suspect then that this
> information is incorrect in the failing case.
>
> Slawa,
> my guess can be checked by adding a printf to cpu_mp_start() right after
> boot_cpu_id assignment.

The system is now in early production and I can't reboot it often.

> > #8 0xffffffff8078fe81 at cpu_mp_start+0x1b1
> > #9 0xffffffff805382ca at mp_start+0x3a
> > #10 0xffffffff80465cd8 at mi_startup+0x118
> > #11 0xffffffff8028dfac at btext+0x2c
> > Uptime: 1s
> >
> --
> Andriy Gapon