From: Slawa Olhovchenkov
To: Konstantin Belousov
Cc: freebsd-current@freebsd.org
Date: Fri, 16 Dec 2016 01:45:00 +0300
Subject: Re: Enabling NUMA in BIOS stop booting FreeBSD
Message-ID: <20161215224500.GM98176@zxy.spb.ru>
In-Reply-To: <20161215135656.GS94325@kib.kiev.ua>

On Thu, Dec 15, 2016 at 03:56:56PM +0200,
Konstantin Belousov wrote:

> > > Possibly, the dmesg of the boot (with late_console=0) with this and only
> > > this patch applied against stock HEAD. This might be long.
> >
> > Do you need all (262144?) lines?
> >
> > Testing system memory........................................................................................................................pb 0x2040000000
> > pb 0x2040001000
> > pb 0x2040002000
> > pb 0x2040003000
> > pb 0x2040004000
> > pb 0x2040005000
> > pb 0x2040006000
> > [...]
> > pb 0x207ffff000
> >
> > > diff --git a/sys/amd64/amd64/machdep.c b/sys/amd64/amd64/machdep.c
> > > index 682307f5fe4..072c8d76acf 100644
> > > --- a/sys/amd64/amd64/machdep.c
> > > +++ b/sys/amd64/amd64/machdep.c
> > > @@ -1400,6 +1400,7 @@ getmemsize(caddr_t kmdp, u_int64_t first)
> > >  		 */
> > >  		*(int *)ptr = tmp;
> > >
> > > +if (page_bad) printf("pb 0x%lx\n", pa);
> > >  skip_memtest:
> > >  		/*
> > >  		 * Adjust array of valid/good pages.
> >
> > PS: memtest86 hung at test 128-130G (server have 128G installed).
> Well, the physical memory is 128G, but it is not mapped contiguously into
> the address space accessible to the processors. E.g. in the SMAPs you
> posted above, there are several holes (type 2) used for PCIe config
> window, PCI BARs, APICs, and other i/o register pages. Intel chipsets
> allow to remap the RAM hidden by the io pages, which is probably not
> done correctly by BIOS.
>
> The SMAP clearly reports segment 0x100000000-0x2080000000 as populated
> by RAM, this is 4G-130G. Very primitive memory test in kernel does
> not like all pages starting at 129G. Possibly important detail is that
> kernel memory test only touches first 4 bytes on each page. So if BIOS
> erroneously mapped any io registers into that range, memory test might
> luckily avoid touching anything critical, but still noting that the
> page does not behave as RAM.
>
> Update BIOS, and if the issue persists, contact supermicro. This
> interesting detail adds even more evidence that BIOS is problematic.

Updating the BIOS doesn't solve this.
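
For reference, the numbers in this thread are consistent with each other, and a few lines of C can check that. This is only a sketch: the helper names page_count() and addr_to_gib() are made up for illustration, and a 4096-byte page is assumed (the amd64 base page size). The addresses are the ones from the "pb" output and the SMAP segment quoted above.

```c
#include <stdint.h>

/* Assumption: 4 KiB base pages, as on amd64. */
#define PAGE_SIZE_BYTES	4096ULL
#define GIB		(1ULL << 30)

/*
 * Hypothetical helper: number of pages in the inclusive range
 * [first_page, last_page], both given as page-aligned physical addresses.
 */
uint64_t
page_count(uint64_t first_page, uint64_t last_page)
{
	return ((last_page - first_page) / PAGE_SIZE_BYTES + 1);
}

/*
 * Hypothetical helper: physical address expressed in whole GiB
 * (rounded down).
 */
uint64_t
addr_to_gib(uint64_t pa)
{
	return (pa / GIB);
}
```

With these, page_count(0x2040000000, 0x207ffff000) gives 262144, which matches the number of "pb" lines asked about, and addr_to_gib() maps 0x100000000, 0x2040000000 and 0x2080000000 to 4, 129 and 130, i.e. the 4G-130G segment with bad pages starting at 129G.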