Date: Fri, 16 Dec 2016 01:45:00 +0300
From: Slawa Olhovchenkov <slw@zxy.spb.ru>
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: freebsd-current@freebsd.org
Subject: Re: Enabling NUMA in BIOS stop booting FreeBSD
Message-ID: <20161215224500.GM98176@zxy.spb.ru>
In-Reply-To: <20161215135656.GS94325@kib.kiev.ua>
References: <20161214102711.GF94325@kib.kiev.ua>
 <20161214105211.GC98176@zxy.spb.ru>
 <20161214113927.GG94325@kib.kiev.ua>
 <20161214121336.GD98176@zxy.spb.ru>
 <20161214152627.GF98176@zxy.spb.ru>
 <20161214190349.GJ94325@kib.kiev.ua>
 <20161215105118.GK98176@zxy.spb.ru>
 <20161215123330.GQ94325@kib.kiev.ua>
 <20161215131624.GL98176@zxy.spb.ru>
 <20161215135656.GS94325@kib.kiev.ua>

On Thu, Dec 15, 2016 at 03:56:56PM +0200, Konstantin Belousov wrote:

> > > Possibly, the dmesg of the boot (with late_console=0) with this and only
> > > this patch applied against stock HEAD.  This might be long.
> >
> > Do you need all (262144?) lines?
> >
> > Testing system
> > memory........................................................................................................................pb 0x2040000000
> > pb 0x2040001000
> > pb 0x2040002000
> > pb 0x2040003000
> > pb 0x2040004000
> > pb 0x2040005000
> > pb 0x2040006000
> > [...]
> > pb 0x207ffff000
> >
> > > diff --git a/sys/amd64/amd64/machdep.c b/sys/amd64/amd64/machdep.c
> > > index 682307f5fe4..072c8d76acf 100644
> > > --- a/sys/amd64/amd64/machdep.c
> > > +++ b/sys/amd64/amd64/machdep.c
> > > @@ -1400,6 +1400,7 @@ getmemsize(caddr_t kmdp, u_int64_t first)
> > >  			 */
> > >  			*(int *)ptr = tmp;
> > >
> > > +if (page_bad) printf("pb 0x%lx\n", pa);
> > >  skip_memtest:
> > >  			/*
> > >  			 * Adjust array of valid/good pages.
> >
> > PS: memtest86 hung at test 128-130G (server has 128G installed).
> Well, the physical memory is 128G, but it is not mapped contiguously into
> the address space accessible to the processors.  E.g. in the SMAPs you
> posted above, there are several holes (type 2) used for PCIe config
> window, PCI BARs, APICs, and other i/o register pages.  Intel chipsets
> allow to remap the RAM hidden by the io pages, which is probably not
> done correctly by BIOS.
>
> The SMAP clearly reports segment 0x100000000-0x2080000000 as populated
> by RAM, this is 4G-130G.  Very primitive memory test in kernel does
> not like all pages starting at 129G.  Possibly important detail is that
> kernel memory test only touches first 4 bytes on each page.  So if BIOS
> erroneously mapped any io registers into that range, memory test might
> luckily avoid touching anything critical, but still noting that the
> page does not behave as RAM.
>
> Update BIOS, and if the issue persists, contact supermicro.  This
> interesting detail adds even more evidence that BIOS is problematic.

Updating the BIOS does not solve this.