Date:      Wed, 14 Dec 2016 16:40:33 +0200
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Slawa Olhovchenkov <slw@zxy.spb.ru>
Cc:        freebsd-current@freebsd.org
Subject:   Re: Enabling NUMA in BIOS stop booting FreeBSD
Message-ID:  <20161214144033.GH94325@kib.kiev.ua>
In-Reply-To: <20161214121336.GD98176@zxy.spb.ru>
References:  <20161213143401.GK90287@zxy.spb.ru> <20161213150139.GZ54029@kib.kiev.ua> <20161213152838.GL90287@zxy.spb.ru> <20161213172529.GC54029@kib.kiev.ua> <20161213174345.GB98176@zxy.spb.ru> <20161214095350.GE94325@kib.kiev.ua> <20161214102711.GF94325@kib.kiev.ua> <20161214105211.GC98176@zxy.spb.ru> <20161214113927.GG94325@kib.kiev.ua> <20161214121336.GD98176@zxy.spb.ru>

On Wed, Dec 14, 2016 at 03:13:36PM +0300, Slawa Olhovchenkov wrote:
> On Wed, Dec 14, 2016 at 01:39:27PM +0200, Konstantin Belousov wrote:
> 
> > In other words, it is almost certainly a hang and not a fault causing
> > the hang. This means that the machine is not compliant with the IA32
> > architecture; in particular, the region reported as normal memory by
> > the E820 BIOS service does not behave as normal memory.
> > 
> > Since the memory map is the same regardless of the option setting, and
> > the bootstrap page table depends only on the memory map, we use the same
> > page table when hanging and when operating correctly. We do not fault or
> > hang when the option is turned off, which, together with the improved
> > early fault handling in the patch, makes it almost certain that the
> > problem is in the hardware configuration and not in our early setup.
> > 
> > Of course, the most puzzling part is that the memory test makes the hang
> > go away, while repeating the memory test only on the msgbuf region
> > does not. msgbuf is special in that it is located at TOHM (top of high
> > memory). It spans the 128KB below TOHM, up to the last byte of the last
> > physical segment.
> > 
> > The only ideas I have right now are these. Either there is a bug in the
> > Caching Agent/Home Agent/IMC configuration in the BIOS, in which case
> > there is nothing the OS can do to mitigate it. Or the memory map
> > reported by the CSM is wrong (you said that you use legacy boot,
> > right?). This would not be too surprising, because the non-EFI boot
> > code path definitely gets less and less testing.
> > 
> > For the latter case (a potential bug in the CSM), could you switch to
> > EFI boot mode and see whether the issue magically heals itself? You
> > could boot from a USB stick in EFI mode, without reinstalling, as a test.
> 
> I can't boot from a USB stick -- this is a remote DC and IPMI allows
> only CDROM emulation.
> 
> OK, I booted the 12.0 snapshot ISO in UEFI mode.
> It boots OK.
> 
> Can I convert the installed OS to UEFI mode?
I am not sure what you are asking here.  Are you asking whether I need any
further information from the broken setup?  I believe not; I cannot
debug this any further.

I think that the interesting piece of data that can be obtained now is
the memmap command output from the EFI loader from all three configurations,
NUMA on/off and interleaving.
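
One way to compare the captured maps, as a rough sketch only: the
(base, length, type) tuples, the type names and all addresses below are
made up for illustration, and transcribing the real memmap output into
this form is left to whoever collects it. The msgbuf_range() helper just
restates the placement described earlier in the thread (the last 128KB
of the highest usable segment); it does not read anything from the kernel.

```python
# Hypothetical sketch: compare memory maps captured under different
# firmware settings. Entries are (base, length, type) tuples; the sample
# values are invented and are NOT taken from the machine in this thread.

def normalize(entries):
    """Sort by base address and merge adjacent ranges of the same type."""
    merged = []
    for base, length, kind in sorted(entries):
        if merged and merged[-1][2] == kind and merged[-1][0] + merged[-1][1] == base:
            prev = merged[-1]
            merged[-1] = (prev[0], prev[1] + length, kind)
        else:
            merged.append((base, length, kind))
    return merged

def diff_maps(a, b):
    """Return (only_in_a, only_in_b) after normalizing both maps."""
    sa, sb = set(normalize(a)), set(normalize(b))
    return sorted(sa - sb), sorted(sb - sa)

def msgbuf_range(entries, size=128 * 1024):
    """Last `size` bytes of the highest usable segment (assumed msgbuf spot)."""
    usable = [e for e in normalize(entries) if e[2] == "usable"]
    base, length, _ = max(usable)
    end = base + length
    return end - size, end

# Invented sample maps for two settings.
numa_off = [(0x0, 0x9F000, "usable"), (0x100000, 0x7FF00000, "usable")]
numa_on = [
    (0x0, 0x9F000, "usable"),
    (0x100000, 0x7FE00000, "usable"),
    (0x7FF00000, 0x100000, "reserved"),
]

only_off, only_on = diff_maps(numa_off, numa_on)
for base, length, kind in only_off:
    print(f"only with NUMA off: {base:#x}-{base + length:#x} {kind}")
for base, length, kind in only_on:
    print(f"only with NUMA on:  {base:#x}-{base + length:#x} {kind}")

lo, hi = msgbuf_range(numa_off)
print(f"assumed msgbuf range (NUMA off): {lo:#x}-{hi:#x}")
```

Any range that shows up in only one of the maps, especially near the top
of the last segment, would be the first thing to look at.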

> 
> > Do you use latest BIOS for your motherboard ?
> 
> This is a new MB (X10DRi) w/ BIOS 2.0; the newest is 2.1, but updating
> is not simple (I need to prepare a bootable DOS ISO, since most of the
> utilities don't work under FreeBSD).
IMO the only way to fix this issue, if it is really important, is
to contact Supermicro and show them the bug.  But this only makes sense if
the problem is reproduced on the latest firmware version.


