From: John Baldwin
To: freebsd-arch@freebsd.org
Cc: Alexander Best, bz@freebsd.org, Oliver Fromme
Date: Wed, 23 Mar 2011 14:26:27 -0400
Subject: Re: kernel memory checks on boot vs. boot time

On Wednesday, March 23, 2011 1:14:43 pm Alexander Best wrote:
> On Wed Mar 23 11, Oliver Fromme wrote:
> > Bjoern A. Zeeb wrote:
> > > as part of the i386/pc98/amd64 boot process we are doing some basic
> > > memory testing, mapping pages and running a couple of pattern
> > > write/read tests on the first bytes (see getmemsize() implementations).
> > > [...]
> > > With the growing amount of memory this can lead to a significant
> > > fraction of kernel startup time on amd64 (~40s delays observed with
> > > 96G of RAM). Looping over the pages, but not mapping them and not
> > > running the pattern tests reduces this significantly (to single digit
> > > numbers of seconds).
> > > [...]
> > > Not wanting to remove them but maybe make more use of them in the
> > > future (as we do not report any problems we find currently) I'd suggest
> > > to introduce a tunable to disable/enable them, say
> > >
> > > hw.run_memtest
> >
> > +1 for introducing a tunable.
> >
> > I have also noticed the boot delay on server machines with
> > lots of memory (all of them are amd64, FWIW). Co-workers
> > have noticed it, too, causing some funny remarks. :-)
>
> or how about we dump the current memory checks, introduce a tunable and
> implement some *real* memory checks. as john pointed out the current checks
> are just rudimentary.

I think that doing *real* memory checks isn't really the role of our
kernel. Effort would be better spent on improving memtest86, since it is
already trying to solve this problem.

What would be nice is a way to invoke memtest86 from the loader. Assuming
you could pass arguments (such as a time limit) to the memtest "kernel",
you could install memtest to /boot/memtest and do something like
'nextboot -k memtest -o "-t 120"' to run memtest for 2 hours on the next
boot and then reboot back into the stock OS after it finishes, etc.
There are several tricky things you need to get right if you want to do
*real* memory tests, and they are harder to do from within a full-blown
kernel. For example, you have to relocate yourself into already-checked
pages at some point so that you can check all of the pages in the system,
and you have to disable caching for all pages except those holding your
own code so that you test the actual RAM rather than your caches.

-- 
John Baldwin