From: John Baldwin
To: freebsd-arch@freebsd.org
Cc: "Bjoern A. Zeeb"
Subject: Re: kernel memory checks on boot vs. boot time
Date: Tue, 22 Mar 2011 15:51:13 -0400
Message-Id: <201103221551.14289.jhb@freebsd.org>

On Tuesday, March 22, 2011 1:30:42 pm Bjoern A. Zeeb wrote:
> Hi,
>
> as part of the i386/pc98/amd64 boot process we are doing some basic
> memory testing, mapping pages and running a couple of pattern
> write/read tests on the first bytes (see getmemsize() implementations).
>
> Depending on the features enabled and boot -v or not you may notice
> it as "nothing happens" booting from loader, after any of these
> possible lines:
>   GDB: no debug ports present
>   KDB: debugger backends: ddb
>   KDB: current backend: ddb
>   SMAP type=...
> but before the Copyright message.
>
> With the growing number of memory this can lead to a significant
> fraction of kernel startup time on amd64 (~40s delays observed with
> 96G of RAM).  Looping over the pages, but not mapping them and not
> running the pattern tests reduces this significantly (to single digit
> numbers of seconds).
>
> As a first step I'd like to discuss how worth the actual memory tests
> are these days, to figure out a sensible default.
>
> Not wanting to remove them but maybe make more use of them in the
> future (as we do not report any problems we find currently) I'd suggest
> to introduce a tunable to disable/enable them, say
>
>   hw.run_memtest
>
> with the following values:
>
>   0  do not map the page and do not run the pattern tests
>   1  do run the pattern test on the beginning of the page
>      (current default).
> and maybe add
>   2  run the pattern tests on the entire pages?
>
> I would further suggest to add a printf independently of boot -v
> there, so that the user who would wait, will know what's (not) going on.
> Something along the lines of:
>   "Testing physical address space (%s)."
>   0  "skipping extra pattern tests"
>   1  "pattern tests on beginning of each page"
>   2  "pattern tests on entire pages"
>
>
> If this is something that makes sense, I'd suggest to factor things
> out to sys/x86 and would provide a patch for further discussion and
> improvements (like error reporting, etc).
>
> Comments?  Suggestions?

Do other platforms bother with these sorts of memory tests?  If not I'd vote
to just drop it.  I think this mattered more when you didn't have things like
SMAP (so you had to guess at where memory ended sometimes).
Also, modern server-class x86 machines generally support ECC RAM, which will
trigger a machine check if there is a problem.  I doubt that the early checks
are catching anything even in the non-ECC case.  If nothing else, I would
definitely drop this from amd64 (all those systems have SMAP and machine check
support, etc.).

--
John Baldwin
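
For readers following the thread, the three hw.run_memtest levels proposed
in the quoted message could be sketched in userspace C roughly as below.
This is not the kernel's getmemsize() code. The enum names, the specific
test patterns, PAGE_SIZE_SKETCH, and the helper functions are all
assumptions made for illustration, and the sketch works on an ordinary
buffer instead of mapping physical pages.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed page size for this sketch only. */
#define PAGE_SIZE_SKETCH 4096

/* Hypothetical names for the proposed hw.run_memtest values 0, 1 and 2. */
enum memtest_mode {
	MEMTEST_SKIP = 0,	/* do not touch the page at all */
	MEMTEST_FIRST_WORD = 1,	/* pattern test at the start of the page */
	MEMTEST_FULL_PAGE = 2	/* pattern test over the entire page */
};

/* Illustrative bit patterns; assumed, not taken from the thread. */
static const uint32_t patterns[] = {
	0xaaaaaaaau, 0x55555555u, 0xffffffffu, 0x00000000u
};

/*
 * Write each pattern to one word and read it back.
 * Returns 1 if every pattern survived, 0 otherwise.
 * The original contents are restored before returning.
 */
static int
test_word(volatile uint32_t *p)
{
	uint32_t saved = *p;
	int ok = 1;

	for (size_t i = 0; i < sizeof(patterns) / sizeof(patterns[0]); i++) {
		*p = patterns[i];
		if (*p != patterns[i])
			ok = 0;
	}
	*p = saved;
	return (ok);
}

/* Test one page according to mode; a skipped page counts as good. */
static int
test_page(void *page, enum memtest_mode mode)
{
	volatile uint32_t *w = page;
	size_t nwords;

	if (mode == MEMTEST_SKIP)
		return (1);
	nwords = (mode == MEMTEST_FULL_PAGE) ?
	    PAGE_SIZE_SKETCH / sizeof(uint32_t) : 1;
	for (size_t i = 0; i < nwords; i++)
		if (!test_word(&w[i]))
			return (0);
	return (1);
}

/* Walk npages pages starting at base and count the ones that fail. */
static size_t
count_bad_pages(void *base, size_t npages, enum memtest_mode mode)
{
	size_t bad = 0;

	for (size_t i = 0; i < npages; i++)
		if (!test_page((char *)base + i * PAGE_SIZE_SKETCH, mode))
			bad++;
	return (bad);
}
```

The cost difference between the levels shows up in nwords: level 1 touches
one word per page, level 2 touches every word, and level 0 only pays for
the loop over pages.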