From owner-freebsd-arch@FreeBSD.ORG Sun Mar 20 01:00:20 2011 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9960E10656D9 for ; Sun, 20 Mar 2011 01:00:20 +0000 (UTC) (envelope-from gordon@tetlows.org) Received: from mail-bw0-f54.google.com (mail-bw0-f54.google.com [209.85.214.54]) by mx1.freebsd.org (Postfix) with ESMTP id 36D928FC0A for ; Sun, 20 Mar 2011 01:00:19 +0000 (UTC) Received: by bwz12 with SMTP id 12so4774341bwz.13 for ; Sat, 19 Mar 2011 18:00:19 -0700 (PDT) MIME-Version: 1.0 Received: by 10.204.81.27 with SMTP id v27mr2264393bkk.115.1300581516533; Sat, 19 Mar 2011 17:38:36 -0700 (PDT) Received: by 10.204.16.65 with HTTP; Sat, 19 Mar 2011 17:38:36 -0700 (PDT) In-Reply-To: References: <132388F1-44D9-45C9-AE05-1799A7A2DCD9@neville-neil.com> Date: Sat, 19 Mar 2011 17:38:36 -0700 Message-ID: From: Gordon Tetlow To: Navdeep Parhar Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: arch@freebsd.org, George Neville-Neil Subject: Re: Updating our TCP and socket sysctl values... X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 20 Mar 2011 01:00:20 -0000 On Sat, Mar 19, 2011 at 4:13 PM, Navdeep Parhar wrote: > 256KB seems adequate for 10G (as long as the consumer can keep > draining the socket rcv buffer fast enough). If you consider 2 x > bandwidth delay product to be a reasonable socket buffer size then > 256K allows for 10G networks with ~100ms delays. Normally the delay > is _way_ less than this for 10G and even 256K may be an overkill (but > this is ok, the kernel has tcp_do_autorcvbuf on by default) The BDP for a 10Gbps link with 100ms delay is about 120MB.
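The figure above can be checked with a few lines of Python. This is only a sketch: the function name bdp_bytes is made up for illustration, and it assumes decimal units (10 Gbps taken as 10^10 bit/s) with the quoted 100 ms used directly as the delay in the product.

```python
def bdp_bytes(bandwidth_bits_per_s: float, delay_s: float) -> float:
    """Bandwidth-delay product in bytes: bits in flight divided by 8."""
    return bandwidth_bits_per_s * delay_s / 8


if __name__ == "__main__":
    # 10 Gbit/s link with 100 ms of delay (illustrative inputs from the thread).
    bdp = bdp_bytes(10e9, 0.100)
    print(f"{bdp / 1e6:.0f} MB ({bdp / 2**20:.0f} MiB)")
```

Under those assumptions the product is 125 MB, or roughly 119 MiB, which is consistent with "about 120MB".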
Here's a decent calculator for figuring it out: http://www.speedguide.net/bdp.php Regards, Gordon From owner-freebsd-arch@FreeBSD.ORG Sun Mar 20 01:19:46 2011 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8633F106564A for ; Sun, 20 Mar 2011 01:19:46 +0000 (UTC) (envelope-from nparhar@gmail.com) Received: from mail-fx0-f54.google.com (mail-fx0-f54.google.com [209.85.161.54]) by mx1.freebsd.org (Postfix) with ESMTP id 1B04F8FC12 for ; Sun, 20 Mar 2011 01:19:45 +0000 (UTC) Received: by fxm11 with SMTP id 11so5453759fxm.13 for ; Sat, 19 Mar 2011 18:19:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=SpBQYfmeBYrw/6Zs0uCZRUpG6nkknor/q4sJzqV/c/w=; b=Mo1MxLcMfO/NUAXxvwuxfIkNQRqzTHXi73WwEUxnJrleG6QMp77Y9ReA5mIBc3znim kZoWEQOPY4PaWuhsBC6StomAVo+/kUFSS/3z7BjUxbJlrmcMb4MtUrRB+pQKfzSgLXDk 4jGpHQNkiO6tgKRQFt0/YGrh1Ds4vWAVAst5c= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; b=xJroWfQQBXs2jIGwzVfIvVJZvsNjwT4QZUEqKYD4W//pn/Ik3w1/mAJS9Arj7p93tz XsKSkx2THjhbfBdENOp4WVfYv8QAtZLmlweJWvJL6C9ECoan5FSGNsR+B2LXGrWv5uK+ 1HcoGsb/TakxXN2Kvqgq/LOFjLHzRuCTG/W7g= MIME-Version: 1.0 Received: by 10.223.77.19 with SMTP id e19mr3100914fak.36.1300583985172; Sat, 19 Mar 2011 18:19:45 -0700 (PDT) Received: by 10.223.32.204 with HTTP; Sat, 19 Mar 2011 18:19:45 -0700 (PDT) In-Reply-To: References: <132388F1-44D9-45C9-AE05-1799A7A2DCD9@neville-neil.com> Date: Sat, 19 Mar 2011 18:19:45 -0700 Message-ID: From: Navdeep Parhar To: Gordon Tetlow Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: arch@freebsd.org, George Neville-Neil Subject: Re: 
Updating our TCP and socket sysctl values... X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 20 Mar 2011 01:19:46 -0000 On Sat, Mar 19, 2011 at 5:38 PM, Gordon Tetlow wrote: > On Sat, Mar 19, 2011 at 4:13 PM, Navdeep Parhar wrote: >> 256KB seems adequate for 10G (as long as the consumer can keep >> draining the socket rcv buffer fast enough). If you consider 2 x >> bandwidth delay product to be a reasonable socket buffer size then >> 256K allows for 10G networks with ~100ms delays. Normally the delay >> is _way_ less than this for 10G and even 256K may be an overkill (but >> this is ok, the kernel has tcp_do_autorcvbuf on by default) > > The BDP for a 10Gbps link with 100ms delay is about 120MB. I meant 100us (microseconds), sorry. My point still stands - 10G networks have much less one way delay than this. The worst I can find in the lab right now has around ~30us delay. A socket rcv bufsize of 64K maxes out the link in some casual testing with netperf (with autosizing disabled). 256K is already more than what's needed.
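To make the corrected figure concrete, the "2 x bandwidth delay product" rule can be turned around: given a buffer size and a link speed, how much delay does the buffer cover? The sketch below does that. The helper name max_delay_s and its bdp_multiple parameter are hypothetical, and it assumes decimal link speeds and uses "delay" exactly as the rule of thumb above does.

```python
def max_delay_s(buffer_bytes: int,
                bandwidth_bits_per_s: float,
                bdp_multiple: float = 2.0) -> float:
    """Largest delay for which buffer_bytes is still bdp_multiple x BDP."""
    # The buffer holds bdp_multiple BDPs, so one BDP is this many bits.
    bdp_bits = buffer_bytes * 8 / bdp_multiple
    return bdp_bits / bandwidth_bits_per_s


if __name__ == "__main__":
    for size in (64 * 1024, 256 * 1024):
        delay_us = max_delay_s(size, 10e9) * 1e6
        print(f"{size // 1024}K buffer at 10G: about {delay_us:.0f} us")
```

With these assumptions a 256K buffer corresponds to roughly 105 us at 10G, in line with the corrected ~100us, and a 64K buffer to roughly 26 us, which is in the neighbourhood of the ~30us lab delay mentioned above.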
Regards, Navdeep > > Here's a decent calculator for figuring it out: > http://www.speedguide.net/bdp.php > > Regards, > Gordon > From owner-freebsd-arch@FreeBSD.ORG Sun Mar 20 03:45:50 2011 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 66D33106566B for ; Sun, 20 Mar 2011 03:45:50 +0000 (UTC) (envelope-from gnn@neville-neil.com) Received: from vps.hungerhost.com (vps.hungerhost.com [216.38.53.176]) by mx1.freebsd.org (Postfix) with ESMTP id 3B60A8FC08 for ; Sun, 20 Mar 2011 03:45:50 +0000 (UTC) Received: from 197.214.32.202.bf.2iij.net ([202.32.214.197] helo=[192.168.12.144]) by vps.hungerhost.com with esmtpsa (TLSv1:AES128-SHA:128) (Exim 4.69) (envelope-from ) id 1Q19aO-0005n1-R8; Sat, 19 Mar 2011 23:45:49 -0400 Mime-Version: 1.0 (Apple Message framework v1082) Content-Type: text/plain; charset=us-ascii From: George Neville-Neil In-Reply-To: <20110319160400.000043f5@unknown> Date: Sun, 20 Mar 2011 12:45:45 +0900 Content-Transfer-Encoding: quoted-printable Message-Id: <72B8E80C-E4C7-4763-A7B5-7A4441188C00@neville-neil.com> References: <132388F1-44D9-45C9-AE05-1799A7A2DCD9@neville-neil.com> <20110319160400.000043f5@unknown> To: Alexander Leidinger X-Pgp-Agent: GPGMail 1.3.2 X-Mailer: Apple Mail (2.1082) X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - vps.hungerhost.com X-AntiAbuse: Original Domain - freebsd.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - neville-neil.com Cc: arch@freebsd.org Subject: Re: Updating our TCP and socket sysctl values... 
X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 20 Mar 2011 03:45:50 -0000 -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Mar 20, 2011, at 00:04 , Alexander Leidinger wrote: > On Sat, 19 Mar 2011 15:37:47 +0900 George Neville-Neil > wrote: > >> -----BEGIN PGP SIGNED MESSAGE----- >> Hash: SHA1 >> >> Howdy, >> >> I believe it's time for us to upgrade our sysctl values for TCP >> sockets so that they are more in line with the modern world. At the >> moment we have these limits on our buffering: >> >> kern.ipc.maxsockbuf: 262144 >> net.inet.tcp.recvbuf_max: 262144 >> net.inet.tcp.sendbuf_max: 262144 >> >> I believe it's time to up these values to something that's in line >> with higher speed local networks, such as 10G. Perhaps it's time to >> move these to 2MB instead of 256K. >> >> Thoughts? > > I suggest to read > http://www.bufferbloat.net/projects/bloat/wiki/Bufferbloat > and do a before/after test to make sure we do not suffer from the > described problem. Jim Getty has test descriptions: > http://gettys.wordpress.com/category/bufferbloat/ > No need to read those, I heard him talk about it at dinner a few weeks ago. What he's mostly talking about is buffer bloat in non endpoint devices. Note that I'm not talking about changing where we start, but what the maximum is. It is definitely the case, and Jeff Roberson can back me up on this, that our defaults are hampering the out of the box experience for our users.
Best, George -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (Darwin) iEYEARECAAYFAk2FeGkACgkQYdh2wUQKM9K7KgCggOYPJlks8agtDEZdJX1jsxa/ 9vMAn2RRTOGgylTHd08bz6IZYayIHuaA =3DDXWB -----END PGP SIGNATURE----- From owner-freebsd-arch@FreeBSD.ORG Sun Mar 20 03:47:31 2011 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6E0F3106564A for ; Sun, 20 Mar 2011 03:47:31 +0000 (UTC) (envelope-from gnn@neville-neil.com) Received: from vps.hungerhost.com (vps.hungerhost.com [216.38.53.176]) by mx1.freebsd.org (Postfix) with ESMTP id 467B78FC13 for ; Sun, 20 Mar 2011 03:47:31 +0000 (UTC) Received: from 197.214.32.202.bf.2iij.net ([202.32.214.197] helo=[192.168.12.144]) by vps.hungerhost.com with esmtpsa (TLSv1:AES128-SHA:128) (Exim 4.69) (envelope-from ) id 1Q19c2-0006KW-ES; Sat, 19 Mar 2011 23:47:31 -0400 Mime-Version: 1.0 (Apple Message framework v1082) Content-Type: text/plain; charset=us-ascii From: George Neville-Neil In-Reply-To: Date: Sun, 20 Mar 2011 12:47:27 +0900 Content-Transfer-Encoding: quoted-printable Message-Id: <281E39E0-55D0-4B52-9CD9-F437442B67EC@neville-neil.com> References: <132388F1-44D9-45C9-AE05-1799A7A2DCD9@neville-neil.com> To: Navdeep Parhar X-Pgp-Agent: GPGMail 1.3.2 X-Mailer: Apple Mail (2.1082) X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - vps.hungerhost.com X-AntiAbuse: Original Domain - freebsd.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - neville-neil.com Cc: arch@freebsd.org Subject: Re: Updating our TCP and socket sysctl values... 
X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 20 Mar 2011 03:47:31 -0000 -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Mar 20, 2011, at 08:13 , Navdeep Parhar wrote: > On Fri, Mar 18, 2011 at 11:37 PM, George Neville-Neil > wrote: >> -----BEGIN PGP SIGNED MESSAGE----- >> Hash: SHA1 >> >> Howdy, >> >> I believe it's time for us to upgrade our sysctl values for TCP sockets so that >> they are more in line with the modern world. At the moment we have these limits on >> our buffering: >> >> kern.ipc.maxsockbuf: 262144 >> net.inet.tcp.recvbuf_max: 262144 >> net.inet.tcp.sendbuf_max: 262144 >> >> I believe it's time to up these values to something that's in line with higher speed >> local networks, such as 10G. Perhaps it's time to move these to 2MB instead of 256K. >> >> Thoughts? > > 256KB seems adequate for 10G (as long as the consumer can keep > draining the socket rcv buffer fast enough). If you consider 2 x > bandwidth delay product to be a reasonable socket buffer size then > 256K allows for 10G networks with ~100ms delays. Normally the delay > is _way_ less than this for 10G and even 256K may be an overkill (but > this is ok, the kernel has tcp_do_autorcvbuf on by default) > > While we're here discussing defaults, what about nmbclusters and > nmbjumboXX? Now those haven't kept up with modern machines (imho). > Yes we should also up the nmbclusters, IMHO, but I wasn't going to put that in the same bucket with the TCP buffers just yet. On 64 bit/large memory machines you could make the nmbclusters far higher than our current default. I know people who just set that to 1,000,000 by default. If people are also happy to up nmbclusters I'm willing to conflate that with this.
Best, George -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (Darwin) iEYEARECAAYFAk2FeM8ACgkQYdh2wUQKM9KPZgCgy9AcsoowTLk+sAaFHx52VSkW mGEAn22eOTi3yqweMrOKsVkZ2XOWi9kX =3D22fZ -----END PGP SIGNATURE----- From owner-freebsd-arch@FreeBSD.ORG Sun Mar 20 06:26:50 2011 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0F550106564A for ; Sun, 20 Mar 2011 06:26:50 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from fallbackmx06.syd.optusnet.com.au (fallbackmx06.syd.optusnet.com.au [211.29.132.8]) by mx1.freebsd.org (Postfix) with ESMTP id 8FE7A8FC0C for ; Sun, 20 Mar 2011 06:26:48 +0000 (UTC) Received: from mail01.syd.optusnet.com.au (mail01.syd.optusnet.com.au [211.29.132.182]) by fallbackmx06.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id p2K4MvBd019969 for ; Sun, 20 Mar 2011 15:22:57 +1100 Received: from c122-107-125-80.carlnfd1.nsw.optusnet.com.au (c122-107-125-80.carlnfd1.nsw.optusnet.com.au [122.107.125.80]) by mail01.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id p2K4Mprj023625 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sun, 20 Mar 2011 15:22:53 +1100 Date: Sun, 20 Mar 2011 15:22:51 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Jeff Roberson In-Reply-To: Message-ID: <20110320151003.A939@besplex.bde.org> References: <132388F1-44D9-45C9-AE05-1799A7A2DCD9@neville-neil.com> <20110319160400.000043f5@unknown> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Alexander Leidinger , George Neville-Neil , arch@FreeBSD.org Subject: Re: Updating our TCP and socket sysctl values... 
X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 20 Mar 2011 06:26:50 -0000 On Sat, 19 Mar 2011, Jeff Roberson wrote: > On Sat, 19 Mar 2011, Alexander Leidinger wrote: > >> On Sat, 19 Mar 2011 15:37:47 +0900 George Neville-Neil >> wrote: >>> >>> I believe it's time for us to upgrade our sysctl values for TCP >>> sockets so that they are more in line with the modern world. At the >>> moment we have these limits on our buffering: >>> >>> kern.ipc.maxsockbuf: 262144 >>> net.inet.tcp.recvbuf_max: 262144 >>> net.inet.tcp.sendbuf_max: 262144 >>> >>> I believe it's time to up these values to something that's in line >>> with higher speed local networks, such as 10G. Perhaps it's time to >>> move these to 2MB instead of 256K. All hard-coded limits are bogus. The same limit for a machine that has 8MB memory is nonsense for a machine that has 8GB. In FreeBSD, AFAIK only the vm system has _very_ good auto-tuning of parameters and limits thanks to dyson's work 10-15 years ago. It has almost no user-settable parameters or limits like the above. >> I suggest to read >> http://www.bufferbloat.net/projects/bloat/wiki/Bufferbloat >> and do a before/after test to make sure we do not suffer from the >> described problem. Jim Getty has test descriptions: >> http://gettys.wordpress.com/category/bufferbloat/ > > Are they not talking about buffers in non-endpoint devices? Or perhaps even > overly large rx queues in endpoints, but not local socket receive buffers? > It seems that they are describing situations where excessive buffering masks > network conditions until it's too late. I don't know, but there is a mostly-unrelated bufferbloat problem that is purely local.
If you have a buffer that is larger than an Ln cache (or about half that), then actually using just a single buffer of that size guarantees thrashing of the Ln cache, so that almost every memory access is an Ln cache miss. Even with current hardware, a buffer of size 256K will thrash most L1 caches and a buffer of size a few MB will thrash most L2 caches. Bruce From owner-freebsd-arch@FreeBSD.ORG Sun Mar 20 15:58:14 2011 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 769DD106566C for ; Sun, 20 Mar 2011 15:58:14 +0000 (UTC) (envelope-from alexander@leidinger.net) Received: from mail.ebusiness-leidinger.de (mail.ebusiness-leidinger.de [217.11.53.44]) by mx1.freebsd.org (Postfix) with ESMTP id 3775D8FC16 for ; Sun, 20 Mar 2011 15:58:13 +0000 (UTC) Received: from outgoing.leidinger.net (p5B154CFB.dip.t-dialin.net [91.21.76.251]) by mail.ebusiness-leidinger.de (Postfix) with ESMTPSA id 2F29984400E; Sun, 20 Mar 2011 16:58:08 +0100 (CET) Received: from unknown (IO.Leidinger.net [192.168.2.110]) by outgoing.leidinger.net (Postfix) with ESMTP id 1163C3F82; Sun, 20 Mar 2011 16:58:05 +0100 (CET) Date: Sun, 20 Mar 2011 16:58:05 +0100 From: Alexander Leidinger To: Jeff Roberson Message-ID: <20110320165805.00005886@unknown> In-Reply-To: References: <132388F1-44D9-45C9-AE05-1799A7A2DCD9@neville-neil.com> <20110319160400.000043f5@unknown> X-Mailer: Claws Mail 3.7.8cvs47 (GTK+ 2.16.6; i586-pc-mingw32msvc) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-EBL-MailScanner-Information: Please contact the ISP for more information X-EBL-MailScanner-ID: 2F29984400E.A51A7 X-EBL-MailScanner: Found to be clean X-EBL-MailScanner-SpamCheck: not spam, spamhaus-ZEN, SpamAssassin (not cached, score=-1, required 6, autolearn=disabled, ALL_TRUSTED -1.00) X-EBL-MailScanner-From: alexander@leidinger.net X-EBL-MailScanner-Watermark:
1301241489.76402@fr3N/i7X3aeZp6UxtnmxYQ X-EBL-Spam-Status: No Cc: arch@freebsd.org, George Neville-Neil Subject: Re: Updating our TCP and socket sysctl values... X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 20 Mar 2011 15:58:14 -0000 On Sat, 19 Mar 2011 11:58:12 -1000 (HST) Jeff Roberson wrote: > On Sat, 19 Mar 2011, Alexander Leidinger wrote: > > > On Sat, 19 Mar 2011 15:37:47 +0900 George Neville-Neil > > wrote: > > > >> -----BEGIN PGP SIGNED MESSAGE----- > >> Hash: SHA1 > >> > >> Howdy, > >> > >> I believe it's time for us to upgrade our sysctl values for TCP > >> sockets so that they are more in line with the modern world. At > >> the moment we have these limits on our buffering: > >> > >> kern.ipc.maxsockbuf: 262144 > >> net.inet.tcp.recvbuf_max: 262144 > >> net.inet.tcp.sendbuf_max: 262144 > >> > >> I believe it's time to up these values to something that's in line > >> with higher speed local networks, such as 10G. Perhaps it's time > >> to move these to 2MB instead of 256K. > >> > >> Thoughts? > > > > I suggest to read > > http://www.bufferbloat.net/projects/bloat/wiki/Bufferbloat > > and do a before/after test to make sure we do not suffer from the > > described problem. Jim Getty has test descriptions: > > http://gettys.wordpress.com/category/bufferbloat/ > > Are they not talking about buffers in non-endpoint devices? Or They are talking about every place where buffering happens instead of packet loss (which is what triggers the congestion algorithm). If you connect via a wireless device, it may even be your transmitting device which is doing something wrong. You may commonly experience this in non-endpoint devices, as those are more likely to be in a congested situation than the endpoint devices, but there is still research going on to completely understand this.
The FAQ (so far): http://gettys.wordpress.com/bufferbloat-faq/ Some changes made to a NIC to see a little bit the direction: http://gettys.wordpress.com/2011/02/13/bufferbloat-related-patches-for-the-iwl3945/ The description of an experiment with a WLAN-AP: http://gettys.wordpress.com/2010/12/02/home-router-puzzle-piece-two-fun-with-wireless/ > perhaps even overly large rx queues in endpoints, but not local > socket receive buffers? It seems that they are describing situations Yes, the RX buffers on an endpoint are not an issue. If the specific buffers in this thread are an issue... I don't know. > where excessive buffering masks network conditions until it's too > late. Bye, Alexander. From owner-freebsd-arch@FreeBSD.ORG Sun Mar 20 16:11:31 2011 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8A1DC106566C for ; Sun, 20 Mar 2011 16:11:31 +0000 (UTC) (envelope-from alexander@leidinger.net) Received: from mail.ebusiness-leidinger.de (mail.ebusiness-leidinger.de [217.11.53.44]) by mx1.freebsd.org (Postfix) with ESMTP id 4C5678FC13 for ; Sun, 20 Mar 2011 16:11:31 +0000 (UTC) Received: from outgoing.leidinger.net (p5B154CFB.dip.t-dialin.net [91.21.76.251]) by mail.ebusiness-leidinger.de (Postfix) with ESMTPSA id 1B0A784400E; Sun, 20 Mar 2011 17:11:24 +0100 (CET) Received: from unknown (IO.Leidinger.net [192.168.2.110]) by outgoing.leidinger.net (Postfix) with ESMTP id 679233F86; Sun, 20 Mar 2011 17:11:21 +0100 (CET) Date: Sun, 20 Mar 2011 17:11:22 +0100 From: Alexander Leidinger To: George Neville-Neil Message-ID: <20110320171122.00004613@unknown> In-Reply-To: <72B8E80C-E4C7-4763-A7B5-7A4441188C00@neville-neil.com> References: <132388F1-44D9-45C9-AE05-1799A7A2DCD9@neville-neil.com> <20110319160400.000043f5@unknown> <72B8E80C-E4C7-4763-A7B5-7A4441188C00@neville-neil.com> X-Mailer: Claws Mail 3.7.8cvs47 (GTK+ 2.16.6; i586-pc-mingw32msvc) Mime-Version: 1.0 
Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-EBL-MailScanner-Information: Please contact the ISP for more information X-EBL-MailScanner-ID: 1B0A784400E.A4BA8 X-EBL-MailScanner: Found to be clean X-EBL-MailScanner-SpamCheck: not spam, spamhaus-ZEN, SpamAssassin (not cached, score=-1, required 6, autolearn=disabled, ALL_TRUSTED -1.00) X-EBL-MailScanner-From: alexander@leidinger.net X-EBL-MailScanner-Watermark: 1301242288.46085@xUJcTwz/SLbx6RfgADbZIA X-EBL-Spam-Status: No Cc: arch@freebsd.org Subject: Re: Updating our TCP and socket sysctl values... X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 20 Mar 2011 16:11:31 -0000 On Sun, 20 Mar 2011 12:45:45 +0900 George Neville-Neil wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > > On Mar 20, 2011, at 00:04 , Alexander Leidinger wrote: > > > On Sat, 19 Mar 2011 15:37:47 +0900 George Neville-Neil > > wrote: > > > >> -----BEGIN PGP SIGNED MESSAGE----- > >> Hash: SHA1 > >> > >> Howdy, > >> > >> I believe it's time for us to upgrade our sysctl values for TCP > >> sockets so that they are more in line with the modern world. At > >> the moment we have these limits on our buffering: > >> > >> kern.ipc.maxsockbuf: 262144 > >> net.inet.tcp.recvbuf_max: 262144 > >> net.inet.tcp.sendbuf_max: 262144 > >> > >> I believe it's time to up these values to something that's in line > >> with higher speed local networks, such as 10G. Perhaps it's time > >> to move these to 2MB instead of 256K. > >> > >> Thoughts? > > > > I suggest to read > > http://www.bufferbloat.net/projects/bloat/wiki/Bufferbloat > > and do a before/after test to make sure we do not suffer from the > > described problem. 
Jim Getty has test descriptions: > > http://gettys.wordpress.com/category/bufferbloat/ > > > > No need to read those, I heard him talk about it at dinner a > few weeks ago. What he's mostly talking about is buffer bloat Great. > in non endpoint devices. Note that I'm not talking about changing I had the impression that this can also be an issue with e.g. your laptop connected to a WLAN. Bye, Alexander. From owner-freebsd-arch@FreeBSD.ORG Sun Mar 20 17:49:52 2011 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 77154106564A for ; Sun, 20 Mar 2011 17:49:52 +0000 (UTC) (envelope-from alan.l.cox@gmail.com) Received: from mail-fx0-f54.google.com (mail-fx0-f54.google.com [209.85.161.54]) by mx1.freebsd.org (Postfix) with ESMTP id 074B68FC20 for ; Sun, 20 Mar 2011 17:49:51 +0000 (UTC) Received: by fxm11 with SMTP id 11so5794286fxm.13 for ; Sun, 20 Mar 2011 10:49:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:reply-to:in-reply-to:references :date:message-id:subject:from:to:cc:content-type; bh=IaNIhGosepu9RIQRwXo3DxDgmZJ94Ikg25Mf2Sbi/x8=; b=TktFDJZvHZUAiTG+yLEgmZR59tyV6ONaFr/egwyGeEUq/Fe8rb3ktnPisSePqRbaS3 ZS9NTiwbgPMXDC8w4FRHZRs45WZeydDY3y83aPtkR4AwmEw7hNwj8mDg5/JH05iChlet vSzRsJMV0Us5iZQGh0ZqgKpzoJ4hcDRSg8HAQ= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:reply-to:in-reply-to:references:date:message-id :subject:from:to:cc:content-type; b=xGrL/LYXlOEByzV0ZnQrh9aSTgEt7gn5Lgtoii4/mLEWJO11M9S9UhopoOE5kRhrIR aO0WZjZe/l1sLOW6s83OF5THc/usRRLcbVAvdD037FxMcF+FAzzm9DvYXuB79+YnX1qA GpU7xCDJTR5OdvDOCYtDRqvDueuGeGRPqob3g= MIME-Version: 1.0 Received: by 10.223.6.11 with SMTP id 11mr3802363fax.101.1300641853282; Sun, 20 Mar 2011 10:24:13 -0700 (PDT) Received: by 10.223.115.148 with HTTP; Sun, 20 Mar 2011 10:24:13 -0700 (PDT) In-Reply-To: 
<281E39E0-55D0-4B52-9CD9-F437442B67EC@neville-neil.com> References: <132388F1-44D9-45C9-AE05-1799A7A2DCD9@neville-neil.com> <281E39E0-55D0-4B52-9CD9-F437442B67EC@neville-neil.com> Date: Sun, 20 Mar 2011 12:24:13 -0500 Message-ID: From: Alan Cox To: George Neville-Neil Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: arch@freebsd.org, Navdeep Parhar Subject: Re: Updating our TCP and socket sysctl values... X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: alc@freebsd.org List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 20 Mar 2011 17:49:52 -0000 On Sat, Mar 19, 2011 at 10:47 PM, George Neville-Neil wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > > On Mar 20, 2011, at 08:13 , Navdeep Parhar wrote: > > > On Fri, Mar 18, 2011 at 11:37 PM, George Neville-Neil > > wrote: > >> -----BEGIN PGP SIGNED MESSAGE----- > >> Hash: SHA1 > >> > >> Howdy, > >> > >> I believe it's time for us to upgrade our sysctl values for TCP sockets > so that > >> they are more in line with the modern world. At the moment we have > these limits on > >> our buffering: > >> > >> kern.ipc.maxsockbuf: 262144 > >> net.inet.tcp.recvbuf_max: 262144 > >> net.inet.tcp.sendbuf_max: 262144 > >> > >> I believe it's time to up these values to something that's in line with > higher speed > >> local networks, such as 10G. Perhaps it's time to move these to 2MB > instead of 256K. > >> > >> Thoughts? > > > > 256KB seems adequate for 10G (as long as the consumer can keep > > draining the socket rcv buffer fast enough). If you consider 2 x > > bandwidth delay product to be a reasonable socket buffer size then > > 256K allows for 10G networks with ~100ms delays. 
Normally the delay > > is _way_ less than this for 10G and even 256K may be an overkill (but > > this is ok, the kernel has tcp_do_autorcvbuf on by default) > > > > While we're here discussing defaults, what about nmbclusters and > > nmbjumboXX? Now those haven't kept up with modern machines (imho). > > > > Yes we should also up the nmbclusters, IMHO, but I wasn't going to > put that in the same bucket with the TCP buffers just yet. > On 64 bit/large memory machines you could make the nmbclusters > far higher than our current default. I know people who just set > that to 1,000,000 by default. > > If people are also happy to up nmbclusters I'm willing to conflate > that with this. > > A more modest but nonetheless significant increase could also be possible on i386 machines. If you go back to r129906, wherein we switched to using UMA for allocating mbufs and mbuf clusters, and read it carefully, you'll find that there was a subtle mistake made in the changes to the sizing of the kmem_map, or the "kernel heap". Prior to r129906, the overall size of the kmem map was based on the limits on mbufs and mbuf clusters PLUS the amount of kernel heap that was desired for everything else. After r129906, the limits on mbufs and mbuf clusters no longer made any difference to the size of the kmem map. The reason being that the limit on mbuf clusters was factored into the autosizing too early. It is added to the minimum "kernel heap" size, not the desired size. So, the end result is that mbufs, mbuf clusters, and everything else were made to compete for a smaller kmem map. In short, r129906 should have increased VM_KMEM_SIZE_MAX from its current limit of 320MB. 
I'd be curious if people running i386-based network servers have any problems with using

#ifndef VM_KMEM_SIZE_MAX
#define VM_KMEM_SIZE_MAX        ((VM_MAX_KERNEL_ADDRESS - \
    VM_MIN_KERNEL_ADDRESS + 1) * 3 / 5)
#endif

in place of

#ifndef VM_KMEM_SIZE_MAX
#define VM_KMEM_SIZE_MAX        (320 * 1024 * 1024)
#endif

Really, the only downside to this change is that it reduces the available kernel virtual address space for thread stacks and 9 and 16KB jumbo frames.

Alan

From owner-freebsd-arch@FreeBSD.ORG Sun Mar 20 19:15:22 2011 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 915D2106564A for ; Sun, 20 Mar 2011 19:15:22 +0000 (UTC) (envelope-from gordon@tetlows.org) Received: from mail-bw0-f54.google.com (mail-bw0-f54.google.com [209.85.214.54]) by mx1.freebsd.org (Postfix) with ESMTP id 2D6FD8FC0C for ; Sun, 20 Mar 2011 19:15:21 +0000 (UTC) Received: by bwz12 with SMTP id 12so5153262bwz.13 for ; Sun, 20 Mar 2011 12:15:20 -0700 (PDT) MIME-Version: 1.0 Received: by 10.204.189.1 with SMTP id dc1mr2889923bkb.34.1300648520328; Sun, 20 Mar 2011 12:15:20 -0700 (PDT) Received: by 10.204.16.65 with HTTP; Sun, 20 Mar 2011 12:15:20 -0700 (PDT) In-Reply-To: References: <132388F1-44D9-45C9-AE05-1799A7A2DCD9@neville-neil.com> Date: Sun, 20 Mar 2011 12:15:20 -0700 Message-ID: From: Gordon Tetlow To: Navdeep Parhar Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: arch@freebsd.org, George Neville-Neil Subject: Re: Updating our TCP and socket sysctl values... X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 20 Mar 2011 19:15:22 -0000

On Sat, Mar 19, 2011 at 6:19 PM, Navdeep Parhar wrote:
> I meant 100us (microseconds), sorry.
> My point still stands - 10G
> networks have much less one way delay than this. The worst I can
> find in the lab right now has around ~30us delay. A socket rcv
> bufsize of 64K maxes out the link in some casual testing with netperf
> (with autosizing disabled). 256K is already more than what's needed.

Let's look at something much more realistic on the internet. How about a 100Mbps link with 100ms delay. That's downloading something from Europe from the US. I do this at work all of the time. The BDP for such a link is ~1.2MB. This is a pretty common scenario today and it's not even close to what is reasonably capable (a reliable 1Gbps link over the same delay distance).

Looking at other operating systems:

Linux (CentOS 5.4):
Read window:  Initial: 87380  Max: 4194304
Write window: Initial: 16384  Max: 4194304

Solaris 10:
Read window:  Initial: 49152  Max: 1048576
Write window: Initial: 49152  Max: 1048576

What is the FreeBSD initial setting? Is that sendspace (32k) and recvspace (64k)? Should we look at changing those too or just discuss the maximum window sizes?
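One way to compare those maximums is to ask what each allows on the 100ms path described above. A single TCP connection cannot move more than one window per round trip, so window / RTT is an upper bound on its throughput. The sketch below applies that bound. The function and table names are made up for illustration, and it assumes the 100ms is the round-trip time and ignores loss, congestion control and protocol overhead.

```python
def max_throughput_bits_per_s(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on one connection's throughput: one window per RTT."""
    return window_bytes * 8 / rtt_s


# Maximum window sizes quoted in this thread, in bytes.
MAX_WINDOWS = {
    "FreeBSD (current max)": 262144,
    "Solaris 10": 1048576,
    "Linux (CentOS 5.4)": 4194304,
}

if __name__ == "__main__":
    for name, window in MAX_WINDOWS.items():
        mbps = max_throughput_bits_per_s(window, 0.100) / 1e6
        print(f"{name}: at most {mbps:.1f} Mbit/s at 100 ms RTT")
```

Under those assumptions a 256K ceiling limits one connection to about 21 Mbit/s over 100ms, well short of a 100Mbps link, while the 1MB and 4MB ceilings allow about 84 and 336 Mbit/s respectively.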
Gordon From owner-freebsd-arch@FreeBSD.ORG Tue Mar 22 17:31:18 2011 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B0AFC1065677 for ; Tue, 22 Mar 2011 17:31:18 +0000 (UTC) (envelope-from bz@FreeBSD.org) Received: from mx1.sbone.de (bird.sbone.de [46.4.1.90]) by mx1.freebsd.org (Postfix) with ESMTP id 6E5E18FC22 for ; Tue, 22 Mar 2011 17:31:18 +0000 (UTC) Received: from mail.sbone.de (mail.sbone.de [IPv6:fde9:577b:c1a9:31::2013:587]) (using TLSv1 with cipher ADH-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by mx1.sbone.de (Postfix) with ESMTPS id 4A21A25D386D for ; Tue, 22 Mar 2011 17:30:45 +0000 (UTC) Received: from content-filter.sbone.de (content-filter.sbone.de [IPv6:fde9:577b:c1a9:31::2013:2742]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.sbone.de (Postfix) with ESMTPS id 9C585159B625 for ; Tue, 22 Mar 2011 17:30:44 +0000 (UTC) X-Virus-Scanned: amavisd-new at sbone.de Received: from mail.sbone.de ([IPv6:fde9:577b:c1a9:31::2013:587]) by content-filter.sbone.de (content-filter.sbone.de [fde9:577b:c1a9:31::2013:2742]) (amavisd-new, port 10024) with ESMTP id MKElwrV1zWXT for ; Tue, 22 Mar 2011 17:30:43 +0000 (UTC) Received: from nv.sbone.de (nv.sbone.de [IPv6:fde9:577b:c1a9:31::2013:138]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.sbone.de (Postfix) with ESMTPSA id 84AD1159B61A for ; Tue, 22 Mar 2011 17:30:43 +0000 (UTC) Date: Tue, 22 Mar 2011 17:30:42 +0000 (UTC) From: "Bjoern A. Zeeb" To: arch@freebsd.org Message-ID: X-OpenPGP-Key: 0x14003F198FEFA3E77207EE8D2B58B8F83CCF1842 MIME-Version: 1.0 Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII Cc: Subject: kernel memory checks on boot vs. 
boot time
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 22 Mar 2011 17:31:18 -0000

Hi,

as part of the i386/pc98/amd64 boot process we are doing some basic
memory testing, mapping pages and running a couple of pattern
write/read tests on the first bytes (see getmemsize() implementations).

Depending on the features enabled and boot -v or not you may notice
it as "nothing happens" booting from loader, after any of these
possible lines:
  GDB: no debug ports present
  KDB: debugger backends: ddb
  KDB: current backend: ddb
  SMAP type=...
but before the Copyright message.

With the growing amount of memory this can take up a significant
fraction of kernel startup time on amd64 (~40s delays observed with
96G of RAM).  Looping over the pages, but not mapping them and not
running the pattern tests reduces this significantly (to single digit
numbers of seconds).

As a first step I'd like to discuss how worthwhile the actual memory
tests are these days, to figure out a sensible default.

Not wanting to remove them but maybe make more use of them in the
future (as we do not report any problems we find currently) I'd suggest
introducing a tunable to disable/enable them, say

  hw.run_memtest

with the following values:

  0  do not map the page and do not run the pattern tests
  1  do run the pattern test on the beginning of the page
     (current default).
and maybe add
  2  run the pattern tests on the entire pages?

I would further suggest adding a printf independently of boot -v
there, so that a user who is waiting will know what's (not) going on.
Something along the lines of:
  "Testing physical address space (%s)."
0 "skipping extra pattern tests" 1 "pattern tests on beginning of each page" 2 "pattern tests on entire pages" If this is something that makes sense, I'd suggest to factor things out to sys/x86 and would provide a patch for further discussion and improvements (like error reporting, etc). Comments? Suggestions? Bjoern -- Bjoern A. Zeeb You have to have visions! Stop bit received. Insert coin for new address family. From owner-freebsd-arch@FreeBSD.ORG Tue Mar 22 19:43:25 2011 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2E12F106564A; Tue, 22 Mar 2011 19:43:25 +0000 (UTC) (envelope-from gnn@neville-neil.com) Received: from vps.hungerhost.com (vps.hungerhost.com [216.38.53.176]) by mx1.freebsd.org (Postfix) with ESMTP id E440D8FC0A; Tue, 22 Mar 2011 19:43:24 +0000 (UTC) Received: from [209.249.190.124] (helo=gnnmac.hudson-trading.com) by vps.hungerhost.com with esmtpsa (TLSv1:AES128-SHA:128) (Exim 4.69) (envelope-from ) id 1Q27UB-0002ya-7g; Tue, 22 Mar 2011 15:43:23 -0400 Mime-Version: 1.0 (Apple Message framework v1082) Content-Type: text/plain; charset=us-ascii From: George Neville-Neil In-Reply-To: Date: Tue, 22 Mar 2011 15:43:22 -0400 Content-Transfer-Encoding: quoted-printable Message-Id: <8E13F8D5-45E1-4A9D-9ACE-E1344B5F3686@neville-neil.com> References: To: Bjoern A. Zeeb X-Pgp-Agent: GPGMail 1.3.3 X-Mailer: Apple Mail (2.1082) X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - vps.hungerhost.com X-AntiAbuse: Original Domain - freebsd.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - neville-neil.com Cc: arch@freebsd.org Subject: Re: kernel memory checks on boot vs. 
boot time
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 22 Mar 2011 19:43:25 -0000

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Mar 22, 2011, at 13:30 , Bjoern A. Zeeb wrote:

> Hi,
>
> as part of the i386/pc98/amd64 boot process we are doing some basic
> memory testing, mapping pages and running a couple of pattern
> write/read tests on the first bytes (see getmemsize() implmentations).
>
> Depending on the features enabled and boot -v or not you may notice
> it as "nothing happens" booting from loader, after any of these
> possible lines:
>   GDB: no debug ports present
>   KDB: debugger backends: ddb
>   KDB: current backend: ddb
>   SMAP type=...
> but before the Copyright message.
>
> With the growing number of memory this can lead to a significant
> fraction of kernel startup time on amd64 (~40s delays observed with
> 96G of RAM). Looping over the pages, but not mapping them and not
> running the pattern tests reduces this significantly (to single digit
> numbers of seconds).
>
> As a first step I'd like to discuss how worth the actual memory tests
> are these days, to figure out a sensible default.
>
> Not wanting to remove them but maybe make more use of them in the
> future (as we do not report any problems we find currently) I'd suggest
> to introduce a tunable to disable/enable them, say
>
>   hw.run_memtest
>
> with the following values:
>
>   0  do not map the page and do not run the pattern tests
>   1  do run the pattern test on the beginning of the page
>      (current default).
> and maybe add
>   2  run the pattern tests on the entire pages?
>
> I would further suggest to add a printf independently of boot -v
> there, so that the user who would wait, will know what's (not) going on.
> Something along the lines of:
>   "Testing physical address space (%s)."
>   0 "skipping extra pattern tests"
>   1 "pattern tests on beginning of each page"
>   2 "pattern tests on entire pages"
>
>
> If this is something that makes sense, I'd suggest to factor things
> out to sys/x86 and would provide a patch for further discussion and
> improvements (like error reporting, etc).
>
> Comments? Suggestions?

I do not know how effective our memory tests are on modern systems.
I do think that having a tunable is a good idea for people who want
faster boots.

Best,
George

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (Darwin)

iEYEARECAAYFAk2I+9oACgkQYdh2wUQKM9JZyACfaaPAbg2weBkZvi/gxM4JfKqV
3/IAoIFbwEpSo4Aix7TwRn7SNOmY6Syq
=VKJG
-----END PGP SIGNATURE-----

From owner-freebsd-arch@FreeBSD.ORG Tue Mar 22 19:51:16 2011
Return-Path:
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E5A7F106566C; Tue, 22 Mar 2011 19:51:15 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id BBDA38FC21; Tue, 22 Mar 2011 19:51:15 +0000 (UTC)
Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id 6691046B06; Tue, 22 Mar 2011 15:51:15 -0400 (EDT)
Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id DFFC98A027; Tue, 22 Mar 2011 15:51:14 -0400 (EDT)
From: John Baldwin
To: freebsd-arch@freebsd.org
Date: Tue, 22 Mar 2011 15:51:13 -0400
User-Agent: KMail/1.13.5 (FreeBSD/7.4-CBSD-20110107; KDE/4.4.5; amd64; ; )
References:
In-Reply-To:
MIME-Version: 1.0
Content-Type: Text/Plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <201103221551.14289.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6
(bigwig.baldwin.cx); Tue, 22 Mar 2011 15:51:15 -0400 (EDT) Cc: "Bjoern A. Zeeb" Subject: Re: kernel memory checks on boot vs. boot time X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 22 Mar 2011 19:51:17 -0000 On Tuesday, March 22, 2011 1:30:42 pm Bjoern A. Zeeb wrote: > Hi, > > as part of the i386/pc98/amd64 boot process we are doing some basic > memory testing, mapping pages and running a couple of pattern > write/read tests on the first bytes (see getmemsize() implmentations). > > Depending on the features enabled and boot -v or not you may notice > it as "nothing happens" booting from loader, after any of these > possible lines: > GDB: no debug ports present > KDB: debugger backends: ddb > KDB: current backend: ddb > SMAP type=... > but before the Copyright message. > > With the growing number of memory this can lead to a significant > fraction of kernel startup time on amd64 (~40s delays observed with > 96G of RAM). Looping over the pages, but not mapping them and not > running the pattern tests reduces this significantly (to single digit > numbers of seconds). > > As a first step I'd like to discuss how worth the actual memory tests > are these days, to figure out a sensible default. > > Not wanting to remove them but maybe make more use of them in the > future (as we do not report any problems we find currently) I'd suggest > to introduce a tunable to disable/enable them, say > > hw.run_memtest > > with the following values: > > 0 do not map the page and do not run the pattern tests > 1 do run the pattern test on the beginning of the page > (current default). > and maybe add > 2 run the pattern tests on the entire pages? > > I would further suggest to add a printf independently of boot -v > there, so that the user who would wait, will know what's (not) going on. 
> Something along the lines of: > "Testing physical address space (%s)." > 0 "skipping extra pattern tests" > 1 "pattern tests on beginning of each page" > 2 "pattern tests on entire pages" > > > If this is something that makes sense, I'd suggest to factor things > out to sys/x86 and would provide a patch for further discussion and > improvements (like error reporting, etc). > > Comments? Suggestions? Do other platforms bother with these sorts of memory tests? If not I'd vote to just drop it. I think this mattered more when you didn't have things like SMAP (so you had to guess at where memory ended sometimes). Also, modern server class x86 machines generally support ECC RAM which will trigger a machine check if there is a problem. I doubt that the early checks are catching anything even for the non-ECC case. If nothing else, I would definitely drop this from amd64 (all those systems have SMAP and machine check support, etc.). -- John Baldwin From owner-freebsd-arch@FreeBSD.ORG Tue Mar 22 20:00:49 2011 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: by hub.freebsd.org (Postfix, from userid 1233) id 1D133106566B; Tue, 22 Mar 2011 20:00:49 +0000 (UTC) Date: Tue, 22 Mar 2011 20:00:49 +0000 From: Alexander Best To: John Baldwin Message-ID: <20110322200049.GA84878@freebsd.org> References: <201103221551.14289.jhb@freebsd.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <201103221551.14289.jhb@freebsd.org> Cc: "Bjoern A. Zeeb" , freebsd-arch@freebsd.org Subject: Re: kernel memory checks on boot vs. boot time X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 22 Mar 2011 20:00:49 -0000 On Tue Mar 22 11, John Baldwin wrote: > On Tuesday, March 22, 2011 1:30:42 pm Bjoern A. 
Zeeb wrote: > > Hi, > > > > as part of the i386/pc98/amd64 boot process we are doing some basic > > memory testing, mapping pages and running a couple of pattern > > write/read tests on the first bytes (see getmemsize() implmentations). > > > > Depending on the features enabled and boot -v or not you may notice > > it as "nothing happens" booting from loader, after any of these > > possible lines: > > GDB: no debug ports present > > KDB: debugger backends: ddb > > KDB: current backend: ddb > > SMAP type=... > > but before the Copyright message. > > > > With the growing number of memory this can lead to a significant > > fraction of kernel startup time on amd64 (~40s delays observed with > > 96G of RAM). Looping over the pages, but not mapping them and not > > running the pattern tests reduces this significantly (to single digit > > numbers of seconds). > > > > As a first step I'd like to discuss how worth the actual memory tests > > are these days, to figure out a sensible default. > > > > Not wanting to remove them but maybe make more use of them in the > > future (as we do not report any problems we find currently) I'd suggest > > to introduce a tunable to disable/enable them, say > > > > hw.run_memtest > > > > with the following values: > > > > 0 do not map the page and do not run the pattern tests > > 1 do run the pattern test on the beginning of the page > > (current default). > > and maybe add > > 2 run the pattern tests on the entire pages? > > > > I would further suggest to add a printf independently of boot -v > > there, so that the user who would wait, will know what's (not) going on. > > Something along the lines of: > > "Testing physical address space (%s)." 
> >   0 "skipping extra pattern tests"
> >   1 "pattern tests on beginning of each page"
> >   2 "pattern tests on entire pages"
> >
> >
> > If this is something that makes sense, I'd suggest to factor things
> > out to sys/x86 and would provide a patch for further discussion and
> > improvements (like error reporting, etc).
> >
> > Comments? Suggestions?
>
> Do other platforms bother with these sorts of memory tests? If not I'd vote
> to just drop it. I think this mattered more when you didn't have things like
> SMAP (so you had to guess at where memory ended sometimes). Also, modern
> server class x86 machines generally support ECC RAM which will trigger a
> machine check if there is a problem. I doubt that the early checks are
> catching anything even for the non-ECC case.
>
> If nothing else, I would definitely drop this from amd64 (all those systems
> have SMAP and machine check support, etc.).

also +1 for removing these routines on amd64. i don't think these are
necessary on i386/pc98 either. but if it's decided that the mem tests
should stay on these archs, i vote for the introduction of a tunable.

cheers.
alex > > -- > John Baldwin -- a13x From owner-freebsd-arch@FreeBSD.ORG Tue Mar 22 20:19:24 2011 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E285D106564A for ; Tue, 22 Mar 2011 20:19:24 +0000 (UTC) (envelope-from mj@feral.com) Received: from ns1.feral.com (ns1.feral.com [192.67.166.1]) by mx1.freebsd.org (Postfix) with ESMTP id B352A8FC12 for ; Tue, 22 Mar 2011 20:19:24 +0000 (UTC) Received: from ghanima.in1.lcl (ghanima.in1.lcl [172.16.1.87]) by ns1.feral.com (8.14.4/8.14.3) with ESMTP id p2MJsnPD012168; Tue, 22 Mar 2011 11:54:49 -0800 (PST) (envelope-from mj@feral.com) Message-ID: <4D88FE89.1060900@feral.com> Date: Tue, 22 Mar 2011 12:54:49 -0700 From: Matthew Jacob User-Agent: Thunderbird 2.0.0.24 (X11/20101213) MIME-Version: 1.0 To: John Baldwin References: <201103221551.14289.jhb@freebsd.org> In-Reply-To: <201103221551.14289.jhb@freebsd.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.6 (ns1.feral.com [192.67.166.1]); Tue, 22 Mar 2011 11:54:49 -0800 (PST) Cc: "Bjoern A. Zeeb" , freebsd-arch@freebsd.org Subject: Re: kernel memory checks on boot vs. boot time X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 22 Mar 2011 20:19:25 -0000 John Baldwin wrote: > > Do other platforms bother with these sorts of memory tests? If not I'd vote > to just drop it. I think this mattered more when you didn't have things like > SMAP (so you had to guess at where memory ended sometimes). Also, modern > server class x86 machines generally support ECC RAM which will trigger a > machine check if there is a problem. 
> I doubt that the early checks are
> catching anything even for the non-ECC case.
>
> If nothing else, I would definitely drop this from amd64 (all those systems
> have SMAP and machine check support, etc.).
>
>
Memory checks are definitely still useful. Loading the linux mem tester has
helped find lots of problems, even on so-called modern machines. I'd vote for
leaving this as an option.

From owner-freebsd-arch@FreeBSD.ORG Tue Mar 22 22:56:43 2011
Return-Path:
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0DA21106564A; Tue, 22 Mar 2011 22:56:43 +0000 (UTC) (envelope-from emaste@freebsd.org)
Received: from mail1.sandvine.com (Mail1.sandvine.com [64.7.137.134]) by mx1.freebsd.org (Postfix) with ESMTP id 8F1D08FC0A; Tue, 22 Mar 2011 22:56:42 +0000 (UTC)
Received: from labgw2.phaedrus.sandvine.com (192.168.222.22) by WTL-EXCH-1.sandvine.com (192.168.196.31) with Microsoft SMTP Server id 14.0.694.0; Tue, 22 Mar 2011 18:45:54 -0400
Received: by labgw2.phaedrus.sandvine.com (Postfix, from userid 10332) id 5495B33C02; Tue, 22 Mar 2011 18:45:54 -0400 (EDT)
Date: Tue, 22 Mar 2011 18:45:54 -0400
From: Ed Maste
To: John Baldwin
Message-ID: <20110322224554.GA67925@sandvine.com>
References: <201103221551.14289.jhb@freebsd.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To: <201103221551.14289.jhb@freebsd.org>
User-Agent: Mutt/1.4.2.1i
Cc: "Bjoern A. Zeeb" , freebsd-arch@freebsd.org
Subject: Re: kernel memory checks on boot vs. boot time
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 22 Mar 2011 22:56:43 -0000

On Tue, Mar 22, 2011 at 03:51:13PM -0400, John Baldwin wrote:

> Do other platforms bother with these sorts of memory tests?
If not I'd vote > to just drop it. I think this mattered more when you didn't have things like > SMAP (so you had to guess at where memory ended sometimes). Also, modern > server class x86 machines generally support ECC RAM which will trigger a > machine check if there is a problem. I doubt that the early checks are > catching anything even for the non-ECC case. In the common case at work we want this off to reduce boot time. The desire for a tunable though that can add extended memory tests is to be able to use the FreeBSD startup code as a replacement for memtest86+, for a couple of reasons: - FreeBSD's serial console output is more easily parsed by automated tools - Memtest86+ appears to be limited to 64GB of RAM at the moment - Memtest86+ lacks support for the Tylersburg architecture last I looked -Ed From owner-freebsd-arch@FreeBSD.ORG Wed Mar 23 10:30:12 2011 Return-Path: Delivered-To: freebsd-arch@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 980F3106564A; Wed, 23 Mar 2011 10:30:12 +0000 (UTC) (envelope-from olli@lurza.secnetix.de) Received: from lurza.secnetix.de (lurza.secnetix.de [IPv6:2a01:170:102f::2]) by mx1.freebsd.org (Postfix) with ESMTP id 1B7FC8FC0A; Wed, 23 Mar 2011 10:30:11 +0000 (UTC) Received: from lurza.secnetix.de (localhost [127.0.0.1]) by lurza.secnetix.de (8.14.3/8.14.3) with ESMTP id p2NATtf4090499; Wed, 23 Mar 2011 11:30:10 +0100 (CET) (envelope-from oliver.fromme@secnetix.de) Received: (from olli@localhost) by lurza.secnetix.de (8.14.3/8.14.3/Submit) id p2NATtwg090498; Wed, 23 Mar 2011 11:29:55 +0100 (CET) (envelope-from olli) Date: Wed, 23 Mar 2011 11:29:55 +0100 (CET) Message-Id: <201103231029.p2NATtwg090498@lurza.secnetix.de> From: Oliver Fromme To: freebsd-arch@FreeBSD.ORG, bz@FreeBSD.ORG In-Reply-To: X-Newsgroups: list.freebsd-arch User-Agent: tin/1.8.3-20070201 ("Scotasay") (UNIX) (FreeBSD/6.4-PRERELEASE-20080904 (i386)) MIME-Version: 1.0 
Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.3.5 (lurza.secnetix.de [127.0.0.1]); Wed, 23 Mar 2011 11:30:10 +0100 (CET) Cc: Subject: Re: kernel memory checks on boot vs. boot time X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 23 Mar 2011 10:30:12 -0000 Bjoern A. Zeeb wrote: > as part of the i386/pc98/amd64 boot process we are doing some basic > memory testing, mapping pages and running a couple of pattern > write/read tests on the first bytes (see getmemsize() implmentations). > [...] > With the growing number of memory this can lead to a significant > fraction of kernel startup time on amd64 (~40s delays observed with > 96G of RAM). Looping over the pages, but not mapping them and not > running the pattern tests reduces this significantly (to single digit > numbers of seconds). > [...] > Not wanting to remove them but maybe make more use of them in the > future (as we do not report any problems we find currently) I'd suggest > to introduce a tunable to disable/enable them, say > > hw.run_memtest +1 for introducing a tunable. I have also noticed the boot delay on server machines with lots of memory (all of them are amd64, FWIW). Co-workers have noticed it, too, causing some funny remarks. :-) By the way, "big" servers are not the only machines affected. I have recently built a small HTPC based on an Atom 330 (it supports amd64) with 4 GB RAM. Unfortunately, suspend + resume doesn't work, so I have to shutdown and boot it fully each time I want to use it. Needless to say, I would like to squeeze every second from the boot process. Currently, the time between the transition from bootloader to kernel and the start of init(8) is by far the largest slice of the total boot time. 
Best regards Oliver -- Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M. Handelsregister: Registergericht Muenchen, HRA 74606, Geschäftsfuehrung: secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht Mün- chen, HRB 125758, Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart FreeBSD-Dienstleistungen, -Produkte und mehr: http://www.secnetix.de/bsd "What is this talk of 'release'? We do not make software 'releases'. Our software 'escapes', leaving a bloody trail of designers and quality assurance people in its wake." From owner-freebsd-arch@FreeBSD.ORG Wed Mar 23 11:52:06 2011 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A151B106564A; Wed, 23 Mar 2011 11:52:06 +0000 (UTC) (envelope-from avg@freebsd.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 82AD98FC08; Wed, 23 Mar 2011 11:52:05 +0000 (UTC) Received: from porto.topspin.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id NAA06851; Wed, 23 Mar 2011 13:51:23 +0200 (EET) (envelope-from avg@freebsd.org) Received: from localhost.topspin.kiev.ua ([127.0.0.1]) by porto.topspin.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1Q2Maw-000EOc-P6; Wed, 23 Mar 2011 13:51:22 +0200 Message-ID: <4D89DEB9.7060509@freebsd.org> Date: Wed, 23 Mar 2011 13:51:21 +0200 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.15) Gecko/20110308 Lightning/1.0b2 Thunderbird/3.1.9 MIME-Version: 1.0 To: Matthew Jacob References: <201103221551.14289.jhb@freebsd.org> <4D88FE89.1060900@feral.com> In-Reply-To: <4D88FE89.1060900@feral.com> X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: "Bjoern A. Zeeb" , freebsd-arch@freebsd.org Subject: Re: kernel memory checks on boot vs. 
boot time X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 23 Mar 2011 11:52:06 -0000 on 22/03/2011 21:54 Matthew Jacob said the following: > John Baldwin wrote: >> >> Do other platforms bother with these sorts of memory tests? If not I'd vote >> to just drop it. I think this mattered more when you didn't have things like >> SMAP (so you had to guess at where memory ended sometimes). Also, modern >> server class x86 machines generally support ECC RAM which will trigger a >> machine check if there is a problem. I doubt that the early checks are >> catching anything even for the non-ECC case. >> >> If nothing else, I would definitely drop this from amd64 (all those systems >> have SMAP and machine check support, etc.). >> >> > Memory checks are definitely still useful. Loading the linux mem tester has > helped find lots of problems, even on so-called modern machines. I'd voter for > leaving this as an option. I think that you talk about a different kind of memory checking/testing. What we have in FreeBSD looks a lot like what BIOSes use(d) to do on startup. Besides, AFAIR, it doesn't report any results to you. 
-- Andriy Gapon From owner-freebsd-arch@FreeBSD.ORG Wed Mar 23 11:54:52 2011 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 23D0B1065670; Wed, 23 Mar 2011 11:54:52 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 96BB68FC19; Wed, 23 Mar 2011 11:54:51 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id 3638346B09; Wed, 23 Mar 2011 07:54:51 -0400 (EDT) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id C347E8A02A; Wed, 23 Mar 2011 07:54:50 -0400 (EDT) From: John Baldwin To: Ed Maste Date: Wed, 23 Mar 2011 07:51:56 -0400 User-Agent: KMail/1.13.5 (FreeBSD/7.4-CBSD-20110107; KDE/4.4.5; amd64; ; ) References: <201103221551.14289.jhb@freebsd.org> <20110322224554.GA67925@sandvine.com> In-Reply-To: <20110322224554.GA67925@sandvine.com> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201103230751.56647.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6 (bigwig.baldwin.cx); Wed, 23 Mar 2011 07:54:50 -0400 (EDT) Cc: "Bjoern A. Zeeb" , freebsd-arch@freebsd.org Subject: Re: kernel memory checks on boot vs. boot time X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 23 Mar 2011 11:54:52 -0000 On Tuesday, March 22, 2011 6:45:54 pm Ed Maste wrote: > On Tue, Mar 22, 2011 at 03:51:13PM -0400, John Baldwin wrote: > > > Do other platforms bother with these sorts of memory tests? If not I'd vote > > to just drop it. 
I think this mattered more when you didn't have things like > > SMAP (so you had to guess at where memory ended sometimes). Also, modern > > server class x86 machines generally support ECC RAM which will trigger a > > machine check if there is a problem. I doubt that the early checks are > > catching anything even for the non-ECC case. > > In the common case at work we want this off to reduce boot time. The > desire for a tunable though that can add extended memory tests is to be > able to use the FreeBSD startup code as a replacement for memtest86+, > for a couple of reasons: > > - FreeBSD's serial console output is more easily parsed by automated > tools > - Memtest86+ appears to be limited to 64GB of RAM at the moment > - Memtest86+ lacks support for the Tylersburg architecture last I looked The existing memory check is nowhere near the level of what memtest86+ does and relying on that to give you the same testing strength as memtest86+ seems very dubious to me. If you want a real memory tester, I'd be tempted to just write a custom kernel for that, probably still using BIOS routines for I/O similar to the boot loader, etc. You'd also want to install a MC handler before kicking off the test, but you would want to minimize the amount of RAM you used so you could test as much of the RAM as possible. 
-- John Baldwin From owner-freebsd-arch@FreeBSD.ORG Wed Mar 23 15:57:49 2011 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7058E1065676 for ; Wed, 23 Mar 2011 15:57:49 +0000 (UTC) (envelope-from freebsd-arch@m.gmane.org) Received: from lo.gmane.org (lo.gmane.org [80.91.229.12]) by mx1.freebsd.org (Postfix) with ESMTP id F0A548FC22 for ; Wed, 23 Mar 2011 15:57:48 +0000 (UTC) Received: from list by lo.gmane.org with local (Exim 4.69) (envelope-from ) id 1Q2QCp-0002lx-2a for freebsd-arch@freebsd.org; Wed, 23 Mar 2011 16:42:43 +0100 Received: from lara.cc.fer.hr ([161.53.72.113]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Wed, 23 Mar 2011 16:42:42 +0100 Received: from ivoras by lara.cc.fer.hr with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Wed, 23 Mar 2011 16:42:42 +0100 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-arch@freebsd.org From: Ivan Voras Date: Wed, 23 Mar 2011 16:42:31 +0100 Lines: 33 Message-ID: References: <201103221551.14289.jhb@freebsd.org> <4D88FE89.1060900@feral.com> <4D89DEB9.7060509@freebsd.org> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Complaints-To: usenet@dough.gmane.org X-Gmane-NNTP-Posting-Host: lara.cc.fer.hr User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.12) Gecko/20101102 Thunderbird/3.1.6 In-Reply-To: <4D89DEB9.7060509@freebsd.org> X-Enigmail-Version: 1.1.2 Subject: Re: kernel memory checks on boot vs. 
boot time X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 23 Mar 2011 15:57:49 -0000 On 23/03/2011 12:51, Andriy Gapon wrote: > on 22/03/2011 21:54 Matthew Jacob said the following: >> John Baldwin wrote: >>> >>> Do other platforms bother with these sorts of memory tests? If not I'd vote >>> to just drop it. I think this mattered more when you didn't have things like >>> SMAP (so you had to guess at where memory ended sometimes). Also, modern >>> server class x86 machines generally support ECC RAM which will trigger a >>> machine check if there is a problem. I doubt that the early checks are >>> catching anything even for the non-ECC case. >>> >>> If nothing else, I would definitely drop this from amd64 (all those systems >>> have SMAP and machine check support, etc.). >>> >>> >> Memory checks are definitely still useful. Loading the linux mem tester has >> helped find lots of problems, even on so-called modern machines. I'd voter for >> leaving this as an option. > > I think that you talk about a different kind of memory checking/testing. > What we have in FreeBSD looks a lot like what BIOSes use(d) to do on startup. > Besides, AFAIR, it doesn't report any results to you. I'd say that is the main point. At least once I've thought the machine hung when it was doing this check for a surprisingly long time. I'd vote for *at least* adding a "twirling baton" indicator (every 1 GB or so) that something is going on, on all platforms :) If these tests have any effect at all (how can they fail? has anyone seen them fail?) I'd vote to keep them enabled by default, with a tunable to optionally disable them, as every little bit helps for reliability. If there is no effect at all from the tests, then just remove them. 
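A rough userland sketch of the idea discussed above, not the actual getmemsize() code: it writes a few bit patterns to the first word of each page of an ordinary buffer, reads them back, and spins a "twirling baton" at a configurable interval. The function name pattern_test, the PAGE_BYTES constant, the pattern values and the progress_step parameter are all assumptions made for illustration; a real boot-time check would walk physical pages rather than a user buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_BYTES 4096u /* assumed page size for this sketch */

/*
 * Write a few bit patterns to the first word of every page in
 * [base, base + len) and read them back. Returns the number of pages
 * whose first word failed at least one pattern. progress_step is the
 * number of bytes to cover between baton updates (0 disables it).
 */
size_t
pattern_test(volatile uint32_t *base, size_t len, size_t progress_step)
{
    static const uint32_t patterns[] = {
        0xaaaaaaaau, 0x55555555u, 0xffffffffu, 0x00000000u
    };
    static const char baton[] = "|/-\\";
    size_t bad = 0, ticks = 0, next_tick = progress_step;

    for (size_t off = 0; off + sizeof(uint32_t) <= len; off += PAGE_BYTES) {
        volatile uint32_t *p = base + off / sizeof(uint32_t);
        uint32_t saved = *p; /* keep the original contents */
        int failed = 0;

        for (size_t i = 0; i < sizeof(patterns) / sizeof(patterns[0]); i++) {
            *p = patterns[i];
            if (*p != patterns[i])
                failed = 1;
        }
        *p = saved; /* restore what was there before the test */
        bad += failed;

        /* Show a sign of life every progress_step bytes. */
        if (progress_step != 0 && off + PAGE_BYTES >= next_tick) {
            printf("\r%c", baton[ticks++ % 4]);
            fflush(stdout);
            next_tick += progress_step;
        }
    }
    if (progress_step != 0 && ticks != 0)
        printf("\r \r"); /* erase the baton when done */
    return bad;
}
```

Calling pattern_test(buf, len, (size_t)1 << 30) would update the baton roughly once per GB, as suggested above; passing 0 keeps it silent. Reporting the returned count would also address the complaint that the current check prints no results.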
From owner-freebsd-arch@FreeBSD.ORG Wed Mar 23 17:14:43 2011 Return-Path: Delivered-To: freebsd-arch@FreeBSD.ORG Received: by hub.freebsd.org (Postfix, from userid 1233) id E8500106571D; Wed, 23 Mar 2011 17:14:43 +0000 (UTC) Date: Wed, 23 Mar 2011 17:14:43 +0000 From: Alexander Best To: Oliver Fromme Message-ID: <20110323171443.GA59972@freebsd.org> References: <201103231029.p2NATtwg090498@lurza.secnetix.de> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <201103231029.p2NATtwg090498@lurza.secnetix.de> Cc: bz@FreeBSD.ORG, freebsd-arch@FreeBSD.ORG Subject: Re: kernel memory checks on boot vs. boot time X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 23 Mar 2011 17:14:44 -0000 On Wed Mar 23 11, Oliver Fromme wrote: > Bjoern A. Zeeb wrote: > > as part of the i386/pc98/amd64 boot process we are doing some basic > > memory testing, mapping pages and running a couple of pattern > > write/read tests on the first bytes (see getmemsize() implmentations). > > [...] > > With the growing number of memory this can lead to a significant > > fraction of kernel startup time on amd64 (~40s delays observed with > > 96G of RAM). Looping over the pages, but not mapping them and not > > running the pattern tests reduces this significantly (to single digit > > numbers of seconds). > > [...] > > Not wanting to remove them but maybe make more use of them in the > > future (as we do not report any problems we find currently) I'd suggest > > to introduce a tunable to disable/enable them, say > > > > hw.run_memtest > > +1 for introducing a tunable. > > I have also noticed the boot delay on server machines with > lots of memory (all of them are amd64, FWIW). Co-workers > have noticed it, too, causing some funny remarks. 
:-) or how about we dump the current memory checks, introduce a tunable and implement some *real* memory checks. as john pointed out the current checks are just rudimentary. > > By the way, "big" servers are not the only machines affected. > I have recently built a small HTPC based on an Atom 330 > (it supports amd64) with 4 GB RAM. Unfortunately, suspend + > resume doesn't work, so I have to shutdown and boot it fully > each time I want to use it. Needless to say, I would like > to squeeze every second from the boot process. Currently, > the time between the transition from bootloader to kernel > and the start of init(8) is by far the largest slice of the > total boot time. just as a side note: booting a kernel directly from boot stage 2 is broken on amd64. :( so there's no way around using the boot loader, which will cost extra time (even with autoboot_delay=0). so was the ability to boot a kernel directly from boot2 abandoned? i heard reports it still works under i386. dunno about the other archs. cheers. alex > > Best regards > Oliver > > -- > Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M. > Handelsregister: Registergericht Muenchen, HRA 74606, Geschäftsfuehrung: > secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht Mün- > chen, HRB 125758, Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart > > FreeBSD-Dienstleistungen, -Produkte und mehr: http://www.secnetix.de/bsd > > "What is this talk of 'release'? We do not make software 'releases'. > Our software 'escapes', leaving a bloody trail of designers and quality > assurance people in its wake."
-- a13x From owner-freebsd-arch@FreeBSD.ORG Wed Mar 23 17:50:46 2011 Return-Path: Delivered-To: freebsd-arch@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 04EA4106564A; Wed, 23 Mar 2011 17:50:46 +0000 (UTC) (envelope-from olli@lurza.secnetix.de) Received: from lurza.secnetix.de (lurza.secnetix.de [IPv6:2a01:170:102f::2]) by mx1.freebsd.org (Postfix) with ESMTP id 734178FC0C; Wed, 23 Mar 2011 17:50:45 +0000 (UTC) Received: from lurza.secnetix.de (localhost [127.0.0.1]) by lurza.secnetix.de (8.14.3/8.14.3) with ESMTP id p2NHoSk2009827; Wed, 23 Mar 2011 18:50:43 +0100 (CET) (envelope-from oliver.fromme@secnetix.de) Received: (from olli@localhost) by lurza.secnetix.de (8.14.3/8.14.3/Submit) id p2NHoSF6009826; Wed, 23 Mar 2011 18:50:28 +0100 (CET) (envelope-from olli) Date: Wed, 23 Mar 2011 18:50:28 +0100 (CET) Message-Id: <201103231750.p2NHoSF6009826@lurza.secnetix.de> From: Oliver Fromme To: freebsd-arch@FreeBSD.ORG, arundel@FreeBSD.ORG, bz@FreeBSD.ORG In-Reply-To: <20110323171443.GA59972@freebsd.org> X-Newsgroups: list.freebsd-arch User-Agent: tin/1.8.3-20070201 ("Scotasay") (UNIX) (FreeBSD/6.4-PRERELEASE-20080904 (i386)) MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.3.5 (lurza.secnetix.de [127.0.0.1]); Wed, 23 Mar 2011 18:50:44 +0100 (CET) Cc: Subject: Re: kernel memory checks on boot vs. boot time X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 23 Mar 2011 17:50:46 -0000 Alexander Best wrote: > just as a side note: booting a kernel directly from boot stage 2 is broken on > amd64. :( so there's no way around using the boot loader, which will cost extra > time (even with autoboot_delay=0). 
> > so was the adbility to boot a kernel directly from boot2 abandoned? i heard > reports it still works under i386. dunno about the other archs. The loader checks the type of the kernel binary (i386 vs amd64). In case of amd64, it enables "long mode", among other things. This is required for amd64, because the kernel expects to be started in long mode. boot2 doesn't do that, so you can't start an amd64 kernel directly from boot2. In theory you could write a specialized amd64 variant of boot2 that prepares the system for starting an amd64 kernel directly. Whether there's enough space for that in boot2 (8 KB) and whether it's worth the effort, I don't know. To be honest, I don't think that loader takes so much time. When you set autoboot_delay="-1" and beastie_disable="YES", the time spent in loader is negligible. (I'm assuming that you also set BOOTWAIT=0 in make.conf, so boot2 doesn't wait for a keypress either. I think the default is to wait 3 seconds.) Best regards Oliver -- Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M. Handelsregister: Registergericht Muenchen, HRA 74606, Geschäftsfuehrung: secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht Mün- chen, HRB 125758, Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart FreeBSD-Dienstleistungen, -Produkte und mehr: http://www.secnetix.de/bsd "[...] one observation we can make here is that Python makes an excellent pseudocoding language, with the wonderful attribute that it can actually be executed." 
-- Bruce Eckel From owner-freebsd-arch@FreeBSD.ORG Wed Mar 23 18:26:29 2011 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2263A1065670; Wed, 23 Mar 2011 18:26:29 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id EABFE8FC1F; Wed, 23 Mar 2011 18:26:28 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id A1FC846B03; Wed, 23 Mar 2011 14:26:28 -0400 (EDT) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 3D77D8A01B; Wed, 23 Mar 2011 14:26:28 -0400 (EDT) From: John Baldwin To: freebsd-arch@freebsd.org Date: Wed, 23 Mar 2011 14:26:27 -0400 User-Agent: KMail/1.13.5 (FreeBSD/7.4-CBSD-20110107; KDE/4.4.5; amd64; ; ) References: <201103231029.p2NATtwg090498@lurza.secnetix.de> <20110323171443.GA59972@freebsd.org> In-Reply-To: <20110323171443.GA59972@freebsd.org> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-15" Content-Transfer-Encoding: 7bit Message-Id: <201103231426.27750.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6 (bigwig.baldwin.cx); Wed, 23 Mar 2011 14:26:28 -0400 (EDT) Cc: Alexander Best , bz@freebsd.org, Oliver Fromme Subject: Re: kernel memory checks on boot vs. boot time X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 23 Mar 2011 18:26:29 -0000 On Wednesday, March 23, 2011 1:14:43 pm Alexander Best wrote: > On Wed Mar 23 11, Oliver Fromme wrote: > > Bjoern A. 
Zeeb wrote: > > > as part of the i386/pc98/amd64 boot process we are doing some basic > > > memory testing, mapping pages and running a couple of pattern > > > write/read tests on the first bytes (see getmemsize() implmentations). > > > [...] > > > With the growing number of memory this can lead to a significant > > > fraction of kernel startup time on amd64 (~40s delays observed with > > > 96G of RAM). Looping over the pages, but not mapping them and not > > > running the pattern tests reduces this significantly (to single digit > > > numbers of seconds). > > > [...] > > > Not wanting to remove them but maybe make more use of them in the > > > future (as we do not report any problems we find currently) I'd suggest > > > to introduce a tunable to disable/enable them, say > > > > > > hw.run_memtest > > > > +1 for introducing a tunable. > > > > I have also noticed the boot delay on server machines with > > lots of memory (all of them are amd64, FWIW). Co-workers > > have noticed it, too, causing some funny remarks. :-) > > or how about we dump the current memory checks, introduce a tunable and > implement some *real* memory checks. as john pointed out the current checks > are just rudimentary. I think that doing *real* memory checks isn't really the role of our kernel. Better effort would be spent on improving memtest86 since it is already trying to solve this problem. Something that would be nice would be a way to invoke memtest86 from the loader. Assuming you could pass arguments (such as a time limit) to the memtest "kernel", then you could install memtest to /boot/memtest and do something like 'nextboot -k memtest -o "-t 120"' to run memtest for 2 hours on the next boot then reboot back into the stock OS after it finishes, etc. 
There are several tricky things you need to get right if you want to do *real* memory tests that are a bit harder to do if you have a full-blown kernel, such as relocating yourself into already-checked pages at some point so you can check all of the pages in the system, disabling caching for all pages except your kernel so you test the actual RAM rather than your caches, etc. -- John Baldwin From owner-freebsd-arch@FreeBSD.ORG Wed Mar 23 18:49:03 2011 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3D0D4106566C; Wed, 23 Mar 2011 18:49:03 +0000 (UTC) (envelope-from avg@freebsd.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id DB40C8FC19; Wed, 23 Mar 2011 18:49:01 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id UAA13083; Wed, 23 Mar 2011 20:48:59 +0200 (EET) (envelope-from avg@freebsd.org) Message-ID: <4D8A409B.6090801@freebsd.org> Date: Wed, 23 Mar 2011 20:48:59 +0200 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.15) Gecko/20110309 Lightning/1.0b2 Thunderbird/3.1.9 MIME-Version: 1.0 To: John Baldwin References: <201103231029.p2NATtwg090498@lurza.secnetix.de> <20110323171443.GA59972@freebsd.org> <201103231426.27750.jhb@freebsd.org> In-Reply-To: <201103231426.27750.jhb@freebsd.org> X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: Alexander Best , bz@freebsd.org, Oliver Fromme , freebsd-arch@freebsd.org Subject: Re: kernel memory checks on boot vs.
boot time X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 23 Mar 2011 18:49:03 -0000 on 23/03/2011 20:26 John Baldwin said the following: > I think that doing *real* memory checks isn't really the role of our kernel. > Better effort would be spent on improving memtest86 since it is already trying > to solve this problem. Something that would be nice would be a way to invoke > memtest86 from the loader. Just a note that with sysutils/memtest86+ port you can already do that. But of course the utility is not headless and lacks the advanced functionality that you describe below. > Assuming you could pass arguments (such as a time > limit) to the memtest "kernel", then you could install memtest to > /boot/memtest and do something like 'nextboot -k memtest -o "-t 120"' to run > memtest for 2 hours on the next boot then reboot back into the stock OS after > it finishes, etc. -- Andriy Gapon From owner-freebsd-arch@FreeBSD.ORG Wed Mar 23 19:47:25 2011 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: by hub.freebsd.org (Postfix, from userid 1233) id 897AB1065677; Wed, 23 Mar 2011 19:47:25 +0000 (UTC) Date: Wed, 23 Mar 2011 19:47:25 +0000 From: Alexander Best To: John Baldwin Message-ID: <20110323194725.GA83672@freebsd.org> References: <201103231029.p2NATtwg090498@lurza.secnetix.de> <20110323171443.GA59972@freebsd.org> <201103231426.27750.jhb@freebsd.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <201103231426.27750.jhb@freebsd.org> Cc: bz@freebsd.org, Oliver Fromme , freebsd-arch@freebsd.org Subject: Re: kernel memory checks on boot vs. 
boot time X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 23 Mar 2011 19:47:25 -0000 On Wed Mar 23 11, John Baldwin wrote: > On Wednesday, March 23, 2011 1:14:43 pm Alexander Best wrote: > > On Wed Mar 23 11, Oliver Fromme wrote: > > > Bjoern A. Zeeb wrote: > > > > as part of the i386/pc98/amd64 boot process we are doing some basic > > > > memory testing, mapping pages and running a couple of pattern > > > > write/read tests on the first bytes (see getmemsize() implmentations). > > > > [...] > > > > With the growing number of memory this can lead to a significant > > > > fraction of kernel startup time on amd64 (~40s delays observed with > > > > 96G of RAM). Looping over the pages, but not mapping them and not > > > > running the pattern tests reduces this significantly (to single digit > > > > numbers of seconds). > > > > [...] > > > > Not wanting to remove them but maybe make more use of them in the > > > > future (as we do not report any problems we find currently) I'd suggest > > > > to introduce a tunable to disable/enable them, say > > > > > > > > hw.run_memtest > > > > > > +1 for introducing a tunable. > > > > > > I have also noticed the boot delay on server machines with > > > lots of memory (all of them are amd64, FWIW). Co-workers > > > have noticed it, too, causing some funny remarks. :-) > > > > or how about we dump the current memory checks, introduce a tunable and > > implement some *real* memory checks. as john pointed out the current checks > > are just rudimentary. > > I think that doing *real* memory checks isn't really the role of our kernel. > Better effort would be spent on improving memtest86 since it is already trying > to solve this problem. Something that would be nice would be a way to invoke > memtest86 from the loader. 
Assuming you could pass arguments (such as a time > limit) to the memtest "kernel", then you could install memtest to > /boot/memtest and do something like 'nextboot -k memtest -o "-t 120"' to run > memtest for 2 hours on the next boot then reboot back into the stock OS after > it finishes, etc. > > There are several tricky things you need to get right if you want to do *real* > memory tests that are a bit harder to do if you have a full blow kernel, such > as relocating yourself into already-checked pages at some point so you can > check all of the pages in the system, disabling caching for all pages except > your kernel so you test the actual RAM rather than your caches, etc. in the past i've used a memtest utility (forgot its name though), which was being written to video RAM and then executed from there. this has the big advantage of being able to test the entire RAM in one go without having to relocate anything. cheers. alex > > -- > John Baldwin -- a13x From owner-freebsd-arch@FreeBSD.ORG Wed Mar 23 19:52:42 2011 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9DAA3106566B; Wed, 23 Mar 2011 19:52:42 +0000 (UTC) (envelope-from peter@wemm.org) Received: from mail-vx0-f182.google.com (mail-vx0-f182.google.com [209.85.220.182]) by mx1.freebsd.org (Postfix) with ESMTP id 0F9878FC18; Wed, 23 Mar 2011 19:52:41 +0000 (UTC) Received: by vxc34 with SMTP id 34so7418337vxc.13 for ; Wed, 23 Mar 2011 12:52:41 -0700 (PDT) MIME-Version: 1.0 Received: by 10.52.71.97 with SMTP id t1mr1935281vdu.246.1300908517759; Wed, 23 Mar 2011 12:28:37 -0700 (PDT) Received: by 10.52.163.105 with HTTP; Wed, 23 Mar 2011 12:28:37 -0700 (PDT) In-Reply-To: <201103231426.27750.jhb@freebsd.org> References: <201103231029.p2NATtwg090498@lurza.secnetix.de> <20110323171443.GA59972@freebsd.org> <201103231426.27750.jhb@freebsd.org> Date: Wed, 23 Mar 2011 12:28:37 -0700 Message-ID: 
From: Peter Wemm To: John Baldwin Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: Alexander Best , bz@freebsd.org, Oliver Fromme , freebsd-arch@freebsd.org Subject: Re: kernel memory checks on boot vs. boot time X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 23 Mar 2011 19:52:42 -0000 On Wed, Mar 23, 2011 at 11:26 AM, John Baldwin wrote: > On Wednesday, March 23, 2011 1:14:43 pm Alexander Best wrote: >> On Wed Mar 23 11, Oliver Fromme wrote: >> > Bjoern A. Zeeb wrote: >> > > as part of the i386/pc98/amd64 boot process we are doing some basic >> > > memory testing, mapping pages and running a couple of pattern >> > > write/read tests on the first bytes (see getmemsize() implmentations). >> > > [...] >> > > With the growing number of memory this can lead to a significant >> > > fraction of kernel startup time on amd64 (~40s delays observed with >> > > 96G of RAM). Looping over the pages, but not mapping them and not >> > > running the pattern tests reduces this significantly (to single digit >> > > numbers of seconds). >> > > [...] >> > > Not wanting to remove them but maybe make more use of them in the >> > > future (as we do not report any problems we find currently) I'd suggest >> > > to introduce a tunable to disable/enable them, say >> > > >> > > hw.run_memtest >> > >> > +1 for introducing a tunable. >> > >> > I have also noticed the boot delay on server machines with >> > lots of memory (all of them are amd64, FWIW). Co-workers >> > have noticed it, too, causing some funny remarks. :-) >> >> or how about we dump the current memory checks, introduce a tunable and >> implement some *real* memory checks.
as john pointed out the current checks >> are just rudimentary. > > I think that doing *real* memory checks isn't really the role of our kernel. > Better effort would be spent on improving memtest86 since it is already trying > to solve this problem. Part of the reason for this "check" is a sanity check to make sure we enumerated memory correctly and that we have at least got basic ram functionality. The existence of hw.physmem complicates this. On machines where hw.physmem could be used to tell the kernel that there was more ram present than the kernel enumerates (old bioses etc), this was kind of important to sanity check. Even though modern hardware will fail windows compliance tests if the SMAP etc is wrong, never underestimate the ability of bios makers to find new and bizarre ways of screwing things up. I'd kinda like to keep a basic "is this real, non mirrored ram?" test there. eg: the 2-pass step of writing physical address into each page and then checking that they are still there on the second pass. Oh, did I mention the machine where the ACPI bios info tells the OS that the current state is S3 (suspended to ram) instead of S0? When the kernel blows up at boot without a message.. we get the blame, not the bios maker. -- Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com; KI6FJV "All of this is for nothing if we don't go to the stars" - JMS/B5 "If Java had true garbage collection, most programs would delete themselves upon execution."
-- Robert Sewell From owner-freebsd-arch@FreeBSD.ORG Wed Mar 23 20:02:00 2011 Return-Path: Delivered-To: freebsd-arch@FreeBSD.ORG Received: by hub.freebsd.org (Postfix, from userid 1233) id 1EEF3106566C; Wed, 23 Mar 2011 20:02:00 +0000 (UTC) Date: Wed, 23 Mar 2011 20:02:00 +0000 From: Alexander Best To: Oliver Fromme Message-ID: <20110323200200.GA85810@freebsd.org> References: <20110323171443.GA59972@freebsd.org> <201103231750.p2NHoSF6009826@lurza.secnetix.de> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <201103231750.p2NHoSF6009826@lurza.secnetix.de> Cc: bz@FreeBSD.ORG, freebsd-arch@FreeBSD.ORG Subject: Re: kernel memory checks on boot vs. boot time X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 23 Mar 2011 20:02:00 -0000 On Wed Mar 23 11, Oliver Fromme wrote: > Alexander Best wrote: > > just as a side note: booting a kernel directly from boot stage 2 is broken on > > amd64. :( so there's no way around using the boot loader, which will cost extra > > time (even with autoboot_delay=0). > > > > so was the adbility to boot a kernel directly from boot2 abandoned? i heard > > reports it still works under i386. dunno about the other archs. > > The loader checks the type of the kernel binary (i386 vs > amd64). In case of amd64, it enables "long mode", among > other things. This is required for amd64, because the > kernel expects to be started in long mode. boot2 doesn't > do that, so you can't start an amd64 kernel directly from > boot2. hmm...i can't seem to find the location in loader/main.c. could you pinpoint me to the exact location? this is something i'd *really* like to try in amd64. cheers. 
alex > > In theory you could write a specialized amd64 variant of > boot2 that prepares the system for starting an amd64 kernel > directly. Whether there's enough space for that in boot2 > (8 KB) and whether it's worth the effort, I don't know. > > To be honest, I don't think that loader takes so much time. > When you set autoboot_delay="-1" and beastie_disable="YES", > the time spent in loader is negligible. (I'm assuming that > you also set BOOTWAIT=0 in make.conf, so boot2 doesn't wait > for a keypress either. I think the default is to wait 3 > seconds.) > > Best regards > Oliver > > > -- > Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M. > Handelsregister: Registergericht Muenchen, HRA 74606, Geschäftsfuehrung: > secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht Mün- > chen, HRB 125758, Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart > > FreeBSD-Dienstleistungen, -Produkte und mehr: http://www.secnetix.de/bsd > > "[...] one observation we can make here is that Python makes > an excellent pseudocoding language, with the wonderful attribute > that it can actually be executed." 
-- Bruce Eckel -- a13x From owner-freebsd-arch@FreeBSD.ORG Wed Mar 23 20:51:14 2011 Return-Path: Delivered-To: freebsd-arch@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7868E106564A; Wed, 23 Mar 2011 20:51:14 +0000 (UTC) (envelope-from olli@lurza.secnetix.de) Received: from lurza.secnetix.de (lurza.secnetix.de [IPv6:2a01:170:102f::2]) by mx1.freebsd.org (Postfix) with ESMTP id F0A538FC12; Wed, 23 Mar 2011 20:51:13 +0000 (UTC) Received: from lurza.secnetix.de (localhost [127.0.0.1]) by lurza.secnetix.de (8.14.3/8.14.3) with ESMTP id p2NKovvH017465; Wed, 23 Mar 2011 21:51:12 +0100 (CET) (envelope-from oliver.fromme@secnetix.de) Received: (from olli@localhost) by lurza.secnetix.de (8.14.3/8.14.3/Submit) id p2NKov4g017463; Wed, 23 Mar 2011 21:50:57 +0100 (CET) (envelope-from olli) From: Oliver Fromme Message-Id: <201103232050.p2NKov4g017463@lurza.secnetix.de> To: arundel@FreeBSD.ORG (Alexander Best) Date: Wed, 23 Mar 2011 21:50:57 +0100 (CET) In-Reply-To: <20110323200200.GA85810@freebsd.org> X-Mailer: ELM [version 2.5 PL8] MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.3.5 (lurza.secnetix.de [127.0.0.1]); Wed, 23 Mar 2011 21:51:13 +0100 (CET) Cc: bz@FreeBSD.ORG, freebsd-arch@FreeBSD.ORG Subject: Re: kernel memory checks on boot vs. boot time X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 23 Mar 2011 20:51:14 -0000 Alexander Best wrote: > On Wed Mar 23 11, Oliver Fromme wrote: > > Alexander Best wrote: > > > just as a side note: booting a kernel directly from boot stage 2 is broken on > > > amd64. 
:( so there's no way around using the boot loader, which will cost extra > > > time (even with autoboot_delay=0). > > > > > > so was the adbility to boot a kernel directly from boot2 abandoned? i heard > > > reports it still works under i386. dunno about the other archs. > > > > The loader checks the type of the kernel binary (i386 vs > > amd64). In case of amd64, it enables "long mode", among > > other things. This is required for amd64, because the > > kernel expects to be started in long mode. boot2 doesn't > > do that, so you can't start an amd64 kernel directly from > > boot2. > > hmm...i can't seem to find the location in loader/main.c. could you pinpoint > me to the exact location? this is something i'd *really* like to try in amd64. The actual code is in sys/boot/i386/libi386/amd64_tramp.S which is called from sys/boot/i386/libi386/elf64_freebsd.c (see the elf64_exec() function), which in turn is called indirectly via a method of a struct file_format from the boot loader. Beware, I don't know if this is the *only* thing preventing boot2 from booting an amd64 kernel. There might be more. I haven't tried booting FreeBSD without the boot loader in a long time. Probably not in this century. Best regards Oliver -- Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M. Handelsregister: Registergericht Muenchen, HRA 74606, Geschäftsfuehrung: secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht Mün- chen, HRB 125758, Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart FreeBSD-Dienstleistungen, -Produkte und mehr: http://www.secnetix.de/bsd "I made up the term 'object-oriented', and I can tell you I didn't have C++ in mind." 
-- Alan Kay, OOPSLA '97 From owner-freebsd-arch@FreeBSD.ORG Wed Mar 23 22:46:32 2011 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E82871065675; Wed, 23 Mar 2011 22:46:32 +0000 (UTC) (envelope-from avg@freebsd.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 934FF8FC0A; Wed, 23 Mar 2011 22:46:31 +0000 (UTC) Received: from porto.topspin.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id AAA16293; Thu, 24 Mar 2011 00:46:27 +0200 (EET) (envelope-from avg@freebsd.org) Received: from localhost.topspin.kiev.ua ([127.0.0.1]) by porto.topspin.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1Q2Wot-000F7N-6n; Thu, 24 Mar 2011 00:46:27 +0200 Message-ID: <4D8A7841.5080004@freebsd.org> Date: Thu, 24 Mar 2011 00:46:25 +0200 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.15) Gecko/20110308 Lightning/1.0b2 Thunderbird/3.1.9 MIME-Version: 1.0 To: Peter Wemm References: <201103231029.p2NATtwg090498@lurza.secnetix.de> <20110323171443.GA59972@freebsd.org> <201103231426.27750.jhb@freebsd.org> In-Reply-To: X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: Alexander Best , bz@freebsd.org, Oliver Fromme , freebsd-arch@freebsd.org Subject: Re: kernel memory checks on boot vs. boot time X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 23 Mar 2011 22:46:33 -0000 on 23/03/2011 21:28 Peter Wemm said the following: > Part of the reason for this "check" is a sanity check to make sure we > enumerated memory correctly and that we have at least got basic ram > functionality. 
The existence of hw.physmem complicates this. On > machines where hw.physmem could be used to tell the kernel that there > was more ram present than the kernel enumerates (old bioses etc), this > was kind of important to sanity check. > > Even though modern hardware will fail windows compliance tests if the > SMAP etc is wrong, never underestimate the ability of bios makers to > find new and bizarre ways of screwing things up. > > I'd kinda like to keep a basic "is this real, non mirrored ram?" test > there. eg: the 2-pass step of writing physical address into each page > and then checking that they are still there on the second pass. > > Oh, did I mention the machine where the ACPI bios info tells the OS > that the current state is S3 (suspended to ram) instead of S0? > > When the kernel blows up at boot without a message.. we get the blame, > not the bios maker. I hear what you are saying, but is there any other OS that takes this level of responsibility? Should we either? I mean, hardware and BIOS vendors can screw up things in very creative ways and it's impossible to protect against that. When we are bug-compatible with some other OS, then it's one thing; but when we try to be even "better" than that, that's quite another thing.
-- Andriy Gapon From owner-freebsd-arch@FreeBSD.ORG Wed Mar 23 22:51:43 2011 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9775A1065677; Wed, 23 Mar 2011 22:51:43 +0000 (UTC) (envelope-from avg@freebsd.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id A98D28FC0A; Wed, 23 Mar 2011 22:51:42 +0000 (UTC) Received: from porto.topspin.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id AAA16364; Thu, 24 Mar 2011 00:51:35 +0200 (EET) (envelope-from avg@freebsd.org) Received: from localhost.topspin.kiev.ua ([127.0.0.1]) by porto.topspin.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1Q2Wtr-000F7f-1s; Thu, 24 Mar 2011 00:51:35 +0200 Message-ID: <4D8A7976.5090103@freebsd.org> Date: Thu, 24 Mar 2011 00:51:34 +0200 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.15) Gecko/20110308 Lightning/1.0b2 Thunderbird/3.1.9 MIME-Version: 1.0 To: Oliver Fromme References: <20110323200200.GA85810@freebsd.org> <201103232050.p2NKov4g017463@lurza.secnetix.de> In-Reply-To: <201103232050.p2NKov4g017463@lurza.secnetix.de> X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: Alexander Best , freebsd-arch@freebsd.org Subject: Re: kernel memory checks on boot vs. boot time X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 23 Mar 2011 22:51:43 -0000 on 23/03/2011 22:50 Oliver Fromme said the following: > Beware, I don't know if this is the *only* thing preventing > boot2 from booting an amd64 kernel. There might be more. 
> I haven't tried booting FreeBSD without the boot loader in > a long time. Probably not in this century. Kind of hijacking the thread - while we are gradually moving from mbr+bsdlabel to gpt and more, we are also moving away from size-constrained boot2. My vision is that boot2 and loader should fuse into something more powerful that would reside in a boot partition, but with its config files on a "regular" filesystem. -- Andriy Gapon From owner-freebsd-arch@FreeBSD.ORG Wed Mar 23 23:22:09 2011 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: by hub.freebsd.org (Postfix, from userid 1233) id 46B42106566C; Wed, 23 Mar 2011 23:22:09 +0000 (UTC) Date: Wed, 23 Mar 2011 23:22:09 +0000 From: Alexander Best To: Andriy Gapon Message-ID: <20110323232209.GA15486@freebsd.org> References: <20110323200200.GA85810@freebsd.org> <201103232050.p2NKov4g017463@lurza.secnetix.de> <4D8A7976.5090103@freebsd.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4D8A7976.5090103@freebsd.org> Cc: Oliver Fromme , freebsd-arch@freebsd.org Subject: Re: kernel memory checks on boot vs. boot time X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 23 Mar 2011 23:22:09 -0000 On Thu Mar 24 11, Andriy Gapon wrote: > on 23/03/2011 22:50 Oliver Fromme said the following: > > Beware, I don't know if this is the *only* thing preventing > > boot2 from booting an amd64 kernel. There might be more. > > I haven't tried booting FreeBSD without the boot loader in > > a long time. Probably not in this century. > > Kind of hijacking the thread - while we are gradually moving from mbr+bsdlabel > to gpt and more, we are also moving away from size-constrained boot2.
> My vision is that boot2 and loader should fuse into something more powerful that > would reside in a boot partition, but with its config files on a "regular" > filesystem. +1. being able to control the whole boot process by /etc/boot.conf would be great. there are definitely too many files in /boot. new users have no clue what boot, boot0, boot1, boot2, cdboot, gptboot, etc. are all about. merging /boot/loader, /boot/gptboot and /boot/zfsgptboot would be really nice. building/installing all the mbr+bsdlabel boot files could be made dependent upon some variable (WITH_MBR_BOOT=), which could be disabled by default. also moving away from forth to something more modern might be thinkable. having support for a modern scripting language might make it easier to come up with a nice graphical bootloader, for example. there are probably more advantages. problem might be that the current gpart(8) manual recommends using 64k for the boot partition. that might not be enough, if ada0p1 should contain the functionality of all boot stages (including the loader), support for a modern scripting language and support for graphical menus (including a rudimentary vga/vesa driver for high res). cheers. alex > > -- > Andriy Gapon -- a13x From owner-freebsd-arch@FreeBSD.ORG Wed Mar 23 23:29:26 2011 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: by hub.freebsd.org (Postfix, from userid 1233) id 6F8C91065672; Wed, 23 Mar 2011 23:29:26 +0000 (UTC) Date: Wed, 23 Mar 2011 23:29:26 +0000 From: Alexander Best To: Andriy Gapon Message-ID: <20110323232926.GA17502@freebsd.org> References: <201103231029.p2NATtwg090498@lurza.secnetix.de> <20110323171443.GA59972@freebsd.org> <201103231426.27750.jhb@freebsd.org> <4D8A7841.5080004@freebsd.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4D8A7841.5080004@freebsd.org> Cc: bz@freebsd.org, Oliver Fromme , freebsd-arch@freebsd.org Subject: Re: kernel memory checks on boot vs.
boot time X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 23 Mar 2011 23:29:26 -0000 On Thu Mar 24 11, Andriy Gapon wrote: > on 23/03/2011 21:28 Peter Wemm said the following: > > Part of the reason for this "check" is a sanity check to make sure we > > enumerated memory correctly and that we have at least got basic ram > > functionality. The existence of hw.physmem complicates this. On > > machines where hw.physmem could be used to tell the kernel that there > > was more ram present than the kernel enumerates (old bioses etc), this > > was kind of important to sanity check. > > > > Even though modern hardware will fail windows compliance tests if the > > SMAP etc is wrong, never underestimate the ability of bios makers to > > find new and bizarre ways of screwing things up. > > > > I'd kinda like to keep a basic "is this real, non mirrored ram?" test > > there. eg: the 2-pass step of writing physical address into each page > > and then checking that they are still there on the second pass. > > > > Oh, did I mention the machine where the ACPI bios info tells the OS > > that the current state is S3 (suspended to ram) instead of S0? > > > > When the kernel blows up at boot without a message.. we get the blame, > > not the bios maker. > > I hear what you are saying, but is there any other OS that takes this level of > responsibility? Should we either? > I mean, hardware and BIOS vendors can screw up things in very creative ways and > it's impossible to protect against that. When we are bug-compatible with some > other OS, then it's one thing; but when we try to be even "better" than that, > that's quite another thing. to be honest, i suspect 99.99999999...% of RAM issues users are experiencing are issues not being detected by the current ram checks in place.
these include defects or wrong bios settings (overclocking, etc.). that's why i think they are unnecessary. i kind of like how a few linux distros come up with a boot menu which lets you run memtest. if you have any suspicion that your RAM is causing issues: run memtest! ...after all: if users have the feeling the harddrive is causing problems: run smartctl! nobody expects the OS to check whether the harddrives are ok. that's what smartd is designed for. > > -- > Andriy Gapon -- a13x From owner-freebsd-arch@FreeBSD.ORG Thu Mar 24 07:57:43 2011 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 547AC106566B; Thu, 24 Mar 2011 07:57:43 +0000 (UTC) (envelope-from gljennjohn@googlemail.com) Received: from mail-wy0-f182.google.com (mail-wy0-f182.google.com [74.125.82.182]) by mx1.freebsd.org (Postfix) with ESMTP id B3CF88FC16; Thu, 24 Mar 2011 07:57:42 +0000 (UTC) Received: by wyf23 with SMTP id 23so9409661wyf.13 for ; Thu, 24 Mar 2011 00:57:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlemail.com; s=gamma; h=domainkey-signature:date:from:to:cc:subject:message-id:in-reply-to :references:reply-to:x-mailer:mime-version:content-type :content-transfer-encoding; bh=223joyrrZMz107g2FkKfQHwyAKE0E/b5ByZRIkJJx20=; b=vXOfsNc9YQPONhK/sJIh+6TXKFpmRu1AtXO1Sc1jN9Ly3EaCbb9nLUXdGznUPCPHU7 2qKTtzfxu2O+JHno2jFf8sxsQR7DX5foye5qiRwA1C/3sgeVPWYyLVGJ9TlUGexbkbMW YWQLDY7jp3wcmWbxvADU885TU6w6Ek6svGDd0= DomainKey-Signature: a=rsa-sha1; c=nofws; d=googlemail.com; s=gamma; h=date:from:to:cc:subject:message-id:in-reply-to:references:reply-to :x-mailer:mime-version:content-type:content-transfer-encoding; b=mTBNKh/5a5CZxSFCKyaLbNGqdv6CyjkXdRQnWUjbUsyhh//QJNrXZOD/+TFfDAWoVz qIZ+KjC/97ASKQ33r/GPzj20hijjwdoQKSMMf3p6q/u0tByR7eYfmnMhIsvsSVXWkF3h SRH1pRe5jvbz80clvHuc+qGj/Sz/piL2CknE8= Received: by 10.227.147.198 with SMTP id m6mr7377611wbv.78.1300951851543;
Thu, 24 Mar 2011 00:30:51 -0700 (PDT) Received: from ernst.jennejohn.org (p578E31D4.dip.t-dialin.net [87.142.49.212]) by mx.google.com with ESMTPS id u9sm5370179wbg.0.2011.03.24.00.30.50 (version=TLSv1/SSLv3 cipher=OTHER); Thu, 24 Mar 2011 00:30:51 -0700 (PDT) Date: Thu, 24 Mar 2011 08:30:48 +0100 From: Gary Jennejohn To: Alexander Best Message-ID: <20110324083048.60862a0f@ernst.jennejohn.org> In-Reply-To: <20110323232209.GA15486@freebsd.org> References: <20110323200200.GA85810@freebsd.org> <201103232050.p2NKov4g017463@lurza.secnetix.de> <4D8A7976.5090103@freebsd.org> <20110323232209.GA15486@freebsd.org> X-Mailer: Claws Mail 3.7.8 (GTK+ 2.18.7; amd64-portbld-freebsd9.0) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Cc: Oliver Fromme , Andriy Gapon , freebsd-arch@freebsd.org Subject: Re: kernel memory checks on boot vs. boot time X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: gljennjohn@googlemail.com List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 24 Mar 2011 07:57:43 -0000 On Wed, 23 Mar 2011 23:22:09 +0000 Alexander Best wrote: > On Thu Mar 24 11, Andriy Gapon wrote: > > on 23/03/2011 22:50 Oliver Fromme said the following: > > > Beware, I don't know if this is the *only* thing preventing > > > boot2 from booting an amd64 kernel. There might be more. > > > I haven't tried booting FreeBSD without the boot loader in > > > a long time. Probably not in this century. > > > > Kind of hijacking the thread - while we are gradually moving from mbr+bsdlabel > > to gpt and more, we are also moving away from size-constrained boot2. > > My vision is that boot2 and loader should fuse into something more powerful that > > would reside in a boot partition, but with its config files on a "regular" > > filesystem. > > +1.
being able to control the whole boot process by /etc/boot.conf would be > great. there are definitely too many files in /boot. new users have no clue > what boot, boot0, boot1, boot2, cdboot, gptboot, etc. are all about. > > merging /boot/loader, /boot/gptboot and /boot/zfsgptboot would be really nice. > building/installing all the mbr+bsdlabel boot files could be made dependent > upon some variable (WITH_MBR_BOOT=), which could be disabled by default. > > also moving away from forth to something more modern might be thinkable. having > support for a modern scripting language might make it easier to come up with a > nice graphical bootloader, for example. there are probably more advantages. > > problem might be that the current gpart(8) manual recommends using 64k for the > boot partition. that might not be enough, if ada0p1 should contain the > functionality of all boot stages (including the loader), support for a modern > scripting language and support for graphical menus (including a rudimentary > vga/vesa driver for high res). > I have no problem with any of these ideas as long as well-documented knobs are provided to turn these "features" off and maintain "legacy" functionality. I'm running off disks where FreeBSD was installed 8 years ago and I have no intention of using gpart(8) on any of them. The same goes for graphical bootloaders - I can't stand them.
-- Gary Jennejohn From owner-freebsd-arch@FreeBSD.ORG Thu Mar 24 10:42:01 2011 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7A4D11065673; Thu, 24 Mar 2011 10:42:01 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail09.syd.optusnet.com.au (mail09.syd.optusnet.com.au [211.29.132.190]) by mx1.freebsd.org (Postfix) with ESMTP id 114018FC1E; Thu, 24 Mar 2011 10:42:00 +0000 (UTC) Received: from c122-107-125-80.carlnfd1.nsw.optusnet.com.au (c122-107-125-80.carlnfd1.nsw.optusnet.com.au [122.107.125.80]) by mail09.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id p2OAftew002912 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 24 Mar 2011 21:41:57 +1100 Date: Thu, 24 Mar 2011 21:41:55 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Peter Wemm In-Reply-To: Message-ID: <20110324210338.S1056@besplex.bde.org> References: <201103231029.p2NATtwg090498@lurza.secnetix.de> <20110323171443.GA59972@freebsd.org> <201103231426.27750.jhb@freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Alexander Best , bz@freebsd.org, Oliver Fromme , freebsd-arch@freebsd.org Subject: Re: kernel memory checks on boot vs. boot time X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 24 Mar 2011 10:42:01 -0000 On Wed, 23 Mar 2011, Peter Wemm wrote: > On Wed, Mar 23, 2011 at 11:26 AM, John Baldwin wrote: >> On Wednesday, March 23, 2011 1:14:43 pm Alexander Best wrote: >>> >>> or how about we dump the current memory checks, introduce a tunable and >>> implement some *real* memory checks. as john pointed out the current checks >>> are just rudimentary. 
>> >> I think that doing *real* memory checks isn't really the role of our kernel. >> Better effort would be spent on improving memtest86 since it is already trying >> to solve this problem. I agree. > Part of the reason for this "check" is a sanity check to make sure we > enumerated memory correctly and that we have at least got basic ram > functionality. The existence of hw.physmem complicates this. On > machines where hw.physmem could be used to tell the kernel that there > was more ram present than the kernel enumerates (old bioses etc), this > was kind of important to sanity check. It seems to check just 1 word per page. I think that's all it ever did. So it is nothing like a memory test, but is a probe for the memory size. > I'd kinda like to keep a basic "is this real, non mirrored ram?" test > there. eg: the 2-pass step of writing physical address into each page > and then checking that they are still there on the second pass. It's not a very sophisticated probe, but it does do this mirror check. Or does it? I can only see 1 pass, with writes of 0xaaaaaaaa, 0x55555555, 0xffffffff, 0 and the original value to the single word tested. The fact that this takes more than a few microseconds shows that memory sizes are now _very_ large. Perhaps the 4 test writes and overhead for every page can be reduced. The overhead includes a page table write and an invltlb() for every page. The 4 test writes probably really do take only a few microseconds for all of memory, but the invltlb() takes much longer. It could at least be an invlpg() on all systems that can have much memory. But if there is more virtual address space than memory (as on amd64?), the probe can simply map all of memory and use a single invltlb(). Then each set of memory accesses for each page should take about the same time as a single access (for a cache miss). Say 100 nsec per page. With 128 GB, that is about 3.36 seconds.
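As a rough check of that per-page estimate, here is a minimal sketch. It assumes 4 KiB pages and reads "128 GB" as 128 GiB; both are assumptions, and the name probe_seconds is made up for illustration rather than taken from the kernel.

```python
PAGE_SIZE = 4096   # bytes; assumed x86 base page size
GIB = 1 << 30      # bytes in one GiB


def probe_seconds(mem_bytes, ns_per_page=100, passes=1, page_size=PAGE_SIZE):
    """Estimate the time, in seconds, to touch one word in every page."""
    pages = mem_bytes // page_size
    return pages * ns_per_page * passes / 1e9


# One cache miss per page, then the two-miss cost of a 2-pass mirror test.
one_pass = probe_seconds(128 * GIB)
two_pass = probe_seconds(128 * GIB, passes=2)
print(f"1 pass: {one_pass:.2f} s, 2 passes: {two_pass:.2f} s")
```

Under these assumptions there are 2^25 pages, so a single pass comes to a little over 3 seconds and a second pass doubles it.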
Still a bit too much, and a 2-pass mirror test would double that by giving 2 cache misses per page. Bruce From owner-freebsd-arch@FreeBSD.ORG Thu Mar 24 11:26:09 2011 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 036F91065670; Thu, 24 Mar 2011 11:26:09 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail02.syd.optusnet.com.au (mail02.syd.optusnet.com.au [211.29.132.183]) by mx1.freebsd.org (Postfix) with ESMTP id 924C08FC17; Thu, 24 Mar 2011 11:26:08 +0000 (UTC) Received: from c122-107-125-80.carlnfd1.nsw.optusnet.com.au (c122-107-125-80.carlnfd1.nsw.optusnet.com.au [122.107.125.80]) by mail02.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id p2OBQ4ow025781 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 24 Mar 2011 22:26:05 +1100 Date: Thu, 24 Mar 2011 22:26:03 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Gary Jennejohn In-Reply-To: <20110324083048.60862a0f@ernst.jennejohn.org> Message-ID: <20110324214320.K1149@besplex.bde.org> References: <20110323200200.GA85810@freebsd.org> <201103232050.p2NKov4g017463@lurza.secnetix.de> <4D8A7976.5090103@freebsd.org> <20110323232209.GA15486@freebsd.org> <20110324083048.60862a0f@ernst.jennejohn.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Alexander Best , Oliver Fromme , Andriy Gapon , freebsd-arch@freebsd.org Subject: Re: kernel memory checks on boot vs. 
boot time X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 24 Mar 2011 11:26:09 -0000 On Thu, 24 Mar 2011, Gary Jennejohn wrote: > On Wed, 23 Mar 2011 23:22:09 +0000 > Alexander Best wrote: > >> On Thu Mar 24 11, Andriy Gapon wrote: >>> on 23/03/2011 22:50 Oliver Fromme said the following: >>>> Beware, I don't know if this is the *only* thing preventing >>>> boot2 from booting an amd64 kernel. There might be more. ["This" is setting up long mode.] I think amd64 also depends on loader to set up lots of page table, ELF and module stuff. Compare the size of amd64/locore.S with the size of i386/locore.s. I'm not sure how much of the simplification is automatic from using long mode. >>>> I haven't tried booting FreeBSD without the boot loader in >>>> a long time. Probably not in this century. Since I consider the existence of loader to be a bug, I try not to use it, and mostly use my version of biosboot (= biosboot with elf support copied from boot2) on i386. >>> Kind of hijacking the thread - while we are gradually moving from mbr+bsdlabel >>> to gpt and more, we are also moving away from size-constrained boot2. This has been possible since ~1980 using disklabel. Either put the boot2 stage on a separate partition, or fix assumptions that the space reserved for the boot area (d_bbsize) is always 8K. >>> My vision is that boot2 and loader should fuse into something more powerful that >>> would reside in a boot partition, but with its config files on a "regular" >>> filesystem. My vision was that this would be named "/kernel". There might still be a boot2 stage, but it would have as many full kernel features as you want, to prepare for the full kernel, or perhaps for a reduced kernel.
Unfortunately, in FreeBSD, instead of merging the second stage into the kernel, the second stage was split into boot2 and a loader stage, with enormous bloat in the loader stage. Yet the bloat isn't large enough to include necessary functionality for handling itself, starting with command line editing, history, and various search functions. >> +1. being able to control the whole boot process by /etc/boot.conf would be >> great. there are definitely too many files in /boot. new users have no clue >> what boot, boot0, boot1, boot2, cdboot, gptboot, etc. are all about. >> >> merging /boot/loader, /boot/gptboot and /boot/zfsgptboot would be really nice. >> building/installing all the mbr+bsdlabel boot files could be made dependent >> upon some variable (WITH_MBR_BOOT=), which could be disabled by default. >> >> also moving away from forth to something more modern might be thinkable. having >> support for a modern scripting language might make it easier to come up with a >> nice graphical bootloader, for example. there are probably more advantages. >> >> problem might be that the current gpart(8) manual recommends using 64k for the >> boot partition. that might not be enough, if ada0p1 should contain the >> functionality of all boot stages (including the loader), support for a modern >> scripting language and support for graphical menus (including a rudimentary >> vga/vesa driver for high res). > > I have no problem with any of these ideas as long as well-documented > knobs are provided to turn these "features" off and maintain "legacy" > functionality. > > I'm running off disks where FreeBSD was installed 8 years ago and I have > no intention of using gpart(8) on any of them. The same goes for > graphical bootloaders - I can't stand them. Me too. However, I liked userconfig and even the semi-graphical visual userconfig in FreeBSD-[2-4]. These were correctly placed from my point of view (in the kernel), and were optional so not using them was supported. Bruce
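For the "is this real, non mirrored ram?" test that Peter Wemm describes earlier in this thread, the two-pass idea can be sketched as a toy simulation. This is not kernel code: FakeRam and mirror_check are hypothetical names, and the model simply assumes that addresses beyond the real memory size wrap around and alias lower pages.

```python
PAGE_SIZE = 4096  # bytes; assumed page size


class FakeRam:
    """Toy memory model: page numbers wrap modulo real_pages (mirroring)."""

    def __init__(self, real_pages):
        self.real_pages = real_pages
        self.words = {}

    def write(self, page, value):
        self.words[page % self.real_pages] = value

    def read(self, page):
        return self.words.get(page % self.real_pages)


def mirror_check(ram, claimed_pages):
    """Return the pages whose tag did not survive the first pass."""
    # Pass 1: tag one word in each page with that page's physical address.
    for page in range(claimed_pages):
        ram.write(page, page * PAGE_SIZE)
    # Pass 2: a page that aliases another page now holds the wrong tag.
    return [p for p in range(claimed_pages)
            if ram.read(p) != p * PAGE_SIZE]


# 4 real pages, but 8 claimed: pages 0-3 were overwritten by pages 4-7.
print(mirror_check(FakeRam(real_pages=4), claimed_pages=8))
# 4 real pages, 4 claimed: nothing aliases, so nothing is reported.
print(mirror_check(FakeRam(real_pages=4), claimed_pages=4))
```

A single pass that writes a pattern and reads it straight back would pass on mirrored memory in this model, since every aliased location still returns what was just written to it; only the separate second pass exposes the overwritten tags.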