From owner-freebsd-hackers@FreeBSD.ORG Sun Oct 7 10:40:56 2012 Return-Path: Delivered-To: hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 818B3106566C; Sun, 7 Oct 2012 10:40:56 +0000 (UTC) (envelope-from hselasky@c2i.net) Received: from swip.net (mailfe09.c2i.net [212.247.155.2]) by mx1.freebsd.org (Postfix) with ESMTP id 864688FC08; Sun, 7 Oct 2012 10:40:54 +0000 (UTC) X-T2-Spam-Status: No, hits=-1.0 required=5.0 tests=ALL_TRUSTED Received: from [176.74.213.204] (account mc467741@c2i.net HELO laptop015.hselasky.homeunix.org) by mailfe09.swip.net (CommuniGate Pro SMTP 5.4.4) with ESMTPA id 154285814; Sun, 07 Oct 2012 12:40:46 +0200 From: Hans Petter Selasky To: freebsd-usb@freebsd.org, hackers@freebsd.org Date: Sun, 7 Oct 2012 12:42:11 +0200 User-Agent: KMail/1.13.7 (FreeBSD/9.1-PRERELEASE; KDE/4.8.4; amd64; ; ) References: <5070C1E9.5000405@sbcglobal.net> In-Reply-To: <5070C1E9.5000405@sbcglobal.net> X-Face: 'mmZ:T{)),Oru^0c+/}w'`gU1$ubmG?lp!=R4Wy\ELYo2)@'UZ24N@d2+AyewRX}mAm; Yp |U[@, _z/([?1bCfM{_"B<.J>mICJCHAzzGHI{y7{%JVz%R~yJHIji`y>Y}k1C4TfysrsUI -%GU9V5]iUZF&nRn9mJ'?&>O MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201210071242.11932.hselasky@c2i.net> Cc: bugs@freebsd.org, Jin Guojun , current@freebsd.org Subject: Re: 9.1-RCs issues X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 07 Oct 2012 10:40:56 -0000 On Sunday 07 October 2012 01:42:33 Jin Guojun wrote: > 1) moused stops functioning on 9.1-RC2. Neither PS2 nor USB mouse can work. > 9.1-RC1 has no such problem. > > 2) All i386 / amd64 of 9.1-RC1/RC2 have USB read failure -- see dmesg > output at end of this email. > ada0 is internal SATA drive for system disk -- s# partitions: /, /tmp, > /var, /usr > s1 -- 6.4-Release > s2 -- 8.3-Release > s3 -- 9.1-RC2 amd64 > s4 -- 9.1-RC2 i386 -- This slice also contains /home > da0 is external USB2 drive (300GB) plugged in USB2 port -- mounted on /mnt > Regarding USB, it might be some patches did not reach it for the RC's. Have you tried 9-stable, or any 10-current snapshots? 
--HPS From owner-freebsd-hackers@FreeBSD.ORG Sun Oct 7 12:17:36 2012 Return-Path: Delivered-To: freebsd-hackers@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 42CA4106566C for ; Sun, 7 Oct 2012 12:17:36 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 930F38FC18 for ; Sun, 7 Oct 2012 12:17:34 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id PAA14196 for ; Sun, 07 Oct 2012 15:17:32 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1TKpnY-000873-Gy for freebsd-hackers@freebsd.org; Sun, 07 Oct 2012 15:17:32 +0300 Message-ID: <507172DA.2020309@FreeBSD.org> Date: Sun, 07 Oct 2012 15:17:30 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:15.0) Gecko/20120913 Thunderbird/15.0.1 MIME-Version: 1.0 To: freebsd-hackers X-Enigmail-Version: 1.4.3 Content-Type: text/plain; charset=X-VIET-VPS Content-Transfer-Encoding: 7bit Cc: Subject: machine/cpu.h in userland X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 07 Oct 2012 12:17:36 -0000 I noticed that couple of our userland file include machine/cpu.h for no apparent good reason. It looks like at least amd64 cpu.h has no userland serviceable parts inside. Maybe its content should be put under _KERNEL ? Also: --- a/sbin/adjkerntz/adjkerntz.c +++ b/sbin/adjkerntz/adjkerntz.c @@ -51,7 +51,6 @@ __FBSDID("$FreeBSD$"); #include #include #include -#include #include #include "pathnames.h" diff --git a/usr.bin/w/w.c b/usr.bin/w/w.c index 8441ce5..9674fc2 100644 --- a/usr.bin/w/w.c +++ b/usr.bin/w/w.c @@ -57,7 +57,6 @@ static const char sccsid[] = "@(#)w.c 8.4 (Berkeley) 4/16/94"; #include #include -#include #include #include #include -- Andriy Gapon From owner-freebsd-hackers@FreeBSD.ORG Sun Oct 7 12:43:38 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id F27161065670; Sun, 7 Oct 2012 12:43:37 +0000 (UTC) (envelope-from hselasky@c2i.net) Received: from swip.net (mailfe02.c2i.net [212.247.154.34]) by mx1.freebsd.org (Postfix) with ESMTP id 742E38FC12; Sun, 7 Oct 2012 12:43:35 +0000 (UTC) X-T2-Spam-Status: No, hits=-0.2 required=5.0 tests=ALL_TRUSTED, BAYES_50 Received: from [176.74.213.204] (account mc467741@c2i.net HELO laptop015.hselasky.homeunix.org) by mailfe02.swip.net (CommuniGate Pro SMTP 5.4.4) with ESMTPA id 329237632; Sun, 07 Oct 2012 14:43:27 +0200 From: Hans Petter Selasky To: freebsd-hackers@freebsd.org Date: Sun, 7 Oct 2012 14:44:52 +0200 User-Agent: KMail/1.13.7 (FreeBSD/9.1-PRERELEASE; KDE/4.8.4; amd64; ; ) References: <5070C1E9.5000405@sbcglobal.net> <201210071242.11932.hselasky@c2i.net> In-Reply-To: <201210071242.11932.hselasky@c2i.net> X-Face: 'mmZ:T{)),Oru^0c+/}w'`gU1$ubmG?lp!=R4Wy\ELYo2)@'UZ24N@d2+AyewRX}mAm; Yp |U[@, _z/([?1bCfM{_"B<.J>mICJCHAzzGHI{y7{%JVz%R~yJHIji`y>Y}k1C4TfysrsUI -%GU9V5]iUZF&nRn9mJ'?&>O MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: 
<201210071444.52845.hselasky@c2i.net> Cc: bugs@freebsd.org, hackers@freebsd.org, Jin Guojun , freebsd-usb@freebsd.org, current@freebsd.org Subject: Re: 9.1-RCs issues X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 07 Oct 2012 12:43:38 -0000 On Sunday 07 October 2012 12:42:11 Hans Petter Selasky wrote: > On Sunday 07 October 2012 01:42:33 Jin Guojun wrote: > > 1) moused stops functioning on 9.1-RC2. Neither PS2 nor USB mouse can > > work. > > > > 9.1-RC1 has no such problem. > > > > 2) All i386 / amd64 of 9.1-RC1/RC2 have USB read failure -- see dmesg > > output at end of this email. > > ada0 is internal SATA drive for system disk -- s# partitions: /, /tmp, > > /var, /usr > > > > s1 -- 6.4-Release > > s2 -- 8.3-Release > > s3 -- 9.1-RC2 amd64 > > s4 -- 9.1-RC2 i386 -- This slice also contains /home > > > > da0 is external USB2 drive (300GB) plugged in USB2 port -- mounted on > > /mnt > > Regarding USB, it might be some patches did not reach it for the RC's. Have > you tried 9-stable, or any 10-current snapshots? > > --HPS s/reach/make --HPS
From owner-freebsd-hackers@FreeBSD.ORG Sun Oct 7 14:11:23 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 75786106564A; Sun, 7 Oct 2012 14:11:23 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id CBE7D8FC0A; Sun, 7 Oct 2012 14:11:22 +0000 (UTC) Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q97EBUU8023980; Sun, 7 Oct 2012 17:11:30 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id q97EBIuV004831; Sun, 7 Oct 2012 17:11:18 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q97EBIRA004830; Sun, 7 Oct 2012 17:11:18 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Sun, 7 Oct 2012 17:11:18 +0300 From: Konstantin Belousov To: Andriy Gapon Message-ID: <20121007141118.GW35915@deviant.kiev.zoral.com.ua> References: <507172DA.2020309@FreeBSD.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="CLy2MrvMHpW9mjhY" Content-Disposition: inline In-Reply-To: <507172DA.2020309@FreeBSD.org> User-Agent: Mutt/1.5.21 (2010-09-15) X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-4.0 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: freebsd-hackers Subject: Re: machine/cpu.h in userland X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 07 Oct 2012 14:11:23 -0000 --CLy2MrvMHpW9mjhY Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Sun, Oct 07, 2012 at 03:17:30PM +0300, Andriy Gapon wrote: > > I noticed that couple of our userland file include machine/cpu.h for no apparent > good reason. It looks like at least amd64 cpu.h has no userland serviceable > parts inside. > Maybe its content should be put under _KERNEL ?
>=20 > Also: >=20 > --- a/sbin/adjkerntz/adjkerntz.c > +++ b/sbin/adjkerntz/adjkerntz.c > @@ -51,7 +51,6 @@ __FBSDID("$FreeBSD$"); > #include > #include > #include > -#include > #include >=20 > #include "pathnames.h" > diff --git a/usr.bin/w/w.c b/usr.bin/w/w.c > index 8441ce5..9674fc2 100644 > --- a/usr.bin/w/w.c > +++ b/usr.bin/w/w.c > @@ -57,7 +57,6 @@ static const char sccsid[] =3D "@(#)w.c 8.4 (Berkeley) = 4/16/94"; > #include > #include >=20 > -#include > #include > #include > #include Both proposals are obviously fine. Some research with git blame traces the include presence in the w.c to the 4.4 Lite import. What I do not understand is why do you spam lists instead of commmitting the obvious changes ? --CLy2MrvMHpW9mjhY Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (FreeBSD) iEYEARECAAYFAlBxjYYACgkQC3+MBN1Mb4gbZACgjj5XP4xZJPDVQkz/9t13TBA8 QDsAnR8v1n5IgnKkqOkgzq802v6pwA8m =e9Kw -----END PGP SIGNATURE----- --CLy2MrvMHpW9mjhY-- From owner-freebsd-hackers@FreeBSD.ORG Sun Oct 7 14:22:10 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id DCD56106570F for ; Sun, 7 Oct 2012 14:22:10 +0000 (UTC) (envelope-from lists@eitanadler.com) Received: from mail-pa0-f54.google.com (mail-pa0-f54.google.com [209.85.220.54]) by mx1.freebsd.org (Postfix) with ESMTP id A53368FC14 for ; Sun, 7 Oct 2012 14:22:10 +0000 (UTC) Received: by mail-pa0-f54.google.com with SMTP id bi1so3593847pad.13 for ; Sun, 07 Oct 2012 07:22:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=eitanadler.com; s=0xdeadbeef; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type; bh=63fq90DDONBRRVjHb2cXdj9MMe2ka3/CyO6LDH4bRAg=; b=sJ5M6Lf/OrFbr6oH0RwLsypPmnz+7rM9GcZbXnFa68aQhr+IYY4l6RWEhdypjq/akT juPZsLHA+Tw45fu1BGZuxYoQ1Q16CTbB2HalbueCPQEw9Hsu/81tK3GKrI21Lvqg3LHC alEUauzfpLnoLLFj/VWgf+SJnjPMq3Yf6dIhs= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type:x-gm-message-state; bh=63fq90DDONBRRVjHb2cXdj9MMe2ka3/CyO6LDH4bRAg=; b=EGxHc3PUhOjbOsLYeCrOyk+5oZ9xSlggKMYwITPwRhzo1qKOpOioPW0b7Ck0MGhp51 bGax1OEXQT0YQd0DNBd5UvxDtgGk/nykcisKu3GTEWs1d4nXvYge4qOIa06e3ABecomw XMwwbWg1VujfK4NJ6ZZ/yGPhbFcYSLK6NjhuKJo7lwqqb33MbuHVj3QxKzPzRS+/By1I vlRXynOLnMX8j+pr0b7pxyu8Rb4KCGFI2yLnGLQbPURCwNR/LGZiuF9m+pg4MxCREoOV ZziApng9lO70PPuM8VHUWYigXGRFR/3ZUVEed3jsNnuUyo7mhBiXFDswbKkgOc9DBMcH SiIg== Received: by 10.66.86.2 with SMTP id l2mr36225740paz.70.1349619729740; Sun, 07 Oct 2012 07:22:09 -0700 (PDT) MIME-Version: 1.0 Received: by 10.66.161.163 with HTTP; Sun, 7 Oct 2012 07:21:38 -0700 (PDT) In-Reply-To: <20121007141118.GW35915@deviant.kiev.zoral.com.ua> References: <507172DA.2020309@FreeBSD.org> <20121007141118.GW35915@deviant.kiev.zoral.com.ua> From: Eitan Adler Date: Sun, 7 Oct 2012 10:21:38 -0400 Message-ID: To: Konstantin Belousov Content-Type: text/plain; charset=UTF-8 X-Gm-Message-State: ALoCoQlEapypI34uD0bu7BTbt6yRBnCicdOjcySGyBZ0UiyYUohLm6Iop5IzH0HiX40hvBWizslT Cc: freebsd-hackers , Andriy Gapon Subject: Re: machine/cpu.h in userland X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 07 Oct 2012 14:22:11 -0000 On 7 October 2012 10:11, 
Konstantin Belousov wrote: > What I do not understand is why do you spam lists instead of > commmitting the obvious changes ? It is not spam to ask for review. He was uncertain, and asked for some clarification. -- Eitan Adler From owner-freebsd-hackers@FreeBSD.ORG Sun Oct 7 17:06:27 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id E5C9B1065670 for ; Sun, 7 Oct 2012 17:06:27 +0000 (UTC) (envelope-from wojtek@wojtek.tensor.gdynia.pl) Received: from wojtek.tensor.gdynia.pl (wojtek.tensor.gdynia.pl [89.206.35.99]) by mx1.freebsd.org (Postfix) with ESMTP id 19AEB8FC08 for ; Sun, 7 Oct 2012 17:06:26 +0000 (UTC) Received: from wojtek.tensor.gdynia.pl (localhost [127.0.0.1]) by wojtek.tensor.gdynia.pl (8.14.5/8.14.5) with ESMTP id q97H0RMq016006; Sun, 7 Oct 2012 19:00:27 +0200 (CEST) (envelope-from wojtek@wojtek.tensor.gdynia.pl) Received: from localhost (wojtek@localhost) by wojtek.tensor.gdynia.pl (8.14.5/8.14.5/Submit) with ESMTP id q97H0QL4016003; Sun, 7 Oct 2012 19:00:27 +0200 (CEST) (envelope-from wojtek@wojtek.tensor.gdynia.pl) Date: Sun, 7 Oct 2012 19:00:26 +0200 (CEST) From: Wojciech Puchar To: Brandon Falk In-Reply-To: <5069C9FC.6020400@brandonfa.lk> Message-ID: References: <5069C9FC.6020400@brandonfa.lk> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.2.7 (wojtek.tensor.gdynia.pl [127.0.0.1]); Sun, 07 Oct 2012 19:00:27 +0200 (CEST) Cc: freebsd-hackers@freebsd.org Subject: Re: SMP Version of tar X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 07 Oct 2012 17:06:28 -0000 > I would be willing to work on a SMP version of tar (initially just gzip or > something). > > I don't have the best experience in compression, and how to multi-thread it, > but I think I would be able to learn and help out. gzip cannot - it is single stream. bzip2 - no idea grzip (from ports, i use it) - can be multithreaded as it packs using fixed large chunks. 
From owner-freebsd-hackers@FreeBSD.ORG Sun Oct 7 17:51:55 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id DE47B1065728 for ; Sun, 7 Oct 2012 17:51:55 +0000 (UTC) (envelope-from tim@kientzle.com) Received: from monday.kientzle.com (99-115-135-74.uvs.sntcca.sbcglobal.net [99.115.135.74]) by mx1.freebsd.org (Postfix) with ESMTP id B86A48FC1C for ; Sun, 7 Oct 2012 17:51:55 +0000 (UTC) Received: (from root@localhost) by monday.kientzle.com (8.14.4/8.14.4) id q97HpkUQ017185; Sun, 7 Oct 2012 17:51:46 GMT (envelope-from tim@kientzle.com) Received: from [192.168.2.143] (CiscoE3000 [192.168.1.65]) by kientzle.com with SMTP id 64ke244kk3upjc7r2g7je68tq6; Sun, 07 Oct 2012 17:51:46 +0000 (UTC) (envelope-from tim@kientzle.com) Mime-Version: 1.0 (Apple Message framework v1278) Content-Type: text/plain; charset=us-ascii From: Tim Kientzle In-Reply-To: Date: Sun, 7 Oct 2012 10:52:27 -0700 Content-Transfer-Encoding: quoted-printable Message-Id: <324B736D-8961-4E44-A212-2ECF3E60F2A0@kientzle.com> References: <5069C9FC.6020400@brandonfa.lk> To: Wojciech Puchar X-Mailer: Apple Mail (2.1278) Cc: freebsd-hackers@freebsd.org, Brandon Falk Subject: Re: SMP Version of tar X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 07 Oct 2012 17:51:56 -0000 On Oct 7, 2012, at 10:00 AM, Wojciech Puchar wrote: >> I would be willing to work on a SMP version of tar (initially just gzip or something). >> >> I don't have the best experience in compression, and how to multi-thread it, but I think I would be able to learn and help out. > > gzip cannot - it is single stream. gunzip commutes with cat, so gzip compression can be multi-threaded by compressing separate blocks and concatenating the result. For proof, look at Mark Adler's pigz program, which does exactly this. GZip decompression is admittedly trickier. > bzip2 - no idea bzip2 is block oriented and can be multi-threaded for both compression and decompression.
Tim From owner-freebsd-hackers@FreeBSD.ORG Mon Oct 8 06:36:40 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id DCF6D1065672 for ; Mon, 8 Oct 2012 06:36:40 +0000 (UTC) (envelope-from trond@fagskolen.gjovik.no) Received: from smtp.fagskolen.gjovik.no (smtp.fagskolen.gjovik.no [IPv6:2001:700:1100:1:200:ff:fe00:b]) by mx1.freebsd.org (Postfix) with ESMTP id 5E5F88FC0A for ; Mon, 8 Oct 2012 06:36:38 +0000 (UTC) Received: from mail.fig.ol.no (localhost [127.0.0.1]) by mail.fig.ol.no (8.14.5/8.14.5) with ESMTP id q986aUKx069523 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 8 Oct 2012 08:36:30 +0200 (CEST) (envelope-from trond@fagskolen.gjovik.no) Received: from localhost (trond@localhost) by mail.fig.ol.no (8.14.5/8.14.5/Submit) with ESMTP id q986aTrZ069520; Mon, 8 Oct 2012 08:36:30 +0200 (CEST) (envelope-from trond@fagskolen.gjovik.no) X-Authentication-Warning: mail.fig.ol.no: trond owned process doing -bs Date: Mon, 8 Oct 2012 08:36:29 +0200 (CEST) From: =?ISO-8859-1?Q?Trond_Endrest=F8l?= Sender: Trond.Endrestol@fagskolen.gjovik.no To: Wojciech Puchar In-Reply-To: Message-ID: References: <5069C9FC.6020400@brandonfa.lk> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) Organization: Fagskolen Innlandet OpenPGP: url=http://fig.ol.no/~trond/trond.key MIME-Version: 1.0 Content-Type: MULTIPART/MIXED; BOUNDARY="2055831798-1630032467-1349678190=:36463" X-Spam-Status: No, score=-2.9 required=5.0 tests=ALL_TRUSTED,BAYES_00 autolearn=ham version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on mail.fig.ol.no Cc: freebsd-hackers@freebsd.org, Brandon Falk Subject: Re: SMP Version of tar X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 08 Oct 2012 06:36:41 -0000 This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. --2055831798-1630032467-1349678190=:36463 Content-Type: TEXT/PLAIN; charset=ISO-8859-1 Content-Transfer-Encoding: 8BIT On Sun, 7 Oct 2012 19:00+0200, Wojciech Puchar wrote: > > I would be willing to work on a SMP version of tar (initially just gzip or > > something). > > > > I don't have the best experience in compression, and how to multi-thread it, > > but I think I would be able to learn and help out. > > gzip cannot - it is single stream. > > bzip2 - no idea Check out archivers/pbzip2. > grzip (from ports, i use it) - can be multithreaded as it packs using fixed > large chunks. -- +-------------------------------+------------------------------------+ | Vennlig hilsen, | Best regards, | | Trond Endrestøl, | Trond Endrestøl, | | IT-ansvarlig, | System administrator, | | Fagskolen Innlandet, | Gjøvik Technical College, Norway, | | tlf. mob. 952 62 567, | Cellular...: +47 952 62 567, | | sentralbord 61 14 54 00. | Switchboard: +47 61 14 54 00. 
| +-------------------------------+------------------------------------+ --2055831798-1630032467-1349678190=:36463-- From owner-freebsd-hackers@FreeBSD.ORG Mon Oct 8 06:38:44 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 5099E106566B for ; Mon, 8 Oct 2012 06:38:44 +0000 (UTC) (envelope-from wojtek@wojtek.tensor.gdynia.pl) Received: from wojtek.tensor.gdynia.pl (wojtek.tensor.gdynia.pl [89.206.35.99]) by mx1.freebsd.org (Postfix) with ESMTP id A56528FC0A for ; Mon, 8 Oct 2012 06:38:43 +0000 (UTC) Received: from wojtek.tensor.gdynia.pl (localhost [127.0.0.1]) by wojtek.tensor.gdynia.pl (8.14.5/8.14.5) with ESMTP id q986cY0E003668; Mon, 8 Oct 2012 08:38:34 +0200 (CEST) (envelope-from wojtek@wojtek.tensor.gdynia.pl) Received: from localhost (wojtek@localhost) by wojtek.tensor.gdynia.pl (8.14.5/8.14.5/Submit) with ESMTP id q986cYpm003665; Mon, 8 Oct 2012 08:38:34 +0200 (CEST) (envelope-from wojtek@wojtek.tensor.gdynia.pl) Date: Mon, 8 Oct 2012 08:38:33 +0200 (CEST) From: Wojciech Puchar To: Tim Kientzle In-Reply-To: <324B736D-8961-4E44-A212-2ECF3E60F2A0@kientzle.com> Message-ID: References: <5069C9FC.6020400@brandonfa.lk> <324B736D-8961-4E44-A212-2ECF3E60F2A0@kientzle.com> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.2.7 (wojtek.tensor.gdynia.pl [127.0.0.1]); Mon, 08 Oct 2012 08:38:34 +0200 (CEST) Cc: freebsd-hackers@freebsd.org, Brandon Falk Subject: Re: SMP Version of tar X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 08 Oct 2012 06:38:44 -0000 >> gzip cannot - it is single stream. > > gunzip commutes with cat, so gzip > compression can be multi-threaded > by compressing separate blocks and > concatenating the result. right. but resulting file format must be different. 
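A concrete illustration of the point above, not taken from the thread: with stock zlib, each block can be compressed into an independent gzip member (deflateInit2() with windowBits 15+16 selects the gzip wrapper), and the members are simply appended to one file. The result is still a file that plain gunzip(1) accepts, which is how pigz works, although the bytes differ from what a single-stream gzip would emit because the dictionary restarts at every member. The file name and helper below are invented for the example; in a pigz-style tool each compress_member() call would run in its own worker thread.

/*
 * Sketch: independent gzip members, concatenated into one .gz file.
 * Build with: cc -o gz_members gz_members.c -lz
 */
#include <stdio.h>
#include <string.h>
#include <zlib.h>

static int
compress_member(unsigned char *in, size_t inlen,
    unsigned char *out, size_t outcap, size_t *outlen)
{
	z_stream zs;

	memset(&zs, 0, sizeof(zs));
	/* windowBits 15+16 asks zlib for a gzip wrapper, not raw zlib. */
	if (deflateInit2(&zs, Z_DEFAULT_COMPRESSION, Z_DEFLATED,
	    15 + 16, 8, Z_DEFAULT_STRATEGY) != Z_OK)
		return (-1);
	zs.next_in = in;
	zs.avail_in = (uInt)inlen;
	zs.next_out = out;
	zs.avail_out = (uInt)outcap;
	/* One-shot compression of this block into its own member. */
	if (deflate(&zs, Z_FINISH) != Z_STREAM_END) {
		(void)deflateEnd(&zs);
		return (-1);
	}
	*outlen = outcap - zs.avail_out;
	return (deflateEnd(&zs) == Z_OK ? 0 : -1);
}

int
main(void)
{
	/* In a pigz-style tool each block would be compressed by a
	 * worker thread; they are done sequentially here for brevity. */
	char block_a[] = "block one, could come from thread A\n";
	char block_b[] = "block two, could come from thread B\n";
	char *blocks[] = { block_a, block_b };
	unsigned char out[4096];	/* real code would size via deflateBound() */
	size_t outlen;
	FILE *fp;
	int i;

	if ((fp = fopen("members.gz", "wb")) == NULL)
		return (1);
	for (i = 0; i < 2; i++) {
		if (compress_member((unsigned char *)blocks[i],
		    strlen(blocks[i]), out, sizeof(out), &outlen) != 0)
			return (1);
		fwrite(out, 1, outlen, fp);	/* append the member as-is */
	}
	fclose(fp);
	/* "gunzip -c members.gz" prints both blocks, in order. */
	return (0);
}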
From owner-freebsd-hackers@FreeBSD.ORG Mon Oct 8 08:45:29 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id A32A1106564A for ; Mon, 8 Oct 2012 08:45:29 +0000 (UTC) (envelope-from roam@ringlet.net) Received: from nimbus.fccf.net (nimbus.fccf.net [77.77.144.35]) by mx1.freebsd.org (Postfix) with ESMTP id 4BE678FC1B for ; Mon, 8 Oct 2012 08:45:29 +0000 (UTC) Received: from straylight.m.ringlet.net (unknown [78.90.13.150]) by nimbus.fccf.net (Postfix) with ESMTPSA id 03BC1952 for ; Mon, 8 Oct 2012 11:38:16 +0300 (EEST) Received: from roam (uid 1000) (envelope-from roam@ringlet.net) id 316002 by straylight.m.ringlet.net (DragonFly Mail Agent); Mon, 08 Oct 2012 11:38:14 +0300 Date: Mon, 8 Oct 2012 11:38:14 +0300 From: Peter Pentchev To: Wojciech Puchar Message-ID: <20121008083814.GA5830@straylight.m.ringlet.net> Mail-Followup-To: Wojciech Puchar , Tim Kientzle , freebsd-hackers@freebsd.org, Brandon Falk References: <5069C9FC.6020400@brandonfa.lk> <324B736D-8961-4E44-A212-2ECF3E60F2A0@kientzle.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="n8g4imXOkfNTN/H1" Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Cc: Brandon Falk , freebsd-hackers@freebsd.org Subject: Re: SMP Version of tar X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 08 Oct 2012 08:45:29 -0000 --n8g4imXOkfNTN/H1 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Mon, Oct 08, 2012 at 08:38:33AM +0200, Wojciech Puchar wrote: > >>gzip cannot - it is single stream. > > > >gunzip commutes with cat, so gzip > >compression can be multi-threaded > >by compressing separate blocks and > >concatenating the result. >=20 > right. but resulting file format must be different. Not necessarily. If I understand correctly what Tim means, he's talking about an in-memory compression of several blocks by several separate threads, and then - after all the threads have compressed their respective blocks - writing out the result to the output file in order. Of course, this would incur a small penalty in that the dictionary would not be reused between blocks, but it might still be worth it. G'luck, Peter --=20 Peter Pentchev roam@ringlet.net roam@FreeBSD.org peter@packetscale.com PGP key: http://people.FreeBSD.org/~roam/roam.key.asc Key fingerprint FDBA FD79 C26F 3C51 C95E DF9E ED18 B68D 1619 4553 Hey, out there - is it *you* reading me, or is it someone else? 
--n8g4imXOkfNTN/H1 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: Digital signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAEBCAAGBQJQcpDwAAoJEGUe77AlJ98TLBMP/jQ74ESXef5g/Uedklzi/PXI wsgP8BFBzHwldymZnH/lRMYKLUbjYka+HIrf/hrdLBRVu4/uyYP5+3aYD2DuFxHP gONtqrBo9FSuXVxk9fB8tfldoM4rudovgBZbUHkm+mONRtMkyQ4diBEvLnJHUKmL oiphw/QjOUveuxssnFiOBVu9x07yWORNNarVT4xl7otjhL+G7aapvU+NqVvSidzG aq8ftYAgo1npyoZubSVb0KHHASRAryLz3iMSW3tJSg9mMbReZbxZ60no0X3X0c8Y 9fs8gP3eH2T2R8rxh/A9+ursgC/gSDNsSIQo3ta0eJ+Rp9U+7il3Y3K7BlsltmNg yxdhQjF6PRDCpt3KGS10oijNdHpmKrOGBH0pY9nJoDUlSYGIjScHlqX7dY4vbtLO R+3w9f33iowMWG1skY0fcbCZnljpQyqIwRiC1iCLDn/qpPAyG9bw4ZAdfbF27P7d sEUaFe2Sj5hEoDkLuArXOIcOokLNQhGcf5nZmg9uCgbnHibfk65d053L7zeexGqQ oxBl63HHx/Xh25qEzndfVrDahDgxS8+vsU5BKlA12VPBq7Kg1CB+pFKme7jHaFcW JjtVU39/ml/pkINEMw5HL/T79HdrN2I4jkiWKlCsq3jsySKVH8pcEA8+Og82nvcD lGHdNT7Zd3X0qM90dix9 =yNTU -----END PGP SIGNATURE----- --n8g4imXOkfNTN/H1-- From owner-freebsd-hackers@FreeBSD.ORG Mon Oct 8 10:22:09 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 5779A1065673 for ; Mon, 8 Oct 2012 10:22:09 +0000 (UTC) (envelope-from wojtek@wojtek.tensor.gdynia.pl) Received: from wojtek.tensor.gdynia.pl (wojtek.tensor.gdynia.pl [89.206.35.99]) by mx1.freebsd.org (Postfix) with ESMTP id AA82F8FC12 for ; Mon, 8 Oct 2012 10:22:08 +0000 (UTC) Received: from wojtek.tensor.gdynia.pl (localhost [127.0.0.1]) by wojtek.tensor.gdynia.pl (8.14.5/8.14.5) with ESMTP id q98ALxHI004680; Mon, 8 Oct 2012 12:21:59 +0200 (CEST) (envelope-from wojtek@wojtek.tensor.gdynia.pl) Received: from localhost (wojtek@localhost) by wojtek.tensor.gdynia.pl (8.14.5/8.14.5/Submit) with ESMTP id q98ALx0w004677; Mon, 8 Oct 2012 12:21:59 +0200 (CEST) (envelope-from wojtek@wojtek.tensor.gdynia.pl) Date: Mon, 8 Oct 2012 12:21:59 +0200 (CEST) From: Wojciech Puchar To: Peter Pentchev In-Reply-To: <20121008083814.GA5830@straylight.m.ringlet.net> Message-ID: References: <5069C9FC.6020400@brandonfa.lk> <324B736D-8961-4E44-A212-2ECF3E60F2A0@kientzle.com> <20121008083814.GA5830@straylight.m.ringlet.net> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.2.7 (wojtek.tensor.gdynia.pl [127.0.0.1]); Mon, 08 Oct 2012 12:21:59 +0200 (CEST) Cc: freebsd-hackers@freebsd.org, Brandon Falk Subject: Re: SMP Version of tar X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 08 Oct 2012 10:22:09 -0000 > Not necessarily. If I understand correctly what Tim means, he's talking > about an in-memory compression of several blocks by several separate > threads, and then - after all the threads have compressed their but gzip format is single stream. dictionary IMHO is not reset every X kilobytes. parallel gzip is possible but not with same data format. By the way in my opinion grzip (in ports/archivers/grzip) is one of the most under-valued software. almost unknown. compresses faster than bzip2, with better results, it is very simple and making parallel version is trivial - there is just a procedure to compress single block. 
Personally i use gzip for fast compression and grzip for strong compression, and don't use bzip2 at all From owner-freebsd-hackers@FreeBSD.ORG Mon Oct 8 16:11:35 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id A6E96106566B; Mon, 8 Oct 2012 16:11:35 +0000 (UTC) (envelope-from marcel@xcllnt.net) Received: from mail.xcllnt.net (mail.xcllnt.net [70.36.220.4]) by mx1.freebsd.org (Postfix) with ESMTP id 5DFBB8FC0A; Mon, 8 Oct 2012 16:11:35 +0000 (UTC) Received: from sa-nc-cs-116.static.jnpr.net (natint3.juniper.net [66.129.224.36]) (authenticated bits=0) by mail.xcllnt.net (8.14.5/8.14.5) with ESMTP id q98GBW7O082253 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO); Mon, 8 Oct 2012 09:11:33 -0700 (PDT) (envelope-from marcel@xcllnt.net) Content-Type: text/plain; charset=iso-8859-1 Mime-Version: 1.0 (Mac OS X Mail 6.2 \(1499\)) From: Marcel Moolenaar In-Reply-To: Date: Mon, 8 Oct 2012 09:11:29 -0700 Content-Transfer-Encoding: quoted-printable Message-Id: <127FA63D-8EEE-4616-AE1E-C39469DDCC6A@xcllnt.net> References: <201210020750.23358.jhb@freebsd.org> <201210021037.27762.jhb@freebsd.org> To: Garrett Cooper X-Mailer: Apple Mail (2.1499) Cc: freebsd-hackers@freebsd.org, freebsd-arch@freebsd.org, "Simon J. Gerraty" Subject: Re: [CFT/RFC]: refactor bsd.prog.mk to understand multiple programs instead of a singular program X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 08 Oct 2012 16:11:35 -0000 On Oct 4, 2012, at 9:42 AM, Garrett Cooper wrote: >>> Both parties (Isilon/Juniper) are converging on the ATF porting work >>> that Giorgos/myself have done after talking at the FreeBSD = Foundation >>> meet-n-greet. I have contributed all of the patches that I have = other >>> to marcel for feedback. >>=20 >> This is very non-obvious to the public at large (e.g. there was no = public >> response to one group's inquiry about the second ATF import for = example). >> Also, given that you had no idea that sgf@ and obrien@ were working = on >> importing NetBSD's bmake as a prerequisite for ATF, it seems that = whatever >> discussions were held were not very detailed at best. I think it = would be >> good to have the various folks working on ATF to at least summarize = the >> current state of things and sketch out some sort of plan or roadmap = for future >> work in a public forum (such as atf@, though a summary mail would be = quite >> appropriate for arch@). >=20 > I'm in part to blame for this. There was some discussion -- but not at > length; unfortunately no one from Juniper was present at the meet and > greet; the information I got was second hand; I didn't follow up to > figure out the exact details / clarify what I had in mind with the > appropriate parties. Hang on. I want in on the blame part! :-) Seriously: no-one is really to blame as far as I can see. We just had two independent efforts (ATF & bmake) and there was no indication that one would be greatly benefitted from the other. At least not to the point of creating a dependency. I just committed the bmake bits. It not only adds bmake to the build, but also includes the changes necessary to use bmake. With that in place it's easier to decide whether we want the dependency or not. 
Before we can switch permanently to bmake, we need to do the following first: 1. Request an EXP ports build with bmake as make(1). This should tell us the "damage" of switching to bmake for ports. 2. In parallel with 1: build www & docs with bmake and assess the damage 3. Fix all the damage Then: 4. Switch. It could be a while (many weeks) before we get to 4, so the question really is whether the people working on ATF are willing and able to build and install FreeBSD using WITH_BMAKE? --=20 Marcel Moolenaar marcel@xcllnt.net From owner-freebsd-hackers@FreeBSD.ORG Mon Oct 8 16:17:39 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 7F083106580A for ; Mon, 8 Oct 2012 16:17:39 +0000 (UTC) (envelope-from lists@eitanadler.com) Received: from mail-da0-f54.google.com (mail-da0-f54.google.com [209.85.210.54]) by mx1.freebsd.org (Postfix) with ESMTP id 4D9C58FC12 for ; Mon, 8 Oct 2012 16:17:39 +0000 (UTC) Received: by mail-da0-f54.google.com with SMTP id z9so1794842dad.13 for ; Mon, 08 Oct 2012 09:17:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=eitanadler.com; s=0xdeadbeef; h=mime-version:from:date:message-id:subject:to:content-type; bh=HQ4PKGPFVfuTDVQcvy/m/Rq+8EL54vq2oyC4HGCRtKs=; b=FVA0/RRTpMIKAPD0MeB2ofw9J0UDeHjjP+miu8dmLMPAEMOfs1DjWsLGh7MWS8YA64 /mpJvjDoM2RYV2py8rVlUPU8qnBsqqs4/h+xa/ok4afsj+kufFJvvvjyi13B4T90KfrW /ufUAPB/1HuRIMfrV/hvnQTXXJxTajPYhIalI= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:from:date:message-id:subject:to:content-type :x-gm-message-state; bh=HQ4PKGPFVfuTDVQcvy/m/Rq+8EL54vq2oyC4HGCRtKs=; b=l+gUlrB92tXpnZ+Gz/mu7InkfNtShIHRAaWuX5nL30swUR2ULsxLHXImZexIX/JzFy YZpdIueqYS6YEyN7Hs60glkbZjZXdBiYXmVu2a4PSRlCUahEGlXsKHRg5v40XANXJp+L Mzu6hw2llIv0UGxoiEWDNXuQWrPsz8DonJkE7ZO9MlgORazip2PCbbvQSK8NRVn65zQ2 0JaiGVkiLouAGDAJ5HXMcrEyF1xXID31IZMwTbFtjKhejNv+ezxCgPmqkKaSzphsErr4 98w7eLks8idAqqdFEEqtoXf/XhH7NNh7/pJ70N+S1oLwJNEog12A3VrwUT3Rqcidp1ME /cig== Received: by 10.68.200.72 with SMTP id jq8mr54316898pbc.38.1349713058528; Mon, 08 Oct 2012 09:17:38 -0700 (PDT) MIME-Version: 1.0 Received: by 10.66.161.163 with HTTP; Mon, 8 Oct 2012 09:17:08 -0700 (PDT) From: Eitan Adler Date: Mon, 8 Oct 2012 12:17:08 -0400 Message-ID: To: FreeBSD Hackers Content-Type: text/plain; charset=UTF-8 X-Gm-Message-State: ALoCoQnBsbR5BYdqjUYL9cz9fesooBQZzaDJMG8R+KY7j43Rth0wXNA0MQVf7jHMh7w5I0FKX3Xs Subject: -lpthread vs -pthread: does -D_REENTRANT matter? X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 08 Oct 2012 16:17:39 -0000 The only difference between -lpthread and -pthread that I could see is that the latter also sets -D_REENTRANT. However, I can't find any uses of _REENTRANT anywhere outside of a few utilities that seem to define it manually. Testing with various manually written pthread programs resulted in identical binaries, let alone identical results. Is there an actual difference between -pthread and -lpthread or is this just a historical artifact? 
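One way to check this empirically is to build a trivial pthread program both ways and compare; the sketch below is not from the thread and its file name is invented. It prints whether _REENTRANT was defined at compile time and exercises pthread_create(), so the two link lines can be diffed for both preprocessor and binary differences.

/*
 * Hypothetical test file pthr_test.c. Build it both ways and compare:
 *   cc -O2 -pthread -o t1 pthr_test.c
 *   cc -O2 -o t2 pthr_test.c -lpthread
 */
#include <pthread.h>
#include <stdio.h>

static void *
worker(void *arg)
{
	(void)arg;
	printf("hello from the worker thread\n");
	return (NULL);
}

int
main(void)
{
	pthread_t tid;

#ifdef _REENTRANT
	printf("_REENTRANT is defined\n");
#else
	printf("_REENTRANT is not defined\n");
#endif
	if (pthread_create(&tid, NULL, worker, NULL) != 0)
		return (1);
	pthread_join(tid, NULL);
	return (0);
}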
-- Eitan Adler From owner-freebsd-hackers@FreeBSD.ORG Tue Oct 9 06:30:41 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id EC1381019 for ; Tue, 9 Oct 2012 06:30:41 +0000 (UTC) (envelope-from wojtek@wojtek.tensor.gdynia.pl) Received: from wojtek.tensor.gdynia.pl (wojtek.tensor.gdynia.pl [89.206.35.99]) by mx1.freebsd.org (Postfix) with ESMTP id 296758FC19 for ; Tue, 9 Oct 2012 06:30:40 +0000 (UTC) Received: from wojtek.tensor.gdynia.pl (localhost [127.0.0.1]) by wojtek.tensor.gdynia.pl (8.14.5/8.14.5) with ESMTP id q996R9CM013488; Tue, 9 Oct 2012 08:27:09 +0200 (CEST) (envelope-from wojtek@wojtek.tensor.gdynia.pl) Received: from localhost (wojtek@localhost) by wojtek.tensor.gdynia.pl (8.14.5/8.14.5/Submit) with ESMTP id q996R9XU013485; Tue, 9 Oct 2012 08:27:09 +0200 (CEST) (envelope-from wojtek@wojtek.tensor.gdynia.pl) Date: Tue, 9 Oct 2012 08:27:09 +0200 (CEST) From: Wojciech Puchar To: Peter Pentchev Subject: Re: SMP Version of tar In-Reply-To: <20121008083814.GA5830@straylight.m.ringlet.net> Message-ID: References: <5069C9FC.6020400@brandonfa.lk> <324B736D-8961-4E44-A212-2ECF3E60F2A0@kientzle.com> <20121008083814.GA5830@straylight.m.ringlet.net> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.2.7 (wojtek.tensor.gdynia.pl [127.0.0.1]); Tue, 09 Oct 2012 08:27:10 +0200 (CEST) Cc: freebsd-hackers@freebsd.org, Brandon Falk X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 09 Oct 2012 06:30:42 -0000 > Not necessarily. If I understand correctly what Tim means, he's talking > about an in-memory compression of several blocks by several separate > threads, and then - after all the threads have compressed their > respective blocks - writing out the result to the output file in order. > Of course, this would incur a small penalty in that the dictionary would > not be reused between blocks, but it might still be worth it. all fine. i just wanted to point out that ungzipping normal standard gzip file cannot be multithreaded, and multithreaded-compressed gzip output would be different. From owner-freebsd-hackers@FreeBSD.ORG Tue Oct 9 10:46:55 2012 Return-Path: Delivered-To: hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 596ACF5C for ; Tue, 9 Oct 2012 10:46:55 +0000 (UTC) (envelope-from danny@cs.huji.ac.il) Received: from kabab.cs.huji.ac.il (kabab.cs.huji.ac.il [132.65.16.84]) by mx1.freebsd.org (Postfix) with ESMTP id 00AE18FC18 for ; Tue, 9 Oct 2012 10:46:54 +0000 (UTC) Received: from pampa.cs.huji.ac.il ([132.65.80.32]) by kabab.cs.huji.ac.il with esmtp id 1TLXKv-0003ON-5r for hackers@freebsd.org; Tue, 09 Oct 2012 12:46:53 +0200 X-Mailer: exmh version 2.7.2 01/07/2005 with nmh-1.3 To: hackers@freebsd.org Subject: Re: problem cross-compiling 9.1 In-reply-to: References: Comments: In-reply-to Warner Losh message dated "Mon, 08 Oct 2012 14:55:51 -0600." 
Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Tue, 09 Oct 2012 12:46:53 +0200 From: Daniel Braniss Message-ID: X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 09 Oct 2012 10:46:55 -0000 [snip] > any fix? > > You have found the fix. Remove the WITHOUT_XXXX options from the build that keep it from completing. You'll be able to add them at installworld time w/o a hassle. nanobsd uses this to keep things down, while still being able to build the system. > > Warner > where can I find the with/without list? btw, I did look at nanobsd in the past and have borrowed some ideas :-) thanks, danny From owner-freebsd-hackers@FreeBSD.ORG Tue Oct 9 13:36:07 2012 Return-Path: Delivered-To: hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 9F6BF8C4 for ; Tue, 9 Oct 2012 13:36:07 +0000 (UTC) (envelope-from yanegomi@gmail.com) Received: from mail-pa0-f54.google.com (mail-pa0-f54.google.com [209.85.220.54]) by mx1.freebsd.org (Postfix) with ESMTP id 6A0798FC19 for ; Tue, 9 Oct 2012 13:36:07 +0000 (UTC) Received: by mail-pa0-f54.google.com with SMTP id bi1so5699437pad.13 for ; Tue, 09 Oct 2012 06:36:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=references:mime-version:in-reply-to:content-type :content-transfer-encoding:message-id:cc:x-mailer:from:subject:date :to; bh=pGmQa7qJmi2BEJYIbTgTgzLwFlXfA65dzZPixR8ihWU=; b=TulAXWf4deVAlI/twHshHtwkvK2HyWlwUBLouC7wDHMt5M/Obf4mjLTbmDezcHXr12 DR3dDGYTCdt1TRc+eUlKXlkbK/ZMy8FjJz8IKdWx/0rmS4YOx1hqimNibOA9DYv/AU6P Nf5E5/NtGspSbkwLydDonHVfJdnbHaP7d3WTJgCqApDQ4rgzOKPQbrTZ3mxGSVQjG6x/ azlbqT2OXcQ7F4MB5miuz5OrgkcygWxqEncBVAFuqVbsIyscHicpOdWmxXW6pTmGwbks HTy2EVUUfCrECv2xc505hgmQDJM1vePqnl1u/X/jw/YUl+V77n4wLTvuW0rSzoyhVV04 GBCw== Received: by 10.68.234.7 with SMTP id ua7mr64410041pbc.91.1349789761138; Tue, 09 Oct 2012 06:36:01 -0700 (PDT) Received: from [192.168.20.12] (c-24-19-191-56.hsd1.wa.comcast.net. [24.19.191.56]) by mx.google.com with ESMTPS id oi2sm11178812pbb.62.2012.10.09.06.35.59 (version=SSLv3 cipher=OTHER); Tue, 09 Oct 2012 06:35:59 -0700 (PDT) References: Mime-Version: 1.0 (1.0) In-Reply-To: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Message-Id: <9036FCFF-AAEA-4D20-AD73-10726F81440A@gmail.com> X-Mailer: iPhone Mail (10A403) From: Garrett Cooper Subject: Re: problem cross-compiling 9.1 Date: Tue, 9 Oct 2012 06:35:57 -0700 To: Daniel Braniss Cc: "hackers@freebsd.org" X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 09 Oct 2012 13:36:07 -0000 On Oct 9, 2012, at 3:46 AM, Daniel Braniss wrote: > [snip] >> any fix? >>> You have found the fix. Remove the WITHOUT_XXXX options from the build t= hat keep it from completing. You'll be able to add them at installworld tim= e w/o a hassle. nanobsd uses this to keep things down, while still being ab= le to build the system. >>> Warner > where can I find the with/without list? > btw, I did look at nanobsd in the past and have borrowed some ideas :-) man make.conf and man src.conf, then read through bsd.own.mk if interested i= n knowing what exactly can be used. HTH! 
-Garrett= From owner-freebsd-hackers@FreeBSD.ORG Tue Oct 9 14:12:48 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id AF7B86D1; Tue, 9 Oct 2012 14:12:48 +0000 (UTC) (envelope-from ndenev@gmail.com) Received: from mail-wi0-f178.google.com (mail-wi0-f178.google.com [209.85.212.178]) by mx1.freebsd.org (Postfix) with ESMTP id C6A3B8FC0C; Tue, 9 Oct 2012 14:12:47 +0000 (UTC) Received: by mail-wi0-f178.google.com with SMTP id hr7so4072576wib.13 for ; Tue, 09 Oct 2012 07:12:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=subject:mime-version:content-type:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to:x-mailer; bh=j14Cl+Whi217hvrLClyuARGwY6JBTXtO+V8PPUZFYbM=; b=WOeCJHK9XhIClbeeqshTnOfQME2/diIiIVemFITKROgexWSI4Kst4Z9PIl19qW7Aax Yio2uKsESDfqM3XkQgix6c856w0o4Ll/Qg3cBX86b25x7vE6dEcs/whksgmS3z+jHaT8 X4GmtcaNFZr/xfFfgcyX3Xg6vxlVPMEfmecLzR8ibOMRHsbWd+UIzI5sLLdk45ch3Pbs 5k8ByelmxiyBIpujDsnuE8OnI9o8aGHOHcBlWXhaWDZLhbc9V7JYrT41fbDXqh2i7J6s Bnjoy9cgyNRO9WLfmtuqAZWUBiwY3ynRKlr3enPVCFuMW0j8H01mNcxJsMJIxg+sNDbo YH1g== Received: by 10.217.2.146 with SMTP id p18mr12374882wes.198.1349791966518; Tue, 09 Oct 2012 07:12:46 -0700 (PDT) Received: from ndenevsa.sf.moneybookers.net (g1.moneybookers.com. [217.18.249.148]) by mx.google.com with ESMTPS id gg4sm28310910wib.6.2012.10.09.07.12.44 (version=TLSv1/SSLv3 cipher=OTHER); Tue, 09 Oct 2012 07:12:45 -0700 (PDT) Subject: Re: NFS server bottlenecks Mime-Version: 1.0 (Mac OS X Mail 6.1 \(1498\)) Content-Type: text/plain; charset=us-ascii From: Nikolay Denev In-Reply-To: <1666343702.1682678.1349300219198.JavaMail.root@erie.cs.uoguelph.ca> Date: Tue, 9 Oct 2012 17:12:43 +0300 Content-Transfer-Encoding: quoted-printable Message-Id: References: <1666343702.1682678.1349300219198.JavaMail.root@erie.cs.uoguelph.ca> To: freebsd-hackers@freebsd.org X-Mailer: Apple Mail (2.1498) Cc: rmacklem@freebsd.org, Garrett Wollman X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 09 Oct 2012 14:12:48 -0000 On Oct 4, 2012, at 12:36 AM, Rick Macklem wrote: > Garrett Wollman wrote: >> <> said: >>=20 >>>> Simple: just use a sepatate mutex for each list that a cache entry >>>> is on, rather than a global lock for everything. This would reduce >>>> the mutex contention, but I'm not sure how significantly since I >>>> don't have the means to measure it yet. >>>>=20 >>> Well, since the cache trimming is removing entries from the lists, I >>> don't >>> see how that can be done with a global lock for list updates? >>=20 >> Well, the global lock is what we have now, but the cache trimming >> process only looks at one list at a time, so not locking the list = that >> isn't being iterated over probably wouldn't hurt, unless there's some >> mechanism (that I didn't see) for entries to move from one list to >> another. Note that I'm considering each hash bucket a separate >> "list". (One issue to worry about in that case would be cache-line >> contention in the array of hash buckets; perhaps NFSRVCACHE_HASHSIZE >> ought to be increased to reduce that.) >>=20 > Yea, a separate mutex for each hash list might help. There is also the > LRU list that all entries end up on, that gets used by the trimming = code. > (I think? 
I wrote this stuff about 8 years ago, so I haven't looked at > it in a while.) >=20 > Also, increasing the hash table size is probably a good idea, = especially > if you reduce how aggressively the cache is trimmed. >=20 >>> Only doing it once/sec would result in a very large cache when >>> bursts of >>> traffic arrives. >>=20 >> My servers have 96 GB of memory so that's not a big deal for me. >>=20 > This code was originally "production tested" on a server with 1Gbyte, > so times have changed a bit;-) >=20 >>> I'm not sure I see why doing it as a separate thread will improve >>> things. >>> There are N nfsd threads already (N can be bumped up to 256 if you >>> wish) >>> and having a bunch more "cache trimming threads" would just increase >>> contention, wouldn't it? >>=20 >> Only one cache-trimming thread. The cache trim holds the (global) >> mutex for much longer than any individual nfsd service thread has any >> need to, and having N threads doing that in parallel is why it's so >> heavily contended. If there's only one thread doing the trim, then >> the nfsd service threads aren't spending time either contending on = the >> mutex (it will be held less frequently and for shorter periods). >>=20 > I think the little drc2.patch which will keep the nfsd threads from > acquiring the mutex and doing the trimming most of the time, might be > sufficient. I still don't see why a separate trimming thread will be > an advantage. I'd also be worried that the one cache trimming thread > won't get the job done soon enough. >=20 > When I did production testing on a 1Gbyte server that saw a peak > load of about 100RPCs/sec, it was necessary to trim aggressively. > (Although I'd be tempted to say that a server with 1Gbyte is no > longer relevant, I recently recall someone trying to run FreeBSD > on a i486, although I doubt they wanted to run the nfsd on it.) >=20 >>> The only negative effect I can think of w.r.t. having the nfsd >>> threads doing it would be a (I believe negligible) increase in RPC >>> response times (the time the nfsd thread spends trimming the cache). >>> As noted, I think this time would be negligible compared to disk I/O >>> and network transit times in the total RPC response time? >>=20 >> With adaptive mutexes, many CPUs, lots of in-memory cache, and 10G >> network connectivity, spinning on a contended mutex takes a >> significant amount of CPU time. (For the current design of the NFS >> server, it may actually be a win to turn off adaptive mutexes -- I >> should give that a try once I'm able to do more testing.) >>=20 > Have fun with it. Let me know when you have what you think is a good = patch. >=20 > rick >=20 >> -GAWollman >> _______________________________________________ >> freebsd-hackers@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers >> To unsubscribe, send any mail to >> "freebsd-hackers-unsubscribe@freebsd.org" > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" My quest for IOPS over NFS continues :) So far I'm not able to achieve more than about 3000 8K read requests = over NFS, while the server locally gives much more. And this is all from a file that is completely in ARC cache, no disk IO = involved. 
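The per-bucket locking that Garrett Wollman suggests above can be pictured with a short userland sketch. This is not the FreeBSD nfsd DRC code: every identifier below is invented, pthread mutexes stand in for the kernel's mtx(9), and only the lookup path is shown. The point is simply that each hash list carries its own mutex, so nfsd threads touching different buckets never contend on one global lock, and a larger hash table (cf. NFSRVCACHE_HASHSIZE) spreads the remaining contention further.

/* Userland sketch of per-hash-bucket locking for a request cache. */
#include <pthread.h>
#include <stdint.h>
#include <sys/queue.h>

#define	DRC_HASHSIZE	500	/* illustrative; bigger table, less contention */

struct drc_entry {
	uint32_t		xid;
	LIST_ENTRY(drc_entry)	hash_link;
};

static struct drc_bucket {
	pthread_mutex_t		lock;	/* protects only this one list */
	LIST_HEAD(, drc_entry)	head;
} buckets[DRC_HASHSIZE];

static struct drc_bucket *
drc_bucket_for(uint32_t xid)
{
	return (&buckets[xid % DRC_HASHSIZE]);
}

static void
drc_init(void)
{
	int i;

	for (i = 0; i < DRC_HASHSIZE; i++) {
		pthread_mutex_init(&buckets[i].lock, NULL);
		LIST_INIT(&buckets[i].head);
	}
}

/* Look up an entry while holding only its bucket's mutex. */
static struct drc_entry *
drc_lookup(uint32_t xid)
{
	struct drc_bucket *b = drc_bucket_for(xid);
	struct drc_entry *e, *found = NULL;

	pthread_mutex_lock(&b->lock);
	LIST_FOREACH(e, &b->head, hash_link) {
		if (e->xid == xid) {
			found = e;
			break;
		}
	}
	pthread_mutex_unlock(&b->lock);
	return (found);
}

int
main(void)
{
	drc_init();
	(void)drc_lookup(42);	/* empty cache: returns NULL */
	return (0);
}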
I've snatched some sample DTrace script from the net : [ = http://utcc.utoronto.ca/~cks/space/blog/solaris/DTraceQuantizationNotes = ] And modified it for our new NFS server : #!/usr/sbin/dtrace -qs=20 fbt:kernel:nfsrvd_*:entry { self->ts =3D timestamp;=20 @counts[probefunc] =3D count(); } fbt:kernel:nfsrvd_*:return / self->ts > 0 / { this->delta =3D (timestamp-self->ts)/1000000; } fbt:kernel:nfsrvd_*:return / self->ts > 0 && this->delta > 100 / { @slow[probefunc, "ms"] =3D lquantize(this->delta, 100, 500, 50); } fbt:kernel:nfsrvd_*:return / self->ts > 0 / { @dist[probefunc, "ms"] =3D quantize(this->delta); self->ts =3D 0; } END { printf("\n"); printa("function %-20s %@10d\n", @counts); printf("\n"); printa("function %s(), time in %s:%@d\n", @dist); printf("\n"); printa("function %s(), time in %s for >=3D 100 ms:%@d\n", = @slow); } And here's a sample output from one or two minutes during the run of = Oracle's ORION benchmark tool from a Linux machine, on a 32G file on NFS mount over 10G ethernet: [16:01]root@goliath:/home/ndenev# ./nfsrvd.d =20 ^C function nfsrvd_access 4 function nfsrvd_statfs 10 function nfsrvd_getattr 14 function nfsrvd_commit 76 function nfsrvd_sentcache 110048 function nfsrvd_write 110048 function nfsrvd_read 283648 function nfsrvd_dorpc 393800 function nfsrvd_getcache 393800 function nfsrvd_rephead 393800 function nfsrvd_updatecache 393800 function nfsrvd_access(), time in ms: value ------------- Distribution ------------- count =20 -1 | 0 =20 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 4 =20 1 | 0 =20 function nfsrvd_statfs(), time in ms: value ------------- Distribution ------------- count =20 -1 | 0 =20 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 10 =20 1 | 0 =20 function nfsrvd_getattr(), time in ms: value ------------- Distribution ------------- count =20 -1 | 0 =20 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 14 =20 1 | 0 =20 function nfsrvd_sentcache(), time in ms: value ------------- Distribution ------------- count =20 -1 | 0 =20 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 110048 =20 1 | 0 =20 function nfsrvd_rephead(), time in ms: value ------------- Distribution ------------- count =20 -1 | 0 =20 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 393800 =20 1 | 0 =20 function nfsrvd_updatecache(), time in ms: value ------------- Distribution ------------- count =20 -1 | 0 =20 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 393800 =20 1 | 0 =20 function nfsrvd_getcache(), time in ms: value ------------- Distribution ------------- count =20 -1 | 0 =20 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 393798 =20 1 | 1 =20 2 | 0 =20 4 | 1 =20 8 | 0 =20 function nfsrvd_write(), time in ms: value ------------- Distribution ------------- count =20 -1 | 0 =20 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 110039 =20 1 | 5 =20 2 | 4 =20 4 | 0 =20 function nfsrvd_read(), time in ms: value ------------- Distribution ------------- count =20 -1 | 0 =20 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 283622 =20 1 | 19 =20 2 | 3 =20 4 | 2 =20 8 | 0 =20 16 | 1 =20 32 | 0 =20 64 | 0 =20 128 | 0 =20 256 | 1 =20 512 | 0 =20 function nfsrvd_commit(), time in ms: value ------------- Distribution ------------- count =20 -1 | 0 =20 0 |@@@@@@@@@@@@@@@@@@@@@@@ 44 =20 1 |@@@@@@@ 14 =20 2 | 0 =20 4 |@ 1 =20 8 |@ 1 =20 16 | 0 =20 32 |@@@@@@@ 14 =20 64 |@ 2 =20 128 | 0 =20 function nfsrvd_commit(), time in ms for >=3D 100 ms: value ------------- Distribution ------------- count =20 < 100 | 0 =20 100 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1 =20 150 | 0 =20 function nfsrvd_read(), time in ms for >=3D 100 
ms: value ------------- Distribution ------------- count =20 250 | 0 =20 300 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1 =20 350 | 0 =20 Looks like the nfs server cache functions are quite fast, but extremely = frequently called. I hope someone can find this information useful. From owner-freebsd-hackers@FreeBSD.ORG Tue Oct 9 15:23:44 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id E60A8ED5; Tue, 9 Oct 2012 15:23:44 +0000 (UTC) (envelope-from ndenev@gmail.com) Received: from mail-wg0-f50.google.com (mail-wg0-f50.google.com [74.125.82.50]) by mx1.freebsd.org (Postfix) with ESMTP id 1D2BE8FC1B; Tue, 9 Oct 2012 15:23:43 +0000 (UTC) Received: by mail-wg0-f50.google.com with SMTP id 16so4398578wgi.31 for ; Tue, 09 Oct 2012 08:23:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=subject:mime-version:content-type:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to:x-mailer; bh=UGqIpUVIh4maA0DSdOTvB4Q5EkRwYYo7xq2/jBodgGo=; b=KoF9yXqS88pNipeojuA0HCRgJcmQXBhHm9mmf/FyM+y4Bk24X8TXAGCcgGOyQRo9DI +npJbCaLGyZae0QpQe4C5l8hNVMorEJOkeTGrMgT70ve+KOi9sSbH9BpsA5jCB1D/FWm juALvKsIlkDnoiqr+G4ZliTuIVc21HrzsWpXzWx17dDj0REY5Bu8yaIUyn8AXVQgXeMz Q11yDhzfeMfKwdFJ/HzTGOhPeSZhAG6zaMo+yw8glM1gsK5MmOYfOs1nIE8pmQ4BqFTu fZVf+bJ1QlI5wzBr/yuJ6pH1AlLNK0JuEtM5C/fS6KFOOzlTen/RebUoUM50C33xZQ5G RLag== Received: by 10.216.201.80 with SMTP id a58mr12447484weo.15.1349796222818; Tue, 09 Oct 2012 08:23:42 -0700 (PDT) Received: from ndenevsa.sf.moneybookers.net (g1.moneybookers.com. [217.18.249.148]) by mx.google.com with ESMTPS id cu1sm25356406wib.6.2012.10.09.08.23.39 (version=TLSv1/SSLv3 cipher=OTHER); Tue, 09 Oct 2012 08:23:40 -0700 (PDT) Subject: Re: NFS server bottlenecks Mime-Version: 1.0 (Mac OS X Mail 6.1 \(1498\)) Content-Type: text/plain; charset=us-ascii From: Nikolay Denev In-Reply-To: Date: Tue, 9 Oct 2012 18:23:37 +0300 Content-Transfer-Encoding: quoted-printable Message-Id: <59B593A0-DA96-4395-B6B9-196F73A1415C@gmail.com> References: <1666343702.1682678.1349300219198.JavaMail.root@erie.cs.uoguelph.ca> To: freebsd-hackers@freebsd.org X-Mailer: Apple Mail (2.1498) Cc: rmacklem@freebsd.org, Garrett Wollman X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 09 Oct 2012 15:23:45 -0000 On Oct 9, 2012, at 5:12 PM, Nikolay Denev wrote: >=20 > On Oct 4, 2012, at 12:36 AM, Rick Macklem = wrote: >=20 >> Garrett Wollman wrote: >>> <>> said: >>>=20 >>>>> Simple: just use a sepatate mutex for each list that a cache entry >>>>> is on, rather than a global lock for everything. This would reduce >>>>> the mutex contention, but I'm not sure how significantly since I >>>>> don't have the means to measure it yet. >>>>>=20 >>>> Well, since the cache trimming is removing entries from the lists, = I >>>> don't >>>> see how that can be done with a global lock for list updates? >>>=20 >>> Well, the global lock is what we have now, but the cache trimming >>> process only looks at one list at a time, so not locking the list = that >>> isn't being iterated over probably wouldn't hurt, unless there's = some >>> mechanism (that I didn't see) for entries to move from one list to >>> another. Note that I'm considering each hash bucket a separate >>> "list". 
(One issue to worry about in that case would be cache-line >>> contention in the array of hash buckets; perhaps NFSRVCACHE_HASHSIZE >>> ought to be increased to reduce that.) >>>=20 >> Yea, a separate mutex for each hash list might help. There is also = the >> LRU list that all entries end up on, that gets used by the trimming = code. >> (I think? I wrote this stuff about 8 years ago, so I haven't looked = at >> it in a while.) >>=20 >> Also, increasing the hash table size is probably a good idea, = especially >> if you reduce how aggressively the cache is trimmed. >>=20 >>>> Only doing it once/sec would result in a very large cache when >>>> bursts of >>>> traffic arrives. >>>=20 >>> My servers have 96 GB of memory so that's not a big deal for me. >>>=20 >> This code was originally "production tested" on a server with 1Gbyte, >> so times have changed a bit;-) >>=20 >>>> I'm not sure I see why doing it as a separate thread will improve >>>> things. >>>> There are N nfsd threads already (N can be bumped up to 256 if you >>>> wish) >>>> and having a bunch more "cache trimming threads" would just = increase >>>> contention, wouldn't it? >>>=20 >>> Only one cache-trimming thread. The cache trim holds the (global) >>> mutex for much longer than any individual nfsd service thread has = any >>> need to, and having N threads doing that in parallel is why it's so >>> heavily contended. If there's only one thread doing the trim, then >>> the nfsd service threads aren't spending time either contending on = the >>> mutex (it will be held less frequently and for shorter periods). >>>=20 >> I think the little drc2.patch which will keep the nfsd threads from >> acquiring the mutex and doing the trimming most of the time, might be >> sufficient. I still don't see why a separate trimming thread will be >> an advantage. I'd also be worried that the one cache trimming thread >> won't get the job done soon enough. >>=20 >> When I did production testing on a 1Gbyte server that saw a peak >> load of about 100RPCs/sec, it was necessary to trim aggressively. >> (Although I'd be tempted to say that a server with 1Gbyte is no >> longer relevant, I recently recall someone trying to run FreeBSD >> on a i486, although I doubt they wanted to run the nfsd on it.) >>=20 >>>> The only negative effect I can think of w.r.t. having the nfsd >>>> threads doing it would be a (I believe negligible) increase in RPC >>>> response times (the time the nfsd thread spends trimming the = cache). >>>> As noted, I think this time would be negligible compared to disk = I/O >>>> and network transit times in the total RPC response time? >>>=20 >>> With adaptive mutexes, many CPUs, lots of in-memory cache, and 10G >>> network connectivity, spinning on a contended mutex takes a >>> significant amount of CPU time. (For the current design of the NFS >>> server, it may actually be a win to turn off adaptive mutexes -- I >>> should give that a try once I'm able to do more testing.) >>>=20 >> Have fun with it. Let me know when you have what you think is a good = patch. 
>>=20 >> rick >>=20 >>> -GAWollman >>> _______________________________________________ >>> freebsd-hackers@freebsd.org mailing list >>> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers >>> To unsubscribe, send any mail to >>> "freebsd-hackers-unsubscribe@freebsd.org" >> _______________________________________________ >> freebsd-fs@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-fs >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >=20 > My quest for IOPS over NFS continues :) > So far I'm not able to achieve more than about 3000 8K read requests = over NFS, > while the server locally gives much more. > And this is all from a file that is completely in ARC cache, no disk = IO involved. >=20 > I've snatched some sample DTrace script from the net : [ = http://utcc.utoronto.ca/~cks/space/blog/solaris/DTraceQuantizationNotes = ] >=20 > And modified it for our new NFS server : >=20 > #!/usr/sbin/dtrace -qs=20 >=20 > fbt:kernel:nfsrvd_*:entry > { > self->ts =3D timestamp;=20 > @counts[probefunc] =3D count(); > } >=20 > fbt:kernel:nfsrvd_*:return > / self->ts > 0 / > { > this->delta =3D (timestamp-self->ts)/1000000; > } >=20 > fbt:kernel:nfsrvd_*:return > / self->ts > 0 && this->delta > 100 / > { > @slow[probefunc, "ms"] =3D lquantize(this->delta, 100, 500, 50); > } >=20 > fbt:kernel:nfsrvd_*:return > / self->ts > 0 / > { > @dist[probefunc, "ms"] =3D quantize(this->delta); > self->ts =3D 0; > } >=20 > END > { > printf("\n"); > printa("function %-20s %@10d\n", @counts); > printf("\n"); > printa("function %s(), time in %s:%@d\n", @dist); > printf("\n"); > printa("function %s(), time in %s for >=3D 100 ms:%@d\n", = @slow); > } >=20 > And here's a sample output from one or two minutes during the run of = Oracle's ORION benchmark > tool from a Linux machine, on a 32G file on NFS mount over 10G = ethernet: >=20 > [16:01]root@goliath:/home/ndenev# ./nfsrvd.d =20 > ^C >=20 > function nfsrvd_access 4 > function nfsrvd_statfs 10 > function nfsrvd_getattr 14 > function nfsrvd_commit 76 > function nfsrvd_sentcache 110048 > function nfsrvd_write 110048 > function nfsrvd_read 283648 > function nfsrvd_dorpc 393800 > function nfsrvd_getcache 393800 > function nfsrvd_rephead 393800 > function nfsrvd_updatecache 393800 >=20 > function nfsrvd_access(), time in ms: > value ------------- Distribution ------------- count =20 > -1 | 0 =20 > 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 4 =20 > 1 | 0 =20 >=20 > function nfsrvd_statfs(), time in ms: > value ------------- Distribution ------------- count =20 > -1 | 0 =20 > 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 10 =20 > 1 | 0 =20 >=20 > function nfsrvd_getattr(), time in ms: > value ------------- Distribution ------------- count =20 > -1 | 0 =20 > 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 14 =20 > 1 | 0 =20 >=20 > function nfsrvd_sentcache(), time in ms: > value ------------- Distribution ------------- count =20 > -1 | 0 =20 > 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 110048 =20 > 1 | 0 =20 >=20 > function nfsrvd_rephead(), time in ms: > value ------------- Distribution ------------- count =20 > -1 | 0 =20 > 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 393800 =20 > 1 | 0 =20 >=20 > function nfsrvd_updatecache(), time in ms: > value ------------- Distribution ------------- count =20 > -1 | 0 =20 > 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 393800 =20 > 1 | 0 =20 >=20 > function nfsrvd_getcache(), time in ms: > value ------------- Distribution ------------- count =20 > -1 | 0 =20 > 0 
|@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 393798 =20 > 1 | 1 =20 > 2 | 0 =20 > 4 | 1 =20 > 8 | 0 =20 >=20 > function nfsrvd_write(), time in ms: > value ------------- Distribution ------------- count =20 > -1 | 0 =20 > 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 110039 =20 > 1 | 5 =20 > 2 | 4 =20 > 4 | 0 =20 >=20 > function nfsrvd_read(), time in ms: > value ------------- Distribution ------------- count =20 > -1 | 0 =20 > 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 283622 =20 > 1 | 19 =20 > 2 | 3 =20 > 4 | 2 =20 > 8 | 0 =20 > 16 | 1 =20 > 32 | 0 =20 > 64 | 0 =20 > 128 | 0 =20 > 256 | 1 =20 > 512 | 0 =20 >=20 > function nfsrvd_commit(), time in ms: > value ------------- Distribution ------------- count =20 > -1 | 0 =20 > 0 |@@@@@@@@@@@@@@@@@@@@@@@ 44 =20 > 1 |@@@@@@@ 14 =20 > 2 | 0 =20 > 4 |@ 1 =20 > 8 |@ 1 =20 > 16 | 0 =20 > 32 |@@@@@@@ 14 =20 > 64 |@ 2 =20 > 128 | 0 =20 >=20 >=20 > function nfsrvd_commit(), time in ms for >=3D 100 ms: > value ------------- Distribution ------------- count =20 > < 100 | 0 =20 > 100 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1 =20 > 150 | 0 =20 >=20 > function nfsrvd_read(), time in ms for >=3D 100 ms: > value ------------- Distribution ------------- count =20 > 250 | 0 =20 > 300 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1 =20 > 350 | 0 =20 >=20 >=20 > Looks like the nfs server cache functions are quite fast, but = extremely frequently called. >=20 > I hope someone can find this information useful. >=20 Here's another quick one : #!/usr/sbin/dtrace -qs=20 #pragma D option quiet fbt:kernel:nfsrvd_*:entry { self->trace =3D 1; } fbt:kernel:nfsrvd_*:return / self->trace / { @calls[probefunc] =3D count(); } tick-1sec { printf("%40s | %s\n", "function", "calls per second"); printa("%40s %10@d\n", @calls); clear(@calls); printf("\n"); } Showing the number of calls per second to the nfsrvd_* functions. From owner-freebsd-hackers@FreeBSD.ORG Tue Oct 9 15:35:15 2012 Return-Path: Delivered-To: hackers@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id B81404E8 for ; Tue, 9 Oct 2012 15:35:15 +0000 (UTC) (envelope-from erik@cederstrand.dk) Received: from csmtp2.one.com (csmtp2.one.com [91.198.169.22]) by mx1.freebsd.org (Postfix) with ESMTP id 757198FC08 for ; Tue, 9 Oct 2012 15:35:14 +0000 (UTC) Received: from [192.168.1.18] (unknown [217.157.7.221]) by csmtp2.one.com (Postfix) with ESMTPA id B44F93018818 for ; Tue, 9 Oct 2012 15:35:07 +0000 (UTC) From: Erik Cederstrand Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Subject: time_t when used as timedelta Message-Id: <787F09EF-E3F7-467E-B023-B7846509D2A1@cederstrand.dk> Date: Tue, 9 Oct 2012 17:35:09 +0200 To: FreeBSD Hackers Mime-Version: 1.0 (Mac OS X Mail 6.0 \(1486\)) X-Mailer: Apple Mail (2.1486) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 09 Oct 2012 15:35:15 -0000 Hi list, I'm looking at this possible divide-by zero in dhclient: = http://scan.freebsd.your.org/freebsd-head/WORLD/2012-10-07-amd64/report-nB= hqE2.html.gz#EndPath In this specific case, it's obvious from the intention of the code that = ip->client->interval is always >0, but it's not obvious to me in the = code. 
I could add an assert before the possible divide-by-zero: assert(ip->client->interval > 0); But looking at the code, I'm not sure it's very elegant. = ip->client->interval is defined as time_t (see = src/sbin/dhclient/dhcpd.h), which is a signed integer type, if I'm = correct. However, some time_t members of struct client_state and struct = client_config (see said header file) are assumed in the code to be = positive and possibly non-null. Instead of plastering the code with = asserts, is there something like an utime_t type? Or are there better = ways to enforce the invariant? Thanks, Erik= From owner-freebsd-hackers@FreeBSD.ORG Tue Oct 9 16:02:38 2012 Return-Path: Delivered-To: hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id BBCE2F25 for ; Tue, 9 Oct 2012 16:02:38 +0000 (UTC) (envelope-from freebsd@damnhippie.dyndns.org) Received: from duck.symmetricom.us (duck.symmetricom.us [206.168.13.214]) by mx1.freebsd.org (Postfix) with ESMTP id C5E898FC16 for ; Tue, 9 Oct 2012 16:02:31 +0000 (UTC) Received: from damnhippie.dyndns.org (daffy.symmetricom.us [206.168.13.218]) by duck.symmetricom.us (8.14.5/8.14.5) with ESMTP id q99G2OrY096571 for ; Tue, 9 Oct 2012 10:02:24 -0600 (MDT) (envelope-from freebsd@damnhippie.dyndns.org) Received: from [172.22.42.240] (revolution.hippie.lan [172.22.42.240]) by damnhippie.dyndns.org (8.14.3/8.14.3) with ESMTP id q99G2LCZ080554; Tue, 9 Oct 2012 10:02:21 -0600 (MDT) (envelope-from freebsd@damnhippie.dyndns.org) Subject: Re: time_t when used as timedelta From: Ian Lepore To: Erik Cederstrand In-Reply-To: <787F09EF-E3F7-467E-B023-B7846509D2A1@cederstrand.dk> References: <787F09EF-E3F7-467E-B023-B7846509D2A1@cederstrand.dk> Content-Type: text/plain; charset="us-ascii" Date: Tue, 09 Oct 2012 10:02:21 -0600 Message-ID: <1349798541.1123.6.camel@revolution.hippie.lan> Mime-Version: 1.0 X-Mailer: Evolution 2.32.1 FreeBSD GNOME Team Port Content-Transfer-Encoding: 7bit Cc: FreeBSD Hackers X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 09 Oct 2012 16:02:38 -0000 On Tue, 2012-10-09 at 17:35 +0200, Erik Cederstrand wrote: > Hi list, > > I'm looking at this possible divide-by zero in dhclient: http://scan.freebsd.your.org/freebsd-head/WORLD/2012-10-07-amd64/report-nBhqE2.html.gz#EndPath > > In this specific case, it's obvious from the intention of the code that ip->client->interval is always >0, but it's not obvious to me in the code. I could add an assert before the possible divide-by-zero: > > assert(ip->client->interval > 0); > > But looking at the code, I'm not sure it's very elegant. ip->client->interval is defined as time_t (see src/sbin/dhclient/dhcpd.h), which is a signed integer type, if I'm correct. However, some time_t members of struct client_state and struct client_config (see said header file) are assumed in the code to be positive and possibly non-null. Instead of plastering the code with asserts, is there something like an utime_t type? Or are there better ways to enforce the invariant? > It looks to me like the place where enforcement is really needed is in parse_lease_time() which should ensure at the very least that negative values never get through, and in some cases that zeroes don't sneak in from config files. 
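A centralized check of that sort might look like the following sketch; the helper name and its call site are purely illustrative assumptions, not actual sbin/dhclient code:

/*
 * Illustrative sketch only: validate a parsed lease-time value once,
 * at parse time, so later divisions can rely on the invariant instead
 * of scattered assert()s.  Helper name and call site are invented.
 */
#include <limits.h>
#include <time.h>

static int
interval_ok(time_t val, time_t min)
{
	/* min would be 1 for values later used as divisors, 0 otherwise. */
	return (val >= min && val <= INT_MAX);
}

Applied where the configuration values are read in, a check like this would let the later uses of ip->client->interval and backoff_cutoff assume a sane, positive value.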
If it were ensured that ip->client->config->backoff_cutoff could never be less than 1 (and it appears any value less than 1 would be insane), then the division by zero case could never happen. However, at least one of the config statements handled by parse_lease_time() allows a value of zero. Since nothing seems to ensure that backoff_cutoff is non-zero, it seems like a potential source of div-by-zero errors too, in that same function. -- Ian From owner-freebsd-hackers@FreeBSD.ORG Tue Oct 9 17:52:23 2012 Return-Path: Delivered-To: hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 0180FA7A for ; Tue, 9 Oct 2012 17:52:23 +0000 (UTC) (envelope-from imp@bsdimp.com) Received: from mail-qa0-f54.google.com (mail-qa0-f54.google.com [209.85.216.54]) by mx1.freebsd.org (Postfix) with ESMTP id A33788FC1A for ; Tue, 9 Oct 2012 17:52:22 +0000 (UTC) Received: by mail-qa0-f54.google.com with SMTP id y23so3560183qad.13 for ; Tue, 09 Oct 2012 10:52:21 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=sender:subject:mime-version:content-type:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to:x-mailer :x-gm-message-state; bh=54YCas1zo0O0B8GzkoSjXAR0Pc0wLxtRxKpGn2KMijE=; b=ROv4t79k0RMZAYLWo8dkzlfxERCI+mSEQKEDHbZFrMAyv/SwlifQbcz6mt8ch2zFGn 15GNmagwMpgJ8QWmLWgs16238dim5wAuckw0c2YvZKkVidRbSTzUhS2Qn/7u1tHxuW5W Rm48gApPOiT8gPsxIfCBD+v1HwqDo3BcffUJV62ua9yHfBnRU47hkvtHLpk4ujtWYdFU UubMO+ol27e4+JJykSts4pv5D79SaqCb0SWq7Leb2/+/vsjHAECPlYbtcHdpsUW8L6CY V27aov+cT+Ks6M2nyr9Yt7x8B6bR8m/JdFpZ6oauaFOP0uNx0y8Bz6JIXfCWh69s2Csi 7nDg== Received: by 10.224.199.2 with SMTP id eq2mr36071159qab.55.1349805141702; Tue, 09 Oct 2012 10:52:21 -0700 (PDT) Received: from [10.30.101.53] ([209.117.142.2]) by mx.google.com with ESMTPS id j3sm21055821qek.7.2012.10.09.10.52.18 (version=TLSv1/SSLv3 cipher=OTHER); Tue, 09 Oct 2012 10:52:19 -0700 (PDT) Sender: Warner Losh Subject: Re: problem cross-compiling 9.1 Mime-Version: 1.0 (Apple Message framework v1084) Content-Type: text/plain; charset=us-ascii From: Warner Losh In-Reply-To: Date: Tue, 9 Oct 2012 11:52:15 -0600 Content-Transfer-Encoding: quoted-printable Message-Id: References: To: Daniel Braniss X-Mailer: Apple Mail (2.1084) X-Gm-Message-State: ALoCoQmsXOwFo5V6QFrCfWNHsys9oUDK22708lgE5fRN1rofjs2Onb/eFxi2bb4siMMeQDG8G1fp Cc: hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 09 Oct 2012 17:52:23 -0000 On Oct 9, 2012, at 4:46 AM, Daniel Braniss wrote: > [snip] >> any fix? >>> You have found the fix. Remove the WITHOUT_XXXX options from the = build that keep it from completing. You'll be able to add them at = installworld time w/o a hassle. nanobsd uses this to keep things down, = while still being able to build the system. >>> Warner >>=20 > where can I find the with/without list? 
> btw, I did look at nanobsd in the past and have borrowed some ideas = :-) bsd.own.mk Warner From owner-freebsd-hackers@FreeBSD.ORG Tue Oct 9 18:25:48 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 3FCEBACE for ; Tue, 9 Oct 2012 18:25:48 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net [IPv6:2001:470:1f10:75::2]) by mx1.freebsd.org (Postfix) with ESMTP id 1210F8FC1D for ; Tue, 9 Oct 2012 18:25:48 +0000 (UTC) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 6333FB91E; Tue, 9 Oct 2012 14:25:47 -0400 (EDT) From: John Baldwin To: Warner Losh Subject: Re: No bus_space_read_8 on x86 ? Date: Tue, 9 Oct 2012 11:54:15 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p20; KDE/4.5.5; amd64; ; ) References: <506DC574.9010300@intel.com> <201210051208.45550.jhb@freebsd.org> <8BC4C95F-2D10-46A5-89C8-74801BB4E23A@bsdimp.com> In-Reply-To: <8BC4C95F-2D10-46A5-89C8-74801BB4E23A@bsdimp.com> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201210091154.15873.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Tue, 09 Oct 2012 14:25:47 -0400 (EDT) Cc: freebsd-hackers@freebsd.org, Carl Delsey X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 09 Oct 2012 18:25:48 -0000 On Monday, October 08, 2012 4:59:24 pm Warner Losh wrote: > > On Oct 5, 2012, at 10:08 AM, John Baldwin wrote: > > > On Thursday, October 04, 2012 1:20:52 pm Carl Delsey wrote: > >> I noticed that the bus_space_*_8 functions are unimplemented for x86. > >> Looking at the code, it seems this is intentional. > >> > >> Is this done because on 32-bit systems we don't know, in the general > >> case, whether to read the upper or lower 32-bits first? > >> > >> If that's the reason, I was thinking we could provide two > >> implementations for i386: bus_space_read_8_upper_first and > >> bus_space_read_8_lower_first. For amd64 we would just have bus_space_read_8 > >> > >> Anybody who wants to use bus_space_read_8 in their file would do > >> something like: > >> #define BUS_SPACE_8_BYTES LOWER_FIRST > >> or > >> #define BUS_SPACE_8_BYTES UPPER_FIRST > >> whichever is appropriate for their hardware. > >> > >> This would go in their source file before including bus.h and we would > >> take care of mapping to the correct implementation. > >> > >> With the prevalence of 64-bit registers these days, if we don't provide > >> an implementation, I expect many drivers will end up rolling their own. > >> > >> If this seems like a good idea, I'll happily whip up a patch and submit it. > > > > I think cxgb* already have an implementation. For amd64 we should certainly > > have bus_space_*_8(), at least for SYS_RES_MEMORY. I think they should fail > > for SYS_RES_IOPORT. I don't think we can force a compile-time error though, > > would just have to return -1 on reads or some such? > > I believe it was because bus reads weren't guaranteed to be atomic on i386. > don't know if that's still the case or a concern, but it was an intentional omission. True. 
If you are on a 32-bit system you can read the two 4 byte values and then build a 64-bit value. For 64-bit platforms we should offer bus_read_8() however. -- John Baldwin From owner-freebsd-hackers@FreeBSD.ORG Wed Oct 10 00:18:07 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id C6551FAD; Wed, 10 Oct 2012 00:18:07 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 376EE8FC1E; Wed, 10 Oct 2012 00:18:06 +0000 (UTC) Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 09 Oct 2012 20:18:00 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 3900DB4056; Tue, 9 Oct 2012 20:18:00 -0400 (EDT) Date: Tue, 9 Oct 2012 20:18:00 -0400 (EDT) From: Rick Macklem To: Nikolay Denev Message-ID: <1492364164.1964483.1349828280211.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: Subject: Re: NFS server bottlenecks MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.201] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692) Cc: rmacklem@freebsd.org, Garrett Wollman , freebsd-hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Oct 2012 00:18:07 -0000 Nikolay Denev wrote: > On Oct 4, 2012, at 12:36 AM, Rick Macklem > wrote: > > > Garrett Wollman wrote: > >> < >> said: > >> > >>>> Simple: just use a sepatate mutex for each list that a cache > >>>> entry > >>>> is on, rather than a global lock for everything. This would > >>>> reduce > >>>> the mutex contention, but I'm not sure how significantly since I > >>>> don't have the means to measure it yet. > >>>> > >>> Well, since the cache trimming is removing entries from the lists, > >>> I > >>> don't > >>> see how that can be done with a global lock for list updates? > >> > >> Well, the global lock is what we have now, but the cache trimming > >> process only looks at one list at a time, so not locking the list > >> that > >> isn't being iterated over probably wouldn't hurt, unless there's > >> some > >> mechanism (that I didn't see) for entries to move from one list to > >> another. Note that I'm considering each hash bucket a separate > >> "list". (One issue to worry about in that case would be cache-line > >> contention in the array of hash buckets; perhaps > >> NFSRVCACHE_HASHSIZE > >> ought to be increased to reduce that.) > >> > > Yea, a separate mutex for each hash list might help. There is also > > the > > LRU list that all entries end up on, that gets used by the trimming > > code. > > (I think? I wrote this stuff about 8 years ago, so I haven't looked > > at > > it in a while.) > > > > Also, increasing the hash table size is probably a good idea, > > especially > > if you reduce how aggressively the cache is trimmed. > > > >>> Only doing it once/sec would result in a very large cache when > >>> bursts of > >>> traffic arrives. > >> > >> My servers have 96 GB of memory so that's not a big deal for me. 
> >> > > This code was originally "production tested" on a server with > > 1Gbyte, > > so times have changed a bit;-) > > > >>> I'm not sure I see why doing it as a separate thread will improve > >>> things. > >>> There are N nfsd threads already (N can be bumped up to 256 if you > >>> wish) > >>> and having a bunch more "cache trimming threads" would just > >>> increase > >>> contention, wouldn't it? > >> > >> Only one cache-trimming thread. The cache trim holds the (global) > >> mutex for much longer than any individual nfsd service thread has > >> any > >> need to, and having N threads doing that in parallel is why it's so > >> heavily contended. If there's only one thread doing the trim, then > >> the nfsd service threads aren't spending time either contending on > >> the > >> mutex (it will be held less frequently and for shorter periods). > >> > > I think the little drc2.patch which will keep the nfsd threads from > > acquiring the mutex and doing the trimming most of the time, might > > be > > sufficient. I still don't see why a separate trimming thread will be > > an advantage. I'd also be worried that the one cache trimming thread > > won't get the job done soon enough. > > > > When I did production testing on a 1Gbyte server that saw a peak > > load of about 100RPCs/sec, it was necessary to trim aggressively. > > (Although I'd be tempted to say that a server with 1Gbyte is no > > longer relevant, I recently recall someone trying to run FreeBSD > > on a i486, although I doubt they wanted to run the nfsd on it.) > > > >>> The only negative effect I can think of w.r.t. having the nfsd > >>> threads doing it would be a (I believe negligible) increase in RPC > >>> response times (the time the nfsd thread spends trimming the > >>> cache). > >>> As noted, I think this time would be negligible compared to disk > >>> I/O > >>> and network transit times in the total RPC response time? > >> > >> With adaptive mutexes, many CPUs, lots of in-memory cache, and 10G > >> network connectivity, spinning on a contended mutex takes a > >> significant amount of CPU time. (For the current design of the NFS > >> server, it may actually be a win to turn off adaptive mutexes -- I > >> should give that a try once I'm able to do more testing.) > >> > > Have fun with it. Let me know when you have what you think is a good > > patch. > > > > rick > > > >> -GAWollman > >> _______________________________________________ > >> freebsd-hackers@freebsd.org mailing list > >> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers > >> To unsubscribe, send any mail to > >> "freebsd-hackers-unsubscribe@freebsd.org" > > _______________________________________________ > > freebsd-fs@freebsd.org mailing list > > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > > To unsubscribe, send any mail to > > "freebsd-fs-unsubscribe@freebsd.org" > > My quest for IOPS over NFS continues :) > So far I'm not able to achieve more than about 3000 8K read requests > over NFS, > while the server locally gives much more. > And this is all from a file that is completely in ARC cache, no disk > IO involved. > Just out of curiousity, why do you use 8K reads instead of 64K reads. Since the RPC overhead (including the DRC functions) is per RPC, doing fewer larger RPCs should usually work better. (Sometimes large rsize/wsize values generate too large a burst of traffic for a network interface to handle and then the rsize/wsize has to be decreased to avoid this issue.) 
And, although this experiment seems useful for testing patches that try and reduce DRC CPU overheads, most "real" NFS servers will be doing disk I/O. > I've snatched some sample DTrace script from the net : [ > http://utcc.utoronto.ca/~cks/space/blog/solaris/DTraceQuantizationNotes > ] > > And modified it for our new NFS server : > > #!/usr/sbin/dtrace -qs > > fbt:kernel:nfsrvd_*:entry > { > self->ts = timestamp; > @counts[probefunc] = count(); > } > > fbt:kernel:nfsrvd_*:return > / self->ts > 0 / > { > this->delta = (timestamp-self->ts)/1000000; > } > > fbt:kernel:nfsrvd_*:return > / self->ts > 0 && this->delta > 100 / > { > @slow[probefunc, "ms"] = lquantize(this->delta, 100, 500, 50); > } > > fbt:kernel:nfsrvd_*:return > / self->ts > 0 / > { > @dist[probefunc, "ms"] = quantize(this->delta); > self->ts = 0; > } > > END > { > printf("\n"); > printa("function %-20s %@10d\n", @counts); > printf("\n"); > printa("function %s(), time in %s:%@d\n", @dist); > printf("\n"); > printa("function %s(), time in %s for >= 100 ms:%@d\n", @slow); > } > > And here's a sample output from one or two minutes during the run of > Oracle's ORION benchmark > tool from a Linux machine, on a 32G file on NFS mount over 10G > ethernet: > > [16:01]root@goliath:/home/ndenev# ./nfsrvd.d > ^C > > function nfsrvd_access 4 > function nfsrvd_statfs 10 > function nfsrvd_getattr 14 > function nfsrvd_commit 76 > function nfsrvd_sentcache 110048 > function nfsrvd_write 110048 > function nfsrvd_read 283648 > function nfsrvd_dorpc 393800 > function nfsrvd_getcache 393800 > function nfsrvd_rephead 393800 > function nfsrvd_updatecache 393800 > > function nfsrvd_access(), time in ms: > value ------------- Distribution ------------- count > -1 | 0 > 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 4 > 1 | 0 > > function nfsrvd_statfs(), time in ms: > value ------------- Distribution ------------- count > -1 | 0 > 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 10 > 1 | 0 > > function nfsrvd_getattr(), time in ms: > value ------------- Distribution ------------- count > -1 | 0 > 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 14 > 1 | 0 > > function nfsrvd_sentcache(), time in ms: > value ------------- Distribution ------------- count > -1 | 0 > 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 110048 > 1 | 0 > > function nfsrvd_rephead(), time in ms: > value ------------- Distribution ------------- count > -1 | 0 > 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 393800 > 1 | 0 > > function nfsrvd_updatecache(), time in ms: > value ------------- Distribution ------------- count > -1 | 0 > 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 393800 > 1 | 0 > > function nfsrvd_getcache(), time in ms: > value ------------- Distribution ------------- count > -1 | 0 > 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 393798 > 1 | 1 > 2 | 0 > 4 | 1 > 8 | 0 > > function nfsrvd_write(), time in ms: > value ------------- Distribution ------------- count > -1 | 0 > 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 110039 > 1 | 5 > 2 | 4 > 4 | 0 > > function nfsrvd_read(), time in ms: > value ------------- Distribution ------------- count > -1 | 0 > 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 283622 > 1 | 19 > 2 | 3 > 4 | 2 > 8 | 0 > 16 | 1 > 32 | 0 > 64 | 0 > 128 | 0 > 256 | 1 > 512 | 0 > > function nfsrvd_commit(), time in ms: > value ------------- Distribution ------------- count > -1 | 0 > 0 |@@@@@@@@@@@@@@@@@@@@@@@ 44 > 1 |@@@@@@@ 14 > 2 | 0 > 4 |@ 1 > 8 |@ 1 > 16 | 0 > 32 |@@@@@@@ 14 > 64 |@ 2 > 128 | 0 > > > function nfsrvd_commit(), time in ms for >= 100 ms: > value ------------- 
Distribution ------------- count > < 100 | 0 > 100 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1 > 150 | 0 > > function nfsrvd_read(), time in ms for >= 100 ms: > value ------------- Distribution ------------- count > 250 | 0 > 300 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1 > 350 | 0 > > > Looks like the nfs server cache functions are quite fast, but > extremely frequently called. > Yep, they are called for every RPC. I may try coding up a patch that replaces the single mutex with one for each hash bucket, for TCP. I'll post if/when I get this patch to a testing/review stage, rick > I hope someone can find this information useful. > > _______________________________________________ > freebsd-hackers@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to > "freebsd-hackers-unsubscribe@freebsd.org" From owner-freebsd-hackers@FreeBSD.ORG Wed Oct 10 04:53:31 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 10339C17 for ; Wed, 10 Oct 2012 04:53:31 +0000 (UTC) (envelope-from tim@kientzle.com) Received: from monday.kientzle.com (99-115-135-74.uvs.sntcca.sbcglobal.net [99.115.135.74]) by mx1.freebsd.org (Postfix) with ESMTP id D6CA08FC08 for ; Wed, 10 Oct 2012 04:53:30 +0000 (UTC) Received: (from root@localhost) by monday.kientzle.com (8.14.4/8.14.4) id q9A4rM96032111; Wed, 10 Oct 2012 04:53:22 GMT (envelope-from tim@kientzle.com) Received: from [192.168.2.143] (CiscoE3000 [192.168.1.65]) by kientzle.com with SMTP id b7sp22idag68zchregtgjzrste; Wed, 10 Oct 2012 04:53:22 +0000 (UTC) (envelope-from tim@kientzle.com) Subject: Re: SMP Version of tar Mime-Version: 1.0 (Apple Message framework v1278) Content-Type: text/plain; charset=us-ascii From: Tim Kientzle In-Reply-To: Date: Tue, 9 Oct 2012 21:54:03 -0700 Content-Transfer-Encoding: quoted-printable Message-Id: <15DBA1A9-A4B6-4F7D-A9DC-3412C4BE3517@kientzle.com> References: <5069C9FC.6020400@brandonfa.lk> <324B736D-8961-4E44-A212-2ECF3E60F2A0@kientzle.com> <20121008083814.GA5830@straylight.m.ringlet.net> To: Wojciech Puchar X-Mailer: Apple Mail (2.1278) Cc: freebsd-hackers@freebsd.org, Brandon Falk X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Oct 2012 04:53:31 -0000 On Oct 8, 2012, at 3:21 AM, Wojciech Puchar wrote: >> Not necessarily. If I understand correctly what Tim means, he's = talking >> about an in-memory compression of several blocks by several separate >> threads, and then - after all the threads have compressed their >=20 > but gzip format is single stream. dictionary IMHO is not reset every X = kilobytes. >=20 > parallel gzip is possible but not with same data format. Yes, it is. The following creates a compressed file that is completely compatible with the standard gzip/gunzip tools: * Break file into blocks * Compress each block into a gzip file (with gzip header and trailer = information) * Concatenate the result. This can be correctly decoded by gunzip. In theory, you get slightly worse compression. In practice, if your = blocks are reasonably large (a megabyte or so each), the difference is = negligible. 
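A minimal sketch of that recipe using zlib: gzopen() in append mode starts a fresh gzip member for each block of input. File names and the 1 MB block size are placeholders, and a parallel implementation would hand the blocks to worker threads instead of looping like this; the point here is only the on-disk format.

/*
 * Compress a file as a sequence of independent gzip members, one per
 * block, appended to the same output file.  gunzip reads the
 * concatenation as a single stream.  Error handling kept minimal.
 */
#include <stdio.h>
#include <zlib.h>

#define BLOCKSIZE (1024 * 1024)	/* ~1 MB blocks keep the loss negligible */

int
main(int argc, char **argv)
{
	FILE *in;
	gzFile out;
	static char buf[BLOCKSIZE];
	size_t n;

	if (argc != 3) {
		fprintf(stderr, "usage: %s infile outfile.gz\n", argv[0]);
		return (1);
	}
	if ((in = fopen(argv[1], "rb")) == NULL)
		return (1);
	while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
		/* "ab": append a fresh gzip member for this block. */
		if ((out = gzopen(argv[2], "ab")) == NULL)
			return (1);
		if (gzwrite(out, buf, (unsigned)n) != (int)n)
			return (1);
		gzclose(out);	/* finishes this member's trailer */
	}
	fclose(in);
	return (0);
}

Built with "cc -o blockgz blockgz.c -lz", the output decompresses with plain gunzip exactly as described above, at the cost of a reset dictionary per block.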
Tim From owner-freebsd-hackers@FreeBSD.ORG Wed Oct 10 12:08:46 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 540F6294 for ; Wed, 10 Oct 2012 12:08:46 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 0A9C28FC14 for ; Wed, 10 Oct 2012 12:08:45 +0000 (UTC) Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 10 Oct 2012 08:08:44 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id CD252B4035; Wed, 10 Oct 2012 08:08:44 -0400 (EDT) Date: Wed, 10 Oct 2012 08:08:44 -0400 (EDT) From: Rick Macklem To: Garrett Wollman Message-ID: <461825404.1975816.1349870924809.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <20596.52616.867711.175010@hergotha.csail.mit.edu> Subject: Re: NFS server bottlenecks MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692) Cc: Nikolay Denev , freebsd-hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Oct 2012 12:08:46 -0000 Garrett Wollman wrote: > < said: > > > And, although this experiment seems useful for testing patches that > > try > > and reduce DRC CPU overheads, most "real" NFS servers will be doing > > disk > > I/O. > > We don't always have control over what the user does. I think the > worst-case for my users involves a third-party program (that they're > not willing to modify) that does line-buffered writes in append mode. > This uses nearly all of the CPU on per-RPC overhead (each write is > three RPCs: GETATTR, WRITE, COMMIT). > Yes. My comment was simply meant to imply that his testing isn't a realistic load for most NFS servers. It was not meant to imply that reducing the CPU overhead/lock contention of the DRC is a useless exercise. 
rick > -GAWollman From owner-freebsd-hackers@FreeBSD.ORG Wed Oct 10 01:21:14 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id AA266A5 for ; Wed, 10 Oct 2012 01:21:14 +0000 (UTC) (envelope-from wollman@hergotha.csail.mit.edu) Received: from hergotha.csail.mit.edu (wollman-1-pt.tunnel.tserv4.nyc4.ipv6.he.net [IPv6:2001:470:1f06:ccb::2]) by mx1.freebsd.org (Postfix) with ESMTP id 585718FC14 for ; Wed, 10 Oct 2012 01:21:14 +0000 (UTC) Received: from hergotha.csail.mit.edu (localhost [127.0.0.1]) by hergotha.csail.mit.edu (8.14.5/8.14.5) with ESMTP id q9A1LD2m043208; Tue, 9 Oct 2012 21:21:13 -0400 (EDT) (envelope-from wollman@hergotha.csail.mit.edu) Received: (from wollman@localhost) by hergotha.csail.mit.edu (8.14.5/8.14.4/Submit) id q9A1LDoI043205; Tue, 9 Oct 2012 21:21:13 -0400 (EDT) (envelope-from wollman) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <20596.52616.867711.175010@hergotha.csail.mit.edu> Date: Tue, 9 Oct 2012 21:21:12 -0400 From: Garrett Wollman To: Rick Macklem Subject: Re: NFS server bottlenecks In-Reply-To: <1492364164.1964483.1349828280211.JavaMail.root@erie.cs.uoguelph.ca> References: <1492364164.1964483.1349828280211.JavaMail.root@erie.cs.uoguelph.ca> X-Mailer: VM 7.17 under 21.4 (patch 22) "Instant Classic" XEmacs Lucid X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.7 (hergotha.csail.mit.edu [127.0.0.1]); Tue, 09 Oct 2012 21:21:13 -0400 (EDT) X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED autolearn=disabled version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on hergotha.csail.mit.edu X-Mailman-Approved-At: Wed, 10 Oct 2012 12:30:45 +0000 Cc: Nikolay Denev , freebsd-hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Oct 2012 01:21:14 -0000 < said: > And, although this experiment seems useful for testing patches that try > and reduce DRC CPU overheads, most "real" NFS servers will be doing disk > I/O. We don't always have control over what the user does. I think the worst-case for my users involves a third-party program (that they're not willing to modify) that does line-buffered writes in append mode. This uses nearly all of the CPU on per-RPC overhead (each write is three RPCs: GETATTR, WRITE, COMMIT). 
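The third-party program itself isn't shown, but a client with that I/O pattern could be as simple as the following sketch (the mount path is a placeholder):

/*
 * One possible shape of the behaviour described above: an append-mode,
 * line-buffered writer, so every line of output becomes its own small
 * write(2) on the NFS mount.
 */
#include <stdio.h>

int
main(void)
{
	FILE *fp;
	char line[256];

	if ((fp = fopen("/mnt/nfs/app.log", "a")) == NULL)
		return (1);
	setlinebuf(fp);			/* flush on every newline */
	while (fgets(line, sizeof(line), stdin) != NULL)
		fputs(line, fp);	/* each line -> a separate small write */
	fclose(fp);
	return (0);
}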
-GAWollman From owner-freebsd-hackers@FreeBSD.ORG Wed Oct 10 14:33:18 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 351D74A6 for ; Wed, 10 Oct 2012 14:33:18 +0000 (UTC) (envelope-from lidl@hydra.pix.net) Received: from hydra.pix.net (hydra.pix.net [IPv6:2001:470:e254:10::3c]) by mx1.freebsd.org (Postfix) with ESMTP id F3E418FC0A for ; Wed, 10 Oct 2012 14:33:17 +0000 (UTC) Received: from hydra.pix.net (localhost [127.0.0.1]) by hydra.pix.net (8.14.5/8.14.5) with ESMTP id q9AEXGuA008619; Wed, 10 Oct 2012 10:33:16 -0400 (EDT) (envelope-from lidl@hydra.pix.net) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.97.5 at mail.pix.net Received: (from lidl@localhost) by hydra.pix.net (8.14.5/8.14.5/Submit) id q9AEXEl9008618; Wed, 10 Oct 2012 10:33:14 -0400 (EDT) (envelope-from lidl) Date: Wed, 10 Oct 2012 10:33:14 -0400 From: Kurt Lidl To: Tim Kientzle Subject: Re: SMP Version of tar Message-ID: <20121010143314.GA8402@pix.net> References: <5069C9FC.6020400@brandonfa.lk> <324B736D-8961-4E44-A212-2ECF3E60F2A0@kientzle.com> <20121008083814.GA5830@straylight.m.ringlet.net> <15DBA1A9-A4B6-4F7D-A9DC-3412C4BE3517@kientzle.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <15DBA1A9-A4B6-4F7D-A9DC-3412C4BE3517@kientzle.com> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: Wojciech Puchar , Brandon Falk , freebsd-hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Oct 2012 14:33:18 -0000 On Tue, Oct 09, 2012 at 09:54:03PM -0700, Tim Kientzle wrote: > > On Oct 8, 2012, at 3:21 AM, Wojciech Puchar wrote: > > >> Not necessarily. If I understand correctly what Tim means, he's talking > >> about an in-memory compression of several blocks by several separate > >> threads, and then - after all the threads have compressed their > > > > but gzip format is single stream. dictionary IMHO is not reset every X kilobytes. > > > > parallel gzip is possible but not with same data format. > > Yes, it is. > > The following creates a compressed file that > is completely compatible with the standard > gzip/gunzip tools: > > * Break file into blocks > * Compress each block into a gzip file (with gzip header and trailer information) > * Concatenate the result. > > This can be correctly decoded by gunzip. > > In theory, you get slightly worse compression. In practice, if your blocks are reasonably large (a megabyte or so each), the difference is negligible. I am not sure, but I think this conversation might have a slight misunderstanding due to imprecisely specified language, while the technical part is in agreement. Tim is correct in that gzip datastream allows for concatenation of compressed blocks of data, so you might break the input stream into a bunch of blocks [A, B, C, etc], and then can append those together into [A.gz, B.gz, C.gz, etc], and when uncompressed, you will get the original input stream. I think that Wojciech's point is that the compressed data stream for for the single datastream is different than the compressed data stream of [A.gz, B.gz, C.gz, etc]. Both will decompress to the same thing, but the intermediate compressed representation will be different. 
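That equivalence can also be checked mechanically: with a reasonably recent zlib (the rewritten gz* code), gzread() decompresses any number of concatenated gzip members in sequence, so comparing the decoded bytes against the original input shows the two representations are interchangeable. A sketch, with placeholder file names:

/*
 * Verify that a concatenated per-block gzip file decodes to the same
 * bytes as the original input.  Assumes a zlib whose gzread() walks
 * concatenated gzip members (documented behaviour in current zlib).
 */
#include <stdio.h>
#include <string.h>
#include <zlib.h>

int
main(int argc, char **argv)
{
	FILE *orig;
	gzFile gz;
	char a[8192], b[8192];
	size_t n;

	if (argc != 3)
		return (1);
	if ((orig = fopen(argv[1], "rb")) == NULL ||
	    (gz = gzopen(argv[2], "rb")) == NULL)
		return (1);
	while ((n = fread(a, 1, sizeof(a), orig)) > 0) {
		if (gzread(gz, b, (unsigned)n) != (int)n ||
		    memcmp(a, b, n) != 0) {
			fprintf(stderr, "mismatch\n");
			return (1);
		}
	}
	printf("identical\n");
	return (0);
}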
-Kurt From owner-freebsd-hackers@FreeBSD.ORG Wed Oct 10 14:42:18 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id E5C50771; Wed, 10 Oct 2012 14:42:18 +0000 (UTC) (envelope-from ndenev@gmail.com) Received: from mail-wi0-f178.google.com (mail-wi0-f178.google.com [209.85.212.178]) by mx1.freebsd.org (Postfix) with ESMTP id 1AB1E8FC12; Wed, 10 Oct 2012 14:42:17 +0000 (UTC) Received: by mail-wi0-f178.google.com with SMTP id hr7so642396wib.13 for ; Wed, 10 Oct 2012 07:42:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=subject:mime-version:content-type:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to:x-mailer; bh=0uvAubi10Zr77AqIE0leBfCOdJWQKIhIhklmMQFQfMI=; b=Rb4RRMd5uRFPdwtkobaijb/QcZ2uvBLo0dxLXHI6omth/FE7RRQ+WXekWWjd6U92mG nVc5pUpSIu16Hwn8Syp5VRDS6qzl36Y0oOSxKKd0RnoySOQngApdf3HmtZVRcSf5CGLS Oeh5nahMLQTC3LRav0vvqrnTwZfUV6inw2VuqC0uIRJ5FmBFg6LM9wNW5n9sReR7wQRT yyPRRsCKKoGaJkNdwq2ox1LRJ9U03Z+UOSwCjP42t9mY5DEHjN5GjgLgKl0+sRaixFyn 7zTqvvrR5IR6V5FGT1UO749psZ3PveGUZE6epLyfVcJQ7azeO6Vz/7sDYAhaQDx2avwA ZYDA== Received: by 10.216.218.105 with SMTP id j83mr8876869wep.164.1349880136909; Wed, 10 Oct 2012 07:42:16 -0700 (PDT) Received: from ndenevsa.sf.moneybookers.net (g1.moneybookers.com. [217.18.249.148]) by mx.google.com with ESMTPS id b7sm30432075wiz.3.2012.10.10.07.42.15 (version=TLSv1/SSLv3 cipher=OTHER); Wed, 10 Oct 2012 07:42:16 -0700 (PDT) Subject: Re: NFS server bottlenecks Mime-Version: 1.0 (Mac OS X Mail 6.1 \(1498\)) Content-Type: text/plain; charset=us-ascii From: Nikolay Denev In-Reply-To: <1492364164.1964483.1349828280211.JavaMail.root@erie.cs.uoguelph.ca> Date: Wed, 10 Oct 2012 17:42:15 +0300 Content-Transfer-Encoding: quoted-printable Message-Id: References: <1492364164.1964483.1349828280211.JavaMail.root@erie.cs.uoguelph.ca> To: Rick Macklem X-Mailer: Apple Mail (2.1498) Cc: rmacklem@freebsd.org, Garrett Wollman , freebsd-hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Oct 2012 14:42:19 -0000 On Oct 10, 2012, at 3:18 AM, Rick Macklem wrote: > Nikolay Denev wrote: >> On Oct 4, 2012, at 12:36 AM, Rick Macklem >> wrote: >>=20 >>> Garrett Wollman wrote: >>>> <>>> said: >>>>=20 >>>>>> Simple: just use a sepatate mutex for each list that a cache >>>>>> entry >>>>>> is on, rather than a global lock for everything. This would >>>>>> reduce >>>>>> the mutex contention, but I'm not sure how significantly since I >>>>>> don't have the means to measure it yet. >>>>>>=20 >>>>> Well, since the cache trimming is removing entries from the lists, >>>>> I >>>>> don't >>>>> see how that can be done with a global lock for list updates? >>>>=20 >>>> Well, the global lock is what we have now, but the cache trimming >>>> process only looks at one list at a time, so not locking the list >>>> that >>>> isn't being iterated over probably wouldn't hurt, unless there's >>>> some >>>> mechanism (that I didn't see) for entries to move from one list to >>>> another. Note that I'm considering each hash bucket a separate >>>> "list". (One issue to worry about in that case would be cache-line >>>> contention in the array of hash buckets; perhaps >>>> NFSRVCACHE_HASHSIZE >>>> ought to be increased to reduce that.) 
>>>>=20 >>> Yea, a separate mutex for each hash list might help. There is also >>> the >>> LRU list that all entries end up on, that gets used by the trimming >>> code. >>> (I think? I wrote this stuff about 8 years ago, so I haven't looked >>> at >>> it in a while.) >>>=20 >>> Also, increasing the hash table size is probably a good idea, >>> especially >>> if you reduce how aggressively the cache is trimmed. >>>=20 >>>>> Only doing it once/sec would result in a very large cache when >>>>> bursts of >>>>> traffic arrives. >>>>=20 >>>> My servers have 96 GB of memory so that's not a big deal for me. >>>>=20 >>> This code was originally "production tested" on a server with >>> 1Gbyte, >>> so times have changed a bit;-) >>>=20 >>>>> I'm not sure I see why doing it as a separate thread will improve >>>>> things. >>>>> There are N nfsd threads already (N can be bumped up to 256 if you >>>>> wish) >>>>> and having a bunch more "cache trimming threads" would just >>>>> increase >>>>> contention, wouldn't it? >>>>=20 >>>> Only one cache-trimming thread. The cache trim holds the (global) >>>> mutex for much longer than any individual nfsd service thread has >>>> any >>>> need to, and having N threads doing that in parallel is why it's so >>>> heavily contended. If there's only one thread doing the trim, then >>>> the nfsd service threads aren't spending time either contending on >>>> the >>>> mutex (it will be held less frequently and for shorter periods). >>>>=20 >>> I think the little drc2.patch which will keep the nfsd threads from >>> acquiring the mutex and doing the trimming most of the time, might >>> be >>> sufficient. I still don't see why a separate trimming thread will be >>> an advantage. I'd also be worried that the one cache trimming thread >>> won't get the job done soon enough. >>>=20 >>> When I did production testing on a 1Gbyte server that saw a peak >>> load of about 100RPCs/sec, it was necessary to trim aggressively. >>> (Although I'd be tempted to say that a server with 1Gbyte is no >>> longer relevant, I recently recall someone trying to run FreeBSD >>> on a i486, although I doubt they wanted to run the nfsd on it.) >>>=20 >>>>> The only negative effect I can think of w.r.t. having the nfsd >>>>> threads doing it would be a (I believe negligible) increase in RPC >>>>> response times (the time the nfsd thread spends trimming the >>>>> cache). >>>>> As noted, I think this time would be negligible compared to disk >>>>> I/O >>>>> and network transit times in the total RPC response time? >>>>=20 >>>> With adaptive mutexes, many CPUs, lots of in-memory cache, and 10G >>>> network connectivity, spinning on a contended mutex takes a >>>> significant amount of CPU time. (For the current design of the NFS >>>> server, it may actually be a win to turn off adaptive mutexes -- I >>>> should give that a try once I'm able to do more testing.) >>>>=20 >>> Have fun with it. Let me know when you have what you think is a good >>> patch. 
>>>=20 >>> rick >>>=20 >>>> -GAWollman >>>> _______________________________________________ >>>> freebsd-hackers@freebsd.org mailing list >>>> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers >>>> To unsubscribe, send any mail to >>>> "freebsd-hackers-unsubscribe@freebsd.org" >>> _______________________________________________ >>> freebsd-fs@freebsd.org mailing list >>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs >>> To unsubscribe, send any mail to >>> "freebsd-fs-unsubscribe@freebsd.org" >>=20 >> My quest for IOPS over NFS continues :) >> So far I'm not able to achieve more than about 3000 8K read requests >> over NFS, >> while the server locally gives much more. >> And this is all from a file that is completely in ARC cache, no disk >> IO involved. >>=20 > Just out of curiousity, why do you use 8K reads instead of 64K reads. > Since the RPC overhead (including the DRC functions) is per RPC, doing > fewer larger RPCs should usually work better. (Sometimes large = rsize/wsize > values generate too large a burst of traffic for a network interface = to > handle and then the rsize/wsize has to be decreased to avoid this = issue.) >=20 > And, although this experiment seems useful for testing patches that = try > and reduce DRC CPU overheads, most "real" NFS servers will be doing = disk > I/O. >=20 This is the default blocksize the Oracle and probably most databases = use. It uses also larger blocks, but for small random reads in OLTP = applications this is what is used. >> I've snatched some sample DTrace script from the net : [ >> = http://utcc.utoronto.ca/~cks/space/blog/solaris/DTraceQuantizationNotes >> ] >>=20 >> And modified it for our new NFS server : >>=20 >> #!/usr/sbin/dtrace -qs >>=20 >> fbt:kernel:nfsrvd_*:entry >> { >> self->ts =3D timestamp; >> @counts[probefunc] =3D count(); >> } >>=20 >> fbt:kernel:nfsrvd_*:return >> / self->ts > 0 / >> { >> this->delta =3D (timestamp-self->ts)/1000000; >> } >>=20 >> fbt:kernel:nfsrvd_*:return >> / self->ts > 0 && this->delta > 100 / >> { >> @slow[probefunc, "ms"] =3D lquantize(this->delta, 100, 500, 50); >> } >>=20 >> fbt:kernel:nfsrvd_*:return >> / self->ts > 0 / >> { >> @dist[probefunc, "ms"] =3D quantize(this->delta); >> self->ts =3D 0; >> } >>=20 >> END >> { >> printf("\n"); >> printa("function %-20s %@10d\n", @counts); >> printf("\n"); >> printa("function %s(), time in %s:%@d\n", @dist); >> printf("\n"); >> printa("function %s(), time in %s for >=3D 100 ms:%@d\n", @slow); >> } >>=20 >> And here's a sample output from one or two minutes during the run of >> Oracle's ORION benchmark >> tool from a Linux machine, on a 32G file on NFS mount over 10G >> ethernet: >>=20 >> [16:01]root@goliath:/home/ndenev# ./nfsrvd.d >> ^C >>=20 >> function nfsrvd_access 4 >> function nfsrvd_statfs 10 >> function nfsrvd_getattr 14 >> function nfsrvd_commit 76 >> function nfsrvd_sentcache 110048 >> function nfsrvd_write 110048 >> function nfsrvd_read 283648 >> function nfsrvd_dorpc 393800 >> function nfsrvd_getcache 393800 >> function nfsrvd_rephead 393800 >> function nfsrvd_updatecache 393800 >>=20 >> function nfsrvd_access(), time in ms: >> value ------------- Distribution ------------- count >> -1 | 0 >> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 4 >> 1 | 0 >>=20 >> function nfsrvd_statfs(), time in ms: >> value ------------- Distribution ------------- count >> -1 | 0 >> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 10 >> 1 | 0 >>=20 >> function nfsrvd_getattr(), time in ms: >> value ------------- Distribution ------------- count >> -1 | 0 
>> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 14 >> 1 | 0 >>=20 >> function nfsrvd_sentcache(), time in ms: >> value ------------- Distribution ------------- count >> -1 | 0 >> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 110048 >> 1 | 0 >>=20 >> function nfsrvd_rephead(), time in ms: >> value ------------- Distribution ------------- count >> -1 | 0 >> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 393800 >> 1 | 0 >>=20 >> function nfsrvd_updatecache(), time in ms: >> value ------------- Distribution ------------- count >> -1 | 0 >> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 393800 >> 1 | 0 >>=20 >> function nfsrvd_getcache(), time in ms: >> value ------------- Distribution ------------- count >> -1 | 0 >> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 393798 >> 1 | 1 >> 2 | 0 >> 4 | 1 >> 8 | 0 >>=20 >> function nfsrvd_write(), time in ms: >> value ------------- Distribution ------------- count >> -1 | 0 >> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 110039 >> 1 | 5 >> 2 | 4 >> 4 | 0 >>=20 >> function nfsrvd_read(), time in ms: >> value ------------- Distribution ------------- count >> -1 | 0 >> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 283622 >> 1 | 19 >> 2 | 3 >> 4 | 2 >> 8 | 0 >> 16 | 1 >> 32 | 0 >> 64 | 0 >> 128 | 0 >> 256 | 1 >> 512 | 0 >>=20 >> function nfsrvd_commit(), time in ms: >> value ------------- Distribution ------------- count >> -1 | 0 >> 0 |@@@@@@@@@@@@@@@@@@@@@@@ 44 >> 1 |@@@@@@@ 14 >> 2 | 0 >> 4 |@ 1 >> 8 |@ 1 >> 16 | 0 >> 32 |@@@@@@@ 14 >> 64 |@ 2 >> 128 | 0 >>=20 >>=20 >> function nfsrvd_commit(), time in ms for >=3D 100 ms: >> value ------------- Distribution ------------- count >> < 100 | 0 >> 100 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1 >> 150 | 0 >>=20 >> function nfsrvd_read(), time in ms for >=3D 100 ms: >> value ------------- Distribution ------------- count >> 250 | 0 >> 300 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1 >> 350 | 0 >>=20 >>=20 >> Looks like the nfs server cache functions are quite fast, but >> extremely frequently called. >>=20 > Yep, they are called for every RPC. >=20 > I may try coding up a patch that replaces the single mutex with > one for each hash bucket, for TCP. >=20 > I'll post if/when I get this patch to a testing/review stage, rick >=20 Cool. I've readjusted the precision of the dtrace script a bit, and I can see now the following three functions as taking most of the time : = nfsrvd_getcache(), nfsrc_trimcache() and nfsrvd_updatecache() This was recorded during a oracle benchmark run called SLOB, which = caused 99% cpu load on the NFS server. >> I hope someone can find this information useful. 
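For readers following the thread, the change Rick mentions -- one mutex per hash chain instead of a single global cache mutex -- would look roughly like this generic sketch. All names are invented; this is not the actual NFS request-cache code or Rick's patch:

/*
 * Per-hash-chain locking: nfsd threads touching different buckets no
 * longer serialize on one global mutex.
 */
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/queue.h>

#define	DRC_HASHSIZE	500		/* a larger table also shortens chains */

struct drcentry {
	LIST_ENTRY(drcentry)	de_hash;
	uint32_t		de_xid;
	/* ... cached reply, timestamps, LRU linkage ... */
};

struct drcbucket {
	struct mtx		db_mtx;
	LIST_HEAD(, drcentry)	db_head;
};

static struct drcbucket drc_table[DRC_HASHSIZE];

static void
drc_init(void)
{
	int i;

	for (i = 0; i < DRC_HASHSIZE; i++) {
		mtx_init(&drc_table[i].db_mtx, "drcbucket", NULL, MTX_DEF);
		LIST_INIT(&drc_table[i].db_head);
	}
}

/*
 * Lookup takes only the lock for the chain the xid hashes to.  A real
 * implementation would reference-count the entry before dropping the
 * bucket lock, and still needs a strategy for the shared LRU list used
 * by the trimming code.
 */
static struct drcentry *
drc_lookup(uint32_t xid)
{
	struct drcbucket *db;
	struct drcentry *de;

	db = &drc_table[xid % DRC_HASHSIZE];
	mtx_lock(&db->db_mtx);
	LIST_FOREACH(de, &db->db_head, de_hash)
		if (de->de_xid == xid)
			break;
	mtx_unlock(&db->db_mtx);
	return (de);
}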
>>=20 >> _______________________________________________ >> freebsd-hackers@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers >> To unsubscribe, send any mail to >> "freebsd-hackers-unsubscribe@freebsd.org" From owner-freebsd-hackers@FreeBSD.ORG Wed Oct 10 16:42:47 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 45374CA4 for ; Wed, 10 Oct 2012 16:42:47 +0000 (UTC) (envelope-from rysto32@gmail.com) Received: from mail-vc0-f182.google.com (mail-vc0-f182.google.com [209.85.220.182]) by mx1.freebsd.org (Postfix) with ESMTP id ECC948FC08 for ; Wed, 10 Oct 2012 16:42:46 +0000 (UTC) Received: by mail-vc0-f182.google.com with SMTP id fw7so1296566vcb.13 for ; Wed, 10 Oct 2012 09:42:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; bh=WQtYJzGsYuj7puhxEnEntVwMPFE4PvRkDWMajnzmoW0=; b=WwM6x0eDL30fF7ZpmFz3UUknEWNh38+fqwaNrPFVFBF4ccIAQDnPFzo745/vUmVrct nAW/QcFp6PSjUyVz/qTuXEuxKqtKv662D7Zjl2IoK1EwyO8ZBjJz9qvhFpVgexYjNH2o J8fc+ddajWGS8ahRyn0kEFZ/sSsQ3SDAK64G6JKFVPsBaz+5nzpWwNR0kVoatJgdNbor fgzDVjzs0moBbv4EkIJfOOcDIfnRmMBg6C2QQQHDOrox7Nzb6tYt8v6jSyWm0VKw3oJm ncpgTzBrQSB+fJChEEPIAApRkCikqMqm2S7HC+eN1Rfy6wzVAO67duUj1UtrEw/lWlB1 +VPg== MIME-Version: 1.0 Received: by 10.52.29.74 with SMTP id i10mr5850740vdh.40.1349887366025; Wed, 10 Oct 2012 09:42:46 -0700 (PDT) Received: by 10.58.207.114 with HTTP; Wed, 10 Oct 2012 09:42:45 -0700 (PDT) In-Reply-To: <1349746003.10434.YahooMailClassic@web181706.mail.ne1.yahoo.com> References: <1349746003.10434.YahooMailClassic@web181706.mail.ne1.yahoo.com> Date: Wed, 10 Oct 2012 12:42:45 -0400 Message-ID: Subject: Re: Kernel memory usage From: Ryan Stone To: Sushanth Rai Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: freebsd-hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Oct 2012 16:42:47 -0000 On Mon, Oct 8, 2012 at 9:26 PM, Sushanth Rai wrote= : > I was trying to co-relate the o/p from "top" to that I get from vmstat -z= . I don't have any user programs that wires memory. Given that, I'm assumin= g the wired memory count shown by "top" is memory used by kernel. Now I wou= ld like find out how the kernel is using this "wired" memory. So, I look at= dynamic memory allocated by kernel using "vmstat -z". I think memory alloc= ated via malloc() is serviced by zones if the allocation size is <4k. So, I= 'm not sure how useful "vmstat -m" is. I also add up memory used by buffer = cache. Is there any other significant chunk I'm missing ? Does vmstat -m sh= ow memory that is not accounted for in vmstat -z. All allocations by malloc that are larger than a single page are served by uma_large_malloc, and as far as I can tell these allocations will not be accounted for in vmstat -z (they will, of course, be accounted for in vmstat -m). Similarly, all allocations through contigmalloc will not be accounted for in vmstat -z. 
From owner-freebsd-hackers@FreeBSD.ORG Wed Oct 10 20:46:29 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id A53583E4 for ; Wed, 10 Oct 2012 20:46:29 +0000 (UTC) (envelope-from wojtek@wojtek.tensor.gdynia.pl) Received: from wojtek.tensor.gdynia.pl (wojtek.tensor.gdynia.pl [89.206.35.99]) by mx1.freebsd.org (Postfix) with ESMTP id CA9E38FC08 for ; Wed, 10 Oct 2012 20:46:28 +0000 (UTC) Received: from wojtek.tensor.gdynia.pl (localhost [127.0.0.1]) by wojtek.tensor.gdynia.pl (8.14.5/8.14.5) with ESMTP id q9AKkCHl002228; Wed, 10 Oct 2012 22:46:12 +0200 (CEST) (envelope-from wojtek@wojtek.tensor.gdynia.pl) Received: from localhost (wojtek@localhost) by wojtek.tensor.gdynia.pl (8.14.5/8.14.5/Submit) with ESMTP id q9AKkBU6002225; Wed, 10 Oct 2012 22:46:11 +0200 (CEST) (envelope-from wojtek@wojtek.tensor.gdynia.pl) Date: Wed, 10 Oct 2012 22:46:11 +0200 (CEST) From: Wojciech Puchar To: Kurt Lidl Subject: Re: SMP Version of tar In-Reply-To: <20121010143314.GA8402@pix.net> Message-ID: References: <5069C9FC.6020400@brandonfa.lk> <324B736D-8961-4E44-A212-2ECF3E60F2A0@kientzle.com> <20121008083814.GA5830@straylight.m.ringlet.net> <15DBA1A9-A4B6-4F7D-A9DC-3412C4BE3517@kientzle.com> <20121010143314.GA8402@pix.net> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.2.7 (wojtek.tensor.gdynia.pl [127.0.0.1]); Wed, 10 Oct 2012 22:46:12 +0200 (CEST) Cc: Brandon Falk , freebsd-hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Oct 2012 20:46:29 -0000
> > Tim is correct in that gzip datastream allows for concatenation of > compressed blocks of data, so you might break the input stream into > a bunch of blocks [A, B, C, etc], and then can append those together > into [A.gz, B.gz, C.gz, etc], and when uncompressed, you will get > the original input stream. > I think that Wojciech's point is that the compressed data stream > for the single datastream is different than the compressed data > stream of [A.gz, B.gz, C.gz, etc]. Both will decompress to the same > thing, but the intermediate compressed representation will be different.
So - after your response it is clear that a parallel-generated tar.gz will be different and have slightly worse compression (which can be ignored), and WILL be compatible with standard gzip, since gzip can decompress multiple concatenated streams - which I wasn't aware of. That's good.
At the same time, a parallel tar will fall back to a single thread when unpacking a standard .tar.gz - not a big deal, as gzip decompression is ultra-fast and I/O is usually the limit.
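The multi-member property Kurt describes is easy to check with zlib. A minimal sketch, with invented file names, assuming a reasonably recent zlib in which gzread() continues across member boundaries:

/*
 * Sketch: compress two blocks independently, concatenate the results,
 * and read them back as one stream -- the property a parallel tar.gz
 * writer relies on.
 */
#include <stdio.h>
#include <string.h>
#include <zlib.h>

static void
write_member(const char *path, const char *data)
{
        gzFile g = gzopen(path, "wb");

        gzwrite(g, data, (unsigned)strlen(data));
        gzclose(g);
}

static void
append_file(FILE *out, const char *path)
{
        FILE *in = fopen(path, "rb");
        char buf[4096];
        size_t n;

        while ((n = fread(buf, 1, sizeof(buf), in)) > 0)
                fwrite(buf, 1, n, out);
        fclose(in);
}

int
main(void)
{
        char buf[64];
        int n;
        FILE *out;
        gzFile g;

        write_member("a.gz", "hello ");
        write_member("b.gz", "world\n");

        out = fopen("ab.gz", "wb");     /* same effect as: cat a.gz b.gz > ab.gz */
        append_file(out, "a.gz");
        append_file(out, "b.gz");
        fclose(out);

        g = gzopen("ab.gz", "rb");      /* gzread() crosses the member boundary */
        n = gzread(g, buf, sizeof(buf) - 1);
        buf[n > 0 ? n : 0] = '\0';
        gzclose(g);

        printf("%s", buf);              /* prints "hello world\n" */
        return (0);
}

gzip(1) behaves the same way when decompressing, which is why the concatenated output stays compatible with the standard tools.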
From owner-freebsd-hackers@FreeBSD.ORG Wed Oct 10 21:44:17 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 0843B8F3; Wed, 10 Oct 2012 21:44:17 +0000 (UTC) (envelope-from carl.r.delsey@intel.com) Received: from mga03.intel.com (mga03.intel.com [143.182.124.21]) by mx1.freebsd.org (Postfix) with ESMTP id C72F88FC14; Wed, 10 Oct 2012 21:44:16 +0000 (UTC) Received: from azsmga002.ch.intel.com ([10.2.17.35]) by azsmga101.ch.intel.com with ESMTP; 10 Oct 2012 14:44:10 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="4.80,565,1344236400"; d="scan'208";a="154794507" Received: from crdelsey-fbsd.ch.intel.com (HELO [10.2.105.127]) ([10.2.105.127]) by AZSMGA002.ch.intel.com with ESMTP; 10 Oct 2012 14:44:09 -0700 Message-ID: <5075EC29.1010907@intel.com> Date: Wed, 10 Oct 2012 14:44:09 -0700 From: Carl Delsey User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:13.0) Gecko/20120724 Thunderbird/13.0.1 MIME-Version: 1.0 To: John Baldwin Subject: Re: No bus_space_read_8 on x86 ? References: <506DC574.9010300@intel.com> <201210051208.45550.jhb@freebsd.org> <8BC4C95F-2D10-46A5-89C8-74801BB4E23A@bsdimp.com> <201210091154.15873.jhb@freebsd.org> In-Reply-To: <201210091154.15873.jhb@freebsd.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Oct 2012 21:44:17 -0000 Sorry for the slow response. I was dealing with a bit of a family emergency. Responses inline below. On 10/09/12 08:54, John Baldwin wrote: > On Monday, October 08, 2012 4:59:24 pm Warner Losh wrote: >> On Oct 5, 2012, at 10:08 AM, John Baldwin wrote: >>> I think cxgb* already have an implementation. For amd64 we should certainly >>> have bus_space_*_8(), at least for SYS_RES_MEMORY. I think they should fail >>> for SYS_RES_IOPORT. I don't think we can force a compile-time error though, >>> would just have to return -1 on reads or some such? Yes. Exactly what I was thinking. >> I believe it was because bus reads weren't guaranteed to be atomic on i386. >> don't know if that's still the case or a concern, but it was an intentional omission. > True. If you are on a 32-bit system you can read the two 4 byte values and > then build a 64-bit value. For 64-bit platforms we should offer bus_read_8() > however. I believe there is still no way to perform a 64-bit read on a i386 (or at least without messing with SSE instructions), but if you have to read a 64-bit register, you are stuck with doing two 32-bit reads and concatenating them. I figure we may as well provide an implementation for those who have to do that as well as the implementation for 64-bit. Anyhow, it sounds like we are basically in agreement. I'll put together a patch and send it out for review. 
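A minimal sketch of the "two 32-bit reads, then concatenate" fallback being discussed; the function name, word order and latching behaviour are assumptions, not the interface of the eventual patch:

/*
 * Sketch only: build a 64-bit value from two 32-bit MMIO reads, for
 * platforms with no native 64-bit load.  Assumes a little-endian layout
 * with the low word first; a real device may require a specific read
 * order (e.g. low word first to latch the high word).
 */
#include <stdint.h>
#include <stdio.h>

static inline uint64_t
read_8_split(volatile uint32_t *reg)
{
        uint32_t lo = reg[0];   /* low 32 bits  */
        uint32_t hi = reg[1];   /* high 32 bits */

        /* Not atomic: the register may change between the two reads. */
        return ((uint64_t)hi << 32) | lo;
}

int
main(void)
{
        uint32_t fake_reg[2] = { 0xdeadbeef, 0x00000012 };      /* stand-in for MMIO */

        printf("0x%016llx\n", (unsigned long long)read_8_split(fake_reg));
        return (0);
}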
Thanks, Carl From owner-freebsd-hackers@FreeBSD.ORG Wed Oct 10 22:09:15 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id C92823D5; Wed, 10 Oct 2012 22:09:15 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 0FE5E8FC14; Wed, 10 Oct 2012 22:09:14 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqAEAJ+LclCDaFvO/2dsb2JhbABFFoV7uhmCIAEBAQQBAQEgBCcgBgUbDgoCAg0ZAikBCSYGCAcEARwBA4dkC6ZJkXWBIYouGoRkgRIDkz6CLYEVjxmDCYFHNA X-IronPort-AV: E=Sophos;i="4.80,567,1344225600"; d="scan'208";a="185836181" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu-pri.mail.uoguelph.ca with ESMTP; 10 Oct 2012 18:09:07 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id EC7FAB3F62; Wed, 10 Oct 2012 18:09:07 -0400 (EDT) Date: Wed, 10 Oct 2012 18:09:07 -0400 (EDT) From: Rick Macklem To: Nikolay Denev Message-ID: <1071150615.2039567.1349906947942.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: Subject: Re: NFS server bottlenecks MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.201] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692) Cc: rmacklem@freebsd.org, Garrett Wollman , freebsd-hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Oct 2012 22:09:16 -0000 Nikolay Denev wrote: > On Oct 10, 2012, at 3:18 AM, Rick Macklem > wrote: > > > Nikolay Denev wrote: > >> On Oct 4, 2012, at 12:36 AM, Rick Macklem > >> wrote: > >> > >>> Garrett Wollman wrote: > >>>> < >>>> said: > >>>> > >>>>>> Simple: just use a sepatate mutex for each list that a cache > >>>>>> entry > >>>>>> is on, rather than a global lock for everything. This would > >>>>>> reduce > >>>>>> the mutex contention, but I'm not sure how significantly since > >>>>>> I > >>>>>> don't have the means to measure it yet. > >>>>>> > >>>>> Well, since the cache trimming is removing entries from the > >>>>> lists, > >>>>> I > >>>>> don't > >>>>> see how that can be done with a global lock for list updates? > >>>> > >>>> Well, the global lock is what we have now, but the cache trimming > >>>> process only looks at one list at a time, so not locking the list > >>>> that > >>>> isn't being iterated over probably wouldn't hurt, unless there's > >>>> some > >>>> mechanism (that I didn't see) for entries to move from one list > >>>> to > >>>> another. Note that I'm considering each hash bucket a separate > >>>> "list". (One issue to worry about in that case would be > >>>> cache-line > >>>> contention in the array of hash buckets; perhaps > >>>> NFSRVCACHE_HASHSIZE > >>>> ought to be increased to reduce that.) > >>>> > >>> Yea, a separate mutex for each hash list might help. There is also > >>> the > >>> LRU list that all entries end up on, that gets used by the > >>> trimming > >>> code. > >>> (I think? I wrote this stuff about 8 years ago, so I haven't > >>> looked > >>> at > >>> it in a while.) 
> >>> > >>> Also, increasing the hash table size is probably a good idea, > >>> especially > >>> if you reduce how aggressively the cache is trimmed. > >>> > >>>>> Only doing it once/sec would result in a very large cache when > >>>>> bursts of > >>>>> traffic arrives. > >>>> > >>>> My servers have 96 GB of memory so that's not a big deal for me. > >>>> > >>> This code was originally "production tested" on a server with > >>> 1Gbyte, > >>> so times have changed a bit;-) > >>> > >>>>> I'm not sure I see why doing it as a separate thread will > >>>>> improve > >>>>> things. > >>>>> There are N nfsd threads already (N can be bumped up to 256 if > >>>>> you > >>>>> wish) > >>>>> and having a bunch more "cache trimming threads" would just > >>>>> increase > >>>>> contention, wouldn't it? > >>>> > >>>> Only one cache-trimming thread. The cache trim holds the (global) > >>>> mutex for much longer than any individual nfsd service thread has > >>>> any > >>>> need to, and having N threads doing that in parallel is why it's > >>>> so > >>>> heavily contended. If there's only one thread doing the trim, > >>>> then > >>>> the nfsd service threads aren't spending time either contending > >>>> on > >>>> the > >>>> mutex (it will be held less frequently and for shorter periods). > >>>> > >>> I think the little drc2.patch which will keep the nfsd threads > >>> from > >>> acquiring the mutex and doing the trimming most of the time, might > >>> be > >>> sufficient. I still don't see why a separate trimming thread will > >>> be > >>> an advantage. I'd also be worried that the one cache trimming > >>> thread > >>> won't get the job done soon enough. > >>> > >>> When I did production testing on a 1Gbyte server that saw a peak > >>> load of about 100RPCs/sec, it was necessary to trim aggressively. > >>> (Although I'd be tempted to say that a server with 1Gbyte is no > >>> longer relevant, I recently recall someone trying to run FreeBSD > >>> on a i486, although I doubt they wanted to run the nfsd on it.) > >>> > >>>>> The only negative effect I can think of w.r.t. having the nfsd > >>>>> threads doing it would be a (I believe negligible) increase in > >>>>> RPC > >>>>> response times (the time the nfsd thread spends trimming the > >>>>> cache). > >>>>> As noted, I think this time would be negligible compared to disk > >>>>> I/O > >>>>> and network transit times in the total RPC response time? > >>>> > >>>> With adaptive mutexes, many CPUs, lots of in-memory cache, and > >>>> 10G > >>>> network connectivity, spinning on a contended mutex takes a > >>>> significant amount of CPU time. (For the current design of the > >>>> NFS > >>>> server, it may actually be a win to turn off adaptive mutexes -- > >>>> I > >>>> should give that a try once I'm able to do more testing.) > >>>> > >>> Have fun with it. Let me know when you have what you think is a > >>> good > >>> patch. 
> >>> > >>> rick > >>> > >>>> -GAWollman > >>>> _______________________________________________ > >>>> freebsd-hackers@freebsd.org mailing list > >>>> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers > >>>> To unsubscribe, send any mail to > >>>> "freebsd-hackers-unsubscribe@freebsd.org" > >>> _______________________________________________ > >>> freebsd-fs@freebsd.org mailing list > >>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs > >>> To unsubscribe, send any mail to > >>> "freebsd-fs-unsubscribe@freebsd.org" > >> > >> My quest for IOPS over NFS continues :) > >> So far I'm not able to achieve more than about 3000 8K read > >> requests > >> over NFS, > >> while the server locally gives much more. > >> And this is all from a file that is completely in ARC cache, no > >> disk > >> IO involved. > >> > > Just out of curiousity, why do you use 8K reads instead of 64K > > reads. > > Since the RPC overhead (including the DRC functions) is per RPC, > > doing > > fewer larger RPCs should usually work better. (Sometimes large > > rsize/wsize > > values generate too large a burst of traffic for a network interface > > to > > handle and then the rsize/wsize has to be decreased to avoid this > > issue.) > > > > And, although this experiment seems useful for testing patches that > > try > > and reduce DRC CPU overheads, most "real" NFS servers will be doing > > disk > > I/O. > > > > This is the default blocksize the Oracle and probably most databases > use. > It uses also larger blocks, but for small random reads in OLTP > applications this is what is used. > If the client is doing 8K reads, you could increase the read ahead "readahead=N" (N up to 16), to try and increase the bandwidth. (But if the CPU is 99% busy, then I don't think it will matter.) 
> > >> I've snatched some sample DTrace script from the net : [ > >> http://utcc.utoronto.ca/~cks/space/blog/solaris/DTraceQuantizationNotes > >> ] > >> > >> And modified it for our new NFS server : > >> > >> #!/usr/sbin/dtrace -qs > >> > >> fbt:kernel:nfsrvd_*:entry > >> { > >> self->ts = timestamp; > >> @counts[probefunc] = count(); > >> } > >> > >> fbt:kernel:nfsrvd_*:return > >> / self->ts > 0 / > >> { > >> this->delta = (timestamp-self->ts)/1000000; > >> } > >> > >> fbt:kernel:nfsrvd_*:return > >> / self->ts > 0 && this->delta > 100 / > >> { > >> @slow[probefunc, "ms"] = lquantize(this->delta, 100, 500, 50); > >> } > >> > >> fbt:kernel:nfsrvd_*:return > >> / self->ts > 0 / > >> { > >> @dist[probefunc, "ms"] = quantize(this->delta); > >> self->ts = 0; > >> } > >> > >> END > >> { > >> printf("\n"); > >> printa("function %-20s %@10d\n", @counts); > >> printf("\n"); > >> printa("function %s(), time in %s:%@d\n", @dist); > >> printf("\n"); > >> printa("function %s(), time in %s for >= 100 ms:%@d\n", @slow); > >> } > >> > >> And here's a sample output from one or two minutes during the run > >> of > >> Oracle's ORION benchmark > >> tool from a Linux machine, on a 32G file on NFS mount over 10G > >> ethernet: > >> > >> [16:01]root@goliath:/home/ndenev# ./nfsrvd.d > >> ^C > >> > >> function nfsrvd_access 4 > >> function nfsrvd_statfs 10 > >> function nfsrvd_getattr 14 > >> function nfsrvd_commit 76 > >> function nfsrvd_sentcache 110048 > >> function nfsrvd_write 110048 > >> function nfsrvd_read 283648 > >> function nfsrvd_dorpc 393800 > >> function nfsrvd_getcache 393800 > >> function nfsrvd_rephead 393800 > >> function nfsrvd_updatecache 393800 > >> > >> function nfsrvd_access(), time in ms: > >> value ------------- Distribution ------------- count > >> -1 | 0 > >> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 4 > >> 1 | 0 > >> > >> function nfsrvd_statfs(), time in ms: > >> value ------------- Distribution ------------- count > >> -1 | 0 > >> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 10 > >> 1 | 0 > >> > >> function nfsrvd_getattr(), time in ms: > >> value ------------- Distribution ------------- count > >> -1 | 0 > >> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 14 > >> 1 | 0 > >> > >> function nfsrvd_sentcache(), time in ms: > >> value ------------- Distribution ------------- count > >> -1 | 0 > >> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 110048 > >> 1 | 0 > >> > >> function nfsrvd_rephead(), time in ms: > >> value ------------- Distribution ------------- count > >> -1 | 0 > >> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 393800 > >> 1 | 0 > >> > >> function nfsrvd_updatecache(), time in ms: > >> value ------------- Distribution ------------- count > >> -1 | 0 > >> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 393800 > >> 1 | 0 > >> > >> function nfsrvd_getcache(), time in ms: > >> value ------------- Distribution ------------- count > >> -1 | 0 > >> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 393798 > >> 1 | 1 > >> 2 | 0 > >> 4 | 1 > >> 8 | 0 > >> > >> function nfsrvd_write(), time in ms: > >> value ------------- Distribution ------------- count > >> -1 | 0 > >> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 110039 > >> 1 | 5 > >> 2 | 4 > >> 4 | 0 > >> > >> function nfsrvd_read(), time in ms: > >> value ------------- Distribution ------------- count > >> -1 | 0 > >> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 283622 > >> 1 | 19 > >> 2 | 3 > >> 4 | 2 > >> 8 | 0 > >> 16 | 1 > >> 32 | 0 > >> 64 | 0 > >> 128 | 0 > >> 256 | 1 > >> 512 | 0 > >> > >> function nfsrvd_commit(), time in ms: > >> value 
------------- Distribution ------------- count > >> -1 | 0 > >> 0 |@@@@@@@@@@@@@@@@@@@@@@@ 44 > >> 1 |@@@@@@@ 14 > >> 2 | 0 > >> 4 |@ 1 > >> 8 |@ 1 > >> 16 | 0 > >> 32 |@@@@@@@ 14 > >> 64 |@ 2 > >> 128 | 0 > >> > >> > >> function nfsrvd_commit(), time in ms for >= 100 ms: > >> value ------------- Distribution ------------- count > >> < 100 | 0 > >> 100 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1 > >> 150 | 0 > >> > >> function nfsrvd_read(), time in ms for >= 100 ms: > >> value ------------- Distribution ------------- count > >> 250 | 0 > >> 300 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1 > >> 350 | 0 > >> > >> > >> Looks like the nfs server cache functions are quite fast, but > >> extremely frequently called. > >> > > Yep, they are called for every RPC. > > > > I may try coding up a patch that replaces the single mutex with > > one for each hash bucket, for TCP. > > > > I'll post if/when I get this patch to a testing/review stage, rick > > > > Cool. > > I've readjusted the precision of the dtrace script a bit, and I can > see > now the following three functions as taking most of the time : > nfsrvd_getcache(), nfsrc_trimcache() and nfsrvd_updatecache() > > This was recorded during a oracle benchmark run called SLOB, which > caused 99% cpu load on the NFS server. > Even with the drc2.patch and a large value for vfs.nfsd.tcphighwater? (Assuming the mounts are TCP ones.) Have fun with it, rick > > >> I hope someone can find this information useful. > >> > >> _______________________________________________ > >> freebsd-hackers@freebsd.org mailing list > >> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers > >> To unsubscribe, send any mail to > >> "freebsd-hackers-unsubscribe@freebsd.org" From owner-freebsd-hackers@FreeBSD.ORG Thu Oct 11 05:46:55 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 0E7AA9B; Thu, 11 Oct 2012 05:46:55 +0000 (UTC) (envelope-from ndenev@gmail.com) Received: from mail-we0-f182.google.com (mail-we0-f182.google.com [74.125.82.182]) by mx1.freebsd.org (Postfix) with ESMTP id 3897A8FC18; Thu, 11 Oct 2012 05:46:53 +0000 (UTC) Received: by mail-we0-f182.google.com with SMTP id x43so1026256wey.13 for ; Wed, 10 Oct 2012 22:46:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=subject:mime-version:content-type:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to:x-mailer; bh=3fx+jBrdvTe9PkuSAKy238hn/Sx5nfvqwmYbIRsQOZU=; b=r3hd8iJP0VR5cEDzd0gSP+m28tBm5cYtlLiK9SnVplj7rkOpiwuiOia9xNizCrAJIr J5gOYpXnKficKdT15eK2JwhWzYh9eZtO25TPbGUroqYf6iyIseTQ9q7l/NSHc9abKsVg OMHVf85UWPwSKoqFK8JGNoZOG/2Zh8eTRdT2ADo8zFC0NdjB0/8g03WQ//l3gEZHRnpi SQeYcpSEKxtslYKMtCcFD5cEalMtMs+2Po9K+f3JxryWdbZq4O/89vj+goURarLmjR1r Q6h0F3ohvhU+WsCbgTBARAfKZpTLezwZAN4Q+2mxrwewcHjTKXt5uuNvLeGvJsCp1GT0 qDIw== Received: by 10.180.97.35 with SMTP id dx3mr18002577wib.14.1349934412989; Wed, 10 Oct 2012 22:46:52 -0700 (PDT) Received: from [10.0.0.86] ([93.152.184.10]) by mx.google.com with ESMTPS id b3sm33623528wie.0.2012.10.10.22.46.50 (version=TLSv1/SSLv3 cipher=OTHER); Wed, 10 Oct 2012 22:46:52 -0700 (PDT) Subject: Re: NFS server bottlenecks Mime-Version: 1.0 (Mac OS X Mail 6.1 \(1498\)) Content-Type: text/plain; charset=us-ascii From: Nikolay Denev In-Reply-To: <1071150615.2039567.1349906947942.JavaMail.root@erie.cs.uoguelph.ca> Date: Thu, 11 Oct 2012 08:46:49 +0300 Content-Transfer-Encoding: quoted-printable Message-Id: 
<19724137-ABB0-43AF-BCB9-EBE8ACD6E3BD@gmail.com> References: <1071150615.2039567.1349906947942.JavaMail.root@erie.cs.uoguelph.ca> To: Rick Macklem X-Mailer: Apple Mail (2.1498) Cc: rmacklem@freebsd.org, Garrett Wollman , freebsd-hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 11 Oct 2012 05:46:55 -0000 On Oct 11, 2012, at 1:09 AM, Rick Macklem wrote: > Nikolay Denev wrote: >> On Oct 10, 2012, at 3:18 AM, Rick Macklem >> wrote: >>=20 >>> Nikolay Denev wrote: >>>> On Oct 4, 2012, at 12:36 AM, Rick Macklem >>>> wrote: >>>>=20 >>>>> Garrett Wollman wrote: >>>>>> <>>>>> said: >>>>>>=20 >>>>>>>> Simple: just use a sepatate mutex for each list that a cache >>>>>>>> entry >>>>>>>> is on, rather than a global lock for everything. This would >>>>>>>> reduce >>>>>>>> the mutex contention, but I'm not sure how significantly since >>>>>>>> I >>>>>>>> don't have the means to measure it yet. >>>>>>>>=20 >>>>>>> Well, since the cache trimming is removing entries from the >>>>>>> lists, >>>>>>> I >>>>>>> don't >>>>>>> see how that can be done with a global lock for list updates? >>>>>>=20 >>>>>> Well, the global lock is what we have now, but the cache trimming >>>>>> process only looks at one list at a time, so not locking the list >>>>>> that >>>>>> isn't being iterated over probably wouldn't hurt, unless there's >>>>>> some >>>>>> mechanism (that I didn't see) for entries to move from one list >>>>>> to >>>>>> another. Note that I'm considering each hash bucket a separate >>>>>> "list". (One issue to worry about in that case would be >>>>>> cache-line >>>>>> contention in the array of hash buckets; perhaps >>>>>> NFSRVCACHE_HASHSIZE >>>>>> ought to be increased to reduce that.) >>>>>>=20 >>>>> Yea, a separate mutex for each hash list might help. There is also >>>>> the >>>>> LRU list that all entries end up on, that gets used by the >>>>> trimming >>>>> code. >>>>> (I think? I wrote this stuff about 8 years ago, so I haven't >>>>> looked >>>>> at >>>>> it in a while.) >>>>>=20 >>>>> Also, increasing the hash table size is probably a good idea, >>>>> especially >>>>> if you reduce how aggressively the cache is trimmed. >>>>>=20 >>>>>>> Only doing it once/sec would result in a very large cache when >>>>>>> bursts of >>>>>>> traffic arrives. >>>>>>=20 >>>>>> My servers have 96 GB of memory so that's not a big deal for me. >>>>>>=20 >>>>> This code was originally "production tested" on a server with >>>>> 1Gbyte, >>>>> so times have changed a bit;-) >>>>>=20 >>>>>>> I'm not sure I see why doing it as a separate thread will >>>>>>> improve >>>>>>> things. >>>>>>> There are N nfsd threads already (N can be bumped up to 256 if >>>>>>> you >>>>>>> wish) >>>>>>> and having a bunch more "cache trimming threads" would just >>>>>>> increase >>>>>>> contention, wouldn't it? >>>>>>=20 >>>>>> Only one cache-trimming thread. The cache trim holds the (global) >>>>>> mutex for much longer than any individual nfsd service thread has >>>>>> any >>>>>> need to, and having N threads doing that in parallel is why it's >>>>>> so >>>>>> heavily contended. If there's only one thread doing the trim, >>>>>> then >>>>>> the nfsd service threads aren't spending time either contending >>>>>> on >>>>>> the >>>>>> mutex (it will be held less frequently and for shorter periods). 
>>>>>>=20 >>>>> I think the little drc2.patch which will keep the nfsd threads >>>>> from >>>>> acquiring the mutex and doing the trimming most of the time, might >>>>> be >>>>> sufficient. I still don't see why a separate trimming thread will >>>>> be >>>>> an advantage. I'd also be worried that the one cache trimming >>>>> thread >>>>> won't get the job done soon enough. >>>>>=20 >>>>> When I did production testing on a 1Gbyte server that saw a peak >>>>> load of about 100RPCs/sec, it was necessary to trim aggressively. >>>>> (Although I'd be tempted to say that a server with 1Gbyte is no >>>>> longer relevant, I recently recall someone trying to run FreeBSD >>>>> on a i486, although I doubt they wanted to run the nfsd on it.) >>>>>=20 >>>>>>> The only negative effect I can think of w.r.t. having the nfsd >>>>>>> threads doing it would be a (I believe negligible) increase in >>>>>>> RPC >>>>>>> response times (the time the nfsd thread spends trimming the >>>>>>> cache). >>>>>>> As noted, I think this time would be negligible compared to disk >>>>>>> I/O >>>>>>> and network transit times in the total RPC response time? >>>>>>=20 >>>>>> With adaptive mutexes, many CPUs, lots of in-memory cache, and >>>>>> 10G >>>>>> network connectivity, spinning on a contended mutex takes a >>>>>> significant amount of CPU time. (For the current design of the >>>>>> NFS >>>>>> server, it may actually be a win to turn off adaptive mutexes -- >>>>>> I >>>>>> should give that a try once I'm able to do more testing.) >>>>>>=20 >>>>> Have fun with it. Let me know when you have what you think is a >>>>> good >>>>> patch. >>>>>=20 >>>>> rick >>>>>=20 >>>>>> -GAWollman >>>>>> _______________________________________________ >>>>>> freebsd-hackers@freebsd.org mailing list >>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers >>>>>> To unsubscribe, send any mail to >>>>>> "freebsd-hackers-unsubscribe@freebsd.org" >>>>> _______________________________________________ >>>>> freebsd-fs@freebsd.org mailing list >>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>>> To unsubscribe, send any mail to >>>>> "freebsd-fs-unsubscribe@freebsd.org" >>>>=20 >>>> My quest for IOPS over NFS continues :) >>>> So far I'm not able to achieve more than about 3000 8K read >>>> requests >>>> over NFS, >>>> while the server locally gives much more. >>>> And this is all from a file that is completely in ARC cache, no >>>> disk >>>> IO involved. >>>>=20 >>> Just out of curiousity, why do you use 8K reads instead of 64K >>> reads. >>> Since the RPC overhead (including the DRC functions) is per RPC, >>> doing >>> fewer larger RPCs should usually work better. (Sometimes large >>> rsize/wsize >>> values generate too large a burst of traffic for a network interface >>> to >>> handle and then the rsize/wsize has to be decreased to avoid this >>> issue.) >>>=20 >>> And, although this experiment seems useful for testing patches that >>> try >>> and reduce DRC CPU overheads, most "real" NFS servers will be doing >>> disk >>> I/O. >>>=20 >>=20 >> This is the default blocksize the Oracle and probably most databases >> use. >> It uses also larger blocks, but for small random reads in OLTP >> applications this is what is used. >>=20 > If the client is doing 8K reads, you could increase the read ahead > "readahead=3DN" (N up to 16), to try and increase the bandwidth. > (But if the CPU is 99% busy, then I don't think it will matter.) 
I'll try to check if this is possible to be set, as we are testing not = only with the Linux NFS client, but also with the Oracle's built in so called DirectNFS client that is = built in to the app. >=20 >>=20 >>>> I've snatched some sample DTrace script from the net : [ >>>> = http://utcc.utoronto.ca/~cks/space/blog/solaris/DTraceQuantizationNotes >>>> ] >>>>=20 >>>> And modified it for our new NFS server : >>>>=20 >>>> #!/usr/sbin/dtrace -qs >>>>=20 >>>> fbt:kernel:nfsrvd_*:entry >>>> { >>>> self->ts =3D timestamp; >>>> @counts[probefunc] =3D count(); >>>> } >>>>=20 >>>> fbt:kernel:nfsrvd_*:return >>>> / self->ts > 0 / >>>> { >>>> this->delta =3D (timestamp-self->ts)/1000000; >>>> } >>>>=20 >>>> fbt:kernel:nfsrvd_*:return >>>> / self->ts > 0 && this->delta > 100 / >>>> { >>>> @slow[probefunc, "ms"] =3D lquantize(this->delta, 100, 500, 50); >>>> } >>>>=20 >>>> fbt:kernel:nfsrvd_*:return >>>> / self->ts > 0 / >>>> { >>>> @dist[probefunc, "ms"] =3D quantize(this->delta); >>>> self->ts =3D 0; >>>> } >>>>=20 >>>> END >>>> { >>>> printf("\n"); >>>> printa("function %-20s %@10d\n", @counts); >>>> printf("\n"); >>>> printa("function %s(), time in %s:%@d\n", @dist); >>>> printf("\n"); >>>> printa("function %s(), time in %s for >=3D 100 ms:%@d\n", @slow); >>>> } >>>>=20 >>>> And here's a sample output from one or two minutes during the run >>>> of >>>> Oracle's ORION benchmark >>>> tool from a Linux machine, on a 32G file on NFS mount over 10G >>>> ethernet: >>>>=20 >>>> [16:01]root@goliath:/home/ndenev# ./nfsrvd.d >>>> ^C >>>>=20 >>>> function nfsrvd_access 4 >>>> function nfsrvd_statfs 10 >>>> function nfsrvd_getattr 14 >>>> function nfsrvd_commit 76 >>>> function nfsrvd_sentcache 110048 >>>> function nfsrvd_write 110048 >>>> function nfsrvd_read 283648 >>>> function nfsrvd_dorpc 393800 >>>> function nfsrvd_getcache 393800 >>>> function nfsrvd_rephead 393800 >>>> function nfsrvd_updatecache 393800 >>>>=20 >>>> function nfsrvd_access(), time in ms: >>>> value ------------- Distribution ------------- count >>>> -1 | 0 >>>> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 4 >>>> 1 | 0 >>>>=20 >>>> function nfsrvd_statfs(), time in ms: >>>> value ------------- Distribution ------------- count >>>> -1 | 0 >>>> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 10 >>>> 1 | 0 >>>>=20 >>>> function nfsrvd_getattr(), time in ms: >>>> value ------------- Distribution ------------- count >>>> -1 | 0 >>>> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 14 >>>> 1 | 0 >>>>=20 >>>> function nfsrvd_sentcache(), time in ms: >>>> value ------------- Distribution ------------- count >>>> -1 | 0 >>>> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 110048 >>>> 1 | 0 >>>>=20 >>>> function nfsrvd_rephead(), time in ms: >>>> value ------------- Distribution ------------- count >>>> -1 | 0 >>>> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 393800 >>>> 1 | 0 >>>>=20 >>>> function nfsrvd_updatecache(), time in ms: >>>> value ------------- Distribution ------------- count >>>> -1 | 0 >>>> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 393800 >>>> 1 | 0 >>>>=20 >>>> function nfsrvd_getcache(), time in ms: >>>> value ------------- Distribution ------------- count >>>> -1 | 0 >>>> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 393798 >>>> 1 | 1 >>>> 2 | 0 >>>> 4 | 1 >>>> 8 | 0 >>>>=20 >>>> function nfsrvd_write(), time in ms: >>>> value ------------- Distribution ------------- count >>>> -1 | 0 >>>> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 110039 >>>> 1 | 5 >>>> 2 | 4 >>>> 4 | 0 >>>>=20 >>>> function nfsrvd_read(), time in ms: >>>> value 
------------- Distribution ------------- count >>>> -1 | 0 >>>> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 283622 >>>> 1 | 19 >>>> 2 | 3 >>>> 4 | 2 >>>> 8 | 0 >>>> 16 | 1 >>>> 32 | 0 >>>> 64 | 0 >>>> 128 | 0 >>>> 256 | 1 >>>> 512 | 0 >>>>=20 >>>> function nfsrvd_commit(), time in ms: >>>> value ------------- Distribution ------------- count >>>> -1 | 0 >>>> 0 |@@@@@@@@@@@@@@@@@@@@@@@ 44 >>>> 1 |@@@@@@@ 14 >>>> 2 | 0 >>>> 4 |@ 1 >>>> 8 |@ 1 >>>> 16 | 0 >>>> 32 |@@@@@@@ 14 >>>> 64 |@ 2 >>>> 128 | 0 >>>>=20 >>>>=20 >>>> function nfsrvd_commit(), time in ms for >=3D 100 ms: >>>> value ------------- Distribution ------------- count >>>> < 100 | 0 >>>> 100 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1 >>>> 150 | 0 >>>>=20 >>>> function nfsrvd_read(), time in ms for >=3D 100 ms: >>>> value ------------- Distribution ------------- count >>>> 250 | 0 >>>> 300 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1 >>>> 350 | 0 >>>>=20 >>>>=20 >>>> Looks like the nfs server cache functions are quite fast, but >>>> extremely frequently called. >>>>=20 >>> Yep, they are called for every RPC. >>>=20 >>> I may try coding up a patch that replaces the single mutex with >>> one for each hash bucket, for TCP. >>>=20 >>> I'll post if/when I get this patch to a testing/review stage, rick >>>=20 >>=20 >> Cool. >>=20 >> I've readjusted the precision of the dtrace script a bit, and I can >> see >> now the following three functions as taking most of the time : >> nfsrvd_getcache(), nfsrc_trimcache() and nfsrvd_updatecache() >>=20 >> This was recorded during a oracle benchmark run called SLOB, which >> caused 99% cpu load on the NFS server. >>=20 > Even with the drc2.patch and a large value for vfs.nfsd.tcphighwater? > (Assuming the mounts are TCP ones.) >=20 > Have fun with it, rick >=20 I had upped it, but probably not enough. I'm now running with = vfs.nfsd.tcphighwater set to some ridiculous number, and NFSRVCACHE_HASHSIZE set to 500. So far it looks like good improvement as those functions no longer show = up in the dtrace script output. I'll run some more benchmarks and testing today. Thanks! >>=20 >>>> I hope someone can find this information useful. 
>>>>=20 >>>> _______________________________________________ >>>> freebsd-hackers@freebsd.org mailing list >>>> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers >>>> To unsubscribe, send any mail to >>>> "freebsd-hackers-unsubscribe@freebsd.org" From owner-freebsd-hackers@FreeBSD.ORG Thu Oct 11 11:49:44 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 0F75F5F1 for ; Thu, 11 Oct 2012 11:49:44 +0000 (UTC) (envelope-from kwiat3k@panic.pl) Received: from mail.panic.pl (mail.panic.pl [188.116.33.105]) by mx1.freebsd.org (Postfix) with ESMTP id BB66E8FC14 for ; Thu, 11 Oct 2012 11:49:43 +0000 (UTC) Received: from mail.panic.pl (unknown [188.116.33.105]) by mail.panic.pl (Postfix) with ESMTP id 73C789B794 for ; Thu, 11 Oct 2012 13:40:56 +0200 (CEST) X-Virus-Scanned: amavisd-new at panic.pl Received: from mail.panic.pl ([188.116.33.105]) by mail.panic.pl (mail.panic.pl [188.116.33.105]) (amavisd-new, port 10024) with ESMTP id vx+vnmdr1cJG for ; Thu, 11 Oct 2012 13:40:56 +0200 (CEST) Received: from localhost.localdomain (admin.wp-sa.pl [212.77.105.137]) by mail.panic.pl (Postfix) with ESMTPSA id 1E64A9B78D for ; Thu, 11 Oct 2012 13:40:56 +0200 (CEST) Date: Thu, 11 Oct 2012 13:41:08 +0200 From: Mateusz Kwiatkowski To: freebsd-hackers@freebsd.org Subject: truss kills process? Message-ID: <20121011134108.65fd11ba@panic.pl> Organization: Panic.PL X-Mailer: Claws Mail 3.8.1 (GTK+ 2.24.13; x86_64-unknown-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 11 Oct 2012 11:49:44 -0000 Hello, We noticed quite strange behaviour of truss: # sleep 100 & [1] 93212 # truss -p 93212 sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0) sigprocmask(SIG_SETMASK,0x0,0x0) = 0 (0x0) sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0) sigprocmask(SIG_SETMASK,0x0,0x0) = 0 (0x0) process exit, rval = 0 [1] + done sleep 100 Sleep ends immediately instead of waiting for desired number of seconds. I wonder if this is normal behavior or maybe a bug? Checked under 8.2-RELEASE-p3 and 9.0-RELEASE. -- Mateusz Kwiatkowski From owner-freebsd-hackers@FreeBSD.ORG Thu Oct 11 13:06:44 2012 Return-Path: Delivered-To: hackers@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 069DA3C6 for ; Thu, 11 Oct 2012 13:06:44 +0000 (UTC) (envelope-from erik@cederstrand.dk) Received: from csmtp3.one.com (csmtp3.one.com [91.198.169.23]) by mx1.freebsd.org (Postfix) with ESMTP id B7DFB8FC18 for ; Thu, 11 Oct 2012 13:06:43 +0000 (UTC) Received: from [192.168.1.18] (unknown [217.157.7.221]) by csmtp3.one.com (Postfix) with ESMTPA id 66F052404978 for ; Thu, 11 Oct 2012 13:06:42 +0000 (UTC) From: Erik Cederstrand Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Subject: curcpu false positive? 
Message-Id: <3A22DF7A-00BB-408C-8F76-C1E119E0E48C@cederstrand.dk> Date: Thu, 11 Oct 2012 15:06:41 +0200 To: FreeBSD Hackers Mime-Version: 1.0 (Mac OS X Mail 6.0 \(1486\)) X-Mailer: Apple Mail (2.1486) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 11 Oct 2012 13:06:44 -0000 Hello, I'm looking at some Clang Static Analyzer reports in the kernel, and a = lot of them point back to a null pointer dereference in __pcpu_type = (sys/amd64/include/pcpu.h:102) which is defined as: 102 /* 103 * Evaluates to the type of the per-cpu variable name. 104 */ 105 #define __pcpu_type(name) = \ 106 __typeof(((struct pcpu *)0)->name) Which indeed looks like a NULL pointer dereference. Looking at the = latest commit message there, I'm sure the code is correct, but I'm = unsure why the null pointer is OK. I'd appreciate an explanation :-) Thanks, Erik= From owner-freebsd-hackers@FreeBSD.ORG Thu Oct 11 13:11:23 2012 Return-Path: Delivered-To: hackers@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id A92486CD for ; Thu, 11 Oct 2012 13:11:23 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 00B648FC12 for ; Thu, 11 Oct 2012 13:11:21 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id QAA03299; Thu, 11 Oct 2012 16:11:18 +0300 (EEST) (envelope-from avg@FreeBSD.org) Message-ID: <5076C576.3020306@FreeBSD.org> Date: Thu, 11 Oct 2012 16:11:18 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:15.0) Gecko/20120911 Thunderbird/15.0.1 MIME-Version: 1.0 To: Erik Cederstrand Subject: Re: curcpu false positive? References: <3A22DF7A-00BB-408C-8F76-C1E119E0E48C@cederstrand.dk> In-Reply-To: <3A22DF7A-00BB-408C-8F76-C1E119E0E48C@cederstrand.dk> X-Enigmail-Version: 1.4.3 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: FreeBSD Hackers X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 11 Oct 2012 13:11:23 -0000 on 11/10/2012 16:06 Erik Cederstrand said the following: > Hello, > > I'm looking at some Clang Static Analyzer reports in the kernel, and a lot of them point back to a null pointer dereference in __pcpu_type (sys/amd64/include/pcpu.h:102) which is defined as: > > 102 /* > 103 * Evaluates to the type of the per-cpu variable name. > 104 */ > 105 #define __pcpu_type(name) \ > 106 __typeof(((struct pcpu *)0)->name) > > > Which indeed looks like a NULL pointer dereference. Looking at the latest commit message there, I'm sure the code is correct, but I'm unsure why the null pointer is OK. I'd appreciate an explanation :-) Read about __typeof [1]. It's evaluated at compile time, so actual value of an expression does not matter at all. 
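A small userland illustration of why this is a false positive: __typeof only inspects types at compile time, so the (struct pcpu *)0 is never dereferenced at run time. The fields below are made up; only the macro idiom matches the header quoted above:

/*
 * Illustration only: __typeof is resolved entirely at compile time, so
 * the null pointer in the macro is never evaluated.
 */
#include <stdio.h>

struct pcpu {
        int     pc_cpuid;
        long    pc_example;     /* invented field for the demo */
};

#define __pcpu_type(name)       __typeof(((struct pcpu *)0)->name)

int
main(void)
{
        __pcpu_type(pc_cpuid) id = 3;   /* declares an int; no pointer is touched */

        printf("sizeof(pc_example's type) = %zu, id = %d\n",
            sizeof(__pcpu_type(pc_example)), id);
        return (0);
}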
[1] http://gcc.gnu.org/onlinedocs/gcc/Typeof.html -- Andriy Gapon From owner-freebsd-hackers@FreeBSD.ORG Thu Oct 11 16:21:05 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id B6B92913; Thu, 11 Oct 2012 16:21:05 +0000 (UTC) (envelope-from ndenev@gmail.com) Received: from mail-we0-f182.google.com (mail-we0-f182.google.com [74.125.82.182]) by mx1.freebsd.org (Postfix) with ESMTP id D74538FC14; Thu, 11 Oct 2012 16:21:04 +0000 (UTC) Received: by mail-we0-f182.google.com with SMTP id x43so1461794wey.13 for ; Thu, 11 Oct 2012 09:20:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=subject:mime-version:content-type:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to:x-mailer; bh=qAI24G5yefhBz4Gpif8LkmBCnqIDNTpEReTQU/t9kNg=; b=YXrCDRfbYNAUo5rvqQnRsKryJF+23pgBPpa1HZtwA1oTy1eBxIciw47Q1sPd+yZcJp EYvAcnpuDyIviFetjRNoI8aIqD/nnVAtlv0kMi6uoMBv/cQ0DAZ0Xfr6EWcinvHPzJD3 zdqL6ZAKgGXzf70tijHwFimHBGm980IT2vpK11qErQ8j+9VH8W1MLq8XzmyaKKPzufrs Ych54ZHZW0jmydsZXm7CGVHguYkWVgGz4IQwzClFe34pCxKmN2qxkoMFZbMvQCm8gByU 80q4CK64BY6Gu1kspo6QL+m4HQLPhQNJOvRFEJTheINHy09ufduIZin7T/LRcILfjg/n O86Q== Received: by 10.180.86.202 with SMTP id r10mr3394767wiz.12.1349972457860; Thu, 11 Oct 2012 09:20:57 -0700 (PDT) Received: from ndenevsa.sf.moneybookers.net (g1.moneybookers.com. [217.18.249.148]) by mx.google.com with ESMTPS id b3sm35970004wie.0.2012.10.11.09.20.54 (version=TLSv1/SSLv3 cipher=OTHER); Thu, 11 Oct 2012 09:20:56 -0700 (PDT) Subject: Re: NFS server bottlenecks Mime-Version: 1.0 (Mac OS X Mail 6.1 \(1498\)) Content-Type: text/plain; charset=us-ascii From: Nikolay Denev In-Reply-To: <19724137-ABB0-43AF-BCB9-EBE8ACD6E3BD@gmail.com> Date: Thu, 11 Oct 2012 19:20:53 +0300 Content-Transfer-Encoding: quoted-printable Message-Id: <0A8CDBF9-28C3-46D2-BB58-0559D00BD545@gmail.com> References: <1071150615.2039567.1349906947942.JavaMail.root@erie.cs.uoguelph.ca> <19724137-ABB0-43AF-BCB9-EBE8ACD6E3BD@gmail.com> To: "freebsd-hackers@freebsd.org" X-Mailer: Apple Mail (2.1498) Cc: rmacklem@freebsd.org, Garrett Wollman X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 11 Oct 2012 16:21:05 -0000 On Oct 11, 2012, at 8:46 AM, Nikolay Denev wrote: >=20 > On Oct 11, 2012, at 1:09 AM, Rick Macklem = wrote: >=20 >> Nikolay Denev wrote: >>> On Oct 10, 2012, at 3:18 AM, Rick Macklem >>> wrote: >>>=20 >>>> Nikolay Denev wrote: >>>>> On Oct 4, 2012, at 12:36 AM, Rick Macklem >>>>> wrote: >>>>>=20 >>>>>> Garrett Wollman wrote: >>>>>>> <>>>>>> said: >>>>>>>=20 >>>>>>>>> Simple: just use a sepatate mutex for each list that a cache >>>>>>>>> entry >>>>>>>>> is on, rather than a global lock for everything. This would >>>>>>>>> reduce >>>>>>>>> the mutex contention, but I'm not sure how significantly since >>>>>>>>> I >>>>>>>>> don't have the means to measure it yet. >>>>>>>>>=20 >>>>>>>> Well, since the cache trimming is removing entries from the >>>>>>>> lists, >>>>>>>> I >>>>>>>> don't >>>>>>>> see how that can be done with a global lock for list updates? 
>>>>>>>=20 >>>>>>> Well, the global lock is what we have now, but the cache = trimming >>>>>>> process only looks at one list at a time, so not locking the = list >>>>>>> that >>>>>>> isn't being iterated over probably wouldn't hurt, unless there's >>>>>>> some >>>>>>> mechanism (that I didn't see) for entries to move from one list >>>>>>> to >>>>>>> another. Note that I'm considering each hash bucket a separate >>>>>>> "list". (One issue to worry about in that case would be >>>>>>> cache-line >>>>>>> contention in the array of hash buckets; perhaps >>>>>>> NFSRVCACHE_HASHSIZE >>>>>>> ought to be increased to reduce that.) >>>>>>>=20 >>>>>> Yea, a separate mutex for each hash list might help. There is = also >>>>>> the >>>>>> LRU list that all entries end up on, that gets used by the >>>>>> trimming >>>>>> code. >>>>>> (I think? I wrote this stuff about 8 years ago, so I haven't >>>>>> looked >>>>>> at >>>>>> it in a while.) >>>>>>=20 >>>>>> Also, increasing the hash table size is probably a good idea, >>>>>> especially >>>>>> if you reduce how aggressively the cache is trimmed. >>>>>>=20 >>>>>>>> Only doing it once/sec would result in a very large cache when >>>>>>>> bursts of >>>>>>>> traffic arrives. >>>>>>>=20 >>>>>>> My servers have 96 GB of memory so that's not a big deal for me. >>>>>>>=20 >>>>>> This code was originally "production tested" on a server with >>>>>> 1Gbyte, >>>>>> so times have changed a bit;-) >>>>>>=20 >>>>>>>> I'm not sure I see why doing it as a separate thread will >>>>>>>> improve >>>>>>>> things. >>>>>>>> There are N nfsd threads already (N can be bumped up to 256 if >>>>>>>> you >>>>>>>> wish) >>>>>>>> and having a bunch more "cache trimming threads" would just >>>>>>>> increase >>>>>>>> contention, wouldn't it? >>>>>>>=20 >>>>>>> Only one cache-trimming thread. The cache trim holds the = (global) >>>>>>> mutex for much longer than any individual nfsd service thread = has >>>>>>> any >>>>>>> need to, and having N threads doing that in parallel is why it's >>>>>>> so >>>>>>> heavily contended. If there's only one thread doing the trim, >>>>>>> then >>>>>>> the nfsd service threads aren't spending time either contending >>>>>>> on >>>>>>> the >>>>>>> mutex (it will be held less frequently and for shorter periods). >>>>>>>=20 >>>>>> I think the little drc2.patch which will keep the nfsd threads >>>>>> from >>>>>> acquiring the mutex and doing the trimming most of the time, = might >>>>>> be >>>>>> sufficient. I still don't see why a separate trimming thread will >>>>>> be >>>>>> an advantage. I'd also be worried that the one cache trimming >>>>>> thread >>>>>> won't get the job done soon enough. >>>>>>=20 >>>>>> When I did production testing on a 1Gbyte server that saw a peak >>>>>> load of about 100RPCs/sec, it was necessary to trim aggressively. >>>>>> (Although I'd be tempted to say that a server with 1Gbyte is no >>>>>> longer relevant, I recently recall someone trying to run FreeBSD >>>>>> on a i486, although I doubt they wanted to run the nfsd on it.) >>>>>>=20 >>>>>>>> The only negative effect I can think of w.r.t. having the nfsd >>>>>>>> threads doing it would be a (I believe negligible) increase in >>>>>>>> RPC >>>>>>>> response times (the time the nfsd thread spends trimming the >>>>>>>> cache). >>>>>>>> As noted, I think this time would be negligible compared to = disk >>>>>>>> I/O >>>>>>>> and network transit times in the total RPC response time? 
>>>>>>>=20 >>>>>>> With adaptive mutexes, many CPUs, lots of in-memory cache, and >>>>>>> 10G >>>>>>> network connectivity, spinning on a contended mutex takes a >>>>>>> significant amount of CPU time. (For the current design of the >>>>>>> NFS >>>>>>> server, it may actually be a win to turn off adaptive mutexes -- >>>>>>> I >>>>>>> should give that a try once I'm able to do more testing.) >>>>>>>=20 >>>>>> Have fun with it. Let me know when you have what you think is a >>>>>> good >>>>>> patch. >>>>>>=20 >>>>>> rick >>>>>>=20 >>>>>>> -GAWollman >>>>>>> _______________________________________________ >>>>>>> freebsd-hackers@freebsd.org mailing list >>>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers >>>>>>> To unsubscribe, send any mail to >>>>>>> "freebsd-hackers-unsubscribe@freebsd.org" >>>>>> _______________________________________________ >>>>>> freebsd-fs@freebsd.org mailing list >>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>>>> To unsubscribe, send any mail to >>>>>> "freebsd-fs-unsubscribe@freebsd.org" >>>>>=20 >>>>> My quest for IOPS over NFS continues :) >>>>> So far I'm not able to achieve more than about 3000 8K read >>>>> requests >>>>> over NFS, >>>>> while the server locally gives much more. >>>>> And this is all from a file that is completely in ARC cache, no >>>>> disk >>>>> IO involved. >>>>>=20 >>>> Just out of curiousity, why do you use 8K reads instead of 64K >>>> reads. >>>> Since the RPC overhead (including the DRC functions) is per RPC, >>>> doing >>>> fewer larger RPCs should usually work better. (Sometimes large >>>> rsize/wsize >>>> values generate too large a burst of traffic for a network = interface >>>> to >>>> handle and then the rsize/wsize has to be decreased to avoid this >>>> issue.) >>>>=20 >>>> And, although this experiment seems useful for testing patches that >>>> try >>>> and reduce DRC CPU overheads, most "real" NFS servers will be doing >>>> disk >>>> I/O. >>>>=20 >>>=20 >>> This is the default blocksize the Oracle and probably most databases >>> use. >>> It uses also larger blocks, but for small random reads in OLTP >>> applications this is what is used. >>>=20 >> If the client is doing 8K reads, you could increase the read ahead >> "readahead=3DN" (N up to 16), to try and increase the bandwidth. >> (But if the CPU is 99% busy, then I don't think it will matter.) >=20 > I'll try to check if this is possible to be set, as we are testing not = only with the Linux NFS client, > but also with the Oracle's built in so called DirectNFS client that is = built in to the app. 
>=20 >>=20 >>>=20 >>>>> I've snatched some sample DTrace script from the net : [ >>>>> = http://utcc.utoronto.ca/~cks/space/blog/solaris/DTraceQuantizationNotes >>>>> ] >>>>>=20 >>>>> And modified it for our new NFS server : >>>>>=20 >>>>> #!/usr/sbin/dtrace -qs >>>>>=20 >>>>> fbt:kernel:nfsrvd_*:entry >>>>> { >>>>> self->ts =3D timestamp; >>>>> @counts[probefunc] =3D count(); >>>>> } >>>>>=20 >>>>> fbt:kernel:nfsrvd_*:return >>>>> / self->ts > 0 / >>>>> { >>>>> this->delta =3D (timestamp-self->ts)/1000000; >>>>> } >>>>>=20 >>>>> fbt:kernel:nfsrvd_*:return >>>>> / self->ts > 0 && this->delta > 100 / >>>>> { >>>>> @slow[probefunc, "ms"] =3D lquantize(this->delta, 100, 500, 50); >>>>> } >>>>>=20 >>>>> fbt:kernel:nfsrvd_*:return >>>>> / self->ts > 0 / >>>>> { >>>>> @dist[probefunc, "ms"] =3D quantize(this->delta); >>>>> self->ts =3D 0; >>>>> } >>>>>=20 >>>>> END >>>>> { >>>>> printf("\n"); >>>>> printa("function %-20s %@10d\n", @counts); >>>>> printf("\n"); >>>>> printa("function %s(), time in %s:%@d\n", @dist); >>>>> printf("\n"); >>>>> printa("function %s(), time in %s for >=3D 100 ms:%@d\n", @slow); >>>>> } >>>>>=20 >>>>> And here's a sample output from one or two minutes during the run >>>>> of >>>>> Oracle's ORION benchmark >>>>> tool from a Linux machine, on a 32G file on NFS mount over 10G >>>>> ethernet: >>>>>=20 >>>>> [16:01]root@goliath:/home/ndenev# ./nfsrvd.d >>>>> ^C >>>>>=20 >>>>> function nfsrvd_access 4 >>>>> function nfsrvd_statfs 10 >>>>> function nfsrvd_getattr 14 >>>>> function nfsrvd_commit 76 >>>>> function nfsrvd_sentcache 110048 >>>>> function nfsrvd_write 110048 >>>>> function nfsrvd_read 283648 >>>>> function nfsrvd_dorpc 393800 >>>>> function nfsrvd_getcache 393800 >>>>> function nfsrvd_rephead 393800 >>>>> function nfsrvd_updatecache 393800 >>>>>=20 >>>>> function nfsrvd_access(), time in ms: >>>>> value ------------- Distribution ------------- count >>>>> -1 | 0 >>>>> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 4 >>>>> 1 | 0 >>>>>=20 >>>>> function nfsrvd_statfs(), time in ms: >>>>> value ------------- Distribution ------------- count >>>>> -1 | 0 >>>>> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 10 >>>>> 1 | 0 >>>>>=20 >>>>> function nfsrvd_getattr(), time in ms: >>>>> value ------------- Distribution ------------- count >>>>> -1 | 0 >>>>> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 14 >>>>> 1 | 0 >>>>>=20 >>>>> function nfsrvd_sentcache(), time in ms: >>>>> value ------------- Distribution ------------- count >>>>> -1 | 0 >>>>> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 110048 >>>>> 1 | 0 >>>>>=20 >>>>> function nfsrvd_rephead(), time in ms: >>>>> value ------------- Distribution ------------- count >>>>> -1 | 0 >>>>> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 393800 >>>>> 1 | 0 >>>>>=20 >>>>> function nfsrvd_updatecache(), time in ms: >>>>> value ------------- Distribution ------------- count >>>>> -1 | 0 >>>>> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 393800 >>>>> 1 | 0 >>>>>=20 >>>>> function nfsrvd_getcache(), time in ms: >>>>> value ------------- Distribution ------------- count >>>>> -1 | 0 >>>>> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 393798 >>>>> 1 | 1 >>>>> 2 | 0 >>>>> 4 | 1 >>>>> 8 | 0 >>>>>=20 >>>>> function nfsrvd_write(), time in ms: >>>>> value ------------- Distribution ------------- count >>>>> -1 | 0 >>>>> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 110039 >>>>> 1 | 5 >>>>> 2 | 4 >>>>> 4 | 0 >>>>>=20 >>>>> function nfsrvd_read(), time in ms: >>>>> value ------------- Distribution ------------- count >>>>> -1 | 0 >>>>> 0 
|@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 283622 >>>>> 1 | 19 >>>>> 2 | 3 >>>>> 4 | 2 >>>>> 8 | 0 >>>>> 16 | 1 >>>>> 32 | 0 >>>>> 64 | 0 >>>>> 128 | 0 >>>>> 256 | 1 >>>>> 512 | 0 >>>>>=20 >>>>> function nfsrvd_commit(), time in ms: >>>>> value ------------- Distribution ------------- count >>>>> -1 | 0 >>>>> 0 |@@@@@@@@@@@@@@@@@@@@@@@ 44 >>>>> 1 |@@@@@@@ 14 >>>>> 2 | 0 >>>>> 4 |@ 1 >>>>> 8 |@ 1 >>>>> 16 | 0 >>>>> 32 |@@@@@@@ 14 >>>>> 64 |@ 2 >>>>> 128 | 0 >>>>>=20 >>>>>=20 >>>>> function nfsrvd_commit(), time in ms for >=3D 100 ms: >>>>> value ------------- Distribution ------------- count >>>>> < 100 | 0 >>>>> 100 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1 >>>>> 150 | 0 >>>>>=20 >>>>> function nfsrvd_read(), time in ms for >=3D 100 ms: >>>>> value ------------- Distribution ------------- count >>>>> 250 | 0 >>>>> 300 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1 >>>>> 350 | 0 >>>>>=20 >>>>>=20 >>>>> Looks like the nfs server cache functions are quite fast, but >>>>> extremely frequently called. >>>>>=20 >>>> Yep, they are called for every RPC. >>>>=20 >>>> I may try coding up a patch that replaces the single mutex with >>>> one for each hash bucket, for TCP. >>>>=20 >>>> I'll post if/when I get this patch to a testing/review stage, rick >>>>=20 >>>=20 >>> Cool. >>>=20 >>> I've readjusted the precision of the dtrace script a bit, and I can >>> see >>> now the following three functions as taking most of the time : >>> nfsrvd_getcache(), nfsrc_trimcache() and nfsrvd_updatecache() >>>=20 >>> This was recorded during a oracle benchmark run called SLOB, which >>> caused 99% cpu load on the NFS server. >>>=20 >> Even with the drc2.patch and a large value for vfs.nfsd.tcphighwater? >> (Assuming the mounts are TCP ones.) >>=20 >> Have fun with it, rick >>=20 >=20 > I had upped it, but probably not enough. I'm now running with = vfs.nfsd.tcphighwater set > to some ridiculous number, and NFSRVCACHE_HASHSIZE set to 500. > So far it looks like good improvement as those functions no longer = show up in the dtrace script output. > I'll run some more benchmarks and testing today. >=20 > Thanks! >=20 >>>=20 >>>>> I hope someone can find this information useful. >>>>>=20 >>>>> _______________________________________________ >>>>> freebsd-hackers@freebsd.org mailing list >>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers >>>>> To unsubscribe, send any mail to >>>>> "freebsd-hackers-unsubscribe@freebsd.org" >=20 I haven't had the opportunity today to run more DB tests over NFS as the = DBA was busy with something else, however I tested a bit the large file transfers. And I'm seeing something strange probably not only NFS but also ZFS and = ARC related. When I first tested the drc2.patch I reported a huge bandwidth = improvement, but now I think that this was probably because of the machine freshly = rebooted instead of just the patch. The patch surely improved things, especially CPU utilization combined = with the increased cache. But today I'm again having a file completely cached in ZFS's ARC cache, = which when transferred over NFS reaches about 300MB/s, when at some tests it reached 900+MB/s (as = reported in my first email). The file locally can be read at about 3GB/s as reported by dd. 
Local: [17:51]root@goliath:/tank/spa_db/undo# dd if=3Ddata.dbf of=3D/dev/null = bs=3D1M =20 30720+1 records in 30720+1 records out 32212262912 bytes transferred in 10.548485 secs (3053733573 bytes/sec) Over NFS: [17:48]root@spa:/mnt/spa_db/undo# dd if=3Ddata.dbf of=3D/dev/null bs=3D1M = =20 30720+1 records in 30720+1 records out 32212262912 bytes (32 GB) copied, 88.0663 seconds, 366 MB/s The machines are almost idle during this transfer and I can't see a = reason why it can't reach the full bandwith when it's just reading it from RAM. I've tried again tracing with DTrace to see what's happening with this = script : fbt:kernel:nfs*:entry { this->ts =3D timestamp; @counts[probefunc] =3D count(); } fbt:kernel:nfs*:return / this->ts > 0 / { @time[probefunc] =3D avg(timestamp - this->ts); } END { trunc(@counts, 10); trunc(@time, 10); printf("Top 10 called functions\n\n");=09 printa(@counts); printf("\n\nTop 10 slowest functions\n\n"); printa(@time); } And here's the result (several seconds during the dd test): Top 10 called functions nfsrc_freecache 88849 nfsrc_wanted 88849 nfsrv_fillattr 88849 nfsrv_postopattr 88849 nfsrvd_read 88849 nfsrvd_rephead 88849 nfsrvd_updatecache 88849 nfsvno_testexp 88849 nfsrc_trimcache 177697 nfsvno_getattr 177698 Top 10 slowest functions nfsd_excred 5673 nfsrc_freecache 5674 nfsrv_postopattr 5970 nfsrv_servertimer 6327 nfssvc_nfscommon 6596 nfsd_fhtovp 8000 nfsrvd_read 8380 nfssvc_program 92752 nfsvno_read 124979 nfsvno_fhtovp 1789523 I might try now to trace what nfsvno_fhtovp() is doing and where is = spending it's time. Any other ideas are welcome :) From owner-freebsd-hackers@FreeBSD.ORG Thu Oct 11 16:47:17 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 10ED71A1; Thu, 11 Oct 2012 16:47:17 +0000 (UTC) (envelope-from ndenev@gmail.com) Received: from mail-wg0-f50.google.com (mail-wg0-f50.google.com [74.125.82.50]) by mx1.freebsd.org (Postfix) with ESMTP id 38EFA8FC12; Thu, 11 Oct 2012 16:47:15 +0000 (UTC) Received: by mail-wg0-f50.google.com with SMTP id 16so1507684wgi.31 for ; Thu, 11 Oct 2012 09:47:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=subject:mime-version:content-type:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to:x-mailer; bh=42qujbjVn/cwcUzL/CgDbTaz1h3ZzyWh3DjKBwwM4uY=; b=iorcJB7paRfvs+FzuLp2MO7nQ4qKgUXLQ5wElkdpcqI6JYum3fR2EmxlRv8iHAljCI 774wMi3FVyuOf26Pe28tCUCrkoH+tC5LlJsDf/P7PfRfvwATCOMdG9XNVzBj6Vp5CJgu iWRdx41qemywW+Bb0l11DodMtmR5OrvdGEXBNvH172g4Veldl2dxkE6Ca8ZpTNhrWUVR cQ9DXZOkDQzV4XopgdWEpiunM+b3ehAtEhbPL8Sv2wRcHjcdZ5NJuZA7bzbLYmH9HEcq KnJSv0F1tKYD+ips391ZdKeB5Hs+ccDD1POcAM6mEaIWZVRTmdpVGfjGnfkEgvB4kJET 2+4A== Received: by 10.180.87.132 with SMTP id ay4mr3583965wib.5.1349974034907; Thu, 11 Oct 2012 09:47:14 -0700 (PDT) Received: from ndenevsa.sf.moneybookers.net (g1.moneybookers.com. 
[217.18.249.148]) by mx.google.com with ESMTPS id gm7sm9563488wib.10.2012.10.11.09.47.13 (version=TLSv1/SSLv3 cipher=OTHER); Thu, 11 Oct 2012 09:47:14 -0700 (PDT) Subject: Re: NFS server bottlenecks Mime-Version: 1.0 (Mac OS X Mail 6.1 \(1498\)) Content-Type: text/plain; charset=us-ascii From: Nikolay Denev In-Reply-To: <0A8CDBF9-28C3-46D2-BB58-0559D00BD545@gmail.com> Date: Thu, 11 Oct 2012 19:47:12 +0300 Content-Transfer-Encoding: quoted-printable Message-Id: References: <1071150615.2039567.1349906947942.JavaMail.root@erie.cs.uoguelph.ca> <19724137-ABB0-43AF-BCB9-EBE8ACD6E3BD@gmail.com> <0A8CDBF9-28C3-46D2-BB58-0559D00BD545@gmail.com> To: "freebsd-hackers@freebsd.org" X-Mailer: Apple Mail (2.1498) Cc: rmacklem@freebsd.org, Garrett Wollman X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 11 Oct 2012 16:47:17 -0000 On Oct 11, 2012, at 7:20 PM, Nikolay Denev wrote: > On Oct 11, 2012, at 8:46 AM, Nikolay Denev wrote: >=20 >>=20 >> On Oct 11, 2012, at 1:09 AM, Rick Macklem = wrote: >>=20 >>> Nikolay Denev wrote: >>>> On Oct 10, 2012, at 3:18 AM, Rick Macklem >>>> wrote: >>>>=20 >>>>> Nikolay Denev wrote: >>>>>> On Oct 4, 2012, at 12:36 AM, Rick Macklem >>>>>> wrote: >>>>>>=20 >>>>>>> Garrett Wollman wrote: >>>>>>>> <>>>>>>> said: >>>>>>>>=20 >>>>>>>>>> Simple: just use a sepatate mutex for each list that a cache >>>>>>>>>> entry >>>>>>>>>> is on, rather than a global lock for everything. This would >>>>>>>>>> reduce >>>>>>>>>> the mutex contention, but I'm not sure how significantly = since >>>>>>>>>> I >>>>>>>>>> don't have the means to measure it yet. >>>>>>>>>>=20 >>>>>>>>> Well, since the cache trimming is removing entries from the >>>>>>>>> lists, >>>>>>>>> I >>>>>>>>> don't >>>>>>>>> see how that can be done with a global lock for list updates? >>>>>>>>=20 >>>>>>>> Well, the global lock is what we have now, but the cache = trimming >>>>>>>> process only looks at one list at a time, so not locking the = list >>>>>>>> that >>>>>>>> isn't being iterated over probably wouldn't hurt, unless = there's >>>>>>>> some >>>>>>>> mechanism (that I didn't see) for entries to move from one list >>>>>>>> to >>>>>>>> another. Note that I'm considering each hash bucket a separate >>>>>>>> "list". (One issue to worry about in that case would be >>>>>>>> cache-line >>>>>>>> contention in the array of hash buckets; perhaps >>>>>>>> NFSRVCACHE_HASHSIZE >>>>>>>> ought to be increased to reduce that.) >>>>>>>>=20 >>>>>>> Yea, a separate mutex for each hash list might help. There is = also >>>>>>> the >>>>>>> LRU list that all entries end up on, that gets used by the >>>>>>> trimming >>>>>>> code. >>>>>>> (I think? I wrote this stuff about 8 years ago, so I haven't >>>>>>> looked >>>>>>> at >>>>>>> it in a while.) >>>>>>>=20 >>>>>>> Also, increasing the hash table size is probably a good idea, >>>>>>> especially >>>>>>> if you reduce how aggressively the cache is trimmed. >>>>>>>=20 >>>>>>>>> Only doing it once/sec would result in a very large cache when >>>>>>>>> bursts of >>>>>>>>> traffic arrives. >>>>>>>>=20 >>>>>>>> My servers have 96 GB of memory so that's not a big deal for = me. 
>>>>>>>>=20 >>>>>>> This code was originally "production tested" on a server with >>>>>>> 1Gbyte, >>>>>>> so times have changed a bit;-) >>>>>>>=20 >>>>>>>>> I'm not sure I see why doing it as a separate thread will >>>>>>>>> improve >>>>>>>>> things. >>>>>>>>> There are N nfsd threads already (N can be bumped up to 256 if >>>>>>>>> you >>>>>>>>> wish) >>>>>>>>> and having a bunch more "cache trimming threads" would just >>>>>>>>> increase >>>>>>>>> contention, wouldn't it? >>>>>>>>=20 >>>>>>>> Only one cache-trimming thread. The cache trim holds the = (global) >>>>>>>> mutex for much longer than any individual nfsd service thread = has >>>>>>>> any >>>>>>>> need to, and having N threads doing that in parallel is why = it's >>>>>>>> so >>>>>>>> heavily contended. If there's only one thread doing the trim, >>>>>>>> then >>>>>>>> the nfsd service threads aren't spending time either contending >>>>>>>> on >>>>>>>> the >>>>>>>> mutex (it will be held less frequently and for shorter = periods). >>>>>>>>=20 >>>>>>> I think the little drc2.patch which will keep the nfsd threads >>>>>>> from >>>>>>> acquiring the mutex and doing the trimming most of the time, = might >>>>>>> be >>>>>>> sufficient. I still don't see why a separate trimming thread = will >>>>>>> be >>>>>>> an advantage. I'd also be worried that the one cache trimming >>>>>>> thread >>>>>>> won't get the job done soon enough. >>>>>>>=20 >>>>>>> When I did production testing on a 1Gbyte server that saw a peak >>>>>>> load of about 100RPCs/sec, it was necessary to trim = aggressively. >>>>>>> (Although I'd be tempted to say that a server with 1Gbyte is no >>>>>>> longer relevant, I recently recall someone trying to run FreeBSD >>>>>>> on a i486, although I doubt they wanted to run the nfsd on it.) >>>>>>>=20 >>>>>>>>> The only negative effect I can think of w.r.t. having the nfsd >>>>>>>>> threads doing it would be a (I believe negligible) increase in >>>>>>>>> RPC >>>>>>>>> response times (the time the nfsd thread spends trimming the >>>>>>>>> cache). >>>>>>>>> As noted, I think this time would be negligible compared to = disk >>>>>>>>> I/O >>>>>>>>> and network transit times in the total RPC response time? >>>>>>>>=20 >>>>>>>> With adaptive mutexes, many CPUs, lots of in-memory cache, and >>>>>>>> 10G >>>>>>>> network connectivity, spinning on a contended mutex takes a >>>>>>>> significant amount of CPU time. (For the current design of the >>>>>>>> NFS >>>>>>>> server, it may actually be a win to turn off adaptive mutexes = -- >>>>>>>> I >>>>>>>> should give that a try once I'm able to do more testing.) >>>>>>>>=20 >>>>>>> Have fun with it. Let me know when you have what you think is a >>>>>>> good >>>>>>> patch. >>>>>>>=20 >>>>>>> rick >>>>>>>=20 >>>>>>>> -GAWollman >>>>>>>> _______________________________________________ >>>>>>>> freebsd-hackers@freebsd.org mailing list >>>>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers >>>>>>>> To unsubscribe, send any mail to >>>>>>>> "freebsd-hackers-unsubscribe@freebsd.org" >>>>>>> _______________________________________________ >>>>>>> freebsd-fs@freebsd.org mailing list >>>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>>>>> To unsubscribe, send any mail to >>>>>>> "freebsd-fs-unsubscribe@freebsd.org" >>>>>>=20 >>>>>> My quest for IOPS over NFS continues :) >>>>>> So far I'm not able to achieve more than about 3000 8K read >>>>>> requests >>>>>> over NFS, >>>>>> while the server locally gives much more. 
>>>>>> And this is all from a file that is completely in ARC cache, no >>>>>> disk >>>>>> IO involved. >>>>>>=20 >>>>> Just out of curiousity, why do you use 8K reads instead of 64K >>>>> reads. >>>>> Since the RPC overhead (including the DRC functions) is per RPC, >>>>> doing >>>>> fewer larger RPCs should usually work better. (Sometimes large >>>>> rsize/wsize >>>>> values generate too large a burst of traffic for a network = interface >>>>> to >>>>> handle and then the rsize/wsize has to be decreased to avoid this >>>>> issue.) >>>>>=20 >>>>> And, although this experiment seems useful for testing patches = that >>>>> try >>>>> and reduce DRC CPU overheads, most "real" NFS servers will be = doing >>>>> disk >>>>> I/O. >>>>>=20 >>>>=20 >>>> This is the default blocksize the Oracle and probably most = databases >>>> use. >>>> It uses also larger blocks, but for small random reads in OLTP >>>> applications this is what is used. >>>>=20 >>> If the client is doing 8K reads, you could increase the read ahead >>> "readahead=3DN" (N up to 16), to try and increase the bandwidth. >>> (But if the CPU is 99% busy, then I don't think it will matter.) >>=20 >> I'll try to check if this is possible to be set, as we are testing = not only with the Linux NFS client, >> but also with the Oracle's built in so called DirectNFS client that = is built in to the app. >>=20 >>>=20 >>>>=20 >>>>>> I've snatched some sample DTrace script from the net : [ >>>>>> = http://utcc.utoronto.ca/~cks/space/blog/solaris/DTraceQuantizationNotes >>>>>> ] >>>>>>=20 >>>>>> And modified it for our new NFS server : >>>>>>=20 >>>>>> #!/usr/sbin/dtrace -qs >>>>>>=20 >>>>>> fbt:kernel:nfsrvd_*:entry >>>>>> { >>>>>> self->ts =3D timestamp; >>>>>> @counts[probefunc] =3D count(); >>>>>> } >>>>>>=20 >>>>>> fbt:kernel:nfsrvd_*:return >>>>>> / self->ts > 0 / >>>>>> { >>>>>> this->delta =3D (timestamp-self->ts)/1000000; >>>>>> } >>>>>>=20 >>>>>> fbt:kernel:nfsrvd_*:return >>>>>> / self->ts > 0 && this->delta > 100 / >>>>>> { >>>>>> @slow[probefunc, "ms"] =3D lquantize(this->delta, 100, 500, 50); >>>>>> } >>>>>>=20 >>>>>> fbt:kernel:nfsrvd_*:return >>>>>> / self->ts > 0 / >>>>>> { >>>>>> @dist[probefunc, "ms"] =3D quantize(this->delta); >>>>>> self->ts =3D 0; >>>>>> } >>>>>>=20 >>>>>> END >>>>>> { >>>>>> printf("\n"); >>>>>> printa("function %-20s %@10d\n", @counts); >>>>>> printf("\n"); >>>>>> printa("function %s(), time in %s:%@d\n", @dist); >>>>>> printf("\n"); >>>>>> printa("function %s(), time in %s for >=3D 100 ms:%@d\n", @slow); >>>>>> } >>>>>>=20 >>>>>> And here's a sample output from one or two minutes during the run >>>>>> of >>>>>> Oracle's ORION benchmark >>>>>> tool from a Linux machine, on a 32G file on NFS mount over 10G >>>>>> ethernet: >>>>>>=20 >>>>>> [16:01]root@goliath:/home/ndenev# ./nfsrvd.d >>>>>> ^C >>>>>>=20 >>>>>> function nfsrvd_access 4 >>>>>> function nfsrvd_statfs 10 >>>>>> function nfsrvd_getattr 14 >>>>>> function nfsrvd_commit 76 >>>>>> function nfsrvd_sentcache 110048 >>>>>> function nfsrvd_write 110048 >>>>>> function nfsrvd_read 283648 >>>>>> function nfsrvd_dorpc 393800 >>>>>> function nfsrvd_getcache 393800 >>>>>> function nfsrvd_rephead 393800 >>>>>> function nfsrvd_updatecache 393800 >>>>>>=20 >>>>>> function nfsrvd_access(), time in ms: >>>>>> value ------------- Distribution ------------- count >>>>>> -1 | 0 >>>>>> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 4 >>>>>> 1 | 0 >>>>>>=20 >>>>>> function nfsrvd_statfs(), time in ms: >>>>>> value ------------- Distribution ------------- count >>>>>> -1 | 0 
>>>>>> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 10 >>>>>> 1 | 0 >>>>>>=20 >>>>>> function nfsrvd_getattr(), time in ms: >>>>>> value ------------- Distribution ------------- count >>>>>> -1 | 0 >>>>>> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 14 >>>>>> 1 | 0 >>>>>>=20 >>>>>> function nfsrvd_sentcache(), time in ms: >>>>>> value ------------- Distribution ------------- count >>>>>> -1 | 0 >>>>>> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 110048 >>>>>> 1 | 0 >>>>>>=20 >>>>>> function nfsrvd_rephead(), time in ms: >>>>>> value ------------- Distribution ------------- count >>>>>> -1 | 0 >>>>>> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 393800 >>>>>> 1 | 0 >>>>>>=20 >>>>>> function nfsrvd_updatecache(), time in ms: >>>>>> value ------------- Distribution ------------- count >>>>>> -1 | 0 >>>>>> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 393800 >>>>>> 1 | 0 >>>>>>=20 >>>>>> function nfsrvd_getcache(), time in ms: >>>>>> value ------------- Distribution ------------- count >>>>>> -1 | 0 >>>>>> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 393798 >>>>>> 1 | 1 >>>>>> 2 | 0 >>>>>> 4 | 1 >>>>>> 8 | 0 >>>>>>=20 >>>>>> function nfsrvd_write(), time in ms: >>>>>> value ------------- Distribution ------------- count >>>>>> -1 | 0 >>>>>> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 110039 >>>>>> 1 | 5 >>>>>> 2 | 4 >>>>>> 4 | 0 >>>>>>=20 >>>>>> function nfsrvd_read(), time in ms: >>>>>> value ------------- Distribution ------------- count >>>>>> -1 | 0 >>>>>> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 283622 >>>>>> 1 | 19 >>>>>> 2 | 3 >>>>>> 4 | 2 >>>>>> 8 | 0 >>>>>> 16 | 1 >>>>>> 32 | 0 >>>>>> 64 | 0 >>>>>> 128 | 0 >>>>>> 256 | 1 >>>>>> 512 | 0 >>>>>>=20 >>>>>> function nfsrvd_commit(), time in ms: >>>>>> value ------------- Distribution ------------- count >>>>>> -1 | 0 >>>>>> 0 |@@@@@@@@@@@@@@@@@@@@@@@ 44 >>>>>> 1 |@@@@@@@ 14 >>>>>> 2 | 0 >>>>>> 4 |@ 1 >>>>>> 8 |@ 1 >>>>>> 16 | 0 >>>>>> 32 |@@@@@@@ 14 >>>>>> 64 |@ 2 >>>>>> 128 | 0 >>>>>>=20 >>>>>>=20 >>>>>> function nfsrvd_commit(), time in ms for >=3D 100 ms: >>>>>> value ------------- Distribution ------------- count >>>>>> < 100 | 0 >>>>>> 100 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1 >>>>>> 150 | 0 >>>>>>=20 >>>>>> function nfsrvd_read(), time in ms for >=3D 100 ms: >>>>>> value ------------- Distribution ------------- count >>>>>> 250 | 0 >>>>>> 300 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1 >>>>>> 350 | 0 >>>>>>=20 >>>>>>=20 >>>>>> Looks like the nfs server cache functions are quite fast, but >>>>>> extremely frequently called. >>>>>>=20 >>>>> Yep, they are called for every RPC. >>>>>=20 >>>>> I may try coding up a patch that replaces the single mutex with >>>>> one for each hash bucket, for TCP. >>>>>=20 >>>>> I'll post if/when I get this patch to a testing/review stage, rick >>>>>=20 >>>>=20 >>>> Cool. >>>>=20 >>>> I've readjusted the precision of the dtrace script a bit, and I can >>>> see >>>> now the following three functions as taking most of the time : >>>> nfsrvd_getcache(), nfsrc_trimcache() and nfsrvd_updatecache() >>>>=20 >>>> This was recorded during a oracle benchmark run called SLOB, which >>>> caused 99% cpu load on the NFS server. >>>>=20 >>> Even with the drc2.patch and a large value for = vfs.nfsd.tcphighwater? >>> (Assuming the mounts are TCP ones.) >>>=20 >>> Have fun with it, rick >>>=20 >>=20 >> I had upped it, but probably not enough. I'm now running with = vfs.nfsd.tcphighwater set >> to some ridiculous number, and NFSRVCACHE_HASHSIZE set to 500. 
>> So far it looks like good improvement as those functions no longer = show up in the dtrace script output. >> I'll run some more benchmarks and testing today. >>=20 >> Thanks! >>=20 >>>>=20 >>>>>> I hope someone can find this information useful. >>>>>>=20 >>>>>> _______________________________________________ >>>>>> freebsd-hackers@freebsd.org mailing list >>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers >>>>>> To unsubscribe, send any mail to >>>>>> "freebsd-hackers-unsubscribe@freebsd.org" >>=20 >=20 > I haven't had the opportunity today to run more DB tests over NFS as = the DBA was busy with something else, > however I tested a bit the large file transfers. > And I'm seeing something strange probably not only NFS but also ZFS = and ARC related. >=20 > When I first tested the drc2.patch I reported a huge bandwidth = improvement, > but now I think that this was probably because of the machine freshly = rebooted instead of just the patch. > The patch surely improved things, especially CPU utilization combined = with the increased cache. > But today I'm again having a file completely cached in ZFS's ARC = cache, which when transferred over NFS > reaches about 300MB/s, when at some tests it reached 900+MB/s (as = reported in my first email). > The file locally can be read at about 3GB/s as reported by dd. >=20 > Local: > [17:51]root@goliath:/tank/spa_db/undo# dd if=3Ddata.dbf of=3D/dev/null = bs=3D1M =20 > 30720+1 records in > 30720+1 records out > 32212262912 bytes transferred in 10.548485 secs (3053733573 bytes/sec) >=20 > Over NFS: > [17:48]root@spa:/mnt/spa_db/undo# dd if=3Ddata.dbf of=3D/dev/null = bs=3D1M =20 > 30720+1 records in > 30720+1 records out > 32212262912 bytes (32 GB) copied, 88.0663 seconds, 366 MB/s >=20 > The machines are almost idle during this transfer and I can't see a = reason why it can't reach the full bandwith when it's > just reading it from RAM. >=20 > I've tried again tracing with DTrace to see what's happening with this = script : >=20 > fbt:kernel:nfs*:entry > { > this->ts =3D timestamp; > @counts[probefunc] =3D count(); > } >=20 > fbt:kernel:nfs*:return > / this->ts > 0 / > { > @time[probefunc] =3D avg(timestamp - this->ts); > } >=20 > END > { > trunc(@counts, 10); > trunc(@time, 10); > printf("Top 10 called functions\n\n");=09 > printa(@counts); > printf("\n\nTop 10 slowest functions\n\n"); > printa(@time); > } >=20 > And here's the result (several seconds during the dd test): >=20 > Top 10 called functions > nfsrc_freecache 88849 > nfsrc_wanted 88849 > nfsrv_fillattr 88849 > nfsrv_postopattr 88849 > nfsrvd_read 88849 > nfsrvd_rephead 88849 > nfsrvd_updatecache 88849 > nfsvno_testexp 88849 > nfsrc_trimcache 177697 > nfsvno_getattr 177698 >=20 > Top 10 slowest functions > nfsd_excred 5673 > nfsrc_freecache 5674 > nfsrv_postopattr 5970 > nfsrv_servertimer 6327 > nfssvc_nfscommon 6596 > nfsd_fhtovp 8000 > nfsrvd_read 8380 > nfssvc_program 92752 > nfsvno_read 124979 > nfsvno_fhtovp 1789523 >=20 > I might try now to trace what nfsvno_fhtovp() is doing and where is = spending it's time. 
>=20 > Any other ideas are welcome :) >=20 To take the network out of the equation I redid the test by mounting the = same filesystem over NFS on the server: [18:23]root@goliath:~# mount -t nfs -o = rw,hard,intr,tcp,nfsv3,rsize=3D1048576,wsize=3D1048576 = localhost:/tank/spa_db/undo /mnt [18:24]root@goliath:~# dd if=3D/mnt/data.dbf of=3D/dev/null bs=3D1M=20 30720+1 records in 30720+1 records out 32212262912 bytes transferred in 79.793343 secs (403696120 bytes/sec) [18:25]root@goliath:~# dd if=3D/mnt/data.dbf of=3D/dev/null bs=3D1M 30720+1 records in 30720+1 records out 32212262912 bytes transferred in 12.033420 secs (2676900110 bytes/sec) During the first run I saw several nfsd threads in top, along with dd = and again zero disk I/O. There was increase in memory usage because of the double buffering = ARC->buffercahe. The second run was with all of the nfsd threads totally idle, and read = directly from the buffercache. From owner-freebsd-hackers@FreeBSD.ORG Thu Oct 11 21:04:36 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 85FC8A1A; Thu, 11 Oct 2012 21:04:36 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id D55278FC0C; Thu, 11 Oct 2012 21:04:35 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqAEAN6MclCDaFvO/2dsb2JhbABFFoV7uhmCIAEBAQMBAQEBIAQnIAYFBRYOCgICDRkCIwYBCSYGCAIFBAEcAQOHUgMJBgumTIgWDYlUgSGJSGYahGSBEgOTPliBVYEVig6FC4MJgUc0 X-IronPort-AV: E=Sophos;i="4.80,574,1344225600"; d="scan'208";a="183270578" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 11 Oct 2012 17:04:28 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 1F7CCB40FB; Thu, 11 Oct 2012 17:04:28 -0400 (EDT) Date: Thu, 11 Oct 2012 17:04:28 -0400 (EDT) From: Rick Macklem To: Nikolay Denev Message-ID: <1517976814.2112914.1349989468096.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <0A8CDBF9-28C3-46D2-BB58-0559D00BD545@gmail.com> Subject: Re: NFS server bottlenecks MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.201] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692) Cc: rmacklem@freebsd.org, Garrett Wollman , freebsd-hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 11 Oct 2012 21:04:36 -0000 Nikolay Denev wrote: > On Oct 11, 2012, at 8:46 AM, Nikolay Denev wrote: > > > > > On Oct 11, 2012, at 1:09 AM, Rick Macklem > > wrote: > > > >> Nikolay Denev wrote: > >>> On Oct 10, 2012, at 3:18 AM, Rick Macklem > >>> wrote: > >>> > >>>> Nikolay Denev wrote: > >>>>> On Oct 4, 2012, at 12:36 AM, Rick Macklem > >>>>> wrote: > >>>>> > >>>>>> Garrett Wollman wrote: > >>>>>>> < >>>>>>> said: > >>>>>>> > >>>>>>>>> Simple: just use a sepatate mutex for each list that a cache > >>>>>>>>> entry > >>>>>>>>> is on, rather than a global lock for everything. 
This would > >>>>>>>>> reduce > >>>>>>>>> the mutex contention, but I'm not sure how significantly > >>>>>>>>> since > >>>>>>>>> I > >>>>>>>>> don't have the means to measure it yet. > >>>>>>>>> > >>>>>>>> Well, since the cache trimming is removing entries from the > >>>>>>>> lists, > >>>>>>>> I > >>>>>>>> don't > >>>>>>>> see how that can be done with a global lock for list updates? > >>>>>>> > >>>>>>> Well, the global lock is what we have now, but the cache > >>>>>>> trimming > >>>>>>> process only looks at one list at a time, so not locking the > >>>>>>> list > >>>>>>> that > >>>>>>> isn't being iterated over probably wouldn't hurt, unless > >>>>>>> there's > >>>>>>> some > >>>>>>> mechanism (that I didn't see) for entries to move from one > >>>>>>> list > >>>>>>> to > >>>>>>> another. Note that I'm considering each hash bucket a separate > >>>>>>> "list". (One issue to worry about in that case would be > >>>>>>> cache-line > >>>>>>> contention in the array of hash buckets; perhaps > >>>>>>> NFSRVCACHE_HASHSIZE > >>>>>>> ought to be increased to reduce that.) > >>>>>>> > >>>>>> Yea, a separate mutex for each hash list might help. There is > >>>>>> also > >>>>>> the > >>>>>> LRU list that all entries end up on, that gets used by the > >>>>>> trimming > >>>>>> code. > >>>>>> (I think? I wrote this stuff about 8 years ago, so I haven't > >>>>>> looked > >>>>>> at > >>>>>> it in a while.) > >>>>>> > >>>>>> Also, increasing the hash table size is probably a good idea, > >>>>>> especially > >>>>>> if you reduce how aggressively the cache is trimmed. > >>>>>> > >>>>>>>> Only doing it once/sec would result in a very large cache > >>>>>>>> when > >>>>>>>> bursts of > >>>>>>>> traffic arrives. > >>>>>>> > >>>>>>> My servers have 96 GB of memory so that's not a big deal for > >>>>>>> me. > >>>>>>> > >>>>>> This code was originally "production tested" on a server with > >>>>>> 1Gbyte, > >>>>>> so times have changed a bit;-) > >>>>>> > >>>>>>>> I'm not sure I see why doing it as a separate thread will > >>>>>>>> improve > >>>>>>>> things. > >>>>>>>> There are N nfsd threads already (N can be bumped up to 256 > >>>>>>>> if > >>>>>>>> you > >>>>>>>> wish) > >>>>>>>> and having a bunch more "cache trimming threads" would just > >>>>>>>> increase > >>>>>>>> contention, wouldn't it? > >>>>>>> > >>>>>>> Only one cache-trimming thread. The cache trim holds the > >>>>>>> (global) > >>>>>>> mutex for much longer than any individual nfsd service thread > >>>>>>> has > >>>>>>> any > >>>>>>> need to, and having N threads doing that in parallel is why > >>>>>>> it's > >>>>>>> so > >>>>>>> heavily contended. If there's only one thread doing the trim, > >>>>>>> then > >>>>>>> the nfsd service threads aren't spending time either > >>>>>>> contending > >>>>>>> on > >>>>>>> the > >>>>>>> mutex (it will be held less frequently and for shorter > >>>>>>> periods). > >>>>>>> > >>>>>> I think the little drc2.patch which will keep the nfsd threads > >>>>>> from > >>>>>> acquiring the mutex and doing the trimming most of the time, > >>>>>> might > >>>>>> be > >>>>>> sufficient. I still don't see why a separate trimming thread > >>>>>> will > >>>>>> be > >>>>>> an advantage. I'd also be worried that the one cache trimming > >>>>>> thread > >>>>>> won't get the job done soon enough. > >>>>>> > >>>>>> When I did production testing on a 1Gbyte server that saw a > >>>>>> peak > >>>>>> load of about 100RPCs/sec, it was necessary to trim > >>>>>> aggressively. 
> >>>>>> (Although I'd be tempted to say that a server with 1Gbyte is no > >>>>>> longer relevant, I recently recall someone trying to run > >>>>>> FreeBSD > >>>>>> on a i486, although I doubt they wanted to run the nfsd on it.) > >>>>>> > >>>>>>>> The only negative effect I can think of w.r.t. having the > >>>>>>>> nfsd > >>>>>>>> threads doing it would be a (I believe negligible) increase > >>>>>>>> in > >>>>>>>> RPC > >>>>>>>> response times (the time the nfsd thread spends trimming the > >>>>>>>> cache). > >>>>>>>> As noted, I think this time would be negligible compared to > >>>>>>>> disk > >>>>>>>> I/O > >>>>>>>> and network transit times in the total RPC response time? > >>>>>>> > >>>>>>> With adaptive mutexes, many CPUs, lots of in-memory cache, and > >>>>>>> 10G > >>>>>>> network connectivity, spinning on a contended mutex takes a > >>>>>>> significant amount of CPU time. (For the current design of the > >>>>>>> NFS > >>>>>>> server, it may actually be a win to turn off adaptive mutexes > >>>>>>> -- > >>>>>>> I > >>>>>>> should give that a try once I'm able to do more testing.) > >>>>>>> > >>>>>> Have fun with it. Let me know when you have what you think is a > >>>>>> good > >>>>>> patch. > >>>>>> > >>>>>> rick > >>>>>> > >>>>>>> -GAWollman > >>>>>>> _______________________________________________ > >>>>>>> freebsd-hackers@freebsd.org mailing list > >>>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers > >>>>>>> To unsubscribe, send any mail to > >>>>>>> "freebsd-hackers-unsubscribe@freebsd.org" > >>>>>> _______________________________________________ > >>>>>> freebsd-fs@freebsd.org mailing list > >>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs > >>>>>> To unsubscribe, send any mail to > >>>>>> "freebsd-fs-unsubscribe@freebsd.org" > >>>>> > >>>>> My quest for IOPS over NFS continues :) > >>>>> So far I'm not able to achieve more than about 3000 8K read > >>>>> requests > >>>>> over NFS, > >>>>> while the server locally gives much more. > >>>>> And this is all from a file that is completely in ARC cache, no > >>>>> disk > >>>>> IO involved. > >>>>> > >>>> Just out of curiousity, why do you use 8K reads instead of 64K > >>>> reads. > >>>> Since the RPC overhead (including the DRC functions) is per RPC, > >>>> doing > >>>> fewer larger RPCs should usually work better. (Sometimes large > >>>> rsize/wsize > >>>> values generate too large a burst of traffic for a network > >>>> interface > >>>> to > >>>> handle and then the rsize/wsize has to be decreased to avoid this > >>>> issue.) > >>>> > >>>> And, although this experiment seems useful for testing patches > >>>> that > >>>> try > >>>> and reduce DRC CPU overheads, most "real" NFS servers will be > >>>> doing > >>>> disk > >>>> I/O. > >>>> > >>> > >>> This is the default blocksize the Oracle and probably most > >>> databases > >>> use. > >>> It uses also larger blocks, but for small random reads in OLTP > >>> applications this is what is used. > >>> > >> If the client is doing 8K reads, you could increase the read ahead > >> "readahead=N" (N up to 16), to try and increase the bandwidth. > >> (But if the CPU is 99% busy, then I don't think it will matter.) > > > > I'll try to check if this is possible to be set, as we are testing > > not only with the Linux NFS client, > > but also with the Oracle's built in so called DirectNFS client that > > is built in to the app. 
> > > >> > >>> > >>>>> I've snatched some sample DTrace script from the net : [ > >>>>> http://utcc.utoronto.ca/~cks/space/blog/solaris/DTraceQuantizationNotes > >>>>> ] > >>>>> > >>>>> And modified it for our new NFS server : > >>>>> > >>>>> #!/usr/sbin/dtrace -qs > >>>>> > >>>>> fbt:kernel:nfsrvd_*:entry > >>>>> { > >>>>> self->ts = timestamp; > >>>>> @counts[probefunc] = count(); > >>>>> } > >>>>> > >>>>> fbt:kernel:nfsrvd_*:return > >>>>> / self->ts > 0 / > >>>>> { > >>>>> this->delta = (timestamp-self->ts)/1000000; > >>>>> } > >>>>> > >>>>> fbt:kernel:nfsrvd_*:return > >>>>> / self->ts > 0 && this->delta > 100 / > >>>>> { > >>>>> @slow[probefunc, "ms"] = lquantize(this->delta, 100, 500, 50); > >>>>> } > >>>>> > >>>>> fbt:kernel:nfsrvd_*:return > >>>>> / self->ts > 0 / > >>>>> { > >>>>> @dist[probefunc, "ms"] = quantize(this->delta); > >>>>> self->ts = 0; > >>>>> } > >>>>> > >>>>> END > >>>>> { > >>>>> printf("\n"); > >>>>> printa("function %-20s %@10d\n", @counts); > >>>>> printf("\n"); > >>>>> printa("function %s(), time in %s:%@d\n", @dist); > >>>>> printf("\n"); > >>>>> printa("function %s(), time in %s for >= 100 ms:%@d\n", @slow); > >>>>> } > >>>>> > >>>>> And here's a sample output from one or two minutes during the > >>>>> run > >>>>> of > >>>>> Oracle's ORION benchmark > >>>>> tool from a Linux machine, on a 32G file on NFS mount over 10G > >>>>> ethernet: > >>>>> > >>>>> [16:01]root@goliath:/home/ndenev# ./nfsrvd.d > >>>>> ^C > >>>>> > >>>>> function nfsrvd_access 4 > >>>>> function nfsrvd_statfs 10 > >>>>> function nfsrvd_getattr 14 > >>>>> function nfsrvd_commit 76 > >>>>> function nfsrvd_sentcache 110048 > >>>>> function nfsrvd_write 110048 > >>>>> function nfsrvd_read 283648 > >>>>> function nfsrvd_dorpc 393800 > >>>>> function nfsrvd_getcache 393800 > >>>>> function nfsrvd_rephead 393800 > >>>>> function nfsrvd_updatecache 393800 > >>>>> > >>>>> function nfsrvd_access(), time in ms: > >>>>> value ------------- Distribution ------------- count > >>>>> -1 | 0 > >>>>> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 4 > >>>>> 1 | 0 > >>>>> > >>>>> function nfsrvd_statfs(), time in ms: > >>>>> value ------------- Distribution ------------- count > >>>>> -1 | 0 > >>>>> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 10 > >>>>> 1 | 0 > >>>>> > >>>>> function nfsrvd_getattr(), time in ms: > >>>>> value ------------- Distribution ------------- count > >>>>> -1 | 0 > >>>>> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 14 > >>>>> 1 | 0 > >>>>> > >>>>> function nfsrvd_sentcache(), time in ms: > >>>>> value ------------- Distribution ------------- count > >>>>> -1 | 0 > >>>>> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 110048 > >>>>> 1 | 0 > >>>>> > >>>>> function nfsrvd_rephead(), time in ms: > >>>>> value ------------- Distribution ------------- count > >>>>> -1 | 0 > >>>>> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 393800 > >>>>> 1 | 0 > >>>>> > >>>>> function nfsrvd_updatecache(), time in ms: > >>>>> value ------------- Distribution ------------- count > >>>>> -1 | 0 > >>>>> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 393800 > >>>>> 1 | 0 > >>>>> > >>>>> function nfsrvd_getcache(), time in ms: > >>>>> value ------------- Distribution ------------- count > >>>>> -1 | 0 > >>>>> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 393798 > >>>>> 1 | 1 > >>>>> 2 | 0 > >>>>> 4 | 1 > >>>>> 8 | 0 > >>>>> > >>>>> function nfsrvd_write(), time in ms: > >>>>> value ------------- Distribution ------------- count > >>>>> -1 | 0 > >>>>> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 110039 > >>>>> 1 | 5 > >>>>> 2 
| 4 > >>>>> 4 | 0 > >>>>> > >>>>> function nfsrvd_read(), time in ms: > >>>>> value ------------- Distribution ------------- count > >>>>> -1 | 0 > >>>>> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 283622 > >>>>> 1 | 19 > >>>>> 2 | 3 > >>>>> 4 | 2 > >>>>> 8 | 0 > >>>>> 16 | 1 > >>>>> 32 | 0 > >>>>> 64 | 0 > >>>>> 128 | 0 > >>>>> 256 | 1 > >>>>> 512 | 0 > >>>>> > >>>>> function nfsrvd_commit(), time in ms: > >>>>> value ------------- Distribution ------------- count > >>>>> -1 | 0 > >>>>> 0 |@@@@@@@@@@@@@@@@@@@@@@@ 44 > >>>>> 1 |@@@@@@@ 14 > >>>>> 2 | 0 > >>>>> 4 |@ 1 > >>>>> 8 |@ 1 > >>>>> 16 | 0 > >>>>> 32 |@@@@@@@ 14 > >>>>> 64 |@ 2 > >>>>> 128 | 0 > >>>>> > >>>>> > >>>>> function nfsrvd_commit(), time in ms for >= 100 ms: > >>>>> value ------------- Distribution ------------- count > >>>>> < 100 | 0 > >>>>> 100 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1 > >>>>> 150 | 0 > >>>>> > >>>>> function nfsrvd_read(), time in ms for >= 100 ms: > >>>>> value ------------- Distribution ------------- count > >>>>> 250 | 0 > >>>>> 300 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1 > >>>>> 350 | 0 > >>>>> > >>>>> > >>>>> Looks like the nfs server cache functions are quite fast, but > >>>>> extremely frequently called. > >>>>> > >>>> Yep, they are called for every RPC. > >>>> > >>>> I may try coding up a patch that replaces the single mutex with > >>>> one for each hash bucket, for TCP. > >>>> > >>>> I'll post if/when I get this patch to a testing/review stage, > >>>> rick > >>>> > >>> > >>> Cool. > >>> > >>> I've readjusted the precision of the dtrace script a bit, and I > >>> can > >>> see > >>> now the following three functions as taking most of the time : > >>> nfsrvd_getcache(), nfsrc_trimcache() and nfsrvd_updatecache() > >>> > >>> This was recorded during a oracle benchmark run called SLOB, which > >>> caused 99% cpu load on the NFS server. > >>> > >> Even with the drc2.patch and a large value for > >> vfs.nfsd.tcphighwater? > >> (Assuming the mounts are TCP ones.) > >> > >> Have fun with it, rick > >> > > > > I had upped it, but probably not enough. I'm now running with > > vfs.nfsd.tcphighwater set > > to some ridiculous number, and NFSRVCACHE_HASHSIZE set to 500. > > So far it looks like good improvement as those functions no longer > > show up in the dtrace script output. > > I'll run some more benchmarks and testing today. > > > > Thanks! > > > >>> > >>>>> I hope someone can find this information useful. > >>>>> > >>>>> _______________________________________________ > >>>>> freebsd-hackers@freebsd.org mailing list > >>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers > >>>>> To unsubscribe, send any mail to > >>>>> "freebsd-hackers-unsubscribe@freebsd.org" > > > > I haven't had the opportunity today to run more DB tests over NFS as > the DBA was busy with something else, > however I tested a bit the large file transfers. > And I'm seeing something strange probably not only NFS but also ZFS > and ARC related. > > When I first tested the drc2.patch I reported a huge bandwidth > improvement, > but now I think that this was probably because of the machine freshly > rebooted instead of just the patch. > The patch surely improved things, especially CPU utilization combined > with the increased cache. > But today I'm again having a file completely cached in ZFS's ARC > cache, which when transferred over NFS > reaches about 300MB/s, when at some tests it reached 900+MB/s (as > reported in my first email). > The file locally can be read at about 3GB/s as reported by dd. 
> > Local: > [17:51]root@goliath:/tank/spa_db/undo# dd if=data.dbf of=/dev/null > bs=1M > 30720+1 records in > 30720+1 records out > 32212262912 bytes transferred in 10.548485 secs (3053733573 bytes/sec) > > Over NFS: > [17:48]root@spa:/mnt/spa_db/undo# dd if=data.dbf of=/dev/null bs=1M > 30720+1 records in > 30720+1 records out > 32212262912 bytes (32 GB) copied, 88.0663 seconds, 366 MB/s > > The machines are almost idle during this transfer and I can't see a > reason why it can't reach the full bandwith when it's > just reading it from RAM. > Well, one thing is that you must fill the network pipe with bits. That means that "rsize * readahead" must be >= "data rate * transit delay" (both in same units, such as bytes/sec). For example (given a 10Gb/s network): 65536 * 16 = 1Mbyte - a 10Gbps rate is about 1200Mbytes/sec --> this should fill the network "pipe" so long as the RPC transit delay is less than 1/1200th of a sec (or somewhat less than 1msec) If you were using rsize=8192 and the default readahead=2: 8192 * 2 = 16Kbytes - a 10Gbps rate is about 1200000Kbytes/sec --> the network pipe would be full if the RPC transit delay is less than 16/1200000 (or a little over 10usec, unlikely for any real network, I think?) I don't know what your RPC transit delay is (half of a NULL RPC RTT, but I don't know that, either;-). That was why I suggested "readahead=16". I'd also use "rsize=65536", but if that worked very poorly, I'd try decreasing rsize by half (32768, 16384..) to see what was optimal. rick ps: I'm assuming your dd runs are from a FreeBSD client and not the Oracle one. pss: Did I actually get the above calculations correct without a calculator;-) > I've tried again tracing with DTrace to see what's happening with this > script : > > fbt:kernel:nfs*:entry > { > this->ts = timestamp; > @counts[probefunc] = count(); > } > > fbt:kernel:nfs*:return > / this->ts > 0 / > { > @time[probefunc] = avg(timestamp - this->ts); > } > > END > { > trunc(@counts, 10); > trunc(@time, 10); > printf("Top 10 called functions\n\n"); > printa(@counts); > printf("\n\nTop 10 slowest functions\n\n"); > printa(@time); > } > > And here's the result (several seconds during the dd test): > > Top 10 called functions > nfsrc_freecache 88849 > nfsrc_wanted 88849 > nfsrv_fillattr 88849 > nfsrv_postopattr 88849 > nfsrvd_read 88849 > nfsrvd_rephead 88849 > nfsrvd_updatecache 88849 > nfsvno_testexp 88849 > nfsrc_trimcache 177697 > nfsvno_getattr 177698 > > Top 10 slowest functions > nfsd_excred 5673 > nfsrc_freecache 5674 > nfsrv_postopattr 5970 > nfsrv_servertimer 6327 > nfssvc_nfscommon 6596 > nfsd_fhtovp 8000 > nfsrvd_read 8380 > nfssvc_program 92752 > nfsvno_read 124979 > nfsvno_fhtovp 1789523 > > I might try now to trace what nfsvno_fhtovp() is doing and where is > spending it's time. 
> > Any other ideas are welcome :) > > _______________________________________________ > freebsd-hackers@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to > "freebsd-hackers-unsubscribe@freebsd.org" From owner-freebsd-hackers@FreeBSD.ORG Thu Oct 11 21:33:10 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id E932D5C1; Thu, 11 Oct 2012 21:33:10 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 229128FC16; Thu, 11 Oct 2012 21:33:09 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqAEAJ+LclCDaFvO/2dsb2JhbABFFoV7uhmCIAEBAQMBAQEBIAQnIAYFBRYOCgICDRkCIwYBCSYGCAIFBAEcAQOHUgMJBgumSYgUDYlUgSGJSGYahGSBEgOTPliBVYEVig6FC4MJgUc0 X-IronPort-AV: E=Sophos;i="4.80,574,1344225600"; d="scan'208";a="186026601" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu-pri.mail.uoguelph.ca with ESMTP; 11 Oct 2012 17:33:08 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 567D3B402D; Thu, 11 Oct 2012 17:33:08 -0400 (EDT) Date: Thu, 11 Oct 2012 17:33:08 -0400 (EDT) From: Rick Macklem To: Nikolay Denev Message-ID: <314705086.2114438.1349991188290.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: Subject: Re: NFS server bottlenecks MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692) Cc: rmacklem@freebsd.org, Garrett Wollman , freebsd-hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 11 Oct 2012 21:33:11 -0000 Nikolay Denev wrote: > On Oct 11, 2012, at 7:20 PM, Nikolay Denev wrote: > > > On Oct 11, 2012, at 8:46 AM, Nikolay Denev wrote: > > > >> > >> On Oct 11, 2012, at 1:09 AM, Rick Macklem > >> wrote: > >> > >>> Nikolay Denev wrote: > >>>> On Oct 10, 2012, at 3:18 AM, Rick Macklem > >>>> wrote: > >>>> > >>>>> Nikolay Denev wrote: > >>>>>> On Oct 4, 2012, at 12:36 AM, Rick Macklem > >>>>>> > >>>>>> wrote: > >>>>>> > >>>>>>> Garrett Wollman wrote: > >>>>>>>> < >>>>>>>> said: > >>>>>>>> > >>>>>>>>>> Simple: just use a sepatate mutex for each list that a > >>>>>>>>>> cache > >>>>>>>>>> entry > >>>>>>>>>> is on, rather than a global lock for everything. This would > >>>>>>>>>> reduce > >>>>>>>>>> the mutex contention, but I'm not sure how significantly > >>>>>>>>>> since > >>>>>>>>>> I > >>>>>>>>>> don't have the means to measure it yet. > >>>>>>>>>> > >>>>>>>>> Well, since the cache trimming is removing entries from the > >>>>>>>>> lists, > >>>>>>>>> I > >>>>>>>>> don't > >>>>>>>>> see how that can be done with a global lock for list > >>>>>>>>> updates? 
> >>>>>>>> > >>>>>>>> Well, the global lock is what we have now, but the cache > >>>>>>>> trimming > >>>>>>>> process only looks at one list at a time, so not locking the > >>>>>>>> list > >>>>>>>> that > >>>>>>>> isn't being iterated over probably wouldn't hurt, unless > >>>>>>>> there's > >>>>>>>> some > >>>>>>>> mechanism (that I didn't see) for entries to move from one > >>>>>>>> list > >>>>>>>> to > >>>>>>>> another. Note that I'm considering each hash bucket a > >>>>>>>> separate > >>>>>>>> "list". (One issue to worry about in that case would be > >>>>>>>> cache-line > >>>>>>>> contention in the array of hash buckets; perhaps > >>>>>>>> NFSRVCACHE_HASHSIZE > >>>>>>>> ought to be increased to reduce that.) > >>>>>>>> > >>>>>>> Yea, a separate mutex for each hash list might help. There is > >>>>>>> also > >>>>>>> the > >>>>>>> LRU list that all entries end up on, that gets used by the > >>>>>>> trimming > >>>>>>> code. > >>>>>>> (I think? I wrote this stuff about 8 years ago, so I haven't > >>>>>>> looked > >>>>>>> at > >>>>>>> it in a while.) > >>>>>>> > >>>>>>> Also, increasing the hash table size is probably a good idea, > >>>>>>> especially > >>>>>>> if you reduce how aggressively the cache is trimmed. > >>>>>>> > >>>>>>>>> Only doing it once/sec would result in a very large cache > >>>>>>>>> when > >>>>>>>>> bursts of > >>>>>>>>> traffic arrives. > >>>>>>>> > >>>>>>>> My servers have 96 GB of memory so that's not a big deal for > >>>>>>>> me. > >>>>>>>> > >>>>>>> This code was originally "production tested" on a server with > >>>>>>> 1Gbyte, > >>>>>>> so times have changed a bit;-) > >>>>>>> > >>>>>>>>> I'm not sure I see why doing it as a separate thread will > >>>>>>>>> improve > >>>>>>>>> things. > >>>>>>>>> There are N nfsd threads already (N can be bumped up to 256 > >>>>>>>>> if > >>>>>>>>> you > >>>>>>>>> wish) > >>>>>>>>> and having a bunch more "cache trimming threads" would just > >>>>>>>>> increase > >>>>>>>>> contention, wouldn't it? > >>>>>>>> > >>>>>>>> Only one cache-trimming thread. The cache trim holds the > >>>>>>>> (global) > >>>>>>>> mutex for much longer than any individual nfsd service thread > >>>>>>>> has > >>>>>>>> any > >>>>>>>> need to, and having N threads doing that in parallel is why > >>>>>>>> it's > >>>>>>>> so > >>>>>>>> heavily contended. If there's only one thread doing the trim, > >>>>>>>> then > >>>>>>>> the nfsd service threads aren't spending time either > >>>>>>>> contending > >>>>>>>> on > >>>>>>>> the > >>>>>>>> mutex (it will be held less frequently and for shorter > >>>>>>>> periods). > >>>>>>>> > >>>>>>> I think the little drc2.patch which will keep the nfsd threads > >>>>>>> from > >>>>>>> acquiring the mutex and doing the trimming most of the time, > >>>>>>> might > >>>>>>> be > >>>>>>> sufficient. I still don't see why a separate trimming thread > >>>>>>> will > >>>>>>> be > >>>>>>> an advantage. I'd also be worried that the one cache trimming > >>>>>>> thread > >>>>>>> won't get the job done soon enough. > >>>>>>> > >>>>>>> When I did production testing on a 1Gbyte server that saw a > >>>>>>> peak > >>>>>>> load of about 100RPCs/sec, it was necessary to trim > >>>>>>> aggressively. > >>>>>>> (Although I'd be tempted to say that a server with 1Gbyte is > >>>>>>> no > >>>>>>> longer relevant, I recently recall someone trying to run > >>>>>>> FreeBSD > >>>>>>> on a i486, although I doubt they wanted to run the nfsd on > >>>>>>> it.) > >>>>>>> > >>>>>>>>> The only negative effect I can think of w.r.t. 
having the > >>>>>>>>> nfsd > >>>>>>>>> threads doing it would be a (I believe negligible) increase > >>>>>>>>> in > >>>>>>>>> RPC > >>>>>>>>> response times (the time the nfsd thread spends trimming the > >>>>>>>>> cache). > >>>>>>>>> As noted, I think this time would be negligible compared to > >>>>>>>>> disk > >>>>>>>>> I/O > >>>>>>>>> and network transit times in the total RPC response time? > >>>>>>>> > >>>>>>>> With adaptive mutexes, many CPUs, lots of in-memory cache, > >>>>>>>> and > >>>>>>>> 10G > >>>>>>>> network connectivity, spinning on a contended mutex takes a > >>>>>>>> significant amount of CPU time. (For the current design of > >>>>>>>> the > >>>>>>>> NFS > >>>>>>>> server, it may actually be a win to turn off adaptive mutexes > >>>>>>>> -- > >>>>>>>> I > >>>>>>>> should give that a try once I'm able to do more testing.) > >>>>>>>> > >>>>>>> Have fun with it. Let me know when you have what you think is > >>>>>>> a > >>>>>>> good > >>>>>>> patch. > >>>>>>> > >>>>>>> rick > >>>>>>> > >>>>>>>> -GAWollman > >>>>>>>> _______________________________________________ > >>>>>>>> freebsd-hackers@freebsd.org mailing list > >>>>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers > >>>>>>>> To unsubscribe, send any mail to > >>>>>>>> "freebsd-hackers-unsubscribe@freebsd.org" > >>>>>>> _______________________________________________ > >>>>>>> freebsd-fs@freebsd.org mailing list > >>>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs > >>>>>>> To unsubscribe, send any mail to > >>>>>>> "freebsd-fs-unsubscribe@freebsd.org" > >>>>>> > >>>>>> My quest for IOPS over NFS continues :) > >>>>>> So far I'm not able to achieve more than about 3000 8K read > >>>>>> requests > >>>>>> over NFS, > >>>>>> while the server locally gives much more. > >>>>>> And this is all from a file that is completely in ARC cache, no > >>>>>> disk > >>>>>> IO involved. > >>>>>> > >>>>> Just out of curiousity, why do you use 8K reads instead of 64K > >>>>> reads. > >>>>> Since the RPC overhead (including the DRC functions) is per RPC, > >>>>> doing > >>>>> fewer larger RPCs should usually work better. (Sometimes large > >>>>> rsize/wsize > >>>>> values generate too large a burst of traffic for a network > >>>>> interface > >>>>> to > >>>>> handle and then the rsize/wsize has to be decreased to avoid > >>>>> this > >>>>> issue.) > >>>>> > >>>>> And, although this experiment seems useful for testing patches > >>>>> that > >>>>> try > >>>>> and reduce DRC CPU overheads, most "real" NFS servers will be > >>>>> doing > >>>>> disk > >>>>> I/O. > >>>>> > >>>> > >>>> This is the default blocksize the Oracle and probably most > >>>> databases > >>>> use. > >>>> It uses also larger blocks, but for small random reads in OLTP > >>>> applications this is what is used. > >>>> > >>> If the client is doing 8K reads, you could increase the read ahead > >>> "readahead=N" (N up to 16), to try and increase the bandwidth. > >>> (But if the CPU is 99% busy, then I don't think it will matter.) > >> > >> I'll try to check if this is possible to be set, as we are testing > >> not only with the Linux NFS client, > >> but also with the Oracle's built in so called DirectNFS client that > >> is built in to the app. 
> >> > >>> > >>>> > >>>>>> I've snatched some sample DTrace script from the net : [ > >>>>>> http://utcc.utoronto.ca/~cks/space/blog/solaris/DTraceQuantizationNotes > >>>>>> ] > >>>>>> > >>>>>> And modified it for our new NFS server : > >>>>>> > >>>>>> #!/usr/sbin/dtrace -qs > >>>>>> > >>>>>> fbt:kernel:nfsrvd_*:entry > >>>>>> { > >>>>>> self->ts = timestamp; > >>>>>> @counts[probefunc] = count(); > >>>>>> } > >>>>>> > >>>>>> fbt:kernel:nfsrvd_*:return > >>>>>> / self->ts > 0 / > >>>>>> { > >>>>>> this->delta = (timestamp-self->ts)/1000000; > >>>>>> } > >>>>>> > >>>>>> fbt:kernel:nfsrvd_*:return > >>>>>> / self->ts > 0 && this->delta > 100 / > >>>>>> { > >>>>>> @slow[probefunc, "ms"] = lquantize(this->delta, 100, 500, 50); > >>>>>> } > >>>>>> > >>>>>> fbt:kernel:nfsrvd_*:return > >>>>>> / self->ts > 0 / > >>>>>> { > >>>>>> @dist[probefunc, "ms"] = quantize(this->delta); > >>>>>> self->ts = 0; > >>>>>> } > >>>>>> > >>>>>> END > >>>>>> { > >>>>>> printf("\n"); > >>>>>> printa("function %-20s %@10d\n", @counts); > >>>>>> printf("\n"); > >>>>>> printa("function %s(), time in %s:%@d\n", @dist); > >>>>>> printf("\n"); > >>>>>> printa("function %s(), time in %s for >= 100 ms:%@d\n", @slow); > >>>>>> } > >>>>>> > >>>>>> And here's a sample output from one or two minutes during the > >>>>>> run > >>>>>> of > >>>>>> Oracle's ORION benchmark > >>>>>> tool from a Linux machine, on a 32G file on NFS mount over 10G > >>>>>> ethernet: > >>>>>> > >>>>>> [16:01]root@goliath:/home/ndenev# ./nfsrvd.d > >>>>>> ^C > >>>>>> > >>>>>> function nfsrvd_access 4 > >>>>>> function nfsrvd_statfs 10 > >>>>>> function nfsrvd_getattr 14 > >>>>>> function nfsrvd_commit 76 > >>>>>> function nfsrvd_sentcache 110048 > >>>>>> function nfsrvd_write 110048 > >>>>>> function nfsrvd_read 283648 > >>>>>> function nfsrvd_dorpc 393800 > >>>>>> function nfsrvd_getcache 393800 > >>>>>> function nfsrvd_rephead 393800 > >>>>>> function nfsrvd_updatecache 393800 > >>>>>> > >>>>>> function nfsrvd_access(), time in ms: > >>>>>> value ------------- Distribution ------------- count > >>>>>> -1 | 0 > >>>>>> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 4 > >>>>>> 1 | 0 > >>>>>> > >>>>>> function nfsrvd_statfs(), time in ms: > >>>>>> value ------------- Distribution ------------- count > >>>>>> -1 | 0 > >>>>>> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 10 > >>>>>> 1 | 0 > >>>>>> > >>>>>> function nfsrvd_getattr(), time in ms: > >>>>>> value ------------- Distribution ------------- count > >>>>>> -1 | 0 > >>>>>> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 14 > >>>>>> 1 | 0 > >>>>>> > >>>>>> function nfsrvd_sentcache(), time in ms: > >>>>>> value ------------- Distribution ------------- count > >>>>>> -1 | 0 > >>>>>> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 110048 > >>>>>> 1 | 0 > >>>>>> > >>>>>> function nfsrvd_rephead(), time in ms: > >>>>>> value ------------- Distribution ------------- count > >>>>>> -1 | 0 > >>>>>> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 393800 > >>>>>> 1 | 0 > >>>>>> > >>>>>> function nfsrvd_updatecache(), time in ms: > >>>>>> value ------------- Distribution ------------- count > >>>>>> -1 | 0 > >>>>>> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 393800 > >>>>>> 1 | 0 > >>>>>> > >>>>>> function nfsrvd_getcache(), time in ms: > >>>>>> value ------------- Distribution ------------- count > >>>>>> -1 | 0 > >>>>>> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 393798 > >>>>>> 1 | 1 > >>>>>> 2 | 0 > >>>>>> 4 | 1 > >>>>>> 8 | 0 > >>>>>> > >>>>>> function nfsrvd_write(), time in ms: > >>>>>> value ------------- Distribution 
------------- count > >>>>>> -1 | 0 > >>>>>> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 110039 > >>>>>> 1 | 5 > >>>>>> 2 | 4 > >>>>>> 4 | 0 > >>>>>> > >>>>>> function nfsrvd_read(), time in ms: > >>>>>> value ------------- Distribution ------------- count > >>>>>> -1 | 0 > >>>>>> 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 283622 > >>>>>> 1 | 19 > >>>>>> 2 | 3 > >>>>>> 4 | 2 > >>>>>> 8 | 0 > >>>>>> 16 | 1 > >>>>>> 32 | 0 > >>>>>> 64 | 0 > >>>>>> 128 | 0 > >>>>>> 256 | 1 > >>>>>> 512 | 0 > >>>>>> > >>>>>> function nfsrvd_commit(), time in ms: > >>>>>> value ------------- Distribution ------------- count > >>>>>> -1 | 0 > >>>>>> 0 |@@@@@@@@@@@@@@@@@@@@@@@ 44 > >>>>>> 1 |@@@@@@@ 14 > >>>>>> 2 | 0 > >>>>>> 4 |@ 1 > >>>>>> 8 |@ 1 > >>>>>> 16 | 0 > >>>>>> 32 |@@@@@@@ 14 > >>>>>> 64 |@ 2 > >>>>>> 128 | 0 > >>>>>> > >>>>>> > >>>>>> function nfsrvd_commit(), time in ms for >= 100 ms: > >>>>>> value ------------- Distribution ------------- count > >>>>>> < 100 | 0 > >>>>>> 100 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1 > >>>>>> 150 | 0 > >>>>>> > >>>>>> function nfsrvd_read(), time in ms for >= 100 ms: > >>>>>> value ------------- Distribution ------------- count > >>>>>> 250 | 0 > >>>>>> 300 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1 > >>>>>> 350 | 0 > >>>>>> > >>>>>> > >>>>>> Looks like the nfs server cache functions are quite fast, but > >>>>>> extremely frequently called. > >>>>>> > >>>>> Yep, they are called for every RPC. > >>>>> > >>>>> I may try coding up a patch that replaces the single mutex with > >>>>> one for each hash bucket, for TCP. > >>>>> > >>>>> I'll post if/when I get this patch to a testing/review stage, > >>>>> rick > >>>>> > >>>> > >>>> Cool. > >>>> > >>>> I've readjusted the precision of the dtrace script a bit, and I > >>>> can > >>>> see > >>>> now the following three functions as taking most of the time : > >>>> nfsrvd_getcache(), nfsrc_trimcache() and nfsrvd_updatecache() > >>>> > >>>> This was recorded during a oracle benchmark run called SLOB, > >>>> which > >>>> caused 99% cpu load on the NFS server. > >>>> > >>> Even with the drc2.patch and a large value for > >>> vfs.nfsd.tcphighwater? > >>> (Assuming the mounts are TCP ones.) > >>> > >>> Have fun with it, rick > >>> > >> > >> I had upped it, but probably not enough. I'm now running with > >> vfs.nfsd.tcphighwater set > >> to some ridiculous number, and NFSRVCACHE_HASHSIZE set to 500. > >> So far it looks like good improvement as those functions no longer > >> show up in the dtrace script output. > >> I'll run some more benchmarks and testing today. > >> > >> Thanks! > >> > >>>> > >>>>>> I hope someone can find this information useful. > >>>>>> > >>>>>> _______________________________________________ > >>>>>> freebsd-hackers@freebsd.org mailing list > >>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers > >>>>>> To unsubscribe, send any mail to > >>>>>> "freebsd-hackers-unsubscribe@freebsd.org" > >> > > > > I haven't had the opportunity today to run more DB tests over NFS as > > the DBA was busy with something else, > > however I tested a bit the large file transfers. > > And I'm seeing something strange probably not only NFS but also ZFS > > and ARC related. > > > > When I first tested the drc2.patch I reported a huge bandwidth > > improvement, > > but now I think that this was probably because of the machine > > freshly rebooted instead of just the patch. > > The patch surely improved things, especially CPU utilization > > combined with the increased cache. 
> > But today I'm again having a file completely cached in ZFS's ARC > > cache, which when transferred over NFS > > reaches about 300MB/s, when at some tests it reached 900+MB/s (as > > reported in my first email). > > The file locally can be read at about 3GB/s as reported by dd. > > > > Local: > > [17:51]root@goliath:/tank/spa_db/undo# dd if=data.dbf of=/dev/null > > bs=1M > > 30720+1 records in > > 30720+1 records out > > 32212262912 bytes transferred in 10.548485 secs (3053733573 > > bytes/sec) > > > > Over NFS: > > [17:48]root@spa:/mnt/spa_db/undo# dd if=data.dbf of=/dev/null bs=1M > > 30720+1 records in > > 30720+1 records out > > 32212262912 bytes (32 GB) copied, 88.0663 seconds, 366 MB/s > > > > The machines are almost idle during this transfer and I can't see a > > reason why it can't reach the full bandwith when it's > > just reading it from RAM. > > > > I've tried again tracing with DTrace to see what's happening with > > this script : > > > > fbt:kernel:nfs*:entry > > { > > this->ts = timestamp; > > @counts[probefunc] = count(); > > } > > > > fbt:kernel:nfs*:return > > / this->ts > 0 / > > { > > @time[probefunc] = avg(timestamp - this->ts); > > } > > > > END > > { > > trunc(@counts, 10); > > trunc(@time, 10); > > printf("Top 10 called functions\n\n"); > > printa(@counts); > > printf("\n\nTop 10 slowest functions\n\n"); > > printa(@time); > > } > > > > And here's the result (several seconds during the dd test): > > > > Top 10 called functions > > nfsrc_freecache 88849 > > nfsrc_wanted 88849 > > nfsrv_fillattr 88849 > > nfsrv_postopattr 88849 > > nfsrvd_read 88849 > > nfsrvd_rephead 88849 > > nfsrvd_updatecache 88849 > > nfsvno_testexp 88849 > > nfsrc_trimcache 177697 > > nfsvno_getattr 177698 > > > > Top 10 slowest functions > > nfsd_excred 5673 > > nfsrc_freecache 5674 > > nfsrv_postopattr 5970 > > nfsrv_servertimer 6327 > > nfssvc_nfscommon 6596 > > nfsd_fhtovp 8000 > > nfsrvd_read 8380 > > nfssvc_program 92752 > > nfsvno_read 124979 > > nfsvno_fhtovp 1789523 > > > > I might try now to trace what nfsvno_fhtovp() is doing and where is > > spending it's time. > > > > Any other ideas are welcome :) > > > > To take the network out of the equation I redid the test by mounting > the same filesystem over NFS on the server: > > [18:23]root@goliath:~# mount -t nfs -o > rw,hard,intr,tcp,nfsv3,rsize=1048576,wsize=1048576 Just fyi, the maximum rsize,wsize is MAXBSIZE, which is 65536 for FreeBSD currently. As I noted in the other email, I'd suggest "rsize=65536,wsize=65536,readahead=16,...". > localhost:/tank/spa_db/undo /mnt > [18:24]root@goliath:~# dd if=/mnt/data.dbf of=/dev/null bs=1M > 30720+1 records in > 30720+1 records out > 32212262912 bytes transferred in 79.793343 secs (403696120 bytes/sec) > [18:25]root@goliath:~# dd if=/mnt/data.dbf of=/dev/null bs=1M > 30720+1 records in > 30720+1 records out > 32212262912 bytes transferred in 12.033420 secs (2676900110 bytes/sec) > > During the first run I saw several nfsd threads in top, along with dd > and again zero disk I/O. > There was increase in memory usage because of the double buffering > ARC->buffercahe. > The second run was with all of the nfsd threads totally idle, and read > directly from the buffercache. 
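To make the locking change discussed in this thread concrete, here is a minimal sketch of the per-hash-bucket scheme Rick says he may code up: one mutex per bucket, so nfsd threads contend only when they hash to the same chain instead of all serializing on the single global DRC mutex. It is written as self-contained userland C with pthreads purely to show the shape of the structure; the real change would use struct mtx inside the kernel's nfsrvcache code. Every name here (cache_bucket, cache_lookup, CACHE_HASHSIZE, the xid key) is made up for illustration; this is not the drc2.patch.

#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_HASHSIZE 500              /* e.g. the larger NFSRVCACHE_HASHSIZE tried above */

struct cache_entry {
    struct cache_entry *ce_next;
    uint32_t            ce_xid;         /* RPC transaction id, used as the hash key here */
    /* ... the cached reply would hang off the entry ... */
};

/* One bucket = one chain + one lock, so different buckets never contend. */
struct cache_bucket {
    pthread_mutex_t     cb_lock;
    struct cache_entry *cb_head;
};

static struct cache_bucket cache[CACHE_HASHSIZE];

static void
cache_init(void)
{
    for (int i = 0; i < CACHE_HASHSIZE; i++) {
        pthread_mutex_init(&cache[i].cb_lock, NULL);
        cache[i].cb_head = NULL;
    }
}

/* Lookup-or-insert takes only the bucket lock; no global mutex is involved. */
static struct cache_entry *
cache_lookup(uint32_t xid)
{
    struct cache_bucket *cb = &cache[xid % CACHE_HASHSIZE];
    struct cache_entry *ce;

    pthread_mutex_lock(&cb->cb_lock);
    for (ce = cb->cb_head; ce != NULL; ce = ce->ce_next)
        if (ce->ce_xid == xid)
            break;
    if (ce == NULL) {
        ce = calloc(1, sizeof(*ce));    /* error handling omitted in this sketch */
        ce->ce_xid = xid;
        ce->ce_next = cb->cb_head;
        cb->cb_head = ce;
    }
    pthread_mutex_unlock(&cb->cb_lock);
    return (ce);
}

The part the sketch glosses over is the one Rick flags in the quoted discussion: the cache also keeps a global LRU list used by the trim code, and splitting the hash-chain lock buys little unless the trimming path stops taking a global lock as well.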
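Rick's pipe-filling arithmetic a few messages up boils down to one comparison: the data a client keeps outstanding per file, rsize * (readahead + 1) (the + 1 being the read itself, per Rick's follow-up correction below), has to cover the link's bandwidth-delay product, i.e. data rate * RPC transit delay. Here is a tiny back-of-the-envelope program for that check; the 10Gb/s rate, the assumed 1 ms transit delay, and the rsize=65536/readahead=16 mount settings are just the example numbers from this thread, not measurements.

#include <stdio.h>

int
main(void)
{
    double rate = 10e9 / 8.0;        /* 10Gb/s link, roughly 1.25 Gbyte/s */
    double delay = 0.001;            /* assumed RPC transit delay, in seconds */
    unsigned rsize = 65536;          /* MAXBSIZE, the current FreeBSD maximum */
    unsigned readahead = 16;         /* mount option; N readaheads plus the read itself */

    double pipe = rate * delay;                             /* bandwidth-delay product */
    double outstanding = (double)rsize * (readahead + 1);   /* data in flight per file */

    printf("bandwidth-delay product: %.0f bytes\n", pipe);
    printf("outstanding read data:   %.0f bytes (%s)\n", outstanding,
        outstanding >= pipe ? "link can stay full" : "link will go idle between RPCs");
    return (0);
}

With these numbers the outstanding data (about 1.1 Mbyte) falls just short of the 1.25 Mbyte pipe, which is the same conclusion Rick reaches: the RPC transit delay has to stay somewhat under a millisecond for 64K reads to keep a 10Gb/s link full.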
From owner-freebsd-hackers@FreeBSD.ORG Thu Oct 11 22:02:54 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 2DAD2D35; Thu, 11 Oct 2012 22:02:54 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id BC0AF8FC17; Thu, 11 Oct 2012 22:02:53 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ap0EAJ+LclCDaFvO/2dsb2JhbABFhhG6GYJKVhsODAINGQJfiBimVJF1gSGPLIESA5VrkC6DCYF7 X-IronPort-AV: E=Sophos;i="4.80,574,1344225600"; d="scan'208";a="186031155" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu-pri.mail.uoguelph.ca with ESMTP; 11 Oct 2012 18:02:52 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id C0136B4096; Thu, 11 Oct 2012 18:02:52 -0400 (EDT) Date: Thu, 11 Oct 2012 18:02:52 -0400 (EDT) From: Rick Macklem To: Nikolay Denev Message-ID: <608951636.2115684.1349992972756.JavaMail.root@erie.cs.uoguelph.ca> Subject: Re: NFS server bottlenecks MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.201] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692) Cc: FreeBSD Hackers , Garrett Wollman X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 11 Oct 2012 22:02:54 -0000 Oops, I didn't get the "readahead" option description quite right in the last post. The default read ahead is 1, which does result in "rsize * 2", since there is the read + 1 readahead. "rsize * 16" would actually be for the option "readahead=15" and for "readahead=16" the calculation would be "rsize * 17". However, the example was otherwise ok, I think? rick From owner-freebsd-hackers@FreeBSD.ORG Fri Oct 12 15:54:58 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 4F5DE15D for ; Fri, 12 Oct 2012 15:54:58 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net [IPv6:2001:470:1f10:75::2]) by mx1.freebsd.org (Postfix) with ESMTP id 227718FC14 for ; Fri, 12 Oct 2012 15:54:58 +0000 (UTC) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 864E0B9BC; Fri, 12 Oct 2012 11:54:57 -0400 (EDT) From: John Baldwin To: Carl Delsey Subject: Re: No bus_space_read_8 on x86 ? 
Date: Fri, 12 Oct 2012 11:31:46 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p20; KDE/4.5.5; amd64; ; ) References: <506DC574.9010300@intel.com> <201210091154.15873.jhb@freebsd.org> <5075EC29.1010907@intel.com> In-Reply-To: <5075EC29.1010907@intel.com> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201210121131.46373.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Fri, 12 Oct 2012 11:54:57 -0400 (EDT) Cc: freebsd-hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 12 Oct 2012 15:54:58 -0000 On Wednesday, October 10, 2012 5:44:09 pm Carl Delsey wrote: > Sorry for the slow response. I was dealing with a bit of a family > emergency. Responses inline below. > > On 10/09/12 08:54, John Baldwin wrote: > > On Monday, October 08, 2012 4:59:24 pm Warner Losh wrote: > >> On Oct 5, 2012, at 10:08 AM, John Baldwin wrote: > > >>> I think cxgb* already have an implementation. For amd64 we should certainly > >>> have bus_space_*_8(), at least for SYS_RES_MEMORY. I think they should fail > >>> for SYS_RES_IOPORT. I don't think we can force a compile-time error though, > >>> would just have to return -1 on reads or some such? > > Yes. Exactly what I was thinking. > > >> I believe it was because bus reads weren't guaranteed to be atomic on i386. > >> don't know if that's still the case or a concern, but it was an intentional omission. > > True. If you are on a 32-bit system you can read the two 4 byte values and > > then build a 64-bit value. For 64-bit platforms we should offer bus_read_8() > > however. > > I believe there is still no way to perform a 64-bit read on a i386 (or > at least without messing with SSE instructions), but if you have to read > a 64-bit register, you are stuck with doing two 32-bit reads and > concatenating them. I figure we may as well provide an implementation > for those who have to do that as well as the implementation for 64-bit. I think the problem though is that the way you should glue those two 32-bit reads together is device dependent. I don't think you can provide a completely device-neutral bus_read_8() on i386. We should certainly have it on 64-bit platforms, but I think drivers that want to work on 32-bit platforms need to explicitly merge the two words themselves. -- John Baldwin From owner-freebsd-hackers@FreeBSD.ORG Fri Oct 12 16:04:28 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 3382D935; Fri, 12 Oct 2012 16:04:28 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 08D398FC14; Fri, 12 Oct 2012 16:04:28 +0000 (UTC) Received: from fledge.watson.org (fledge.watson.org [65.122.17.41]) by cyrus.watson.org (Postfix) with ESMTPS id B5C3C46B09; Fri, 12 Oct 2012 12:04:27 -0400 (EDT) Date: Fri, 12 Oct 2012 17:04:27 +0100 (BST) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: John Baldwin Subject: Re: No bus_space_read_8 on x86 ? 
In-Reply-To: <201210121131.46373.jhb@freebsd.org> Message-ID: References: <506DC574.9010300@intel.com> <201210091154.15873.jhb@freebsd.org> <5075EC29.1010907@intel.com> <201210121131.46373.jhb@freebsd.org> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-hackers@freebsd.org, Carl Delsey X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 12 Oct 2012 16:04:28 -0000 On Fri, 12 Oct 2012, John Baldwin wrote: >>>> I believe it was because bus reads weren't guaranteed to be atomic on >>>> i386. don't know if that's still the case or a concern, but it was an >>>> intentional omission. >>> True. If you are on a 32-bit system you can read the two 4 byte values >>> and then build a 64-bit value. For 64-bit platforms we should offer >>> bus_read_8() however. >> >> I believe there is still no way to perform a 64-bit read on a i386 (or at >> least without messing with SSE instructions), but if you have to read a >> 64-bit register, you are stuck with doing two 32-bit reads and >> concatenating them. I figure we may as well provide an implementation for >> those who have to do that as well as the implementation for 64-bit. > > I think the problem though is that the way you should glue those two 32-bit > reads together is device dependent. I don't think you can provide a > completely device-neutral bus_read_8() on i386. We should certainly have it > on 64-bit platforms, but I think drivers that want to work on 32-bit > platforms need to explicitly merge the two words themselves. Indeed -- and on non-x86, where there are uncached direct map segments, and TLB entries that disable caching, reading 2x 32-bit vs 1x 64-bit have quite different effects in terms of atomicity. Where uncached I/Os are being used, those differences may affect semantics significantly -- e.g., if your device has a 64-bit memory-mapped FIFO or registers, 2x 32-bit gives you two halves of two different 64-bit values, rather than two halves of the same value. As device drivers depend on those atomicity semantics, we should (at the busspace level) offer only the exactly expected semantics, rather than trying to patch things up. If a device driver accessing 64-bit fields wants to support doing it using two 32-bit reads, it can figure out how to splice it together following bus_space_read_region_4(). 
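As an aside, one widely used workaround for exactly this torn-read problem, when the 64-bit register happens to be a monotonically increasing counter exposed as two 32-bit halves, is the read-high/read-low/re-read-high retry loop sketched below. It is only an illustration built on the existing bus_space_read_4() call, it assumes the register has no read side effects, and it is not something the bus_space KPI itself offers.

#include <sys/param.h>
#include <machine/bus.h>

/*
 * Sketch: read a 64-bit, monotonically advancing counter that a device
 * exposes as two 32-bit registers.  If the high word changed while the
 * low word was being read, a carry occurred and we retry.  Not valid
 * for FIFOs or any register with read side effects.
 */
static __inline uint64_t
read_split_counter(bus_space_tag_t t, bus_space_handle_t h,
    bus_size_t lo_off, bus_size_t hi_off)
{
	uint32_t hi1, hi2, lo;

	do {
		hi1 = bus_space_read_4(t, h, hi_off);
		lo = bus_space_read_4(t, h, lo_off);
		hi2 = bus_space_read_4(t, h, hi_off);
	} while (hi1 != hi2);

	return (((uint64_t)hi1 << 32) | lo);
}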
Robert From owner-freebsd-hackers@FreeBSD.ORG Fri Oct 12 17:46:11 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id B19CE535; Fri, 12 Oct 2012 17:46:11 +0000 (UTC) (envelope-from carl.r.delsey@intel.com) Received: from mga14.intel.com (mga14.intel.com [143.182.124.37]) by mx1.freebsd.org (Postfix) with ESMTP id 72D258FC14; Fri, 12 Oct 2012 17:46:11 +0000 (UTC) Received: from azsmga001.ch.intel.com ([10.2.17.19]) by azsmga102.ch.intel.com with ESMTP; 12 Oct 2012 10:46:05 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="4.80,577,1344236400"; d="scan'208";a="203822158" Received: from crdelsey-mobl2.amr.corp.intel.com (HELO [10.255.71.218]) ([10.255.71.218]) by azsmga001.ch.intel.com with ESMTP; 12 Oct 2012 10:46:04 -0700 Message-ID: <5078575B.2020808@intel.com> Date: Fri, 12 Oct 2012 10:46:03 -0700 From: Carl Delsey User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:15.0) Gecko/20120824 Thunderbird/15.0 MIME-Version: 1.0 To: Robert Watson Subject: Re: No bus_space_read_8 on x86 ? References: <506DC574.9010300@intel.com> <201210091154.15873.jhb@freebsd.org> <5075EC29.1010907@intel.com> <201210121131.46373.jhb@freebsd.org> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 12 Oct 2012 17:46:11 -0000 On 10/12/2012 9:04 AM, Robert Watson wrote: > > On Fri, 12 Oct 2012, John Baldwin wrote: > >>>>> I believe it was because bus reads weren't guaranteed to be atomic >>>>> on i386. don't know if that's still the case or a concern, but it >>>>> was an intentional omission. >>>> True. If you are on a 32-bit system you can read the two 4 byte >>>> values and then build a 64-bit value. For 64-bit platforms we >>>> should offer bus_read_8() however. >>> >>> I believe there is still no way to perform a 64-bit read on a i386 >>> (or at least without messing with SSE instructions), but if you have >>> to read a 64-bit register, you are stuck with doing two 32-bit reads >>> and concatenating them. I figure we may as well provide an >>> implementation for those who have to do that as well as the >>> implementation for 64-bit. >> >> I think the problem though is that the way you should glue those two >> 32-bit reads together is device dependent. I don't think you can >> provide a completely device-neutral bus_read_8() on i386. We should >> certainly have it on 64-bit platforms, but I think drivers that want >> to work on 32-bit platforms need to explicitly merge the two words >> themselves. > > Indeed -- and on non-x86, where there are uncached direct map > segments, and TLB entries that disable caching, reading 2x 32-bit vs > 1x 64-bit have quite different effects in terms of atomicity. Where > uncached I/Os are being used, those differences may affect semantics > significantly -- e.g., if your device has a 64-bit memory-mapped FIFO > or registers, 2x 32-bit gives you two halves of two different 64-bit > values, rather than two halves of the same value. As device drivers > depend on those atomicity semantics, we should (at the busspace level) > offer only the exactly expected semantics, rather than trying to patch > things up. 
If a device driver accessing 64-bit fields wants to > support doing it using two 32-bit reads, it can figure out how to > splice it together following bus_space_read_region_4(). I wouldn't make any default behaviour for bus_space_read_8 on i386, just amd64. My assumption (which may be unjustified) is that by far the most common implementations to read a 64-bit register on i386 would be to read the lower 4 bytes first, followed by the upper 4 bytes (or vice versa) and then stitch them together. I think we should provide helper functions for these two cases, otherwise I fear our code base will be littered with multiple independent implementations of this. Some driver writer who wants to take advantage of these helper functions would do something like #ifdef i386 #define bus_space_read_8 bus_space_read_8_lower_first #endif otherwise, using bus_space_read_8 won't compile for i386 builds. If these implementations won't work for their case, they are free to write their own implementation or take whatever action is necessary. I guess my question is, are these cases common enough that it is worth helping developers by providing functions that do the double read and shifts for them, or do we leave them to deal with it on their own at the risk of possibly some duplicated code. Thanks, Carl From owner-freebsd-hackers@FreeBSD.ORG Sat Oct 13 02:06:01 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id D942BA8; Sat, 13 Oct 2012 02:06:01 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 3059A8FC17; Sat, 13 Oct 2012 02:06:00 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqAEAN6MclCDaFvO/2dsb2JhbAA9CBaFe7oZgiABAQEEAQEBIAQnIAsFFg4KERkCBCUBCSYGCAcEARwEh2QLpkyRd4tlBIRkgRIDjm6EUIItgRWPGYMJgUc0 X-IronPort-AV: E=Sophos;i="4.80,580,1344225600"; d="scan'208";a="183450178" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 12 Oct 2012 22:05:54 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 168AEB4034; Fri, 12 Oct 2012 22:05:54 -0400 (EDT) Date: Fri, 12 Oct 2012 22:05:54 -0400 (EDT) From: Rick Macklem To: Nikolay Denev Message-ID: <937460294.2185822.1350093954059.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <608951636.2115684.1349992972756.JavaMail.root@erie.cs.uoguelph.ca> Subject: Re: NFS server bottlenecks MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----=_Part_2185821_351431992.1350093954057" X-Originating-IP: [172.17.91.201] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692) Cc: FreeBSD Hackers , Garrett Wollman X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 13 Oct 2012 02:06:01 -0000 ------=_Part_2185821_351431992.1350093954057 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit I wrote: > Oops, I didn't get the "readahead" option description > quite right in the last post. The default read ahead > is 1, which does result in "rsize * 2", since there is > the read + 1 readahead. 
> > "rsize * 16" would actually be for the option "readahead=15" > and for "readahead=16" the calculation would be "rsize * 17". > > However, the example was otherwise ok, I think? rick I've attached the patch drc3.patch (it assumes drc2.patch has already been applied) that replaces the single mutex with one for each hash list for tcp. It also increases the size of NFSRVCACHE_HASHSIZE to 200. These patches are also at: http://people.freebsd.org/~rmacklem/drc2.patch http://people.freebsd.org/~rmacklem/drc3.patch in case the attachments don't get through. rick ps: I haven't tested drc3.patch a lot, but I think it's ok? > _______________________________________________ > freebsd-hackers@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to > "freebsd-hackers-unsubscribe@freebsd.org" ------=_Part_2185821_351431992.1350093954057 Content-Type: text/x-patch; name=drc2.patch Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename=drc2.patch LS0tIGZzL25mc3NlcnZlci9uZnNfbmZzZGNhY2hlLmMub3JpZwkyMDEyLTAyLTI5IDIxOjA3OjUz LjAwMDAwMDAwMCAtMDUwMAorKysgZnMvbmZzc2VydmVyL25mc19uZnNkY2FjaGUuYwkyMDEyLTEw LTAzIDA4OjIzOjI0LjAwMDAwMDAwMCAtMDQwMApAQCAtMTY0LDggKzE2NCwxOSBAQCBORlNDQUNI RU1VVEVYOwogaW50IG5mc3JjX2Zsb29kbGV2ZWwgPSBORlNSVkNBQ0hFX0ZMT09ETEVWRUwsIG5m c3JjX3RjcHNhdmVkcmVwbGllcyA9IDA7CiAjZW5kaWYJLyogIUFQUExFS0VYVCAqLwogCitTWVND VExfREVDTChfdmZzX25mc2QpOworCitzdGF0aWMgaW50CW5mc3JjX3RjcGhpZ2h3YXRlciA9IDA7 CitTWVNDVExfSU5UKF92ZnNfbmZzZCwgT0lEX0FVVE8sIHRjcGhpZ2h3YXRlciwgQ1RMRkxBR19S VywKKyAgICAmbmZzcmNfdGNwaGlnaHdhdGVyLCAwLAorICAgICJIaWdoIHdhdGVyIG1hcmsgZm9y IFRDUCBjYWNoZSBlbnRyaWVzIik7CitzdGF0aWMgaW50CW5mc3JjX3VkcGhpZ2h3YXRlciA9IE5G U1JWQ0FDSEVfVURQSElHSFdBVEVSOworU1lTQ1RMX0lOVChfdmZzX25mc2QsIE9JRF9BVVRPLCB1 ZHBoaWdod2F0ZXIsIENUTEZMQUdfUlcsCisgICAgJm5mc3JjX3VkcGhpZ2h3YXRlciwgMCwKKyAg ICAiSGlnaCB3YXRlciBtYXJrIGZvciBVRFAgY2FjaGUgZW50cmllcyIpOworCiBzdGF0aWMgaW50 IG5mc3JjX3RjcG5vbmlkZW1wb3RlbnQgPSAxOwotc3RhdGljIGludCBuZnNyY191ZHBoaWdod2F0 ZXIgPSBORlNSVkNBQ0hFX1VEUEhJR0hXQVRFUiwgbmZzcmNfdWRwY2FjaGVzaXplID0gMDsKK3N0 YXRpYyBpbnQgbmZzcmNfdWRwY2FjaGVzaXplID0gMDsKIHN0YXRpYyBUQUlMUV9IRUFEKCwgbmZz cnZjYWNoZSkgbmZzcnZ1ZHBscnU7CiBzdGF0aWMgc3RydWN0IG5mc3J2aGFzaGhlYWQgbmZzcnZo YXNodGJsW05GU1JWQ0FDSEVfSEFTSFNJWkVdLAogICAgIG5mc3J2dWRwaGFzaHRibFtORlNSVkNB Q0hFX0hBU0hTSVpFXTsKQEAgLTc4MSw4ICs3OTIsMTUgQEAgbmZzcmNfdHJpbWNhY2hlKHVfaW50 NjRfdCBzb2NrcmVmLCBzdHJ1YwogewogCXN0cnVjdCBuZnNydmNhY2hlICpycCwgKm5leHRycDsK IAlpbnQgaTsKKwlzdGF0aWMgdGltZV90IGxhc3R0cmltID0gMDsKIAorCWlmIChORlNEX01PTk9T RUMgPT0gbGFzdHRyaW0gJiYKKwkgICAgbmZzcmNfdGNwc2F2ZWRyZXBsaWVzIDwgbmZzcmNfdGNw aGlnaHdhdGVyICYmCisJICAgIG5mc3JjX3VkcGNhY2hlc2l6ZSA8IChuZnNyY191ZHBoaWdod2F0 ZXIgKworCSAgICBuZnNyY191ZHBoaWdod2F0ZXIgLyAyKSkKKwkJcmV0dXJuOwogCU5GU0xPQ0tD QUNIRSgpOworCWxhc3R0cmltID0gTkZTRF9NT05PU0VDOwogCVRBSUxRX0ZPUkVBQ0hfU0FGRShy cCwgJm5mc3J2dWRwbHJ1LCByY19scnUsIG5leHRycCkgewogCQlpZiAoIShycC0+cmNfZmxhZyAm IChSQ19JTlBST0d8UkNfTE9DS0VEfFJDX1dBTlRFRCkpCiAJCSAgICAgJiYgcnAtPnJjX3JlZmNu dCA9PSAwCg== ------=_Part_2185821_351431992.1350093954057 Content-Type: text/x-patch; name=drc3.patch Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename=drc3.patch LS0tIGZzL25mc3NlcnZlci9uZnNfbmZzZGNhY2hlLmMuc2F2CTIwMTItMTAtMTAgMTg6NTY6MDEu MDAwMDAwMDAwIC0wNDAwCisrKyBmcy9uZnNzZXJ2ZXIvbmZzX25mc2RjYWNoZS5jCTIwMTItMTAt MTIgMjE6MDQ6MjEuMDAwMDAwMDAwIC0wNDAwCkBAIC0xNjAsNyArMTYwLDggQEAgX19GQlNESUQo IiRGcmVlQlNEOiBoZWFkL3N5cy9mcy9uZnNzZXJ2ZQogI2luY2x1ZGUgPGZzL25mcy9uZnNwb3J0 
Lmg+CiAKIGV4dGVybiBzdHJ1Y3QgbmZzc3RhdHMgbmV3bmZzc3RhdHM7Ci1ORlNDQUNIRU1VVEVY OworZXh0ZXJuIHN0cnVjdCBtdHggbmZzcmNfdGNwbXR4W05GU1JWQ0FDSEVfSEFTSFNJWkVdOwor ZXh0ZXJuIHN0cnVjdCBtdHggbmZzcmNfdWRwbXR4OwogaW50IG5mc3JjX2Zsb29kbGV2ZWwgPSBO RlNSVkNBQ0hFX0ZMT09ETEVWRUwsIG5mc3JjX3RjcHNhdmVkcmVwbGllcyA9IDA7CiAjZW5kaWYJ LyogIUFQUExFS0VYVCAqLwogCkBAIC0yMDgsMTAgKzIwOSwxMSBAQCBzdGF0aWMgaW50IG5ld25m c3YyX3Byb2NpZFtORlNfVjNOUFJPQ1NdCiAJTkZTVjJQUk9DX05PT1AsCiB9OwogCisjZGVmaW5l CW5mc3JjX2hhc2goeGlkKQkoKCh4aWQpICsgKCh4aWQpID4+IDI0KSkgJSBORlNSVkNBQ0hFX0hB U0hTSVpFKQogI2RlZmluZQlORlNSQ1VEUEhBU0goeGlkKSBcCi0JKCZuZnNydnVkcGhhc2h0Ymxb KCh4aWQpICsgKCh4aWQpID4+IDI0KSkgJSBORlNSVkNBQ0hFX0hBU0hTSVpFXSkKKwkoJm5mc3J2 dWRwaGFzaHRibFtuZnNyY19oYXNoKHhpZCldKQogI2RlZmluZQlORlNSQ0hBU0goeGlkKSBcCi0J KCZuZnNydmhhc2h0YmxbKCh4aWQpICsgKCh4aWQpID4+IDI0KSkgJSBORlNSVkNBQ0hFX0hBU0hT SVpFXSkKKwkoJm5mc3J2aGFzaHRibFtuZnNyY19oYXNoKHhpZCldKQogI2RlZmluZQlUUlVFCTEK ICNkZWZpbmUJRkFMU0UJMAogI2RlZmluZQlORlNSVkNBQ0hFX0NIRUNLTEVOCTEwMApAQCAtMjYy LDYgKzI2NCwxOCBAQCBzdGF0aWMgaW50IG5mc3JjX2dldGxlbmFuZGNrc3VtKG1idWZfdCBtCiBz dGF0aWMgdm9pZCBuZnNyY19tYXJrc2FtZXRjcGNvbm4odV9pbnQ2NF90KTsKIAogLyoKKyAqIFJl dHVybiB0aGUgY29ycmVjdCBtdXRleCBmb3IgdGhpcyBjYWNoZSBlbnRyeS4KKyAqLworc3RhdGlj IF9faW5saW5lIHN0cnVjdCBtdHggKgorbmZzcmNfY2FjaGVtdXRleChzdHJ1Y3QgbmZzcnZjYWNo ZSAqcnApCit7CisKKwlpZiAoKHJwLT5yY19mbGFnICYgUkNfVURQKSAhPSAwKQorCQlyZXR1cm4g KCZuZnNyY191ZHBtdHgpOworCXJldHVybiAoJm5mc3JjX3RjcG10eFtuZnNyY19oYXNoKHJwLT5y Y194aWQpXSk7Cit9CisKKy8qCiAgKiBJbml0aWFsaXplIHRoZSBzZXJ2ZXIgcmVxdWVzdCBjYWNo ZSBsaXN0CiAgKi8KIEFQUExFU1RBVElDIHZvaWQKQEAgLTMzNiwxMCArMzUwLDEyIEBAIG5mc3Jj X2dldHVkcChzdHJ1Y3QgbmZzcnZfZGVzY3JpcHQgKm5kLCAKIAlzdHJ1Y3Qgc29ja2FkZHJfaW42 ICpzYWRkcjY7CiAJc3RydWN0IG5mc3J2aGFzaGhlYWQgKmhwOwogCWludCByZXQgPSAwOworCXN0 cnVjdCBtdHggKm11dGV4OwogCisJbXV0ZXggPSBuZnNyY19jYWNoZW11dGV4KG5ld3JwKTsKIAlo cCA9IE5GU1JDVURQSEFTSChuZXdycC0+cmNfeGlkKTsKIGxvb3A6Ci0JTkZTTE9DS0NBQ0hFKCk7 CisJbXR4X2xvY2sobXV0ZXgpOwogCUxJU1RfRk9SRUFDSChycCwgaHAsIHJjX2hhc2gpIHsKIAkg ICAgaWYgKG5ld3JwLT5yY194aWQgPT0gcnAtPnJjX3hpZCAmJgogCQluZXdycC0+cmNfcHJvYyA9 PSBycC0+cmNfcHJvYyAmJgpAQCAtMzQ3LDggKzM2Myw4IEBAIGxvb3A6CiAJCW5mc2FkZHJfbWF0 Y2goTkVURkFNSUxZKHJwKSwgJnJwLT5yY19oYWRkciwgbmQtPm5kX25hbSkpIHsKIAkJCWlmICgo cnAtPnJjX2ZsYWcgJiBSQ19MT0NLRUQpICE9IDApIHsKIAkJCQlycC0+cmNfZmxhZyB8PSBSQ19X QU5URUQ7Ci0JCQkJKHZvaWQpbXR4X3NsZWVwKHJwLCBORlNDQUNIRU1VVEVYUFRSLAotCQkJCSAg ICAoUFpFUk8gLSAxKSB8IFBEUk9QLCAibmZzcmMiLCAxMCAqIGh6KTsKKwkJCQkodm9pZCltdHhf c2xlZXAocnAsIG11dGV4LCAoUFpFUk8gLSAxKSB8IFBEUk9QLAorCQkJCSAgICAibmZzcmMiLCAx MCAqIGh6KTsKIAkJCQlnb3RvIGxvb3A7CiAJCQl9CiAJCQlpZiAocnAtPnJjX2ZsYWcgPT0gMCkK QEAgLTM1OCwxNCArMzc0LDE0IEBAIGxvb3A6CiAJCQlUQUlMUV9JTlNFUlRfVEFJTCgmbmZzcnZ1 ZHBscnUsIHJwLCByY19scnUpOwogCQkJaWYgKHJwLT5yY19mbGFnICYgUkNfSU5QUk9HKSB7CiAJ CQkJbmV3bmZzc3RhdHMuc3J2Y2FjaGVfaW5wcm9naGl0cysrOwotCQkJCU5GU1VOTE9DS0NBQ0hF KCk7CisJCQkJbXR4X3VubG9jayhtdXRleCk7CiAJCQkJcmV0ID0gUkNfRFJPUElUOwogCQkJfSBl bHNlIGlmIChycC0+cmNfZmxhZyAmIFJDX1JFUFNUQVRVUykgewogCQkJCS8qCiAJCQkJICogVjIg b25seS4KIAkJCQkgKi8KIAkJCQluZXduZnNzdGF0cy5zcnZjYWNoZV9ub25pZGVtZG9uZWhpdHMr KzsKLQkJCQlORlNVTkxPQ0tDQUNIRSgpOworCQkJCW10eF91bmxvY2sobXV0ZXgpOwogCQkJCW5m c3J2ZF9yZXBoZWFkKG5kKTsKIAkJCQkqKG5kLT5uZF9lcnJwKSA9IHJwLT5yY19zdGF0dXM7CiAJ CQkJcmV0ID0gUkNfUkVQTFk7CkBAIC0zNzMsNyArMzg5LDcgQEAgbG9vcDoKIAkJCQkJTkZTUlZD QUNIRV9VRFBUSU1FT1VUOwogCQkJfSBlbHNlIGlmIChycC0+cmNfZmxhZyAmIFJDX1JFUE1CVUYp IHsKIAkJCQluZXduZnNzdGF0cy5zcnZjYWNoZV9ub25pZGVtZG9uZWhpdHMrKzsKLQkJCQlORlNV TkxPQ0tDQUNIRSgpOworCQkJCW10eF91bmxvY2sobXV0ZXgpOwogCQkJCW5kLT5uZF9tcmVxID0g 
bV9jb3B5bShycC0+cmNfcmVwbHksIDAsCiAJCQkJCU1fQ09QWUFMTCwgTV9XQUlUKTsKIAkJCQly ZXQgPSBSQ19SRVBMWTsKQEAgLTQwMyw3ICs0MTksNyBAQCBsb29wOgogCX0KIAlMSVNUX0lOU0VS VF9IRUFEKGhwLCBuZXdycCwgcmNfaGFzaCk7CiAJVEFJTFFfSU5TRVJUX1RBSUwoJm5mc3J2dWRw bHJ1LCBuZXdycCwgcmNfbHJ1KTsKLQlORlNVTkxPQ0tDQUNIRSgpOworCW10eF91bmxvY2sobXV0 ZXgpOwogCW5kLT5uZF9ycCA9IG5ld3JwOwogCXJldCA9IFJDX0RPSVQ7CiAKQEAgLTQyMSwxMiAr NDM3LDE0IEBAIG5mc3J2ZF91cGRhdGVjYWNoZShzdHJ1Y3QgbmZzcnZfZGVzY3JpcHQKIAlzdHJ1 Y3QgbmZzcnZjYWNoZSAqcnA7CiAJc3RydWN0IG5mc3J2Y2FjaGUgKnJldHJwID0gTlVMTDsKIAlt YnVmX3QgbTsKKwlzdHJ1Y3QgbXR4ICptdXRleDsKIAogCXJwID0gbmQtPm5kX3JwOwogCWlmICgh cnApCiAJCXBhbmljKCJuZnNydmRfdXBkYXRlY2FjaGUgbnVsbCBycCIpOwogCW5kLT5uZF9ycCA9 IE5VTEw7Ci0JTkZTTE9DS0NBQ0hFKCk7CisJbXV0ZXggPSBuZnNyY19jYWNoZW11dGV4KHJwKTsK KwltdHhfbG9jayhtdXRleCk7CiAJbmZzcmNfbG9jayhycCk7CiAJaWYgKCEocnAtPnJjX2ZsYWcg JiBSQ19JTlBST0cpKQogCQlwYW5pYygibmZzcnZkX3VwZGF0ZWNhY2hlIG5vdCBpbnByb2ciKTsK QEAgLTQ0MSw3ICs0NTksNyBAQCBuZnNydmRfdXBkYXRlY2FjaGUoc3RydWN0IG5mc3J2X2Rlc2Ny aXB0CiAJICovCiAJaWYgKG5kLT5uZF9yZXBzdGF0ID09IE5GU0VSUl9SRVBMWUZST01DQUNIRSkg ewogCQluZXduZnNzdGF0cy5zcnZjYWNoZV9ub25pZGVtZG9uZWhpdHMrKzsKLQkJTkZTVU5MT0NL Q0FDSEUoKTsKKwkJbXR4X3VubG9jayhtdXRleCk7CiAJCW5kLT5uZF9yZXBzdGF0ID0gMDsKIAkJ aWYgKG5kLT5uZF9tcmVxKQogCQkJbWJ1Zl9mcmVlbShuZC0+bmRfbXJlcSk7CkBAIC00NzQsMjEg KzQ5MiwyMSBAQCBuZnNydmRfdXBkYXRlY2FjaGUoc3RydWN0IG5mc3J2X2Rlc2NyaXB0CiAJCSAg ICBuZnN2Ml9yZXBzdGF0W25ld25mc3YyX3Byb2NpZFtuZC0+bmRfcHJvY251bV1dKSB7CiAJCQly cC0+cmNfc3RhdHVzID0gbmQtPm5kX3JlcHN0YXQ7CiAJCQlycC0+cmNfZmxhZyB8PSBSQ19SRVBT VEFUVVM7Ci0JCQlORlNVTkxPQ0tDQUNIRSgpOworCQkJbXR4X3VubG9jayhtdXRleCk7CiAJCX0g ZWxzZSB7CiAJCQlpZiAoIShycC0+cmNfZmxhZyAmIFJDX1VEUCkpIHsKLQkJCSAgICBuZnNyY190 Y3BzYXZlZHJlcGxpZXMrKzsKKwkJCSAgICBhdG9taWNfYWRkX2ludCgmbmZzcmNfdGNwc2F2ZWRy ZXBsaWVzLCAxKTsKIAkJCSAgICBpZiAobmZzcmNfdGNwc2F2ZWRyZXBsaWVzID4KIAkJCQluZXdu ZnNzdGF0cy5zcnZjYWNoZV90Y3BwZWFrKQogCQkJCW5ld25mc3N0YXRzLnNydmNhY2hlX3RjcHBl YWsgPQogCQkJCSAgICBuZnNyY190Y3BzYXZlZHJlcGxpZXM7CiAJCQl9Ci0JCQlORlNVTkxPQ0tD QUNIRSgpOworCQkJbXR4X3VubG9jayhtdXRleCk7CiAJCQltID0gbV9jb3B5bShuZC0+bmRfbXJl cSwgMCwgTV9DT1BZQUxMLCBNX1dBSVQpOwotCQkJTkZTTE9DS0NBQ0hFKCk7CisJCQltdHhfbG9j ayhtdXRleCk7CiAJCQlycC0+cmNfcmVwbHkgPSBtOwogCQkJcnAtPnJjX2ZsYWcgfD0gUkNfUkVQ TUJVRjsKLQkJCU5GU1VOTE9DS0NBQ0hFKCk7CisJCQltdHhfdW5sb2NrKG11dGV4KTsKIAkJfQog CQlpZiAocnAtPnJjX2ZsYWcgJiBSQ19VRFApIHsKIAkJCXJwLT5yY190aW1lc3RhbXAgPSBORlNE X01PTk9TRUMgKwpAQCAtNTA0LDcgKzUyMiw3IEBAIG5mc3J2ZF91cGRhdGVjYWNoZShzdHJ1Y3Qg bmZzcnZfZGVzY3JpcHQKIAkJfQogCX0gZWxzZSB7CiAJCW5mc3JjX2ZyZWVjYWNoZShycCk7Ci0J CU5GU1VOTE9DS0NBQ0hFKCk7CisJCW10eF91bmxvY2sobXV0ZXgpOwogCX0KIAogb3V0OgpAQCAt NTIwLDE0ICs1MzgsMTYgQEAgb3V0OgogQVBQTEVTVEFUSUMgdm9pZAogbmZzcnZkX2RlbGNhY2hl KHN0cnVjdCBuZnNydmNhY2hlICpycCkKIHsKKwlzdHJ1Y3QgbXR4ICptdXRleDsKIAorCW11dGV4 ID0gbmZzcmNfY2FjaGVtdXRleChycCk7CiAJaWYgKCEocnAtPnJjX2ZsYWcgJiBSQ19JTlBST0cp KQogCQlwYW5pYygibmZzcnZkX2RlbGNhY2hlIG5vdCBpbiBwcm9nIik7Ci0JTkZTTE9DS0NBQ0hF KCk7CisJbXR4X2xvY2sobXV0ZXgpOwogCXJwLT5yY19mbGFnICY9IH5SQ19JTlBST0c7CiAJaWYg KHJwLT5yY19yZWZjbnQgPT0gMCAmJiAhKHJwLT5yY19mbGFnICYgUkNfTE9DS0VEKSkKIAkJbmZz cmNfZnJlZWNhY2hlKHJwKTsKLQlORlNVTkxPQ0tDQUNIRSgpOworCW10eF91bmxvY2sobXV0ZXgp OwogfQogCiAvKgpAQCAtNTM5LDcgKzU1OSw5IEBAIEFQUExFU1RBVElDIHZvaWQKIG5mc3J2ZF9z ZW50Y2FjaGUoc3RydWN0IG5mc3J2Y2FjaGUgKnJwLCBzdHJ1Y3Qgc29ja2V0ICpzbywgaW50IGVy cikKIHsKIAl0Y3Bfc2VxIHRtcF9zZXE7CisJc3RydWN0IG10eCAqbXV0ZXg7CiAKKwltdXRleCA9 IG5mc3JjX2NhY2hlbXV0ZXgocnApOwogCWlmICghKHJwLT5yY19mbGFnICYgUkNfTE9DS0VEKSkK IAkJcGFuaWMoIm5mc3J2ZF9zZW50Y2FjaGUgbm90IGxvY2tlZCIpOwogCWlmICghZXJyKSB7CkBA 
IC01NDgsMTAgKzU3MCwxMCBAQCBuZnNydmRfc2VudGNhY2hlKHN0cnVjdCBuZnNydmNhY2hlICpy cCwgCiAJCSAgICAgc28tPnNvX3Byb3RvLT5wcl9wcm90b2NvbCAhPSBJUFBST1RPX1RDUCkKIAkJ CXBhbmljKCJuZnMgc2VudCBjYWNoZSIpOwogCQlpZiAobmZzcnZfZ2V0c29ja3NlcW51bShzbywg JnRtcF9zZXEpKSB7Ci0JCQlORlNMT0NLQ0FDSEUoKTsKKwkJCW10eF9sb2NrKG11dGV4KTsKIAkJ CXJwLT5yY190Y3BzZXEgPSB0bXBfc2VxOwogCQkJcnAtPnJjX2ZsYWcgfD0gUkNfVENQU0VROwot CQkJTkZTVU5MT0NLQ0FDSEUoKTsKKwkJCW10eF91bmxvY2sobXV0ZXgpOwogCQl9CiAJfQogCW5m c3JjX3VubG9jayhycCk7CkBAIC01NzAsMTEgKzU5MiwxMyBAQCBuZnNyY19nZXR0Y3Aoc3RydWN0 IG5mc3J2X2Rlc2NyaXB0ICpuZCwgCiAJc3RydWN0IG5mc3J2Y2FjaGUgKmhpdHJwOwogCXN0cnVj dCBuZnNydmhhc2hoZWFkICpocCwgbmZzcmNfdGVtcGxpc3Q7CiAJaW50IGhpdCwgcmV0ID0gMDsK KwlzdHJ1Y3QgbXR4ICptdXRleDsKIAorCW11dGV4ID0gbmZzcmNfY2FjaGVtdXRleChuZXdycCk7 CiAJaHAgPSBORlNSQ0hBU0gobmV3cnAtPnJjX3hpZCk7CiAJbmV3cnAtPnJjX3JlcWxlbiA9IG5m c3JjX2dldGxlbmFuZGNrc3VtKG5kLT5uZF9tcmVwLCAmbmV3cnAtPnJjX2Nrc3VtKTsKIHRyeWFn YWluOgotCU5GU0xPQ0tDQUNIRSgpOworCW10eF9sb2NrKG11dGV4KTsKIAloaXQgPSAxOwogCUxJ U1RfSU5JVCgmbmZzcmNfdGVtcGxpc3QpOwogCS8qCkBAIC02MzIsOCArNjU2LDggQEAgdHJ5YWdh aW46CiAJCXJwID0gaGl0cnA7CiAJCWlmICgocnAtPnJjX2ZsYWcgJiBSQ19MT0NLRUQpICE9IDAp IHsKIAkJCXJwLT5yY19mbGFnIHw9IFJDX1dBTlRFRDsKLQkJCSh2b2lkKW10eF9zbGVlcChycCwg TkZTQ0FDSEVNVVRFWFBUUiwKLQkJCSAgICAoUFpFUk8gLSAxKSB8IFBEUk9QLCAibmZzcmMiLCAx MCAqIGh6KTsKKwkJCSh2b2lkKW10eF9zbGVlcChycCwgbXV0ZXgsIChQWkVSTyAtIDEpIHwgUERS T1AsCisJCQkgICAgIm5mc3JjIiwgMTAgKiBoeik7CiAJCQlnb3RvIHRyeWFnYWluOwogCQl9CiAJ CWlmIChycC0+cmNfZmxhZyA9PSAwKQpAQCAtNjQxLDcgKzY2NSw3IEBAIHRyeWFnYWluOgogCQly cC0+cmNfZmxhZyB8PSBSQ19MT0NLRUQ7CiAJCWlmIChycC0+cmNfZmxhZyAmIFJDX0lOUFJPRykg ewogCQkJbmV3bmZzc3RhdHMuc3J2Y2FjaGVfaW5wcm9naGl0cysrOwotCQkJTkZTVU5MT0NLQ0FD SEUoKTsKKwkJCW10eF91bmxvY2sobXV0ZXgpOwogCQkJaWYgKG5ld3JwLT5yY19zb2NrcmVmID09 IHJwLT5yY19zb2NrcmVmKQogCQkJCW5mc3JjX21hcmtzYW1ldGNwY29ubihycC0+cmNfc29ja3Jl Zik7CiAJCQlyZXQgPSBSQ19EUk9QSVQ7CkBAIC02NTAsNyArNjc0LDcgQEAgdHJ5YWdhaW46CiAJ CQkgKiBWMiBvbmx5LgogCQkJICovCiAJCQluZXduZnNzdGF0cy5zcnZjYWNoZV9ub25pZGVtZG9u ZWhpdHMrKzsKLQkJCU5GU1VOTE9DS0NBQ0hFKCk7CisJCQltdHhfdW5sb2NrKG11dGV4KTsKIAkJ CWlmIChuZXdycC0+cmNfc29ja3JlZiA9PSBycC0+cmNfc29ja3JlZikKIAkJCQluZnNyY19tYXJr c2FtZXRjcGNvbm4ocnAtPnJjX3NvY2tyZWYpOwogCQkJcmV0ID0gUkNfUkVQTFk7CkBAIC02NjAs NyArNjg0LDcgQEAgdHJ5YWdhaW46CiAJCQkJTkZTUlZDQUNIRV9UQ1BUSU1FT1VUOwogCQl9IGVs c2UgaWYgKHJwLT5yY19mbGFnICYgUkNfUkVQTUJVRikgewogCQkJbmV3bmZzc3RhdHMuc3J2Y2Fj aGVfbm9uaWRlbWRvbmVoaXRzKys7Ci0JCQlORlNVTkxPQ0tDQUNIRSgpOworCQkJbXR4X3VubG9j ayhtdXRleCk7CiAJCQlpZiAobmV3cnAtPnJjX3NvY2tyZWYgPT0gcnAtPnJjX3NvY2tyZWYpCiAJ CQkJbmZzcmNfbWFya3NhbWV0Y3Bjb25uKHJwLT5yY19zb2NrcmVmKTsKIAkJCXJldCA9IFJDX1JF UExZOwpAQCAtNjg1LDcgKzcwOSw3IEBAIHRyeWFnYWluOgogCW5ld3JwLT5yY19jYWNoZXRpbWUg PSBORlNEX01PTk9TRUM7CiAJbmV3cnAtPnJjX2ZsYWcgfD0gUkNfSU5QUk9HOwogCUxJU1RfSU5T RVJUX0hFQUQoaHAsIG5ld3JwLCByY19oYXNoKTsKLQlORlNVTkxPQ0tDQUNIRSgpOworCW10eF91 bmxvY2sobXV0ZXgpOwogCW5kLT5uZF9ycCA9IG5ld3JwOwogCXJldCA9IFJDX0RPSVQ7CiAKQEAg LTY5NiwxNiArNzIwLDE3IEBAIG91dDoKIAogLyoKICAqIExvY2sgYSBjYWNoZSBlbnRyeS4KLSAq IEFsc28gcHV0cyBhIG11dGV4IGxvY2sgb24gdGhlIGNhY2hlIGxpc3QuCiAgKi8KIHN0YXRpYyB2 b2lkCiBuZnNyY19sb2NrKHN0cnVjdCBuZnNydmNhY2hlICpycCkKIHsKLQlORlNDQUNIRUxPQ0tS RVFVSVJFRCgpOworCXN0cnVjdCBtdHggKm11dGV4OworCisJbXV0ZXggPSBuZnNyY19jYWNoZW11 dGV4KHJwKTsKKwltdHhfYXNzZXJ0KG11dGV4LCBNQV9PV05FRCk7CiAJd2hpbGUgKChycC0+cmNf ZmxhZyAmIFJDX0xPQ0tFRCkgIT0gMCkgewogCQlycC0+cmNfZmxhZyB8PSBSQ19XQU5URUQ7Ci0J CSh2b2lkKW10eF9zbGVlcChycCwgTkZTQ0FDSEVNVVRFWFBUUiwgUFpFUk8gLSAxLAotCQkgICAg Im5mc3JjIiwgMCk7CisJCSh2b2lkKW10eF9zbGVlcChycCwgbXV0ZXgsIFBaRVJPIC0gMSwgIm5m 
c3JjIiwgMCk7CiAJfQogCXJwLT5yY19mbGFnIHw9IFJDX0xPQ0tFRDsKIH0KQEAgLTcxNiwxMSAr NzQxLDEzIEBAIG5mc3JjX2xvY2soc3RydWN0IG5mc3J2Y2FjaGUgKnJwKQogc3RhdGljIHZvaWQK IG5mc3JjX3VubG9jayhzdHJ1Y3QgbmZzcnZjYWNoZSAqcnApCiB7CisJc3RydWN0IG10eCAqbXV0 ZXg7CiAKLQlORlNMT0NLQ0FDSEUoKTsKKwltdXRleCA9IG5mc3JjX2NhY2hlbXV0ZXgocnApOwor CW10eF9sb2NrKG11dGV4KTsKIAlycC0+cmNfZmxhZyAmPSB+UkNfTE9DS0VEOwogCW5mc3JjX3dh bnRlZChycCk7Ci0JTkZTVU5MT0NLQ0FDSEUoKTsKKwltdHhfdW5sb2NrKG11dGV4KTsKIH0KIAog LyoKQEAgLTc0Myw3ICs3NzAsNiBAQCBzdGF0aWMgdm9pZAogbmZzcmNfZnJlZWNhY2hlKHN0cnVj dCBuZnNydmNhY2hlICpycCkKIHsKIAotCU5GU0NBQ0hFTE9DS1JFUVVJUkVEKCk7CiAJTElTVF9S RU1PVkUocnAsIHJjX2hhc2gpOwogCWlmIChycC0+cmNfZmxhZyAmIFJDX1VEUCkgewogCQlUQUlM UV9SRU1PVkUoJm5mc3J2dWRwbHJ1LCBycCwgcmNfbHJ1KTsKQEAgLTc1Myw3ICs3NzksNyBAQCBu ZnNyY19mcmVlY2FjaGUoc3RydWN0IG5mc3J2Y2FjaGUgKnJwKQogCWlmIChycC0+cmNfZmxhZyAm IFJDX1JFUE1CVUYpIHsKIAkJbWJ1Zl9mcmVlbShycC0+cmNfcmVwbHkpOwogCQlpZiAoIShycC0+ cmNfZmxhZyAmIFJDX1VEUCkpCi0JCQluZnNyY190Y3BzYXZlZHJlcGxpZXMtLTsKKwkJCWF0b21p Y19hZGRfaW50KCZuZnNyY190Y3BzYXZlZHJlcGxpZXMsIC0xKTsKIAl9CiAJRlJFRSgoY2FkZHJf dClycCwgTV9ORlNSVkNBQ0hFKTsKIAluZXduZnNzdGF0cy5zcnZjYWNoZV9zaXplLS07CkBAIC03 NjgsMjAgKzc5NCwyMiBAQCBuZnNydmRfY2xlYW5jYWNoZSh2b2lkKQogCXN0cnVjdCBuZnNydmNh Y2hlICpycCwgKm5leHRycDsKIAlpbnQgaTsKIAotCU5GU0xPQ0tDQUNIRSgpOwogCWZvciAoaSA9 IDA7IGkgPCBORlNSVkNBQ0hFX0hBU0hTSVpFOyBpKyspIHsKKwkJbXR4X2xvY2soJm5mc3JjX3Rj cG10eFtpXSk7CiAJCUxJU1RfRk9SRUFDSF9TQUZFKHJwLCAmbmZzcnZoYXNodGJsW2ldLCByY19o YXNoLCBuZXh0cnApIHsKIAkJCW5mc3JjX2ZyZWVjYWNoZShycCk7CiAJCX0KKwkJbXR4X3VubG9j aygmbmZzcmNfdGNwbXR4W2ldKTsKIAl9CisJbXR4X2xvY2soJm5mc3JjX3VkcG10eCk7CiAJZm9y IChpID0gMDsgaSA8IE5GU1JWQ0FDSEVfSEFTSFNJWkU7IGkrKykgewogCQlMSVNUX0ZPUkVBQ0hf U0FGRShycCwgJm5mc3J2dWRwaGFzaHRibFtpXSwgcmNfaGFzaCwgbmV4dHJwKSB7CiAJCQluZnNy Y19mcmVlY2FjaGUocnApOwogCQl9CiAJfQogCW5ld25mc3N0YXRzLnNydmNhY2hlX3NpemUgPSAw OworCW10eF91bmxvY2soJm5mc3JjX3VkcG10eCk7CiAJbmZzcmNfdGNwc2F2ZWRyZXBsaWVzID0g MDsKLQlORlNVTkxPQ0tDQUNIRSgpOwogfQogCiAvKgpAQCAtNzkyLDM0ICs4MjAsNDIgQEAgbmZz cmNfdHJpbWNhY2hlKHVfaW50NjRfdCBzb2NrcmVmLCBzdHJ1YwogewogCXN0cnVjdCBuZnNydmNh Y2hlICpycCwgKm5leHRycDsKIAlpbnQgaTsKLQlzdGF0aWMgdGltZV90IGxhc3R0cmltID0gMDsK KwlzdGF0aWMgdGltZV90IHVkcF9sYXN0dHJpbSA9IDAsIHRjcF9sYXN0dHJpbSA9IDA7CiAKLQlp ZiAoTkZTRF9NT05PU0VDID09IGxhc3R0cmltICYmCi0JICAgIG5mc3JjX3RjcHNhdmVkcmVwbGll cyA8IG5mc3JjX3RjcGhpZ2h3YXRlciAmJgotCSAgICBuZnNyY191ZHBjYWNoZXNpemUgPCAobmZz cmNfdWRwaGlnaHdhdGVyICsKLQkgICAgbmZzcmNfdWRwaGlnaHdhdGVyIC8gMikpCi0JCXJldHVy bjsKLQlORlNMT0NLQ0FDSEUoKTsKLQlsYXN0dHJpbSA9IE5GU0RfTU9OT1NFQzsKLQlUQUlMUV9G T1JFQUNIX1NBRkUocnAsICZuZnNydnVkcGxydSwgcmNfbHJ1LCBuZXh0cnApIHsKLQkJaWYgKCEo cnAtPnJjX2ZsYWcgJiAoUkNfSU5QUk9HfFJDX0xPQ0tFRHxSQ19XQU5URUQpKQotCQkgICAgICYm IHJwLT5yY19yZWZjbnQgPT0gMAotCQkgICAgICYmICgocnAtPnJjX2ZsYWcgJiBSQ19SRUZDTlQp IHx8Ci0JCQkgTkZTRF9NT05PU0VDID4gcnAtPnJjX3RpbWVzdGFtcCB8fAotCQkJIG5mc3JjX3Vk cGNhY2hlc2l6ZSA+IG5mc3JjX3VkcGhpZ2h3YXRlcikpCi0JCQluZnNyY19mcmVlY2FjaGUocnAp OwotCX0KLQlmb3IgKGkgPSAwOyBpIDwgTkZTUlZDQUNIRV9IQVNIU0laRTsgaSsrKSB7Ci0JCUxJ U1RfRk9SRUFDSF9TQUZFKHJwLCAmbmZzcnZoYXNodGJsW2ldLCByY19oYXNoLCBuZXh0cnApIHsK KwlpZiAoTkZTRF9NT05PU0VDICE9IHVkcF9sYXN0dHJpbSB8fAorCSAgICBuZnNyY191ZHBjYWNo ZXNpemUgPj0gKG5mc3JjX3VkcGhpZ2h3YXRlciArCisJICAgIG5mc3JjX3VkcGhpZ2h3YXRlciAv IDIpKSB7CisJCW10eF9sb2NrKCZuZnNyY191ZHBtdHgpOworCQl1ZHBfbGFzdHRyaW0gPSBORlNE X01PTk9TRUM7CisJCVRBSUxRX0ZPUkVBQ0hfU0FGRShycCwgJm5mc3J2dWRwbHJ1LCByY19scnUs IG5leHRycCkgewogCQkJaWYgKCEocnAtPnJjX2ZsYWcgJiAoUkNfSU5QUk9HfFJDX0xPQ0tFRHxS Q19XQU5URUQpKQogCQkJICAgICAmJiBycC0+cmNfcmVmY250ID09IDAKIAkJCSAgICAgJiYgKChy 
cC0+cmNfZmxhZyAmIFJDX1JFRkNOVCkgfHwKIAkJCQkgTkZTRF9NT05PU0VDID4gcnAtPnJjX3Rp bWVzdGFtcCB8fAotCQkJCSBuZnNyY19hY3RpdmVzb2NrZXQocnAsIHNvY2tyZWYsIHNvKSkpCisJ CQkJIG5mc3JjX3VkcGNhY2hlc2l6ZSA+IG5mc3JjX3VkcGhpZ2h3YXRlcikpCiAJCQkJbmZzcmNf ZnJlZWNhY2hlKHJwKTsKIAkJfQorCQltdHhfdW5sb2NrKCZuZnNyY191ZHBtdHgpOworCX0KKwlp ZiAoTkZTRF9NT05PU0VDICE9IHRjcF9sYXN0dHJpbSB8fAorCSAgICBuZnNyY190Y3BzYXZlZHJl cGxpZXMgPj0gbmZzcmNfdGNwaGlnaHdhdGVyKSB7CisJCWZvciAoaSA9IDA7IGkgPCBORlNSVkNB Q0hFX0hBU0hTSVpFOyBpKyspIHsKKwkJCW10eF9sb2NrKCZuZnNyY190Y3BtdHhbaV0pOworCQkJ aWYgKGkgPT0gMCkKKwkJCQl0Y3BfbGFzdHRyaW0gPSBORlNEX01PTk9TRUM7CisJCQlMSVNUX0ZP UkVBQ0hfU0FGRShycCwgJm5mc3J2aGFzaHRibFtpXSwgcmNfaGFzaCwKKwkJCSAgICBuZXh0cnAp IHsKKwkJCQlpZiAoIShycC0+cmNfZmxhZyAmCisJCQkJICAgICAoUkNfSU5QUk9HfFJDX0xPQ0tF RHxSQ19XQU5URUQpKQorCQkJCSAgICAgJiYgcnAtPnJjX3JlZmNudCA9PSAwCisJCQkJICAgICAm JiAoKHJwLT5yY19mbGFnICYgUkNfUkVGQ05UKSB8fAorCQkJCQkgTkZTRF9NT05PU0VDID4gcnAt PnJjX3RpbWVzdGFtcCB8fAorCQkJCQkgbmZzcmNfYWN0aXZlc29ja2V0KHJwLCBzb2NrcmVmLCBz bykpKQorCQkJCQluZnNyY19mcmVlY2FjaGUocnApOworCQkJfQorCQkJbXR4X3VubG9jaygmbmZz cmNfdGNwbXR4W2ldKTsKKwkJfQogCX0KLQlORlNVTkxPQ0tDQUNIRSgpOwogfQogCiAvKgpAQCAt ODI4LDEyICs4NjQsMTQgQEAgbmZzcmNfdHJpbWNhY2hlKHVfaW50NjRfdCBzb2NrcmVmLCBzdHJ1 YwogQVBQTEVTVEFUSUMgdm9pZAogbmZzcnZkX3JlZmNhY2hlKHN0cnVjdCBuZnNydmNhY2hlICpy cCkKIHsKKwlzdHJ1Y3QgbXR4ICptdXRleDsKIAotCU5GU0xPQ0tDQUNIRSgpOworCW11dGV4ID0g bmZzcmNfY2FjaGVtdXRleChycCk7CisJbXR4X2xvY2sobXV0ZXgpOwogCWlmIChycC0+cmNfcmVm Y250IDwgMCkKIAkJcGFuaWMoIm5mcyBjYWNoZSByZWZjbnQiKTsKIAlycC0+cmNfcmVmY250Kys7 Ci0JTkZTVU5MT0NLQ0FDSEUoKTsKKwltdHhfdW5sb2NrKG11dGV4KTsKIH0KIAogLyoKQEAgLTg0 MiwxNCArODgwLDE2IEBAIG5mc3J2ZF9yZWZjYWNoZShzdHJ1Y3QgbmZzcnZjYWNoZSAqcnApCiBB UFBMRVNUQVRJQyB2b2lkCiBuZnNydmRfZGVyZWZjYWNoZShzdHJ1Y3QgbmZzcnZjYWNoZSAqcnAp CiB7CisJc3RydWN0IG10eCAqbXV0ZXg7CiAKLQlORlNMT0NLQ0FDSEUoKTsKKwltdXRleCA9IG5m c3JjX2NhY2hlbXV0ZXgocnApOworCW10eF9sb2NrKG11dGV4KTsKIAlpZiAocnAtPnJjX3JlZmNu dCA8PSAwKQogCQlwYW5pYygibmZzIGNhY2hlIGRlcmVmY250Iik7CiAJcnAtPnJjX3JlZmNudC0t OwogCWlmIChycC0+cmNfcmVmY250ID09IDAgJiYgIShycC0+cmNfZmxhZyAmIChSQ19MT0NLRUQg fCBSQ19JTlBST0cpKSkKIAkJbmZzcmNfZnJlZWNhY2hlKHJwKTsKLQlORlNVTkxPQ0tDQUNIRSgp OworCW10eF91bmxvY2sobXV0ZXgpOwogfQogCiAvKgotLS0gZnMvbmZzc2VydmVyL25mc19uZnNk cG9ydC5jLnNhdgkyMDEyLTEwLTExIDE3OjM4OjI2LjAwMDAwMDAwMCAtMDQwMAorKysgZnMvbmZz c2VydmVyL25mc19uZnNkcG9ydC5jCTIwMTItMTAtMTEgMTc6NDM6MTYuMDAwMDAwMDAwIC0wNDAw CkBAIC02MCw3ICs2MCw4IEBAIGV4dGVybiBTVkNQT09MCSpuZnNydmRfcG9vbDsKIGV4dGVybiBz dHJ1Y3QgbmZzdjRsb2NrIG5mc2Rfc3VzcGVuZF9sb2NrOwogc3RydWN0IHZmc29wdGxpc3QgbmZz djRyb290X29wdCwgbmZzdjRyb290X25ld29wdDsKIE5GU0RMT0NLTVVURVg7Ci1zdHJ1Y3QgbXR4 IG5mc19jYWNoZV9tdXRleDsKK3N0cnVjdCBtdHggbmZzcmNfdGNwbXR4W05GU1JWQ0FDSEVfSEFT SFNJWkVdOworc3RydWN0IG10eCBuZnNyY191ZHBtdHg7CiBzdHJ1Y3QgbXR4IG5mc192NHJvb3Rf bXV0ZXg7CiBzdHJ1Y3QgbmZzcnZmaCBuZnNfcm9vdGZoLCBuZnNfcHViZmg7CiBpbnQgbmZzX3B1 YmZoc2V0ID0gMCwgbmZzX3Jvb3RmaHNldCA9IDA7CkBAIC0zMjg4LDcgKzMyODksNyBAQCBleHRl cm4gaW50ICgqbmZzZF9jYWxsX25mc2QpKHN0cnVjdCB0aHJlCiBzdGF0aWMgaW50CiBuZnNkX21v ZGV2ZW50KG1vZHVsZV90IG1vZCwgaW50IHR5cGUsIHZvaWQgKmRhdGEpCiB7Ci0JaW50IGVycm9y ID0gMDsKKwlpbnQgZXJyb3IgPSAwLCBpOwogCXN0YXRpYyBpbnQgbG9hZGVkID0gMDsKIAogCXN3 aXRjaCAodHlwZSkgewpAQCAtMzI5Niw3ICszMjk3LDEwIEBAIG5mc2RfbW9kZXZlbnQobW9kdWxl X3QgbW9kLCBpbnQgdHlwZSwgdm8KIAkJaWYgKGxvYWRlZCkKIAkJCWdvdG8gb3V0OwogCQluZXdu ZnNfcG9ydGluaXQoKTsKLQkJbXR4X2luaXQoJm5mc19jYWNoZV9tdXRleCwgIm5mc19jYWNoZV9t dXRleCIsIE5VTEwsIE1UWF9ERUYpOworCQlmb3IgKGkgPSAwOyBpIDwgTkZTUlZDQUNIRV9IQVNI U0laRTsgaSsrKQorCQkJbXR4X2luaXQoJm5mc3JjX3RjcG10eFtpXSwgIm5mc190Y3BjYWNoZV9t 
dXRleCIsIE5VTEwsCisJCQkgICAgTVRYX0RFRik7CisJCW10eF9pbml0KCZuZnNyY191ZHBtdHgs ICJuZnNfdWRwY2FjaGVfbXV0ZXgiLCBOVUxMLCBNVFhfREVGKTsKIAkJbXR4X2luaXQoJm5mc192 NHJvb3RfbXV0ZXgsICJuZnNfdjRyb290X211dGV4IiwgTlVMTCwgTVRYX0RFRik7CiAJCW10eF9p bml0KCZuZnN2NHJvb3RfbW50Lm1udF9tdHgsICJzdHJ1Y3QgbW91bnQgbXR4IiwgTlVMTCwKIAkJ ICAgIE1UWF9ERUYpOwpAQCAtMzM0MCw3ICszMzQ0LDkgQEAgbmZzZF9tb2RldmVudChtb2R1bGVf dCBtb2QsIGludCB0eXBlLCB2bwogCQkJc3ZjcG9vbF9kZXN0cm95KG5mc3J2ZF9wb29sKTsKIAog CQkvKiBhbmQgZ2V0IHJpZCBvZiB0aGUgbG9ja3MgKi8KLQkJbXR4X2Rlc3Ryb3koJm5mc19jYWNo ZV9tdXRleCk7CisJCWZvciAoaSA9IDA7IGkgPCBORlNSVkNBQ0hFX0hBU0hTSVpFOyBpKyspCisJ CQltdHhfZGVzdHJveSgmbmZzcmNfdGNwbXR4W2ldKTsKKwkJbXR4X2Rlc3Ryb3koJm5mc3JjX3Vk cG10eCk7CiAJCW10eF9kZXN0cm95KCZuZnNfdjRyb290X211dGV4KTsKIAkJbXR4X2Rlc3Ryb3ko Jm5mc3Y0cm9vdF9tbnQubW50X210eCk7CiAJCWxvY2tkZXN0cm95KCZuZnN2NHJvb3RfbW50Lm1u dF9leHBsb2NrKTsKLS0tIGZzL25mcy9uZnNwb3J0Lmguc2F2CTIwMTItMTAtMTAgMjA6NTY6MjYu MDAwMDAwMDAwIC0wNDAwCisrKyBmcy9uZnMvbmZzcG9ydC5oCTIwMTItMTAtMTAgMjA6NTY6NDIu MDAwMDAwMDAwIC0wNDAwCkBAIC01NDYsMTEgKzU0Niw2IEBAIHZvaWQgbmZzcnZkX3JjdihzdHJ1 Y3Qgc29ja2V0ICosIHZvaWQgKiwKICNkZWZpbmUJTkZTUkVRU1BJTkxPQ0sJCWV4dGVybiBzdHJ1 Y3QgbXR4IG5mc19yZXFfbXV0ZXgKICNkZWZpbmUJTkZTTE9DS1JFUSgpCQltdHhfbG9jaygmbmZz X3JlcV9tdXRleCkKICNkZWZpbmUJTkZTVU5MT0NLUkVRKCkJCW10eF91bmxvY2soJm5mc19yZXFf bXV0ZXgpCi0jZGVmaW5lCU5GU0NBQ0hFTVVURVgJCWV4dGVybiBzdHJ1Y3QgbXR4IG5mc19jYWNo ZV9tdXRleAotI2RlZmluZQlORlNDQUNIRU1VVEVYUFRSCSgmbmZzX2NhY2hlX211dGV4KQotI2Rl ZmluZQlORlNMT0NLQ0FDSEUoKQkJbXR4X2xvY2soJm5mc19jYWNoZV9tdXRleCkKLSNkZWZpbmUJ TkZTVU5MT0NLQ0FDSEUoKQltdHhfdW5sb2NrKCZuZnNfY2FjaGVfbXV0ZXgpCi0jZGVmaW5lCU5G U0NBQ0hFTE9DS1JFUVVJUkVEKCkJbXR4X2Fzc2VydCgmbmZzX2NhY2hlX211dGV4LCBNQV9PV05F RCkKICNkZWZpbmUJTkZTU09DS01VVEVYCQlleHRlcm4gc3RydWN0IG10eCBuZnNfc2xvY2tfbXV0 ZXgKICNkZWZpbmUJTkZTU09DS01VVEVYUFRSCQkoJm5mc19zbG9ja19tdXRleCkKICNkZWZpbmUJ TkZTTE9DS1NPQ0soKQkJbXR4X2xvY2soJm5mc19zbG9ja19tdXRleCkKLS0tIGZzL25mcy9uZnNy dmNhY2hlLmguc2F2CTIwMTItMTAtMTIgMjA6MDM6NDIuMDAwMDAwMDAwIC0wNDAwCisrKyBmcy9u ZnMvbmZzcnZjYWNoZS5oCTIwMTItMTAtMTIgMjA6MDM6NTUuMDAwMDAwMDAwIC0wNDAwCkBAIC00 MSw3ICs0MSw3IEBACiAjZGVmaW5lCU5GU1JWQ0FDSEVfTUFYX1NJWkUJMjA0OAogI2RlZmluZQlO RlNSVkNBQ0hFX01JTl9TSVpFCSAgNjQKIAotI2RlZmluZQlORlNSVkNBQ0hFX0hBU0hTSVpFCTIw CisjZGVmaW5lCU5GU1JWQ0FDSEVfSEFTSFNJWkUJMjAwCiAKIHN0cnVjdCBuZnNydmNhY2hlIHsK IAlMSVNUX0VOVFJZKG5mc3J2Y2FjaGUpIHJjX2hhc2g7CQkvKiBIYXNoIGNoYWluICovCg== ------=_Part_2185821_351431992.1350093954057-- From owner-freebsd-hackers@FreeBSD.ORG Sat Oct 13 04:55:43 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 153A634A for ; Sat, 13 Oct 2012 04:55:43 +0000 (UTC) (envelope-from wollman@hergotha.csail.mit.edu) Received: from hergotha.csail.mit.edu (wollman-1-pt.tunnel.tserv4.nyc4.ipv6.he.net [IPv6:2001:470:1f06:ccb::2]) by mx1.freebsd.org (Postfix) with ESMTP id B74128FC0A for ; Sat, 13 Oct 2012 04:55:42 +0000 (UTC) Received: from hergotha.csail.mit.edu (localhost [127.0.0.1]) by hergotha.csail.mit.edu (8.14.5/8.14.5) with ESMTP id q9D4tf2s037126; Sat, 13 Oct 2012 00:55:41 -0400 (EDT) (envelope-from wollman@hergotha.csail.mit.edu) Received: (from wollman@localhost) by hergotha.csail.mit.edu (8.14.5/8.14.4/Submit) id q9D4tfcG037123; Sat, 13 Oct 2012 00:55:41 -0400 (EDT) (envelope-from wollman) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <20600.62541.243673.307571@hergotha.csail.mit.edu> Date: Sat, 13 Oct 2012 00:55:41 -0400 From: 
Garrett Wollman To: Rick Macklem Subject: Re: NFS server bottlenecks In-Reply-To: <937460294.2185822.1350093954059.JavaMail.root@erie.cs.uoguelph.ca> References: <608951636.2115684.1349992972756.JavaMail.root@erie.cs.uoguelph.ca> <937460294.2185822.1350093954059.JavaMail.root@erie.cs.uoguelph.ca> X-Mailer: VM 7.17 under 21.4 (patch 22) "Instant Classic" XEmacs Lucid X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.7 (hergotha.csail.mit.edu [127.0.0.1]); Sat, 13 Oct 2012 00:55:41 -0400 (EDT) X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED autolearn=disabled version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on hergotha.csail.mit.edu Cc: Nikolay Denev , FreeBSD Hackers X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 13 Oct 2012 04:55:43 -0000 < said: > I've attached the patch drc3.patch (it assumes drc2.patch has already been > applied) that replaces the single mutex with one for each hash list > for tcp. It also increases the size of NFSRVCACHE_HASHSIZE to 200. I haven't tested this at all, but I think putting all of the mutexes in an array like that is likely to cause cache-line ping-ponging. It may be better to use a pool mutex, or to put the mutexes adjacent in memory to the list heads that they protect. (But I probably won't be able to do the performance testing on any of these for a while. I have a server running the "drc2" code but haven't gotten my users to put a load on it yet.) -GAWollman From owner-freebsd-hackers@FreeBSD.ORG Sat Oct 13 10:26:44 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 60591A7F; Sat, 13 Oct 2012 10:26:44 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 118148FC12; Sat, 13 Oct 2012 10:26:44 +0000 (UTC) Received: from fledge.watson.org (fledge.watson.org [65.122.17.41]) by cyrus.watson.org (Postfix) with ESMTPS id 10A2546B09; Sat, 13 Oct 2012 06:26:43 -0400 (EDT) Date: Sat, 13 Oct 2012 11:26:42 +0100 (BST) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Carl Delsey Subject: Re: No bus_space_read_8 on x86 ? In-Reply-To: <5078575B.2020808@intel.com> Message-ID: References: <506DC574.9010300@intel.com> <201210091154.15873.jhb@freebsd.org> <5075EC29.1010907@intel.com> <201210121131.46373.jhb@freebsd.org> <5078575B.2020808@intel.com> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 13 Oct 2012 10:26:44 -0000 On Fri, 12 Oct 2012, Carl Delsey wrote: >> Indeed -- and on non-x86, where there are uncached direct map segments, and >> TLB entries that disable caching, reading 2x 32-bit vs 1x 64-bit have quite >> different effects in terms of atomicity. 
Where uncached I/Os are being >> used, those differences may affect semantics significantly -- e.g., if your >> device has a 64-bit memory-mapped FIFO or registers, 2x 32-bit gives you >> two halves of two different 64-bit values, rather than two halves of the >> same value. As device drivers depend on those atomicity semantics, we >> should (at the busspace level) offer only the exactly expected semantics, >> rather than trying to patch things up. If a device driver accessing 64-bit >> fields wants to support doing it using two 32-bit reads, it can figure out >> how to splice it together following bus_space_read_region_4(). > I wouldn't make any default behaviour for bus_space_read_8 on i386, just > amd64. My assumption (which may be unjustified) is that by far the most > common implementations to read a 64-bit register on i386 would be to read the > lower 4 bytes first, followed by the upper 4 bytes (or vice versa) and then > stitch them together. I think we should provide helper functions for these > two cases, otherwise I fear our code base will be littered with multiple > independent implementations of this. > > Some driver writer who wants to take advantage of these helper functions > would do something like > #ifdef i386 > #define bus_space_read_8 bus_space_read_8_lower_first > #endif > otherwise, using bus_space_read_8 won't compile for i386 builds. > If these implementations won't work for their case, they are free to write > their own implementation or take whatever action is necessary. > > I guess my question is, are these cases common enough that it is worth > helping developers by providing functions that do the double read and shifts > for them, or do we leave them to deal with it on their own at the risk of > possibly some duplicated code. I was thinking we might suggest to developers that they use a KPI that specifically captures the underlying semantics, so it's clear they understand them. Untested example: uint64_t v; /* * On 32-bit systems, read the 64-bit statistic using two 32-bit * reads. * * XXX: This will sometimes lead to a race. * * XXX: Gosh, I wonder if some word-swapping is needed in the merge? */ #ifdef 32-bit bus_space_read_region_4(space, handle, offset, (uint32_t *)&v, 2; #else bus_space_read_8(space, handle, offset, &v); #endif The potential need to word swap, however, suggests that you may be right about the error-prone nature of manual merging. 
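For what it's worth, a cleaned-up sketch of the kind of helper Carl floats above. The name bus_space_read_8_lower_first is his proposal, not an existing KPI, and the word order (low half at offset 0, high half at offset 4) is an assumption; a device that latches its halves the other way around, or a byte-swapped mapping, would need the opposite order or a swap, which is the word-swap worry mentioned above.

#include <sys/param.h>
#include <machine/bus.h>

/*
 * Hypothetical helper, not an existing bus_space routine: synthesize a
 * 64-bit read from two 32-bit reads, low word first.  As discussed in
 * this thread, the two halves are not read atomically, so the caller
 * must know its device tolerates that.
 */
static __inline uint64_t
bus_space_read_8_lower_first(bus_space_tag_t t, bus_space_handle_t h,
    bus_size_t o)
{
	uint64_t lo, hi;

	lo = bus_space_read_4(t, h, o);
	hi = bus_space_read_4(t, h, o + 4);

	return (lo | (hi << 32));
}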
Robert From owner-freebsd-hackers@FreeBSD.ORG Sat Oct 13 13:03:37 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 7021E91B; Sat, 13 Oct 2012 13:03:37 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id DC2F28FC12; Sat, 13 Oct 2012 13:03:36 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ap8EAN6MclCDaFvO/2dsb2JhbABFhhG6GYIgAQEBBAEBASArIAsbGAICDRkCKQEJJgYIBwQBHASHZAumTJF3gSGKLhqEZIESA5M+gi2BFY8ZgwmBRzQ X-IronPort-AV: E=Sophos;i="4.80,581,1344225600"; d="scan'208";a="183488893" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 13 Oct 2012 09:03:23 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id F1267B41C2; Sat, 13 Oct 2012 09:03:22 -0400 (EDT) Date: Sat, 13 Oct 2012 09:03:22 -0400 (EDT) From: Rick Macklem To: Garrett Wollman Message-ID: <611092759.2189637.1350133402953.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <20600.62541.243673.307571@hergotha.csail.mit.edu> Subject: Re: NFS server bottlenecks MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692) Cc: Nikolay Denev , FreeBSD Hackers X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 13 Oct 2012 13:03:37 -0000 Garrett Wollman wrote: > < said: > > > I've attached the patch drc3.patch (it assumes drc2.patch has > > already been > > applied) that replaces the single mutex with one for each hash list > > for tcp. It also increases the size of NFSRVCACHE_HASHSIZE to 200. > > I haven't tested this at all, but I think putting all of the mutexes > in an array like that is likely to cause cache-line ping-ponging. It > may be better to use a pool mutex, or to put the mutexes adjacent in > memory to the list heads that they protect. Well, I'll admit I don't know how to do this. What the code does need is a "set of mutexes", where any of the mutexes can be referred to by an "index". I could easily define a structure that has: struct nfsrc_hashhead { struct nfsrvcachehead head; struct mtx mutex; } nfsrc_hashhead[NFSRVCACHE_HASHSIZE]; - but all that does is leave a small structure between each "struct mtx" and I wouldn't have thought that would make much difference. (How big is a typical hardware cache line these days? I have no idea.) - I suppose I could "waste space" and define a glob of unused space between them, like: struct nfsrc_hashhead { struct nfsrvcachehead head; char garbage[N]; struct mtx mutex; } nfsrc_hashhead[NFSRVCACHE_HASHSIZE]; - If this makes sense, how big should N be? (Somewhat less that the length of a cache line, I'd guess. It seems that the structure should be at least a cache line length in size.) All this seems "kinda hokey" to me and beyond what code at this level should be worrying about, but I'm game to make changes, if others think it's appropriate. I've never use mtx_pool(9) mutexes, but it doesn't sound like they would be the right fit, from reading the man page. 
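On the "how big should N be" question a few sentences up: rather than hand-picking a pad, the structure can be padded by the compiler to a whole cache line. Typical x86 hardware lines are 64 bytes, and FreeBSD already provides CACHE_LINE_SIZE via <sys/param.h> as a safe upper bound. A minimal sketch, reusing names from the drc3.patch discussion (this is not the committed code, and NFSRVCACHE_HASHSIZE/struct nfsrvcache really live in fs/nfs/nfsrvcache.h):

#include <sys/param.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/queue.h>

/* In the real code these come from fs/nfs/nfsrvcache.h. */
struct nfsrvcache;
#ifndef NFSRVCACHE_HASHSIZE
#define	NFSRVCACHE_HASHSIZE	200
#endif

/*
 * Pair each hash chain with the mutex that protects it and let the
 * compiler pad the pair out to a multiple of CACHE_LINE_SIZE, so that
 * locking bucket i never bounces the cache line holding bucket i+1.
 */
struct nfsrc_hashbucket {
	struct mtx		mtx;	/* protects head only */
	LIST_HEAD(, nfsrvcache)	head;	/* hash chain for this bucket */
} __aligned(CACHE_LINE_SIZE);

static struct nfsrc_hashbucket nfsrc_tcphash[NFSRVCACHE_HASHSIZE];

Whether the extra memory is worth it compared with mtx_pool(9) is exactly the open question here.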
(Assuming the mtx_pool_find() is guaranteed to return the same mutex for the same address passed in as an argument, it would seem that they would work, since I can pass &nfsrvcachehead[i] in as the pointer arg to "index" a mutex.) Hopefully jhb@ can say if using mtx_pool(9) for this would be better than an array: struct mtx nfsrc_tcpmtx[NFSRVCACHE_HASHSIZE]; Does anyone conversant with mutexes know what the best coding approach is? >(But I probably won't be > able to do the performance testing on any of these for a while. I > have a server running the "drc2" code but haven't gotten my users to > put a load on it yet.) > No rush. At this point, the earliest I could commit something like this to head would be December. rick ps: I hope John doesn't mind being added to the cc list yet again. It's just that I suspect he knows a fair bit about mutex implementation and possible hardware cache line effects. > -GAWollman > _______________________________________________ > freebsd-hackers@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to > "freebsd-hackers-unsubscribe@freebsd.org" From owner-freebsd-hackers@FreeBSD.ORG Sat Oct 13 15:22:55 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 482E6D3F; Sat, 13 Oct 2012 15:22:55 +0000 (UTC) (envelope-from ndenev@gmail.com) Received: from mail-wg0-f50.google.com (mail-wg0-f50.google.com [74.125.82.50]) by mx1.freebsd.org (Postfix) with ESMTP id A42558FC17; Sat, 13 Oct 2012 15:22:54 +0000 (UTC) Received: by mail-wg0-f50.google.com with SMTP id 16so2959150wgi.31 for ; Sat, 13 Oct 2012 08:22:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=subject:mime-version:content-type:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to:x-mailer; bh=/olCYPBEWcL7a7m43HBPAkiGp1N2wqa0K+xYBKri0LI=; b=eKLvkyZ+8ZV26Oi+8RVYt75xYScP2qlO3T3X1APvryZ2xfuTckMVvC77WUBqz+AmLW OGXi/JU/PgLlRGjExhfQe41B2BGp/4xE+wtLI4a2HhcaMSzwTNMhxFDijCWTf+LZ624M OKR54RzfgkWPrpbeOj8Pg4/JZOSlGXpVaP0rczNVS+3UVKFvSojNe8CHCsJKsqx8OBo1 54Q7WY1L/YNDZfAjfjA3MuNWYCRPYPgmYCe+C3MLoUjWxZvwSaFOImFwUHrHnpeb8zUx iq2IwtJyVG4mkD+ea7Chi70Qck9Px11NPVtwZTN3IEnEp1OALxaXi53LiYz83TjOY9IQ 03rw== Received: by 10.180.19.71 with SMTP id c7mr12865089wie.2.1350141767964; Sat, 13 Oct 2012 08:22:47 -0700 (PDT) Received: from [10.181.156.211] ([213.226.63.148]) by mx.google.com with ESMTPS id dm3sm4093716wib.3.2012.10.13.08.22.45 (version=TLSv1/SSLv3 cipher=OTHER); Sat, 13 Oct 2012 08:22:47 -0700 (PDT) Subject: Re: NFS server bottlenecks Mime-Version: 1.0 (Mac OS X Mail 6.1 \(1498\)) Content-Type: text/plain; charset=us-ascii From: Nikolay Denev In-Reply-To: <937460294.2185822.1350093954059.JavaMail.root@erie.cs.uoguelph.ca> Date: Sat, 13 Oct 2012 18:22:50 +0300 Content-Transfer-Encoding: quoted-printable Message-Id: <302BF685-4B9D-49C8-8000-8D0F6540C8F7@gmail.com> References: <937460294.2185822.1350093954059.JavaMail.root@erie.cs.uoguelph.ca> To: Rick Macklem X-Mailer: Apple Mail (2.1498) Cc: FreeBSD Hackers , Garrett Wollman X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 13 Oct 2012 15:22:55 -0000 On Oct 13, 2012, at 5:05 AM, Rick Macklem wrote: > I wrote: >> Oops, I didn't get the "readahead" 
option description >> quite right in the last post. The default read ahead >> is 1, which does result in "rsize * 2", since there is >> the read + 1 readahead. >>=20 >> "rsize * 16" would actually be for the option "readahead=3D15" >> and for "readahead=3D16" the calculation would be "rsize * 17". >>=20 >> However, the example was otherwise ok, I think? rick >=20 > I've attached the patch drc3.patch (it assumes drc2.patch has already = been > applied) that replaces the single mutex with one for each hash list > for tcp. It also increases the size of NFSRVCACHE_HASHSIZE to 200. >=20 > These patches are also at: > http://people.freebsd.org/~rmacklem/drc2.patch > http://people.freebsd.org/~rmacklem/drc3.patch > in case the attachments don't get through. >=20 > rick > ps: I haven't tested drc3.patch a lot, but I think it's ok? drc3.patch applied and build cleanly and shows nice improvement! I've done a quick benchmark using iozone over the NFS mount from the = Linux host. drc2.pach (but with NFSRVCACHE_HASHSIZE=3D500) TEST WITH 8K = --------------------------------------------------------------------------= ----------------------- Auto Mode Using Minimum Record Size 8 KB Using Maximum Record Size 8 KB Using minimum file size of 2097152 kilobytes. Using maximum file size of 2097152 kilobytes. O_DIRECT feature enabled SYNC Mode.=20 OPS Mode. Output is in operations per second. Command line used: iozone -a -y 8k -q 8k -n 2g -g 2g -C -I -o -O = -i 0 -i 1 -i 2 Time Resolution =3D 0.000001 seconds. Processor cache size set to 1024 Kbytes. Processor cache line size set to 32 bytes. File stride size set to 17 * record size. random = random bkwd record stride =20 KB reclen write rewrite read reread read = write read rewrite read fwrite frewrite fread freread 2097152 8 1919 1914 2356 2321 2335 = 1706 =20 TEST WITH 1M = --------------------------------------------------------------------------= ----------------------- Auto Mode Using Minimum Record Size 1024 KB Using Maximum Record Size 1024 KB Using minimum file size of 2097152 kilobytes. Using maximum file size of 2097152 kilobytes. O_DIRECT feature enabled SYNC Mode.=20 OPS Mode. Output is in operations per second. Command line used: iozone -a -y 1m -q 1m -n 2g -g 2g -C -I -o -O = -i 0 -i 1 -i 2 Time Resolution =3D 0.000001 seconds. Processor cache size set to 1024 Kbytes. Processor cache line size set to 32 bytes. File stride size set to 17 * record size. random = random bkwd record stride =20 KB reclen write rewrite read reread read = write read rewrite read fwrite frewrite fread freread 2097152 1024 73 64 477 486 496 = 61 =20 drc3.patch TEST WITH 8K = --------------------------------------------------------------------------= ----------------------- Auto Mode Using Minimum Record Size 8 KB Using Maximum Record Size 8 KB Using minimum file size of 2097152 kilobytes. Using maximum file size of 2097152 kilobytes. O_DIRECT feature enabled SYNC Mode.=20 OPS Mode. Output is in operations per second. Command line used: iozone -a -y 8k -q 8k -n 2g -g 2g -C -I -o -O = -i 0 -i 1 -i 2 Time Resolution =3D 0.000001 seconds. Processor cache size set to 1024 Kbytes. Processor cache line size set to 32 bytes. File stride size set to 17 * record size. 
random = random bkwd record stride =20 KB reclen write rewrite read reread read = write read rewrite read fwrite frewrite fread freread 2097152 8 2108 2397 3001 3013 3010 = 2389 =20 TEST WITH 1M = --------------------------------------------------------------------------= ----------------------- Auto Mode Using Minimum Record Size 1024 KB Using Maximum Record Size 1024 KB Using minimum file size of 2097152 kilobytes. Using maximum file size of 2097152 kilobytes. O_DIRECT feature enabled SYNC Mode.=20 OPS Mode. Output is in operations per second. Command line used: iozone -a -y 1m -q 1m -n 2g -g 2g -C -I -o -O = -i 0 -i 1 -i 2 Time Resolution =3D 0.000001 seconds. Processor cache size set to 1024 Kbytes. Processor cache line size set to 32 bytes. File stride size set to 17 * record size. random = random bkwd record stride =20 KB reclen write rewrite read reread read = write read rewrite read fwrite frewrite fread freread 2097152 1024 80 79 521 536 528 = 75 =20 Also with drc3 the CPU usage on the server is noticeably lower. Most of = the time I could see only the geom{g_up}/{g_down} threads, and a few nfsd threads, before that nfsd's were much more prominent. I guess under bigger load the performance improvement can be bigger. I'll run some more tests with heavier loads this week. Thanks, Nikolay From owner-freebsd-hackers@FreeBSD.ORG Sat Oct 13 19:16:01 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id F40D9788; Sat, 13 Oct 2012 19:16:00 +0000 (UTC) (envelope-from gnn@neville-neil.com) Received: from vps.hungerhost.com (vps.hungerhost.com [216.38.53.176]) by mx1.freebsd.org (Postfix) with ESMTP id A8B888FC08; Sat, 13 Oct 2012 19:16:00 +0000 (UTC) Received: from pool-96-250-5-62.nycmny.fios.verizon.net ([96.250.5.62]:60073 helo=minion.home) by vps.hungerhost.com with esmtpsa (TLSv1:AES128-SHA:128) (Exim 4.80) (envelope-from ) id 1TN7Bo-0005JZ-2M; Sat, 13 Oct 2012 15:16:00 -0400 Content-Type: text/plain; charset=iso-8859-1 Mime-Version: 1.0 (Mac OS X Mail 6.2 \(1499\)) Subject: Re: [CFT/RFC]: refactor bsd.prog.mk to understand multiple programs instead of a singular program From: George Neville-Neil In-Reply-To: <127FA63D-8EEE-4616-AE1E-C39469DDCC6A@xcllnt.net> Date: Sat, 13 Oct 2012 15:15:59 -0400 Content-Transfer-Encoding: quoted-printable Message-Id: <1340AB5D-F824-4E7D-9D6C-F7E5489AE870@neville-neil.com> References: <201210020750.23358.jhb@freebsd.org> <201210021037.27762.jhb@freebsd.org> <127FA63D-8EEE-4616-AE1E-C39469DDCC6A@xcllnt.net> To: Marcel Moolenaar X-Mailer: Apple Mail (2.1499) X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - vps.hungerhost.com X-AntiAbuse: Original Domain - freebsd.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - neville-neil.com X-Mailman-Approved-At: Sat, 13 Oct 2012 19:26:47 +0000 Cc: Garrett Cooper , freebsd-hackers@freebsd.org, "Simon J. 
Gerraty" , freebsd-arch@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 13 Oct 2012 19:16:01 -0000

On Oct 8, 2012, at 12:11, Marcel Moolenaar wrote:
>
> On Oct 4, 2012, at 9:42 AM, Garrett Cooper wrote:
>>>> Both parties (Isilon/Juniper) are converging on the ATF porting work
>>>> that Giorgos/myself have done after talking at the FreeBSD Foundation
>>>> meet-n-greet. I have contributed all of the patches that I have over
>>>> to marcel for feedback.
>>>
>>> This is very non-obvious to the public at large (e.g. there was no public
>>> response to one group's inquiry about the second ATF import, for example).
>>> Also, given that you had no idea that sgf@ and obrien@ were working on
>>> importing NetBSD's bmake as a prerequisite for ATF, it seems that whatever
>>> discussions were held were not very detailed at best. I think it would be
>>> good to have the various folks working on ATF at least summarize the
>>> current state of things and sketch out some sort of plan or roadmap for
>>> future work in a public forum (such as atf@, though a summary mail would
>>> be quite appropriate for arch@).
>>
>> I'm in part to blame for this. There was some discussion -- but not at
>> length; unfortunately no one from Juniper was present at the meet and
>> greet; the information I got was second hand; I didn't follow up to
>> figure out the exact details / clarify what I had in mind with the
>> appropriate parties.
>
> Hang on. I want in on the blame part! :-)
>
> Seriously: no one is really to blame as far as I can see. We just had
> two independent efforts (ATF & bmake) and there was no indication that
> one would greatly benefit from the other. At least not to the
> point of creating a dependency.
>
> I just committed the bmake bits. It not only adds bmake to the build,
> but also includes the changes necessary to use bmake.
>
> With that in place it's easier to decide whether we want the dependency
> or not.
>
> Before we can switch permanently to bmake, we need to do the following
> first:
> 1. Request an EXP ports build with bmake as make(1). This should tell
>    us the "damage" of switching to bmake for ports.
> 2. In parallel with 1: build www & docs with bmake and assess the
>    damage
> 3. Fix all the damage
>
> Then:
>
> 4. Switch.
>
> It could be a while (many weeks) before we get to 4, so the question
> really is whether the people working on ATF are willing and able to
> build and install FreeBSD using WITH_BMAKE?
>

I think that's a small price to pay for getting going with the ATF
stuff now rather than in 4 weeks. What's the right way to do this
now with HEAD?
Best,
George

From owner-freebsd-hackers@FreeBSD.ORG Sat Oct 13 20:52:42 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id F3AA87BA; Sat, 13 Oct 2012 20:52:41 +0000 (UTC) (envelope-from marcel@xcllnt.net) Received: from mail.xcllnt.net (mail.xcllnt.net [70.36.220.4]) by mx1.freebsd.org (Postfix) with ESMTP id C63AC8FC14; Sat, 13 Oct 2012 20:52:41 +0000 (UTC) Received: from marcelm-sslvpn-nc.jnpr.net (natint3.juniper.net [66.129.224.36]) (authenticated bits=0) by mail.xcllnt.net (8.14.5/8.14.5) with ESMTP id q9DKqdBr086942 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO); Sat, 13 Oct 2012 13:52:40 -0700 (PDT) (envelope-from marcel@xcllnt.net) Content-Type: text/plain; charset=iso-8859-1 Mime-Version: 1.0 (Mac OS X Mail 6.2 \(1499\)) Subject: Re: [CFT/RFC]: refactor bsd.prog.mk to understand multiple programs instead of a singular program From: Marcel Moolenaar In-Reply-To: <1340AB5D-F824-4E7D-9D6C-F7E5489AE870@neville-neil.com> Date: Sat, 13 Oct 2012 13:52:34 -0700 Content-Transfer-Encoding: quoted-printable Message-Id: References: <201210020750.23358.jhb@freebsd.org> <201210021037.27762.jhb@freebsd.org> <127FA63D-8EEE-4616-AE1E-C39469DDCC6A@xcllnt.net> <1340AB5D-F824-4E7D-9D6C-F7E5489AE870@neville-neil.com> To: George Neville-Neil X-Mailer: Apple Mail (2.1499) Cc: Garrett Cooper , freebsd-hackers@freebsd.org, "Simon J. Gerraty" , freebsd-arch@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 13 Oct 2012 20:52:42 -0000

On Oct 13, 2012, at 12:15 PM, George Neville-Neil wrote:
>
> I think that's a small price to pay for getting going with the ATF
> stuff now rather than in 4 weeks. What's the right way to do this
> now with HEAD?

Set WITH_BMAKE=yes in /etc/src.conf or /etc/make.conf and you're
good to go.

One caveat: manually rebuild and re-install usr.bin/bmake after the
buildworld & installworld with WITH_BMAKE=yes set. The one created as
part of the buildworld has a bug due to being built by FreeBSD's make.
A fix is known and will be committed soon, but until then, the manual
step is needed.

That's it...
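In practice that boils down to something like the sketch below, assuming a
stock head checkout in /usr/src; the exact make invocations can vary with
your setup, so treat it as illustrative rather than as the authoritative
procedure:

    # enable bmake as make(1) for the source tree
    echo 'WITH_BMAKE=yes' >> /etc/src.conf

    # rebuild and install world with bmake enabled
    cd /usr/src
    make buildworld
    make installworld

    # work around the known bug in the bmake produced by buildworld:
    # rebuild and reinstall usr.bin/bmake by hand
    cd /usr/src/usr.bin/bmake
    make clean all install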
-- 
Marcel Moolenaar
marcel@xcllnt.net

From owner-freebsd-hackers@FreeBSD.ORG Sat Oct 13 20:15:37 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 6E715BCA; Sat, 13 Oct 2012 20:15:37 +0000 (UTC) (envelope-from sjg@juniper.net) Received: from exprod7og113.obsmtp.com (exprod7og113.obsmtp.com [64.18.2.179]) by mx1.freebsd.org (Postfix) with ESMTP id D6BBF8FC0C; Sat, 13 Oct 2012 20:15:36 +0000 (UTC) Received: from P-EMHUB03-HQ.jnpr.net ([66.129.224.36]) (using TLSv1) by exprod7ob113.postini.com ([64.18.6.12]) with SMTP ID DSNKUHnL4vuHMYYtf2/iwemqs81nA5iY7LWn@postini.com; Sat, 13 Oct 2012 13:15:37 PDT Received: from magenta.juniper.net (172.17.27.123) by P-EMHUB03-HQ.jnpr.net (172.24.192.33) with Microsoft SMTP Server (TLS) id 8.3.213.0; Sat, 13 Oct 2012 13:13:30 -0700 Received: from chaos.jnpr.net (chaos.jnpr.net [172.24.29.229]) by magenta.juniper.net (8.11.3/8.11.3) with ESMTP id q9DKDUh83856; Sat, 13 Oct 2012 13:13:30 -0700 (PDT) (envelope-from sjg@juniper.net) Received: from chaos.jnpr.net (localhost [127.0.0.1]) by chaos.jnpr.net (Postfix) with ESMTP id 68EBB58094; Sat, 13 Oct 2012 13:13:30 -0700 (PDT) To: George Neville-Neil Subject: Re: [CFT/RFC]: refactor bsd.prog.mk to understand multiple programs instead of a singular program In-Reply-To: <1340AB5D-F824-4E7D-9D6C-F7E5489AE870@neville-neil.com> References: <201210020750.23358.jhb@freebsd.org> <201210021037.27762.jhb@freebsd.org> <127FA63D-8EEE-4616-AE1E-C39469DDCC6A@xcllnt.net> <1340AB5D-F824-4E7D-9D6C-F7E5489AE870@neville-neil.com> Comments: In-reply-to: George Neville-Neil message dated "Sat, 13 Oct 2012 15:15:59 -0400." From: "Simon J. Gerraty" X-Mailer: MH-E 7.82+cvs; nmh 1.3; GNU Emacs 22.3.1 Date: Sat, 13 Oct 2012 13:13:30 -0700 Message-ID: <20121013201330.68EBB58094@chaos.jnpr.net> MIME-Version: 1.0 Content-Type: text/plain X-Mailman-Approved-At: Sat, 13 Oct 2012 21:33:28 +0000 Cc: Garrett Cooper , freebsd-hackers@freebsd.org, freebsd-arch@freebsd.org, Marcel Moolenaar X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 13 Oct 2012 20:15:37 -0000

On Sat, 13 Oct 2012 15:15:59 -0400, George Neville-Neil writes:
>> It could be a while (many weeks) before we get to 4, so the question
>> really is whether the people working on ATF are willing and able to
>> build and install FreeBSD using WITH_BMAKE?
>>
>
>I think that's a small price to pay for getting going with the ATF
>stuff now rather than in 4 weeks. What's the right way to do this
>now with HEAD?

We can add bsd.progs.mk (if you have the devel/bmake port installed you
have it as /usr/local/share/mk/progs.mk) and atf.test.mk, and people can
just "go for it"?
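As a rough illustration of what that buys you, a directory that builds more
than one program might end up with a Makefile along these lines; the PROGS
and per-program SRCS.<prog> variables are assumed from the progs.mk shipped
with the devel/bmake port, and the program names are made up, so check
progs.mk itself for the real knobs:

    # hypothetical two-program Makefile using progs.mk
    PROGS=           frob unfrob
    SRCS.frob=       frob.c common.c
    SRCS.unfrob=     unfrob.c common.c

    .include <progs.mk>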
From owner-freebsd-hackers@FreeBSD.ORG Sat Oct 13 22:37:11 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 002C62BA; Sat, 13 Oct 2012 22:37:10 +0000 (UTC) (envelope-from yanegomi@gmail.com) Received: from mail-ob0-f182.google.com (mail-ob0-f182.google.com [209.85.214.182]) by mx1.freebsd.org (Postfix) with ESMTP id 97AFD8FC0C; Sat, 13 Oct 2012 22:37:10 +0000 (UTC) Received: by mail-ob0-f182.google.com with SMTP id wc20so5176500obb.13 for ; Sat, 13 Oct 2012 15:37:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=fmxPqBHeXKhnIVGRjFlZpQ3NcNpdYVC8PT/Dd0LdV+c=; b=UkxFa9IkbISyPnGpUhh7h1WHJ7BcmQAFw2F7ZWprPCqhMci33zIsuuZohlubX0U2Yu YJevPtd6OBBNDQehq3dG0DXquIYW71up+/krGcAF29xSSM+enLTt14iJ23sttI5GwIro e2pNHQ28v4jrKTkFfAL/nqh2USR74Pbyafl38CXlwKUuGbMLvGrarTXLQK8XEPNfnlEr 5/Ny+8SyzOO9qNWVkaSEEFcojEOSlVLFWem9uuJ7xhI/Z0X+p+SFVad4oJPVElPWRS/e ZPQ/8f0q4jmKtql9knO+YWKSBD08X73eGM7k2flhEd9nUlijA7k+XqRjF+3wIiZey+E0 14uA== MIME-Version: 1.0 Received: by 10.182.31.50 with SMTP id x18mr6317895obh.56.1350167830150; Sat, 13 Oct 2012 15:37:10 -0700 (PDT) Received: by 10.76.167.202 with HTTP; Sat, 13 Oct 2012 15:37:09 -0700 (PDT) In-Reply-To: <20121013201330.68EBB58094@chaos.jnpr.net> References: <201210020750.23358.jhb@freebsd.org> <201210021037.27762.jhb@freebsd.org> <127FA63D-8EEE-4616-AE1E-C39469DDCC6A@xcllnt.net> <1340AB5D-F824-4E7D-9D6C-F7E5489AE870@neville-neil.com> <20121013201330.68EBB58094@chaos.jnpr.net> Date: Sat, 13 Oct 2012 15:37:09 -0700 Message-ID: Subject: Re: [CFT/RFC]: refactor bsd.prog.mk to understand multiple programs instead of a singular program From: Garrett Cooper To: "Simon J. Gerraty" Content-Type: text/plain; charset=ISO-8859-1 Cc: freebsd-hackers@freebsd.org, George Neville-Neil , freebsd-arch@freebsd.org, Marcel Moolenaar X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 13 Oct 2012 22:37:11 -0000 On Sat, Oct 13, 2012 at 1:13 PM, Simon J. Gerraty wrote: > > On Sat, 13 Oct 2012 15:15:59 -0400, George Neville-Neil writes: >>> It could be a while (many weeks) before we get to 4, so the question >>> really is whether the people working on ATF are willing and able to >>> build and install FreeBSD using WITH_BMAKE? >> >>I think that's a small price to pay for getting going with the ATF >>stuff now rather than in 4 weeks. What's the right way to do this >>now with HEAD? > > We can add bsd.progs.mk (if you have devel/bmake port installed you > have it as /usr/local/share/mk/progs.mk) > and atf.test.mk and people can just "go for it" ? As long as it can function sanely in a NetBSD-like manner and I can start writing tests, I don't mind... Thanks! -Garrett
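
For anyone who wants to start writing tests right away, a minimal test
program against the atf-c(3) interface looks roughly like the sketch below;
the test case name and the checks are invented for illustration, so see the
atf-c(3) manual from the ATF port for the full API:

    /* minimal ATF test program sketch using the atf-c(3) interface */
    #include <atf-c.h>

    ATF_TC(addition);
    ATF_TC_HEAD(addition, tc)
    {
            atf_tc_set_md_var(tc, "descr", "Checks that integer addition works");
    }
    ATF_TC_BODY(addition, tc)
    {
            ATF_CHECK_EQ(2 + 2, 4);     /* records a failure but continues */
            ATF_REQUIRE(1 + 1 == 2);    /* aborts the test case on failure */
    }

    ATF_TP_ADD_TCS(tp)
    {
            ATF_TP_ADD_TC(tp, addition);
            return atf_no_error();
    }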