From owner-freebsd-arch@FreeBSD.ORG Sun Jul 5 08:00:46 2009
Date: Sun, 5 Jul 2009 10:00:44 +0200
From: Gary Jennejohn <gary.jennejohn@freenet.de>
To: Alexander Motin
Cc: freebsd-arch@freebsd.org
Message-ID: <20090705100044.4053e2f9@ernst.jennejohn.org>
In-Reply-To: <4A4FAA2D.3020409@FreeBSD.org>
Subject: Re: DFLTPHYS vs MAXPHYS

On Sat, 04 Jul 2009 22:14:53 +0300
Alexander Motin wrote:

> Can somebody explain to me the difference between the DFLTPHYS and
> MAXPHYS constants? As I understand it, the latter is the maximal
> amount of memory that can be mapped into the kernel, or passed to the
> hardware drivers. But why then is DFLTPHYS used in so many places, and
> what does it mean?

There's a pretty good comment on these in /sys/conf/NOTES.

> Isn't it time to review their values with an eye to increasing them?
> 64KB looks funny compared to modern memory sizes and data rates. It
> just increases interrupt rates, and I don't think it really needs to
> be so small to improve interactivity now.

Probably historical from the days when memory was scarce.

There's nothing preventing the user from upping these values in his
kernel config file. But note the warning in NOTES about possibly
making the kernel unbootable. It's not clear whether this warning is
still valid given today's larger memory footprints and the improved
VM system.

I wonder whether all drivers can correctly handle larger values for
DFLTPHYS.
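For instance, upping them might look like this in a kernel config file
(the option names are the ones used in /sys/conf/NOTES; the values
below are only illustrative):

    # Raise the physical I/O limits; heed the warning in /sys/conf/NOTES,
    # since not every driver copes with larger values.
    options     DFLTPHYS=(128*1024)     # default raw I/O transfer size
    options     MAXPHYS=(512*1024)      # maximum raw I/O transfer size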
--- Gary Jennejohn

From owner-freebsd-arch@FreeBSD.ORG Sun Jul 5 08:38:38 2009
Date: Sun, 05 Jul 2009 11:38:23 +0300
From: Alexander Motin <mav@FreeBSD.org>
To: gary.jennejohn@freenet.de
Cc: freebsd-arch@freebsd.org
Message-ID: <4A50667F.7080608@FreeBSD.org>
In-Reply-To: <20090705100044.4053e2f9@ernst.jennejohn.org>
Subject: Re: DFLTPHYS vs MAXPHYS

Gary Jennejohn wrote:
> On Sat, 04 Jul 2009 22:14:53 +0300
> Alexander Motin wrote:
>
>> Can somebody explain to me the difference between the DFLTPHYS and
>> MAXPHYS constants? As I understand it, the latter is the maximal
>> amount of memory that can be mapped into the kernel, or passed to the
>> hardware drivers. But why then is DFLTPHYS used in so many places,
>> and what does it mean?
>
> There's a pretty good comment on these in /sys/conf/NOTES.

But it does not explain why.

>> Isn't it time to review their values with an eye to increasing them?
>> 64KB looks funny compared to modern memory sizes and data rates. It
>> just increases interrupt rates, and I don't think it really needs to
>> be so small to improve interactivity now.
>
> Probably historical from the days when memory was scarce.
>
> There's nothing preventing the user from upping these values in his
> kernel config file. But note the warning in NOTES about possibly
> making the kernel unbootable. It's not clear whether this warning is
> still valid given today's larger memory footprints and the improved
> VM system.
>
> I wonder whether all drivers can correctly handle larger values for
> DFLTPHYS.

There will always be drivers/devices with limitations. They should
just be able to report those limitations to the system. This is
possible with GEOM, but it doesn't look well tuned for all providers.
There are many places where DFLTPHYS is used just in the hope that it
will work. IMHO, if a driver is unable to adapt to any defined
DFLTPHYS value, it should not use it, but should instead announce the
specific value that it really supports.
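As a sketch of what announcing a real limit could look like for a disk
driver using the disk(9) interface -- the "xxd" name, the helper's
shape and the 128K limit are invented for illustration:

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <geom/geom_disk.h>

    /*
     * Hypothetical attach path for a disk whose controller handles at
     * most 128K per transfer.  Announcing the real limit in d_maxsize
     * lets GEOM split larger requests to fit, instead of the driver
     * hoping that DFLTPHYS happens to be small enough.
     */
    static void
    xxd_create(int unit, off_t mediasize, disk_strategy_t *strategy)
    {
        struct disk *dp;

        dp = disk_alloc();
        dp->d_name = "xxd";
        dp->d_unit = unit;
        dp->d_strategy = strategy;
        dp->d_sectorsize = 512;
        dp->d_mediasize = mediasize;
        dp->d_maxsize = 128 * 1024;     /* hardware limit, not DFLTPHYS */
        disk_create(dp, DISK_VERSION);
    }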
-- Alexander Motin

From owner-freebsd-arch@FreeBSD.ORG Sun Jul 5 14:11:55 2009
Date: Mon, 6 Jul 2009 00:11:51 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Alexander Motin
Cc: gary.jennejohn@freenet.de, freebsd-arch@FreeBSD.org
Message-ID: <20090705223126.I42918@delplex.bde.org>
In-Reply-To: <4A50667F.7080608@FreeBSD.org>
Subject: Re: DFLTPHYS vs MAXPHYS

On Sun, 5 Jul 2009, Alexander Motin wrote:

> Gary Jennejohn wrote:
>> On Sat, 04 Jul 2009 22:14:53 +0300
>> Alexander Motin wrote:
>>
>>> Can somebody explain to me the difference between the DFLTPHYS and
>>> MAXPHYS constants? As I understand it, the latter is the maximal
>>> amount of memory that can be mapped into the kernel, or passed to
>>> the hardware drivers. But why then is DFLTPHYS used in so many
>>> places, and what does it mean?
>>
>> There's a pretty good comment on these in /sys/conf/NOTES.
>
> But it does not explain why.

DFLTPHYS is the default -- the size to be used when the correct size
is not known. However, this is mostly broken:
- the correct size should always be known at a low level. You have to
  know the maximum size for a device to know that this size is larger
  than the default, else using the default size won't work. Also, you
  have to know that the default size is a multiple of the minimum
  size. Both of these are usually true accidentally, so things sort of
  work.
- the default size is defaulted inconsistently. Geom hides the device
  maximum i/o size (d_maxsize, which is normally either 64K or
  DFLTPHYS, which happen to be the same) from the top level of devices
  (it reblocks if necessary so that sizes up to si_iosize_max, which
  is always MAXPHYS, work), so it is difficult to see the low-level
  size, or to use an i/o size that is a multiple of the device maximum
  i/o size if the latter is not a divisor of MAXPHYS. This means that
  hard-coding MAXPHYS would work best in most places above the driver
  level, but most places have a mess of buggy layering (mnt_iosize_max
  is supposed to default to DFLTPHYS and then be changed to
  si_iosize_max when the latter is known, but some file systems forget
  to do this).

>>> Isn't it time to review their values with an eye to increasing
>>> them? 64KB looks funny compared to modern memory sizes and data
>>> rates.
>>> It just increases interrupt rates, and I don't think it really
>>> needs to be so small to improve interactivity now.

64K is large enough to bust modern L1 caches and old L2 caches. Make
the size bigger and it busts modern L2 caches too. Interrupt rates
don't matter when you are transferring 64K items per interrupt.

>> I wonder whether all drivers can correctly handle larger values for
>> DFLTPHYS.

Most can't, since their hardware can't. They can fake it (ata used
to), but there is negative value in this for most drivers, since geom
already reblocks for disk devices, and reblocking would be wrong for
devices like tapes.

> There will always be drivers/devices with limitations. They should
> just be able to report those limitations to the system. This is
> possible with GEOM, but it doesn't look well tuned for all providers.
> There are many places where DFLTPHYS is used just in the hope that it
> will work. IMHO, if a driver is unable to adapt to any defined
> DFLTPHYS value, it should not use it, but should instead announce the
> specific value that it really supports.

cam scsi devices seem to be the only important ones that still
hard-code d_maxsize to DFLTPHYS.

Strangely, pre-cam scsi had the beginnings (or remnants) of more
sophisticated i/o size limiting. In FreeBSD-1, it had an xxminphys()
function for every scsi device. I think it was supposed to be possible
to ask any device for any i/o size, and minphys was used for
reblocking at a low level. minphys was only implemented for scsi
drivers and wasn't part of physio() as in Net/2 (?). For the aha1542
driver, minphys was:

% void
% ahaminphys(bp)
% 	struct buf *bp;
% {
% 	/* aha seems to explode with 17 segs (64k may require 17 segs) */
% 	/* on old boards so use a max of 16 segs if you have problems here */
% 	if (bp->b_bcount > ((AHA_NSEG - 1) * PAGESIZ)) {
% 		bp->b_bcount = ((AHA_NSEG - 1) * PAGESIZ);
% 	}
% }

FreeBSD-1 didn't have DFLTPHYS, and barely used MAXPHYS. MAXPHYS was
64K. I think MAXBSIZE = 64K limited most transfers. However, physio()
used a buffer of size 256K, larger than it does today!, so apparently
device drivers were responsible for lots of reblocking. In the wd
driver, the reblocking consisted of doing one 512-byte block at a time
(I think it didn't even do multiple sectors per interrupt then).
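For comparison, the modern analogue of minphys for a raw character
device is the cdev's si_iosize_max field, which physio() consults to
split oversized requests; a minimal sketch, with the entry points and
the 64K limit invented for illustration:

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/conf.h>
    #include <sys/bio.h>
    #include <sys/uio.h>

    /*
     * Hypothetical raw read entry point: physio() splits the user's
     * request into chunks no larger than si_iosize_max, much as
     * minphys once clamped b_bcount.
     */
    static int
    xx_read(struct cdev *dev, struct uio *uio, int ioflag)
    {
        return (physio(dev, uio, ioflag));
    }

    static void
    xx_announce_limit(struct cdev *dev)
    {
        /* Announce the real per-transfer hardware limit (assumed 64K). */
        dev->si_iosize_max = 64 * 1024;
    }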
Bruce

From owner-freebsd-arch@FreeBSD.ORG Sun Jul 5 14:16:13 2009
Date: Sun, 5 Jul 2009 16:16:10 +0200
From: Gary Jennejohn <gary.jennejohn@freenet.de>
To: Alexander Motin
Cc: freebsd-arch@freebsd.org
Message-ID: <20090705161610.52e01954@ernst.jennejohn.org>
In-Reply-To: <4A50667F.7080608@FreeBSD.org>
Subject: Re: DFLTPHYS vs MAXPHYS

On Sun, 05 Jul 2009 11:38:23 +0300
Alexander Motin wrote:

> Gary Jennejohn wrote:
>> I wonder whether all drivers can correctly handle larger values for
>> DFLTPHYS.
>
> There will always be drivers/devices with limitations. They should
> just be able to report those limitations to the system. This is
> possible with GEOM, but it doesn't look well tuned for all providers.
> There are many places where DFLTPHYS is used just in the hope that it
> will work. IMHO, if a driver is unable to adapt to any defined
> DFLTPHYS value, it should not use it, but should instead announce the
> specific value that it really supports.

This would be the correct way to do things. I remember back in the
good old days, circa 1985, disk drivers _always_ did their own PHYS
handling so that utilities could pass in whatever value they wanted to
use for the size. Of course, that meant that each driver reinvented
the wheel.
--- Gary Jennejohn

From owner-freebsd-arch@FreeBSD.ORG Sun Jul 5 14:37:21 2009
Date: Sun, 05 Jul 2009 17:37:14 +0300
From: Alexander Motin <mav@FreeBSD.org>
To: Bruce Evans
Cc: gary.jennejohn@freenet.de, freebsd-arch@FreeBSD.org
Message-ID: <4A50BA9A.9080005@FreeBSD.org>
In-Reply-To: <20090705223126.I42918@delplex.bde.org>
Subject: Re: DFLTPHYS vs MAXPHYS

Bruce Evans wrote:
> On Sun, 5 Jul 2009, Alexander Motin wrote:
>>>> Isn't it time to review their values with an eye to increasing
>>>> them? 64KB looks funny compared to modern memory sizes and data
>>>> rates. It just increases interrupt rates, and I don't think it
>>>> really needs to be so small to improve interactivity now.
>
> 64K is large enough to bust modern L1 caches and old L2 caches. Make
> the size bigger and it busts modern L2 caches too. Interrupt rates
> don't matter when you are transferring 64K items per interrupt.

How is cache size related to this, if DMA transfers data directly to
RAM? Sure, the CPU will invalidate the related cache lines, but why
should it invalidate everything?

Small transfers give more work to all levels, from GEOM down to
CAM/ATA, the controllers and the drives. It is not just context
switching.

>>> I wonder whether all drivers can correctly handle larger values for
>>> DFLTPHYS.
>
> Most can't, since their hardware can't. They can fake it (ata used
> to), but there is negative value in this for most drivers, since geom
> already reblocks for disk devices, and reblocking would be wrong for
> devices like tapes.

I am not speaking about reblocking. I am speaking about the best
possible hardware usage. I can't speak for most hardware, but at least
AHCI and the modern SiI SATA chips I have worked with closely have
practically no limit on transaction size, except for the amount of
memory their drivers allocate for the S/G table. My new drivers are
able to self-tune for any MAXPHYS value.
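A sketch of what such self-tuning can mean in busdma terms: size the
DMA tag's segment count from MAXPHYS, so that raising MAXPHYS
automatically grows the S/G table. The alignment and boundary values
below are placeholders; a real driver uses its controller's documented
constraints:

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/bus.h>
    #include <machine/bus.h>

    /*
     * A MAXPHYS-sized buffer that is not page aligned needs at most
     * MAXPHYS / PAGE_SIZE + 1 S/G segments, so the tag follows
     * MAXPHYS instead of hard-coding a transfer size.
     */
    static int
    xx_create_data_tag(bus_dma_tag_t parent, bus_dma_tag_t *tagp)
    {
        return (bus_dma_tag_create(parent,
            1, 0,                       /* alignment, boundary */
            BUS_SPACE_MAXADDR,          /* lowaddr */
            BUS_SPACE_MAXADDR,          /* highaddr */
            NULL, NULL,                 /* filter, filterarg */
            MAXPHYS,                    /* maxsize */
            MAXPHYS / PAGE_SIZE + 1,    /* nsegments */
            MAXPHYS,                    /* maxsegsize */
            0,                          /* flags */
            NULL, NULL,                 /* lockfunc, lockfuncarg */
            tagp));
    }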
-- Alexander Motin

From owner-freebsd-arch@FreeBSD.ORG Sun Jul 5 16:46:40 2009
Date: Mon, 6 Jul 2009 02:46:37 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Alexander Motin
Cc: gary.jennejohn@freenet.de, freebsd-arch@FreeBSD.org
Message-ID: <20090706005851.L1439@besplex.bde.org>
In-Reply-To: <4A50BA9A.9080005@FreeBSD.org>
Subject: Re: DFLTPHYS vs MAXPHYS

On Sun, 5 Jul 2009, Alexander Motin wrote:

> Bruce Evans wrote:
>> On Sun, 5 Jul 2009, Alexander Motin wrote:
>>>>> Isn't it time to review their values with an eye to increasing
>>>>> them? 64KB looks funny compared to modern memory sizes and data
>>>>> rates. It just increases interrupt rates, and I don't think it
>>>>> really needs to be so small to improve interactivity now.
>>
>> 64K is large enough to bust modern L1 caches and old L2 caches. Make
>> the size bigger and it busts modern L2 caches too. Interrupt rates
>> don't matter when you are transferring 64K items per interrupt.
>
> How is cache size related to this, if DMA transfers data directly to
> RAM? Sure, the CPU will invalidate the related cache lines, but why
> should it invalidate everything?

I was thinking more of transfers to userland. Increasing user buffer
sizes above about half the L2 cache size guarantees busting the L2
cache, if the application actually looks at all of its data. If the
data is read using read(), then the L2 cache will be busted twice (or
a bit less with nontemporal copying), first by copying out the data
and then by looking at it. If the data is read using mmap(), then the
L2 cache will only be busted once. This effect has always been very
noticeable using dd. Larger buffer sizes are also bad for latency.

> Small transfers give more work to all levels, from GEOM down to
> CAM/ATA, the controllers and the drives. It is not just context
> switching.

Yes, I can't see any cache busting below the level of copyout(). Also,
after you convert all applications to use mmap() instead of read(),
the cache busting should become per-CPU.

>>>> I wonder whether all drivers can correctly handle larger values
>>>> for DFLTPHYS.
>>
>> Most can't, since their hardware can't.
>> They can fake it (ata used to), but there is negative value in this
>> for most drivers, since geom already reblocks for disk devices, and
>> reblocking would be wrong for devices like tapes.
>
> I am not speaking about reblocking. I am speaking about the best
> possible hardware usage. I can't speak for most hardware, but at
> least AHCI and the modern SiI SATA chips I have worked with closely
> have practically no limit on transaction size, except for the amount
> of memory their drivers allocate for the S/G table. My new drivers
> are able to self-tune for any MAXPHYS value.

The main limit above ata seems to be only MAXPHYS and its use in
pbufs. DFLTPHYS seems to be used only in buggy, unimportant cases.

Bruce

From owner-freebsd-arch@FreeBSD.ORG Sun Jul 5 17:12:16 2009
Date: Sun, 05 Jul 2009 20:12:08 +0300
From: Alexander Motin <mav@FreeBSD.org>
To: Bruce Evans
Cc: freebsd-arch@FreeBSD.org
Message-ID: <4A50DEE8.6080406@FreeBSD.org>
In-Reply-To: <20090706005851.L1439@besplex.bde.org>
Subject: Re: DFLTPHYS vs MAXPHYS

Bruce Evans wrote:
> On Sun, 5 Jul 2009, Alexander Motin wrote:
>> Bruce Evans wrote:
>>> 64K is large enough to bust modern L1 caches and old L2 caches.
>>> Make the size bigger and it busts modern L2 caches too. Interrupt
>>> rates don't matter when you are transferring 64K items per
>>> interrupt.
>>
>> How is cache size related to this, if DMA transfers data directly to
>> RAM? Sure, the CPU will invalidate the related cache lines, but why
>> should it invalidate everything?
>
> I was thinking more of transfers to userland. Increasing user buffer
> sizes above about half the L2 cache size guarantees busting the L2
> cache, if the application actually looks at all of its data. If the
> data is read using read(), then the L2 cache will be busted twice (or
> a bit less with nontemporal copying), first by copying out the data
> and then by looking at it. If the data is read using mmap(), then the
> L2 cache will only be busted once. This effect has always been very
> noticeable using dd. Larger buffer sizes are also bad for latency.
>
>> Small transfers give more work to all levels, from GEOM down to
>> CAM/ATA, the controllers and the drives. It is not just context
>> switching.
>
> Yes, I can't see any cache busting below the level of copyout().
> Also, after you convert all applications to use mmap() instead of
> read(), the cache busting should become per-CPU.

As file data usually passes via the buffer cache, it will be read into
different memory areas and copied out from them anyway. So I don't see
much difference there between doing a single big transaction and
several small ones. Cache trashing at user level will also depend only
on the user-level application's buffer size, not on the kernel's.

How do I reproduce that dd experiment? I have my system running with a
MAXPHYS of 512K, and here is what I get:

# dd if=/dev/ada0 of=/dev/null bs=512k count=1000
1000+0 records in
1000+0 records out
524288000 bytes transferred in 2.471564 secs (212128024 bytes/sec)
# dd if=/dev/ada0 of=/dev/null bs=256k count=2000
2000+0 records in
2000+0 records out
524288000 bytes transferred in 2.666643 secs (196609752 bytes/sec)
# dd if=/dev/ada0 of=/dev/null bs=128k count=4000
4000+0 records in
4000+0 records out
524288000 bytes transferred in 2.759498 secs (189993969 bytes/sec)
# dd if=/dev/ada0 of=/dev/null bs=64k count=8000
8000+0 records in
8000+0 records out
524288000 bytes transferred in 2.718900 secs (192830927 bytes/sec)

CPU load instead grows from 10% at 512K to 15% at 64K. Maybe the
trashing effect will only be noticeable with blocks comparable to the
cache size, but modern CPUs have megabytes of cache.

-- Alexander Motin

From owner-freebsd-arch@FreeBSD.ORG Sun Jul 5 18:32:15 2009
Date: Mon, 6 Jul 2009 04:32:11 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Alexander Motin
Cc: freebsd-arch@freebsd.org
Message-ID: <20090706034250.C2240@besplex.bde.org>
In-Reply-To: <4A50DEE8.6080406@FreeBSD.org>
Subject: Re: DFLTPHYS vs MAXPHYS

On Sun, 5 Jul 2009, Alexander Motin wrote:

> Bruce Evans wrote:
>> I was thinking more of transfers to userland. Increasing user buffer
>> sizes above about half the L2 cache size guarantees busting the L2
>> cache, if the application actually looks at all of its data.
>> If the data is read using read(), then the L2 cache will be busted
>> twice (or a bit less with nontemporal copying), first by copying out
>> the data and then by looking at it. If the data is read using
>> mmap(), then the L2 cache will only be busted once. This effect has
>> always been very noticeable using dd. Larger buffer sizes are also
>> bad for latency.
> ...
> How do I reproduce that dd experiment? I have my system running with
> a MAXPHYS of 512K, and here is what I get:

I used a regular file with the same size as main memory (1G), and for
today's test, not quite dd, but a program that throws away the data
(so as to avoid the overhead of write syscalls) and prints status info
in a more suitable form than even dd's ^T.

Your results show that physio() behaves quite differently from reading
a regular file. I see similar behaviour for input from a disk file.

> # dd if=/dev/ada0 of=/dev/null bs=512k count=1000
> 1000+0 records in
> 1000+0 records out
> 524288000 bytes transferred in 2.471564 secs (212128024 bytes/sec)

512MB would be too small with buffering for a regular file, but should
be OK with a disk file.

> # dd if=/dev/ada0 of=/dev/null bs=256k count=2000
> 2000+0 records in
> 2000+0 records out
> 524288000 bytes transferred in 2.666643 secs (196609752 bytes/sec)
> # dd if=/dev/ada0 of=/dev/null bs=128k count=4000
> 4000+0 records in
> 4000+0 records out
> 524288000 bytes transferred in 2.759498 secs (189993969 bytes/sec)
> # dd if=/dev/ada0 of=/dev/null bs=64k count=8000
> 8000+0 records in
> 8000+0 records out
> 524288000 bytes transferred in 2.718900 secs (192830927 bytes/sec)
>
> CPU load instead grows from 10% at 512K to 15% at 64K. Maybe the
> trashing effect will only be noticeable with blocks comparable to the
> cache size, but modern CPUs have megabytes of cache.

I used systat -v to estimate the load. Its average jumps around more
than I like, but I don't have anything better. Sys time from dd and
others is even more useless than it used to be, since much of the i/o
runs in threads and the system doesn't know how to charge the
application for thread time.

My results (MAXPHYS is 64K, transfer rate 50MB/S, under FreeBSD-~5.2
de-geomed):

regular file:

block size  %idle
----------  -----
1M          87
16K         91
4K          88 (?)
512         72 (?)

disk file:

block size  %idle
----------  -----
1M          96
64K         96
32K         93
16K         87
8K          82 (firmware can't keep up and rate drops to 37MB/S)

In the case of the regular file, almost all i/o is clustered, so the
driver sees mainly the cluster size (driver max size of 64K before
geom). Upper layers then do a good job of adding only a few percent
CPU when declustering to 16K fs-blocks.

In the case of the disk file, I can't explain why the overhead is so
low (~0.5% intr, 3.5% sys) for large block sizes. Uncached copies on
the test machine go at 850MB/S, so 50MB/S should take 1/19 of the CPU,
or 5.3%. Another difference with the disk file test is that physio()
uses a single pbuf, so the test doesn't thrash the buffer cache's
memory. dd of a large regular file will thrash the L2 cache even if
the user buffer size is small, but still goes faster with a smaller
user buffer since the user buffer stays cached.

Faster disks will of course want larger block sizes. I'm still
surprised that this makes more difference to CPU than to throughput.
Maybe it doesn't really, but the measurement becomes differently
accurate when the CPU becomes more loaded. At 100% load there would be
nowhere to hide things like speculative cache fetches.
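To make the read()-versus-mmap() point concrete, a minimal userland
sketch of the mmap() variant, which looks at the file data in place
and so passes it through the cache only once (error handling kept
minimal; this is not the test program used above):

    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <err.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /*
     * Sum a file's bytes via mmap(): the data is accessed in place, so
     * it crosses the CPU cache once, instead of once for copyout() and
     * again for the application's own access as with read().
     */
    int
    main(int argc, char **argv)
    {
        struct stat st;
        unsigned char *p;
        unsigned long sum;
        off_t i;
        int fd;

        if (argc != 2)
            errx(1, "usage: %s file", argv[0]);
        if ((fd = open(argv[1], O_RDONLY)) == -1 || fstat(fd, &st) == -1)
            err(1, "%s", argv[1]);
        p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED)
            err(1, "mmap");
        for (sum = 0, i = 0; i < st.st_size; i++)
            sum += p[i];
        printf("%lu\n", sum);
        return (0);
    }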
Bruce

From owner-freebsd-arch@FreeBSD.ORG Sun Jul 5 18:51:13 2009
Date: Sun, 05 Jul 2009 21:51:05 +0300
From: Alexander Motin <mav@FreeBSD.org>
To: Bruce Evans
Cc: freebsd-arch@freebsd.org
Message-ID: <4A50F619.4020101@FreeBSD.org>
In-Reply-To: <20090706034250.C2240@besplex.bde.org>
Subject: Re: DFLTPHYS vs MAXPHYS

Bruce Evans wrote:
> On Sun, 5 Jul 2009, Alexander Motin wrote:
>
>> Bruce Evans wrote:
>>> I was thinking more of transfers to userland. Increasing user
>>> buffer sizes above about half the L2 cache size guarantees busting
>>> the L2 cache, if the application actually looks at all of its data.
>>> If the data is read using read(), then the L2 cache will be busted
>>> twice (or a bit less with nontemporal copying), first by copying
>>> out the data and then by looking at it. If the data is read using
>>> mmap(), then the L2 cache will only be busted once. This effect has
>>> always been very noticeable using dd. Larger buffer sizes are also
>>> bad for latency.
>> ...
>> How do I reproduce that dd experiment? I have my system running with
>> a MAXPHYS of 512K, and here is what I get:
>
> I used a regular file with the same size as main memory (1G), and for
> today's test, not quite dd, but a program that throws away the data
> (so as to avoid the overhead of write syscalls) and prints status
> info in a more suitable form than even dd's ^T.
>
> Your results show that physio() behaves quite differently from
> reading a regular file. I see similar behaviour for input from a disk
> file.
>
>> # dd if=/dev/ada0 of=/dev/null bs=512k count=1000
>> 1000+0 records in
>> 1000+0 records out
>> 524288000 bytes transferred in 2.471564 secs (212128024 bytes/sec)
>
> 512MB would be too small with buffering for a regular file, but
> should be OK with a disk file.
>
>> # dd if=/dev/ada0 of=/dev/null bs=256k count=2000
>> 2000+0 records in
>> 2000+0 records out
>> 524288000 bytes transferred in 2.666643 secs (196609752 bytes/sec)
>> # dd if=/dev/ada0 of=/dev/null bs=128k count=4000
>> 4000+0 records in
>> 4000+0 records out
>> 524288000 bytes transferred in 2.759498 secs (189993969 bytes/sec)
>> # dd if=/dev/ada0 of=/dev/null bs=64k count=8000
>> 8000+0 records in
>> 8000+0 records out
>> 524288000 bytes transferred in 2.718900 secs (192830927 bytes/sec)
>>
>> CPU load instead grows from 10% at 512K to 15% at 64K. Maybe the
>> trashing effect will only be noticeable with blocks comparable to
>> the cache size, but modern CPUs have megabytes of cache.
>
> I used systat -v to estimate the load. Its average jumps around more
> than I like, but I don't have anything better. Sys time from dd and
> others is even more useless than it used to be, since much of the i/o
> runs in threads and the system doesn't know how to charge the
> application for thread time.
>
> My results (MAXPHYS is 64K, transfer rate 50MB/S, under FreeBSD-~5.2
> de-geomed):
>
> regular file:
>
> block size  %idle
> ----------  -----
> 1M          87
> 16K         91
> 4K          88 (?)
> 512         72 (?)
>
> disk file:
>
> block size  %idle
> ----------  -----
> 1M          96
> 64K         96
> 32K         93
> 16K         87
> 8K          82 (firmware can't keep up and rate drops to 37MB/S)
>
> In the case of the regular file, almost all i/o is clustered, so the
> driver sees mainly the cluster size (driver max size of 64K before
> geom). Upper layers then do a good job of adding only a few percent
> CPU when declustering to 16K fs-blocks.

In these tests you've got almost only the negative side of the effect,
as you have said, due to cache misses. Do you really have a CPU with
so small an L2 cache? Some kind of P3 or old Celeron? But with a 64K
MAXPHYS you just didn't get any benefit from using a bigger block
size.
-- Alexander Motin

From owner-freebsd-arch@FreeBSD.ORG Sun Jul 5 19:16:35 2009
Date: Sun, 05 Jul 2009 22:16:27 +0300
From: Alexander Motin <mav@FreeBSD.org>
To: Adrian Chadd
Cc: freebsd-arch@freebsd.org
Message-ID: <4A50FC0B.9090601@FreeBSD.org>
Subject: Re: DFLTPHYS vs MAXPHYS

Adrian Chadd wrote:
> 2009/7/6 Alexander Motin:
>
>> In these tests you've got almost only the negative side of the
>> effect, as you have said, due to cache misses. Do you really have a
>> CPU with so small an L2 cache? Some kind of P3 or old Celeron? But
>> with a 64K MAXPHYS you just didn't get any benefit from using a
>> bigger block size.
>
> All the world isn't your current desktop box with only SATA devices
> :)

This is a laptop, and what do you mean by "only SATA"? Do you know of
any storage whose performance degrades with big transactions?

> There have been and will be plenty of little embedded CPUs with tiny
> amounts of cache for quite some time to come.

Fine, let's set it to 8K on ARM. What do you want to say by that?

> You're also doing simple stream IO tests. Please re-think the thought
> experiment with a whole lot of parallel IO going on rather than just
> straight single-stream IO.

Please don't. Parallel access with big blocks just becomes more linear
as the block length grows. For modern drives with >100MB/s speeds and
10ms access times, it is just madness to transfer less than 1MB per
transaction with random access.

> Also, please realise that part of having your cache thrashed is what
> it does to the performance of -other- code. dd may be fast, but if
> you're constantly purging your caches by copying around all of that
> data, subsequent code has to go and freshen the cache again. On older
> and anaemic embedded/low-power boxes the cost of a cache miss vs a
> cache hit can still be quite expensive.

I think that anaemic embedded/low-power boxes will prefer to have the
operation handled by the chipset hardware as much as possible, without
interrupting the CPU.

Also, please read one of my previous posts. I don't see why, with, for
example, a 1M user-level buffer, buffer-cache-backed access split into
many small disk transactions would trash the CPU cache any less.
It just transfers the same amount of data into the same buffer cache
memory addresses. It is not the disk transaction DMA size that trashes
the cache. If you want to fight cache trashing - OK, but not there.

-- Alexander Motin

From owner-freebsd-arch@FreeBSD.ORG Sun Jul 5 19:25:39 2009
Date: Mon, 6 Jul 2009 02:58:36 +0800
From: Adrian Chadd <adrian.chadd@gmail.com>
To: Alexander Motin
Cc: freebsd-arch@freebsd.org
In-Reply-To: <4A50F619.4020101@FreeBSD.org>
Subject: Re: DFLTPHYS vs MAXPHYS

2009/7/6 Alexander Motin:
> In these tests you've got almost only the negative side of the
> effect, as you have said, due to cache misses. Do you really have a
> CPU with so small an L2 cache? Some kind of P3 or old Celeron? But
> with a 64K MAXPHYS you just didn't get any benefit from using a
> bigger block size.

All the world isn't your current desktop box with only SATA devices :)

There have been and will be plenty of little embedded CPUs with tiny
amounts of cache for quite some time to come.

You're also doing simple stream IO tests. Please re-think the thought
experiment with a whole lot of parallel IO going on rather than just
straight single-stream IO.

Also, please realise that part of having your cache thrashed is what
it does to the performance of -other- code.
dd may be fast, but if you're constantly purging your caches by
copying around all of that data, subsequent code has to go and freshen
the cache again. On older and anaemic embedded/low-power boxes the
cost of a cache miss vs a cache hit can still be quite expensive.

2c,

Adrian

From owner-freebsd-arch@FreeBSD.ORG Mon Jul 6 01:14:21 2009
Date: Sun, 5 Jul 2009 18:14:20 -0700 (PDT)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: freebsd-arch@freebsd.org
Message-Id: <200907060114.n661EK68065706@apollo.backplane.com>
Subject: Re: DFLTPHYS vs MAXPHYS

I think MAXPHYS, or the equivalent, is still used somewhat in the
clustering code. The number of buffers the clustering code decides to
chain together dictates the impact on the actual device. The relevancy
here has very little to do with cache smashing and more to do with
optimizing disk seeks (or network latency). There is no best value for
this. It is only marginally more interesting for a network interface,
due to the fact that most links still run with absurdly small MTUs
(even 9000+ is absurdly small). It is entirely uninteresting for a
SATA or other modern disk link.

For linear transfers you only need a value sufficiently large to
reduce the impact of command overhead on the cpu and achieve the
device's maximum linear transfer rate -- for example, doing a dd with
bs=512 versus bs=32k. It runs on a curve, and there will generally be
very little additional bang for the buck beyond 64K for a linear
transfer (assuming read-ahead and NCQ to reduce inter-command
latency).

For random and semi-random transfers, larger buffer sizes have two
impacts. First, a negative impact on seek times. A random seek-read of
16K is faster than a random seek-read of 64K, which is faster than a
random seek-read of 512K. I did a ton of testing with HAMMER and it
just didn't make much sense to go beyond 128K, frankly, but neither
does it make sense to use something really tiny like 8K. 32K-128K
seems to be the sweet spot. Second, a positive impact on reducing the
total number of seeks *IF* you have reasonable cache locality of
reference.

There is no correct value; it depends heavily on the access pattern.
A random access pattern with very little locality of reference will
benefit from a smaller block size, while a random access pattern with
high locality of reference will benefit from a larger block size.
That's all there is to it.

I have a fairly negative opinion of trying to tune block size to cpu
caches. I don't think it matters nearly as much as tuning it to the
seek/locality-of-reference performance curve, and I don't feel that
contrived linear tests are all that interesting since they don't
really reflect real-life workloads.

On-drive caching has an impact too, but that's another conversation.
Vendors have been known to intentionally degrade drive cache
performance on consumer drives versus commercial drives. I've often
hit limitations in testing HAMMER which seem to be contrived by
vendors; without them I could have used a smaller block size and still
gotten the locality of reference, but I wind up having to use a larger
one because the drive cache doesn't behave sanely.

--

The DMA ability of modern devices and device drivers is pretty much
moot, as no self-respecting disk controller chipset is limited to a
measly 64K max transfer any more. AHCI certainly has no issue doing in
excess of a megabyte. The limit is something like 65535 chained
entries for AHCI. I forget what the spec says exactly, but it's
basically more than we'd ever really need. Nobody should really care
about the performance of a chipset that is limited to a 64K max
transfer.

As long as the cluster code knows what the device can do, and the
filesystem doesn't try to use a larger block size than the device is
capable of in a single BIO, the cluster code will make up the
difference for any device-based limitations.

-Matt
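To illustrate the curve Matt describes for linear transfers, a toy
model: with a fixed per-command overhead, throughput is
bs / (overhead + bs / media_rate), which flattens out well before very
large block sizes. Both constants below are made up:

    #include <stdio.h>

    /*
     * Print modelled throughput against block size.  The curve climbs
     * steeply at small sizes and is nearly flat past 64K, matching the
     * "little additional bang for the buck" observation.
     */
    int
    main(void)
    {
        double media_rate = 63e6;   /* drive linear rate, bytes/sec */
        double overhead = 60e-6;    /* per-command overhead, seconds */

        for (int bs = 512; bs <= 1 << 20; bs <<= 1)
            printf("bs=%-8d %6.2f MB/s\n", bs,
                bs / (overhead + bs / media_rate) / 1e6);
        return (0);
    }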
From owner-freebsd-arch@FreeBSD.ORG Mon Jul 6 15:54:18 2009
Date: Tue, 7 Jul 2009 01:54:14 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Alexander Motin
Cc: freebsd-arch@freebsd.org
Message-ID: <20090707011217.O43961@delplex.bde.org>
In-Reply-To: <4A50F619.4020101@FreeBSD.org>
Subject: Re: DFLTPHYS vs MAXPHYS

On Sun, 5 Jul 2009, Alexander Motin wrote:

> Bruce Evans wrote:
>> My results (MAXPHYS is 64K, transfer rate 50MB/S, under FreeBSD-~5.2
>> de-geomed):
>>
>> regular file:
>>
>> block size  %idle
>> ----------  -----
>> 1M          87
>> 16K         91
>> 4K          88 (?)
>> 512         72 (?)
>>
>> disk file:
>>
>> block size  %idle
>> ----------  -----
>> 1M          96
>> 64K         96
>> 32K         93
>> 16K         87
>> 8K          82 (firmware can't keep up and rate drops to 37MB/S)
>>
>> In the case of the regular file, almost all i/o is clustered, so the
>> driver sees mainly the cluster size (driver max size of 64K before
>> geom). Upper layers then do a good job of adding only a few percent
>> CPU when declustering to 16K fs-blocks.
>
> In these tests you've got almost only the negative side of the
> effect, as you have said, due to cache misses.

No, I got negative and positive for the regular file (due to cache
misses for large block sizes and too many transactions for very small
block sizes (< 16K)), and only positive for the disk file (due to
cache misses not being tested).

> Do you really have a CPU with so small an L2 cache? Some kind of P3
> or old Celeron?

It is 1M, on an A64 (not stated before). Since the disk file case uses
a pbuf, it only thrashes about half as much cache as the regular file,
provided the used part of the pbuf data is small compared with the
cache size. I forgot to test with a user buffer size of 2M.

> But with a 64K MAXPHYS you just didn't get any benefit from using a
> bigger block size.

MAXPHYS is 128K.
The ata driver has a limit of 64K, so anything larger than 64K
wouldn't do much except increase cache misses. In physio(), it would
just cause physio() to ask the driver to read 64K at a time. My claim
is partly that 64K is such a large size that the extra CPU caused by
splitting up into 64K blocks is insignificant.

Here are better results for the disk file test, with cache accesses
and misses counted by perfmon:

% dd if=/dev/ad2 of=/dev/null bs=16384 count=16384
% # s/kx-dc-accesses
% 268435456 bytes transferred in 4.857302 secs (55264313 bytes/sec)
% 146378905
% # s/kx-dc-misses
% 268435456 bytes transferred in 4.782373 secs (56130180 bytes/sec)
% 946562
% dd if=/dev/ad2 of=/dev/null bs=32768 count=8192
% # s/kx-dc-accesses
% 268435456 bytes transferred in 4.715802 secs (56922546 bytes/sec)
% 79404995
% # s/kx-dc-misses
% 268435456 bytes transferred in 4.749098 secs (56523463 bytes/sec)
% 640427
% dd if=/dev/ad2 of=/dev/null bs=65536 count=4096
% # s/kx-dc-accesses
% 268435456 bytes transferred in 4.740766 secs (56622802 bytes/sec)
% 45633277
% # s/kx-dc-misses
% 268435456 bytes transferred in 4.882316 secs (54981173 bytes/sec)
% 424469

Cache misses are minimized here using a user buffer size of 64K.

% dd if=/dev/ad2 of=/dev/null bs=131072 count=2048
% # s/kx-dc-accesses
% 268435456 bytes transferred in 4.873972 secs (55075298 bytes/sec)
% 42296347
% # s/kx-dc-misses
% 268435456 bytes transferred in 4.940565 secs (54332946 bytes/sec)
% 497104
% dd if=/dev/ad2 of=/dev/null bs=262144 count=1024
% # s/kx-dc-accesses
% 268435456 bytes transferred in 4.982193 secs (53878976 bytes/sec)
% 38617107
% # s/kx-dc-misses
% 268435456 bytes transferred in 4.715697 secs (56923816 bytes/sec)
% 522888
% dd if=/dev/ad2 of=/dev/null bs=524288 count=512
% # s/kx-dc-accesses
% 268435456 bytes transferred in 4.957179 secs (54150849 bytes/sec)
% 37115853
% # s/kx-dc-misses
% 268435456 bytes transferred in 4.923855 secs (54517338 bytes/sec)
% 521308
% dd if=/dev/ad2 of=/dev/null bs=1048576 count=256
% # s/kx-dc-accesses
% 268435456 bytes transferred in 4.707334 secs (57024946 bytes/sec)
% 36526303

Cache accesses are minimized here using a user buffer size of 1M.

% # s/kx-dc-misses
% 268435456 bytes transferred in 4.715655 secs (56924319 bytes/sec)
% 541909
% dd if=/dev/ad2 of=/dev/null bs=2097152 count=128
% # s/kx-dc-accesses
% 268435456 bytes transferred in 4.715631 secs (56924610 bytes/sec)
% 36628946
% # s/kx-dc-misses
% 268435456 bytes transferred in 4.707306 secs (57025284 bytes/sec)
% 534541

Cache misses are only increased a little here with a user buffer size
of 2M. I can't explain this. Maybe I misremember my CPU's cache size.

% dd if=/dev/ad2 of=/dev/null bs=4194304 count=64
% # s/kx-dc-accesses
% 268435456 bytes transferred in 4.965433 secs (54060837 bytes/sec)
% 37688487
% # s/kx-dc-misses
% 268435456 bytes transferred in 4.740570 secs (56625145 bytes/sec)
% 2443717

Cache misses increased by a factor of 5 going from user buffer size 2M
to 4M.
% dd if=/dev/ad2 of=/dev/null bs=8388608 count=32
% # s/kx-dc-accesses
% 268435456 bytes transferred in 5.056997 secs (53081988 bytes/sec)
% 39425354
% # s/kx-dc-misses
% 268435456 bytes transferred in 4.907099 secs (54703493 bytes/sec)
% 589090
% dd if=/dev/ad2 of=/dev/null bs=16777216 count=16
% # s/kx-dc-accesses
% 268435456 bytes transferred in 4.998672 secs (53701354 bytes/sec)
% 49361807
% # s/kx-dc-misses
% 268435456 bytes transferred in 4.732208 secs (56725202 bytes/sec)
% 603496
% dd if=/dev/ad2 of=/dev/null bs=33554432 count=8
% # s/kx-dc-accesses
% 268435456 bytes transferred in 4.965315 secs (54062119 bytes/sec)
% 61536416
% # s/kx-dc-misses
% 268435456 bytes transferred in 4.882041 secs (54984269 bytes/sec)
% 3947985
% dd if=/dev/ad2 of=/dev/null bs=67108864 count=4
% # s/kx-dc-accesses
% 268435456 bytes transferred in 4.857003 secs (55267715 bytes/sec)
% 78234741
% # s/kx-dc-misses
% 268435456 bytes transferred in 4.931896 secs (54428448 bytes/sec)
% 8580752
% dd if=/dev/ad2 of=/dev/null bs=134217728 count=2
% # s/kx-dc-accesses
% 268435456 bytes transferred in 4.815146 secs (55748145 bytes/sec)
% 124758517
% # s/kx-dc-misses
% 268435456 bytes transferred in 4.865137 secs (55175312 bytes/sec)
% 13808781

Cache misses increased by another factor of 5 going from user buffer
size 4M to 128M. I can't explain why there are as many as 13.8 million
-- I would have expected only 2*256M/64 = 8M, but in more cases. 8
million cache misses in only 4.8 seconds is a lot, and you would get
that many in only 1.3 seconds at 200MB/S. Of course, 128M is a silly
buffer size, but I would expect the cache effects to show up at about
half the L2 size under more realistic loads.

Cache accesses varied significantly, between 146 million (block size
16384), 37 million (block size 1M) and 138 million (block size 128M).
I can only partly explain this. I think the minimum number is
2*256M/16 = 32M (for fetching from L2 to L1 16 bytes at a time). 128M
might result from fetching 4 bytes at a time or thrashing causing the
equivalent.
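For the record, the arithmetic behind the expected 8M figure, as a
trivial sketch (the factor of two covers copyout() plus the program's
own access; the 64-byte cache line size is assumed):

    #include <stdio.h>

    /*
     * Expected compulsory cache misses for streaming reads: every byte
     * moves through the cache twice, and each 64-byte line faulted in
     * costs one miss: 2 * 256M / 64 = 8M.
     */
    int
    main(void)
    {
        long long bytes = 256LL * 1024 * 1024;  /* data transferred */
        int passes = 2;     /* copyout + application access */
        int line = 64;      /* cache line size in bytes */

        printf("%lld expected misses\n", bytes * passes / line);
        return (0);
    }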
Bruce

From owner-freebsd-arch@FreeBSD.ORG Mon Jul 6 17:00:58 2009 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3B9EC1065677 for ; Mon, 6 Jul 2009 17:00:58 +0000 (UTC) (envelope-from mav@FreeBSD.org) Received: from cmail.optima.ua (cmail.optima.ua [195.248.191.121]) by mx1.freebsd.org (Postfix) with ESMTP id 926848FC1A for ; Mon, 6 Jul 2009 17:00:57 +0000 (UTC) (envelope-from mav@FreeBSD.org) Received: from [212.86.226.226] (account mav@alkar.net HELO mavbook.mavhome.dp.ua) by cmail.optima.ua (CommuniGate Pro SMTP 5.2.9) with ESMTPSA id 247817642; Mon, 06 Jul 2009 20:00:53 +0300 Message-ID: <4A522DC1.2080908@FreeBSD.org> Date: Mon, 06 Jul 2009 20:00:49 +0300 From: Alexander Motin User-Agent: Thunderbird 2.0.0.21 (X11/20090405) MIME-Version: 1.0 To: Bruce Evans References: <4A4FAA2D.3020409@FreeBSD.org> <20090705100044.4053e2f9@ernst.jennejohn.org> <4A50667F.7080608@FreeBSD.org> <20090705223126.I42918@delplex.bde.org> <4A50BA9A.9080005@FreeBSD.org> <20090706005851.L1439@besplex.bde.org> <4A50DEE8.6080406@FreeBSD.org> <20090706034250.C2240@besplex.bde.org> <4A50F619.4020101@FreeBSD.org> <20090707011217.O43961@delplex.bde.org> In-Reply-To: <20090707011217.O43961@delplex.bde.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-arch@freebsd.org Subject: Re: DFLTPHYS vs MAXPHYS X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 06 Jul 2009 17:00:58 -0000

Bruce Evans wrote:
> On Sun, 5 Jul 2009, Alexander Motin wrote:
>> In these tests you've got almost only the negative side of the effect,
>> as you have said, due to cache misses.
>
> No, I got negative and positive for the regular file (due to cache misses
> for large block sizes and too many transactions for very small block sizes
> (< 16K)), and only positive for the disk file (due to cache misses not
> being tested).

No, I mean that you didn't get any benefit from increasing the disk I/O transaction size. You were still limited to 64K.

>> But with 64K MAXPHYS you just didn't get any benefit from using a bigger
>> block size.
>
> MAXPHYS is 128K. The ata driver has a limit of 64K so anything larger
> than 64K wouldn't do much except increase cache misses. In physio(),
> it would just cause physio() to ask the driver to read 64K at a time.
> My claim is partly that 64K is such a large size that the extra CPU
> caused by splitting up into 64K-blocks is insignificant.

The ATA subsystem allows drivers to have different transaction sizes; at least the AHCI driver can do more than 64K. As for the overhead being insignificant: I have shown an example where it is not entirely so.

> Here are better results for the disk file test, with cache accesses and
> misses counted by perfmon:
>
> Cache misses are minimized here using a user buffer size of 64K.
>
> Cache accesses are minimized here using a user buffer size of 1M.
>
> Cache misses increased by a factor of 5 going from user buffer size
> 2M to 4M.
>
> Cache misses increased by another factor of 5 going from user buffer
> size 4M to 128M. I can't explain why there are as many as 13.8 million
> -- I would have expected only about 2*256M/64 = 8M. 8
> million cache misses in only 4.8 seconds is a lot, and you would get
> that many in only 1.3 seconds at 200MB/S.
> Of course, 128M is a silly buffer size, but I would expect the cache
> effects to show up at about half the L2 size under more realistic loads.
>
> Cache accesses varied significantly, between 146 million (block size
> 16384), 37 million (block size 1M) and 125 million (block size 128M).
> I can only partly explain this. I think the minimum number is
> 2*256M/16 = 32M (for fetching from L2 to L1 16 bytes at a time).
> The ~128M figure might result from fetching 4 bytes at a time or from
> thrashing causing the equivalent.

I think that with small transaction sizes the cache misses could be caused not by the transferred data itself, but by the different variables addressed by the code. The growing number of misses with bigger blocks is also predictable. Working with a regular file could give different results, since the data would not be read into the same memory each time, but spread over the whole buffer cache.

And once more I want to say that you are not testing the same thing I was speaking about. I agree that an enormous block size at user level will affect cache efficiency negatively, simply because of the large amounts of data moved by the CPU. What I wanted to say is that, IMHO, allowing the device to transfer data in bigger blocks, when needed, will benefit both the I/O hardware and CPU usage, without significantly affecting caching, as the caches are mostly trashed not there, but in completely different places.

-- Alexander Motin

From owner-freebsd-arch@FreeBSD.ORG Mon Jul 6 18:12:47 2009 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7A45B1065713 for ; Mon, 6 Jul 2009 18:12:47 +0000 (UTC) (envelope-from dillon@apollo.backplane.com) Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2]) by mx1.freebsd.org (Postfix) with ESMTP id 2D6068FC17 for ; Mon, 6 Jul 2009 18:12:46 +0000 (UTC) (envelope-from dillon@apollo.backplane.com) Received: from apollo.backplane.com (localhost [127.0.0.1]) by apollo.backplane.com (8.14.2/8.14.1) with ESMTP id n66ICkg1075261 for ; Mon, 6 Jul 2009 11:12:46 -0700 (PDT) Received: (from dillon@localhost) by apollo.backplane.com (8.14.2/8.13.4/Submit) id n66ICkTc075260; Mon, 6 Jul 2009 11:12:46 -0700 (PDT) Date: Mon, 6 Jul 2009 11:12:46 -0700 (PDT) From: Matthew Dillon Message-Id: <200907061812.n66ICkTc075260@apollo.backplane.com> To: freebsd-arch@freebsd.org References: <4A4FAA2D.3020409@FreeBSD.org> <20090705100044.4053e2f9@ernst.jennejohn.org> <4A50667F.7080608@FreeBSD.org> <20090705223126.I42918@delplex.bde.org> <4A50BA9A.9080005@FreeBSD.org> <20090706005851.L1439@besplex.bde.org> <4A50DEE8.6080406@FreeBSD.org> <20090706034250.C2240@besplex.bde.org> <4A50F619.4020101@FreeBSD.org> <20090707011217.O43961@delplex.bde.org> <4A522DC1.2080908@FreeBSD.org> Subject: Re: DFLTPHYS vs MAXPHYS X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 06 Jul 2009 18:12:48 -0000

Linear dd

 tty             da0              cpu
 tin tout   KB/t   tps   MB/s  us ni sy in id
   0   11   0.50 17511   8.55   0  0 15  0 85   bs=512
   0   11   1.00 16108  15.73   0  0 12  0 87   bs=1024
   0   11   2.00 14758  28.82   0  0 11  0 89   bs=2048
   0   11   4.00 12195  47.64   0  0  7  0 93   bs=4096
   0   11   8.00  8026  62.70   0  0  5  0 95   bs=8192    << MB/s breakpt
   0   11  16.00  4018  62.78   0  0  4  0 96   bs=16384
   0   11  32.00  2025  63.28   0  0  2  0 98   bs=32768   << id breakpt
   0   11  64.00  1004  62.75   0  0  1  0 99   bs=65536
   0   11 128.00   506  63.25   0  0  1  0 99   bs=131072
Random seek/read

 tty             da0              cpu
 tin tout   KB/t   tps   MB/s  us ni sy in id
   0   11   0.50   189   0.09   0  0  0  0 100  bs=512
   0   11   1.00   184   0.18   0  0  0  0 100  bs=1024
   0   11   2.00   177   0.35   0  0  0  0 100  bs=2048
   0   11   4.00   175   0.68   0  0  0  0 100  bs=4096
   0   11   8.00   172   1.34   0  0  0  0 100  bs=8192
   0   11  16.00   166   2.59   0  0  0  0 100  bs=16384
   0   11  32.00   159   4.97   0  0  1  0 99   bs=32768
   0   11  64.00   142   8.87   0  0  0  0 100  bs=65536
   0   11 128.00   117  14.62   0  0  0  0 100  bs=131072
                   ^^^   ^^^
                   note TPS rate and MB/s

Which is the more important tuning variable? Efficiency of linear reads or saving re-seeks by buffering more data? If you didn't choose saving re-seeks you lose.

To go from 16K to 32K requires saving 5% of future re-seeks to break even.
To go from 32K to 64K requires saving 11% of future re-seeks.
To go from 64K to 128K requires saving 18% of future re-seeks.
(at least with this particular disk)

At the point where the block size exceeds 32768, if you aren't saving re-seeks with locality of reference from the additional cached data, you lose. If you are saving re-seeks you win. cpu caches do not enter into the equation at all.

For most filesystems the re-seeks being saved depend on the access pattern. For example, if you are doing an ls -lR or a find, the re-seek pattern will be related to inode and directory lookups. The number of inodes which fit in a cluster_read(), assuming reasonable locality of reference, will wind up determining the performance.

However, as the buffer size grows, the total number of bytes you are able to cache becomes the dominant factor in calculating the re-seek efficiency. I don't have a graph for that but, ultimately, it means that reading very large blocks (i.e. 1MB) with a non-linear access pattern is bad because most of the additional data cached will never be used before the memory winds up being re-used to cache some other cluster.

Another thing to note here is that command transfer overhead also becomes mostly irrelevant once you hit 32K, even if you have a lot of discrete disks. I/O's of less than 8KB are clearly wasteful of resources (in my test even a linear transfer couldn't achieve the bandwidth ceiling of the device). I/O's greater than 32K are clearly dependent on saving re-seeks. Note in particular that the data transfer rate for random I/O doubles as the buffer size doubles when you have a random access pattern (because seek times are so long). In other words, it's a huge win if you are actually able to save future re-seeks by caching the additional data.

What this all means is that cpu caches are basically irrelevant when it comes to hard drive I/O. You are either saving enough re-seeks to make up for the greater seek latency or you aren't. One re-seek is something like 7ms. 7ms is a LONG time, which is why the cpu caches are irrelevant for choosing the block size. One can bean-count cache misses all day long but it won't make the machine perform any better in this case.
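The break-even figures can be rederived from the TPS column of the random-read table: doubling the block size is a win only if the extra cached data saves at least a fraction 1 - tps_big/tps_small of future re-seeks. A quick sketch using the measured numbers (it prints roughly the 5%/11%/18% above, modulo rounding):

	/*
	 * Rederive the break-even percentages from the measured
	 * random-read TPS numbers above.
	 */
	#include <stdio.h>

	int
	main(void)
	{
		double tps[] = { 166, 159, 142, 117 };	/* bs = 16K..128K */
		const char *step[] = { "16K->32K", "32K->64K", "64K->128K" };
		int i;

		for (i = 0; i < 3; i++)
			printf("%s: must save %.1f%% of re-seeks to break even\n",
			    step[i], 100.0 * (1.0 - tps[i + 1] / tps[i]));
		return (0);
	}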
-Matt
From owner-freebsd-arch@FreeBSD.ORG Tue Jul 7 13:26:34 2009 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A7D2D1065670 for ; Tue, 7 Jul 2009 13:26:34 +0000 (UTC) (envelope-from mav@FreeBSD.org) Received: from cmail.optima.ua (cmail.optima.ua [195.248.191.121]) by mx1.freebsd.org (Postfix) with ESMTP id E3D6C8FC17 for ; Tue, 7 Jul 2009 13:26:33 +0000 (UTC) (envelope-from mav@FreeBSD.org) Received: from orphanage.alkar.net (account mav@alkar.net [212.86.226.11] verified) by cmail.optima.ua (CommuniGate Pro SMTP 5.2.9) with ESMTPA id 247912929; Tue, 07 Jul 2009 16:26:30 +0300 Message-ID: <4A534D05.1040709@FreeBSD.org> Date: Tue, 07 Jul 2009 16:26:29 +0300 From: Alexander Motin User-Agent: Thunderbird 2.0.0.14 (X11/20080612) MIME-Version: 1.0 To: Matthew Dillon References: <1246746182.00135530.1246735202@10.7.7.3> <1246792983.00135712.1246781401@10.7.7.3> <1246796580.00135722.1246783203@10.7.7.3> <1246814582.00135806.1246803602@10.7.7.3> <1246818181.00135809.1246804804@10.7.7.3> <1246825383.00135846.1246812602@10.7.7.3> <1246825385.00135854.1246814404@10.7.7.3> <1246830930.00135868.1246819202@10.7.7.3> <1246830933.00135875.1246820402@10.7.7.3> <1246908182.00136258.1246896003@10.7.7.3> <1246911786.00136277.1246900203@10.7.7.3> <1246915383.00136290.1246904409@10.7.7.3> In-Reply-To: <1246915383.00136290.1246904409@10.7.7.3> X-Enigmail-Version: 0.95.0 Content-Type: text/plain; charset=KOI8-R Content-Transfer-Encoding: 7bit Cc: freebsd-arch@freebsd.org Subject: Re: DFLTPHYS vs MAXPHYS X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 07 Jul 2009 13:26:34 -0000

Matthew Dillon wrote:
>  tty             da0              cpu
>  tin tout   KB/t   tps   MB/s  us ni sy in id
>    0   11   0.50 17511   8.55   0  0 15  0 85   bs=512
>    0   11   1.00 16108  15.73   0  0 12  0 87   bs=1024
>    0   11   2.00 14758  28.82   0  0 11  0 89   bs=2048
>    0   11   4.00 12195  47.64   0  0  7  0 93   bs=4096
>    0   11   8.00  8026  62.70   0  0  5  0 95   bs=8192    << MB/s breakpt
>    0   11  16.00  4018  62.78   0  0  4  0 96   bs=16384
>    0   11  32.00  2025  63.28   0  0  2  0 98   bs=32768   << id breakpt
>    0   11  64.00  1004  62.75   0  0  1  0 99   bs=65536
>    0   11 128.00   506  63.25   0  0  1  0 99   bs=131072

As I have written before, my SSD continues to improve speed up to a 512KB transaction size, and maybe further; I haven't tested beyond that.

> Random seek/read
>
>  tty             da0              cpu
>  tin tout   KB/t   tps   MB/s  us ni sy in id
>    0   11   0.50   189   0.09   0  0  0  0 100  bs=512
>    0   11   1.00   184   0.18   0  0  0  0 100  bs=1024
>    0   11   2.00   177   0.35   0  0  0  0 100  bs=2048
>    0   11   4.00   175   0.68   0  0  0  0 100  bs=4096
>    0   11   8.00   172   1.34   0  0  0  0 100  bs=8192
>    0   11  16.00   166   2.59   0  0  0  0 100  bs=16384
>    0   11  32.00   159   4.97   0  0  1  0 99   bs=32768
>    0   11  64.00   142   8.87   0  0  0  0 100  bs=65536
>    0   11 128.00   117  14.62   0  0  0  0 100  bs=131072
>                    ^^^   ^^^
>                    note TPS rate and MB/s
>
> Which is the more important tuning variable? Efficiency of linear
> reads or saving re-seeks by buffering more data? If you didn't choose
> saving re-seeks you lose.
>
> To go from 16K to 32K requires saving 5% of future re-seeks to break even.
> To go from 32K to 64K requires saving 11% of future re-seeks.
> To go from 64K to 128K requires saving 18% of future re-seeks.
> (at least with this particular disk)
>
> At the point where the block size exceeds 32768, if you aren't saving
> re-seeks with locality of reference from the additional cached data,
> you lose. If you are saving re-seeks you win. cpu caches do not enter
> into the equation at all.
>
> For most filesystems the re-seeks being saved depend on the access
> pattern. For example, if you are doing an ls -lR or a find, the re-seek
> pattern will be related to inode and directory lookups. The number of
> inodes which fit in a cluster_read(), assuming reasonable locality of
> reference, will wind up determining the performance.
>
> However, as the buffer size grows, the total number of bytes you are
> able to cache becomes the dominant factor in calculating the re-seek
> efficiency. I don't have a graph for that but, ultimately, it means
> that reading very large blocks (i.e. 1MB) with a non-linear access
> pattern is bad because most of the additional data cached will never
> be used before the memory winds up being re-used to cache some other
> cluster.

You are mixing completely different things. I was never talking about the file system block size. I do not dispute that a 16/32K file system block size may be quite effective in most cases. I was speaking about the maximum _disk_transaction_ size. It is not the same.

When the file system needs a small amount of data, or there is just a small file, there is definitely no need to read/write more than one small FS block. But when the file system predicts an effective large read-ahead, or it has a lot of write-back data, there is no reason not to transfer more contiguous blocks in one big disk transaction. Splitting it just increases the command overhead at all layers and makes it possible for the drive to be interrupted between those operations to do some very long seek.
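To illustrate the splitting overhead: a rough sketch (illustrative only, not the actual kernel code) of what every layer effectively does when a large contiguous request has to be clamped to a per-transaction limit -- each chunk pays its own command setup and completion interrupt:

	/*
	 * Illustrative sketch: a large contiguous transfer clamped to a
	 * per-transaction limit becomes ceil(len/limit) separate commands.
	 */
	#include <stdio.h>

	#define DFLTPHYS	(64 * 1024)

	static int
	issue_chunked(long long len, long long limit)
	{
		int ncmds = 0;

		while (len > 0) {
			long long chunk = len < limit ? len : limit;
			/* ...set up DMA, send command, take completion irq... */
			len -= chunk;
			ncmds++;
		}
		return (ncmds);
	}

	int
	main(void)
	{
		/* A 1MB read costs 16 commands at 64K but only 2 at 512K. */
		printf("64K limit:  %d commands\n",
		    issue_chunked(1024 * 1024, DFLTPHYS));
		printf("512K limit: %d commands\n",
		    issue_chunked(1024 * 1024, 512 * 1024));
		return (0);
	}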
-- Alexander Motin

From owner-freebsd-arch@FreeBSD.ORG Tue Jul 7 16:36:42 2009 Return-Path: Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AD25C106564A; Tue, 7 Jul 2009 16:36:42 +0000 (UTC) (envelope-from dillon@apollo.backplane.com) Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2]) by mx1.freebsd.org (Postfix) with ESMTP id 5DA418FC0A; Tue, 7 Jul 2009 16:36:42 +0000 (UTC) (envelope-from dillon@apollo.backplane.com) Received: from apollo.backplane.com (localhost [127.0.0.1]) by apollo.backplane.com (8.14.2/8.14.1) with ESMTP id n67Gagkp087661; Tue, 7 Jul 2009 09:36:42 -0700 (PDT) Received: (from dillon@localhost) by apollo.backplane.com (8.14.2/8.13.4/Submit) id n67GagxN087660; Tue, 7 Jul 2009 09:36:42 -0700 (PDT) Date: Tue, 7 Jul 2009 09:36:42 -0700 (PDT) From: Matthew Dillon Message-Id: <200907071636.n67GagxN087660@apollo.backplane.com> To: Alexander Motin References: <1246746182.00135530.1246735202@10.7.7.3> <1246792983.00135712.1246781401@10.7.7.3> <1246796580.00135722.1246783203@10.7.7.3> <1246814582.00135806.1246803602@10.7.7.3> <1246818181.00135809.1246804804@10.7.7.3> <1246825383.00135846.1246812602@10.7.7.3> <1246825385.00135854.1246814404@10.7.7.3> <1246830930.00135868.1246819202@10.7.7.3> <1246830933.00135875.1246820402@10.7.7.3> <1246908182.00136258.1246896003@10.7.7.3> <1246911786.00136277.1246900203@10.7.7.3> <1246915383.00136290.1246904409@10.7.7.3> <4A534D05.1040709@FreeBSD.org> Cc: freebsd-arch@FreeBSD.org Subject: Re: DFLTPHYS vs MAXPHYS X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 07 Jul 2009 16:36:43 -0000

:You are mixing completely different things. I was never talking about
:the file system block size. I do not dispute that a 16/32K file system
:block size may be quite effective in most cases. I was speaking about
:the maximum _disk_transaction_ size. It is not the same.
:
:When the file system needs a small amount of data, or there is just a
:small file, there is definitely no need to read/write more than one
:small FS block. But when the file system predicts an effective large
:read-ahead, or it has a lot of write-back data, there is no reason not
:to transfer more contiguous blocks in one big disk transaction.
:Splitting it just increases the command overhead at all layers and
:makes it possible for the drive to be interrupted between those
:operations to do some very long seek.
:--
:Alexander Motin

That isn't correct. Locality of reference for adjacent data is very important even if the filesystem only needs a small amount of data. A good example of this would be accessing the inode area in a UFS cylinder. Issuing only a single filesystem block read in the inode area is a huge loss versus issuing a cluster read of 64K (4-8 filesystem blocks), particularly if the inode is being accessed as part of a 'find' or 'ls -lR'.

I have not argued that the maximum device block size is important, I've simply argued that it is convenient. What is important, and I stressed this in my argument several times, is the total number of bytes the cluster_read() code reads when the filesystem requests a particular filesystem block.
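To make the inode example concrete, a small sketch, assuming 128-byte UFS1-style on-disk inodes (UFS2 inodes are 256 bytes; scale accordingly). One cluster read brings in several hundred neighbouring inodes, so a directory scan can resolve many lookups per physical seek:

	/*
	 * How many inodes one read brings into the cache.
	 * Assumes 128-byte (UFS1-style) on-disk inodes.
	 */
	#include <stdio.h>

	int
	main(void)
	{
		int isize = 128;	/* assumed on-disk inode size */

		printf("one 16K fs block: %d inodes\n", 16 * 1024 / isize);
		printf("one 64K cluster:  %d inodes\n", 64 * 1024 / isize);
		return (0);
	}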
-Matt
Matthew Dillon

From owner-freebsd-arch@FreeBSD.ORG Tue Jul 7 17:10:29 2009 Return-Path: Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 41874106566C for ; Tue, 7 Jul 2009 17:10:29 +0000 (UTC) (envelope-from dillon@apollo.backplane.com) Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2]) by mx1.freebsd.org (Postfix) with ESMTP id 1A6FE8FC22 for ; Tue, 7 Jul 2009 17:10:28 +0000 (UTC) (envelope-from dillon@apollo.backplane.com) Received: from apollo.backplane.com (localhost [127.0.0.1]) by apollo.backplane.com (8.14.2/8.14.1) with ESMTP id n67HASDN088249 for ; Tue, 7 Jul 2009 10:10:28 -0700 (PDT) Received: (from dillon@localhost) by apollo.backplane.com (8.14.2/8.13.4/Submit) id n67HASb7088248; Tue, 7 Jul 2009 10:10:28 -0700 (PDT) Date: Tue, 7 Jul 2009 10:10:28 -0700 (PDT) From: Matthew Dillon Message-Id: <200907071710.n67HASb7088248@apollo.backplane.com> To: freebsd-arch@FreeBSD.org References: <20090707151901.GA63927@les.ath.cx> <200907071639.n67GdBD2087690@apollo.backplane.com> Cc: Subject: Re: DFLTPHYS vs MAXPHYS X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 07 Jul 2009 17:10:29 -0000

A more insidious problem here that I think is being missed is the fact that newer filesystems are starting to use larger filesystem block sizes. I myself hit serious issues when I tried to create a UFS filesystem with a 64K basic filesystem block size a few years ago, and I hit similar issues with HAMMER, which uses 64K buffers for bulk data, which I had to fix by reincorporating code into ATA that had existed originally to break up large single-transfer requests that exceeded the chipset's DMA capability.

In the case of ATA, numerous older chips can't even do 64K due to bugs in the DMA hardware. Their maximum is actually 65024 bytes. Traditionally the cluster code enforced such limits but assumed that the basic filesystem block size would be small enough not to hit the limits. It becomes a real problem when the filesystem itself wants to use a large basic block size. In that respect, hardware which is limited to 64K has serious consequences which cascade through to the VFS layers.
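A sketch of the kind of break-up logic described (illustrative, not the real ata(4) code): the transfer is clamped to the chipset's true maximum, which is kept a multiple of the 512-byte sector size so no command ends mid-sector. A 64K buffer then becomes a 65024-byte command plus a 512-byte tail:

	/*
	 * Illustrative break-up of a 64K buffer for a chipset whose DMA
	 * engine tops out at 65024 bytes (127 sectors, i.e. < 64K).
	 */
	#include <stdio.h>

	#define SECTOR		512
	#define CHIP_DMA_MAX	65024		/* 127 sectors */

	int
	main(void)
	{
		long resid = 65536;		/* one 64K filesystem buffer */
		long max = (CHIP_DMA_MAX / SECTOR) * SECTOR;

		while (resid > 0) {
			long xfer = resid < max ? resid : max;
			printf("transfer %ld bytes\n", xfer);	/* 65024, then 512 */
			resid -= xfer;
		}
		return (0);
	}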
-Matt

From owner-freebsd-arch@FreeBSD.ORG Tue Jul 7 18:25:49 2009 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BAEFC1065672 for ; Tue, 7 Jul 2009 18:25:49 +0000 (UTC) (envelope-from mav@FreeBSD.org) Received: from cmail.optima.ua (cmail.optima.ua [195.248.191.121]) by mx1.freebsd.org (Postfix) with ESMTP id 3B55E8FC15 for ; Tue, 7 Jul 2009 18:25:48 +0000 (UTC) (envelope-from mav@FreeBSD.org) Received: from [212.86.226.226] (account mav@alkar.net HELO mavbook.mavhome.dp.ua) by cmail.optima.ua (CommuniGate Pro SMTP 5.2.9) with ESMTPSA id 247938962; Tue, 07 Jul 2009 21:25:46 +0300 Message-ID: <4A53931D.6040307@FreeBSD.org> Date: Tue, 07 Jul 2009 21:25:33 +0300 From: Alexander Motin User-Agent: Thunderbird 2.0.0.21 (X11/20090405) MIME-Version: 1.0 To: Matthew Dillon References: <1246746182.00135530.1246735202@10.7.7.3> <1246792983.00135712.1246781401@10.7.7.3> <1246796580.00135722.1246783203@10.7.7.3> <1246814582.00135806.1246803602@10.7.7.3> <1246818181.00135809.1246804804@10.7.7.3> <1246825383.00135846.1246812602@10.7.7.3> <1246825385.00135854.1246814404@10.7.7.3> <1246830930.00135868.1246819202@10.7.7.3> <1246830933.00135875.1246820402@10.7.7.3> <1246908182.00136258.1246896003@10.7.7.3> <1246911786.00136277.1246900203@10.7.7.3> <1246915383.00136290.1246904409@10.7.7.3> <4A534D05.1040709@FreeBSD.org> <200907071636.n67GagxN087660@apollo.backplane.com> In-Reply-To: <200907071636.n67GagxN087660@apollo.backplane.com> Content-Type: text/plain; charset=KOI8-R; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-arch@FreeBSD.org Subject: Re: DFLTPHYS vs MAXPHYS X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 07 Jul 2009 18:25:50 -0000

Matthew Dillon wrote:
> That isn't correct. Locality of reference for adjacent data is very
> important even if the filesystem only needs a small amount of data.

All I wanted to say is that it is the FS's privilege to decide how much data it needs. But when it really needs a lot of data, that data would be better transferred with a smaller number of bigger transactions, without a strict MAXPHYS limitation.
-- Alexander Motin

From owner-freebsd-arch@FreeBSD.ORG Tue Jul 7 19:02:14 2009 Return-Path: Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 21AEE106564A; Tue, 7 Jul 2009 19:02:14 +0000 (UTC) (envelope-from dillon@apollo.backplane.com) Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2]) by mx1.freebsd.org (Postfix) with ESMTP id BAB5D8FC17; Tue, 7 Jul 2009 19:02:13 +0000 (UTC) (envelope-from dillon@apollo.backplane.com) Received: from apollo.backplane.com (localhost [127.0.0.1]) by apollo.backplane.com (8.14.2/8.14.1) with ESMTP id n67J2DoG090247; Tue, 7 Jul 2009 12:02:13 -0700 (PDT) Received: (from dillon@localhost) by apollo.backplane.com (8.14.2/8.13.4/Submit) id n67J2Dcm090246; Tue, 7 Jul 2009 12:02:13 -0700 (PDT) Date: Tue, 7 Jul 2009 12:02:13 -0700 (PDT) From: Matthew Dillon Message-Id: <200907071902.n67J2Dcm090246@apollo.backplane.com> To: Alexander Motin References: <1246746182.00135530.1246735202@10.7.7.3> <1246792983.00135712.1246781401@10.7.7.3> <1246796580.00135722.1246783203@10.7.7.3> <1246814582.00135806.1246803602@10.7.7.3> <1246818181.00135809.1246804804@10.7.7.3> <1246825383.00135846.1246812602@10.7.7.3> <1246825385.00135854.1246814404@10.7.7.3> <1246830930.00135868.1246819202@10.7.7.3> <1246830933.00135875.1246820402@10.7.7.3> <1246908182.00136258.1246896003@10.7.7.3> <1246911786.00136277.1246900203@10.7.7.3> <1246915383.00136290.1246904409@10.7.7.3> <4A534D05.1040709@FreeBSD.org> <200907071636.n67GagxN087660@apollo.backplane.com> <4A53931D.6040307@FreeBSD.org> Cc: freebsd-arch@FreeBSD.org Subject: Re: DFLTPHYS vs MAXPHYS X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 07 Jul 2009 19:02:14 -0000

:All I wanted to say is that it is the FS's privilege to decide how much
:data it needs. But when it really needs a lot of data, that data would
:be better transferred with a smaller number of bigger transactions,
:without a strict MAXPHYS limitation.
:
:--
:Alexander Motin

We are in agreement. That's essentially what I mean by all my cluster_read() comments. What matters the most is how much read-ahead the cluster code does, and how well matched the read-ahead is to reducing future transactions, and not so much anything else (such as cpu caches).

The cluster heuristics are pretty good but they do break down under certain circumstances. For example, for UFS they break down when there is file data adjacency between different inodes. That is often why one sees the KB/t sizes go down (and the TPS rate go up) when tarring up a large number of small files. Tarring up /usr/src is a good example of this. KB/t can drop all the way down to 8K and performance is noticeably degraded.

The cluster heuristic also tends to break down on the initial read() from a newly constituted vnode, because it has no prior history to work with and so does not immediately issue a read-ahead even though the I/O may end up being linear.

--

For command latency issues Julian pointed out a very interesting contrast between a HD and a (SATA) SSD. With no seek times to speak of, command overhead becomes a bigger deal when trying to maximize the performance of an SSD.
I would guess that larger DMA transactions (from the point of view of the host cpu anyhow) would be more highly desired once we start hitting bandwidth ceilings of 300 MBytes/sec for SATA II and 600 MBytes/sec beyond that. If in my example the bandwidth ceiling for a HD capable of doing 60MB/s is hit at the 8K mark, then presumably the block size needed to hit the bandwidth ceiling for a HD or SSD capable of 200MB/s, or 300MB/s, or higher, will also have to be larger: 16K, 32K, etc. This is fast approaching the 64K mark people are arguing about.

In any case, the main reason I posted is to try to correct people's assumptions on the importance of various parameters, particularly the irrelevancy of cpu caches in the bigger picture.

-Matt
Matthew Dillon

From owner-freebsd-arch@FreeBSD.ORG Tue Jul 7 21:12:44 2009 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 24A6F1065673; Tue, 7 Jul 2009 21:12:44 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail04.syd.optusnet.com.au (mail04.syd.optusnet.com.au [211.29.132.185]) by mx1.freebsd.org (Postfix) with ESMTP id A191A8FC25; Tue, 7 Jul 2009 21:12:43 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from besplex.bde.org (c122-107-120-90.carlnfd1.nsw.optusnet.com.au [122.107.120.90]) by mail04.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id n67LCYbd024674 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 8 Jul 2009 07:12:36 +1000 Date: Wed, 8 Jul 2009 07:12:34 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Matthew Dillon In-Reply-To: <200907071902.n67J2Dcm090246@apollo.backplane.com> Message-ID: <20090708062346.G1555@besplex.bde.org> References: <1246746182.00135530.1246735202@10.7.7.3> <1246792983.00135712.1246781401@10.7.7.3> <1246796580.00135722.1246783203@10.7.7.3> <1246814582.00135806.1246803602@10.7.7.3> <1246818181.00135809.1246804804@10.7.7.3> <1246825383.00135846.1246812602@10.7.7.3> <1246825385.00135854.1246814404@10.7.7.3> <1246830930.00135868.1246819202@10.7.7.3> <1246830933.00135875.1246820402@10.7.7.3> <1246908182.00136258.1246896003@10.7.7.3> <1246911786.00136277.1246900203@10.7.7.3> <1246915383.00136290.1246904409@10.7.7.3> <4A534D05.1040709@FreeBSD.org> <200907071636.n67GagxN087660@apollo.backplane.com> <4A53931D.6040307@FreeBSD.org> <200907071902.n67J2Dcm090246@apollo.backplane.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Alexander Motin , freebsd-arch@freebsd.org Subject: Re: DFLTPHYS vs MAXPHYS X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 07 Jul 2009 21:12:44 -0000

On Tue, 7 Jul 2009, Matthew Dillon wrote:

> :All I wanted to say is that it is the FS's privilege to decide how much
> :data it needs. But when it really needs a lot of data, that data would
> :be better transferred with a smaller number of bigger transactions,
> :without a strict MAXPHYS limitation.
> :
> :--
> :Alexander Motin
>
> We are in agreement. That's essentially what I mean by all my
> cluster_read() comments.

I did not disagree. One of my points is that fs's are currently limited by MAXPHYS and that simply increasing MAXPHYS isn't free.
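One concrete cost, as a rough sketch with assumed values (this is an illustration, not a measurement from the thread): each pbuf used for physical I/O reserves MAXPHYS bytes of kernel virtual address space, so the reservation grows directly with the constant. Assuming nswbuf at its usual cap of 256:

	/*
	 * Rough sketch of the pbuf KVA reservation, nswbuf * MAXPHYS.
	 * nswbuf = 256 and the MAXPHYS candidates are assumed values.
	 */
	#include <stdio.h>

	int
	main(void)
	{
		int nswbuf = 256;			/* assumed cap */
		long maxphys[] = { 128 * 1024, 512 * 1024, 1024 * 1024 };
		int i;

		for (i = 0; i < 3; i++)
			printf("MAXPHYS %4ldK -> %3ld MB of pbuf KVA\n",
			    maxphys[i] / 1024,
			    (long)nswbuf * maxphys[i] / (1024 * 1024));
		return (0);
	}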
> What matters the most is how much read-ahead
> the cluster code does, and how well matched the read-ahead is to
> reducing future transactions, and not so much anything else (such as
> cpu caches).

I will disagree with most of this:
- the amount of read-ahead/clustering is not very important. fs's already depend on the drive doing significant buffering, so that when the fs gets things and seeks around a lot, not all the seeks are physical. Locality is much more important.
- cpu caches are already of minor importance and will become more important as drives become faster.

> The cluster heuristics are pretty good but they do break down under
> certain circumstances. For example, for UFS they break down when there
> is file data adjacency between different inodes. That is often why one
> sees the KB/t sizes go down (and the TPS rate go up) when tarring up a
> large number of small files. Tarring up /usr/src is a good example of
> this. KB/t can drop all the way down to 8K and performance is noticeably
> degraded.

At least for ffs in FreeBSD, this is mostly locality, not clustering. Tarring up /usr/src to test optimizations of locality is one of my favourite benchmarks. Since ffs does no inter-file or inode clustering, the average i/o size is smaller than the average file size. Since files in /usr/src are small, you are lucky if the average i/o size is 8K (the average file size is actually between 8K and 16K). Since the ffs block size is larger than the file size, most file data fits in a single block and clustering has no effect.

(But I also like to optimize and test file systems with a small block size. Clustering makes a big difference for msdosfs with a block size of 512, and in this benchmark, after my optimizations, msdosfs with a block size of 512 is slightly faster than unoptimized ffs with a block size of 16K. The smaller block size just takes more CPU. msdosfs is fundamentally faster than ffs for small files since it has better locality (no inodes, and better locality for the FAT than for indirect blocks).)

> The cluster heuristic also tends to break down on the initial read() from
> a newly constituted vnode, because it has no prior history to work with
> and so does not immediately issue a read-ahead even though the I/O may
> end up being linear.

This is harmful for random file access, but for tarring up /usr/src there is a good chance that file locality (in directory traversal order) combined with read-ahead in the drive will compensate for this.

> For command latency issues Julian pointed out a very interesting contrast
> between a HD and a (SATA) SSD. With no seek times to speak of, command
> overhead becomes a bigger deal when trying to maximize the performance
> of an SSD. I would guess that larger DMA transactions (from the point of
> view of the host cpu anyhow) would be more highly desired once we start
> hitting bandwidth ceilings of 300 MBytes/sec for SATA II and
> 600 MBytes/sec beyond that.

It is actually already a problem (the problem of this thread). Even at 50MB/S, I see some slowness due to command latency (I see increased CPU, but that is similar to latency in the context of this thread). Alexander has 200MB/S disks so he sees larger problems.

My CPU overhead (on a ~2GHz CPU) is about 50 uS/block. With 64K-blocks at 50MB/S, this gives a CPU overhead of 40 mS/S or 4%. Not significant. With 16K-blocks at 50MB/S, this gives a CPU overhead of 16%. This is becoming significant. At 200MB/S, the overhead would be 16% even for 64K-blocks.
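Spelled out, with the same assumptions as above (50 uS of CPU per command, binary megabytes): the overhead fraction is simply (rate / blocksize) * cost-per-block.

	/*
	 * Per-command CPU overhead: blocks/sec * 50 uS each.
	 * Reproduces the 4% / 16% / 16% figures above.
	 */
	#include <stdio.h>

	static double
	overhead_pct(double mb_per_s, double bs, double us_per_block)
	{
		double blocks_per_s = mb_per_s * 1048576 / bs;

		return (blocks_per_s * us_per_block / 1e6 * 100.0);
	}

	int
	main(void)
	{
		printf("50MB/S, 64K blocks:  %.0f%%\n", overhead_pct(50, 65536, 50));
		printf("50MB/S, 16K blocks:  %.0f%%\n", overhead_pct(50, 16384, 50));
		printf("200MB/S, 64K blocks: %.0f%%\n", overhead_pct(200, 65536, 50));
		return (0);
	}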
Alexander reported savings of 10-15% using 512K-blocks. This is consistent.

> If in my example the bandwidth ceiling for a HD capable of doing 60MB/s
> is hit at the 8K mark, then presumably the block size needed to hit the
> bandwidth ceiling for a HD or SSD capable of 200MB/s, or 300MB/s, or
> higher, will also have to be larger: 16K, 32K, etc. This is fast
> approaching the 64K mark people are arguing about.

I thought we were arguing about the 512K and 1M marks :-). I haven't been worrying about command latency and didn't notice that we were discussing an SSD before. At hundreds of MB/S, or for zero-latency hardware, the command overhead becomes a limiting factor for throughput.

> In any case, the main reason I posted is to try to correct people's
> assumptions on the importance of various parameters, particularly the
> irrelevancy of cpu caches in the bigger picture.

My examples show that the CPU cache can be relevant even with a 50MB/S disk. With faster disks it becomes even more relevant. It is hard to keep up with 200MB/S, and harder if you double the number of cache misses using large buffers.

Bruce

From owner-freebsd-arch@FreeBSD.ORG Tue Jul 7 22:15:36 2009 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DBACA1065673; Tue, 7 Jul 2009 22:15:36 +0000 (UTC) (envelope-from dillon@apollo.backplane.com) Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2]) by mx1.freebsd.org (Postfix) with ESMTP id A57E78FC12; Tue, 7 Jul 2009 22:15:36 +0000 (UTC) (envelope-from dillon@apollo.backplane.com) Received: from apollo.backplane.com (localhost [127.0.0.1]) by apollo.backplane.com (8.14.2/8.14.1) with ESMTP id n67MFZPS092097; Tue, 7 Jul 2009 15:15:35 -0700 (PDT) Received: (from dillon@localhost) by apollo.backplane.com (8.14.2/8.13.4/Submit) id n67MFZeM092096; Tue, 7 Jul 2009 15:15:35 -0700 (PDT) Date: Tue, 7 Jul 2009 15:15:35 -0700 (PDT) From: Matthew Dillon Message-Id: <200907072215.n67MFZeM092096@apollo.backplane.com> To: Bruce Evans References: <1246746182.00135530.1246735202@10.7.7.3> <1246792983.00135712.1246781401@10.7.7.3> <1246796580.00135722.1246783203@10.7.7.3> <1246814582.00135806.1246803602@10.7.7.3> <1246818181.00135809.1246804804@10.7.7.3> <1246825383.00135846.1246812602@10.7.7.3> <1246825385.00135854.1246814404@10.7.7.3> <1246830930.00135868.1246819202@10.7.7.3> <1246830933.00135875.1246820402@10.7.7.3> <1246908182.00136258.1246896003@10.7.7.3> <1246911786.00136277.1246900203@10.7.7.3> <1246915383.00136290.1246904409@10.7.7.3> <4A534D05.1040709@FreeBSD.org> <200907071636.n67GagxN087660@apollo.backplane.com> <4A53931D.6040307@FreeBSD.org> <200907071902.n67J2Dcm090246@apollo.backplane.com> <20090708062346.G1555@besplex.bde.org> Cc: Alexander Motin , freebsd-arch@freebsd.org Subject: Re: DFLTPHYS vs MAXPHYS X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 07 Jul 2009 22:15:37 -0000

:I will disagree with most of this:
:- the amount of read-ahead/clustering is not very important. fs's already
:  depend on the drive doing significant buffering, so that when the fs gets
:  things and seeks around a lot, not all the seeks are physical. Locality
:  is much more important.
Yes, I agree with you there to a point, but drive cache performance tails off very quickly if things are not exactly sequential in each zone being read, and it is fairly difficult to achieve exact sequentiality in the filesystem layout. Also, command latency really starts to interfere if you have to go to the drive every few name lookups / stats / whatever, since those operations only take a few microseconds if the data is sitting in the buffer cache, even if it's just going to the HD's on-drive cache.

The cluster code fixes both the command latency issue and the problem of slight non-sequentialities in the access pattern (in each zone being seek-read). Without it, performance numbers will wind up being all over the board. That makes it fairly important.

I got a nifty program to test that.

    fetch http://apollo.backplane.com/DFlyMisc/zoneread.c
    cc ...
    (^C to stop test, use iostat to see the results)
    ./zr /dev/da0 16 16 1024 1
    ./zr /dev/da0 16 16 1024 2
    ./zr /dev/da0 16 16 1024 3
    ./zr /dev/da0 16 16 1024 4

If you play with it you will find that most drives can track around 16 zones with 100% sequential forward reads in each zone. Any other access pattern severely degrades performance. For example, if you read the data in reverse you can kiss goodbye to performance. If you introduce slight non-linearities in the access pattern, even though the seeks are within 16-32K of each other, performance degrades very rapidly.

This is what I mean by drives not doing sane caching. It was ok with smaller drives, where the non-linearities were hitting up against the need to do an actual head seek, but the drive caches in today's huge drives are just not tuned very well. UFS does have a bit of an advantage here, but HAMMER does a fairly good job too. The problem HAMMER has is with its initial layout due to B-Tree node splits (which mess up linearity in the B-Tree). Once the reblocker cleans up the B-Tree, performance is recovered. The B-Tree is the biggest problem, but I can't fix the initial layout without making incompatible media changes, so I'm holding off on doing it for now.
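The access pattern zr generates looks roughly like this (a simplified sketch, not the actual zoneread.c; the zone count, block size and zone spacing are placeholder values): N zones spread across the device, each read strictly sequentially forward, round-robin one block per zone per pass. Run it until interrupted, like zr, and watch iostat.

	/*
	 * Simplified sketch of the zr access pattern: 16 zones, each
	 * read 100% sequentially forward, one block per zone per pass.
	 */
	#include <fcntl.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>

	int
	main(int argc, char **argv)
	{
		int nzones = 16, bs = 16384, i, fd;
		off_t zonelen = 1024LL * 1024 * 1024;	/* zone spacing */
		off_t done = 0;
		char *buf = malloc(bs);

		if (argc < 2 || (fd = open(argv[1], O_RDONLY)) < 0) {
			fprintf(stderr, "usage: zrsketch <device>\n");
			return (1);
		}
		for (;;) {
			for (i = 0; i < nzones; i++)	/* one block per zone */
				if (pread(fd, buf, bs, i * zonelen + done) != bs)
					return (1);
			done += bs;		/* strictly forward in each zone */
		}
	}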
-Matt
From owner-freebsd-arch@FreeBSD.ORG Thu Jul 9 21:54:13 2009 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A7C81106566C for ; Thu, 9 Jul 2009 21:54:13 +0000 (UTC) (envelope-from freebsd-arch@m.gmane.org) Received: from ciao.gmane.org (main.gmane.org [80.91.229.2]) by mx1.freebsd.org (Postfix) with ESMTP id 2E92A8FC1B for ; Thu, 9 Jul 2009 21:54:12 +0000 (UTC) (envelope-from freebsd-arch@m.gmane.org) Received: from list by ciao.gmane.org with local (Exim 4.43) id 1MP1ZD-0005IA-OG for freebsd-arch@freebsd.org; Thu, 09 Jul 2009 21:54:11 +0000 Received: from 93-138-117-98.adsl.net.t-com.hr ([93.138.117.98]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Thu, 09 Jul 2009 21:54:11 +0000 Received: from ivoras by 93-138-117-98.adsl.net.t-com.hr with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Thu, 09 Jul 2009 21:54:11 +0000 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-arch@freebsd.org From: Ivan Voras Date: Thu, 09 Jul 2009 23:53:55 +0200 Lines: 59 Message-ID: References: <4A4FAA2D.3020409@FreeBSD.org> <20090705100044.4053e2f9@ernst.jennejohn.org> <4A50667F.7080608@FreeBSD.org> <20090705223126.I42918@delplex.bde.org> <4A50BA9A.9080005@FreeBSD.org> <20090706005851.L1439@besplex.bde.org> <4A50DEE8.6080406@FreeBSD.org> <20090706034250.C2240@besplex.bde.org> <4A50F619.4020101@FreeBSD.org> Mime-Version: 1.0 X-Complaints-To: usenet@ger.gmane.org X-Gmane-NNTP-Posting-Host: 93-138-117-98.adsl.net.t-com.hr User-Agent: Thunderbird 2.0.0.22 (Windows/20090605) In-Reply-To: X-Enigmail-Version: 0.95.7 Sender: news Subject: Re: DFLTPHYS vs MAXPHYS X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 09 Jul 2009 21:54:14 -0000
Adrian Chadd wrote:
> 2009/7/6 Alexander Motin :
>
>> In these tests you've got almost only the negative side of the effect,
>> as you have said, due to cache misses. Do you really have a CPU with
>> such a small L2 cache? Some kind of P3 or old Celeron? But with 64K
>> MAXPHYS you just didn't get any benefit from using a bigger block size.
>
> All the world isn't your current desktop box with only SATA devices :)
>
> There have been and will be plenty of little embedded CPUs with tiny
> amounts of cache for quite some time to come.

Yes, and no embedded developer will use the GENERIC kernel on his device, so we can, for this purpose, ignore them :)

> You're also doing simple stream IO tests. Please re-think the thought
> experiment with a whole lot of parallel IO going on rather than just
> straight single stream IO.

Also, one thing to remember is RAID, both hardware and software. For example, with a gstripe of two drives it's very visible how sharply the performance falls if you go from 32 kB stripes to 64 kB stripes, since the upper layer passes 64 kB requests to GEOM. GEOM will pass the request to gstripe, which will in the first case request 32 kB from each drive (faster) and in the second case only 64 kB from one of the drives (no performance gain from striping). (Please adjust for 32/64 -> 64/128 if appropriate; I don't have the raw numbers now.)

Of course it's not a reason as-is, but both Windows and Linux have 1 MB BIO buffers, so it's reasonable to assume that vendors will optimize for that size if they can.
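The stripe arithmetic is easy to sketch (illustrative only, not the actual gstripe code): with 32 kB stripes an aligned 64 kB request spans both disks and they work in parallel, while with 64 kB stripes the whole request lands on one disk.

	/*
	 * Illustrative stripe arithmetic: how many disks a contiguous,
	 * stripe-aligned request touches for a given stripe size.
	 */
	#include <stdio.h>

	static int
	disks_touched(long long off, long long len, long long stripe, int ndisks)
	{
		long long nstripes = (off + len - 1) / stripe - off / stripe + 1;

		/* consecutive stripes map to consecutive disks, wrapping */
		return (nstripes < ndisks ? (int)nstripes : ndisks);
	}

	int
	main(void)
	{
		printf("32 kB stripes: 64 kB request hits %d disk(s)\n",
		    disks_touched(0, 65536, 32768, 2));
		printf("64 kB stripes: 64 kB request hits %d disk(s)\n",
		    disks_touched(0, 65536, 65536, 2));
		return (0);
	}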