From owner-freebsd-performance@FreeBSD.ORG  Mon May  5 19:57:42 2003
Message-ID: <3EB72454.519DF3E@mindspring.com>
Date: Mon, 05 May 2003 19:56:20 -0700
From: Terry Lambert <tlambert2@mindspring.com>
X-Mailer: Mozilla 4.79 [en] (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Artem Tepponen <temik@egartech.com>
References: <5235EF9BAE6B7F4CB3735789EEF73B29B06A69@turtle.egar.egartech.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
cc: freebsd-performance@freebsd.org
Subject: Re: freebsd-performance Digest, Vol 3, Issue 1

Artem Tepponen wrote:
> > Too bad it's not supported, and too bad that, if it was, the
> > overhead would be too high because there's no VOP to get the
> > FS block offsets, so you would have to go through the FS code to
> > swap, and it would be much, much slower.
> 
> Btw, do you have any fresh numbers on hand that can support this statement?
> A naive approach would be to compare the CPU time taken with disk latencies
> that differ by an order of magnitude, and to conclude that the few microseconds
> eaten by the CPU would go unnoticed compared with the milliseconds taken by the disk.

The FS orders operations; raw disk I/O does not.  The FS lays
out blocks in files essentially at random; the layout of the
blocks in the swap partition is linear.  The FS must obey POSIX
semantics about access and modification times; raw disk I/O
does not.  The FS enforces read-before-write on any access that
is not a page-aligned, whole-page access.
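
The read-before-write condition can be sketched as follows.  This is
a minimal illustration in Python, not FreeBSD code: the function name
and the 4096-byte page size are assumptions made for the example.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


def needs_read_before_write(offset, length, page_size=PAGE_SIZE):
    """Return True if a write of `length` bytes at `offset` covers a partial page.

    A partially covered page has to be read in first, so that the bytes
    the write does not touch are preserved when the page is written back.
    """
    if length <= 0:
        return False  # nothing is written, so nothing needs to be read
    starts_on_page_boundary = offset % page_size == 0
    covers_whole_pages = length % page_size == 0
    return not (starts_on_page_boundary and covers_whole_pages)
```

Only a write that starts on a page boundary and covers whole pages
avoids the extra read; anything else pays for a read in front of the
write.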

We aren't talking about CPU time here; we are talking about
operational delay overhead, seek overhead, the addition of a
write operation per read or write access to the file (a doubling
of the I/O), etc.

You can't tell me that twice the I/O... potentially twice the
I/O... is a CPU issue.
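
As a back-of-the-envelope sketch of where "twice the I/O" comes from:
the model below is hypothetical and deliberately worst-case.  It
assumes every access to the file triggers one separate timestamp
write; a real filesystem may defer or batch those updates, hence
"potentially".

```python
def disk_ops_raw(accesses):
    """Raw device: one disk operation per page read or written."""
    return accesses


def disk_ops_via_fs(accesses, timestamp_write_per_access=True):
    """File-backed: each access may add one write for the inode timestamps.

    `timestamp_write_per_access` is an assumption of this model, not a
    measured property of any particular filesystem.
    """
    extra_writes = accesses if timestamp_write_per_access else 0
    return accesses + extra_writes


accesses = 1000
print(disk_ops_raw(accesses))     # 1000
print(disk_ops_via_fs(accesses))  # 2000
```

None of the extra operations counted here is CPU work; each one is a
trip to the disk.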

Even with the optimization I suggested, of getting a physical
block list, and using that against the raw device (essentially
the same pig-trick that the FreeBSD NTFS uses to rewrite NTFS
file contents, so long as the size never changes), there's
still an additional indirection through a blocklist to convert
a physically discontiguous block array into a logically contiguous
one, and there's still the fact that it has to seek all over the
disk to access those blocks, and it can't use bulk transfer in
the driver or predictive read-ahead in the VM system.
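
To illustrate the blocklist point, here is a small sketch in Python.
The block numbers are made up and `coalesce_runs` is a hypothetical
helper, not part of any FreeBSD interface.  It groups a file's
physical blocks, taken in logical order, into contiguous runs; each
run is the most that could go out as one bulk transfer, and each new
run implies a seek.

```python
def coalesce_runs(physical_blocks):
    """Group physical block numbers (in logical order) into (start, length) runs."""
    runs = []
    for block in physical_blocks:
        if runs and block == runs[-1][0] + runs[-1][1]:
            # Block directly follows the previous run: extend that run.
            start, length = runs[-1]
            runs[-1] = (start, length + 1)
        else:
            # Discontiguous block: a new run, and a seek to reach it.
            runs.append((block, 1))
    return runs


# Swap partition: logical block i sits at base + i, so no list is needed.
swap_partition = list(range(1000, 1008))

# Swap file: the blocklist maps logical order onto scattered blocks.
swap_file = [5120, 5121, 907, 33310, 33311, 12, 7741, 7742]

print(len(coalesce_runs(swap_partition)))  # 1
print(len(coalesce_runs(swap_file)))       # 5
```

The same eight blocks cost one transfer on the linear layout and five
transfers, with a seek between each, on the scattered one.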

Add to this that you can't dump to a swap device created this way:
at crash time, you cannot risk extending the file, so it would
have to be pre-allocated large enough, and you could not trust
that the block conversion list was not corrupted by whatever caused
the panic, and where you write is not limited by a simple set of
block offsets for a region of the disk which is guaranteed to not
contain boot-critical or recovery-critical data...

...and you have an overwhelming set of performance limitations
not related to CPU utilization.

-- Terry