From: Terry Lambert
Date: Mon, 05 May 2003 19:56:20 -0700
To: Artem Tepponen
Cc: freebsd-performance@freebsd.org
Subject: Re: freebsd-performance Digest, Vol 3, Issue 1
Message-ID: <3EB72454.519DF3E@mindspring.com>

Artem Tepponen wrote:
> > Too bad it's not supported, and too bad that, if it was, the
> > overhead would be too high because there's no VOP to get the
> > FS block offsets, so you would have to go through the FS code to
> > swap, and it would be much, much slower.
>
> Btw, do you have any fresh numbers on hand that can support this statement?
> A naive approach would be to compare CPU time taken and disk latencies,
> which differ by orders of magnitude, and conclude that a few microseconds
> eaten by the CPU would go unnoticed compared with the milliseconds taken
> by the disk.

The FS orders operations; raw disk I/O does not.

The FS lays out blocks in files essentially at random; the layout of the blocks in the swap partition is linear.

The FS must obey POSIX semantics for access and modification times; raw disk I/O does not.

The FS enforces read-before-write on any access that is not a page-aligned, whole-page access.

We aren't talking about CPU time here; we are talking about operational delay overhead, seek overhead, and the addition of a write operation for every read or write access to the file (potentially doubling the I/O), etc.

You can't tell me that twice the I/O... potentially twice the I/O... is a CPU issue.

Even with the optimization I suggested (getting a physical block list and using it against the raw device, essentially the same pig-trick the FreeBSD NTFS code uses to rewrite NTFS file contents, so long as the size never changes), there's still an additional indirection through a block list to convert a physically discontiguous block array into a logically contiguous one. There's also still the fact that it has to seek all over the disk to access those blocks, and it can't use bulk transfers in the driver or predictive read-ahead in the VM system.

Add to this that you can't dump to a swap device created this way. At crash time you cannot risk extending the file, so it would have to be preallocated large enough; you could not trust that the block conversion list had not been corrupted by whatever caused the panic; and where you write is not limited to a simple set of block offsets for a region of the disk that is guaranteed not to contain boot-critical or recovery-critical data...

...and you have an overwhelming set of performance limitations not related to CPU utilization.

-- 
Terry
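A minimal C sketch of the block-list indirection discussed above, under stated assumptions: `blocklist`, `logical_to_physical` and `count_transfers` are hypothetical names for illustration, not FreeBSD kernel interfaces. Entry i of the array holds the physical disk block backing logical block i of a swap file; a swap partition is the degenerate case where the entries are consecutive. The sketch counts how many separate device transfers a run of logical blocks needs, since every break in physical contiguity starts a new transfer, which is what keeps the driver from doing one bulk transfer.

```c
#include <stddef.h>

/*
 * Hypothetical illustration, not FreeBSD code: blocklist[i] holds the
 * physical disk block backing logical block i of a swap file.  A swap
 * partition is the degenerate case blocklist[i] == base + i, where the
 * table could be replaced by simple arithmetic.
 */

/* One extra table lookup per access: logical block -> physical block.
 * Returns -1 for an out-of-range logical block instead of guessing. */
static long logical_to_physical(const long *blocklist, size_t nblocks,
                                size_t lblk)
{
    if (lblk >= nblocks)
        return -1;
    return blocklist[lblk];
}

/*
 * Count the separate device transfers needed to move logical blocks
 * [first, first + count).  A new transfer starts whenever the next
 * physical block is not adjacent to the previous one.  Assumes all
 * entries are non-negative block numbers.
 */
static size_t count_transfers(const long *blocklist, size_t nblocks,
                              size_t first, size_t count)
{
    size_t transfers = 0;
    long prev = -2; /* prev + 1 can never match a valid block number */

    for (size_t i = first; i < first + count && i < nblocks; i++) {
        long phys = logical_to_physical(blocklist, nblocks, i);
        if (phys != prev + 1)
            transfers++;
        prev = phys;
    }
    return transfers;
}
```

With a linear list {1000, 1001, ..., 1007}, `count_transfers(list, 8, 0, 8)` returns 1; with a made-up scattered list {5120, 5121, 88, 89, 90, 31000, 742, 743} it returns 4, i.e. four separate transfers (and, on a rotating disk, potentially four seeks) for the same eight blocks, plus a table lookup on every access that the linear layout does not need.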