From owner-freebsd-fs@FreeBSD.ORG Fri May  4 19:42:46 2007
To: Bruce Evans
In-reply-to: Your message of "Fri, 04 May 2007 15:45:15 +1000." <20070504153155.H37499@besplex.bde.org>
Date: Fri, 04 May 2007 12:42:46 -0700
From: Bakul Shah
Message-Id: <20070504194246.228C15B51@mail.bitblocks.com>
Cc: freebsd-fs@FreeBSD.org, Pawel Jakub Dawidek
Subject: Re: ZFS vs UFS2 overhead and may be a bug?

> >> Interesting. There are two problems. First is that cat(1) uses
> >> st_blksize to find out best size of I/O request and we force it to
> >> PAGE_SIZE, which is very, very wrong for ZFS - it should be equal to
> >> recordsize. I need to find discussion about this:
> >
> >> ...
> >> 	sb->st_blksize = PAGE_SIZE;
> >
> > This does seem suboptimal. Almost always one reads an entire
>
> It's just broken.

What should it be?

> > file and the overhead of going to the disk is high enough
> > that one may as well read small files in one syscall. Apps
> > that want to keep lots and lots of files open can always
> > adjust the buffer size.
> >
> > Since disk seek access time is the largest cost component,
> > ideally contiguously allocated data should be read in one
> > access in order to avoid any extra seeks. At the very least
>
> Buffering makes the userland i/o size have almost no effect on
> physical disk accesses. Perhaps even for zfs, since IIRC your
> benchmark showed anomalies for the case of sparse files where
> no disk accesses are involved.

This is perhaps a separate problem from that of sparse file
access. In my tests on regular files (not sparse), ZFS took
8% more time to read a 10G file when a 4K buffer was used and
90% more time with a 1K buffer. Maybe it is simply ZFS
overhead, but since the size of the read() buffer has a
non-negligible effect, something needs to be done.
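For context, this is roughly what a cat(1)-style copy loop does with
st_blksize (a simplified sketch, not the actual cat(1) source; the
name copy_file is just mine).  Whatever the filesystem reports in
st_blksize becomes the size of every read()/write() the program
issues, which is why PAGE_SIZE vs. recordsize matters here:

#include <sys/stat.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <err.h>

/*
 * Minimal cat(1)-like copy loop.  The buffer size comes straight
 * from st_blksize, so the value the filesystem puts there (currently
 * PAGE_SIZE for ZFS) directly sets the syscall granularity.
 */
static void
copy_file(const char *path)
{
	struct stat sb;
	char *buf;
	ssize_t n;
	int fd;

	if ((fd = open(path, O_RDONLY)) == -1)
		err(1, "open %s", path);
	if (fstat(fd, &sb) == -1)
		err(1, "fstat %s", path);
	if ((buf = malloc(sb.st_blksize)) == NULL)
		err(1, "malloc");
	while ((n = read(fd, buf, sb.st_blksize)) > 0)
		if (write(STDOUT_FILENO, buf, n) != n)
			err(1, "write");
	if (n == -1)
		err(1, "read %s", path);
	free(buf);
	close(fd);
}

int
main(int argc, char **argv)
{
	int i;

	for (i = 1; i < argc; i++)
		copy_file(argv[i]);
	return (0);
}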
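The regular-file numbers above came from timing a plain sequential
read loop with different buffer sizes; something along these lines
(a rough sketch rather than the exact test program, with the timing
done here via gettimeofday()):

#include <sys/time.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <err.h>

/*
 * Sequentially read a file with a fixed buffer size and report the
 * elapsed wall-clock time.  Usage: readbench <file> <bufsize>
 */
int
main(int argc, char **argv)
{
	struct timeval t0, t1;
	char *buf;
	size_t bufsize;
	ssize_t n;
	long long total = 0;
	double secs;
	int fd;

	if (argc != 3)
		errx(1, "usage: readbench file bufsize");
	bufsize = (size_t)strtoul(argv[2], NULL, 0);
	if (bufsize == 0)
		errx(1, "bufsize must be > 0");
	if ((buf = malloc(bufsize)) == NULL)
		err(1, "malloc");
	if ((fd = open(argv[1], O_RDONLY)) == -1)
		err(1, "open %s", argv[1]);

	gettimeofday(&t0, NULL);
	while ((n = read(fd, buf, bufsize)) > 0)
		total += n;
	gettimeofday(&t1, NULL);
	if (n == -1)
		err(1, "read");

	secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
	printf("%lld bytes in %.2f s with %zu byte buffer\n",
	    total, secs, bufsize);
	close(fd);
	free(buf);
	return (0);
}

Running that against the same large file with 1K, 4K and bigger
buffers is enough to see whether the userland buffer size really has
no effect once kernel buffering is in play.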