From owner-freebsd-fs@FreeBSD.ORG Thu May 3 21:15:03 2007
X-Original-To: freebsd-fs@freebsd.org
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
    by hub.freebsd.org (Postfix) with ESMTP id AD7B116A401;
    Thu, 3 May 2007 21:15:03 +0000 (UTC)
    (envelope-from bakul@bitblocks.com)
Received: from mail.bitblocks.com (mail.bitblocks.com [64.142.15.60])
    by mx1.freebsd.org (Postfix) with ESMTP id 923C813C447;
    Thu, 3 May 2007 21:15:03 +0000 (UTC)
    (envelope-from bakul@bitblocks.com)
Received: from bitblocks.com (localhost.bitblocks.com [127.0.0.1])
    by mail.bitblocks.com (Postfix) with ESMTP id 56DAF5B2E;
    Thu, 3 May 2007 14:15:03 -0700 (PDT)
To: Pawel Jakub Dawidek
In-reply-to: Your message of "Thu, 03 May 2007 21:06:26 +0200."
    <20070503190626.GB7177@garage.freebsd.pl>
Date: Thu, 03 May 2007 14:15:03 -0700
From: Bakul Shah
Message-Id: <20070503211503.56DAF5B2E@mail.bitblocks.com>
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS vs UFS2 overhead and may be a bug?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
X-List-Received-Date: Thu, 03 May 2007 21:15:03 -0000

> Interesting. There are two problems. First is that cat(1) uses
> st_blksize to find out best size of I/O request and we force it to
> PAGE_SIZE, which is very, very wrong for ZFS - it should be equal to
> recordsize. I need to find discussion about this:
>
>     /*
>      * According to www.opengroup.org, the meaning of st_blksize is
>      *   "a filesystem-specific preferred I/O block size for this
>      *    object. In some filesystem types, this may vary from file
>      *    to file"
>      * Default to PAGE_SIZE after much discussion.
>      * XXX: min(PAGE_SIZE, vp->v_bufobj.bo_bsize) may be more
>      * correct.
>      */
>
>     sb->st_blksize = PAGE_SIZE;

This does seem suboptimal.
Almost always one reads an entire file, and the overhead of going to
the disk is high enough that one may as well read small files in one
syscall. Apps that want to keep lots and lots of files open can
always adjust the buffer size.

Since seek time is the largest cost component of a disk access,
contiguously allocated data should ideally be read in one access to
avoid any extra seeks. At the very least, st_blksize should be as
large as the minimum unit of contiguous allocation (== filesystem
block size). Even V7 Unix had this!

> I tested it on Solaris and this is not FreeBSD-specific problem, the
> same is on Solaris. Is there a chance you could send your observations
> to zfs-discuss@opensolaris.org, but just comparison between dd(1) with
> bs=128k and bs=4k (the other tests might be confusing).

I just did so.
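
To make the syscall arithmetic concrete, here is a minimal sketch in C.
It is only an illustration, not code from cat(1) or the kernel: the
helper names preferred_blksize() and count_reads() are made up for this
example. The first returns whatever st_blksize stat(2) reports for a
file; the second reads the file to EOF with a given buffer size and
counts how many read(2) calls returned data.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: return the st_blksize the kernel reports for
 * path, or -1 if stat(2) fails.
 */
long preferred_blksize(const char *path)
{
    struct stat sb;

    if (stat(path, &sb) == -1)
        return -1;
    return (long)sb.st_blksize;
}

/*
 * Hypothetical helper: read path to EOF using a buffer of bufsize
 * bytes. Returns the number of read(2) calls that returned data
 * (the final call that returns 0 at EOF is not counted), or -1 on
 * any error.
 */
long count_reads(const char *path, size_t bufsize)
{
    long calls = 0;
    ssize_t n;
    char *buf;
    int fd;

    assert(bufsize > 0);
    if ((buf = malloc(bufsize)) == NULL)
        return -1;
    if ((fd = open(path, O_RDONLY)) == -1) {
        free(buf);
        return -1;
    }
    /* Each iteration is one syscall, as in a cat(1)-style copy loop. */
    while ((n = read(fd, buf, bufsize)) > 0)
        calls++;
    close(fd);
    free(buf);
    return n == -1 ? -1 : calls;
}
```

Assuming read(2) fills the buffer on every call, which is the usual
case for a regular file on a local filesystem, a 256 KiB file takes 64
data-returning reads with a 4 KiB buffer but only 2 with a 128 KiB
buffer. Passing preferred_blksize(path) as bufsize shows what a program
that trusts st_blksize would actually do.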