From owner-freebsd-fs@FreeBSD.ORG Fri May  4 19:42:46 2007
To: Bruce Evans
In-reply-to: Your message of "Fri, 04 May 2007 15:45:15 +1000." <20070504153155.H37499@besplex.bde.org>
Date: Fri, 04 May 2007 12:42:46 -0700
From: Bakul Shah
Message-Id: <20070504194246.228C15B51@mail.bitblocks.com>
Cc: freebsd-fs@FreeBSD.org, Pawel Jakub Dawidek
Subject: Re: ZFS vs UFS2 overhead and may be a bug?

> >> Interesting. There are two problems. First is that cat(1) uses
> >> st_blksize to find out best size of I/O request and we force it to
> >> PAGE_SIZE, which is very, very wrong for ZFS - it should be equal to
> >> recordsize. I need to find discussion about this:
> >
> >> ...
> >> 	sb->st_blksize = PAGE_SIZE;
> >
> > This does seem suboptimal. Almost always one reads an entire
>
> It's just broken.

What should it be?

> > file and the overhead of going to the disk is high enough
> > that one may as well read small files in one syscall. Apps
> > that want to keep lots and lots of files open can always
> > adjust the buffer size.
> >
> > Since disk seek access time is the largest cost component,
> > ideally contiguously allocated data should be read in one
> > access in order to avoid any extra seeks. At the very least
>
> Buffering makes the userland i/o size have almost no effect on
> physical disk accesses. Perhaps even for zfs, since IIRC your
> benchmark showed anomalies for the case of sparse files where
> no disk accesses are involved.

This is perhaps a separate problem from that of sparse file
access. In my tests on regular files (not sparse), ZFS took
8% more time to read a 10G file when a 4K buffer was used and
90% more time with a 1K buffer. Maybe it is simply ZFS
overhead, but since the size of the read() buffer has a
non-negligible effect, something needs to be done.
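For context, this is roughly what a cat(1)-style copy loop does with
st_blksize (a simplified sketch, not the actual cat(1) source; the
name copy_file is just mine).  Whatever the filesystem reports in
st_blksize becomes the size of every read()/write() the program
issues, which is why PAGE_SIZE vs. recordsize matters here:

#include <sys/stat.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <err.h>

/*
 * Minimal cat(1)-like copy loop.  The buffer size comes straight
 * from st_blksize, so the value the filesystem puts there (currently
 * PAGE_SIZE for ZFS) directly sets the syscall granularity.
 */
static void
copy_file(const char *path)
{
	struct stat sb;
	char *buf;
	ssize_t n;
	int fd;

	if ((fd = open(path, O_RDONLY)) == -1)
		err(1, "open %s", path);
	if (fstat(fd, &sb) == -1)
		err(1, "fstat %s", path);
	if ((buf = malloc(sb.st_blksize)) == NULL)
		err(1, "malloc");
	while ((n = read(fd, buf, sb.st_blksize)) > 0)
		if (write(STDOUT_FILENO, buf, n) != n)
			err(1, "write");
	if (n == -1)
		err(1, "read %s", path);
	free(buf);
	close(fd);
}

int
main(int argc, char **argv)
{
	int i;

	for (i = 1; i < argc; i++)
		copy_file(argv[i]);
	return (0);
}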
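The regular-file numbers above came from timing a plain sequential
read loop with different buffer sizes; something along these lines
(a rough sketch rather than the exact test program, with the timing
done here via gettimeofday()):

#include <sys/time.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <err.h>

/*
 * Sequentially read a file with a fixed buffer size and report the
 * elapsed wall-clock time.  Usage: readbench <file> <bufsize>
 */
int
main(int argc, char **argv)
{
	struct timeval t0, t1;
	char *buf;
	size_t bufsize;
	ssize_t n;
	long long total = 0;
	double secs;
	int fd;

	if (argc != 3)
		errx(1, "usage: readbench file bufsize");
	bufsize = (size_t)strtoul(argv[2], NULL, 0);
	if (bufsize == 0)
		errx(1, "bufsize must be > 0");
	if ((buf = malloc(bufsize)) == NULL)
		err(1, "malloc");
	if ((fd = open(argv[1], O_RDONLY)) == -1)
		err(1, "open %s", argv[1]);

	gettimeofday(&t0, NULL);
	while ((n = read(fd, buf, bufsize)) > 0)
		total += n;
	gettimeofday(&t1, NULL);
	if (n == -1)
		err(1, "read");

	secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
	printf("%lld bytes in %.2f s with %zu byte buffer\n",
	    total, secs, bufsize);
	close(fd);
	free(buf);
	return (0);
}

Running that against the same large file with 1K, 4K and bigger
buffers is enough to see whether the userland buffer size really has
no effect once kernel buffering is in play.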