From: Adam Nowacki <nowakpl@platinum.linux.pl>
Date: Sat, 13 Apr 2013 21:24:07 +0200
To: Will Andrews
Cc: freebsd-fs@freebsd.org, zfs@lists.illumos.org, Andriy Gapon
Subject: Re: ZFS slow reads for unallocated blocks

Including zfs@illumos on this.

To recap: reads from sparse files are slow, with throughput proportional to the ratio of read size to filesystem recordsize, even though no physical disk I/O happens.

# zfs create -o atime=off -o recordsize=128k -o compression=off -o sync=disabled -o mountpoint=/home/testfs home/testfs
# dd if=/dev/random of=/home/testfs/random10m bs=1024k count=10
# truncate -s 10m /home/testfs/trunc10m
# dd if=/home/testfs/random10m of=/dev/null bs=512
10485760 bytes transferred in 0.078637 secs (133344041 bytes/sec)
# dd if=/home/testfs/trunc10m of=/dev/null bs=512
10485760 bytes transferred in 1.011500 secs (10366544 bytes/sec)

# zfs create -o atime=off -o recordsize=8M -o compression=off -o sync=disabled -o mountpoint=/home/testfs home/testfs
# dd if=/home/testfs/random10m of=/dev/null bs=512
10485760 bytes transferred in 0.080430 secs (130371205 bytes/sec)
# dd if=/home/testfs/trunc10m of=/dev/null bs=512
10485760 bytes transferred in 72.465486 secs (144700 bytes/sec)

This is from FreeBSD 9.1. A possible fix is at http://tepeserwery.pl/nowak/freebsd/zfs_sparse_optimization_v2.patch.txt - untested yet; the system will be busy building packages for a few more days.
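The shape of the fix, as a rough, untested sketch rather than the patch itself: dmu_read_uio_hole() is a made-up helper (sketched at the bottom of this mail), while dbuf_hold_impl(), FTAG and the Solaris-style four-argument uiomove() are the existing interfaces; dn, blkid, bufoff and tocpy stand for the usual per-record loop state in dmu_read_uio().

/*
 * In dmu_read_uio()'s per-record loop: hold the dbuf with
 * fail_sparse = TRUE so a hole comes back as ENOENT instead of
 * as a freshly allocated and bzero'ed temporary dbuf.
 */
dmu_buf_impl_t *db;
int err;

err = dbuf_hold_impl(dn, 0, blkid, TRUE /* fail_sparse */, FTAG, &db);
if (err == ENOENT) {
        /* Unallocated record: zero-fill the uio at byte granularity. */
        err = dmu_read_uio_hole(uio, tocpy);
} else if (err == 0) {
        /* Allocated record: copy out as before. */
        err = uiomove((char *)db->db.db_data + bufoff, tocpy,
            UIO_READ, uio);
        dbuf_rele(db, FTAG);
}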
On 2013-04-13 19:11, Will Andrews wrote:
> Hi,
>
> I think the idea of using a pre-zeroed region as the 'source' is a
> good one, but it would probably be better to set a special flag on a
> hole dbuf than to require caller flags. That way, ZFS can lazily
> evaluate the hole dbuf (i.e. avoid zeroing db_data until it has to).
> However, that could be complicated by the fact that there are many
> potential users of hole dbufs that would want to write to the dbuf.
>
> This sort of optimization should be brought to the illumos zfs list.
> As it stands, your patch is also FreeBSD-specific, since 'zero_region'
> only exists in vm/vm_kern.c. Given the frequency of zero-copying,
> however, it's quite possible there are other versions of this region
> elsewhere.
>
> --Will.
>
> On Sat, Apr 13, 2013 at 6:04 AM, Adam Nowacki <nowakpl@platinum.linux.pl> wrote:
>
>> Temporary dbufs are created for each missing (unallocated on disk)
>> record, including indirects if the hole is large enough. Those dbufs
>> never find their way into the ARC and are freed at the end of
>> dmu_read_uio.
>>
>> A small read (from a hole) would in the best case bzero 128KiB
>> (recordsize, more if indirects are missing too) ... and I'm running a
>> modified ZFS with record sizes up to 8MiB.
>>
>> # zfs create -o atime=off -o recordsize=8M -o compression=off -o mountpoint=/home/testfs home/testfs
>> # truncate -s 8m /home/testfs/trunc8m
>> # dd if=/dev/zero of=/home/testfs/zero8m bs=8m count=1
>> 1+0 records in
>> 1+0 records out
>> 8388608 bytes transferred in 0.010193 secs (822987745 bytes/sec)
>>
>> # time cat /home/testfs/trunc8m > /dev/null
>> 0.000u 6.111s 0:06.11 100.0% 15+2753k 0+0io 0pf+0w
>>
>> # time cat /home/testfs/zero8m > /dev/null
>> 0.000u 0.010s 0:00.01 100.0% 12+2168k 0+0io 0pf+0w
>>
>> A 600x increase in system time, and close to 1MB/s - insanity.
>>
>> The fix: a lot of the code needed to handle this efficiently was
>> already there. dbuf_hold_impl() has an int fail_sparse argument to
>> return ENOENT for holes. I just had to get there, and from there back
>> to dmu_read_uio(), where zeroing can happen at byte granularity.
>>
>> ... didn't have time to actually test it yet.
>>
>> On 2013-04-13 12:24, Andriy Gapon wrote:
>>
>>> on 13/04/2013 02:35 Adam Nowacki said the following:
>>>
>>>> http://tepeserwery.pl/nowak/freebsd/zfs_sparse_optimization.patch.txt
>>>>
>>>> Does it look sane?
>>>
>>> It's hard to tell from a quick look since the change is not small.
>>> What is your idea of the problem and the fix?
>>>
>>>> On 2013-04-12 09:03, Andriy Gapon wrote:
>>>>
>>>>> ENOTIME to really investigate, but here is a basic profile result
>>>>> for those interested:
>>>>>
>>>>> kernel`bzero+0xa
>>>>> kernel`dmu_buf_hold_array_by_dnode+0x1cf
>>>>> kernel`dmu_read_uio+0x66
>>>>> kernel`zfs_freebsd_read+0x3c0
>>>>> kernel`VOP_READ_APV+0x92
>>>>> kernel`vn_read+0x1a3
>>>>> kernel`vn_io_fault+0x23a
>>>>> kernel`dofileread+0x7b
>>>>> kernel`sys_read+0x9e
>>>>> kernel`amd64_syscall+0x238
>>>>> kernel`0xffffffff80747e4b
>>>>>
>>>>> That's where 99% of the time is spent.
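For completeness, the zero-filling helper used in the sketch near the top of this mail could be built on the region Will mentions. Again untested and only illustrative: zero_region and ZERO_REGION_SIZE come from FreeBSD's vm/vm_kern.c and machine/vmparam.h, which is exactly the FreeBSD-specific dependency Will points out; an illumos version would need its own pre-zeroed buffer here.

/*
 * Zero-fill 'len' bytes of a read request without allocating or
 * bzero'ing anything, by copying out of the pre-zeroed zero_region
 * in ZERO_REGION_SIZE chunks (2MB on amd64).
 */
static int
dmu_read_uio_hole(uio_t *uio, uint64_t len)
{
        while (len > 0) {
                uint64_t n = MIN(len, ZERO_REGION_SIZE);
                int err = uiomove(__DECONST(void *, zero_region), n,
                    UIO_READ, uio);
                if (err != 0)
                        return (err);
                len -= n;
        }
        return (0);
}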