Date: Sat, 13 Apr 2013 21:24:07 +0200
From: Adam Nowacki <nowakpl@platinum.linux.pl>
To: Will Andrews <will@firepipe.net>
Cc: "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>, zfs@lists.illumos.org, Andriy Gapon <avg@freebsd.org>
Subject: Re: ZFS slow reads for unallocated blocks
Message-ID: <5169B0D7.9090607@platinum.linux.pl>
In-Reply-To: <CADBaqmgjKNXERk6OMgDJHH4S_CvkUNS+fEepNrTWBVezFHeksg@mail.gmail.com>
References: <5166EA43.7050700@platinum.linux.pl> <5167B1C5.8020402@FreeBSD.org> <51689A2C.4080402@platinum.linux.pl> <5169324A.3080309@FreeBSD.org> <516949C7.4030305@platinum.linux.pl> <CADBaqmgjKNXERk6OMgDJHH4S_CvkUNS+fEepNrTWBVezFHeksg@mail.gmail.com>
Including zfs@illumos on this.

To recap: reads from sparse files are slow, with speed proportional to the
ratio of read size to filesystem recordsize. There is no physical disk I/O.

# zfs create -o atime=off -o recordsize=128k -o compression=off -o sync=disabled -o mountpoint=/home/testfs home/testfs
# dd if=/dev/random of=/home/testfs/random10m bs=1024k count=10
# truncate -s 10m /home/testfs/trunc10m

# dd if=/home/testfs/random10m of=/dev/null bs=512
10485760 bytes transferred in 0.078637 secs (133344041 bytes/sec)

# dd if=/home/testfs/trunc10m of=/dev/null bs=512
10485760 bytes transferred in 1.011500 secs (10366544 bytes/sec)

# zfs create -o atime=off -o recordsize=8M -o compression=off -o sync=disabled -o mountpoint=/home/testfs home/testfs

# dd if=/home/testfs/random10m of=/dev/null bs=512
10485760 bytes transferred in 0.080430 secs (130371205 bytes/sec)

# dd if=/home/testfs/trunc10m of=/dev/null bs=512
10485760 bytes transferred in 72.465486 secs (144700 bytes/sec)

This is from FreeBSD 9.1. A possible solution is at
http://tepeserwery.pl/nowak/freebsd/zfs_sparse_optimization_v2.patch.txt -
untested yet, the system will be busy building packages for a few more days.

On 2013-04-13 19:11, Will Andrews wrote:
> Hi,
>
> I think the idea of using a pre-zeroed region as the 'source' is a good
> one, but it would probably be better to set a special flag on a hole
> dbuf than to require caller flags. That way, ZFS can lazily evaluate
> the hole dbuf (i.e. avoid zeroing db_data until it has to). However,
> that could be complicated by the fact that there are many potential
> users of hole dbufs that would want to write to the dbuf.
>
> This sort of optimization should be brought to the illumos zfs list. As
> it stands, your patch is also FreeBSD-specific, since 'zero_region' only
> exists in vm/vm_kern.c. Given the frequency of zero-copying, however,
> it's quite possible there are other versions of this region elsewhere.
>
> --Will.
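The pre-zeroed-region idea Will mentions can be modeled in userspace like
this (a sketch only: the real zero_region lives in vm/vm_kern.c, and the
name hole_fill and the region size here are illustrative, not ZFS code).
A hole read copies from one shared all-zero buffer instead of allocating
and bzero()ing a fresh recordsize-sized buffer per missing record:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for FreeBSD's zero_region: a single shared, read-only,
 * all-zero buffer (static const arrays are zero-initialized in C). */
#define ZERO_REGION_SIZE (64 * 1024)
static const uint8_t zero_region[ZERO_REGION_SIZE];

/* Fill a caller buffer with zeros for a hole of arbitrary length,
 * reading at most ZERO_REGION_SIZE of source per iteration. */
static size_t
hole_fill(uint8_t *dst, size_t len)
{
    size_t done = 0;

    while (done < len) {
        size_t n = len - done;
        if (n > ZERO_REGION_SIZE)
            n = ZERO_REGION_SIZE;
        memcpy(dst + done, zero_region, n);
        done += n;
    }
    return (done);
}
```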
>
> On Sat, Apr 13, 2013 at 6:04 AM, Adam Nowacki <nowakpl@platinum.linux.pl> wrote:
>
> > Temporary dbufs are created for each missing (unallocated on disk)
> > record, including indirects if the hole is large enough. Those dbufs
> > never find their way into the ARC and are freed at the end of
> > dmu_read_uio.
> >
> > A small read (from a hole) would in the best case bzero 128KiB
> > (recordsize, more if indirects are missing) ... and I'm running
> > modified ZFS with record sizes up to 8MiB.
> >
> > # zfs create -o atime=off -o recordsize=8M -o compression=off -o mountpoint=/home/testfs home/testfs
> > # truncate -s 8m /home/testfs/trunc8m
> > # dd if=/dev/zero of=/home/testfs/zero8m bs=8m count=1
> > 1+0 records in
> > 1+0 records out
> > 8388608 bytes transferred in 0.010193 secs (822987745 bytes/sec)
> >
> > # time cat /home/testfs/trunc8m > /dev/null
> > 0.000u 6.111s 0:06.11 100.0% 15+2753k 0+0io 0pf+0w
> >
> > # time cat /home/testfs/zero8m > /dev/null
> > 0.000u 0.010s 0:00.01 100.0% 12+2168k 0+0io 0pf+0w
> >
> > A 600x increase in system time, and close to 1 MB/s - insanity.
> >
> > The fix - a lot of the code to handle this efficiently was already
> > there. dbuf_hold_impl has an int fail_sparse argument to return ENOENT
> > for holes. Just had to get there and somehow back to dmu_read_uio,
> > where zeroing can happen at byte granularity.
> >
> > ... didn't have time to actually test it yet.
> >
> > On 2013-04-13 12:24, Andriy Gapon wrote:
> >
> > > on 13/04/2013 02:35 Adam Nowacki said the following:
> > >
> > > > http://tepeserwery.pl/nowak/freebsd/zfs_sparse_optimization.patch.txt
> > > >
> > > > Does it look sane?
> > >
> > > It's hard to tell from a quick look since the change is not small.
> > > What is your idea of the problem and the fix?
> >
> > On 2013-04-12 09:03, Andriy Gapon wrote:
> >
> > > ENOTIME to really investigate, but here is a basic profile result
> > > for those interested:
> > >
> > >   kernel`bzero+0xa
> > >   kernel`dmu_buf_hold_array_by_dnode+0x1cf
> > >   kernel`dmu_read_uio+0x66
> > >   kernel`zfs_freebsd_read+0x3c0
> > >   kernel`VOP_READ_APV+0x92
> > >   kernel`vn_read+0x1a3
> > >   kernel`vn_io_fault+0x23a
> > >   kernel`dofileread+0x7b
> > >   kernel`sys_read+0x9e
> > >   kernel`amd64_syscall+0x238
> > >   kernel`0xffffffff80747e4b
> > >
> > > That's where >99% of the time is spent.
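The fail_sparse path described above can be modeled in userspace roughly
as follows. This is a sketch of the intended semantics only: hold_record
and read_bytes are illustrative stand-ins for dbuf_hold_impl (with
fail_sparse set) and dmu_read_uio, not the real ZFS API. A hole record
returns ENOENT instead of a temporary bzero()ed record-size dbuf, and the
read path then zeroes only the bytes the caller actually asked for:

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define RECORDSIZE (128 * 1024)
#define NRECORDS   4

/* Toy "file": NULL means the record is a hole (unallocated on disk). */
static uint8_t *records[NRECORDS];

/* Model of dbuf_hold_impl(..., fail_sparse != 0): a hole yields ENOENT
 * rather than a freshly allocated, zero-filled record buffer. */
static int
hold_record(int idx, uint8_t **data)
{
    if (records[idx] == NULL)
        return (ENOENT);
    *data = records[idx];
    return (0);
}

/* Model of dmu_read_uio(): on ENOENT, zero only the requested byte
 * range instead of materializing a whole record-size buffer. */
static void
read_bytes(uint64_t off, uint8_t *dst, size_t len)
{
    while (len > 0) {
        int idx = (int)(off / RECORDSIZE);
        size_t inrec = (size_t)(off % RECORDSIZE);
        size_t n = RECORDSIZE - inrec;
        uint8_t *data;

        if (n > len)
            n = len;
        if (hold_record(idx, &data) == ENOENT)
            memset(dst, 0, n);          /* byte-granularity zeroing */
        else
            memcpy(dst, data + inrec, n);
        off += n;
        dst += n;
        len -= n;
    }
}
```

With this shape, a 512-byte read from a hole costs a 512-byte memset
rather than a recordsize-sized (up to 8MiB here) allocation plus bzero.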