Date:      Sat, 13 Apr 2013 11:11:30 -0600
From:      Will Andrews <will@firepipe.net>
To:        Adam Nowacki <nowakpl@platinum.linux.pl>
Cc:        "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>, Andriy Gapon <avg@freebsd.org>
Subject:   Re: ZFS slow reads for unallocated blocks
Message-ID:  <CADBaqmgjKNXERk6OMgDJHH4S_CvkUNS%2BfEepNrTWBVezFHeksg@mail.gmail.com>
In-Reply-To: <516949C7.4030305@platinum.linux.pl>
References:  <5166EA43.7050700@platinum.linux.pl> <5167B1C5.8020402@FreeBSD.org> <51689A2C.4080402@platinum.linux.pl> <5169324A.3080309@FreeBSD.org> <516949C7.4030305@platinum.linux.pl>

Hi,

I think the idea of using a pre-zeroed region as the 'source' is a good
one, but probably it would be better to set a special flag on a hole dbuf
than to require caller flags.  That way, ZFS can lazily evaluate the hole
dbuf (i.e. avoid zeroing db_data until it has to).  However, that could be
complicated by the fact that there are many potential users of hole dbufs
that would want to write to the dbuf.
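A minimal userspace sketch of the lazy hole-buffer idea, assuming a simplified buffer struct. The names `lazy_buf`, `lazy_buf_read` and `lazy_buf_write` are hypothetical and are not the real dmu_buf_impl_t or dbuf API. A read from a hole zeroes only the bytes the caller asked for, and the record-sized buffer is allocated and zeroed only when a writer needs it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a hole dbuf; NOT the real dmu_buf_impl_t. */
struct lazy_buf {
    size_t size;   /* record size, e.g. 128 KiB or 8 MiB */
    bool   hole;   /* true while the record is an unallocated hole */
    char  *data;   /* NULL until someone needs writable contents */
};

/* Read len bytes at off into dst. For a hole, only the requested bytes
 * are zeroed; the record-sized buffer is never allocated or bzero'd. */
static void lazy_buf_read(const struct lazy_buf *b, size_t off, size_t len,
                          char *dst)
{
    assert(off <= b->size && len <= b->size - off);
    if (b->hole)
        memset(dst, 0, len);
    else
        memcpy(dst, b->data + off, len);
}

/* Writers force materialization: allocate and zero the whole record once,
 * clear the hole flag, then copy in. Returns false if allocation fails. */
static bool lazy_buf_write(struct lazy_buf *b, size_t off, size_t len,
                           const char *src)
{
    assert(off <= b->size && len <= b->size - off);
    if (b->hole) {
        b->data = calloc(1, b->size);
        if (b->data == NULL)
            return false;
        b->hole = false;
    }
    memcpy(b->data + off, src, len);
    return true;
}
```

The trade-off mentioned above shows up in `lazy_buf_write`: every path that writes must check the flag and materialize the buffer first.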

This sort of optimization should be brought to the illumos zfs list.  As it
stands, your patch is also FreeBSD-specific, since 'zero_region' only
exists in vm/vm_kern.c.  Given the frequency of zero-copying, however, it's
quite possible there are other versions of this region elsewhere.
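As a rough illustration of a portable stand-in for zero_region, here is a sketch that assumes a hypothetical static, pre-zeroed, read-only buffer (`zero_src`, with a 4 KiB size chosen arbitrarily) and serves a hole read of any length by copying from it in chunks. In the kernel the destination would be the caller's uio rather than a plain buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical portable stand-in for FreeBSD's zero_region: a static,
 * read-only buffer. Static storage guarantees it is zero-initialized. */
#define ZERO_SRC_SIZE 4096
static const char zero_src[ZERO_SRC_SIZE];

/* Fill len bytes of dst by copying from the zero source in chunks no
 * larger than the source. Returns the number of copy calls made. */
static size_t copy_zeros(char *dst, size_t len)
{
    size_t calls = 0;

    assert(dst != NULL || len == 0);
    while (len > 0) {
        size_t n = len < ZERO_SRC_SIZE ? len : ZERO_SRC_SIZE;
        memcpy(dst, zero_src, n);
        dst += n;
        len -= n;
        calls++;
    }
    return calls;
}
```

A fixed-size source is enough because the loop walks the request in source-sized pieces, so the zero region never has to be as large as a record.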

--Will.


On Sat, Apr 13, 2013 at 6:04 AM, Adam Nowacki <nowakpl@platinum.linux.pl> wrote:

> Temporary dbufs are created for each missing (unallocated on disk) record,
> including indirects if the hole is large enough. Those dbufs never find their
> way into the ARC and are freed at the end of dmu_read_uio.
>
> A small read (from a hole) would, in the best case, bzero 128KiB
> (the recordsize; more if indirects are also missing) ... and I'm running
> modified ZFS with record sizes up to 8MiB.
>
> # zfs create -o atime=off -o recordsize=8M -o compression=off -o mountpoint=/home/testfs home/testfs
> # truncate -s 8m /home/testfs/trunc8m
> # dd if=/dev/zero of=/home/testfs/zero8m bs=8m count=1
> 1+0 records in
> 1+0 records out
> 8388608 bytes transferred in 0.010193 secs (822987745 bytes/sec)
>
> # time cat /home/testfs/trunc8m > /dev/null
> 0.000u 6.111s 0:06.11 100.0%    15+2753k 0+0io 0pf+0w
>
> # time cat /home/testfs/zero8m > /dev/null
> 0.000u 0.010s 0:00.01 100.0%    12+2168k 0+0io 0pf+0w
>
> A 600x increase in system time and throughput close to 1MB/s: insanity.
>
> The fix - a lot of the code to efficiently handle this was already there.
>
> dbuf_hold_impl has an int fail_sparse argument to return ENOENT for holes.
> I just had to pass it down there and propagate the ENOENT back up to
> dmu_read_uio, where zeroing can happen at byte granularity.
>
> ... didn't have time to actually test it yet.
>
>
> On 2013-04-13 12:24, Andriy Gapon wrote:
>
>> on 13/04/2013 02:35 Adam Nowacki said the following:
>>
>>> http://tepeserwery.pl/nowak/freebsd/zfs_sparse_optimization.patch.txt
>>>
>>> Does it look sane?
>>>
>>
>> It's hard to tell from a quick look since the change is not small.
>> What is your idea of the problem and the fix?
>>
>>  On 2013-04-12 09:03, Andriy Gapon wrote:
>>>
>>>>
>>>> ENOTIME to really investigate, but here is a basic profile result for
>>>> those interested:
>>>>                 kernel`bzero+0xa
>>>>                 kernel`dmu_buf_hold_array_by_dnode+0x1cf
>>>>                 kernel`dmu_read_uio+0x66
>>>>                 kernel`zfs_freebsd_read+0x3c0
>>>>                 kernel`VOP_READ_APV+0x92
>>>>                 kernel`vn_read+0x1a3
>>>>                 kernel`vn_io_fault+0x23a
>>>>                 kernel`dofileread+0x7b
>>>>                 kernel`sys_read+0x9e
>>>>                 kernel`amd64_syscall+0x238
>>>>                 kernel`0xffffffff80747e4b
>>>>
>>>> That's where > 99% of time is spent.
>>>>
>>>>
>>>
>>
>>


