From: Adam Nowacki <nowakpl@platinum.linux.pl>
Date: Sat, 13 Apr 2013 21:24:07 +0200
To: Will Andrews
Cc: freebsd-fs@freebsd.org, zfs@lists.illumos.org, Andriy Gapon
Subject: Re: ZFS slow reads for unallocated blocks

Including zfs@illumos on this.

To recap: reads from sparse files are slow, with throughput proportional to the ratio of read size to filesystem recordsize, even though no physical disk I/O happens.

# zfs create -o atime=off -o recordsize=128k -o compression=off -o sync=disabled -o mountpoint=/home/testfs home/testfs
# dd if=/dev/random of=/home/testfs/random10m bs=1024k count=10
# truncate -s 10m /home/testfs/trunc10m
# dd if=/home/testfs/random10m of=/dev/null bs=512
10485760 bytes transferred in 0.078637 secs (133344041 bytes/sec)
# dd if=/home/testfs/trunc10m of=/dev/null bs=512
10485760 bytes transferred in 1.011500 secs (10366544 bytes/sec)

# zfs create -o atime=off -o recordsize=8M -o compression=off -o sync=disabled -o mountpoint=/home/testfs home/testfs
# dd if=/home/testfs/random10m of=/dev/null bs=512
10485760 bytes transferred in 0.080430 secs (130371205 bytes/sec)
# dd if=/home/testfs/trunc10m of=/dev/null bs=512
10485760 bytes transferred in 72.465486 secs (144700 bytes/sec)

This is from FreeBSD 9.1. A possible fix is at http://tepeserwery.pl/nowak/freebsd/zfs_sparse_optimization_v2.patch.txt - untested yet; the system will be busy building packages for a few more days.
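The shape of the fix, as a rough, untested sketch rather than the patch itself: dmu_read_uio_hole() is a made-up helper (sketched at the bottom of this mail), while dbuf_hold_impl(), FTAG and the Solaris-style four-argument uiomove() are the existing interfaces; dn, blkid, bufoff and tocpy stand for the usual per-record loop state in dmu_read_uio().

/*
 * In dmu_read_uio()'s per-record loop: hold the dbuf with
 * fail_sparse = TRUE so a hole comes back as ENOENT instead of
 * as a freshly allocated and bzero'ed temporary dbuf.
 */
dmu_buf_impl_t *db;
int err;

err = dbuf_hold_impl(dn, 0, blkid, TRUE /* fail_sparse */, FTAG, &db);
if (err == ENOENT) {
        /* Unallocated record: zero-fill the uio at byte granularity. */
        err = dmu_read_uio_hole(uio, tocpy);
} else if (err == 0) {
        /* Allocated record: copy out as before. */
        err = uiomove((char *)db->db.db_data + bufoff, tocpy,
            UIO_READ, uio);
        dbuf_rele(db, FTAG);
}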
On 2013-04-13 19:11, Will Andrews wrote:
> Hi,
>
> I think the idea of using a pre-zeroed region as the 'source' is a
> good one, but it would probably be better to set a special flag on a
> hole dbuf than to require caller flags. That way, ZFS can lazily
> evaluate the hole dbuf (i.e. avoid zeroing db_data until it has to).
> However, that could be complicated by the fact that there are many
> potential users of hole dbufs that would want to write to the dbuf.
>
> This sort of optimization should be brought to the illumos zfs list.
> As it stands, your patch is also FreeBSD-specific, since 'zero_region'
> only exists in vm/vm_kern.c. Given the frequency of zero-copying,
> however, it's quite possible there are other versions of this region
> elsewhere.
>
> --Will.
>
> On Sat, Apr 13, 2013 at 6:04 AM, Adam Nowacki <nowakpl@platinum.linux.pl> wrote:
>
>> Temporary dbufs are created for each missing (unallocated on disk)
>> record, including indirects if the hole is large enough. Those dbufs
>> never find their way into the ARC and are freed at the end of
>> dmu_read_uio.
>>
>> A small read (from a hole) would in the best case bzero 128KiB
>> (recordsize, more if indirects are missing too) ... and I'm running a
>> modified ZFS with record sizes up to 8MiB.
>>
>> # zfs create -o atime=off -o recordsize=8M -o compression=off -o mountpoint=/home/testfs home/testfs
>> # truncate -s 8m /home/testfs/trunc8m
>> # dd if=/dev/zero of=/home/testfs/zero8m bs=8m count=1
>> 1+0 records in
>> 1+0 records out
>> 8388608 bytes transferred in 0.010193 secs (822987745 bytes/sec)
>>
>> # time cat /home/testfs/trunc8m > /dev/null
>> 0.000u 6.111s 0:06.11 100.0% 15+2753k 0+0io 0pf+0w
>>
>> # time cat /home/testfs/zero8m > /dev/null
>> 0.000u 0.010s 0:00.01 100.0% 12+2168k 0+0io 0pf+0w
>>
>> A 600x increase in system time, and close to 1MB/s - insanity.
>>
>> The fix: a lot of the code needed to handle this efficiently was
>> already there. dbuf_hold_impl() has an int fail_sparse argument to
>> return ENOENT for holes. I just had to get there, and from there back
>> to dmu_read_uio(), where zeroing can happen at byte granularity.
>>
>> ... didn't have time to actually test it yet.
>>
>> On 2013-04-13 12:24, Andriy Gapon wrote:
>>
>>> on 13/04/2013 02:35 Adam Nowacki said the following:
>>>
>>>> http://tepeserwery.pl/nowak/freebsd/zfs_sparse_optimization.patch.txt
>>>>
>>>> Does it look sane?
>>>
>>> It's hard to tell from a quick look since the change is not small.
>>> What is your idea of the problem and the fix?
>>>
>>>> On 2013-04-12 09:03, Andriy Gapon wrote:
>>>>
>>>>> ENOTIME to really investigate, but here is a basic profile result
>>>>> for those interested:
>>>>>
>>>>> kernel`bzero+0xa
>>>>> kernel`dmu_buf_hold_array_by_dnode+0x1cf
>>>>> kernel`dmu_read_uio+0x66
>>>>> kernel`zfs_freebsd_read+0x3c0
>>>>> kernel`VOP_READ_APV+0x92
>>>>> kernel`vn_read+0x1a3
>>>>> kernel`vn_io_fault+0x23a
>>>>> kernel`dofileread+0x7b
>>>>> kernel`sys_read+0x9e
>>>>> kernel`amd64_syscall+0x238
>>>>> kernel`0xffffffff80747e4b
>>>>>
>>>>> That's where 99% of the time is spent.
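For completeness, the zero-filling helper used in the sketch near the top of this mail could be built on the region Will mentions. Again untested and only illustrative: zero_region and ZERO_REGION_SIZE come from FreeBSD's vm/vm_kern.c and machine/vmparam.h, which is exactly the FreeBSD-specific dependency Will points out; an illumos version would need its own pre-zeroed buffer here.

/*
 * Zero-fill 'len' bytes of a read request without allocating or
 * bzero'ing anything, by copying out of the pre-zeroed zero_region
 * in ZERO_REGION_SIZE chunks (2MB on amd64).
 */
static int
dmu_read_uio_hole(uio_t *uio, uint64_t len)
{
        while (len > 0) {
                uint64_t n = MIN(len, ZERO_REGION_SIZE);
                int err = uiomove(__DECONST(void *, zero_region), n,
                    UIO_READ, uio);
                if (err != 0)
                        return (err);
                len -= n;
        }
        return (0);
}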