From owner-freebsd-fs@FreeBSD.ORG Sat Apr 13 12:04:47 2013
Message-ID: <516949C7.4030305@platinum.linux.pl>
Date: Sat, 13 Apr 2013 14:04:23 +0200
From: Adam Nowacki <nowakpl@platinum.linux.pl>
To: Andriy Gapon
Cc: freebsd-fs@FreeBSD.org
Subject: Re: ZFS slow reads for unallocated blocks
References: <5166EA43.7050700@platinum.linux.pl> <5167B1C5.8020402@FreeBSD.org> <51689A2C.4080402@platinum.linux.pl> <5169324A.3080309@FreeBSD.org>
In-Reply-To: <5169324A.3080309@FreeBSD.org>
List-Id: Filesystems

Temporary dbufs are created for each missing (unallocated-on-disk) record, including indirects if the hole is large enough. These dbufs never make it into the ARC and are freed at the end of dmu_read_uio().
A small read (from a hole) would in the best case bzero 128 KiB (the recordsize; more if indirects are missing too) ... and I'm running a modified ZFS with record sizes up to 8 MiB.

# zfs create -o atime=off -o recordsize=8M -o compression=off \
    -o mountpoint=/home/testfs home/testfs
# truncate -s 8m /home/testfs/trunc8m
# dd if=/dev/zero of=/home/testfs/zero8m bs=8m count=1
1+0 records in
1+0 records out
8388608 bytes transferred in 0.010193 secs (822987745 bytes/sec)
# time cat /home/testfs/trunc8m > /dev/null
0.000u 6.111s 0:06.11 100.0%    15+2753k 0+0io 0pf+0w
# time cat /home/testfs/zero8m > /dev/null
0.000u 0.010s 0:00.01 100.0%    12+2168k 0+0io 0pf+0w

A 600x increase in system time, and close to 1 MB/s - insanity.

The fix: a lot of the code to handle this efficiently was already there. dbuf_hold_impl() takes an int fail_sparse argument to return ENOENT for holes. I just had to get there, and somehow back to dmu_read_uio(), where the zeroing can happen at byte granularity. ... I haven't had time to actually test it yet.

On 2013-04-13 12:24, Andriy Gapon wrote:
> on 13/04/2013 02:35 Adam Nowacki said the following:
>> http://tepeserwery.pl/nowak/freebsd/zfs_sparse_optimization.patch.txt
>>
>> Does it look sane?
>
> It's hard to tell from a quick look since the change is not small.
> What is your idea of the problem and the fix?
>
>> On 2013-04-12 09:03, Andriy Gapon wrote:
>>>
>>> ENOTIME to really investigate, but here is a basic profile result
>>> for those interested:
>>>               kernel`bzero+0xa
>>>               kernel`dmu_buf_hold_array_by_dnode+0x1cf
>>>               kernel`dmu_read_uio+0x66
>>>               kernel`zfs_freebsd_read+0x3c0
>>>               kernel`VOP_READ_APV+0x92
>>>               kernel`vn_read+0x1a3
>>>               kernel`vn_io_fault+0x23a
>>>               kernel`dofileread+0x7b
>>>               kernel`sys_read+0x9e
>>>               kernel`amd64_syscall+0x238
>>>               kernel`0xffffffff80747e4b
>>>
>>> That's where >99% of time is spent.