From owner-freebsd-fs@FreeBSD.ORG Sat Apr 13 12:04:47 2013
Message-ID: <516949C7.4030305@platinum.linux.pl>
Date: Sat, 13 Apr 2013 14:04:23 +0200
From: Adam Nowacki <nowakpl@platinum.linux.pl>
To: Andriy Gapon
Cc: freebsd-fs@FreeBSD.org
Subject: Re: ZFS slow reads for unallocated blocks
References: <5166EA43.7050700@platinum.linux.pl> <5167B1C5.8020402@FreeBSD.org> <51689A2C.4080402@platinum.linux.pl> <5169324A.3080309@FreeBSD.org>
In-Reply-To: <5169324A.3080309@FreeBSD.org>
List-Id: Filesystems

Temporary dbufs are created for each missing (unallocated-on-disk) record, including indirects if the hole is large enough. These dbufs never make it into the ARC and are freed at the end of dmu_read_uio().
A small read (from a hole) would in the best case bzero 128 KiB (the recordsize; more if indirects are missing too) ... and I'm running a modified ZFS with record sizes up to 8 MiB.

# zfs create -o atime=off -o recordsize=8M -o compression=off \
    -o mountpoint=/home/testfs home/testfs
# truncate -s 8m /home/testfs/trunc8m
# dd if=/dev/zero of=/home/testfs/zero8m bs=8m count=1
1+0 records in
1+0 records out
8388608 bytes transferred in 0.010193 secs (822987745 bytes/sec)
# time cat /home/testfs/trunc8m > /dev/null
0.000u 6.111s 0:06.11 100.0%    15+2753k 0+0io 0pf+0w
# time cat /home/testfs/zero8m > /dev/null
0.000u 0.010s 0:00.01 100.0%    12+2168k 0+0io 0pf+0w

A 600x increase in system time, and close to 1 MB/s - insanity.

The fix: a lot of the code to handle this efficiently was already there. dbuf_hold_impl() takes an int fail_sparse argument to return ENOENT for holes. I just had to get there, and somehow back to dmu_read_uio(), where the zeroing can happen at byte granularity. ... I haven't had time to actually test it yet.

On 2013-04-13 12:24, Andriy Gapon wrote:
> on 13/04/2013 02:35 Adam Nowacki said the following:
>> http://tepeserwery.pl/nowak/freebsd/zfs_sparse_optimization.patch.txt
>>
>> Does it look sane?
>
> It's hard to tell from a quick look since the change is not small.
> What is your idea of the problem and the fix?
>
>> On 2013-04-12 09:03, Andriy Gapon wrote:
>>>
>>> ENOTIME to really investigate, but here is a basic profile result
>>> for those interested:
>>>               kernel`bzero+0xa
>>>               kernel`dmu_buf_hold_array_by_dnode+0x1cf
>>>               kernel`dmu_read_uio+0x66
>>>               kernel`zfs_freebsd_read+0x3c0
>>>               kernel`VOP_READ_APV+0x92
>>>               kernel`vn_read+0x1a3
>>>               kernel`vn_io_fault+0x23a
>>>               kernel`dofileread+0x7b
>>>               kernel`sys_read+0x9e
>>>               kernel`amd64_syscall+0x238
>>>               kernel`0xffffffff80747e4b
>>>
>>> That's where >99% of time is spent.