From: Will Andrews
To: Adam Nowacki
Date: Sat, 13 Apr 2013 11:11:30 -0600
Subject: Re: ZFS slow reads for unallocated blocks
In-Reply-To: <516949C7.4030305@platinum.linux.pl>
Cc: "freebsd-fs@freebsd.org", Andriy Gapon

Hi,

I think the idea of using a pre-zeroed region as the 'source' is a good
one, but it would probably be better to set a special flag on a hole dbuf
than to require caller flags. That way, ZFS can lazily evaluate the hole
dbuf (i.e. avoid zeroing db_data until it has to). However, that could be
complicated by the fact that there are many potential users of hole dbufs
that would want to write to the dbuf.

This sort of optimization should be brought to the illumos zfs list. As it
stands, your patch is also FreeBSD-specific, since 'zero_region' only
exists in vm/vm_kern.c. Given how often zeros get copied, however, it's
quite possible that equivalents of this region exist elsewhere.

--Will.

On Sat, Apr 13, 2013 at 6:04 AM, Adam Nowacki wrote:

> Temporary dbufs are created for each missing (unallocated on disk) record,
> including indirects if the hole is large enough. Those dbufs never find
> their way into the ARC and are freed at the end of dmu_read_uio.
>
> A small read (from a hole) would in the best case bzero 128KiB
> (recordsize, more if missing indirects) ... and I'm running modified ZFS
> with record sizes up to 8MiB.
>
> # zfs create -o atime=off -o recordsize=8M -o compression=off -o mountpoint=/home/testfs home/testfs
> # truncate -s 8m /home/testfs/trunc8m
> # dd if=/dev/zero of=/home/testfs/zero8m bs=8m count=1
> 1+0 records in
> 1+0 records out
> 8388608 bytes transferred in 0.010193 secs (822987745 bytes/sec)
>
> # time cat /home/testfs/trunc8m > /dev/null
> 0.000u 6.111s 0:06.11 100.0% 15+2753k 0+0io 0pf+0w
>
> # time cat /home/testfs/zero8m > /dev/null
> 0.000u 0.010s 0:00.01 100.0% 12+2168k 0+0io 0pf+0w
>
> 600x increase in system time and close to 1MB/s - insanity.
>
> The fix - a lot of the code to efficiently handle this was already there.
>
> dbuf_hold_impl has an int fail_sparse argument to return ENOENT for holes.
> Just had to get there and somehow back to dmu_read_uio where zeroing can
> happen at byte granularity.
>
> ... didn't have time to actually test it yet.
>
>
> On 2013-04-13 12:24, Andriy Gapon wrote:
>
>> on 13/04/2013 02:35 Adam Nowacki said the following:
>>
>>> http://tepeserwery.pl/nowak/freebsd/zfs_sparse_optimization.patch.txt
>>>
>>> Does it look sane?
>>>
>>
>> It's hard to tell from a quick look since the change is not small.
>> What is your idea of the problem and the fix?
>>
>> On 2013-04-12 09:03, Andriy Gapon wrote:
>>>
>>>>
>>>> ENOTIME to really investigate, but here is a basic profile result for
>>>> those interested:
>>>> kernel`bzero+0xa
>>>> kernel`dmu_buf_hold_array_by_dnode+0x1cf
>>>> kernel`dmu_read_uio+0x66
>>>> kernel`zfs_freebsd_read+0x3c0
>>>> kernel`VOP_READ_APV+0x92
>>>> kernel`vn_read+0x1a3
>>>> kernel`vn_io_fault+0x23a
>>>> kernel`dofileread+0x7b
>>>> kernel`sys_read+0x9e
>>>> kernel`amd64_syscall+0x238
>>>> kernel`0xffffffff80747e4b
>>>>
>>>> That's where > 99% of time is spent.
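
The difference between the two read paths discussed in this thread can be
illustrated with a small userspace model. This is only a sketch in Python,
not ZFS code: the class SparseObject, its methods hold and read, and the
bytes_zeroed counter are hypothetical names invented for illustration. Only
the fail_sparse flag and the 128 KiB record size are taken from the messages
above, and returning None stands in for the ENOENT result described there.

```python
from typing import Dict, Optional

RECORD_SIZE = 128 * 1024  # 128 KiB, the record size mentioned in the thread


class SparseObject:
    """Toy model of a file whose unallocated records (holes) read as zeros."""

    def __init__(self, size: int, record_size: int = RECORD_SIZE) -> None:
        self.size = size
        self.record_size = record_size
        self.records: Dict[int, bytes] = {}  # record index -> data; absent = hole
        self.bytes_zeroed = 0  # counts zero-fill work so the paths can be compared

    def write_record(self, index: int, data: bytes) -> None:
        if len(data) != self.record_size:
            raise ValueError("data must be exactly one record long")
        self.records[index] = data

    def hold(self, index: int, fail_sparse: bool) -> Optional[bytes]:
        """Return the record's data.

        For a hole, return None when fail_sparse is set (a stand-in for
        ENOENT); otherwise build a temporary, fully zeroed record.
        """
        data = self.records.get(index)
        if data is not None:
            return data
        if fail_sparse:
            return None
        self.bytes_zeroed += self.record_size
        return bytes(self.record_size)

    def read(self, offset: int, length: int, fail_sparse: bool = True) -> bytes:
        """Read up to `length` bytes starting at `offset`, clamped to the size."""
        end = min(offset + length, self.size)
        out = bytearray()
        while offset < end:
            index, start = divmod(offset, self.record_size)
            n = min(self.record_size - start, end - offset)
            data = self.hold(index, fail_sparse)
            if data is None:
                # Hole: zero only the bytes the caller asked for.
                out += bytes(n)
                self.bytes_zeroed += n
            else:
                out += data[start:start + n]
            offset += n
        return bytes(out)
```

In this model, a 4096-byte read from a hole with fail_sparse=False zero-fills
a whole 131072-byte record, while the same read with fail_sparse=True
zero-fills only the 4096 bytes requested. With larger records the gap grows
in proportion to the record size.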