From owner-freebsd-stable@FreeBSD.ORG Mon Nov 18 09:39:00 2013 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 0D2BEE78 for ; Mon, 18 Nov 2013 09:39:00 +0000 (UTC) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 3CD672B09 for ; Mon, 18 Nov 2013 09:38:58 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id LAA10085; Mon, 18 Nov 2013 11:38:41 +0200 (EET) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1ViLI1-000Fuv-C1; Mon, 18 Nov 2013 11:38:41 +0200 Message-ID: <5289DFFD.7080607@FreeBSD.org> Date: Mon, 18 Nov 2013 11:38:05 +0200 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:24.0) Gecko/20100101 Thunderbird/24.1.0 MIME-Version: 1.0 To: Jan Mikkelsen Subject: Re: Help with filing a [maybe] ZFS/mmap bug. References: <20967.760.95825.310085@gargle.gargle.HOWL><51E80B30.1090004@FreeBSD.org><20968.10645.880772.30501@gargle.gargle.HOWL><520202E5.30300@FreeBSD.org><20994.55913.93606.436124@gargle.gargle.HOWL> <21111.12085.958991.356982@gargle.gargle.HOWL> <4EB902F80CE84DD2BF36C85EF4CE8EF8@multiplay.co.uk> <5284B8A5.8040604@FreeBSD.org> <52889105.7040404@FreeBSD.org> <4B5798F5-269A-4E71-9799-E1B4E0C1545F@transactionware.com> In-Reply-To: <4B5798F5-269A-4E71-9799-E1B4E0C1545F@transactionware.com> X-Enigmail-Version: 1.6 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-stable@FreeBSD.org X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 18 Nov 2013 09:39:00 -0000 on 18/11/2013 11:24 Jan Mikkelsen said the following: > > On 17 Nov 2013, at 8:48 pm, Andriy Gapon wrote: > >> on 14/11/2013 13:48 Andriy Gapon said the following: >>> HOWEVER. I think that there is a bug that I introduced in r246293. >>> Specifically I changed >>> vm_page_undirty(pp); >>> to >>> pmap_remove_write(pp); >>> vm_page_clear_dirty(pp, off, nbytes); >>> >>> vm_page_undirty() would be a very serious (and probably obvious) bug, if it were >>> not a NOP in effect. The details are explained in the commit message. >>> But when I used vm_page_clear_dirty I completely missed the fact that *extends* >>> the range to DEV_BSIZE aligned boundaries[*]. So, given the described behavior >>> and that pmap_remove_write clears the page modified bit, it is possible that the >>> data dirty data in the extended areas will be marked as clean. >> >> Here is a patch (for head) that should fix the described above issue: >> >> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c >> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c >> index 2e2cbd6..4fcd571 100644 >> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c >> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c >> @@ -328,6 +328,20 @@ page_busy(vnode_t *vp, int64_t start, int64_t off, int64_t >> nbytes) >> { >> vm_object_t obj; >> vm_page_t pp; >> + int64_t end; >> + >> + /* >> + * At present vm_page_clear_dirty extends the cleared range to DEV_BSIZE >> + * aligned boundaries, if the range is not aligned. As a result a >> + * DEV_BSIZE subrange with partially dirty data may get marked as clean. >> + * It may happen that all DEV_BSIZE subranges are marked clean and thus >> + * the whole page would be considred clean despite have some dirty data. >> + * For this reason we should shrink the range to DEV_BSIZE aligned >> + * boundaries before calling vm_page_clear_dirty. >> + */ >> + end = rounddown2(off + nbytes, DEV_BSIZE); >> + off = roundup2(off, DEV_BSIZE); >> + nbytes = end - off; >> >> obj = vp->v_object; >> zfs_vmobject_assert_wlocked(obj); >> @@ -362,7 +376,8 @@ page_busy(vnode_t *vp, int64_t start, int64_t off, int64_t >> nbytes) >> ASSERT3U(pp->valid, ==, VM_PAGE_BITS_ALL); >> vm_object_pip_add(obj, 1); >> pmap_remove_write(pp); >> - vm_page_clear_dirty(pp, off, nbytes); >> + if (nbytes != 0) >> + vm_page_clear_dirty(pp, off, nbytes); >> } >> break; >> } > > > 9.2 does not seem to have a rounddown2() macro. Thanks for the heads-up! You could use a plain rounddown() or just 'x & ~(DEV_BSIZE - 1)'. -- Andriy Gapon