From owner-freebsd-current@freebsd.org Wed May 13 14:43:03 2020 Return-Path: Delivered-To: freebsd-current@mailman.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.nyi.freebsd.org (Postfix) with ESMTP id E80AB2F81D8 for ; Wed, 13 May 2020 14:43:03 +0000 (UTC) (envelope-from markjdb@gmail.com) Received: from mail-qv1-xf2f.google.com (mail-qv1-xf2f.google.com [IPv6:2607:f8b0:4864:20::f2f]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) server-signature RSA-PSS (4096 bits) client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smtp.gmail.com", Issuer "GTS CA 1O1" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 49Mcpv5ssnz480G; Wed, 13 May 2020 14:43:03 +0000 (UTC) (envelope-from markjdb@gmail.com) Received: by mail-qv1-xf2f.google.com with SMTP id r3so7704qvm.1; Wed, 13 May 2020 07:43:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=8jlDi2hFH758/jt7B8L2DAFs3gBPnU+r0mUjy3pOOjY=; b=H75wRirjZbEmWnL9tfzcudg5MWDTAJg7iXzqA1yBSXLJ3YJmYuRmLL5nzSGhHmDNp3 WU+KFgVdhzN1eJPoj+kV2+t06zKrXPvZAf+Uv28rEfmT486+6D34CYTUJDYyCukHOGYf N0vH4vGqKKFhKGj9dyJkaCbiQXPIGO7/Ni1bWL4CbNA3833YEb+p+4XCV+INjpcMQFsL 7Tzip/nbKoQ9UOUw47TwD04OD/SbaaYBMnnxVBCN9NqbeucyI0n56mbq2eonUJBJNVRV tfHYpxRGlv3Hs7IKffaneRN2AsDJpyH3n9G8nmjfv9eNPGZFqJ9A3Ozkm+WSCOgm78dk Ucdg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:from:to:cc:subject:message-id :references:mime-version:content-disposition:in-reply-to; bh=8jlDi2hFH758/jt7B8L2DAFs3gBPnU+r0mUjy3pOOjY=; b=V+aYT/zwmMxZo+UJYCI0tOdmlA5mCtia/0FqihxGRhr2SsYVxBPk5Iew41tX+85luH VvDFYPE8UTX31PiRFYdMM+4fTvtMVZSHj3LTMG5XUmsiQ0y6TKkpL/hfieCgqAAERtbl d26FYOMlzhgEFzKZPARf+zvQps0x0CKVP3yX0+4abna/CjlMi0igomSRHAf23BoyJKhy RUn6C6A675Rety33/VpeoQJQWKRwrD1M94jIVBUlcR7sP7wrfufwy8ahmtPCcd2YFwD1 pOvfSpPn5yiWZ0n63LDgM5thVdjnAan+iT3RFvjas5CDxiVopAspAxIh+oPUMOuceub7 xPCQ== X-Gm-Message-State: AOAM530Ke9hyR21xd1YvX1f5++CqVJe5cjpkz8TQFDy0J+GuFbFKUOg3 0FJfOaRtztONQFtqIADEOMjj5jNP X-Google-Smtp-Source: ABdhPJwIaOIg+yGtGca9v7wjgbHVubpNR13f6qLSwxJf2C6ohxMdA+RQIDSaVc4a1iMrPmz2A1q2Ig== X-Received: by 2002:a05:6214:1427:: with SMTP id o7mr7180qvx.104.1589380982530; Wed, 13 May 2020 07:43:02 -0700 (PDT) Received: from raichu (toroon0560w-lp130-15-184-144-87-103.dsl.bell.ca. [184.144.87.103]) by smtp.gmail.com with ESMTPSA id n20sm13913469qkk.53.2020.05.13.07.43.01 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 13 May 2020 07:43:01 -0700 (PDT) Sender: Mark Johnston Date: Wed, 13 May 2020 10:42:57 -0400 From: Mark Johnston To: Andriy Gapon Cc: Bryan Drewery , freebsd-current@freebsd.org Subject: Re: zfs deadlock on r360452 relating to busy vm page Message-ID: <20200513144257.GA24239@raichu> References: <2bdc8563-283b-32cc-8a1a-85ff52aca99e@FreeBSD.org> <0e9cceba-84d0-ec4f-8784-36703452201d@FreeBSD.org> <889cb93b-85c7-3ec4-4ccf-5fb56ec38fa5@FreeBSD.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <889cb93b-85c7-3ec4-4ccf-5fb56ec38fa5@FreeBSD.org> X-Rspamd-Queue-Id: 49Mcpv5ssnz480G X-Spamd-Bar: ----- Authentication-Results: mx1.freebsd.org; none X-Spamd-Result: default: False [-6.00 / 15.00]; NEURAL_HAM_MEDIUM(-1.00)[-0.999,0]; REPLY(-4.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000,0] X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.33 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 13 May 2020 14:43:04 -0000 On Wed, May 13, 2020 at 10:45:24AM +0300, Andriy Gapon wrote: > On 13/05/2020 10:35, Andriy Gapon wrote: > > On 13/05/2020 01:47, Bryan Drewery wrote: > >> Trivial repro: > >> > >> dd if=/dev/zero of=blah & tail -F blah > >> ^C > >> load: 0.21 cmd: tail 72381 [prev->lr_read_cv] 2.17r 0.00u 0.01s 0% 2600k > >> #0 0xffffffff80bce615 at mi_switch+0x155 > >> #1 0xffffffff80c1cfea at sleepq_switch+0x11a > >> #2 0xffffffff80b57f0a at _cv_wait+0x15a > >> #3 0xffffffff829ddab6 at rangelock_enter+0x306 > >> #4 0xffffffff829ecd3f at zfs_freebsd_getpages+0x14f > >> #5 0xffffffff810e3ab9 at VOP_GETPAGES_APV+0x59 > >> #6 0xffffffff80f349e7 at vnode_pager_getpages+0x37 > >> #7 0xffffffff80f2a93f at vm_pager_get_pages+0x4f > >> #8 0xffffffff80f054b0 at vm_fault+0x780 > >> #9 0xffffffff80f04bde at vm_fault_trap+0x6e > >> #10 0xffffffff8106544e at trap_pfault+0x1ee > >> #11 0xffffffff81064a9c at trap+0x44c > >> #12 0xffffffff8103a978 at calltrap+0x8 > > > > In r329363 I re-worked zfs_getpages and introduced range locking to it. > > At the time I believed that it was safe and maybe it was, please see the commit > > message. > > There, indeed, have been many performance / concurrency improvements to the VM > > system and r358443 is one of them. > > Thinking more about it, it could be r352176. > I think that vm_page_grab_valid (and later vm_page_grab_valid_unlocked) are not > equivalent to the code that they replaced. > The original code would check valid field before any locking and it would > attempt any locking / busing if a page is invalid. The object was required to > be locked though. > The new code tries to busy the page in any case. > > > I am not sure how to resolve the problem best. Maybe someone who knows the > > latest VM code better than me can comment on my assumptions stated in the commit > > message. The general trend has been to use the page busy lock as the single point of synchronization for per-page state. As you noted, updates to the valid bits were previously interlocked by the object lock, but this is coarse-grained and hurts concurrency. I think you are right that the range locking in getpages was ok before the recent change, but it seems preferable to try and address this in ZFS. > > In illumos (and, I think, in OpenZFS/ZoL) they don't have the range locking in > > this corner of the code because of a similar deadlock a long time ago. Do they just not implement readahead? Can you explain exactly what the range lock accomplishes here - is it entirely to ensure that znode block size remains stable?