From: Andriy Gapon
Date: Sun, 18 Feb 2018 00:54:21 +0200
To: Andrew Reilly, kib@freebsd.org, Gleb Smirnoff
Cc: current@freebsd.org
Subject: Re: Since last week (today) current on my Ryzen box is unstable
In-Reply-To: <0CEA9D55-D488-42EC-BBDE-D0B7CE58BAEA@bigpond.net.au>

On 17/02/2018 14:16, Andrew Reilly wrote:
> Today's rebuild has given me uptimes of below an hour, usually. The box
> will stay up in single user mode long enough to rebuild world/kernel, but
> multi-user it is panicking at
> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c:1592
>
> The backtrace shows that it gets to this panic from a sendfile() syscall.
> The line above is in the middle of a big edit that's part of svn revision
> 329363.
> The tripping assertion seems to suggest that m->valid != 0, for whatever
> that's worth.

I am doing a bit of an offline investigation with Andrew, and it seems that
the actual panic message is this:

panic: vm_page_assert_xbusied: page 0xfffff807ebbd8f98 not exclusive busy @
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c:1592

The stack is this:

vpanic() at vpanic/frame 0xfffffe00b3c36390
dmu_read_pages() at dmu_read_pages+0x535/frame 0xfffffe00b3c36460
zfs_freebsd_getpages() at zfs_freebsd_getpages+0x24c/frame 0xfffffe00b3c36510
VOP_GETPAGES_APV() at VOP_GETPAGES_APV+0xd9/frame 0xfffffe00b3c36540
vop_stdgetpages_async() at vop_stdgetpages_async+0x49/frame 0xfffffe00b3c36590
VOP_GETPAGES_ASYNC_APV() at VOP_GETPAGES_ASYNC_APV+0xd9/frame 0xfffffe00b3c365c0
vnode_pager_getpages_async() at vnode_pager_getpages_async+0x81/frame 0xfffffe00b3c36650
vn_sendfile() at vn_sendfile+0xe70/frame 0xfffffe00b3c368e0
sendfile() at sendfile+0x149/frame 0xfffffe00b3c36980
amd64_syscall() at amd64_syscall+0x79b/frame 0xfffffe00b3c36ab0
fast_syscall_common() at fast_syscall_common+0x101/frame 0x7fffffffdb00

I looked at the sendfile_swapin() code and it seems that it uses the pager
API in an undocumented way. Specifically, it inserts bogus_page into the
array of requested pages.

For starters, bogus_page is not busied, while VOP_GETPAGES is documented to
require that all requested pages be exclusively busied.

Second, I have always had the impression that bogus_page is an
implementation detail of the unified buffer / page cache and that other code
need not be aware of it.

So, my opinion is that the sendfile code uses a "clever hack" that happens to
work with the buffer cache based filesystems, but that the hack is a bug.
So, I'd prefer that the problem be fixed in the sendfile code.

But I am open to being convinced that all VOP_GETPAGES implementations,
including the one in ZFS, must be made aware of bogus_page. Or, at least,
that they should not verify that the requested pages are busied.

--
Andriy Gapon