Date: Sat, 17 Feb 2018 18:35:45 -0800
From: Gleb Smirnoff
To: Andriy Gapon
Cc: Andrew Reilly, kib@freebsd.org, current@freebsd.org
Subject: Re: Since last week (today) current on my Ryzen box is unstable
Message-ID: <20180218023545.GE93303@FreeBSD.org>
References: <0CEA9D55-D488-42EC-BBDE-D0B7CE58BAEA@bigpond.net.au>
List-Id: Discussions about the use of FreeBSD-current

  Andriy,

On Sun, Feb 18, 2018 at 12:54:21AM +0200, Andriy Gapon wrote:
A> > Today's rebuild has given me uptimes of below an hour, usually. The box will stay up in single user mode long enough to rebuild world/kernel, but multi-user it is panicking at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c:1592
A> >
A> > The backtrace shows that it gets to this panic from a sendfile() syscall. The line above is in the middle of a big edit that's part of svn revision 329363. The tripping assertion seems to suggest that m->valid != 0, for whatever that's worth.
A>
A> I am doing a bit of an offline investigation with Andrew and it seems that the
A> actual panic message is this:
A>
A> panic: vm_page_assert_xbusied: page 0xfffff807ebbd8f98 not exclusive busy @
A> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c:1592
A>
A> The stack is this:
A> vpanic() at vpanic/frame 0xfffffe00b3c36390
A> dmu_read_pages() at dmu_read_pages+0x535/frame 0xfffffe00b3c36460
A> zfs_freebsd_getpages() at zfs_freebsd_getpages+0x24c/frame 0xfffffe00b3c36510
A> VOP_GETPAGES_APV() at VOP_GETPAGES_APV+0xd9/frame 0xfffffe00b3c36540
A> vop_stdgetpages_async() at vop_stdgetpages_async+0x49/frame 0xfffffe00b3c36590
A> VOP_GETPAGES_ASYNC_APV() at VOP_GETPAGES_ASYNC_APV+0xd9/frame 0xfffffe00b3c365c0
A> vnode_pager_getpages_async() at vnode_pager_getpages_async+0x81/frame
A> 0xfffffe00b3c36650
A> vn_sendfile() at vn_sendfile+0xe70/frame 0xfffffe00b3c368e0
A> sendfile() at sendfile+0x149/frame 0xfffffe00b3c36980
A> amd64_syscall() at amd64_syscall+0x79b/frame 0xfffffe00b3c36ab0
A> fast_syscall_common() at fast_syscall_common+0x101/frame 0x7fffffffdb00
A>
A> I looked at sendfile_swapin() code and it seems that it uses the pager API in an
A> undocumented way. Specifically, it inserts bogus_page into the array of
A> requested pages. For starters, bogus_page is not busied and VOP_GETPAGES is
A> documented to have all requested pages exclusively busied. Second, I always had
A> an impression that bogus_page is an implementation detail of the unified buffer
A> / page cache and that other code need not be aware of it.
A>
A> So, my opinion is that the sendfile code uses a "clever hack" that happens to
A> work with the buffer cache based filesystems, but that that hack is a bug.
A> So, I'd prefer that the problem is fixed in that code.
A> But I am open to being convinced that all VOP_GETPAGES implementations,
A> including that in ZFS, must be made aware of bogus_page. Or, at least, that
A> they should not verify that the requested pages are busied.

This is an optimization that improves throughput when the file memory cache is
fragmented. Why don't you like adding the code to zfs_freebsd_getpages()?

-- 
Gleb Smirnoff