Date:      Sat, 17 Feb 2018 18:35:45 -0800
From:      Gleb Smirnoff <glebius@FreeBSD.org>
To:        Andriy Gapon <avg@FreeBSD.org>
Cc:        Andrew Reilly <areilly@bigpond.net.au>, kib@freebsd.org, current@freebsd.org
Subject:   Re: Since last week (today) current on my Ryzen box is unstable
Message-ID:  <20180218023545.GE93303@FreeBSD.org>
In-Reply-To: <cc3ae685-5f0e-d968-7b08-60a4836093e1@FreeBSD.org>
References:  <0CEA9D55-D488-42EC-BBDE-D0B7CE58BAEA@bigpond.net.au> <cc3ae685-5f0e-d968-7b08-60a4836093e1@FreeBSD.org>

  Andriy,

On Sun, Feb 18, 2018 at 12:54:21AM +0200, Andriy Gapon wrote:
A> > Today's rebuild has given me uptimes of below an hour, usually.  The box will stay up in single user mode long enough to rebuild world/kernel, but multi-user it is panicking at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c:1592
A> > 
A> > The backtrace shows that it gets to this panic from a sendfile() syscall.  The line above is in the middle of a big edit that's part of svn revision 329363.  The tripping assertion seems to suggest that m->valid != 0, for whatever that's worth.
A> 
A> I am doing a bit of an offline investigation with Andrew and it seems that the
A> actual panic message is this:
A> 
A> panic: vm_page_assert_xbusied: page 0xfffff807ebbd8f98 not exclusive busy @ /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c:1592
A> 
A> The stack is this:
A> vpanic() at vpanic/frame 0xfffffe00b3c36390
A> dmu_read_pages() at dmu_read_pages+0x535/frame 0xfffffe00b3c36460
A> zfs_freebsd_getpages() at zfs_freebsd_getpages+0x24c/frame 0xfffffe00b3c36510
A> VOP_GETPAGES_APV() at VOP_GETPAGES_APV+0xd9/frame 0xfffffe00b3c36540
A> vop_stdgetpages_async() at vop_stdgetpages_async+0x49/frame 0xfffffe00b3c36590
A> VOP_GETPAGES_ASYNC_APV() at VOP_GETPAGES_ASYNC_APV+0xd9/frame 0xfffffe00b3c365c0
A> vnode_pager_getpages_async() at vnode_pager_getpages_async+0x81/frame 0xfffffe00b3c36650
A> vn_sendfile() at vn_sendfile+0xe70/frame 0xfffffe00b3c368e0
A> sendfile() at sendfile+0x149/frame 0xfffffe00b3c36980
A> amd64_syscall() at amd64_syscall+0x79b/frame 0xfffffe00b3c36ab0
A> fast_syscall_common() at fast_syscall_common+0x101/frame 0x7fffffffdb00
A> 
A> I looked at sendfile_swapin() code and it seems that it uses the pager API in an
A> undocumented way.  Specifically, it inserts bogus_page into the array of
A> requested pages.  For starters, bogus_page is not busied and VOP_GETPAGES is
A> documented to have all requested pages exclusively busied.  Second, I always had
A> an impression that bogus_page is an implementation detail of the unified buffer
A> / page cache and that other code need not be aware of it.
A> 
A> So, my opinion is that the sendfile code uses a "clever hack" that happens to
A> work with the buffer cache based filesystems, but that that hack is a bug.
A> So, I'd prefer that the problem is fixed in that code.
A> But I am open to being convinced that all VOP_GETPAGES implementations,
A> including that in ZFS, must be made aware of bogus_page.  Or, at least, that
A> they should not verify that the requested pages are busied.

This is an optimization that improves throughput when the file's memory
cache is fragmented. Why don't you like adding the code to
zfs_freebsd_getpages()?

-- 
Gleb Smirnoff


