Date: Wed, 28 Nov 2018 19:19:04 -0500
From: Mark Johnston <markj@freebsd.org>
To: Garrett Wollman <wollman@bimajority.org>
Cc: Konstantin Belousov <kostikbel@gmail.com>, freebsd-stable@freebsd.org
Subject: Re: Trap 12 in vm_page_alloc_after()
Message-ID: <20181129001904.GA63393@raichu>
In-Reply-To: <23547.30738.149260.454185@hergotha.csail.mit.edu>
References: <23538.4310.710700.401331@hergotha.csail.mit.edu> <20181119050944.GW2378@kib.kiev.ua> <23547.30738.149260.454185@hergotha.csail.mit.edu>

On Sun, Nov 25, 2018 at 11:35:30PM -0500, Garrett Wollman wrote:
> <<On Mon, 19 Nov 2018 07:09:44 +0200, Konstantin Belousov <kostikbel@gmail.com> said:
>
> > On Sun, Nov 18, 2018 at 08:24:38PM -0500, Garrett Wollman wrote:
> >> Has anyone seen this before? It's on a busy NFS server, but hasn't
> >> been observed on any of our other NFS servers.
> >>
> >> ------------------------------------------------------------------------
> >> Fatal trap 12: page fault while in kernel mode
>
> >> --- trap 0xc, rip = 0xffffffff809a903d, rsp = 0xfffffe17eb8d0710, rbp = 0xfffffe17eb8d0750 ---
> >> vm_page_alloc_after() at vm_page_alloc_after+0x15d/frame 0xfffffe17eb8d0750
>
> > What is the line number for vm_page_alloc_after+0x15d ?
> > Do you have NUMA enabled on 11 ?
>
> If gdb is to be believed, the trap is at line 1687:
>
>         /*
>          * At this point we had better have found a good page.
>          */
>         KASSERT(m != NULL, ("missing page"));
>         free_count = vm_phys_freecnt_adj(m, -1);
> >>>>>>  if ((m->flags & PG_ZERO) != 0)
>                 vm_page_zero_count--;
>         mtx_unlock(&vm_page_queue_free_mtx);
>         vm_page_alloc_check(m);
>
> The faulting instruction is:
>
> 0xffffffff809a903d <vm_page_alloc_after+349>: testb $0x8,0x5a(%r14)
>
> There are no options matching /numa/i in the configuration. (This is
> a non-debugging configuration so the KASSERT is inoperative, I
> assume.) I have about a dozen other servers with the same kernel and
> they're not crashing, but obviously they all have different loads and
> sets of active clients.

If you're using a Skylake, I suspect that you can set the
hw.skz63_enable tunable to 0 as a workaround, assuming you're not using
any code that relies on Intel TSX. (I don't think there's anything in
the base system that does.) There are some details in
https://reviews.freebsd.org/D18374
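
A minimal sketch of the workaround suggested above, assuming that
hw.skz63_enable is read at boot like other loader tunables (the message
does not say where to set it), would be an entry in /boot/loader.conf:

```
# /boot/loader.conf
# Setting suggested in the message above. Assumption: the tunable is
# picked up by the loader at boot. Only appropriate if nothing on the
# system relies on Intel TSX.
hw.skz63_enable="0"
```

Under that assumption a reboot would be needed for the setting to take
effect. The thread does not confirm whether it stops the trap on this
server.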