Date:      Wed, 28 Nov 2018 19:19:04 -0500
From:      Mark Johnston <markj@freebsd.org>
To:        Garrett Wollman <wollman@bimajority.org>
Cc:        Konstantin Belousov <kostikbel@gmail.com>, freebsd-stable@freebsd.org
Subject:   Re: Trap 12 in vm_page_alloc_after()
Message-ID:  <20181129001904.GA63393@raichu>
In-Reply-To: <23547.30738.149260.454185@hergotha.csail.mit.edu>
References:  <23538.4310.710700.401331@hergotha.csail.mit.edu> <20181119050944.GW2378@kib.kiev.ua> <23547.30738.149260.454185@hergotha.csail.mit.edu>

On Sun, Nov 25, 2018 at 11:35:30PM -0500, Garrett Wollman wrote:
> <<On Mon, 19 Nov 2018 07:09:44 +0200, Konstantin Belousov <kostikbel@gmail.com> said:
> 
> > On Sun, Nov 18, 2018 at 08:24:38PM -0500, Garrett Wollman wrote:
> >> Has anyone seen this before?  It's on a busy NFS server, but hasn't
> >> been observed on any of our other NFS servers.
> >> 
> >> ------------------------------------------------------------------------
> >> Fatal trap 12: page fault while in kernel mode
> 
> >> --- trap 0xc, rip = 0xffffffff809a903d, rsp = 0xfffffe17eb8d0710, rbp = 0xfffffe17eb8d0750 ---
> >> vm_page_alloc_after() at vm_page_alloc_after+0x15d/frame 0xfffffe17eb8d0750
> 
> > What is the line number for vm_page_alloc_after+0x15d ?
> > Do you have NUMA enabled on 11 ?
> 
> If gdb is to be believed, the trap is at line 1687:
> 
>         /*
>          *  At this point we had better have found a good page.
>          */
>         KASSERT(m != NULL, ("missing page"));
>         free_count = vm_phys_freecnt_adj(m, -1);
> >>>>>>  if ((m->flags & PG_ZERO) != 0)
>                 vm_page_zero_count--;
>         mtx_unlock(&vm_page_queue_free_mtx);
>         vm_page_alloc_check(m);
> 
> The faulting instruction is:
> 
> 0xffffffff809a903d <vm_page_alloc_after+349>:   testb  $0x8,0x5a(%r14)
> 
> There are no options matching /numa/i in the configuration.  (This is
> a non-debugging configuration so the KASSERT is inoperative, I
> assume.)  I have about a dozen other servers with the same kernel and
> they're not crashing, but obviously they all have different loads and
> sets of active clients.

If you're using a Skylake CPU, I suspect that you can set the
hw.skz63_enable tunable to 0 as a workaround, assuming you're not
running any code that relies on Intel TSX.  (I don't think anything in
the base system does.)  There are some details in
https://reviews.freebsd.org/D18374
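
For reference, a minimal sketch of applying the workaround, assuming
hw.skz63_enable is read as a boot-time loader tunable (the message does
not say where to set it, so the file location is an assumption):

```
# /boot/loader.conf
# Assumption: hw.skz63_enable is a loader tunable read at boot.
# 0 turns the SKZ63 handling off, as suggested above.  Only do this if
# nothing on the machine relies on Intel TSX.
hw.skz63_enable="0"
```

The setting would take effect on the next reboot.  Afterwards,
"kenv hw.skz63_enable" should print the value the loader passed to the
kernel.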
