Date:      Wed, 19 Sep 2018 17:20:34 -0400
From:      Mark Johnston <markj@freebsd.org>
To:        Steve Kargl <sgk@troutmask.apl.washington.edu>
Cc:        freebsd-current@freebsd.org
Subject:   Re: ALPHA4 panic in VM
Message-ID:  <20180919212034.GD99168@raichu>
In-Reply-To: <20180919211156.GA1677@troutmask.apl.washington.edu>
References:  <20180919200152.GA1164@troutmask.apl.washington.edu> <20180919210211.GC99168@raichu> <20180919211156.GA1677@troutmask.apl.washington.edu>

On Wed, Sep 19, 2018 at 02:11:56PM -0700, Steve Kargl wrote:
> On Wed, Sep 19, 2018 at 05:02:11PM -0400, Mark Johnston wrote:
> > On Wed, Sep 19, 2018 at 01:01:52PM -0700, Steve Kargl wrote:
> > > I have the kernel and core file if more information is needed.
> > > 
> > > % cat info.2
> > > Dump header from device: /dev/ada0p3
> > >   Architecture: amd64
> > >   Architecture Version: 2
> > >   Dump Length: 2348281856
> > >   Blocksize: 512
> > >   Compression: none
> > >   Dumptime: Wed Sep 19 12:29:59 2018
> > >   Hostname: troutmask.apl.washington.edu
> > >   Magic: FreeBSD Kernel Dump
> > >   Version String: FreeBSD 12.0-ALPHA4 #0 r338505: Thu Sep  6 13:45:34 PDT 2018
> > >     kargl@troutmask.apl.washington.edu:/usr/obj/usr/src/amd64.amd64/sys/SPEW
> > >   Panic String: page fault
> > >   Dump Parity: 2676008548
> > >   Bounds: 2
> > >   Dump Status: good
> > > 
> > > % more core.txt.2
> > > Fatal trap 12: page fault while in kernel mode
> > > cpuid = 1; apic id = 11
> > > fault virtual address   = 0xffffb8000719a428
> > 
> > This seems to be the result of a bit-flip.  cred is 0xffffb8000719a400,
> > which is almost but not quite in the direct map.  In particular we have:
> > 
> > (kgdb) frame 10
> > #10 0xffffffff8083e07d in vm_object_destroy (object=<optimized out>) at /usr/src/sys/vm/vm_object.c:703
> > 703                     swap_release_by_cred(object->charge, object->cred);
> > (kgdb) p object
> > $8 = <optimized out>
> > (kgdb) p *(vm_object_t)$r13
> > $9 = {
> > ...
> >   cred = 0xffffb8000719a400,
> >   charge = 28672,
> >   umtx_data = 0x0
> > }
> > (kgdb) p *(struct ucred *)0xfffff8000719a400
> > $10 = {
> >   cr_ref = 5737, 
> >   cr_uid = 1001, 
> >   cr_ruid = 1001, 
> >   cr_svuid = 1001, 
> >   cr_ngroups = 7, 
> >   cr_rgid = 1001, 
> >   cr_svgid = 1001, 
> >   cr_uidinfo = 0xfffff80007285500, 
> >   cr_ruidinfo = 0xfffff80007285500, 
> >   cr_prison = 0xffffffff80a9de10 <prison0>, 
> > ... <more sane-looking ucred fields>
> > 
> > That is, flipping one of the bits in the fault address leads me to a
> > valid ucred.  This could in principle be the result of a software bug,
> > but I'd be more inclined to suspect the hardware.
> 
> Mark,
> 
> Thanks for looking into the problem.  This system has
> been running for probably 2 years or so without issues.
> I guess it's time to pull out memtest86+ (or similar)
> to see if hardware is starting to fail.

I'm not sure whether you're using ECC RAM, but if not, the system is
susceptible to silent random bit flips.
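
For anyone who wants to check the single-bit claim from the quoted kgdb
session, XOR-ing the two pointers is enough. The following is only a
minimal sketch: the two addresses are copied from the transcript above,
while the helper name flipped_bits is made up for illustration and is not
part of any FreeBSD tool.

```python
# Addresses copied from the kgdb session quoted above.
corrupt_cred = 0xffffb8000719a400  # object->cred as stored in the vm_object
valid_cred = 0xfffff8000719a400    # address where kgdb shows a sane ucred


def flipped_bits(a, b):
    """Return the bit positions (0 = least significant) where a and b differ.

    Hypothetical helper, written only for this illustration.
    """
    diff = a ^ b
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


bits = flipped_bits(corrupt_cred, valid_cred)
print(f"xor  = {corrupt_cred ^ valid_cred:#018x}")  # xor  = 0x0000400000000000
print(f"bits = {bits}")                             # bits = [46]
```

Exactly one bit (bit 46) differs between the two values, which is what you
would expect from a single flipped bit in the stored pointer.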


