Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 10 Mar 2014 07:10:20 +0900
From:      Sean Bruno <sbruno@ignoranthack.me>
To:        Glen Barber <gjb@FreeBSD.org>
Cc:        Konstantin Belousov <kostikbel@gmail.com>, freebsd-current@FreeBSD.org
Subject:   Re: panic: vm_fault: fault on nofault entry
Message-ID:  <1394403020.2231.1.camel@powernoodle.corp.yahoo.com>
In-Reply-To: <20140309181657.GI1776@glenbarber.us>
References:  <20140309165648.GF1776@glenbarber.us> <20140309180132.GO24664@kib.kiev.ua>  <20140309181657.GI1776@glenbarber.us>

next in thread | previous in thread | raw e-mail | index | archive | help

--=-X7j2orV71+MWYdQeKomn
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

On Sun, 2014-03-09 at 14:16 -0400, Glen Barber wrote:
> On Sun, Mar 09, 2014 at 08:01:32PM +0200, Konstantin Belousov wrote:
> > On Sun, Mar 09, 2014 at 12:56:48PM -0400, Glen Barber wrote:
> > > We are having regular panics on several machines in the cluster.
> > >=20
> > > Below follows the script from the kgdb(1) session, hopefully providin=
g
> > > enough information.  This machine runs 11.0-CURRENT #2 r262892, from
> > > 2 days ago.
> > >=20
> > > It uses tmpfs(5) for the port build workspace.  I have an unconfirmed
> > > suspicion that use of sysutils/lsof is involved somehow, but cannot b=
e
> > > sure.  (In my experience with panics with port building, removing lso=
f
> > > from the system did have an effect, but I may be going down the wrong
> > > rabbit hole.)
> > >=20
> >=20
> > This is very similar to issue reported several time ago.
> > Try this patch.  I never get a feedback.
> >=20
> > diff --git a/sys/amd64/amd64/mem.c b/sys/amd64/amd64/mem.c
> > index abbbb21..fd9c5df 100644
> > --- a/sys/amd64/amd64/mem.c
> > +++ b/sys/amd64/amd64/mem.c
> > @@ -98,7 +98,13 @@ memrw(struct cdev *dev, struct uio *uio, int flags)
> >  kmemphys:
> >  			o =3D v & PAGE_MASK;
> >  			c =3D min(uio->uio_resid, (u_int)(PAGE_SIZE - o));
> > -			error =3D uiomove((void *)PHYS_TO_DMAP(v), (int)c, uio);
> > +			v =3D PHYS_TO_DMAP(v);
> > +			if (v < DMAP_MIN_ADDRESS ||
> > +			    (v > DMAP_MIN_ADDRESS + dmaplimit &&
> > +			    v <=3D DMAP_MAX_ADDRESS) ||
> > +			    pmap_kextract(v) =3D=3D 0)
> > +				return (EFAULT);
> > +			error =3D uiomove((void *)v, (int)c, uio);
> >  			continue;
> >  		}
> >  		else if (dev2unit(dev) =3D=3D CDEV_MINOR_KMEM) {
>=20
> There is a very similar patch on one of these machines.
>=20
>   Index: sys/amd64/amd64/mem.c
>   =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
>   --- sys/amd64/amd64/mem.c	(revision 262298)
>   +++ sys/amd64/amd64/mem.c	(working copy)
>   @@ -98,6 +98,12 @@
>    kmemphys:
>    			o =3D v & PAGE_MASK;
>    			c =3D min(uio->uio_resid, (u_int)(PAGE_SIZE - o));
>   +			v =3D PHYS_TO_DMAP(v);
>   +			if (v < DMAP_MIN_ADDRESS ||
>   +			    (v > DMAP_MIN_ADDRESS + dmaplimit &&
>   +			    v <=3D DMAP_MAX_ADDRESS) ||
>   +			    pmap_kextract(v) =3D=3D 0)
>   +				return (EFAULT);
>    			error =3D uiomove((void *)PHYS_TO_DMAP(v), (int)c, uio);
>    			continue;
>    		}
>   Index: sys/amd64/amd64/pmap.c
>   =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
>   --- sys/amd64/amd64/pmap.c	(revision 262298)
>   +++ sys/amd64/amd64/pmap.c	(working copy)
>   @@ -321,7 +321,7 @@
>        "Number of kernel page table pages allocated on bootup");
>   =20
>    static int ndmpdp;
>   -static vm_paddr_t dmaplimit;
>   +vm_paddr_t dmaplimit;
>    vm_offset_t kernel_vm_end =3D VM_MIN_KERNEL_ADDRESS;
>    pt_entry_t pg_nx;
>   =20
>   Index: sys/amd64/include/pmap.h
>   =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
>   --- sys/amd64/include/pmap.h	(revision 262298)
>   +++ sys/amd64/include/pmap.h	(working copy)
>   @@ -369,6 +369,7 @@
>    extern vm_paddr_t dump_avail[];
>    extern vm_offset_t virtual_avail;
>    extern vm_offset_t virtual_end;
>   +extern vm_paddr_t dmaplimit;
>   =20
>    #define	pmap_page_get_memattr(m)	((vm_memattr_t)(m)->md.pat_mode)
>    #define	pmap_page_is_write_mapped(m)	(((m)->aflags & PGA_WRITEABLE) !=
=3D 0)
>=20
> The machine this change is on paniced today as well.  That machine runs
> r262298M, and I have a vmcore from Feb 24 (there was not enough
> available space to get a crash dump today.)
>=20
> The backtrace from Feb 24 follows.
>=20
> Script started on Sun Mar  9 18:14:41 2014
> root@redbuild04.nyi:/usr/obj/usr/src/sys/REDBUILD # sh
> # kgdb ./kernel.debug /var/crash/vmcore.3
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you =
are
> welcome to change it and/or distribute copies of it under certain conditi=
ons.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for detail=
s.
> This GDB was configured as "amd64-marcel-freebsd"...
>=20
> Unread portion of the kernel message buffer:
> panic: vm_fault: fault on nofault entry, addr: fffffe03becbc000
> cpuid =3D 23
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe1838e=
c1180
> kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe1838ec1230
> panic() at panic+0x155/frame 0xfffffe1838ec12b0
> vm_fault_hold() at vm_fault_hold+0x1e7a/frame 0xfffffe1838ec1500
> vm_fault() at vm_fault+0x77/frame 0xfffffe1838ec1540
> trap_pfault() at trap_pfault+0x199/frame 0xfffffe1838ec15e0
> trap() at trap+0x4a0/frame 0xfffffe1838ec17f0
> calltrap() at calltrap+0x8/frame 0xfffffe1838ec17f0
> --- trap 0xc, rip =3D 0xffffffff80d971fb, rsp =3D 0xfffffe1838ec18b0, rbp=
 =3D 0xfffffe1838ec1910 ---
> copyout() at copyout+0x3b/frame 0xfffffe1838ec1910
> memrw() at memrw+0x1ef/frame 0xfffffe1838ec1950
> giant_read() at giant_read+0xa4/frame 0xfffffe1838ec1990
> devfs_read_f() at devfs_read_f+0xeb/frame 0xfffffe1838ec19f0
> dofileread() at dofileread+0x95/frame 0xfffffe1838ec1a40
> kern_readv() at kern_readv+0x68/frame 0xfffffe1838ec1a90
> sys_read() at sys_read+0x63/frame 0xfffffe1838ec1ae0
> amd64_syscall() at amd64_syscall+0x3fb/frame 0xfffffe1838ec1bf0
> Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe1838ec1bf0
> --- syscall (3, FreeBSD ELF64, sys_read), rip =3D 0x800b8343a, rsp =3D 0x=
7fffffffcfe8, rbp =3D 0x7fffffffd030 ---
> KDB: enter: panic
>=20
> Reading symbols from /boot/kernel/zfs.ko.symbols...done.
> Loaded symbols for /boot/kernel/zfs.ko.symbols
> Reading symbols from /boot/kernel/opensolaris.ko.symbols...done.
> Loaded symbols for /boot/kernel/opensolaris.ko.symbols
> Reading symbols from /boot/kernel/ums.ko.symbols...done.
> Loaded symbols for /boot/kernel/ums.ko.symbols
> Reading symbols from /boot/kernel/tmpfs.ko.symbols...done.
> Loaded symbols for /boot/kernel/tmpfs.ko.symbols
> Reading symbols from /boot/kernel/nullfs.ko.symbols...done.
> Loaded symbols for /boot/kernel/nullfs.ko.symbols
> Reading symbols from /boot/kernel/linprocfs.ko.symbols...done.
> Loaded symbols for /boot/kernel/linprocfs.ko.symbols
> Reading symbols from /boot/kernel/linux.ko.symbols...done.
> Loaded symbols for /boot/kernel/linux.ko.symbols
> #0  doadump (textdump=3D-954994000) at pcpu.h:219
> 219		__asm("movq %%gs:%1,%0" : "=3Dr" (td)
> (kgdb) bt
> #0  doadump (textdump=3D-954994000) at pcpu.h:219
> #1  0xffffffff8034a175 in db_fncall (dummy1=3D<value optimized out>,=20
>     dummy2=3D<value optimized out>, dummy3=3D<value optimized out>, dummy=
4=3D<value optimized out>)
>     at /usr/src/sys/ddb/db_command.c:578
> #2  0xffffffff80349e5d in db_command (cmd_table=3D0x0) at /usr/src/sys/dd=
b/db_command.c:449
> #3  0xffffffff80349bd4 in db_command_loop () at /usr/src/sys/ddb/db_comma=
nd.c:502
> #4  0xffffffff8034c630 in db_trap (type=3D<value optimized out>, code=3D0=
)
>     at /usr/src/sys/ddb/db_main.c:231
> #5  0xffffffff80987329 in kdb_trap (type=3D3, code=3D0, tf=3D<value optim=
ized out>)
>     at /usr/src/sys/kern/subr_kdb.c:656
> #6  0xffffffff80d99009 in trap (frame=3D0xfffffe1838ec1160)
>     at /usr/src/sys/amd64/amd64/trap.c:571
> #7  0xffffffff80d7dd12 in calltrap () at /usr/src/sys/amd64/amd64/excepti=
on.S:231
> #8  0xffffffff80986a8e in kdb_enter (why=3D0xffffffff8100ed4f "panic", ms=
g=3D<value optimized out>)
>     at cpufunc.h:63
> #9  0xffffffff809462b5 in panic (fmt=3D<value optimized out>)
>     at /usr/src/sys/kern/kern_shutdown.c:752
> #10 0xffffffff80c0981a in vm_fault_hold (map=3D<value optimized out>,=20
>     vaddr=3D<value optimized out>, fault_type=3D<value optimized out>,=
=20
>     fault_flags=3D<value optimized out>, m_hold=3D<value optimized out>)
>     at /usr/src/sys/vm/vm_fault.c:272
> #11 0xffffffff80c07957 in vm_fault (map=3D0xfffff80002000000, vaddr=3D<va=
lue optimized out>,=20
>     fault_type=3D1 '\001', fault_flags=3D128) at /usr/src/sys/vm/vm_fault=
.c:217
> #12 0xffffffff80d997f9 in trap_pfault (frame=3D0xfffffe1838ec1800, usermo=
de=3D0)
>     at /usr/src/sys/amd64/amd64/trap.c:767
> #13 0xffffffff80d99020 in trap (frame=3D0xfffffe1838ec1800)
>     at /usr/src/sys/amd64/amd64/trap.c:455
> #14 0xffffffff80d7dd12 in calltrap () at /usr/src/sys/amd64/amd64/excepti=
on.S:231
> #15 0xffffffff80d971fb in copyout () at /usr/src/sys/amd64/amd64/support.=
S:246
> #16 0xffffffff8099bb35 in uiomove_faultflag (cp=3D<value optimized out>,=
=20
>     n=3D<value optimized out>, uio=3D0xfffffe1838ec1ab0, nofault=3D<value=
 optimized out>)
>     at /usr/src/sys/kern/subr_uio.c:192
> #17 0xffffffff80d8576f in memrw (dev=3D<value optimized out>, uio=3D<valu=
e optimized out>,=20
>     flags=3D<value optimized out>) at /usr/src/sys/amd64/amd64/mem.c:107
> ---Type <return> to continue, or q <return> to quit---
> #18 0xffffffff808ec764 in giant_read (dev=3D0xfffff80011347c00, uio=3D0xf=
ffffe1838ec1ab0, ioflag=3D0)
>     at /usr/src/sys/kern/kern_conf.c:442
> #19 0xffffffff80817e2b in devfs_read_f (fp=3D0xfffff80854be3140, uio=3D0x=
fffffe1838ec1ab0,=20
>     cred=3D<value optimized out>, flags=3D0, td=3D0xfffff801f52c5490)
>     at /usr/src/sys/fs/devfs/devfs_vnops.c:1193
> #20 0xffffffff809a0e25 in dofileread (td=3D0xfffff801f52c5490, fd=3D4, fp=
=3D0xfffff80854be3140,=20
>     auio=3D0xfffffe1838ec1ab0, offset=3D<value optimized out>, flags=3D11=
72307968) at file.h:299
> #21 0xffffffff809a0b48 in kern_readv (td=3D0xfffff801f52c5490, fd=3D4, au=
io=3D0xfffffe1838ec1ab0)
>     at /usr/src/sys/kern/sys_generic.c:256
> #22 0xffffffff809a0ad3 in sys_read (td=3D<value optimized out>, uap=3D<va=
lue optimized out>)
>     at /usr/src/sys/kern/sys_generic.c:171
> #23 0xffffffff80d9a04b in amd64_syscall (td=3D0xfffff801f52c5490, traced=
=3D0) at subr_syscall.c:133
> #24 0xffffffff80d7dffb in Xfast_syscall () at /usr/src/sys/amd64/amd64/ex=
ception.S:390
> #25 0x0000000800b8343a in ?? ()
> Previous frame inner to this frame (corrupt stack?)
> Current language:  auto; currently minimal
> (kgdb) quit
>=20
> Script done on Sun Mar  9 18:14:59 2014
>=20
> Glen
>=20

Not sure I can add much here other than to say that redbuild machines
are now running -current as opposed to stable/10.

We are running redbuild01/02 unpatched and 03/04 with patch to compare
stability.  We haven't seen much difference, so either I've screwed up
the patch or the bug report.

sean

--=-X7j2orV71+MWYdQeKomn
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: This is a digitally signed message part

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQEcBAABAgAGBQJTHObIAAoJEBkJRdwI6BaHJTUH/35XxcAt4SuczAWPiq3cZAeu
prwFGy1Vz1iwbW/aP2YQT+j2Dws5IkYmq4aWjJjcI6HhjsedSIcQ8Px3zgqMeWWf
iPECuzIJwnXROIgO+UDbSlfli+QF5h49TFWtdWGBvMjmxrjW2wxhFsvi9cSAmCKu
0lexyBH56ks7LcIAehW0qoUG8Ohnpu7Lermitjdce8ChBLi+BNJpvcoOo2cNUYVL
Wp+5NcSDWLN2G/xtIlpvRqKMbT3LHYNAcZI9fQr/okaEKhp+2bqQMxaRKcjglrir
iH5L3X+dkgelM8aubXHEN807EGUgiiske6+Qax3JEDyH1w1qiVtnW8Qsb+m2t4o=
=vfhA
-----END PGP SIGNATURE-----

--=-X7j2orV71+MWYdQeKomn--




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?1394403020.2231.1.camel>