Date:      Thu, 07 Feb 2019 18:24:31 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 235582] rpc_svc_gss / nfsd kernel panic
Message-ID:  <bug-235582-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235582

            Bug ID: 235582
           Summary: rpc_svc_gss / nfsd kernel panic
           Product: Base System
           Version: 11.2-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: peter.x.eriksson@liu.se

We have recently gone "live" with more NFS users "banging" on our
FreeBSD-based fileservers, and now something seems to have started
triggering kernel panics. Since the major difference from before is the
number of NFS users, this is the major suspect...

We just caught a panic and got a screendump from the console and the stack
traceback shows:

> Fatal trap 12: page fault while in kernel mode
> cpuid = 8; apic id = 08
> fault virtual address  = 0x0
> fault code             = supervisor read data, page not present
> instruction pointer    = 0x20:0xffffffff82b578e9
> stack pointer          = 0x20:0xfffffe3fdc627760
> code segment           = base 0x0, limit 0xfffff, type 0x1b
>                        = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags       = interrupt enabled, resume, IOPL = 0
> current process        =3D 2519 (nfsd: service)
> trap number            =3D 12
> panic: page fault
> cpuid = 8
> KDB: stack backtrace
> #0 0xffffffff80b3d577 at kdb_backtrace+0x67
> #1 0xffffffff80af6b17 at vpanic+0x177
> #2 0xffffffff80af6993 at panic+0x43
> #3 0xffffffff80f77fdf at trap_fatal+0x35f
> #4 0xffffffff80f78039 at trap_pfault+0x49
> #5 0xffffffff80f77807 at trap+0x2c7
> #6 0xffffffff80f56fbc at calltrap+0x8
> #7 0xffffffff82b5d4d2 at svc_rpc_gss+0x8f2
> #8 0xffffffff80d6c1b6 at svc_run_internal+0x726
> #9 0xffffffff80d6cd4b at svc_thread_start+0xb
> #10 0xffffffff80aba093 at fork_exit+0x8
> #11 0xffffffff80f48ede at fork_trampoline+0xe

(Unfortunately, there is no kernel crash dump from this machine.)

Systems are: Dell PowerEdge R730xd with 256GB RAM, HBA330 (LSI 3008) SAS
controllers, ZFS-storage, Intel X710 10GE-ethernet machines running FreeBSD
11.2. No swap enabled. ZFS ARC capped to 128GB.

NFS v4.0 or v4.1 clients with sec=krb5:krb5i:krb5p security. Most clients
(if not all) are running Linux (CentOS or Ubuntu). Around 200 active
clients per server.

(Most clients overall are Windows users using SMB via Samba, though.)
We have enabled a crash dump device on a couple of the machines and are
going to enable it on more in order to try to get a crash dump when the
next server panics...
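
As a rough sketch of what enabling a dump device involves in /etc/rc.conf:
the variables dumpdev and dumpdir are the standard rc.conf(5) knobs, but the
device name below is hypothetical and only stands in for whatever partition
is set aside for dumps. Since no swap is enabled on these machines,
dumpdev="AUTO" would presumably find no swap device in /etc/fstab to use, so
an explicit device is assumed here.

```sh
# /etc/rc.conf fragment (sketch only; the device name is hypothetical)
dumpdev="/dev/gpt/dump0"   # explicit partition the kernel writes the dump to on panic
dumpdir="/var/crash"       # directory where savecore(8) stores the dump at next boot
```

The partition needs enough room for the kernel dump, and the same device can
be tried on a running system with dumpon(8) before relying on it.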

Any ideas where this bug might be or how we could work around it?
(Disabling NFS is unfortunately not an option.)

-- 
You are receiving this mail because:
You are the assignee for the bug.


