Date:      Tue, 27 Jul 2004 20:00:11 +0900
From:      Pyun YongHyeon <yongari@kt-is.co.kr>
To:        sparc64@freebsd.org
Subject:   NFS panic and malloc(9) warning
Message-ID:  <20040727110011.GA5553@kt-is.co.kr>

Hello All,

As soon as I did a 'cp' over NFS, my Ultra2 (the NFS server) panicked.
The panic is reproducible. Even though I got core files successfully,
neither gdb6 nor kgdb understands the core (no trace info).

1st panic: trap: memory address not aligned
nfs_getreq() + 0x1d0
nfsrv_dorec() + 0xcc
nfssvc_nfsd() + 0x1c0

objdump shows the following:
....
0000000000000480 <nfs_getreq>:
nfs_getreq():
/usr/src/sys/nfsserver/nfs_srvsock.c:286


/*
 * Parse an RPC request
 * - verify it
 * - fill in the cred struct.
 */
....
/usr/src/sys/nfsserver/nfs_srvsock.c:373
     650:       c4 06 80 00     ld  [ %i2 ], %g2
     654:       b4 06 a0 04     add  %i2, 4, %i2
     658:       c4 26 20 bc     st  %g2, [ %i0 + 0xbc ]
....

So the panicking code is at line 373:
    364                 /*
    365                  * XXX: This credential should be managed using crget(9)
    366                  * and related calls.  Right now, this tramples on any
    367                  * extensible data in the ucred, fails to initialize the
    368                  * mutex, and worse.  This must be fixed before FreeBSD
    369                  * 5.3-RELEASE.
    370                  */
    371                 bzero((caddr_t)&nd->nd_cr, sizeof (struct ucred));
    372                 nd->nd_cr.cr_ref = 1;
    373                 nd->nd_cr.cr_uid = fxdr_unsigned(uid_t, *tl++);
    374                 nd->nd_cr.cr_gid = fxdr_unsigned(gid_t, *tl++);
    375                 len = fxdr_unsigned(int, *tl);
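
For context on the trap: the ld at 0x650 is a 32-bit load through tl
(%i2), and sparc64 requires such a load to be 4-byte aligned, so the trap
suggests tl was not aligned at that point. The following is only a
userland sketch with hypothetical helper names (is_aligned32, load_be32),
not the kernel's code. It shows an alignment check and a bytewise read of
a big-endian XDR word that does not depend on alignment:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if p is aligned to a 4-byte boundary. */
static int is_aligned32(const void *p)
{
    return ((uintptr_t)p & 3) == 0;
}

/*
 * Hypothetical helper: read one big-endian (XDR) 32-bit word one byte at
 * a time, so the address of p does not need any particular alignment.
 */
static uint32_t load_be32(const void *p)
{
    const unsigned char *b = p;

    assert(p != NULL);
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}
```

A check such as is_aligned32(tl) just before line 373 would be one way to
confirm whether the pointer is misaligned when the panic happens.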


2nd panic: trap: memory address not aligned
nfsrv_write() + 0x178
nfssvc_nfsd() + 0x850

objdump shows the following:
....
0000000000002600 <nfsrv_write>:
nfsrv_write():
/usr/src/sys/nfsserver/nfs_serv.c:1059

...
/usr/src/sys/nfsserver/nfs_serv.c:1106
                off = fxdr_hyper(tl);
    2778:       c2 02 00 00     ld  [ %o0 ], %g1
    277c:       c4 02 20 04     ld  [ %o0 + 4 ], %g2
So the panicking code is at line 1106:

   1103         NFSD_LOCK();
   1104         if (v3) {
   1105                 tl = nfsm_dissect(u_int32_t *, 5 * NFSX_UNSIGNED);
   1106                 off = fxdr_hyper(tl);
   1107                 tl += 3;
   1108                 stable = fxdr_unsigned(int, *tl++);
   1109         } else {
   1110                 tl = nfsm_dissect(u_int32_t *, 4 * NFSX_UNSIGNED);
   1111                 off = (off_t)fxdr_unsigned(u_int32_t, *++tl);
   1112                 tl += 2;
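
The two ld instructions at 0x2778 and 0x277c are the two 32-bit halves
read by fxdr_hyper(tl), so the same alignment requirement applies here.
As a sketch only, assuming the five words returned by nfsm_dissect() hold
the offset in words 0-1 and the stable flag in word 3 (as the tl
arithmetic above implies), one alignment-independent way to decode them
in userland is to memcpy() them into an aligned local array first. The
struct and function names are hypothetical:

```c
#include <arpa/inet.h>   /* ntohl */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>      /* memcpy */

/* Hypothetical container for illustration; not a kernel structure. */
struct write3_head {
    uint64_t off;      /* 64-bit file offset (words 0 and 1) */
    uint32_t stable;   /* stable flag (word 3) */
};

/*
 * Copy five XDR words out of a possibly misaligned buffer into an
 * aligned local array, then convert from network byte order.
 */
static struct write3_head decode_write3_head(const void *p)
{
    uint32_t w[5];
    struct write3_head h;

    assert(p != NULL);
    memcpy(w, p, sizeof(w));
    h.off = ((uint64_t)ntohl(w[0]) << 32) | (uint64_t)ntohl(w[1]);
    h.stable = ntohl(w[3]);
    return h;
}
```

The copy costs 20 bytes per request, but no load in the decode step
depends on the alignment of the source buffer.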


There is also always a malloc(9) warning during NFS operation:
Jul 27 16:51:36 daemon kernel: malloc(M_WAITOK) of "Mbuf", forcing M_NOWAIT with the following non-sleepable locks held:
Jul 27 16:51:36 daemon kernel: exclusive sleep mutex nfsd_mtx r = 0 (0xc03cdc38) locked @ /usr/src/sys/nfsserver/nfs_srvsock.c:712
Jul 27 16:51:36 daemon kernel: KDB: stack backtrace:
Jul 27 16:51:36 daemon kernel: uma_zalloc_arg() at uma_zalloc_arg+0x40
Jul 27 16:51:36 daemon kernel: nfsm_disct() at nfsm_disct+0x90
Jul 27 16:51:36 daemon kernel: nfs_getreq() at nfs_getreq+0x44
Jul 27 16:51:36 daemon kernel: nfsrv_dorec() at nfsrv_dorec+0xcc
Jul 27 16:51:36 daemon kernel: nfssvc_nfsd() at nfssvc_nfsd+0x2e4
Jul 27 16:51:36 daemon kernel: nfssvc() at nfssvc+0x144
Jul 27 16:51:36 daemon kernel: syscall() at syscall+0x21c
Jul 27 16:51:36 daemon kernel: -- syscall (155, FreeBSD ELF64, nfssvc) %o7=0x102bac --
Jul 27 16:51:36 daemon kernel: userland() at 0x4039d848
Jul 27 16:51:36 daemon kernel: user trace: trap %o7=0x102bac
Jul 27 16:51:36 daemon kernel: pc 0x4039d848, sp 0x7fdffffe031
Jul 27 16:51:36 daemon kernel: pc 0x1019a4, sp 0x7fdffffe0f1
Jul 27 16:51:36 daemon kernel: pc 0x100f80, sp 0x7fdffffe4d1
Jul 27 16:51:36 daemon kernel: pc 0x4020a8d4, sp 0x7fdffffe591
Jul 27 16:51:36 daemon kernel: done

However, this warning is really strange. The source code at line 712 is
    707         m = rec->nr_packet;
    708         free(rec, M_NFSRVDESC);
    709         NFSD_UNLOCK();
    710         MALLOC(nd, struct nfsrv_descript *, sizeof (struct nfsrv_descript),
    711                 M_NFSRVDESC, M_WAITOK);
    712         NFSD_LOCK();
    713         nd->nd_md = nd->nd_mrep = m;
So when malloc(9) is called there is no nfsd_mtx lock held.
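
One possible reading of the warning, stated as an assumption rather than
a diagnosis: the 'locked @ ...:712' text appears to name the place where
nfsd_mtx was acquired, while the backtrace shows the allocation itself
coming from nfsm_disct() called by nfs_getreq(), i.e. after line 712.
The toy userland model below (all names hypothetical, unrelated to the
real WITNESS code) shows how a checker that records the acquisition line
would report that line at a later allocation:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Toy model only: remember where a lock was most recently acquired. */
struct toy_lock {
    const char *name;
    int held;
    int line;   /* source line of the acquiring TOY_LOCK() call */
};

#define TOY_LOCK(l)   ((l)->held = 1, (l)->line = __LINE__)
#define TOY_UNLOCK(l) ((l)->held = 0)

/*
 * Hypothetical check made by a sleeping allocation. Returns the line at
 * which the lock was taken if it is still held, or 0 if it is free.
 */
static int toy_check_sleepable(const struct toy_lock *l)
{
    assert(l != NULL);
    if (l->held) {
        printf("sleepable alloc with %s held, locked @ line %d\n",
            l->name, l->line);
        return l->line;
    }
    return 0;
}
```

In this model the reported line is always that of the acquisition, no
matter how much later the allocation happens.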
The system is -CURRENT (July 26, 2004).
Any clues?

Regards,
Pyun YongHyeon
-- 
Pyun YongHyeon <http://www.kr.freebsd.org/~yongari>


