Date: Wed, 6 Nov 2002 13:13:33 +0200 (EET)
From: Iasen Kostov
To: Archie Cobbs
Cc: freebsd-net@FreeBSD.ORG
Subject: Re: NFS functions does *NOT* check if they really have allocated any memory
In-Reply-To: <200211052106.gA5L6igd039808@arch20m.dellroad.org>
Message-ID: <20021106120422.G80368-100000@shadowhand.OTEL.net>

On Tue, 5 Nov 2002, Archie Cobbs wrote:

> Iasen Kostov writes:
> > As I experience system crashes at time of mbufs exhaustion I've compiled
> > a debug kernel and traced the problem. It seems the NFS functions
> > (nfsm_rpchead, nfsm_reqh ...) do *NOT* check if they really have
> > allocated memory by MGET macro.
>
> No check is necessary if M_WAIT is specified; the M_GET() function
> is always successful in that case. Same for malloc().

If that were true, I should not see any trap 12, should I? :)
In nfsm_reqh(), MGET() called as MGET(mb, M_WAIT, MT_DATA) returns NULL
when mbufs are exhausted. This is the fix/test I added to the nfsm_reqh()
function:

nfs/nfs_subs.c:591

	MGET(mb, M_WAIT, MT_DATA);
	/*
	 * This becomes true when there are no more mbufs available.
	 * If you don't believe me - test it :)
	 */
	if (mb == NULL) {
		printf("nfsm_reqh: no memory for header\n");
		return NULL;
	}

Without this check, the kernel crashes at this point:

nfs/nfs_subs.c:592 (of the original file)

	if (hsiz >= MINCLSIZE) {
		MCLGET(mb, M_WAIT);
	}

Here is the panic message:

IdlePTD at phsyical address 0x00326000
initial pcb at physical address 0x00299ba0
panicstr: page fault
panic messages:
---
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0xc
fault code              = supervisor write, page not present
instruction pointer     = 0x8:0xc01d1864
stack pointer           = 0x10:0xcd717d68
frame pointer           = 0x10:0xcd717d7c
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 167 (ls)
interrupt mask          = none
trap number             = 12
panic: page fault

And the backtrace:

#0  dumpsys () at ../../kern/kern_shutdown.c:487
#1  0xc015182b in boot (howto=256) at ../../kern/kern_shutdown.c:316
#2  0xc0151c50 in poweroff_wait (junk=0xc02734ec, howto=-1071173617)
    at ../../kern/kern_shutdown.c:595
#3  0xc0241382 in trap_fatal (frame=0xcd717d28, eva=12)
    at ../../i386/i386/trap.c:974
#4  0xc0241055 in trap_pfault (frame=0xcd717d28, usermode=0, eva=12)
    at ../../i386/i386/trap.c:867
#5  0xc0240c3f in trap (frame={tf_fs = 16, tf_es = 16, tf_ds = 16, tf_edi = 0,
      tf_esi = 0, tf_ebp = -848200324, tf_isp = -848200364, tf_ebx = 1,
      tf_edx = 0, tf_ecx = 6685184, tf_eax = 0, tf_trapno = 12, tf_err = 2,
      tf_eip = -1071835036, tf_cs = 8, tf_eflags = 66183, tf_esp = 512,
      tf_ss = -848199992}) at ../../i386/i386/trap.c:466
#6  0xc01d1864 in nfsm_reqh (vp=0xcd70dc00, procid=4, hsiz=72,
    bposp=0xcd717dc0) at ../../nfs/nfs_subs.c:593
#7  0xc01d83c5 in nfs3_access_otw (vp=0xcd70dc00, wmode=63, p=0xcbff2080,
    cred=0xc131b100) at ../../nfs/nfs_vnops.c:292
#8  0xc01d8dab in nfs_getattr (ap=0xcd717e20) at ../../nfs/nfs_vnops.c:637
#9  0xc018660f in vn_stat (vp=0xcd70dc00, sb=0xcd717ec8, p=0xcbff2080)
    at vnode_if.h:276
#10 0xc01865cc in vn_statfile (fp=0xc1320fc0, sb=0xcd717ec8, p=0xcbff2080)
    at ../../kern/vfs_vnops.c:451
#11 0xc01468cf in fstat (p=0xcbff2080, uap=0xcd717f80) at ../../sys/file.h:206
. . .

(kgdb) l nfs_subs.c:593
588		struct nfsmount *nmp;
589		int nqflag;
590
591		MGET(mb, M_WAIT, MT_DATA);	<< Here MGET returns NULL in mb (I'm sure - I saw it :)
592		if (hsiz >= MINCLSIZE)
593			MCLGET(mb, M_WAIT);	<< At this point the kernel crashes
594		mb->m_len = 0;
595		bpos = mtod(mb, caddr_t);
596
597		/*

As you said, MGET used with the M_WAIT flag should never return a NULL
pointer. Is this a problem with the MGET macro itself, or is it somewhere in
the functions it calls? Wherever the problem is, it is a big one :). It makes
(at least) NFS servers unstable and could lead to data loss when the kernel
crashes.
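
For illustration only, here is a minimal user-space sketch of the kind of
check discussed in this message: an allocator that can run dry, and a caller
that tests the result before touching it. This is not kernel code and not the
actual mbuf implementation. The names struct buf, buf_get(), buf_free() and
build_header() are all hypothetical stand-ins, and the tiny fixed pool only
exists so that exhaustion is easy to trigger.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for an mbuf; NOT the kernel's struct mbuf. */
struct buf {
	int	in_use;
	int	len;
	char	data[64];
};

/* Deliberately tiny pool so that exhaustion is easy to reproduce. */
#define POOL_SIZE 2

static struct buf pool[POOL_SIZE];

/* Hand out a free buffer, or NULL when the pool is exhausted. */
static struct buf *
buf_get(void)
{
	size_t i;

	for (i = 0; i < POOL_SIZE; i++) {
		if (!pool[i].in_use) {
			pool[i].in_use = 1;
			pool[i].len = 0;
			return &pool[i];
		}
	}
	return NULL;
}

/* Return a buffer to the pool; a NULL argument is ignored. */
static void
buf_free(struct buf *b)
{
	if (b != NULL)
		b->in_use = 0;
}

/*
 * Build a (hypothetical) request header. The allocation result is
 * checked first, so a NULL buffer is reported to the caller instead
 * of being dereferenced.
 */
static struct buf *
build_header(void)
{
	struct buf *b = buf_get();

	if (b == NULL) {
		fprintf(stderr, "build_header: no memory for header\n");
		return NULL;
	}
	b->len = 0;	/* safe: b is known to be non-NULL here */
	return b;
}
```

With POOL_SIZE set to 2, the third consecutive call to build_header() returns
NULL and prints the message rather than faulting, and a later call succeeds
again once a buffer has been handed back with buf_free(). Callers still have
to handle the NULL return, as in the nfsm_reqh() test above.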