From owner-freebsd-fs@FreeBSD.ORG Mon Jan 27 18:37:51 2014 Return-Path: Delivered-To: fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 4EF5A2E4; Mon, 27 Jan 2014 18:37:51 +0000 (UTC) Received: from vlakno.cz (mail.vlakno.cz [95.129.96.251]) by mx1.freebsd.org (Postfix) with ESMTP id A259A13C9; Mon, 27 Jan 2014 18:37:50 +0000 (UTC) Received: by vlakno.cz (Postfix, from userid 1002) id 0FC8E1CC5593; Mon, 27 Jan 2014 19:37:43 +0100 (CET) Date: Mon, 27 Jan 2014 19:37:43 +0100 From: Roman Divacky To: Rick Macklem Subject: Re: BUG: possible NULL pointer dereference in nfs server Message-ID: <20140127183743.GA74668@freebsd.org> References: <20140126051258.M1750@besplex.bde.org> <1647701889.16319715.1390701954380.JavaMail.root@uoguelph.ca> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1647701889.16319715.1390701954380.JavaMail.root@uoguelph.ca> User-Agent: Mutt/1.5.22 (2013-10-16) Cc: Dimitry Andric , fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 27 Jan 2014 18:37:51 -0000 On Sat, Jan 25, 2014 at 09:05:54PM -0500, Rick Macklem wrote: > Bruce Evans wrote: > > On Sat, 25 Jan 2014, Dimitry Andric wrote: > > > > > On 25 Jan 2014, at 01:38, Rick Macklem > > > wrote: > > > ... > > > If it inlines this, the result looks approximately like: > > > > > > 1 { > > > 2 fhandle_t *fhp = NULL; > > > 3 struct nfslockfile *new_lfp; > > > 4 int error; > > > 5 > > > 6 if (new_stp->ls_flags & NFSLCK_OPEN) { > > > 7 new_lfp = *NULL; > > > 8 fhp = &new_lfp->lf_fh; > > > 9 } else if (&nfh) { > > > 10 fhp = &nfh; > > > 11 } else { > > > 12 panic("nfsrv_getlockfh"); > > > 13 } > > > 14 error = nfsvno_getfh(vp, fhp, p); > > > 15 NFSEXITCODE(error); > > > 16 getlckret = error; > > > 17 } > > > > > > The code in line 7 is the problematic part. Since this is > > > undefined, > > > the compiler inserts a trap instruction here. I think the problem > > > Roman > > > encountered is that on sparc64, there is no equivalent to x86's ud2 > > > instruction, so it inserts a call to abort() instead, and that > > > function > > > is not available in kernel-land. > > > > Compiler bug. abort() is not available in freestanding > > implementations. > > The behaviour is only undefined if the null pointer is dereferenced > > at > > runtime, so it doesn't include failing to link to abort() at compile > > time. > > > > > ... > > >> Sorry, I'm not a compiler guy, so I don't know why a compiler > > >> would > > >> generate a trap instruction, but since new_lfpp is never NULL when > > >> this is executed, I don't see a problem. > > >> > > >> If others feel that this needs to be re-coded, please let me know > > >> what > > >> you think the code should look like? (A test for non-NULL with a > > >> panic() > > >> before it is used?) > > >> > > >> Is a trap instruction that never gets executed a problem? > > > > > > It's better to avoid undefined behavior in any case. Just add a > > > NULL > > > check, that should be sufficient. > > > > That might only add bloat and unimprove debugging. Since the null > > pointer > > case cannot happen, it cannot be handed properly. It can be > > mishandled in > > the following ways: > > - return an error, so that all callers have to handle the null > > pointer case > > that can't happen. If the compiler is too smart, it will notice > > more > > undefined behaviour (that can't happen) in callers and "force" you > > to > > handle it there too > > - KASSERT() that the pointer cannot be null. Then: > > - on production systems where KASSERT() is null, this won't work > > around > > the compiler bug. Use a panic() instead. To maximize source > > code > > bloat, ifdef all of this. > > - when KASSERT() is not null, it will work around the compiler > > bug. > > If the case that can't happen actually happens, then this > > unimproves > > the debugging by messing up stack traces and turning restartable > > null pointer or SIGILL traps to non-restartable panics. > > Optimizations that replace a large block of code ending with a > > null pointer trap by a single unimplemented instruction would > > probably break restarting anyway. > > > > Bruce > > > So Roman, all I can suggest is to try adding something like: > if (new_lfpp == NULL) > panic("new_lfpp NULL"); > after line#6. If that makes the compiler happy, I can commit it in > April. (Can't do commits before then.) The compiler already inserts "trap" instruction when such a condition happens so this seem superfluous. > I agree with Bruce, but the check might be a good idea, in case a > future code change introduces a bug where the function is called with > new_lfpp NULL and NFSLCK_OPEN set. > > If this doesn't make the compiler happy, all I can suggest is to > play around until you come up with something that works. KASSERT() doesnt communicate that it's an assert, because it can just log into console a carry on. Would you be ok with this patch? Index: fs/nfsserver/nfs_nfsdstate.c =================================================================== --- fs/nfsserver/nfs_nfsdstate.c (revision 261037) +++ fs/nfsserver/nfs_nfsdstate.c (working copy) @@ -1384,7 +1384,8 @@ * If we are doing Lock/LockU and local locking is enabled, sleep * lock the nfslockfile structure. */ - getlckret = nfsrv_getlockfh(vp, new_stp->ls_flags, NULL, &nfh, p); + KASSERT((new_stp->ls_flags & NFSLCK_OPEN) == 0, ("nfsrv_lockctrl: calling nfsrv_getlockfh with NFSLCK_OPEN")); + getlckret = nfsrv_getlockfh(vp, new_stp->ls_flags & ~NFSLCK_OPEN, NULL, &nfh, p); NFSLOCKSTATE(); if (getlckret == 0) { if ((new_stp->ls_flags & (NFSLCK_LOCK | NFSLCK_UNLOCK)) != 0 && > Have fun with it, rick > ps: I haven't seen this reported by tinderbox. Is the problem > specific to your setup? It is present even in your setup :) Just "objdump -d kernel | grep ud2" on kernel compiled by clang. Roman