From owner-freebsd-fs@FreeBSD.ORG Sun Jan 26 02:06:09 2014 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 586F1A3D; Sun, 26 Jan 2014 02:06:09 +0000 (UTC) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id ECA971C9B; Sun, 26 Jan 2014 02:06:08 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: X-IronPort-AV: E=Sophos;i="4.95,721,1384318800"; d="scan'208";a="91061919" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-jnhn.mail.uoguelph.ca with ESMTP; 25 Jan 2014 21:05:54 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 5FF6AB4049; Sat, 25 Jan 2014 21:05:54 -0500 (EST) Date: Sat, 25 Jan 2014 21:05:54 -0500 (EST) From: Rick Macklem To: Roman Divacky Message-ID: <1647701889.16319715.1390701954380.JavaMail.root@uoguelph.ca> In-Reply-To: <20140126051258.M1750@besplex.bde.org> Subject: Re: BUG: possible NULL pointer dereference in nfs server MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.201] X-Mailer: Zimbra 7.2.1_GA_2790 (ZimbraWebClient - FF3.0 (Win)/7.2.1_GA_2790) Cc: Dimitry Andric , fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 26 Jan 2014 02:06:09 -0000 Bruce Evans wrote: > On Sat, 25 Jan 2014, Dimitry Andric wrote: > > > On 25 Jan 2014, at 01:38, Rick Macklem > > wrote: > > ... > > If it inlines this, the result looks approximately like: > > > > 1 { > > 2 fhandle_t *fhp = NULL; > > 3 struct nfslockfile *new_lfp; > > 4 int error; > > 5 > > 6 if (new_stp->ls_flags & NFSLCK_OPEN) { > > 7 new_lfp = *NULL; > > 8 fhp = &new_lfp->lf_fh; > > 9 } else if (&nfh) { > > 10 fhp = &nfh; > > 11 } else { > > 12 panic("nfsrv_getlockfh"); > > 13 } > > 14 error = nfsvno_getfh(vp, fhp, p); > > 15 NFSEXITCODE(error); > > 16 getlckret = error; > > 17 } > > > > The code in line 7 is the problematic part. Since this is > > undefined, > > the compiler inserts a trap instruction here. I think the problem > > Roman > > encountered is that on sparc64, there is no equivalent to x86's ud2 > > instruction, so it inserts a call to abort() instead, and that > > function > > is not available in kernel-land. > > Compiler bug. abort() is not available in freestanding > implementations. > The behaviour is only undefined if the null pointer is dereferenced > at > runtime, so it doesn't include failing to link to abort() at compile > time. > > > ... > >> Sorry, I'm not a compiler guy, so I don't know why a compiler > >> would > >> generate a trap instruction, but since new_lfpp is never NULL when > >> this is executed, I don't see a problem. > >> > >> If others feel that this needs to be re-coded, please let me know > >> what > >> you think the code should look like? (A test for non-NULL with a > >> panic() > >> before it is used?) > >> > >> Is a trap instruction that never gets executed a problem? > > > > It's better to avoid undefined behavior in any case. Just add a > > NULL > > check, that should be sufficient. > > That might only add bloat and unimprove debugging. Since the null > pointer > case cannot happen, it cannot be handed properly. It can be > mishandled in > the following ways: > - return an error, so that all callers have to handle the null > pointer case > that can't happen. If the compiler is too smart, it will notice > more > undefined behaviour (that can't happen) in callers and "force" you > to > handle it there too > - KASSERT() that the pointer cannot be null. Then: > - on production systems where KASSERT() is null, this won't work > around > the compiler bug. Use a panic() instead. To maximize source > code > bloat, ifdef all of this. > - when KASSERT() is not null, it will work around the compiler > bug. > If the case that can't happen actually happens, then this > unimproves > the debugging by messing up stack traces and turning restartable > null pointer or SIGILL traps to non-restartable panics. > Optimizations that replace a large block of code ending with a > null pointer trap by a single unimplemented instruction would > probably break restarting anyway. > > Bruce > So Roman, all I can suggest is to try adding something like: if (new_lfpp == NULL) panic("new_lfpp NULL"); after line#6. If that makes the compiler happy, I can commit it in April. (Can't do commits before then.) I agree with Bruce, but the check might be a good idea, in case a future code change introduces a bug where the function is called with new_lfpp NULL and NFSLCK_OPEN set. If this doesn't make the compiler happy, all I can suggest is to play around until you come up with something that works. Have fun with it, rick ps: I haven't seen this reported by tinderbox. Is the problem specific to your setup?