From owner-freebsd-fs@FreeBSD.ORG  Sun Jan 26 02:06:09 2014
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 586F1A3D;
 Sun, 26 Jan 2014 02:06:09 +0000 (UTC)
Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca
 [131.104.91.44])
 by mx1.freebsd.org (Postfix) with ESMTP id ECA971C9B;
 Sun, 26 Jan 2014 02:06:08 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: 
X-IronPort-AV: E=Sophos;i="4.95,721,1384318800"; d="scan'208";a="91061919"
Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
 ([131.104.91.222])
 by esa-jnhn.mail.uoguelph.ca with ESMTP; 25 Jan 2014 21:05:54 -0500
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
 by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 5FF6AB4049;
 Sat, 25 Jan 2014 21:05:54 -0500 (EST)
Date: Sat, 25 Jan 2014 21:05:54 -0500 (EST)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Roman Divacky <rdivacky@freebsd.org>
Message-ID: <1647701889.16319715.1390701954380.JavaMail.root@uoguelph.ca>
In-Reply-To: <20140126051258.M1750@besplex.bde.org>
Subject: Re: BUG: possible NULL pointer dereference in nfs server
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [172.17.91.201]
X-Mailer: Zimbra 7.2.1_GA_2790 (ZimbraWebClient - FF3.0 (Win)/7.2.1_GA_2790)
Cc: Dimitry Andric <dim@FreeBSD.org>, fs@FreeBSD.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 26 Jan 2014 02:06:09 -0000

Bruce Evans wrote:
> On Sat, 25 Jan 2014, Dimitry Andric wrote:
> 
> > On 25 Jan 2014, at 01:38, Rick Macklem <rmacklem@uoguelph.ca>
> > wrote:
> > ...
> > If it inlines this, the result looks approximately like:
> >
> >     1	{
> >     2		fhandle_t *fhp = NULL;
> >     3		struct nfslockfile *new_lfp;
> >     4		int error;
> >     5
> >     6		if (new_stp->ls_flags & NFSLCK_OPEN) {
> >     7			new_lfp = *NULL;
> >     8			fhp = &new_lfp->lf_fh;
> >     9		} else if (&nfh) {
> >    10			fhp = &nfh;
> >    11		} else {
> >    12			panic("nfsrv_getlockfh");
> >    13		}
> >    14		error = nfsvno_getfh(vp, fhp, p);
> >    15		NFSEXITCODE(error);
> >    16		getlckret = error;
> >    17	}
> >
> > The code in line 7 is the problematic part.  Since this is
> > undefined,
> > the compiler inserts a trap instruction here.  I think the problem
> > Roman
> > encountered is that on sparc64, there is no equivalent to x86's ud2
> > instruction, so it inserts a call to abort() instead, and that
> > function
> > is not available in kernel-land.
> 
> Compiler bug.  abort() is not available in freestanding
> implementations.
> The behaviour is only undefined if the null pointer is dereferenced
> at
> runtime, so it doesn't include failing to link to abort() at compile
> time.
> 
> > ...
> >> Sorry, I'm not a compiler guy, so I don't know why a compiler
> >> would
> >> generate a trap instruction, but since new_lfpp is never NULL when
> >> this is executed, I don't see a problem.
> >>
> >> If others feel that this needs to be re-coded, please let me know
> >> what
> >> you think the code should look like? (A test for non-NULL with a
> >> panic()
> >> before it is used?)
> >>
> >> Is a trap instruction that never gets executed a problem?
> >
> > It's better to avoid undefined behavior in any case.  Just add a
> > NULL
> > check, that should be sufficient.
> 
> That might only add bloat and unimprove debugging.  Since the null
> pointer
> case cannot happen, it cannot be handed properly.  It can be
> mishandled in
> the following ways:
> - return an error, so that all callers have to handle the null
> pointer case
>    that can't happen.  If the compiler is too smart, it will notice
>    more
>    undefined behaviour (that can't happen) in callers and "force" you
>    to
>    handle it there too
> - KASSERT() that the pointer cannot be null.  Then:
>    - on production systems where KASSERT() is null, this won't work
>    around
>      the compiler bug.  Use a panic() instead.  To maximize source
>      code
>      bloat, ifdef all of this.
>    - when KASSERT() is not null, it will work around the compiler
>    bug.
>      If the case that can't happen actually happens, then this
>      unimproves
>      the debugging by messing up stack traces and turning restartable
>      null pointer or SIGILL traps to non-restartable panics.
>      Optimizations that replace a large block of code ending with a
>      null pointer trap by a single unimplemented instruction would
>      probably break restarting anyway.
> 
> Bruce
> 
So Roman, all I can suggest is to try adding something like:
   if (new_lfpp == NULL)
        panic("new_lfpp NULL");
after line#6. If that makes the compiler happy, I can commit it in
April. (Can't do commits before then.)
I agree with Bruce, but the check might be a good idea, in case a
future code change introduces a bug where the function is called with
new_lfpp NULL and NFSLCK_OPEN set.

If this doesn't make the compiler happy, all I can suggest is to
play around until you come up with something that works.

Have fun with it, rick
ps: I haven't seen this reported by tinderbox. Is the problem
    specific to your setup?