Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 27 Jan 2014 18:04:17 -0500 (EST)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Roman Divacky <rdivacky@freebsd.org>
Cc:        Dimitry Andric <dim@FreeBSD.org>, fs@FreeBSD.org
Subject:   Re: BUG: possible NULL pointer dereference in nfs server
Message-ID:  <410866336.17211125.1390863857778.JavaMail.root@uoguelph.ca>
In-Reply-To: <20140127183743.GA74668@freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
Roman Divacky wrote:
> On Sat, Jan 25, 2014 at 09:05:54PM -0500, Rick Macklem wrote:
> > Bruce Evans wrote:
> > > On Sat, 25 Jan 2014, Dimitry Andric wrote:
> > > 
> > > > On 25 Jan 2014, at 01:38, Rick Macklem <rmacklem@uoguelph.ca>
> > > > wrote:
> > > > ...
> > > > If it inlines this, the result looks approximately like:
> > > >
> > > >     1	{
> > > >     2		fhandle_t *fhp = NULL;
> > > >     3		struct nfslockfile *new_lfp;
> > > >     4		int error;
> > > >     5
> > > >     6		if (new_stp->ls_flags & NFSLCK_OPEN) {
> > > >     7			new_lfp = *NULL;
> > > >     8			fhp = &new_lfp->lf_fh;
> > > >     9		} else if (&nfh) {
> > > >    10			fhp = &nfh;
> > > >    11		} else {
> > > >    12			panic("nfsrv_getlockfh");
> > > >    13		}
> > > >    14		error = nfsvno_getfh(vp, fhp, p);
> > > >    15		NFSEXITCODE(error);
> > > >    16		getlckret = error;
> > > >    17	}
> > > >
> > > > The code in line 7 is the problematic part.  Since this is
> > > > undefined,
> > > > the compiler inserts a trap instruction here.  I think the
> > > > problem
> > > > Roman
> > > > encountered is that on sparc64, there is no equivalent to x86's
> > > > ud2
> > > > instruction, so it inserts a call to abort() instead, and that
> > > > function
> > > > is not available in kernel-land.
> > > 
> > > Compiler bug.  abort() is not available in freestanding
> > > implementations.
> > > The behaviour is only undefined if the null pointer is
> > > dereferenced
> > > at
> > > runtime, so it doesn't include failing to link to abort() at
> > > compile
> > > time.
> > > 
> > > > ...
> > > >> Sorry, I'm not a compiler guy, so I don't know why a compiler
> > > >> would
> > > >> generate a trap instruction, but since new_lfpp is never NULL
> > > >> when
> > > >> this is executed, I don't see a problem.
> > > >>
> > > >> If others feel that this needs to be re-coded, please let me
> > > >> know
> > > >> what
> > > >> you think the code should look like? (A test for non-NULL with
> > > >> a
> > > >> panic()
> > > >> before it is used?)
> > > >>
> > > >> Is a trap instruction that never gets executed a problem?
> > > >
> > > > It's better to avoid undefined behavior in any case.  Just add
> > > > a
> > > > NULL
> > > > check, that should be sufficient.
> > > 
> > > That might only add bloat and unimprove debugging.  Since the
> > > null
> > > pointer
> > > case cannot happen, it cannot be handed properly.  It can be
> > > mishandled in
> > > the following ways:
> > > - return an error, so that all callers have to handle the null
> > > pointer case
> > >    that can't happen.  If the compiler is too smart, it will
> > >    notice
> > >    more
> > >    undefined behaviour (that can't happen) in callers and "force"
> > >    you
> > >    to
> > >    handle it there too
> > > - KASSERT() that the pointer cannot be null.  Then:
> > >    - on production systems where KASSERT() is null, this won't
> > >    work
> > >    around
> > >      the compiler bug.  Use a panic() instead.  To maximize
> > >      source
> > >      code
> > >      bloat, ifdef all of this.
> > >    - when KASSERT() is not null, it will work around the compiler
> > >    bug.
> > >      If the case that can't happen actually happens, then this
> > >      unimproves
> > >      the debugging by messing up stack traces and turning
> > >      restartable
> > >      null pointer or SIGILL traps to non-restartable panics.
> > >      Optimizations that replace a large block of code ending with
> > >      a
> > >      null pointer trap by a single unimplemented instruction
> > >      would
> > >      probably break restarting anyway.
> > > 
> > > Bruce
> > > 
> > So Roman, all I can suggest is to try adding something like:
> >    if (new_lfpp == NULL)
> >         panic("new_lfpp NULL");
> > after line#6. If that makes the compiler happy, I can commit it in
> > April. (Can't do commits before then.)
> 
> The compiler already inserts "trap" instruction when such a condition
> happens so this seem superfluous.
> 
Ok, now I'm confused. I thought the problem was an "abort()" call for
sparc64. I certainly run the code with the trap instruction in it and
since it never gets executed, it doesn't bother me on i386.

> > I agree with Bruce, but the check might be a good idea, in case a
> > future code change introduces a bug where the function is called
> > with
> > new_lfpp NULL and NFSLCK_OPEN set.
> > 
> > If this doesn't make the compiler happy, all I can suggest is to
> > play around until you come up with something that works.
>  
> KASSERT() doesnt communicate that it's an assert, because it can
> just log into console a carry on. Would you be ok with this patch?
> 
> Index: fs/nfsserver/nfs_nfsdstate.c
> ===================================================================
> --- fs/nfsserver/nfs_nfsdstate.c        (revision 261037)
> +++ fs/nfsserver/nfs_nfsdstate.c        (working copy)
> @@ -1384,7 +1384,8 @@
>          * If we are doing Lock/LockU and local locking is enabled,
>          sleep
>          * lock the nfslockfile structure.
>          */
> -       getlckret = nfsrv_getlockfh(vp, new_stp->ls_flags, NULL,
> &nfh, p);
> +       KASSERT((new_stp->ls_flags & NFSLCK_OPEN) == 0,
> ("nfsrv_lockctrl: calling nfsrv_getlockfh with NFSLCK_OPEN"));
> +       getlckret = nfsrv_getlockfh(vp, new_stp->ls_flags &
> ~NFSLCK_OPEN, NULL, &nfh, p);
>         NFSLOCKSTATE();
>         if (getlckret == 0) {
>                 if ((new_stp->ls_flags & (NFSLCK_LOCK |
>                 NFSLCK_UNLOCK)) != 0 &&
> 
> 
> > Have fun with it, rick
> > ps: I haven't seen this reported by tinderbox. Is the problem
> >     specific to your setup?
> 
Yea, so long as it includes a comment that states this is done to
work around a stupid compiler bug.

> It is present even in your setup :) Just "objdump -d kernel | grep
> ud2" on kernel compiled
> by clang.
> 
I actually use gcc, but I believe you. I'll admit I still don't understand
why having a trap instruction that never gets executed is a bad thing?

I can commit the above in April. If for some reason the fix is needed sooner,
we'll need to find someone else willing to do the commit.

rick

> Roman
> 



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?410866336.17211125.1390863857778.JavaMail.root>