Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 13 Dec 2016 10:20:30 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Edward Tomasz Napierala <trasz@freebsd.org>
Cc:        Konstantin Belousov <kostikbel@gmail.com>, src-committers@freebsd.org,  svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject:   Re: svn commit: r309891 - head/sys/kern
Message-ID:  <20161213091945.V1082@besplex.bde.org>
In-Reply-To: <20161212191257.GA11238@brick>
References:  <201612121522.uBCFMMmm088698@repo.freebsd.org> <20161212163331.GH54029@kib.kiev.ua> <20161212191257.GA11238@brick>

next in thread | previous in thread | raw e-mail | index | archive | help
On Mon, 12 Dec 2016, Edward Tomasz Napierala wrote:

> On 1212T1833, Konstantin Belousov wrote:
>> On Mon, Dec 12, 2016 at 03:22:22PM +0000, Edward Tomasz Napierala wrote:
>>> Author: trasz
>>> Date: Mon Dec 12 15:22:21 2016
>>> New Revision: 309891
>>> URL: https://svnweb.freebsd.org/changeset/base/309891
>>>
>>> Log:
>>>   Avoid dereferencing NULL pointers in devtoname().  I've seen it panic,
>>>   called from ufs_print() in DDB.
>> Calls from DDB should not panic even if accessing NULL pointers.
>
> Yeah, by "panic" I've meant the "reentering ddb" thing.  It's not a panic
> per se, but still breaks eg "show lockedvnodes".

Hrmph.  kib didn't like my patch to kill the spam from reentering ddb.
The spam messes up even cases where the ddb command handles errors.
Howver, in most cases the error handling is incomplete.  There is always
the outer setjmp/longjmp back to the command loop.  This gives null and
bad error handling for unexpected null pointer accesses like the one here.
Other cases set up inner setjump/longjmps which longjmp back to an inner
context which prints an error message (spammed by the reentry messages)
and should clean up.  However, cleanup is rarely done.  One case is
writing to inaccessible memory on at least x86.  Then the page tables
are left in modified state.

Null pointer panics are also interesting.  They seem to have been broken
on i386 by the double mapping of low memory (PTD[0] = PTD[KERNBASE >> 12] =
shared page table for low physical memory).  So the null pointer is mapped
and points to physical memory 0 .  There are other bugs from the shared
page table.  When I fix the other bugs by changing PTD[0] to 0, this also
fixes null pointers.  It breaks ACPI resume but not much else.  See
locore.s for some details about the mapping -- it is documented as being
mainly for ACPI.  But ACPI somehow works on amd64 with PTD[0] = 0.

>> That said, I also do not think that this is the right place to change.
>> UFS um_dev should not be NULL for any active mount.

It could even be a KASSERT(), but that would break use in ddb.  Perhaps
there are already many KASSERT()s in utility functions that break use
in ddb.

Here is my old fix for a similar bug in the tty driver:

X Index: tty.h
X ===================================================================
X --- tty.h	(revision 309660)
X +++ tty.h	(working copy)
X @@ -205,7 +215,8 @@
X  #define	tty_opened(tp)		((tp)->t_flags & TF_OPENED)
X  #define	tty_gone(tp)		((tp)->t_flags & TF_GONE)
X  #define	tty_softc(tp)		((tp)->t_devswsoftc)
X -#define	tty_devname(tp)		devtoname((tp)->t_dev)
X +#define	tty_devname(tp)		((tp)->t_dev == NULL ? "amnesiac" : \
X +				    devtoname((tp)->t_dev))
X 
X  /* Status line printing. */
X  void	tty_info(struct tty *tp);

In the tty driver, the pointer can be null when the device is going away,
but the ddb command "show [all] tty" accesses the tty until it has
completely gone away.  Perhaps similarly for vnodes -- the state of an
object under construction or destruction is especially interesting.

> After looking at this once again I agree.  Looks like some kind of bug
> specific to my sources at that point of time.  Backed off.

Thanks.  This also fixes a style bug (extra blank line).

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20161213091945.V1082>