From owner-freebsd-hackers@FreeBSD.ORG  Fri May 20 19:50:53 2005
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 76DE716A4CE; Fri, 20 May 2005 19:50:53 +0000 (GMT)
Received: from swip.net (mailfe07.swip.net [212.247.154.193])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 9677C43D39; Fri, 20 May 2005 19:50:50 +0000 (GMT)
	(envelope-from hselasky@c2i.net)
X-T2-Posting-ID: Y1QAsIk9O44SO+J/q9KNyQ==
Received: from mp-217-198-20.daxnet.no ([193.217.198.20] verified)
  by mailfe07.swip.net (CommuniGate Pro SMTP 4.3c5)
  with ESMTP id 174860987; Fri, 20 May 2005 21:50:48 +0200
From: Hans Petter Selasky <hselasky@c2i.net>
To: Brian Fundakowski Feldman <green@freebsd.org>
Date: Fri, 20 May 2005 21:51:34 +0200
User-Agent: KMail/1.7
References: <200505201340.36176.hselasky@c2i.net>
	<428E12B0.9020208@samsco.org> <20050520173236.GG1201@green.homeunix.org>
In-Reply-To: <20050520173236.GG1201@green.homeunix.org>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200505202151.35831.hselasky@c2i.net>
cc: Scott Long <scottl@samsco.org>
cc: hackers@freebsd.org
Subject: Re: problems with new the "contigmalloc" routine
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
Reply-To: hselasky@c2i.net
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 20 May 2005 19:50:53 -0000

On Friday 20 May 2005 19:32, Brian Fundakowski Feldman wrote:
> On Fri, May 20, 2005 at 10:39:12AM -0600, Scott Long wrote:
> > Brian Fundakowski Feldman wrote:
> > >On Fri, May 20, 2005 at 01:40:35PM +0200, Hans Petter Selasky wrote:
> > >>Hi,
> > >>
> > >>I just hit some problems with the new "contigmalloc()" routine in
> > >>FreeBSD-6-current, which is used by "bus_dmamem_alloc()".
> > >>
> > >>Firstly it locks Giant, which cause locking order reversals when
> > >>allocating memory from certain contexts. Secondly it sleeps even if
> > >> flag M_NOWAIT is passed. Thirdly it does not support flag M_ZERO.
> > >
> > >Read the documentation.  It supports M_ZERO just fine, and it does _not_
> > >support M_NOWAIT.

OK, the M_ZERO support has been there for a while. Maybe it was before M_ZERO 
support was added, but I remember I used "contigmalloc()" to allocate memory 
and got dirty pages. Mostly the pages were zeroed, but suddently not, and I 
had to use bzero. How much has the zeroing mechanism been tested?

> > >
> > >>Can someone explain why these changes have been made?
> > >
> > >Changes?

Maybe a WITNESS_WARN() was added that wasn't there before. At least it didn't 
complain earlier.

> > >
> > >>Why doesn't "contigmalloc()" give a warning when an invalid flag is
> > >>passed?
> > >
> > >The kernel would be significantly larger and almost certainly slower
> > >if it were to double-check that everywhere any bit fields are used
> > >that flags that are not defined to have any behavior are unset.
> > >

It might be a pitfall when porting software. Suddenly a flag does not work 
like expected, undermining the logic of the software. Or what if someone  
passes the wrong enum like "BUS_DMA_ZERO" instead of "M_ZERO". Isn't there 
any KASSERT_WARN(!(mflags & (~(M_XXX)))) ?

> > >>Are these bugs in "contigmalloc()"?
> > >
> > >No.
> > >
> > >>Or does the code using "contigmalloc()" have to be changed?
> > >
> > >Yes.  The i386 bus_dmamem_alloc(), for example, calls it with M_NOWAIT
> > >which is not a valid flag.  The contigmalloc(9) page is not entirely
> > >truthful about the fact that it doesn't sleep at all -- it calls those
> > >routines which can.  They can both be documented to require no locks
> > >to be held when being called, except for M_NOWAIT specifically in the
> > >one-page-or-less allocation case.
> >
> > The amount of colateral damage that has been inflicted on
> > bus_dmamem_alloc by your changes to contigmalloc is truly impressive.
> > Since I've spent the better part of 8 months cleaning it up, please
> > let me know what else needs to be done so that there are no more
> > surprises.  And no, this is not my happy face.
>
> 1.
> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/vm/vm_contig.c.diff?r1=1.34&r
>2=1.41&f=h 2. Notice the lack of semantical changes.
> 3. Notice that M_ZERO does the same thing as ever.
> 4. Notice that the same goes for M_WAITOK/M_NOWAIT: ignored for anything
>    that needs more than one page.
> 5. Negative points for Hans for posting corrupt backtraces and making
>    me go through a lot more work than should be necessary.
> 6. Negative points to Hans for posting backtraces to an obviously-
>    molested tree (src/sys/dev/usb2/_*?!?).

Sorry about that, but I've got my own USB driver located in /sys/dev/usb2, but 
the top of the backtrace should be right.

> 7. Negative points to Scott for attempting to imply that for some
>    reason it's okay to call contigmalloc(9) in contexts that it has
>    never been okay to call it in.
> 8.
> http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2003-11/1488.html
> 9. Negative points to Scott for failure to spend a couple minutes of
> research and find out these are known bugs.
> 10.
> http://lists.freebsd.org/pipermail/freebsd-current/2005-January/045105.html
> 11. Negative points to Scott for failure to document system semantics when
> it is obvious they should have been clarified according to Scott's own
> notes on the matter.
> 12. Mega positive points to Hans for actually asking the right questions.
> 13. Go fix or document the brokenness in busdma, or implement the features
>     in contigmalloc(9) that were imagined to exist.
>

Well, the problem with blocking behaviour is that processes sleeping needs to 
be waited for:

some_read():
============

lock();

in_read = 1;

unlock();

data = contigmalloc(...);

lock();

if(detaching)
{
 detaching = 0;
 wakeup(&detaching);
 goto done;
}

hw_queue(data);

msleep(data);

if(detaching)
{
 detaching = 0;
 wakeup(&detaching);
 goto done;
}

unlock();

uiomove(data);

lock();

if(detaching)
{
 detaching = 0;
 wakeup(&detaching);
 goto done;
}

done:

in_read = 0;

unlock();

some_detach():
==============

lock();

if(in_read)
{
 wakeup(data);
 detaching = 1;
 msleep(&detaching);
}

unlock();

It would be very nice if all memory allocation functions and "uiomove()" could 
support non-blocking mode, for example by passing a flag. If you look at 
device drivers like (/sys/dev/usb/ugen.c) you will see that the ones that 
wrote it didn't expect "uiomove" to sleep. And "uiomove" does not warn 
because it is called from a Giant locked context. I think you will find the 
same situation with "bus_dmamem_alloc()", where "BUS_DMA_NOWAIT" is passed as 
an argument because the code does not handle it when the call sleeps.


Non-blocking mode has got to be supported. Else you get a nightmare rewriting 
the code to support blocking mode.


Can anyone explain why "uiomove()" has to sleep, and why there is no 
non-blocking "uiomove()"?


Will M_NOWAIT support will be added to "contigmalloc()"?


--HPS