Date:      Wed, 25 Aug 2004 15:45:12 -0600
From:      Scott Long <scottl@freebsd.org>
To:        Poul-Henning Kamp <phk@phk.freebsd.dk>
Cc:        arch@freebsd.org
Subject:   Re: potential re change for 5.3?
Message-ID:  <412D0868.9060203@freebsd.org>
In-Reply-To: <43041.1093469620@critter.freebsd.dk>
References:  <43041.1093469620@critter.freebsd.dk>

[moving this to a more appropriate list]

Poul-Henning Kamp wrote:

> In message <412D00BE.5030406@freebsd.org>, Scott Long writes:
> 
> 
>>>I'm not sure I understand the question in the first place, sorry...
>>>
>>
>>Let me reiterate.  You should not sleep in the bio path. 
>>tsleep/msleep/malloc(WAITOK)/etc should not happen because they
>>put the entire g_down thread to sleep and block further I/O.
>>Sleep mutexes are kind of a stretch but we seem to be lucky so far.
>>Contrast that to pre-GEOM days where I/O was dispatched directly
>>from the process that initiated it, so sleeping wasn't a horrible
>>thing if done with care.
> 
> 
> There are many things in this casserole, and the one thing I didn't
> see in any driver when I started was "sleeping ... done with care".
> 
> If, for instance, the process was the kernel trying to free memory
> resources and you slept, the system wedged solid; nobody checked
> for that.
> 

Well, there are certainly many marginal I/O drivers hanging around,
but the ones that are written with care make sure that they allocate
their resources up front and don't allow sleeping in either direction.

> 
>>If there was one thing I could change
>>about GEOM, it would be to allow direct dispatch up and down.
>>Don't get me wrong, I understand the usefulness of decoupling
>>the I/O path, especially when it comes to locking, but it does
>>have some down-sides.
> 
> 
>>Incidentally, the 'no blocking in the i/o path' thing is why busdma
>>is the way it is with deferred callbacks.  If we didn't have it,
>>PHK's mutex on GEOM would be triggering all the time under heavy
>>load with bounce buffers.
> 
> 
> And presumably busdma would want to sleep in memory(-space) in some
> shape or form ?  Memory which would only become available as other
> I/O requests were completed freeing up the resources in question ?
> 

No, the case I'm talking about is the NetBSD behaviour of sleeping in
bus_dmamap_load() when the bounce pool is empty.

> The issue isn't solved by allowing sleeping; allowing it would
> open us to the case where the driver goes to sleep on a large read
> which needs to allocate some memory resource which is not available,
> thereby stalling the subsequent write requests in the queue which
> upon completion would make that memory resource available.
> 
> The current situation is not ideal.  What happens is that the request
> faults with ENOMEM in the driver and is retried in GEOM above the
> geom_disk class; provided the driver uses disksort(), other requests
> on the queue get a chance first.  (There should actually be a
> flag which kicks the queue into FIFO mode on ENOMEM errors, reset
> on empty queue.)
> 
> So while it is not ideal and not optimal, it actually works in both
> the common and the extreme case (physmem=48m, make -j 12 buildworld,
> for instance).
> 

We are still at risk of drivers sleeping on mutexes and blocking g_down
from processing further I/O.  This isn't very apparent on a single
controller/single disk configuration, but toss in multiple controllers
and you have the real potential for priority inversions in the I/O path.

> There are many unknowns still, the entire "make bio scatter/gather
> and map/unmapped" thing for instance throws much of this up in the
> air again, so I'm not very keen to apply workarounds at this time.
> 
> The current rules are very restrictive, but they work.  Once we get
> further down the road, we may find ways to relax them that do not
> compromise the desirable features of our I/O path.  In the meantime
> I'd prefer to keep the handcuffs on, until we know for sure what
> the better way is.
> 
> 

I'm not suggesting anything different, just making a note of something
that might be desirable in the future.  In a way, I see GEOM as having
the potential to be like Netgraph, where it intercepts the operations
that it wants to process through its framework and lets the ones that
it doesn't pass directly through, without decoupling through extra
kernel threads.  But that's only one possible strategy.  Introducing
the concept of an I/O scheduler that spawns KSEs to handle individual
I/O requests is another possibility.

Scott


