From: Don Lewis <gdonl@tsc.tdk.com>
Message-Id: <199812191115.DAA12134@salsa.gv.tsc.tdk.com>
Date: Sat, 19 Dec 1998 03:15:47 -0800
In-Reply-To: Matthew Dillon "Re: asleep()/await(), M_AWAIT, etc..." (Dec 19, 1:53am)
To: Matthew Dillon, Don Lewis
Subject: Re: asleep()/await(), M_AWAIT, etc...
Cc: freebsd-current@FreeBSD.ORG

On Dec 19, 1:53am, Matthew Dillon wrote:
} Subject: Re: asleep()/await(), M_AWAIT, etc...
}
} :What happens if some other process decides to truncate the file while
} :another process is in the middle of paging in a piece of it?  If there
} :is no reason to care about this sort of thing, then there is no reason
} :to hold the lock across the bread(), which would probably be a simple
}
} In the case of a truncate, this higher level operation will not affect
} the lower level I/O in progress (or, if it does abort it, will wake up
} anybody waiting for that page anyway).  The wakeup occurs and the
} original requesting task retries its vm fault.  On this attempt it
} notices that the file has been truncated and does the right thing.
} Effectively we are retrying an operation 'from scratch', so the fact
} that the truncate occurred is handled properly.

It should work OK if you start 'from scratch', but this might require
quite a bit of rewinding.

} Another indirect use for asleep() would be to unwind locks when an inner
} lock cannot be obtained, and then retry the entire sequence later when
} the inner lock 'might' become attainable.  You do this by asleep()ing on
} the event of the inner lock getting unlocked, then popping back through
} the call stack and unwinding the locks you were able to get, then
} sleeping (calling await()) at the top level (holding no locks), and
} retrying when you wake up again.  This wouldn't work very well for
} complex locking (4 or more levels), but I would guess that it would
} work quite nicely for the 2-layer locking that we typically do in the
} kernel.

One case that I thought of could cause you to spin:

	lock object		(succeeds)
	MALLOC(..., M_AWAIT)	(succeeds)
	MALLOC(..., M_AWAIT)	(fails)

After the second MALLOC fails, you'll free the first chunk of memory
allocated and unlock the object.  When you call await(), it will return
immediately, because the memory you just freed is available again.  The
first MALLOC() will then succeed, the second one will fail again, and
the cycle repeats.
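In rough pseudo-C, the spin would look something like this (just a
sketch: lock_object()/unlock_object() and 'struct foo' are made-up
names, and I'm assuming M_AWAIT returns NULL after queueing an asleep()
on the free-memory event, which a later await() waits for):

	struct foo *a, *b;	/* 'struct foo' stands in for whatever
				 * we are really allocating */
	for (;;) {
		lock_object(obj);
		a = MALLOC(sizeof(*a), M_AWAIT);
		if (a == NULL) {		/* not the interesting case */
			unlock_object(obj);
			await();
			continue;
		}
		b = MALLOC(sizeof(*b), M_AWAIT);
		if (b != NULL)
			break;			/* got both; we're done */
		FREE(a);		/* puts the memory back in the pool */
		unlock_object(obj);
		await();		/* returns at once: the memory we
					 * just freed is available again */
		/* around we go: the first MALLOC() reclaims exactly the
		 * memory we freed, and the second fails again */
	}

The trouble is that await() has no way of knowing that the retry needs
more memory than was just freed.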
The way this code is written, it looks like only two levels, but it
really is three by your criteria, and thus is getting into questionable
territory.

} :} allocation fails would be able to unwind the lock(s), await(), and retry.
} :} This is something the current code cannot do at all.
} :
} :Most things that allocate memory want to scribble on it right after they
} :allocate it.  Using M_AWAIT would take a fair amount of rewriting.  You
} :can already do something similar without M_AWAIT by using M_NOWAIT.  If
} :that fails, unwind the lock, use M_WAITOK, and relock the object.  However,
} :it would probably be cleaner to just do MALLOC(..., M_WAITOK) before
} :grabbing the lock, if possible.
}
} The point here is that if you cannot afford to block in the procedure
} that is doing the memory allocation, you may be able to block in a
} higher level procedure.  M_NOWAIT and M_WAITOK cannot cover that
} situation at all.  M_AWAIT (which is like M_NOWAIT, but calls asleep()
} as well as returning NULL) *can*.  The only implementation requirement
} is that the procedure call chain implemented with asleep() understand
} a temporary failure condition and do the right thing with it
} (eventually await() and retry from the top level).

This really only buys you something if the memory allocation is
conditional.  If it is unconditional and you can block in the higher
level procedure, then you could just use M_WAITOK there and pass in the
buffer (or call a procedure that hides the memory allocation, to
satisfy your object-oriented programming sensibilities), but this is
more or less equivalent to doing the allocation at the beginning of the
lower level procedure.

} :NOTE: some of the softupdates panics before 3.0-RELEASE were caused by
}
} I think you missed the primary point of asleep()/await().  The idea
} is that you pop back through subroutine levels, undoing the entire
} operation (or a good portion of it), then 'retry later'.  What you
} describe is precisely the already-existing situation that asleep() and
} await() can be used to fix.  This might sound expensive, but most
} of the places where we would need to use asleep()/await() would not
} actually have to pop back more than a few subroutine levels to be
} effective.

For pathname traversal, if you give up a lock, I think you have to
restart from the beginning.

This scheme definitely changes the way you need to think about locking.
In the traditional scheme, one wants to minimize the time that locks
are held in order to minimize contention, and the correct locking order
must be obeyed in order to avoid deadlocks.  In your scheme, the amount
of work done between acquiring nested locks should be minimized, since
all that work must be undone and then redone if the inner lock can't be
obtained immediately.  One still wants to minimize the amount of time
locks are held, and your scheme does reduce the time the outer locks
are held, but it is contention for the inner locks that is costly.
Deadlocks should not occur due to incorrect locking order, but
performance could suffer if locks are taken in the wrong order.
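Schematically, the unwind-and-retry discipline would look something
like this (again just a sketch: the lock names and trylock() are made
up, and I've left out whatever extra arguments asleep() and await()
actually take):

retry:
	lock(&outer);
	/* ... work that is thrown away if we can't get the inner lock ... */
	if (!trylock(&inner)) {
		asleep(&inner);	/* register interest in inner's release */
		unlock(&outer);	/* unwind everything we hold */
		await();	/* sleep at the top level, holding no locks */
		goto retry;	/* redo the entire sequence from scratch */
	}
	/* ... work under both locks ... */
	unlock(&inner);
	unlock(&outer);

Note that the work above trylock(&inner) is exactly what gets redone on
each retry.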