From: Matthew Dillon
Date: Sat, 19 Dec 1998 01:53:06 -0800 (PST)
To: Don Lewis
Cc: freebsd-current@FreeBSD.ORG
Subject: Re: asleep()/await(), M_AWAIT, etc...
Message-Id: <199812190953.BAA07138@apollo.backplane.com>
References: <199812190844.AAA11936@salsa.gv.tsc.tdk.com>

:On Dec 17, 12:05am, Matthew Dillon wrote:
:} Subject: asleep()/await(), M_AWAIT, etc...
:
:} We add an await() kernel function. This function initiates any timeout
:} and puts the process to sleep, but only if it is still on a sleep queue.
:} If someone (e.g. an interrupt) wakes up the sleep address after the
:} process calls asleep() but before it calls await(), the slpque is
:} cleared and the await() winds up being a NOP.
:
:How likely is this to happen if the process doesn't go to sleep for some
:other reason in between the asleep() and the await()? The CPU can execute
:a *lot* of code in the time it takes for physical I/O to happen.

Well, the idea is for asleep() not to interfere with a normal sleep.
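The asleep()/await() contract quoted above can be modeled in a few lines of user-space C. This is a hypothetical sketch under stated assumptions, not the kernel implementation: struct proc, the fixed-size slpque array, NPROC and the blocks counter are illustration-only names, there is no real scheduler or timeout, and await() merely reports whether the process would have gone to sleep.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical toy model of asleep()/await(); not the FreeBSD kernel code.
 * No real blocking or timeout happens here.
 */
struct proc {
    const void *wchan;   /* address we are queued on, NULL if not queued */
    int         blocks;  /* how many times await() would really have slept */
};

#define NPROC 4
static struct proc *slpque[NPROC];   /* toy sleep queue */

/* Queue p on chan without blocking; any earlier asleep() is forgotten. */
static void asleep(struct proc *p, const void *chan)
{
    p->wchan = chan;
    for (int i = 0; i < NPROC; i++)
        if (slpque[i] == p)
            return;                  /* already queued: just re-aim it */
    for (int i = 0; i < NPROC; i++)
        if (slpque[i] == NULL) {
            slpque[i] = p;
            return;
        }
    assert(0 && "toy slpque is full");
}

/* Dequeue every process waiting on chan (e.g. called from an interrupt). */
static void wakeup(const void *chan)
{
    for (int i = 0; i < NPROC; i++)
        if (slpque[i] != NULL && slpque[i]->wchan == chan) {
            slpque[i]->wchan = NULL;
            slpque[i] = NULL;
        }
}

/*
 * Sleep only if p is still on the sleep queue.  In this model sleeping is
 * just counted; returns true if the process would have slept, false if
 * the event already fired and await() is a NOP.
 */
static bool await(struct proc *p)
{
    if (p->wchan == NULL)
        return false;
    p->blocks++;
    return true;
}
```

In this model a wakeup() that lands between asleep() and await() makes await() return false at once, and a second asleep() re-aims the process, so the original condition is no longer what await() waits for.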
If a process does an asleep() and then, for some reason, does a normal
sleep or another asleep() without waiting for the prior event to occur,
the original asleep() condition is lost. A later await() that, code-wise,
was expecting to wait for the condition earmarked by the original asleep()
will not wait for it; it returns immediately, and the caller simply
retries at once. This shouldn't cause a problem, though.

The chance of a condition being signalled after an asleep() but before
the associated await(), assuming no blocking in between, is not very high,
but I expect it would happen under normal operating conditions maybe once
in every 5000 or so uses. The situation becomes more interesting with SMP,
especially once we start allowing all N processors to enter supervisor
mode and run mainstream supervisor code simultaneously.

It should be noted that event interlocks can be done very easily with
asleep()/await() without having to mess with the ipl mask. Since the ipl
mask doesn't work when more than one CPU is allowed to run in supervisor
mode at a time, it is just as well that another mechanism exists.

:} The purpose of the new routines is to allow blocking conditions to
:} propagate up a subroutine chain and get handled at a higher level rather
:} than at a lower level in those areas of code that cannot afford to
:} leave exclusive locks sitting around. For example, if bread() blocks
:} waiting for a low level disk I/O on a block device, the vnode remains
:} locked throughout, which badly mars potential parallelism when multiple
:} programs are accessing the same file. There is no reason to leave the
:} high level vnode locked while bringing a page into the VM buffer cache!
:
:What happens if some other process decides to truncate the file while
:another process is in the middle of paging in a piece of it?
:If there is no reason to care about this sort of thing, then there is no
:reason to hold the lock across the bread(), which would probably be a simple

Well, in this particular case we don't care, because it isn't the pagein
into the process's VM space that we are waiting on; it's the bringing of
the page from the underlying block device into the filesystem cache, which
is independent of the overlaid filesystem structure and was queued to the
disk device on the original attempt. In the case of a truncate, this
higher level operation will not affect the lower level I/O in progress
(or, if it does abort it, will wake up anybody waiting for that page
anyway). The wakeup occurs and the original requesting task retries its
VM fault. On this attempt it notices that the file has been truncated and
does the right thing. Effectively we are retrying the operation from
scratch, so the fact that the truncate occurred is handled properly.

Another indirect use for asleep() would be to unwind locks when an inner
lock cannot be obtained and to then retry the entire sequence later, when
the inner lock 'might' become attainable. You do this by asleep()ing on
the event of the inner lock getting unlocked, then popping back through
the call stack and unwinding the locks you were able to get, then
sleeping (calling await()) at the top level (holding no locks) and
retrying when you wake up again. This wouldn't work very well for complex
locking (4 or more levels), but I would guess that it would work quite
nicely for the 2-layer locking that we typically do in the kernel.

:} allocation fails would be able to unwind the lock(s), await(), and retry.
:} This is something the current code cannot do at all.
:
:Most things that allocate memory want to scribble on it right after they
:allocate it. Using M_AWAIT would take a fair amount of rewriting. You
:can already do something similar without M_AWAIT by using M_NOWAIT.
:If that fails, unwind the lock, use M_WAITOK, and relock the object. However,
:it would probably be cleaner to just do MALLOC(..., M_WAITOK) before
:grabbing the lock, if possible.

The point here is that if you cannot afford to block in the procedure
that is doing the memory allocation, you may be able to block in a higher
level procedure. M_NOWAIT and M_WAITOK cannot cover that situation at
all. M_AWAIT (which is like M_NOWAIT, except that it calls asleep() as
well as returning NULL) *can*. The only implementation requirement is that
the procedure call chain being implemented with asleep() understand a
temporary failure condition and do the right thing with it (eventually
await() and retry from the top level).

:There may be cases where this is not possible. For example, the amount of
:the memory you need to allocate depends on the object that you have locked.

Oh, certainly, but asleep()/await() do not have to be implemented
everywhere, only in those places where it makes sense to. We aren't
removing any of the prior functionality; we are adding new functionality
to allow us to solve deadlock situations that occur with the old
functionality.

:If you have the object unlocked while the memory is being allocated, another
:process may touch the object while it is unlocked and you'll end up allocating
:the wrong amount of memory. The only scheme that works in this case is
:locking the object first and leaving it locked across MALLOC(..., M_WAITOK).
:
:NOTE: some of the softupdates panics before 3.0-RELEASE were caused by
:vnodes inadvertently being unlocked and then relocked in some low level
:routines, which allowed files to be fiddled with by one process while
:another process thought it had exclusive access.

I think you missed the primary point of asleep()/await(). The idea is
that you pop back through subroutine levels, undoing the entire operation
(or a good portion of it), then 'retry later'. What you describe is
precisely the already-existing situation that asleep() and await() can be
used to fix. This might sound expensive, but most of the places where we
would need to use asleep()/await() would not actually have to pop back
more than a few subroutine levels to be effective.

-Matt
Matthew Dillon
Engineering, HiWay Technologies, Inc. & BEST Internet
Communications & God knows what else.
(Please include original email in any response)