Date: Sat, 19 Dec 1998 13:11:24 -0800 (PST)
From: Matthew Dillon
Message-Id: <199812192111.NAA20772@apollo.backplane.com>
To: Archie Cobbs
Cc: freebsd-current@FreeBSD.ORG, "Jordan K. Hubbard", julian@whistle.com (Julian Elischer), Luoqi Chen
Subject: Re: asleep()/await(), M_AWAIT, etc...
References: <199812192000.MAA01212@bubba.whistle.com>

:>
:> It's something we could work on from the bottom-up.  We
:> would not have to change everything at once.  For example, giving
:
:Well, since the amount of code to implement it is not that much,
:and it allows beneficial changes to be made incrementally and as
:appropriate, I don't see any reason why it shouldn't be added.
:
:-Archie
:___________________________________________________________________________
:Archie Cobbs   *   Whistle Communications, Inc.  *   http://www.whistle.com

    I'm kinda at odds with myself in regards to my own feelings :-)

    On the one hand, I want to start moving on a new framework to solve
    deadlock problems and SMP issues.  On the other hand, we have to make
    3.0.1 a rock solid release, which means no major changes to the core.
    But then, I also really believe that Luoqi's fixes to vfs_bio.c need
    to go in ASAP.  Yesterday, if possible!

    What I would like to do is start committing asleep/await along with
    an 'inactive' framework of support around those two functions, i.e.
    start adding the capability to low level routines, such as an M_AWAIT
    flag for malloc, but not commit anything that *uses* those
    capabilities until after the 3.0.1 release.  This would help my own
    testing greatly, because I would be able to commit about 70% of the
    stuff I have dangling in my cvs tree without incurring any operational
    changes to other people's kernels.

    Then, after the 3.0.1 release, we can start to make real progress on
    deadlock and SMP issues throughout the system.  The same effort
    required to clean up potential deadlocks will also go a long way
    towards helping us move the SMP locks deeper into the kernel and
    towards preventing cascade locking failures.

    For example, if you have a procedure call situation a->b->c where 'c'
    may currently block, it is not possible to use a simple_lock in b
    (much less a) that surrounds c.  But if c has a non-blocking
    capability, it can call asleep() instead of tsleep() and pop back up
    to b.  b can detect the situation, unlock its simple_lock and block,
    then relock and retry.  If a is a vnode op that uses a normal (but SMP
    capable) exclusive lock on the vnode it operates under, and b is able
    to use a simple_lock for its critical code section (say a is
    manipulating the vnode and b is manipulating the buffer cache or page
    tables), then that is all we need to make syscall 'a' operate
    concurrently in supervisor mode from multiple cpus.

    Another big area where this has a major effect is the TCP stack:  Use
    normal (but SMP capable) exclusive locks on individual TCBs, and use
    simple_locks around core route table and mbuf functions that could
    previously block in a lower-level allocation but now don't, because
    they use M_AWAIT (and thus can release the simple_lock prior to
    blocking).
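
    The a->b->c retry pattern described above can be sketched as a
    single-threaded userspace model.  This is only an illustration under
    stated assumptions: sim_lock, sim_asleep(), sim_await(), c_alloc()
    and b_operation() are hypothetical stand-ins invented for this
    sketch, not the kernel's simple_lock, asleep() or await(), and the
    "wakeup" is faked by refilling a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; NOT the kernel API. */
struct sim_lock { bool held; };

static void sim_lock_acquire(struct sim_lock *l) { assert(!l->held); l->held = true; }
static void sim_lock_release(struct sim_lock *l) { assert(l->held); l->held = false; }

static struct sim_lock b_lock;      /* models b's simple_lock */
static const void *pending_ident;   /* wait channel queued by sim_asleep(), or NULL */
static int free_items;              /* resource pool that c allocates from */
static int await_calls;             /* how many times we "blocked" */
static int blocked_with_lock_held;  /* counts violations: blocking under the lock */

/* Models asleep(): queue a sleep on ident, but return without blocking. */
static void sim_asleep(const void *ident)
{
    pending_ident = ident;
}

/*
 * Models await(): block until the queued event fires.  In this
 * single-threaded model the wakeup is simulated by pretending another
 * context returned one item to the pool.
 */
static void sim_await(void)
{
    if (b_lock.held)
        blocked_with_lock_held++;
    await_calls++;
    if (pending_ident != NULL) {
        free_items++;
        pending_ident = NULL;
    }
}

/* c: never blocks.  On shortage it queues a sleep and reports failure. */
static bool c_alloc(void)
{
    if (free_items == 0) {
        sim_asleep(&free_items);
        return false;
    }
    free_items--;
    return true;
}

/* b: holds the lock around c, drops it before blocking, then retries. */
static void b_operation(void)
{
    sim_lock_acquire(&b_lock);
    while (!c_alloc()) {
        sim_lock_release(&b_lock);   /* never block with the lock held */
        sim_await();
        sim_lock_acquire(&b_lock);   /* relock and retry */
    }
    /* critical-section work on the allocated item would go here */
    sim_lock_release(&b_lock);
}
```

    Calling b_operation() with an empty pool takes one trip through the
    unlock/await/relock loop and never blocks while the lock is held,
    which is the property that lets b wrap c in a simple_lock.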

    And, finally, probably the most significant area of effect in a
    post-3.0.1 commit of code using the new capability would be in read()
    and page faults on VFS nodes.  This is especially important when you
    have a number of processes accessing the same file, shared library, or
    binary image.  Right now, if I understand the code correctly, a blocked
    read or page fault leaves the associated file vnode locked, which
    effectively serializes any disk I/O on that file and, worse, blocks
    access to elements of the file that are already in the cache.

    Post-3.0.1 we would be able to issue the underlying disk I/O to the
    buffer cache and asleep() on the necessary page, then pop back up and
    unlock the vnode before blocking.  This would allow other processes to
    do I/O on the same vnode, at least for concurrent read()'s on the same
    file by different processes.  Even better, we could use a secondary
    fcntl-lock-like subsystem attached to the vnode to guarantee read()
    and write() atomicity, so that we would not have to leave an exclusive
    lock on the vnode itself for the duration of the syscall, giving us
    100% concurrency on file I/O with SMP.

                                        -Matt

    Matthew Dillon  Engineering, HiWay Technologies, Inc. & BEST Internet
                    Communications & God knows what else.
    (Please include original email in any response)
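
    A rough userspace model of the read() case above, showing why
    unlocking the vnode before blocking matters.  Everything here is
    an assumption made for illustration: struct sim_vnode,
    sim_read_page() and sim_io_complete() are hypothetical names, and
    the point where a real implementation would asleep() on the page
    is only marked by a comment.

```c
#include <assert.h>
#include <stdbool.h>

#define NPAGES 4

/* Hypothetical model of a vnode with a tiny page cache. */
struct sim_vnode {
    bool locked;               /* models the exclusive vnode lock */
    bool cached[NPAGES];       /* page is already in the cache */
    bool io_pending[NPAGES];   /* disk I/O has been issued for the page */
};

enum read_status { READ_DONE, READ_WOULD_BLOCK };

/*
 * Try to read one page.  The vnode lock is held only while the cache
 * is inspected and the I/O is issued; it is always released before
 * returning, so the caller never blocks with the vnode locked.
 */
static enum read_status sim_read_page(struct sim_vnode *vp, int page)
{
    assert(page >= 0 && page < NPAGES);
    assert(!vp->locked);
    vp->locked = true;

    if (vp->cached[page]) {
        vp->locked = false;
        return READ_DONE;
    }

    /* Issue the disk I/O; a sleep on the page would be queued here. */
    vp->io_pending[page] = true;

    /* Unlock the vnode BEFORE the caller blocks waiting for the page. */
    vp->locked = false;
    return READ_WOULD_BLOCK;
}

/* Models I/O completion, i.e. the wakeup on the page. */
static void sim_io_complete(struct sim_vnode *vp, int page)
{
    assert(page >= 0 && page < NPAGES);
    if (vp->io_pending[page]) {
        vp->io_pending[page] = false;
        vp->cached[page] = true;
    }
}
```

    In this model a reader that misses on page 1 leaves the vnode
    unlocked while its I/O is outstanding, so a second reader can still
    hit page 0 in the cache; once sim_io_complete() runs, a retry of
    page 1 succeeds.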