Date:      Sat, 19 Dec 1998 13:11:24 -0800 (PST)
From:      Matthew Dillon <dillon@apollo.backplane.com>
To:        Archie Cobbs <archie@whistle.com>
Cc:        freebsd-current@FreeBSD.ORG, "Jordan K. Hubbard" <jkh@zippy.cdrom.com>, julian@whistle.com (Julian Elischer), Luoqi Chen <luoqi@watermarkgroup.com>
Subject:   Re: asleep()/await(), M_AWAIT, etc...
Message-ID:  <199812192111.NAA20772@apollo.backplane.com>
References:   <199812192000.MAA01212@bubba.whistle.com>

:> 
:>     It's something we could work on from the bottom-up.  We
:>     would not have to change everything at once.  For example, giving
:
:Well, since the amount of code to implement it is not that much,
:and it allows beneficial changes to be made incrementally and as
:appropriate, I don't see any reason why it shouldn't be added.
:
:-Archie
:___________________________________________________________________________
:Archie Cobbs   *   Whistle Communications, Inc.  *   http://www.whistle.com

    I'm kinda at odds with myself in regards to my own feelings :-)  On the
    one hand I want to start moving on a new framework to solve deadlock
    problems and SMP issues.  On the other hand, we have to make 3.0.1 a rock
    solid release which means no major changes to the core.  But, then also,
    I really believe that Luoqi's fixes to vfs_bio.c need to go in ASAP. 
    Yesterday, if possible! 

    What I would like to do is start committing asleep()/await() along with
    an 'inactive' framework of support around those two functions, i.e.
    start adding the capability to low-level routines (such as an M_AWAIT
    flag for malloc), but not actually commit anything that *uses* those
    capabilities until after the 3.0.1 release.  This would help my own
    testing greatly because I would be able to commit about 70% of the
    stuff I have dangling in my cvs tree without incurring any operational
    changes to other people's kernels.

    Then, after the 3.0.1 release we can start to make real progress on
    deadlock and SMP issues throughout the system.  The same effort required 
    to clean up potential deadlocks will also go a long way towards helping
    us move the SMP locks deeper into the kernel and to prevent cascade locking
    failures from being able to occur.

    For example, if you have a procedure call situation a->b->c  where 'c' 
    may currently block, it is not possible to use a simple_lock in b 
    (much less a) that surrounds c.  But if c has a non-blocking capability
    it can call asleep() instead of tsleep() and pop back up to b.  b can 
    detect the situation, unlock its simple_lock and block, then relock and
    retry.  If a is a vnode op and uses a normal (but SMP capable) exclusive
    lock on the vnode it operates under, and b is able to use a simple_lock
    for its critical code section (say a is manipulating the vnode and b is
    manipulating the buffer cache or page tables), then that is all we need 
    to make syscall 'a' operate concurrently in supervisor mode from multiple
    cpus.
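
    The b->c half of that call chain can be modelled in user space.  What
    follows is a minimal single-threaded sketch, not kernel code: every
    toy_* name is hypothetical, "blocking" is only counted rather than
    performed, and the wakeup is simulated inside toy_await().  It shows c
    queueing a sleep instead of blocking, and b dropping its simple_lock
    before it actually waits, then relocking and retrying.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy single-threaded model; every toy_* name is hypothetical. */

static int toy_simple_locks_held;   /* simple_locks currently held */
static const void *toy_wait_chan;   /* channel queued by toy_asleep(), if any */
static bool toy_resource_free;      /* what c needs in order to proceed */
int toy_sleeps;                     /* number of times we "blocked" */
int toy_sleeps_with_lock;           /* ...of which a simple_lock was held (a bug) */

static void toy_simple_lock(void)   { toy_simple_locks_held++; }
static void toy_simple_unlock(void)
{
    assert(toy_simple_locks_held > 0);
    toy_simple_locks_held--;
}

/* Queue ourselves on a wait channel, but return immediately. */
static void toy_asleep(const void *chan) { toy_wait_chan = chan; }

/* Block on whatever toy_asleep() queued.  The model "blocks" by counting
 * the sleep and then pretending a wakeup freed the resource. */
static void toy_await(void)
{
    if (toy_wait_chan == NULL)
        return;                     /* nothing queued: no-op */
    toy_sleeps++;
    if (toy_simple_locks_held > 0)
        toy_sleeps_with_lock++;
    toy_resource_free = true;       /* simulated wakeup by the resource owner */
    toy_wait_chan = NULL;
}

/* 'c': where it would have blocked, it queues the sleep and fails instead. */
static int toy_c(void)
{
    if (!toy_resource_free) {
        toy_asleep(&toy_resource_free);
        return -1;                  /* "would block", sleep already queued */
    }
    toy_resource_free = false;      /* consume the resource */
    return 0;
}

/* 'b': holds a simple_lock around c, drops it before blocking, retries.
 * Returns the number of attempts it took. */
int toy_b(bool resource_initially_free)
{
    int attempts = 0;

    toy_resource_free = resource_initially_free;
    for (;;) {
        toy_simple_lock();
        int error = toy_c();
        toy_simple_unlock();
        attempts++;
        if (error == 0)
            return attempts;
        toy_await();                /* block with no simple_lock held */
    }
}
```

    Calling toy_b(false) takes two attempts with exactly one sleep, and the
    sleep happens with no simple_lock held; toy_b(true) succeeds on the
    first attempt without sleeping at all.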

    Another big area where this has a major effect is the TCP stack:  Use 
    normal (but SMP capable) exclusive locks on individual TCBs, and use 
    simple_locks around core route table and mbuf functions that could
    previously block in a lower-level allocation but now don't because they
    use M_AWAIT (and thus can release the simple_lock prior to blocking).
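
    To make the M_AWAIT idea concrete, here is a rough user-space sketch of
    an allocator that never blocks by itself.  The flag names, the
    two-buffer pool, and the toy_route_alloc() caller are all assumptions
    made up for illustration; they only mimic the shape of the proposal.
    With TOY_M_AWAIT, a failed allocation queues a sleep and returns NULL,
    so the caller can release its simple_lock first and block afterwards.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flags for a toy allocator; they mimic, but are not, the kernel's. */
enum { TOY_M_NOWAIT = 1, TOY_M_AWAIT = 2 };

static char toy_pool[2][64];        /* two fixed-size buffers */
static int toy_pool_used;           /* buffers handed out so far */
static const void *toy_queued_chan; /* wait channel queued by the allocator */
static int toy_route_lock_held;     /* stand-in for a simple_lock */

/* Returns a buffer, or NULL if the pool is empty.  With TOY_M_AWAIT the
 * allocator also queues a sleep on the pool so the caller can block later,
 * after dropping its own locks.  It never blocks itself. */
void *toy_malloc(int flags)
{
    if (toy_pool_used < 2)
        return toy_pool[toy_pool_used++];
    if (flags & TOY_M_AWAIT)
        toy_queued_chan = toy_pool;
    return NULL;
}

/* Simulated await(): returns 1 if a queued sleep was taken, 0 if nothing
 * was queued.  The "sleep" pretends some other holder freed a buffer. */
int toy_await_pool(void)
{
    assert(!toy_route_lock_held);   /* never block with the lock held */
    if (toy_queued_chan == NULL)
        return 0;
    toy_pool_used--;                /* simulated free + wakeup */
    toy_queued_chan = NULL;
    return 1;
}

/* Caller in the style of a route-table routine: allocate under the lock,
 * and on failure unlock, block, then relock and retry. */
void *toy_route_alloc(int *slept)
{
    *slept = 0;
    for (;;) {
        toy_route_lock_held = 1;
        void *p = toy_malloc(TOY_M_AWAIT);
        /* ...on success the new entry would be linked in here... */
        toy_route_lock_held = 0;
        if (p != NULL)
            return p;
        *slept += toy_await_pool();
    }
}
```

    Note the difference between the two flags in this sketch: a failed
    TOY_M_NOWAIT allocation leaves nothing queued, so a later
    toy_await_pool() is a no-op, whereas a failed TOY_M_AWAIT allocation
    leaves a sleep queued for the caller to take once its lock is released.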

    And, finally, probably the most significant area of effect in a post-3.0.1
    commit of code using the new capability would be in read(), and
    page faults on VFS nodes - especially important when you have a number
    of processes accessing the same file, shared library, or binary image.
    Right now if I understand the code correctly, a blocked read or page fault
    leaves the associated file vnode locked, which effectively serializes any
    disk I/O on that file and, worse, blocks access to elements of the file
    that are already in the cache.  Post-3.0.1 we would be able to issue the
    underlying disk I/O to the buffer cache and asleep() on the necessary
    page, then pop back up and unlock the vnode before blocking.  This would
    allow other processes to do I/O on the same vnode, at least for concurrent
    read()'s on the same file by different processes.  Even better, we could
    use a secondary fcntl-lock-like subsystem attached to the vnode to
    guarantee read() and write() atomicity, allowing us to not have to leave an
    exclusive lock on the vnode itself for the duration of the syscall,
    giving us 100% concurrency on file I/O with SMP.
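
    The difference between the two read paths can be sketched with a toy
    vnode.  This is an illustrative single-threaded model under stated
    assumptions: the struct, the status codes, and the unlock_before_wait
    switch are hypothetical, and a busy vnode is reported to the caller
    rather than blocked on.  It contrasts holding the vnode lock across the
    wait with unlocking it once the I/O is issued and the sleep is queued.

```c
#include <assert.h>
#include <stdbool.h>

enum { TOY_NPAGES = 4 };
enum toy_status { TOY_DONE, TOY_MUST_WAIT, TOY_VNODE_BUSY };

/* Hypothetical stand-in for a file vnode and its cached pages. */
struct toy_vnode {
    bool locked;                    /* exclusive vnode lock */
    bool cached[TOY_NPAGES];        /* page is in the buffer cache */
    bool io_queued[TOY_NPAGES];     /* disk I/O issued, sleep queued on page */
};

/* One read attempt on page pg.  If the page is not cached, issue the I/O
 * and queue a sleep on it.  With unlock_before_wait the vnode lock is
 * dropped before the caller goes off to block; otherwise it stays held
 * for the duration of the wait. */
enum toy_status toy_read(struct toy_vnode *vp, int pg, bool unlock_before_wait)
{
    if (vp->locked)
        return TOY_VNODE_BUSY;      /* a real kernel would block here */
    vp->locked = true;
    if (!vp->cached[pg]) {
        vp->io_queued[pg] = true;   /* issue I/O and queue the sleep */
        if (unlock_before_wait)
            vp->locked = false;     /* pop back up and unlock the vnode */
        return TOY_MUST_WAIT;       /* caller now blocks */
    }
    vp->locked = false;             /* data copied out, unlock */
    return TOY_DONE;
}

/* Simulated I/O completion: the page arrives, and a reader that held the
 * vnode lock across the wait finally lets go of it.  (In this model no
 * one else can hold the lock at that point.) */
void toy_io_done(struct toy_vnode *vp, int pg)
{
    vp->cached[pg] = true;
    vp->io_queued[pg] = false;
    vp->locked = false;
}
```

    With unlock_before_wait false, a second reader asking for an
    already-cached page gets TOY_VNODE_BUSY while the first reader waits on
    disk.  With it true, the same cached read completes immediately.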

						-Matt

    Matthew Dillon  Engineering, HiWay Technologies, Inc. & BEST Internet 
                    Communications & God knows what else.
    <dillon@backplane.com> (Please include original email in any response)    




