Date:      Mon, 17 Feb 2003 16:00:37 -0800 (PST)
From:      Matthew Dillon <dillon@apollo.backplane.com>
To:        Bosko Milekic <bmilekic@unixdaemons.com>
Cc:        Andrew Gallatin <gallatin@cs.duke.edu>, freebsd-arch@FreeBSD.ORG
Subject:   Re: mb_alloc cache balancer / garbage collector
Message-ID:  <200302180000.h1I00bvl000432@apollo.backplane.com>
References:  <20030216213552.A63109@unixdaemons.com> <15952.62746.260872.18687@grasshopper.cs.duke.edu> <20030217095842.D64558@unixdaemons.com> <200302171742.h1HHgSOq097182@apollo.backplane.com> <20030217154127.A66206@unixdaemons.com>

:  What the daemon does is replenish the per-CPU caches (if necessary) in
:  one shot without imposing the overhead on the allocation path.  That
:  is, it'll move a bunch of buckets over to the per-CPU caches if they
:  are under-populated; doing that from the main allocation path is
:  theoretically possible but tends to produce high spiking in latency.
:  So what the daemon basically is is a compromise between doing it in
:  the allocation/free path on-the-fly, and doing it from a parallel
:  thread.  Additionally, the daemon will empty part of the global cache
:...

    Hmm.  Well, you can also replenish the per-CPU caches in bulk on the fly.
    You simply pull in more than one buffer and you will reap the same
    overhead benefits in the allocation path.  If you depend on a thread
    to do this then you can create a situation where a chronic buffer shortage
    in the per-CPU cache can occur if the thread doesn't get cpu quickly
    enough, resulting in non-optimal operation.  In other words, while it
    may seem you are saving latency in the critical path (the network trying
    to allocate a buffer), I think you might actually be creating a situation
    where instead of latency you wind up with a critical shortage.
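
    A minimal sketch of the on-the-fly bulk refill described above.  Every
    name in it (struct buf, global_pool, cpu_cache, pool_grab, buf_alloc)
    is hypothetical and is not the mb_alloc interface; the 'locks' counter
    merely stands in for taking the global mutex so that the saving can be
    observed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration only; not the mb_alloc API. */
struct buf {
	struct buf *next;
};

struct global_pool {
	struct buf *head;       /* free list guarded by the global mutex */
	unsigned long locks;    /* counts simulated mutex acquisitions */
};

struct cpu_cache {
	struct buf *head;       /* per-CPU free list, no global lock needed */
	int count;
};

/* Move up to 'want' buffers into the per-CPU cache under one lock hold. */
static int
pool_grab(struct global_pool *gp, struct cpu_cache *cc, int want)
{
	int got = 0;

	assert(want > 0);
	gp->locks++;            /* a real kernel would lock the mutex here */
	while (got < want && gp->head != NULL) {
		struct buf *b = gp->head;

		gp->head = b->next;
		b->next = cc->head;
		cc->head = b;
		got++;
	}
	/* ... and unlock it here */
	cc->count += got;
	return (got);
}

/* Allocation path: refill in bulk only when the per-CPU cache is empty. */
static struct buf *
buf_alloc(struct global_pool *gp, struct cpu_cache *cc, int batch)
{
	struct buf *b;

	if (cc->head == NULL && pool_grab(gp, cc, batch) == 0)
		return (NULL);  /* global pool is exhausted too */
	b = cc->head;
	cc->head = b->next;
	cc->count--;
	return (b);
}
```

    With a batch of 8, the global lock is taken once per 8 allocations
    rather than once per allocation, and no helper thread has to be
    scheduled in time for the refill to happen.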

    I don't think VM interaction is that big a deal.  The VM system has a
    notion of a 'shortage' and a 'severe shortage'.  When you are allocating
    mbufs from the global VM system into the per-CPU cache you simply
    allocate up to <hysteresis> into the cache or until the VM system gets
    low (but not severely low) on memory.  The hysteresis does not have to
    be much to reap the benefits and mitigate the overhead of the global
    mutex(es)... just 5 or 10 mbufs would mitigate global mutex overhead
    to the point where it becomes irrelevant.
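
    The refill policy above can be sketched as follows.  This is a toy
    model, not the FreeBSD VM interface: vm_model, vm_pressure and
    cache_refill are hypothetical names, the two watermarks are made-up
    parameters, and one buffer is assumed to cost one unit of free memory.
    Letting the first buffer through during an ordinary shortage (so the
    caller's own request is still met) is also an assumption of this
    sketch.

```c
#include <assert.h>

/* Hypothetical pressure levels modelled on 'shortage'/'severe shortage'. */
enum vm_level { VM_OK, VM_SHORT, VM_SEVERE };

struct vm_model {
	int free_units;     /* free memory, in buffer-sized units (assumed) */
	int short_mark;     /* below this the VM is short on memory */
	int severe_mark;    /* below this the VM is severely short */
};

static enum vm_level
vm_pressure(const struct vm_model *vm)
{
	if (vm->free_units < vm->severe_mark)
		return (VM_SEVERE);
	if (vm->free_units < vm->short_mark)
		return (VM_SHORT);
	return (VM_OK);
}

/*
 * Refill a per-CPU cache that currently holds 'cached' buffers.
 * Pull buffers until the cache reaches 'hysteresis' or the VM gets low.
 * Returns the number of buffers added.
 */
static int
cache_refill(struct vm_model *vm, int cached, int hysteresis)
{
	int added = 0;

	assert(hysteresis > 0);
	while (cached + added < hysteresis) {
		enum vm_level lvl = vm_pressure(vm);

		if (lvl == VM_SEVERE)
			break;          /* take nothing at all */
		if (lvl == VM_SHORT && added >= 1)
			break;          /* stop prefetching once memory is low */
		vm->free_units--;
		added++;
	}
	return (added);
}
```

    With plenty of memory a hysteresis of 8 pulls 8 buffers in one pass;
    as free memory nears the shortage mark the same call pulls fewer, so
    the allocator backs off on its own instead of competing with the
    pageout daemon.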

    By creating a thread you are introducing more moving parts, and like
    a physical system these moving parts are going to interact with each
    other.  Remember, the VM system is *already* trying to ensure that
    enough free pages exist in the system.  If you have a second thread
    eating memory in large globs it is far more likely that you will
    destabilize the pageout daemon and create an oscillation between the
    two threads (pageout daemon and your balancer).  This might not turn up
    in benchmarks (which tend to focus on just one subsystem), but it could
    lead to some pretty nasty degenerate cases under heavy general loads.
    I think it is far better to let the VM system do its job and pull the
    mbufs in on-the-fly in smaller chunks which are less likely to destabilize
    the pageout daemon.

    This can be exacerbated, made even worse, if your balancing thread is
    given a high priority.  So you have the potential to starve the mbuf
    system if the balancing thread is too LOW a priority, and the potential
    to destabilize the VM system if the balancing thread is too HIGH a
    priority.

    Also, it seems to me that VM overheads are better addressed in the
    UMA subsystem, not in a leaf allocation subsystem.

						-Matt





