Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 27 Feb 2002 12:32:48 -0800
From:      Terry Lambert <tlambert2@mindspring.com>
To:        Matthew Dillon <dillon@apollo.backplane.com>
Cc:        Jeff Roberson <jroberson@chesapeake.net>, arch@FreeBSD.ORG
Subject:   Re: Slab allocator
Message-ID:  <3C7D4270.F285F888@mindspring.com>
References:  <20020227005915.C17591-100000@mail.chesapeake.net> <200202271926.g1RJQCm29905@apollo.backplane.com>

next in thread | previous in thread | raw e-mail | index | archive | help
Matthew Dillon wrote:
>     Well, one thing I've noticed right off the bat is that the code
>     is trying to take advantage of per-cpu queues but is still
>     having to obtain a per-cpu mutex to lock the per-cpu queue.

I disliked this as well; I think we can help him work around
this problem, though, based on why he feels that it's needed;
going of half-cocked about it is not going to solve anything,
but knowing the requirements trace for the design decision
will.

Even so, a per-CPU mutex takes it out of the global contention
domain, so that's something.

Use of a mutex, which I think is, by definition, supposed to
be synchronized between CPUs is the only real downside, if we
assume that a per-CPU lock is in fact necessary for these
things... it should be possible to use an intra-CPU primitive
to implement that is much, much cheaper than the inter-CPU
version that's in the mutex implementation.

>     Another thing I noticed is that the code appears to assume
>     that PCPU_GET(cpuid) is stable in certain places, and I don't
>     think that condition necessarily holds unless you explicitly
>     enter a critical section (critical_enter() and critical_exit()).
>     There are some cases where you obtain the per-cpu cache and lock
>     it, which would be safe even if the cpu changed out from under
>     you, and other case such as in uma_zalloc_internal() where you
>     assume that the cpuid is stable when it isn't.

I saw this as well.  This seemed to be much less of a problem,
to me, since I think that the issue of forced kernel preemption
resulting in running on another CPU is currently moot.  In the
long run, I think it will be mostly safe to assume that the CPU
you are running on wil not change, except under extraordinary
conditions (migration), which in turn could be deferred or even
prevented, using a "don't migrate bit".

Without per CPU scheduler queues, this is currently a danger,
but it's a really minor one, in the scheme of things.

In the per CPU scheduler queue case, the migration should have
to be explicit based on a figure of merit.  The main code path
is therefore lockless.  The way a migration occurs is to have
the load on a single CPU exceed some watermark relative to the
overall system load, which can be calculated using atomic figures
of merit that need not be locked to be read per CPU, within a CPU
cluster.  If migration is a "go", then you grab a lock on the
"give away" queue for the CPU you are giving it to, and push it
over there.  That CPU, in turn, checks this queue at the start
of the sheduler cycle to see if it is empty (this check is thus
also lockless in the common case).  If there are processes
pending migration to the CPU, then (and only then) does it
acquire the lock, and migrate them to the local (lockless)
scheduler queue, after which it releases the lock again.  Any
contention which occurs will be between 2 CPUs, not N.

Obviously, this is future work.


>     I also noticed that cache_drain() appears to be the only
>     place where you iterate through the per-cpu mutexes.  All
>     the other places appear to use the current-cpu's mutex.

I am not happy with the "cache drain".  I expect that the way
I would do this is not through stealing, but through notification,
which could, similarly, end up being lockless.

I think the "stealing" case is also an extraordinary condition,
and so I'm not concerned about optimizing it as if it were a
common case.  Consider that the latency introduced will be no
more of a stumbling block for the system than the pool retention
limits for the high water mark being preterbed slightly, so as
to cause the same behaviour.

-- Terry

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?3C7D4270.F285F888>