Date:      Tue, 8 Jun 2004 13:20:08 -0400 (EDT)
From:      Robert Watson <rwatson@FreeBSD.org>
To:        Ali Niknam <ali@transip.nl>
Cc:        John Baldwin <jhb@FreeBSD.org>
Subject:   Re: FreeBSD 5.2.1: Mutex/Spinlock starvation?
Message-ID:  <Pine.NEB.3.96L.1040608131347.75106A-100000@fledge.watson.org>
In-Reply-To: <00bd01c44cb5$ccf5f840$0400a8c0@redguy>

On Mon, 7 Jun 2004, Ali Niknam wrote:

> > There isn't a timeout.  Rather, the lock spins so long as the current
> > owning thread is executing on another CPU.
> 
> Interesting. Is there a way to 'lock' CPU's so that they always run on
> 'another' CPU ?
> 
> Unfortunately as we speak the server is down again :( This all makes me
> wonder whether I should simply go back to 4.10.

No one would blame you for backing off -CURRENT to -STABLE.  On the other
hand, having high workloads against -CURRENT is going to be critical to
identifying weaknesses in -CURRENT so we can improve them.  Unfortunately,
it's something of a chicken-and-egg problem...

> I decreased the maximum number of apache children to 1400 and the server
> seems to be barely holding on:
> last pid:  2483;  load averages: 75.77, 28.63, 11.40    up 0+00:04:32  19:35:07
> 1438 processes: 2 running, 294 sleeping, 1142 lock
> CPU states:  6.2% user,  0.0% nice, 62.6% system,  7.5% interrupt, 23.8% idle
> Mem: 698M Active, 27M Inact, 209M Wired, 440K Cache, 96M Buf, 1068M Free
> Swap: 512M Total, 512M Free
> 
> Are there any more reasonably stable things to try? That is, except for
> upping to current, which I frankly feel is too dangerous...

There are a number of known weaknesses in 5.2.1 that are resolved in
-CURRENT, but the update would also involve substantial risk as there's
some heavy moving going on in -CURRENT to improve network performance,
etc.  I haven't followed some of your system description in detail, but
it seems like the primary thing to do right now, assuming you are still
able to keep 5.2.1 running on the box and are able to futz with the
configuration some, is to identify the specific source of the problem
you're experiencing.  Clearly, too much work is going on in the kernel; the
question is what work.  It's likely you're running into an expensive edge
case.  It's possible it's resolved in HEAD, and it could be that a low-risk
backport would resolve it.  It's also possible you're running into a
problem that is still unresolved in HEAD.

The best case scenario from my perspective would be that you could provide
an equivalent workload against a test box where we could experiment with a
number of debugging settings, as well as simply trying -CURRENT...  It
sounds like we've tried some of the easy plugs, such as switching
schedulers, enabling adaptive mutexes, etc.
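
For reference, the adaptive mutex behaviour quoted at the top of this message
(no timeout; a waiter spins only while the owning thread is executing on
another CPU, and otherwise blocks) can be sketched in user-space C roughly as
follows.  This is only an illustrative model, not the kernel implementation:
struct lock_snapshot, adaptive_decide() and simulate_waiter() are
hypothetical names invented for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of a mutex and its owner; not FreeBSD's real structures. */
struct lock_snapshot {
    bool held;          /* mutex is currently owned by some thread */
    bool owner_on_cpu;  /* the owner is executing on another CPU right now */
};

enum waiter_action { WAITER_ACQUIRE, WAITER_SPIN, WAITER_BLOCK };

/* One decision step for a thread that wants the mutex. */
static enum waiter_action
adaptive_decide(struct lock_snapshot s)
{
    if (!s.held)
        return WAITER_ACQUIRE;  /* lock is free: take it */
    /* Owner running elsewhere: it may release soon, so spin.
     * Owner not running: spinning is wasted CPU, so block instead. */
    return s.owner_on_cpu ? WAITER_SPIN : WAITER_BLOCK;
}

/* Walk a trace of successive snapshots.  There is no spin limit: the
 * waiter keeps spinning for as long as the owner stays on a CPU.
 * Returns the action that ended the spin and counts the spin steps. */
static enum waiter_action
simulate_waiter(const struct lock_snapshot *trace, size_t n, size_t *spins)
{
    *spins = 0;
    for (size_t i = 0; i < n; i++) {
        enum waiter_action a = adaptive_decide(trace[i]);
        if (a != WAITER_SPIN)
            return a;
        (*spins)++;
    }
    return WAITER_SPIN;  /* trace ended while still spinning */
}
```

In this model a waiter facing an owner that stays on a CPU for a long time
keeps burning cycles, while one facing a descheduled owner goes to sleep.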

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Senior Research Scientist, McAfee Research


