Date: Tue, 8 Jun 2004 13:20:08 -0400 (EDT)
From: Robert Watson <rwatson@FreeBSD.org>
To: Ali Niknam <ali@transip.nl>
Cc: John Baldwin <jhb@FreeBSD.org>
Subject: Re: FreeBSD 5.2.1: Mutex/Spinlock starvation?
Message-ID: <Pine.NEB.3.96L.1040608131347.75106A-100000@fledge.watson.org>
In-Reply-To: <00bd01c44cb5$ccf5f840$0400a8c0@redguy>

On Mon, 7 Jun 2004, Ali Niknam wrote:

> > There isn't a timeout.  Rather, the lock spins so long as the current
> > owning thread is executing on another CPU.
>
> Interesting. Is there a way to 'lock' CPU's so that they always run on
> 'another' CPU ?
>
> Unfortunately as we speak the server is down again :( This all makes me
> wonder whether I should simply go back to 4.10.

No one would blame you for backing off -CURRENT to -STABLE.  On the other
hand, having high workloads against -CURRENT is going to be critical to
identifying weaknesses in -CURRENT so we can improve them.  Unfortunately,
it's something of a chicken-and-egg problem...

> I decreased the maximum number of apache children to 1400 and the server
> seems to be barely holding on:
> last pid:  2483;  load averages: 75.77, 28.63, 11.40  up 0+00:04:32  19:35:07
> 1438 processes:2 running, 294 sleeping, 1142 lock
> CPU states:  6.2% user,  0.0% nice, 62.6% system,  7.5% interrupt, 23.8% idle
> Mem: 698M Active, 27M Inact, 209M Wired, 440K Cache, 96M Buf, 1068M Free
> Swap: 512M Total, 512M Free
>
> Are there any more quite stable things to do ? That is except for upping
> to current, which I frankly feel is too dangerous...

There are a number of known weaknesses in 5.2.1 that are resolved in
-CURRENT, but the update would also involve substantial risk as there's
some heavy moving going on in -CURRENT to improve network performance,
etc.  I haven't followed some of your system description in detail, but it
seems like the primary thing to do right now, assuming you are still able
to keep 5.2.1 running on the box and are able to futz with the
configuration some, is to identify the specific source of the problem
you're experiencing.  Clearly, too much work is going on in the kernel.
The question is, what work?  It's likely you're running into an expensive
edge case, it's possible it's resolved in HEAD, and it could be that a
low-risk backport would resolve it.
It's also possible you're running into an unresolved problem in HEAD.

The best case scenario from my perspective would be that you could provide
an equivalent workload against a test box where we could experiment with a
number of debugging settings, as well as simply trying -CURRENT...  It
sounds like we've tried some of the easy plugs, such as switching
schedulers, enabling adaptive mutexes, etc.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Senior Research Scientist, McAfee Research