Date: Mon, 18 Feb 2002 21:34:07 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: Greg Lehey <grog@FreeBSD.ORG>
Cc: arch@FreeBSD.ORG, jhb@FreeBSD.ORG
Subject: Re: buildworld comparison stable vs current
Message-ID: <200202190534.g1J5Y7s58322@apollo.backplane.com>
References: <200202170818.g1H8ID067573@apollo.backplane.com> <20020219125840.B2835@sydney.worldwide.lemis.com>

:Have you done any profiling?  Also, how many CPUs are there?
:
:As I said on Friday, I really think we should be doing more
:performance measurements, including measuring what's going on at the
:lock level.
:
:Greg

    The test boxes are DELL2550's, so 2xCPU (1.1GHz pentium III's), ECC
    memory, SCSI drives.

    A couple of things have become obvious during my own testing:

    * Context switching for every interrupt is expensive.

        stable:   2467550 voluntary context switches (buildworld -j 10)
        current: 23879443 voluntary context switches (buildworld -j 10)

      buildworld was causing 10761 context switches per second on the
      current box, which is roughly one context switch every 100 uS.
      That is a serious burden.

    * Mutexes are expensive calls.  Looking at the getuid() stats
      (calls/sec) with and without Giant in userret:

        unpatched userret, kern.giant.ucred=1 (default)  1 process: 683K
        patched userret,   kern.giant.ucred=1 (default)  1 process: 739K

      Here a single mutex being locked and unlocked adds 8% of overhead
      to the system call.  This is one reason why the new ucred stuff
      and the timecounter modules are so nice: they manage to do away
      with most mutex operations in the critical path entirely.  We
      need to do more of that.

    * Context switching when a mutex is contested is extremely
      expensive.  With two processes dueling for Giant I observed 400K
      calls/sec on one occasion and 200K calls/sec with the exact same
      setup on another occasion, which I tracked down to Giant being
      constantly contested and sleeping instead of spinning on the
      second occasion.

      What was really worrying here was that the results were
      non-deterministic.  Sometimes I would get the higher run rate,
      sometimes the lower one, with no rhyme or reason.

    In any case, just to clarify: those original numbers, 1800 vs 2219
    seconds, were wrong.  -current was running with default malloc
    options of 'J'.  The correct numbers are 1800 vs 2097 seconds.
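    For anyone who wants to re-derive the ratios quoted above, here is a
    minimal sketch in Python. It assumes the 10761 switches/sec figure came
    from dividing the -current switch count by the originally reported 2219
    second build time; the message does not say so, but the arithmetic
    matches. The helper names are illustrative only.

```python
# Figures quoted in the message above.
STABLE_SWITCHES = 2_467_550        # voluntary context switches on -stable
CURRENT_SWITCHES = 23_879_443      # voluntary context switches on -current
ORIGINAL_CURRENT_SECONDS = 2219    # ASSUMPTION: build time used for the rate


def rate_per_second(events, seconds):
    """Average number of events per second over a run."""
    return events / seconds


def interval_us(rate):
    """Average gap between events, in microseconds, for a per-second rate."""
    return 1_000_000 / rate


def percent_more(base, other):
    """How much larger `other` is than `base`, as a percentage of `base`."""
    return (other - base) / base * 100


if __name__ == "__main__":
    rate = rate_per_second(CURRENT_SWITCHES, ORIGINAL_CURRENT_SECONDS)
    print(f"-current switches/sec:   {rate:.0f}")
    print(f"gap between switches:    {interval_us(rate):.1f} uS")
    print(f"-current vs -stable:     {CURRENT_SWITCHES / STABLE_SWITCHES:.1f}x")
    # getuid() throughput: 683K calls/sec with the mutex, 739K without.
    print(f"mutex overhead per call: {percent_more(683, 739):.1f}%")
    # Corrected buildworld times: 1800 s (-stable) vs 2097 s (-current).
    print(f"buildworld slowdown:     {percent_more(1800, 2097):.1f}%")
```

    The per-call overhead is taken as 739/683 - 1, since time per call is
    the inverse of the call rate.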
                                        -Matt
                                        Matthew Dillon
                                        <dillon@backplane.com>