Date: Mon, 3 Mar 2003 19:55:36 +1100 (EST)
From: Bruce Evans <bde@zeta.org.au>
To: Terry Lambert <tlambert2@mindspring.com>
Cc: Poul-Henning Kamp <phk@phk.freebsd.dk>, "M. Warner Losh" <imp@bsdimp.com>, <current@FreeBSD.ORG>
Subject: Re: Any ideas why we can't even boot a i386 ?
Message-ID: <20030303191604.V33196-100000@gamplex.bde.org>
In-Reply-To: <3E6257B1.32AB9644@mindspring.com>

On Sun, 2 Mar 2003, Terry Lambert wrote:

> Poul-Henning Kamp wrote:
> > In message <20030303034332.Y30986-100000@gamplex.bde.org>, Bruce Evans writes:
> > >On Sun, 2 Mar 2003, Bruce Evans wrote:
> > >> On Fri, 28 Feb 2003, Poul-Henning Kamp wrote:
> > >> > My main concern would be if the chips have the necessary "umphf"
> > >> > to actually do a real-world job once they're done running all the
> > >> > overhead of 5.0-R.  The lack of cmpxchg8 makes the locking horribly
> > >> > expensive.
> > >>
> > >> Actually, the lack of cmpxchg8 only makes locking more expensive.  It's
> > >
> > >I.e., strictly more expensive, but not much more.
> >
> > Bruce, it is not a matter of the relative expensiveness of the various
> > implementations of locking primitives, it's a matter of the cumulative
> > weight of all the locks we add to the system.

Of course.  A 400% pessimization of locking primitives turns into more
like a 100% pessimization of locking non-primitives (at the level of
mtx_lock()).  Since the kernel spends only a few percent of its time
(1% say) in non-i386 locking non-primitives, pessimizing these
non-primitives by 100% costs 1%.  Since the system spends only a few
percent of its time in the kernel (10% say), pessimizing the
non-primitives costs a whole 0.1%.  (0.1% is getting near the resolution
of my benchmark method (run makeworld on a fairly idle machine after
rebooting).)

> Bruce's "make world" benchmark gave coverage of the cumulative
> weight, in support of his point.

Indeed.  Actual testing showed costs of 3.2% (kernel) and 0.3% (real).
I made up the percentages of 1% and 0.1% by reducing these a bit
(I386_CPU pessimizes more than locking, but the exact percentages
aren't very interesting since they are so small).
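
The back-of-envelope estimate above amounts to two multiplications. The following is only a sketch and not part of the original message: the names are hypothetical, and the fractions are the rough guesses quoted above (about 100% slower at the mtx_lock() level, 1% of kernel time spent in locking, 10% of system time spent in the kernel).

```python
# Sketch of the estimate: a part that takes `share` of the whole and becomes
# `slowdown` slower adds slowdown * share to the whole.
# All names are hypothetical; the inputs are the guesses quoted in the message.

def added_cost(slowdown, share):
    """Fractional cost added to the whole by slowing down one part of it."""
    return slowdown * share

lock_slowdown = 1.00           # ~100% slower at the mtx_lock() level
lock_share_of_kernel = 0.01    # ~1% of kernel time spent in locking
kernel_share_of_system = 0.10  # ~10% of system time spent in the kernel

kernel_cost = added_cost(lock_slowdown, lock_share_of_kernel)
system_cost = added_cost(kernel_cost, kernel_share_of_system)

print(f"kernel: {kernel_cost:.1%}, system: {system_cost:.1%}")
```

With these inputs the kernel-level cost is 1% and the system-level cost is 0.1%, matching the figures in the paragraph.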

Of course there are loads that use more locks than makeworld (mainly
networking with tinygrams I think -- this is why no one except
networking people and benchmarkers even notices that -current is much
worse than RELENG_4 (slowness from debugging options is irrelevant)).

I wrote:

> To get the system to run I had to unbreak panicifcpuunsupported() so
> that it doesn't gratuitously reject Athlons (CPUs that are upward
> compatible should not be rejected), and had to replace pmap.o by the
> non-386 version since the 386 version caused strange errors.  It's not
> clear why the 386 version doesn't work -- the only internal difference
> in pmap.c is that the 386 version uses invltlb() and other versions
> use invlpg().  Using invltlb() would probably make things more than
> 0.3% slower.  Selecting the best inv*() was the main optimization that
> we dropped when 386 support was made incompatible with support for later
> CPUs.

Configuring DISABLE_PG_G fixed the problem with pmap.  invltlb()
apparently doesn't work with global pages, but I386_CPU doesn't stop
global pages being configured.

> world with my kernel configured for I486_CPU through I686_CPU
> %%%
> 1532 MHz AthlonXP 1600 256MB 2 ATA drives
> async mounted /usr/obj (src on separate drive)
> --------------------------------------------------------------
> >>> elf make world completed on Sun Mar 2 16:30:55 EST 2003
>                    (started Sun Mar 2 15:53:15 EST 2003)
> --------------------------------------------------------------
>      2260.31 real      1729.55 user       326.24 sys
>      40208  maximum resident set size
>       2248  average shared memory size
>       1762  average unshared data size
>        127  average unshared stack size
>   14959205  page reclaims
>      25630  page faults
>          0  swaps
>      43481  block input operations
>       3963  block output operations
>          0  messages sent
>          0  messages received
>          5  signals received
>     313523  voluntary context switches
>     607085  involuntary context switches
> %%%
>
> world with my kernel configured for I386_CPU except for pmap.o
> %%%
> 1532 MHz AthlonXP 1600 256MB 2 ATA drives
> async mounted /usr/obj (src on separate drive)
> --------------------------------------------------------------
> >>> elf make world completed on Mon Mar 3 03:00:45 EST 2003
>                    (started Mon Mar 3 02:22:57 EST 2003)
> --------------------------------------------------------------
>      2267.98 real      1730.21 user       336.73 sys
>      40208  maximum resident set size
>       2245  average shared memory size
>       1756  average unshared data size
>        127  average unshared stack size
>   14958931  page reclaims
>      26265  page faults
>          0  swaps
>      44148  block input operations
>       3898  block output operations
>          0  messages sent
>          0  messages received
>          6  signals received
>     313986  voluntary context switches
>     598687  involuntary context switches
> %%%

As might be expected, using invltlb() and not using global pages costs
more than pessimizing locking primitives.
It costs an additional 1.6% (real) and 5.9% (sys):

world with my kernel configured for I386_CPU and DISABLE_PG_G
%%%
1532 MHz AthlonXP 1600 256MB 2 ATA drives
async mounted /usr/obj (src on separate drive)
--------------------------------------------------------------
>>> elf make world completed on Mon Mar 3 04:51:15 EST 2003
                   (started Mon Mar 3 04:12:51 EST 2003)
--------------------------------------------------------------
     2303.46 real      1749.21 user       356.69 sys
     40208  maximum resident set size
      2240  average shared memory size
      1746  average unshared data size
       127  average unshared stack size
  14959130  page reclaims
     25845  page faults
         0  swaps
     43561  block input operations
      3932  block output operations
         0  messages sent
         0  messages received
         5  signals received
    311492  voluntary context switches
    623634  involuntary context switches
%%%

Bruce
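
The percentages quoted in this message can be re-derived from the real and sys times of the three make world runs. The following is a minimal sketch and not part of the original mail: the dictionary keys and the helper name pct_increase are made up for illustration, and only the timing figures are taken from the runs above.

```python
# Real and sys seconds copied from the three "make world" summaries.
# The labels and variable names are hypothetical.
runs = {
    "i486_to_i686": {"real": 2260.31, "sys": 326.24},
    "i386_other_pmap": {"real": 2267.98, "sys": 336.73},
    "i386_no_pg_g": {"real": 2303.46, "sys": 356.69},
}

def pct_increase(before, after):
    """Relative increase from `before` to `after`, in percent."""
    return (after - before) / before * 100.0

base = runs["i486_to_i686"]
i386 = runs["i386_other_pmap"]
no_pg_g = runs["i386_no_pg_g"]

# Cost of I386_CPU with the non-386 pmap.o (mostly the locking pessimization).
locking_sys = pct_increase(base["sys"], i386["sys"])
locking_real = pct_increase(base["real"], i386["real"])

# Additional cost of the 386 pmap with DISABLE_PG_G on top of that.
pmap_sys = pct_increase(i386["sys"], no_pg_g["sys"])
pmap_real = pct_increase(i386["real"], no_pg_g["real"])

print(f"locking: {locking_sys:.1f}% sys, {locking_real:.1f}% real")
print(f"pmap:    {pmap_sys:.1f}% sys, {pmap_real:.1f}% real")
```

Rounded to one decimal place, these come out at 3.2% sys and 0.3% real for the first step and 5.9% sys and 1.6% real for the second, in line with the numbers stated in the text.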