Date: Sun, 2 Mar 2003 01:13:07 +1100 (EST)
From: Bruce Evans
To: Poul-Henning Kamp
Cc: "M. Warner Losh"
Subject: Re: Any ideas why we can't even boot a i386 ?

On Fri, 28 Feb 2003, Poul-Henning Kamp wrote:

> My main concern would be if the chips have the necessary "umphf"
> to actually do a real-world job once they're done running all the
> overhead of 5.0-R. The lack of cmpxchg8 makes the locking horribly
> expensive.

Actually, the lack of cmpxchg8 only makes locking more expensive.
It's hard to say how long cmpxchg8 would take on i386's if i386's had
it, but it involves memory accesses which i386's are especially poor
at, so I guess it would take about 2/3 as long as the main extra
instructions that we use in the CPU_I386 case (pushfl: 4 cycles;
cli: 3 cycles; popfl: 5 cycles).
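A minimal userland sketch in C may help show what the two compare-and-set paths look like. It is an illustration under stated assumptions, not FreeBSD's atomic(9) code: all names here (cmpset_no_cmpxchg, cmpset_cmpxchg, try_lock, unlock, LOCK_UNOWNED) are hypothetical; because cli is privileged, the CPU_I386 path only simulates disabling interrupts with a flag where the kernel would execute pushfl; cli ... popfl; and the !CPU_I386 path uses the GCC/Clang __atomic_compare_exchange_n builtin instead of hand-written inline asm. For scale, the three extra instructions add 4 + 3 + 5 = 12 cycles on an i386, so "about 2/3 as long" would put a hypothetical i386 cmpxchg at roughly 8 cycles.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical "unowned" value for the lock word (illustration only). */
#define LOCK_UNOWNED ((uintptr_t)0)

/* Simulated interrupt-enable flag. In the real CPU_I386 kernel path the
 * save/disable/restore is pushfl; cli; ...; popfl. cli is privileged,
 * so this userland sketch only models its effect. */
static int intr_enabled = 1;

static int intr_save_disable(void)      /* models pushfl; cli */
{
    int saved = intr_enabled;
    intr_enabled = 0;
    return saved;
}

static void intr_restore(int saved)     /* models popfl */
{
    intr_enabled = saved;
}

/* CPU_I386 style: the 80386 has no compare-and-exchange instruction, so
 * on a uniprocessor the compare and the store are made atomic by keeping
 * interrupts off around them. */
static int cmpset_no_cmpxchg(uintptr_t *dst, uintptr_t expect, uintptr_t src)
{
    int saved = intr_save_disable();
    int ok = (*dst == expect);
    if (ok)
        *dst = src;
    intr_restore(saved);
    return ok;
}

/* !CPU_I386 style: a single compare-and-exchange, expressed with the
 * GCC/Clang __atomic builtin (a locked cmpxchg on x86). */
static int cmpset_cmpxchg(uintptr_t *dst, uintptr_t expect, uintptr_t src)
{
    return __atomic_compare_exchange_n(dst, &expect, src, 0,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}

typedef int (*cmpset_fn)(uintptr_t *, uintptr_t, uintptr_t);

/* Acquire: swap unowned -> owner; fails if someone else holds it. */
static int try_lock(cmpset_fn cmpset, uintptr_t *lock, uintptr_t owner)
{
    return cmpset(lock, LOCK_UNOWNED, owner);
}

/* Release: swap owner -> unowned; only the owner may unlock. */
static void unlock(cmpset_fn cmpset, uintptr_t *lock, uintptr_t owner)
{
    int ok = cmpset(lock, owner, LOCK_UNOWNED);
    assert(ok && "unlock by non-owner");
    (void)ok;
}
```

A lock/unlock round trip is try_lock(cmpset_no_cmpxchg, &lk, owner) followed by unlock(cmpset_no_cmpxchg, &lk, owner). The interrupt-disabling variant is only atomic on a uniprocessor, since disabling interrupts on one CPU does not stop another CPU from touching the lock word.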
Actual testing on an Athlon1600XP in userland for the core of
mtx_lock_*() + mtx_unlock_*(), namely atomic_cmpset_acq_ptr() +
atomic_cmpset_rel_ptr(), run in a loop (cycle counts include loop
overhead):

    10 cycles in the !CPU_I386 case
    42 cycles in the CPU_I386 case
    36 cycles in the CPU_I386 case with cli removed
    12 cycles in the CPU_I386 case with cli removed and popfl changed
              to "addl $4,%esp"
     9 cycles in the CPU_I386 case with pushfl, cli and popfl removed

So the i386 code is almost the same speed on AthlonXP's in user mode,
except for the expensive cli and popfl instructions. However, these
instructions aren't so relatively expensive on plain i386's; i386's
are just generally slow, and their privileged instructions aren't much
slower than their integer instructions.

The relative slowdown for the full mtx_*lock*() functions would be
smaller, since these functions do more. mtx_unlock_spin() uses
atomic_store_rel_ptr(), so it doesn't go near cmpxchg8 or cli, and for
spin mutexes the combined acquire/release times are almost half of
those above.

Bruce