Date: Thu, 5 Sep 2002 10:48:25 +0200
From: Bernd Walter
Reply-To: ticso@cicely.de
To: Andrew Gallatin
Cc: ticso@cicely.de, John Baldwin, freebsd-alpha@FreeBSD.ORG
Subject: Re: alpha performance on -current
Message-ID: <20020905084824.GJ87724@cicely5.cicely.de>
In-Reply-To: <15734.39976.326598.176742@grasshopper.cs.duke.edu>

On Wed, Sep 04, 2002 at 07:50:00PM -0400, Andrew Gallatin wrote:
>
> Bernd Walter writes:
> > On Wed, Sep 04, 2002 at 04:59:09PM -0400, Andrew Gallatin wrote:
> > >
> > > # ./lat_syscall null
> > > Simple syscall: 2.0178 microseconds
> > > # ./lat_syscall null
> > > Simple syscall: 2.0333 microseconds
> > > # sysctl -w kern.giant.proc=0
> > > kern.giant.proc: 1 -> 0
> > > # ./lat_syscall null
> > > Simple syscall: 1.6360 microseconds
> > > # ./lat_syscall null
> > > Simple syscall: 1.6333 microseconds
> > >
> > >
> > > Is the locking overhead this bad on x86? It looks downright
> > > embarrassing on alpha. Can anything be done about it? Are the
> > > memory barriers in atomic_cmpset_acq_* really needed? They have the
> > > look of belt & suspenders code.
> >
> > They are needed in _acq_ and _rel_ because they are used to build
> > mutexes.
>
> atomic_cmpset_acq_64 calls atomic_cmpset_64() followed by a memory barrier.
> atomic_cmpset_64() ends with a memory barrier. So isn't the
> memory barrier in atomic_cmpset_acq_64() extraneous? E.g., you have 2
> memory barriers back to back.

The barrier in atomic_cmpset_64 is the obsolete one; the one in
atomic_cmpset_acq_64 is correct.
The problem I have with removing the barrier in atomic_cmpset_64 and
the like is that it introduces yet another difference from i386, and I
think we already have enough alpha-only bugs without building a new set
just for a bit of performance.

> At any rate, the overhead just plain stinks. At nearly 0.5 us per
> mutex, they are just way too expensive.
>
> > You even need them on UP systems to ensure instruction order.
>
> Alphas older than ev6 are in-order processors with regard to
> instruction order, so presumably they are not needed on ev56 and older
> machines?

Sounds logical.
But if you want to tune specifically for the CPU type and for UP, then
there is much more that can be done.
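The lat_syscall figures quoted above are per-call averages over many iterations of a cheap system call. A rough stand-in for that kind of measurement is sketched below. It assumes getppid() is an acceptable "null" system call and that CLOCK_MONOTONIC is available; null_syscall_usec is a hypothetical name, and lmbench's own timing harness is not reproduced here.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime() under strict C modes */
#include <assert.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: average wall-clock cost of one getppid() call,
 * in microseconds, over `iterations` calls. getppid() is an assumed
 * stand-in for a "null" system call; loop overhead is included. */
double null_syscall_usec(long iterations)
{
    struct timespec start, end;
    volatile pid_t sink = 0; /* volatile store keeps every call alive */

    assert(iterations > 0);

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iterations; i++)
        sink = getppid();
    clock_gettime(CLOCK_MONOTONIC, &end);
    (void)sink;

    double elapsed_ns = (double)(end.tv_sec - start.tv_sec) * 1e9
                      + (double)(end.tv_nsec - start.tv_nsec);
    return elapsed_ns / 1e3 / (double)iterations;
}
```

Printing null_syscall_usec(1000000) before and after `sysctl -w kern.giant.proc=0` would give a comparison in the same spirit as the numbers above, though loop overhead and timer resolution make it only approximate.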
Terry has brought up many interesting ideas about what can be done if
you know your machine.

> > Some barriers can be removed in other atomic functions or replaced
> > with write barriers.
> > Currently our alpha_wmb() is mapped to mb, which is done because it's
> > also done in NetBSD - possibly as a workaround for an unnamed problem.
> >
> > > FWIW, the appended diff to remove them reduces null system call
> > > latency to 1.6us with kern.giant.proc=1, and 1.4us with
> > > kern.giant.proc=0. I'm about to start a buildworld with it, but I
> > > don't have any SMP boxes.
> >
> > What kind of CPU is in your XP1000?
>
> 21264.
>
> > On >= 21164 the CPU can be initialised into UP operation so that mb
> > and wmb fall back to just a constraint on instruction order.
> > Also locked instructions can be handled inside the CPU in that case.
>
> Please elaborate. How does this work with respect to devices? That
> sounds quite scary for the atomic code in general.

What I wrote was only the short story, covering the intended use of the
atomic functions.
Memory barriers still have their normal semantics with respect to other
memory regions: if you write data into memory, issue a barrier and then
tell the device to DMA it, the barrier still works because the device
register is marked uncacheable.
But barriers by themselves shouldn't be very expensive; what makes them
expensive is the required combination with locked instructions.

> FWIW, my machine survived the buildworld. The removal of the memory
> barriers chopped nearly 7 minutes off the time:

I guess there is not a big chance of losing the instruction-ordering race.

> with MB:
> 8699.25 real      6985.64 user      1379.72 sys
>
> without MB:
> 8298.44 real      7010.03 user       969.02 sys

It would be interesting to see what happens if you remove only the
obsolete barriers.

What is required is the following:
...
        /* acq */
        atomic (locked) memory access
        mb
...
        /* rel */
        mb
        atomic (locked) memory access
...

But what we currently have is
...
        /* acq */
        atomic (locked) memory access
        mb
        mb
...
        /* rel */
        mb
        atomic (locked) memory access
        mb
...

-- 
B.Walter              COSMO-Project         http://www.cosmo-project.de
ticso@cicely.de         Usergroup           info@cosmo-project.de
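The required barrier placement described above (acq: locked access, then mb; rel: mb, then locked access) can be sketched with C11 <stdatomic.h> instead of the Alpha load-locked/store-conditional sequences. This is not the FreeBSD atomic.h code: sketch_mtx, cmpset_64, cmpset_acq_64 and cmpset_rel_64 are hypothetical names, and atomic_thread_fence(memory_order_seq_cst) is assumed as the portable stand-in for mb.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lock word: 0 = unlocked, otherwise the owner's id. */
typedef struct { _Atomic uint64_t word; } sketch_mtx;

/* Plain compare-and-set with no ordering of its own: the analogue of
 * atomic_cmpset_64() once its trailing (obsolete) barrier is dropped. */
static bool cmpset_64(_Atomic uint64_t *p, uint64_t expect, uint64_t newval)
{
    return atomic_compare_exchange_strong_explicit(p, &expect, newval,
        memory_order_relaxed, memory_order_relaxed);
}

/* acq: atomic (locked) memory access, then exactly ONE barrier. */
static bool cmpset_acq_64(_Atomic uint64_t *p, uint64_t expect, uint64_t newval)
{
    bool ok = cmpset_64(p, expect, newval);
    atomic_thread_fence(memory_order_seq_cst); /* stands in for mb */
    return ok;
}

/* rel: exactly ONE barrier, then the atomic (locked) memory access. */
static bool cmpset_rel_64(_Atomic uint64_t *p, uint64_t expect, uint64_t newval)
{
    atomic_thread_fence(memory_order_seq_cst); /* stands in for mb */
    return cmpset_64(p, expect, newval);
}

void sketch_mtx_lock(sketch_mtx *m, uint64_t owner)
{
    assert(owner != 0); /* 0 is reserved for "unlocked" */
    while (!cmpset_acq_64(&m->word, 0, owner))
        ; /* spin; every failed attempt also pays for a fence */
}

void sketch_mtx_unlock(sketch_mtx *m, uint64_t owner)
{
    bool released = cmpset_rel_64(&m->word, owner, 0);
    assert(released); /* unlocking a mutex we do not own is a bug */
    (void)released;   /* silence the unused warning under NDEBUG */
}
```

Each path issues a single fence, unlike the current layout, where the extra barrier inside atomic_cmpset_64 adds a second one. A seq_cst fence is stronger than C11 strictly requires here (acquire and release fences would suffice), but it mirrors the full mb used on Alpha.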