From: Scott Long
Date: Wed, 05 May 2004 19:54:53 -0600
To: Gerrit Nagelhout
Cc: freebsd-current@freebsd.org, 'Andrew Gallatin'
Subject: Re: 4.7 vs 5.2.1 SMP/UP bridging performance

Gerrit Nagelhout wrote:
> Andrew Gallatin wrote:
>
>>Bruce Evans writes:
>> >
>> > Athlon XP2600 UP system:  !SMP case: 22 cycles  SMP case: 37 cycles
>> > Celeron 366 SMP system:              35                   48
>> >
>> > The extra cycles for the SMP case are just the extra cost of one lock
>> > instruction.  Note that SMP should cost twice as much extra, but the
>> > non-SMP atomic_store_rel_int(&slock, 0) is pessimized by using xchgl,
>> > which always locks the bus.  After fixing this:
>> >
>> > Athlon XP2600 UP system:  !SMP case:  6 cycles  SMP case: 37 cycles
>> > Celeron 366 SMP system:              10                   48
>> >
>> > Mutexes take longer than simple locks, but not much longer unless the
>> > lock is contested.  In particular, they don't lock the bus any more
>> > and the extra cycles for locking dominate (even in the !SMP case due
>> > to the pessimization).
>> >
>> > So there seems to be something wrong with your benchmark.  Locking the
>> > bus for the SMP case always costs about 20+ cycles, but this hasn't
>> > changed since RELENG_4 and mutexes can't be made much faster in the
>> > uncontested case since their overhead is dominated by the bus lock
>> > time.
>> >
>>
>>Actually, I think his tests are accurate and bus-locked instructions
>>take an eternity on P4.  See
>>http://www.uwsg.iu.edu/hypermail/linux/kernel/0109.3/0687.html
>>
>>For example, with your test above, I see 212 cycles for the UP case on
>>a 2.53GHz P4.  Replacing the atomic_store_rel_int(&slock, 0) with a
>>simple slock = 0; reduces that count to 18 cycles.
>>
>>If it's really safe to remove the xchg* from non-SMP atomic_store_rel*,
>>then I think you should do it.  Of course, that still leaves mutexes
>>as very expensive on SMP (253 cycles on the 2.53GHz from above).
>>
>>Drew
>>
>
>
> I wonder if there is anything that can be done to make the locking more
> efficient for the Xeon.  Are there any other locking types that could
> be used instead?
> This might also explain why we are seeing much worse system call
> performance under 4.7 in SMP versus UP.  Here is a table of results
> for some system call tests I ran.  (The numbers are calls/s)

System calls made through int 0x80 are known to be extremely expensive
on a P4.  I think that Jeff Roberson measured them at 300 cycles on
average.
Some work was done on implementing the alternative sysenter/sysexit
method, but I don't think it was ever finished.  I think it was shown to
give a modest speed improvement, but enough other overhead remained that
system calls were still slow on a P4.

There are other optimizations that can be done, such as a shared page
that lets processes avoid system calls like getpid and gettimeofday, but
that opens some security risks that have to be dealt with.

Scott