From owner-freebsd-smp Tue Jun 24 03:55:36 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id DAA08458 for smp-outgoing; Tue, 24 Jun 1997 03:55:36 -0700 (PDT)
Received: from godzilla.zeta.org.au (godzilla.zeta.org.au [203.2.228.19]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id DAA08445 for ; Tue, 24 Jun 1997 03:55:15 -0700 (PDT)
Received: (from bde@localhost) by godzilla.zeta.org.au (8.8.5/8.6.9) id UAA16458; Tue, 24 Jun 1997 20:37:15 +1000
Date: Tue, 24 Jun 1997 20:37:15 +1000
From: Bruce Evans
Message-Id: <199706241037.UAA16458@godzilla.zeta.org.au>
To: nnd@itfs.nsk.su, smp@freebsd.org
Subject: Re: SMP_PRIVPAGES
Sender: owner-smp@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

> After that I've decide to try the next patch on
>/sys/i386/isa/npx.c:
>
>--- npx.c Tue Jun 24 14:42:03 1997
>+++ npx.c.orig Tue Jun 24 09:02:56 1997
>@@ -413,7 +413,7 @@
> }
> npxinit(__INITIAL_NPXCW__);
>
>-#if defined(I586_CPU)
>+#if defined(I586_CPU) && !defined(SMP)
> /* FPU not working under SMP yet */
> if (cpu_class == CPUCLASS_586 && npx_ex16) {
> if (!(dvp->id_flags & NPX_DISABLE_I586_OPTIMIZED_BCOPY)) {

I thought that copying through the FPU (not the FPU itself) still didn't
work, but it seems that the private page changes have automagically fixed
it - there is now at least a chance that direct accesses to _npxproc and
_curpcb work right for the same reasons that direct accesses to the C
variables npxproc and curpcb work right.

>and ... this is a result:
>
>dd if=/dev/zero of=/dev/null bs=1m count=1000
>1000+0 records in
>1000+0 records out
>1048576000 bytes transferred in 11.985571 secs (87486529 bytes/sec)
>
> The last number is still lose to NON-SMP case, but it seems

Did you get > 110MB/sec for non-SMP?

>to me that there is another place to gain speed for bzero/bcopy -
>
>in file /sys/i386/i386/support.s there is a "label" kernel_fpu_lock:
>
>As I can (NOT?) understand it can be duplicated for each CPU
>and may be this can give us some more speedup of bzero/bcopy ?

kernel_fpu_lock may already work under SMP too, except for the obvious
problem that the bus is not locked when it is accessed. It should be an
array instead of a per-process variable, since it prevents reentry by
interrupt handlers and interrupt handling is unrelated to processes.

I don't think there are large speedups to be gained here, since multiple
large concurrent bcopy/bzero's are probably rare, and anyway, a single
fast bzero/bcopy or a couple of slow ones will saturate the memory bus.

Bruce
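
For illustration only, here is a minimal userland sketch of the "array
plus locked access" idea discussed above. It assumes the array has one
entry per CPU, which is one reading of the suggestion, and it is not the
code in support.s. The names NCPU_SKETCH, fpu_busy, fpu_try_enter and
fpu_leave are hypothetical, and C11 atomic_exchange stands in for a
bus-locked test-and-set.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical CPU count, chosen only for this sketch. */
#define NCPU_SKETCH 4

/*
 * One busy flag per CPU (an assumption, not the kernel's layout).
 * Static atomic objects are zero-initialised, so every flag starts
 * out false, meaning "FPU copy path not in use on this CPU".
 */
static atomic_bool fpu_busy[NCPU_SKETCH];

/*
 * Try to claim the FPU copy path on the given CPU.
 * atomic_exchange is an atomic read-modify-write, so a nested caller
 * on the same CPU (for example an interrupt handler) observes the
 * flag already set and gets false back instead of reentering.
 */
static bool
fpu_try_enter(int cpu)
{
	return !atomic_exchange(&fpu_busy[cpu], true);
}

/* Release the flag for the given CPU; the caller must hold it. */
static void
fpu_leave(int cpu)
{
	bool was_busy = atomic_exchange(&fpu_busy[cpu], false);

	assert(was_busy);
	(void)was_busy;	/* silence unused warnings under NDEBUG */
}
```

A caller would fall back to the ordinary integer copy whenever
fpu_try_enter() returns false, and call fpu_leave() after a successful
FPU copy. Because each CPU has its own flag, a copy on one CPU never
blocks a copy on another, although both still share the memory bus.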