From owner-freebsd-current  Fri Apr  5 15:21:33 1996
Return-Path: owner-current
Received: (from root@localhost)
          by freefall.freebsd.org (8.7.3/8.7.3) id PAA20874
          for current-outgoing; Fri, 5 Apr 1996 15:21:33 -0800 (PST)
Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.211])
          by freefall.freebsd.org (8.7.3/8.7.3) with SMTP id PAA20861
          for <current@FreeBSD.org>; Fri, 5 Apr 1996 15:21:17 -0800 (PST)
Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9) id QAA25117; Fri, 5 Apr 1996 16:14:00 -0700
From: Terry Lambert <terry@lambert.org>
Message-Id: <199604052314.QAA25117@phaeton.artisoft.com>
Subject: Re: fast memory copy for large data sizes
To: paul@netcraft.co.uk
Date: Fri, 5 Apr 1996 16:14:00 -0700 (MST)
Cc: davidg@Root.COM, asami@cs.berkeley.edu, current@FreeBSD.org,
        nisha@cs.berkeley.edu, tege@matematik.su.se, hasty@rah.star-gate.com
In-Reply-To: <199604051156.MAA00692@originat.demon.co.uk> from "Paul Richards" at Apr 5, 96 12:56:29 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-current@FreeBSD.org
X-Loop: FreeBSD.org
Precedence: bulk

> >    This would be a big lose in the kernel since just about all bcopy's fall
> > into this range _except_ disk I/O block copies. I know this can be done better
> > using other techniques (non-FP, see hackers mail from about 3 months ago). You
> > should talk to John Dyson who's also working on this.
> 
> A quick check of the size would probably help and use the original
> method for small copies. Run a benchmark on such a scheme and see what
> happens.
> 
> Anyway, I had another thought, do we save the fp registers across
> context switches? I seem to remember that we don't always and instead
> save them when something tries to do FP operations, I might be imagining
> this but if it's true increased use of the fp regs is going to impact
> context switching.

This is true.

I also don't see the code seriously dealing with misalignment between
source and target. The two need to be aligned on the same boundary for
everything but the initial and final sub-increment-sized moves.

Otherwise the cache lines will still require multiple fetches and
stores per move (a design flaw in the P5/P6, which could easily
have had special-purpose registers for unaligned access, if two
increments could be held in cache at the same time and fed through a
barrel shifter or similar zero-clock cache line rotation hardware).
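The head/bulk/tail scheme described above can be sketched in C roughly
as follows. This is a minimal illustration, not the code under
discussion: the 8-byte increment (standing in for a 64-bit FP
load/store), the INCR constant, and the aligned_copy() name are all
hypothetical. It returns 0 when source and target can never share a
boundary, leaving the fallback decision to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bulk-move size: 8 bytes, standing in for one 64-bit FP move. */
#define INCR 8u

/*
 * Hypothetical sketch: copy n bytes from src to dst using aligned
 * INCR-sized moves for the bulk of the data. Returns 1 on success,
 * or 0 (copying nothing) if src and dst differ modulo INCR.
 */
int aligned_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Different offsets within an increment: both can never be aligned. */
    if (((uintptr_t)d ^ (uintptr_t)s) & (INCR - 1))
        return 0;

    /* Initial sub-increment move: single bytes until dst is aligned. */
    while (n > 0 && ((uintptr_t)d & (INCR - 1)) != 0) {
        *d++ = *s++;
        n--;
    }

    /* Because the pointers are congruent, src is now aligned as well. */
    assert(n == 0 || ((uintptr_t)s & (INCR - 1)) == 0);

    /* Bulk: whole aligned increments, one load and one store each.
     * A fixed-size memcpy is the portable way to express this; compilers
     * typically turn it into a single 8-byte move. */
    while (n >= INCR) {
        uint64_t w;
        memcpy(&w, s, INCR);
        memcpy(d, &w, INCR);
        d += INCR;
        s += INCR;
        n -= INCR;
    }

    /* Final sub-increment move: whatever is left over. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return 1;
}
```

Note that the congruence check is what matters, not whether either
pointer is aligned on entry: two pointers that are both 3 bytes past a
boundary can be brought into line by the head loop, while pointers 1
and 2 bytes past a boundary never can.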

When that alignment isn't there, it's often better to fall back to
the old code.
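Combining that fallback with the size check suggested in the quoted
text gives a dispatch decision along these lines. Again a hypothetical
sketch: choose_copy_path(), COPY_OLD/COPY_BULK, and both constants are
made up for illustration, and the 1024-byte cutoff is a placeholder,
not a measured figure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parameters: neither value comes from a benchmark. */
#define BULK_INCR      8u     /* bytes per bulk (e.g. 64-bit FP) move */
#define BULK_THRESHOLD 1024u  /* below this, assume the old code wins */

enum copy_path { COPY_OLD, COPY_BULK };

/*
 * Hypothetical dispatcher: take the bulk path only for large copies
 * whose source and target can be brought onto the same boundary;
 * everything else goes to the existing bcopy.
 */
enum copy_path choose_copy_path(const void *dst, const void *src, size_t n)
{
    /* The mask test below is only valid for a power-of-two increment. */
    assert((BULK_INCR & (BULK_INCR - 1)) == 0);

    /* Small copy: FPU state save and setup cost likely dominate. */
    if (n < BULK_THRESHOLD)
        return COPY_OLD;

    /* dst and src differ modulo BULK_INCR: no head copy can align both,
     * so every bulk load or every bulk store would be misaligned. */
    if ((((uintptr_t)dst ^ (uintptr_t)src) & (BULK_INCR - 1)) != 0)
        return COPY_OLD;

    return COPY_BULK;
}
```

The real cutoff would have to come from actually running the benchmark
suggested above, since it depends on the cost of the FPU save as much
as on the copy loop itself.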


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.