From owner-freebsd-current Fri Apr 5 15:21:33 1996
Return-Path: owner-current
Received: (from root@localhost) by freefall.freebsd.org (8.7.3/8.7.3) id PAA20874 for current-outgoing; Fri, 5 Apr 1996 15:21:33 -0800 (PST)
Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.211]) by freefall.freebsd.org (8.7.3/8.7.3) with SMTP id PAA20861 for ; Fri, 5 Apr 1996 15:21:17 -0800 (PST)
Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9) id QAA25117; Fri, 5 Apr 1996 16:14:00 -0700
From: Terry Lambert
Message-Id: <199604052314.QAA25117@phaeton.artisoft.com>
Subject: Re: fast memory copy for large data sizes
To: paul@netcraft.co.uk
Date: Fri, 5 Apr 1996 16:14:00 -0700 (MST)
Cc: davidg@Root.COM, asami@cs.berkeley.edu, current@FreeBSD.org, nisha@cs.berkeley.edu, tege@matematik.su.se, hasty@rah.star-gate.com
In-Reply-To: <199604051156.MAA00692@originat.demon.co.uk> from "Paul Richards" at Apr 5, 96 12:56:29 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-current@FreeBSD.org
X-Loop: FreeBSD.org
Precedence: bulk

> > This would be a big lose in the kernel since just about all bcopy's fall
> > into this range _except_ disk I/O block copies. I know this can be done better
> > using other techniques (non-FP, see hackers mail from about 3 months ago). You
> > should talk to John Dyson who's also working on this.
>
> A quick check of the size would probably help and use the original
> method for small copies. Run a benchmark on such a scheme and see what
> happens.
>
> Anyway, I had another thought, do we save the fp registers across
> context switches? I seem to remember that we don't always and instead
> save them when something tries to do FP operations, I might be imagining
> this but if it's true increased use of the fp regs is going to impact
> context switching.

This is true.
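Paul's suggestion of checking the size first and keeping the original method for small copies amounts to a simple dispatch. The following is a minimal C sketch of that idea, not anyone's actual kernel code: `LARGE_COPY_THRESHOLD`, `small_copy`, `large_copy`, and `sized_bcopy` are hypothetical names, the threshold value is a placeholder that would have to come from benchmarking, and both copy paths just defer to `memmove` so the sketch runs anywhere.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical cutoff; the real crossover point must be found by benchmarking. */
#define LARGE_COPY_THRESHOLD 1024

/* Stand-in for the existing small-copy routine (the original bcopy path). */
static void small_copy(const void *src, void *dst, size_t len)
{
    memmove(dst, src, len);
}

/* Hypothetical placeholder for a large-block copy (for example an FP- or
 * block-move-based routine); here it just defers to memmove. */
static void large_copy(const void *src, void *dst, size_t len)
{
    memmove(dst, src, len);
}

/* Counters so a caller can see which path was taken. */
static unsigned long small_calls, large_calls;

/* bcopy-style argument order: source first, destination second. */
void sized_bcopy(const void *src, void *dst, size_t len)
{
    if (len < LARGE_COPY_THRESHOLD) {
        small_calls++;
        small_copy(src, dst, len);   /* common case: keep the old code */
    } else {
        large_calls++;
        large_copy(src, dst, len);   /* only large copies pay any setup cost */
    }
}
```

The point of the split is that the per-call overhead of a special large-copy path (including any FP state handling) is only paid on copies big enough to amortize it.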
I also don't see the code seriously dealing with misalignment between source and target. The two need to be aligned on the same boundary for everything but the initial and final sub-increment-sized moves; otherwise the cache lines will still require multiple fetches and stores to make it work. (This is a design flaw in the P5/P6, which could easily have had special-purpose registers for unaligned access if two increments could be in cache at the same time, using a barrel shifter or a similar zero-clock hardware cache line access rotor.)

If the alignment isn't there, it's often better to fall back to the old code.

Terry Lambert
terry@lambert.org
---
Any opinions in this posting are my own and not those of my present or
previous employers.
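The alignment requirement described above can be illustrated with a minimal C sketch. It assumes a hypothetical 8-byte copy increment (`BLOCK`) and non-overlapping buffers; `co_aligned` and `aligned_copy` are illustrative names, and plain `memcpy` stands in for "the old code". The body loop is a generic word copy, not the FP-register copy under discussion. If source and target sit at the same offset within a block, one head move brings both to a block boundary, whole blocks are moved, and a tail move finishes; otherwise the sketch falls back.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical copy increment: one 8-byte word. */
#define BLOCK sizeof(uint64_t)

/* Nonzero if src and dst have the same offset within a BLOCK, so the
 * same head copy brings both of them to a BLOCK boundary. */
static int co_aligned(const void *src, const void *dst)
{
    return ((uintptr_t)src % BLOCK) == ((uintptr_t)dst % BLOCK);
}

/* Copy len bytes between non-overlapping buffers. Returns 1 if the
 * aligned head/body/tail path was used, 0 if it fell back. */
int aligned_copy(const void *src, void *dst, size_t len)
{
    const unsigned char *s = src;
    unsigned char *d = dst;

    /* Mutually misaligned or too short: fall back to the old code. */
    if (!co_aligned(s, d) || len < 2 * BLOCK) {
        memcpy(d, s, len);
        return 0;
    }

    /* Head: initial sub-increment move up to the first BLOCK boundary. */
    size_t head = (BLOCK - (uintptr_t)s % BLOCK) % BLOCK;
    memcpy(d, s, head);
    s += head; d += head; len -= head;

    /* Body: whole-BLOCK moves; both pointers are now BLOCK-aligned. */
    while (len >= BLOCK) {
        uint64_t w;
        memcpy(&w, s, BLOCK);   /* portable word load; typically one instruction */
        memcpy(d, &w, BLOCK);   /* portable word store */
        s += BLOCK; d += BLOCK; len -= BLOCK;
    }

    /* Tail: final sub-increment move. */
    memcpy(d, s, len);
    return 1;
}
```

For example, copying from offset 3 to offset 3 within 8-byte-aligned buffers takes the aligned path, while offset 3 to offset 5 can never be brought to a common boundary and takes the fallback.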