From: Terry Lambert
Message-Id: <199604152101.OAA09539@phaeton.artisoft.com>
Subject: Re: Pentium fast copy?
To: msmith@atrad.adelaide.edu.au (Michael Smith)
Date: Mon, 15 Apr 1996 14:01:51 -0700 (MST)
Cc: peter@nmti.com, hackers@FreeBSD.ORG
In-Reply-To: <199604140416.NAA11976@genesis.atrad.adelaide.edu.au> from "Michael Smith" at Apr 14, 96 01:46:21 pm

[ ... Lai/Baker Pentium bcopy notes ... ]

> Most of the implementations that have been thrown around here tend to
> average at about 40M/sec.  There have been some significantly faster under
> certain specialised circumstances, but they tend to perform poorly on older
> processors, or they fall foul of some of the caching policies imposed
> by some motherboards, or they impose extra overhead elsewhere in the
> system (the most common complaint is that context-switches have to become
> more complex to handle the technique).
>
> I'd be really interested to see what sort of hardware they're using that
> has 160M/sec of memory bandwidth.
> Unless they're running 100% static
> RAM, I suspect they've never actually implemented their code on a practical
> scale 8(

I would suspect that a large part of the performance is in ensuring
that the source and target addresses share the same quad alignment
(for the 8 byte floating point copy) or the same dword alignment (for
the integer register cache line prefetch method).

The 160M/sec number is exceedingly optimistic.  I would expect that we
would be unable to see that performance until we could enable the
alignment trap on a P5/P6 and fail unaligned memory access like a
decent RISC chip, without causing the kernel to panic.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
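
The same-alignment condition mentioned in the reply can be sketched in portable C. This is only an illustration, not the Lai/Baker code and not anything from the FreeBSD tree: the names same_quad_alignment() and aligned_copy() are hypothetical, the 8-byte moves are done through uint64_t rather than the floating point registers, and the regions are assumed not to overlap. It shows why matching alignment matters: only when source and target sit at the same offset within a quad word can a short byte-wise head copy bring both onto an 8-byte boundary at once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero when both pointers sit at the same
 * offset within an 8-byte quad word. */
int same_quad_alignment(const void *dst, const void *src)
{
    return (((uintptr_t)dst ^ (uintptr_t)src) & 7u) == 0;
}

/* Hypothetical name: copy n bytes between non-overlapping regions,
 * using 8-byte moves when the two pointers share quad alignment. */
void *aligned_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(dst != NULL && src != NULL);

    /* Mismatched offsets: no head copy can align both pointers,
     * so fall back to the library routine. */
    if (!same_quad_alignment(d, s))
        return memcpy(dst, src, n);

    /* Head: single bytes until both pointers reach an 8-byte boundary. */
    while (n > 0 && ((uintptr_t)d & 7u) != 0) {
        *d++ = *s++;
        n--;
    }

    /* Body: one quad word per iteration. The fixed-size memcpy calls
     * avoid aliasing problems; compilers typically turn them into a
     * single 64-bit load and store. */
    while (n >= 8) {
        uint64_t w;
        memcpy(&w, s, sizeof w);
        memcpy(d, &w, sizeof w);
        d += 8;
        s += 8;
        n -= 8;
    }

    /* Tail: remaining bytes, fewer than eight. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

A caller that controls its own buffers can allocate them so that same_quad_alignment() holds and the fast path is always taken; the fallback is there for callers that cannot.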