From: Bruce Evans
To: Mike Silbersack
Cc: cvs-src@FreeBSD.org, src-committers@FreeBSD.org, cvs-all@FreeBSD.org, Nate Lawson
Date: Thu, 27 Mar 2003 19:07:15 +1100 (EST)
Subject: Re: Checksum/copy (was: Re: cvs commit: src/sys/netinet ip_output.c)
Message-ID: <20030327180247.D1825@gamplex.bde.org>
In-Reply-To: <20030326225530.G2075@odysseus.silby.com>

On Wed, 26 Mar 2003, Mike Silbersack wrote:

> On Wed, 26 Mar 2003, Nate Lawson wrote:
>
> > I don't want to hijack the thread too much, but has thought gone into a
> > combined checksum and copy function?  The first mention I can remember of
> > this is in RFC 817 p. 19-20.  Is this RFC old?
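To make the question concrete, here is a minimal sketch of the two approaches. It assumes the 16-bit one's-complement Internet checksum (the message does not name a specific checksum), and `copy_then_cksum`, `copy_and_cksum` and `fold` are hypothetical names for illustration, not FreeBSD kernel functions; the real kernel checksum routines are hand-tuned and look quite different.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fold a wide accumulator into a 16-bit one's-complement sum. */
static uint16_t fold(uint64_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/*
 * Two passes: copy first, then checksum the destination.  If the buffer
 * fits in L1, the second pass reads data that the copy just left in cache.
 */
static uint16_t copy_then_cksum(uint8_t *dst, const uint8_t *src, size_t len)
{
    uint64_t sum = 0;
    size_t i;

    memcpy(dst, src, len);
    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)dst[i] << 8 | dst[i + 1];  /* big-endian 16-bit word */
    if (len & 1)
        sum += (uint32_t)dst[len - 1] << 8;         /* pad odd byte with zero */
    return (uint16_t)~fold(sum);
}

/* One pass: each 16-bit word is added to the sum as it is copied. */
static uint16_t copy_and_cksum(uint8_t *dst, const uint8_t *src, size_t len)
{
    uint64_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2) {
        uint8_t hi = src[i], lo = src[i + 1];

        dst[i] = hi;
        dst[i + 1] = lo;
        sum += (uint32_t)hi << 8 | lo;
    }
    if (len & 1) {
        dst[len - 1] = src[len - 1];
        sum += (uint32_t)src[len - 1] << 8;
    }
    return (uint16_t)~fold(sum);
}
```

Both functions produce the same copy and the same checksum; for the bytes 00 01 f2 03 f4 f5 f6 f7 the result is 0x220d. The point made in the reply is that once the buffer fits in L1, the extra loop in `copy_then_cksum` costs little, so the fused version is not expected to win by much.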

Combined checksum and copy hasn't been much of an optimization since L1
caches became large enough to hold the data being copied.  To a first
approximation, everything is dominated by memory bandwidth, and another
pass to calculate the checksum is free because copying left all the data
in the L1 cache.

> Heh, I don't think anyone has.  What actually would make sense is for
> someone who feels like doing ASM timing to look at our bcopy routines /
> etc.

I spent a lot of time on this about 7 years ago.  See ~bde/cache on
freefall for old versions of programs that try lots of different
copy/read/write checksum methods.  Better hardware has made the
differences between the various methods relatively small.  One can
probably do better (50%?) for largish (1K+?) buffers using SSE
instructions on i386's now.

> On my Mobile Celeron, a for (i = 0; i < max; i++) array[i]=0 runs
> faster than bzero.  :(

Saved data from my benchmarks shows that bzero (stosl) was OK on 486's,
poor on original Pentiums, OK on K6-1's, best by far on second-generation
Celerons (ones like the PII), and poor on Athlon XP's (but not as
relatively bad as on original Pentiums).  For large buffers, the C loop
could easily be competitive with hand-unrolled asm that uses the same
instruction to access memory (no SSE, etc.), but I would expect it to be
slower for small buffers, since it executes an unnecessarily large number
of instructions per memory access.  But maybe these get pipelined
perfectly, so that everything is limited by memory, while stosl has
extra limits.

Bruce
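The loop-versus-bzero comparison above can be repeated with a small harness like the following. It is a sketch under stated assumptions: `memset` stands in for `bzero` (which is BSD rather than ISO C), the array is a hypothetical `int` array, the helper names, buffer sizes and repetition counts are arbitrary choices for illustration, and `clock()` measures CPU time only coarsely. Modern compilers may recognize the zeroing loop and turn it into a `memset` call, so check the generated assembly before reading anything into the numbers.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* The plain C loop from the message, over a hypothetical int array. */
static void zero_loop(int *array, size_t max)
{
    size_t i;

    for (i = 0; i < max; i++)
        array[i] = 0;
}

/* Library zeroing; memset stands in for bzero, which is BSD rather than ISO C. */
static void zero_memset(int *array, size_t max)
{
    memset(array, 0, max * sizeof(*array));
}

/* CPU seconds for `reps` calls of fn on an n-int buffer (coarse, via clock()). */
static double time_zero(void (*fn)(int *, size_t), int *buf, size_t n, int reps)
{
    clock_t start = clock();

    for (int r = 0; r < reps; r++) {
        fn(buf, n);
        buf[(size_t)r % n] = r;  /* store one element so calls are not trivially redundant */
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Compare a small and a large buffer, keeping the total bytes zeroed roughly equal. */
static void compare(void)
{
    static const size_t sizes[] = { 64, (size_t)1 << 20 };  /* in ints; arbitrary */

    for (size_t k = 0; k < sizeof(sizes) / sizeof(sizes[0]); k++) {
        size_t n = sizes[k];
        int reps = (int)(((size_t)1 << 26) / n);
        int *buf = malloc(n * sizeof(*buf));

        if (buf == NULL)
            return;
        double t_loop = time_zero(zero_loop, buf, n, reps);
        double t_memset = time_zero(zero_memset, buf, n, reps);
        printf("%8zu ints x %8d: loop %.3fs, memset %.3fs\n", n, reps, t_loop, t_memset);
        free(buf);
    }
}
```

Calling `compare()` prints one line per buffer size. The small-buffer row is the one that bears on the prediction that the C loop should lose there because of its instruction count; the harness only shows which is faster on a given machine, not why.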