Date: Sun, 24 Dec 2006 22:08:45 +1100 (EST)
From: Bruce Evans
To: Robert Watson
Cc: cvs-src@FreeBSD.org, Scott Long, src-committers@FreeBSD.org, cvs-all@FreeBSD.org, John Polstra
Subject: Re: cvs commit: src/sys/dev/bge if_bge.c
Message-ID: <20061224211712.W25632@delplex.bde.org>
In-Reply-To: <20061224085231.Y37996@fledge.watson.org>
References: <20061223213014.U35809@fledge.watson.org> <458E11AE.2000004@samsco.org> <20061224085231.Y37996@fledge.watson.org>

On Sun, 24 Dec 2006, Robert Watson wrote:

> From the perspective of optimizing these particular paths, small packet
> sizes best reveal processing overhead up to about the TCP/socket buffer
> layer on modern hardware (DMA, etc).
> The uni/bidirectional axis is interesting because it helps reveal the
> impact of the direct dispatch vs. netisr dispatch choice for the IP layer
> with respect to exercising parallelism. I didn't explicitly measure CPU,
> but as the configurations max out the CPUs in my test bed, typically any
> significant CPU reduction is measurable in an improvement in throughput.
> For example, I was easily able to measure the CPU reduction in switching
> from using the socket reference to the file descriptor reference in
> sosend() on small packet transmit, which was a relatively minor
> functional change in locking and reference counting.

Be careful with micro-optimizations. I saw a single change (adding about
1K in unrelated code that is never executed) give a pessimization of 15%
for tx bge (from 360 kpps to 300 kpps). Before that I was trying harder
than now to find optimizations involving avoiding copying, and thought
that I had increased the speed from 330 kpps to 360 kpps by removing
things, but I may have just increased the speed by moving cache phenomena.
The phenomena in this case seem to be related to instructions more than
data and I suspect that they are very MD. The machine that has them
doesn't support APIC or ACPI, so hwpmc cannot do anything useful on it.

Bruce