From: Bruce Evans
Date: Fri, 4 Jun 2010 17:30:47 +1000 (EST)
To: George Neville-Neil
Cc: net@freebsd.org
Subject: Re: A slight change to tcpip_fillheaders...
Message-ID: <20100604165857.D28688@delplex.bde.org>

On Thu, 3 Jun 2010, George Neville-Neil wrote:

> For what it's worth I checked the assembly for both versions as well. The bzero
> version does not inline, as you said, and the original does do a move of
> 0 for each and every field, again on Nehalem with our default version of
> gcc.
>
> I think that for now I will leave this alone, the code is clear either way,
> and what I cared about was finding out if the code could be sped up.

I couldn't find any options to make gcc-4.2.1 coalesce the assignments in
the following simple example:

%%%
struct foo {
	char	x;
	char	y;
};

void
xx(struct foo *fp)
{
	/* Two independent byte stores; gcc-4.2.1 does not merge them. */
	fp->x = 0;
	fp->y = 0;
}
%%%

The non-coalesced version may be a bottleneck in the instruction stream in
some relatively rare cases.  The worst case seems to be failing to coalesce
stores to 8 8-bit variables on a 64-bit arch.  (gcc does do the coalescing
for bit-fields; otherwise the worst case would be 64 assignments of 1-bit
bit-fields generating 3*64 micro-instructions (3 for each assignment, to
preserve nearby bits).)  But since there are no dependencies between these
assignments, they are easy to schedule, and 8 instructions isn't many (they
probably take 4 cycles).

struct ip has 11 separate fields (after combining the bit-fields).  11
instructions for these are only a few; the call to the extern bzero() takes
almost that many by itself.  Then, internally, bzero() takes 12 instructions
of administrivia and 5 instructions of real work on i386, and 7 instructions
of administrivia and 6 instructions of real work on amd64 (amd64 bzero
actually does more assignments internally -- ones of size 8,8,1,1,1,1
instead of ones of size 4,4,4,4,4; it could do fewer, but only at the cost
of more administrivia).

The function call instructions and other administrivia instructions are
almost all heavyweight ones with strong dependencies, so you would be lucky
if they ran in 25 cycles, whereas the 11 assignments may run in 5.5 cycles.
But 25 cycles isn't many, so the difference is usually insignificant.  Since
this is initialization code, it may involve a cache miss or two, taking
several hundred cycles each...

Bruce
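For readers who want to see what "coalescing" means for the struct foo example in the message, here is a minimal hand-written sketch. xx_coalesced() is a hypothetical name, not something gcc generates; it replaces the two byte stores with one 2-byte store, assuming (and checking at compile time) that struct foo is exactly 2 bytes with no padding. It illustrates the transformation only; it is not a recommendation to write code this way.

```c
#include <assert.h>	/* static_assert (C11) */
#include <stdint.h>
#include <string.h>

struct foo {
	char	x;
	char	y;
};

/* The coalesced form below relies on struct foo having no padding. */
static_assert(sizeof(struct foo) == 2, "struct foo is expected to be 2 bytes");

/* The original form: two separate byte stores. */
static void
xx(struct foo *fp)
{
	fp->x = 0;
	fp->y = 0;
}

/*
 * Hypothetical hand-coalesced form: copy one 16-bit zero over the whole
 * struct.  A fixed-size memcpy() sidesteps alignment and strict-aliasing
 * problems, and optimizing compilers typically emit it as one 16-bit store.
 */
static void
xx_coalesced(struct foo *fp)
{
	const uint16_t zero = 0;

	memcpy(fp, &zero, sizeof(zero));
}
```

Both functions leave x and y equal to 0; they differ only in how many stores the compiler is likely to emit, which, as in the thread, is best checked by reading the assembly (for example with gcc -S).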
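The bit-field remark (3 micro-instructions per assignment "to preserve nearby bits") can be illustrated with a small emulation. This is a hypothetical sketch, not real bit-field code: each bit of a uint64_t stands in for a separate 1-bit field, and the helper names are invented for illustration. The C source spells out the load, AND, and store per field; an optimizing compiler may merge them, which is the coalescing the message says gcc does perform for bit-fields.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical emulation: treat each bit of a 64-bit word as a separate
 * 1-bit field.  Clearing one field without coalescing means load, AND,
 * store, so that the other 63 bits are preserved.
 */
static void
clear_field_rmw(uint64_t *word, unsigned bit)
{
	uint64_t w;

	assert(bit < 64);
	w = *word;			/* load the containing word */
	w &= ~(UINT64_C(1) << bit);	/* clear just this field */
	*word = w;			/* store it back */
}

/* Worst case described in the message: 64 read-modify-write sequences. */
static void
clear_all_separately(uint64_t *word)
{
	for (unsigned bit = 0; bit < 64; bit++)
		clear_field_rmw(word, bit);
}

/* What coalescing achieves: one store clears all 64 fields at once. */
static void
clear_all_coalesced(uint64_t *word)
{
	*word = 0;
}
```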
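To compare the two approaches discussed for tcpip_fillheaders(), here is a user-space sketch built around a hypothetical struct ip_like. It mimics the 20-byte layout of struct ip from <netinet/ip.h>, but folds the ip_hl/ip_v bit-fields into one byte (ip_vhl) and uses plain uint32_t addresses so that it compiles anywhere; it is not the kernel's code. One caveat: in an ordinary user-space build, gcc and clang usually expand a constant-size memset() inline, so the out-of-line call cost discussed in the message applies to the kernel's extern bzero(), not necessarily to this sketch.

```c
#include <assert.h>	/* static_assert (C11) */
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct ip; layout only, bit-fields combined. */
struct ip_like {
	uint8_t		ip_vhl;		/* version and header length (4+4 bits) */
	uint8_t		ip_tos;		/* type of service */
	uint16_t	ip_len;		/* total length */
	uint16_t	ip_id;		/* identification */
	uint16_t	ip_off;		/* fragment offset field */
	uint8_t		ip_ttl;		/* time to live */
	uint8_t		ip_p;		/* protocol */
	uint16_t	ip_sum;		/* checksum */
	uint32_t	ip_src;		/* source address */
	uint32_t	ip_dst;		/* destination address */
};

/* No padding, so zeroing every field zeroes every byte. */
static_assert(sizeof(struct ip_like) == 20, "expected a 20-byte header");

/* Field-by-field zeroing: one small independent store per field, no call. */
static void
ip_zero_fields(struct ip_like *ip)
{
	ip->ip_vhl = 0;
	ip->ip_tos = 0;
	ip->ip_len = 0;
	ip->ip_id = 0;
	ip->ip_off = 0;
	ip->ip_ttl = 0;
	ip->ip_p = 0;
	ip->ip_sum = 0;
	ip->ip_src = 0;
	ip->ip_dst = 0;
}

/* Library zeroing: memset() stands in for the kernel's bzero() here. */
static void
ip_zero_memset(struct ip_like *ip)
{
	memset(ip, 0, sizeof(*ip));
}
```

Both leave the header all-zero; which is faster depends on whether the library call is inlined, as the message explains.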
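The store counts quoted for bzero() on a 20-byte struct ip (4,4,4,4,4 on i386 versus 8,8,1,1,1,1 on amd64) follow from a simple word-then-byte loop. The function below is a hypothetical model of that arithmetic only; zero_by_words() is an invented name, and it is not FreeBSD's bzero(), whose actual code and administrivia are what the instruction counts in the message refer to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical model: zero n bytes using word-sized stores while at least
 * `word` bytes remain, then finish the tail with single-byte stores.
 * Returns the number of stores issued.
 */
static size_t
zero_by_words(unsigned char *p, size_t n, size_t word)
{
	const uint64_t zero8 = 0;
	const uint32_t zero4 = 0;
	size_t stores = 0;

	assert(word == 4 || word == 8);
	while (n >= word) {
		/* One word-sized store; memcpy() avoids alignment traps. */
		if (word == 8)
			memcpy(p, &zero8, 8);
		else
			memcpy(p, &zero4, 4);
		p += word;
		n -= word;
		stores++;
	}
	while (n > 0) {
		*p++ = 0;	/* one byte-sized store for the tail */
		n--;
		stores++;
	}
	return (stores);
}
```

For n = 20, word = 4 gives 5 stores and word = 8 gives 6 (two 8-byte stores, then four 1-byte stores), which is why the amd64 version does more assignments despite its wider stores. Finishing the tail with a single 4-byte store would give 3 stores (8,8,4), presumably the kind of saving the message says would cost more administrivia.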