From owner-freebsd-net@FreeBSD.ORG  Fri Jun  4 07:30:53 2010
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 15CAB106564A;
	Fri,  4 Jun 2010 07:30:53 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail08.syd.optusnet.com.au (mail08.syd.optusnet.com.au
	[211.29.132.189])
	by mx1.freebsd.org (Postfix) with ESMTP id A29C68FC16;
	Fri,  4 Jun 2010 07:30:52 +0000 (UTC)
Received: from c122-106-160-243.carlnfd1.nsw.optusnet.com.au
	(c122-106-160-243.carlnfd1.nsw.optusnet.com.au [122.106.160.243])
	by mail08.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	o547UlEn011621
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Fri, 4 Jun 2010 17:30:49 +1000
Date: Fri, 4 Jun 2010 17:30:47 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@delplex.bde.org
To: George Neville-Neil <gnn@freebsd.org>
In-Reply-To: <54198502-A432-4FA7-9176-0AB85D809597@freebsd.org>
Message-ID: <20100604165857.D28688@delplex.bde.org>
References: <0BC7AD09-B627-4F6A-AD93-B7E794A78CA2@freebsd.org>
	<20100603181439.Q27699@delplex.bde.org>
	<54198502-A432-4FA7-9176-0AB85D809597@freebsd.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: net@freebsd.org
Subject: Re: A slight change to tcpip_fillheaders...
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 04 Jun 2010 07:30:53 -0000

On Thu, 3 Jun 2010, George Neville-Neil wrote:

> For what it's worth I checked the assembly for both versions as well.  The bzero
> version does not inline, as you said, and the original does do a move of
> 0 for each and every field, again on Nehalem with our default version of
> gcc.
>
> I think that for now I will leave this alone, the code is clear either way,
> and what I cared about was finding out if the code could be sped up.

I couldn't find any options to make gcc-4.2.1 coalesce the assignments in the
following simple example:

%%%
struct foo {
 	char x;
 	char y;
};

void
xx(struct foo *fp)
{
 	fp->x = 0;
 	fp->y = 0;
}
%%%
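
For comparison, a minimal sketch of what the coalescing would amount to
if done by hand.  The name xx_coalesced is hypothetical, and the sketch
assumes struct foo has no padding, so that it is exactly 2 bytes.  Whether
the compiler turns the memcpy into a single 16-bit store depends on the
compiler and its flags; this only shows that the two forms leave the
struct in the same state.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct foo {
	char x;
	char y;
};

/* Field-by-field version from the example: two separate 1-byte stores. */
void
xx(struct foo *fp)
{
	fp->x = 0;
	fp->y = 0;
}

/* Hand-coalesced version (hypothetical): one 2-byte copy of zero. */
void
xx_coalesced(struct foo *fp)
{
	uint16_t zero = 0;

	/* Assumption: no padding, so the struct is exactly 2 bytes. */
	assert(sizeof(*fp) == sizeof(zero));
	memcpy(fp, &zero, sizeof(zero));
}
```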

The non-coalesced version may be a bottleneck in the instruction stream
in some relatively rare cases.  The worst case seems to be non-coalescing
8 8-bit variables on a 64-bit arch.  (gcc does do the coalescing for
bit-fields, else the worst case would be 64 assignments of 1-bit bit-fields
generating 3*64 micro-instructions (3 for each assignment to preserve
nearby bits).)  But since there are no dependencies between these assignments
they are easy to schedule, and 8 instructions isn't many (they probably take
4 cycles).
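
To make the count of 3 per bit-field assignment concrete, here is a rough
model in plain C, using a uint64_t to stand in for 64 1-bit bit-fields.
The function names are hypothetical, and the load/mask/store split is an
illustration of why nearby bits cost extra, not a listing of what gcc
emits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Clear one 1-bit field without disturbing its neighbours.  This is the
 * shape of an uncoalesced bit-field assignment: load, mask, store.
 */
void
clear_bit(uint64_t *word, unsigned bit)
{
	uint64_t tmp;

	assert(bit < 64);
	tmp = *word;				/* 1: load the containing word */
	tmp &= ~((uint64_t)1 << bit);		/* 2: mask off just this bit */
	*word = tmp;				/* 3: store the word back */
}

/* Uncoalesced: 64 assignments, each doing the 3 steps above. */
void
clear_all_uncoalesced(uint64_t *word)
{
	unsigned bit;

	for (bit = 0; bit < 64; bit++)
		clear_bit(word, bit);
}

/*
 * Coalesced: the 64 assignments together cover every bit, so nothing
 * needs preserving and a single store is enough.
 */
void
clear_all_coalesced(uint64_t *word)
{
	*word = 0;
}
```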

struct ip has 11 separate fields (after combining the bit-fields).  11
instructions for these is only a few; the extern bzero() takes almost that
many just to call.  Then on i386 it takes 12 instructions internally
for administrivia and 5 instructions internally to do the work; on
amd64 it takes 7 instructions internally for administrivia and 6
instructions internally to do the work (amd64 bzero actually does more
assignments internally -- ones of size 8,8,1,1,1,1 instead of ones of
size 4,4,4,4,4; it could do fewer, but only at a cost of more for
administrivia).  The function call instructions and other administrivia
instructions are almost all heavyweight ones with strong dependencies,
so you would be lucky if they ran in 25 cycles where the 11 assignments
may run in 5.5 cycles.  But 25 cycles isn't many, so the difference is
usually insignificant.  Since this is initialization code, it may involve
a cache miss or two, taking several hundred cycles each...
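
The store sizes quoted above (8,8,1,1,1,1 versus 4,4,4,4,4 for a 20-byte
struct) are what you get from doing word-sized stores first and then
single bytes for the tail.  A small model of that strategy, which counts
the stores, is sketched below.  It is not the actual bzero(); the
function name and the wordsize parameter are made up for the
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Zero len bytes at dst using wordsize-byte stores for as long as they
 * fit, then 1-byte stores for the tail.  Returns the number of stores.
 * Each memset of wordsize bytes stands in for one word-sized store.
 */
size_t
zero_words_then_bytes(unsigned char *dst, size_t len, size_t wordsize)
{
	size_t stores = 0;

	assert(wordsize > 0);
	while (len >= wordsize) {
		memset(dst, 0, wordsize);
		dst += wordsize;
		len -= wordsize;
		stores++;
	}
	while (len > 0) {
		*dst++ = 0;
		len--;
		stores++;
	}
	return (stores);
}
```

For len 20 this gives 6 stores with 8-byte words (8,8,1,1,1,1) and 5
stores with 4-byte words (4,4,4,4,4).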

Bruce