Date:      Sat, 23 Jun 2001 09:33:52 -0700 (PDT)
From:      Matt Dillon <dillon@earth.backplane.com>
To:        Alfred Perlstein <bright@sneakerz.org>
Cc:        Mike Silbersack <silby@FreeBSD.ORG>, cvs-committers@FreeBSD.ORG, cvs-all@FreeBSD.ORG, jlemon@FreeBSD.ORG, bmilekic@FreeBSD.ORG
Subject:   Re: cvs commit: src/sys/netinet tcp_input.c tcp_output.c tcp_subr.c tcp_timer.c tcp_usrreq.c tcp_var.h
Message-ID:  <200106231633.f5NGXqp72502@earth.backplane.com>
References:  <200106230321.f5N3Llv09510@freefall.freebsd.org> <20010623102801.F57058@sneakerz.org>

:* Mike Silbersack <silby@FreeBSD.org> [010622 22:21] wrote:
:> silby       2001/06/22 20:21:47 PDT
:> 
:>   Modified files:
:>     sys/netinet          tcp_input.c tcp_output.c tcp_subr.c 
:>                          tcp_timer.c tcp_usrreq.c tcp_var.h 
:>   Log:
:>   Eliminate the allocation of a tcp template structure for each
:>   connection.  The information contained in a tcptemp can be
:>   reconstructed from a tcpcb when needed.
:
:I may have missed it, but did you guys happen to run a perf test
:on this along the same lines as the excellent work done to benchmark
:the new mbuf allocator?
:
:My main concern is that a simple bcopy is cheaper than digging into
:the inpcb/tcpcb to fill in the packet info.
:
:There are patches available to use a smaller zone-like allocation
:strategy to conserve storage space that we may want to use instead
:of eliminating it completely in case there are noticeable performance
:penalties.
:
:Could someone generate those lovely numbers we saw earlier this
:week with and without this patch?
:
:-Alfred

    I think the new way might even be faster.  If the in_pseudo() call could
    be optimized, the new way would definitely be faster.  The problem with
    bcopy() is that it still requires memory reads, and even when a memory
    read hits the L1 cache it still represents a hiccup (at least on IA32)
    when combined with memory writes.  On top of that, bcopy() has rather
    severe overhead when all it's doing is copying 20x2 bytes.  The arguments
    being passed to bcopy() themselves eat up 12 bytes for each call, for
    example.  The alignment and size tests within bcopy() are equivalent to
    a couple of reads, and so on.

    The new code makes up for the overhead by storing mostly constants
    into the destination buffer, which is about six times faster than a
    memory copy.
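
    As a rough illustration of "storing mostly constants", here is a
    minimal sketch.  The struct names, fields and fill_hdr() are
    hypothetical, simplified stand-ins, not the real tcpcb/inpcb or
    header layouts: only the four per-connection fields are read from
    memory, and everything else is an immediate store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-connection state (stand-in for what a tcpcb/inpcb holds). */
struct conn_state {
    uint32_t laddr, faddr;	/* local and foreign address */
    uint16_t lport, fport;	/* local and foreign port */
};

/* Hypothetical, simplified header; the field layout is an assumption. */
struct pkt_hdr {
    uint8_t  vhl;		/* IP version and header length */
    uint8_t  proto;		/* IP protocol number */
    uint16_t len;
    uint32_t src, dst;
    uint16_t sport, dport;
    uint32_t seq, ack;
    uint8_t  flags;
};

/* Rebuild the header on demand instead of copying a stored template. */
static void
fill_hdr(struct pkt_hdr *h, const struct conn_state *c)
{
    assert(h != NULL && c != NULL);

    /* Constant stores: no memory reads needed. */
    h->vhl = 0x45;		/* IPv4, 5 x 32-bit words */
    h->proto = 6;		/* TCP */
    h->len = 0;
    h->seq = 0;
    h->ack = 0;
    h->flags = 0;

    /* The only loads: four per-connection fields. */
    h->src = c->laddr;
    h->dst = c->faddr;
    h->sport = c->lport;
    h->dport = c->fport;
}
```

    A caller would keep one conn_state per connection and call
    fill_hdr(&hdr, &conn) whenever a header is needed.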

    bcopy() turns out to be *real* slow in this case.  Check it out.  Now,
    I suppose you could replace the bcopy() calls with inline structural
    copies, and that would probably be faster, but even then I think the new
    code would still be very close if in_pseudo() were optimized out.
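
    For reference, an "inline structural copy" could look something like
    the sketch below.  The hdr_pair struct and both function names are
    made up for illustration.  Because the size is known at compile time,
    a compiler is free to expand the assignment into inline moves rather
    than a library call, though whether it does so depends on the
    compiler and flags.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the 20x2-byte template data. */
struct hdr_pair {
    int x[5];
    int y[5];
};

/* Structure assignment: size is fixed, so no runtime size/alignment tests
 * are required by the language. */
static void
copy_inline(struct hdr_pair *dst, const struct hdr_pair *src)
{
    assert(dst != NULL && src != NULL);
    *dst = *src;
}

/* Library-call copy, kept only as a point of comparison. */
static void
copy_call(struct hdr_pair *dst, const struct hdr_pair *src)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, sizeof(*dst));
}
```

    Both functions produce the same bytes in the destination; any speed
    difference would have to be measured the same way as the tests in
    the program below.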

						-Matt

mobile:/home/dillon> ./mt
Test1 - bcopy 20x2 bytes        223.24 nS/loop
Test2 - manual load data         32.65 nS/loop
Test3 - man load w/ptrs          43.14 nS/loop


/*
 * MEMTEST.C
 */

#include <sys/types.h>
#include <sys/time.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>	/* bcopy() */
#include <stdlib.h>
#include <unistd.h>

#define LOOPS	1000000

struct DBuf {
    int	x[5];
    int	y[5];
    char notonsamecacheline[256];
} DBuf, Template, Template2, *GlobPtr = &Template2;

static void showtimes(struct timeval *t1, struct timeval *t2, const char *str, int loops);
static void test1(void);
static void test2(void);
static void test3(struct DBuf *template);

int
main(int ac, char **av)
{
    struct timeval tbeg;
    struct timeval tend;
    int i;

    test1();
    gettimeofday(&tbeg, NULL);
    gettimeofday(&tend, NULL);
    for (i = LOOPS; i; --i)
	test1();
    gettimeofday(&tend, NULL);
    showtimes(&tbeg, &tend, "Test1 - bcopy 20x2 bytes", LOOPS);

    test2();
    gettimeofday(&tbeg, NULL);
    gettimeofday(&tend, NULL);
    for (i = LOOPS; i; --i)
	test2();
    gettimeofday(&tend, NULL);
    showtimes(&tbeg, &tend, "Test2 - manual load data", LOOPS);

    test3(&Template);
    gettimeofday(&tbeg, NULL);
    gettimeofday(&tend, NULL);
    for (i = LOOPS; i; --i)
	test3(&Template);
    gettimeofday(&tend, NULL);
    showtimes(&tbeg, &tend, "Test3 - man load w/ptrs ", LOOPS);
    return(0);
}

static void
showtimes(struct timeval *t1, struct timeval *t2, const char *str, int loops)
{
    long us;

    us = (t2->tv_usec + 1000000 - t1->tv_usec) + 
	    (t2->tv_sec - t1->tv_sec - 1) * 1000000;
    printf("%s\t%6.2f nS/loop\n", str, (double)us * 1000.0 / (double)loops);
}

/* Test1: copy both 20-byte arrays from Template with two bcopy() calls. */
static void
test1(void)
{
    bcopy(Template.x, DBuf.x, sizeof(DBuf.x));
    bcopy(Template.y, DBuf.y, sizeof(DBuf.y));
}

/* Test2: store constants only, with no reads from any template. */
static void
test2(void)
{
    DBuf.x[0] = 0;
    DBuf.x[1] = 0;
    DBuf.x[2] = 0;
    DBuf.x[3] = 0;
    DBuf.x[4] = 0;

    DBuf.y[0] = 0;
    DBuf.y[1] = 0;
    DBuf.y[2] = 0;
    DBuf.y[3] = 0;
    DBuf.y[4] = 0;
}

/* Test3: mix constant stores with loads through a global and an argument pointer. */
static void
test3(struct DBuf *template)
{
    DBuf.x[0] = 0;
    DBuf.x[1] = GlobPtr->x[1];
    DBuf.x[2] = template->x[2];
    DBuf.x[3] = 0;
    DBuf.x[4] = 0;

    DBuf.y[0] = template->y[0];
    DBuf.y[1] = template->y[1];
    DBuf.y[2] = template->y[2];
    DBuf.y[3] = 5;
    DBuf.y[4] = 0;
}





