Date: Sat, 23 Jun 2001 09:33:52 -0700 (PDT)
From: Matt Dillon <dillon@earth.backplane.com>
To: Alfred Perlstein <bright@sneakerz.org>
Cc: Mike Silbersack <silby@FreeBSD.ORG>, cvs-committers@FreeBSD.ORG, cvs-all@FreeBSD.ORG, jlemon@FreeBSD.ORG, bmilekic@FreeBSD.ORG
Subject: Re: cvs commit: src/sys/netinet tcp_input.c tcp_output.c tcp_subr.c tcp_timer.c tcp_usrreq.c tcp_var.h
Message-ID: <200106231633.f5NGXqp72502@earth.backplane.com>
References: <200106230321.f5N3Llv09510@freefall.freebsd.org> <20010623102801.F57058@sneakerz.org>

:* Mike Silbersack <silby@FreeBSD.org> [010622 22:21] wrote:
:> silby       2001/06/22 20:21:47 PDT
:>
:>   Modified files:
:>     sys/netinet          tcp_input.c tcp_output.c tcp_subr.c
:>                          tcp_timer.c tcp_usrreq.c tcp_var.h
:>   Log:
:>   Eliminate the allocation of a tcp template structure for each
:>   connection.  The information contained in a tcptemp can be
:>   reconstructed from a tcpcb when needed.
:
:I may have missed it, but did you guys happen to run a perf test
:on this along the same lines as the excellent work done to benchmark
:the new mbuf allocator?
:
:My main concern is that a simple bcopy is cheaper than digging into
:the inpcb/tcpcb to fill in the packet info.
:
:There are patches available to use a smaller zone-like allocation
:strategy to conserve storage space that we may want to use instead
:of eliminating it completely in case there are noticeable performance
:penalties.
:
:Could someone generate those lovely numbers we saw earlier this
:week with and without this patch?
:
:-Alfred

    I think the new way might even be faster.  If the in_pseudo() call
    could be optimized, the new way would definitely be faster.

    The problem with bcopy() is that it still requires memory reads, and
    even when a memory read hits the L1 cache it still represents a
    hiccup (at least on IA32) when combined with memory writes.  On top
    of that, bcopy() has rather severe overhead when all it's doing is
    copying 20x2 bytes.  The arguments being passed to bcopy() themselves
    eat up 12 bytes for each call, for example.  The alignment and size
    tests within bcopy() are equivalent to a couple of reads, and so on.

    The new code makes up for the overhead by storing mostly constants
    into the destination buffer, which is six times faster than a memory
    copy.  bcopy() turns out to be *real* slow in this case.  Check it
    out.

    Now, I suppose you could replace the bcopy() calls with inline
    structural copies, and it would probably be faster, but even in this
    case I think the new code would still be very close if in_pseudo()
    were optimized out.

                                                -Matt

mobile:/home/dillon> ./mt
Test1 - bcopy 20x2 bytes        223.24 nS/loop
Test2 - manual load data         32.65 nS/loop
Test3 - man load w/ptrs          43.14 nS/loop

/*
 * MEMTEST.C
 */
#include <sys/types.h>
#include <sys/time.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>    /* bcopy() is declared here on POSIX systems */
#include <stdlib.h>
#include <unistd.h>

#define LOOPS   1000000

/* Two 20-byte groups; the pad keeps the globals off one cache line */
struct DBuf {
    int x[5];
    int y[5];
    char notonsamecacheline[256];
} DBuf, Template, Template2, *GlobPtr = &Template2;

static void showtimes(struct timeval *t1, struct timeval *t2,
                const char *str, int loops);
static void test1(void);
static void test2(void);
static void test3(struct DBuf *template);

int
main(int ac, char **av)
{
    struct timeval tbeg;
    struct timeval tend;
    int i;

    /* Each test runs once untimed to warm the caches, then LOOPS times */
    test1();
    gettimeofday(&tbeg, NULL);
    gettimeofday(&tend, NULL);
    for (i = LOOPS; i; --i)
        test1();
    gettimeofday(&tend, NULL);
    showtimes(&tbeg, &tend, "Test1 - bcopy 20x2 bytes", LOOPS);

    test2();
    gettimeofday(&tbeg, NULL);
    gettimeofday(&tend, NULL);
    for (i = LOOPS; i; --i)
        test2();
    gettimeofday(&tend, NULL);
    showtimes(&tbeg, &tend, "Test2 - manual load data", LOOPS);

    test3(&Template);
    gettimeofday(&tbeg, NULL);
    gettimeofday(&tend, NULL);
    for (i = LOOPS; i; --i)
        test3(&Template);
    gettimeofday(&tend, NULL);
    showtimes(&tbeg, &tend, "Test3 - man load w/ptrs ", LOOPS);

    return(0);
}

/* Print the elapsed time between t1 and t2 as nanoseconds per loop */
static void
showtimes(struct timeval *t1, struct timeval *t2, const char *str, int loops)
{
    long us;

    us = (t2->tv_usec + 1000000 - t1->tv_usec) +
         (t2->tv_sec - t1->tv_sec - 1) * 1000000;
    printf("%s\t%6.2f nS/loop\n", str, (double)us * 1000.0 / (double)loops);
}

/* Old approach: copy both 20-byte groups from the template with bcopy() */
static void
test1(void)
{
    bcopy(Template.x, DBuf.x, sizeof(DBuf.x));
    bcopy(Template.y, DBuf.y, sizeof(DBuf.y));
}

/* Best case for the new approach: store only constants */
static void
test2(void)
{
    DBuf.x[0] = 0;
    DBuf.x[1] = 0;
    DBuf.x[2] = 0;
    DBuf.x[3] = 0;
    DBuf.x[4] = 0;
    DBuf.y[0] = 0;
    DBuf.y[1] = 0;
    DBuf.y[2] = 0;
    DBuf.y[3] = 0;
    DBuf.y[4] = 0;
}

/* Mixed case: some constants, some fields loaded through pointers */
static void
test3(struct DBuf *template)
{
    DBuf.x[0] = 0;
    DBuf.x[1] = GlobPtr->x[1];
    DBuf.x[2] = template->x[2];
    DBuf.x[3] = 0;
    DBuf.x[4] = 0;
    DBuf.y[0] = template->y[0];
    DBuf.y[1] = template->y[1];
    DBuf.y[2] = template->y[2];
    DBuf.y[3] = 5;
    DBuf.y[4] = 0;
}