From owner-freebsd-net Wed Dec 16 10:18:13 1998
Return-Path:
Received: (from majordom@localhost)
        by hub.freebsd.org (8.8.8/8.8.8) id KAA20185
        for freebsd-net-outgoing; Wed, 16 Dec 1998 10:18:13 -0800 (PST)
        (envelope-from owner-freebsd-net@FreeBSD.ORG)
Received: from netrinsics.com ([210.74.175.32])
        by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id KAA20079
        for ; Wed, 16 Dec 1998 10:17:49 -0800 (PST)
        (envelope-from robinson@netrinsics.com)
Received: (from robinson@localhost)
        by netrinsics.com (8.8.8/8.8.7) id CAA00532;
        Thu, 17 Dec 1998 02:13:10 GMT
        (envelope-from robinson)
Date: Thu, 17 Dec 1998 02:13:10 GMT
From: Michael Robinson
Message-Id: <199812170213.CAA00532@netrinsics.com>
To: dot@dotat.at
Subject: Re: MLEN < write length < MINCLSIZE "bug"
Cc: fenner@parc.xerox.com, freebsd-net@FreeBSD.ORG
In-Reply-To:
Sender: owner-freebsd-net@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Tony Finch writes:
>Having read this bit of the red demon book recently (although I can't
>find the precise reference again at the moment), ISTR that the
>heuristic is that since allocating an mbuf with a cluster takes two
>allocations, MINCLSIZE is just bigger than two mbufs.

So it is as I suspected. MINCLSIZE is a parameter for a classic
time/space performance tradeoff. A small MINCLSIZE gives you fewer mbuf
allocations, but with lots of unused space in mbuf clusters. A big
MINCLSIZE gives you more mbuf allocations, and more copy operations, but
with more efficient memory use.

As such, MINCLSIZE seems like a good candidate for a sysctl (a patch for
which can be found at the end of this message). People running
heavily-used dedicated network servers may find it useful to be able to
tune this parameter.

It seems to me that this is largely orthogonal, though, to the issue of
segmenting writes in sosend before sending them to the protocol. That is
more an issue of hardware speed vs. kernel speed.
For example, on a dialup PPP connection, the additional packet header
overhead vastly outweighs the mostly non-existent parallelism of the
serial interface. However, a 100MHz 64-bit PCI gigabit Ethernet
controller can process buffers faster than the CPU can spit them out, so
segmenting the writes could result in significant improvements in
throughput and latency.

So I think this behavior is something that one should be able to turn on
and off. The question is with what granularity: kernel, interface, or
socket?

A socket option would be trivial to implement, but wouldn't work for
existing code until it was retrofitted in. A sysctl would also be
trivial to implement and would work with existing code, but the
granularity is probably too coarse. A new option for ifconfig would work
at the interface level, but I don't know if that's what people want or
will accept.

Comments?

-Michael Robinson

Index: sys/mbuf.h
===================================================================
RCS file: /cdrom/CVSROOT/src/sys/sys/mbuf.h,v
retrieving revision 1.18
diff -u -r1.18 mbuf.h
--- mbuf.h 1996/08/19 18:30:15 1.18
+++ mbuf.h 1998/12/17 01:39:44
@@ -52,7 +52,8 @@
 #define MLEN (MSIZE - sizeof(struct m_hdr)) /* normal data len */
 #define MHLEN (MLEN - sizeof(struct pkthdr)) /* data len w/pkthdr */
 
-#define MINCLSIZE (MHLEN + MLEN) /* smallest amount to put in cluster */
+extern int minclsize;
+#define MINCLSIZE minclsize /* smallest amount to put in cluster */
 #define M_MAXCOMPRESS (MHLEN / 2) /* max amount to copy for compression */
 
 /*
Index: sys/sysctl.h
===================================================================
RCS file: /cdrom/CVSROOT/src/sys/sys/sysctl.h,v
retrieving revision 1.48.2.2
diff -u -r1.48.2.2 sysctl.h
--- sysctl.h 1997/08/30 14:08:56 1.48.2.2
+++ sysctl.h 1998/12/17 01:39:58
@@ -231,6 +231,7 @@
 #define KERN_PS_STRINGS 32 /* int: address of PS_STRINGS */
 #define KERN_USRSTACK 33 /* int: address of USRSTACK */
 #define KERN_MAXID 34 /* number of valid kern ids */
+#define KERN_MINCLSIZE 35 /* minimum size for mbuf cluster */
 
 #define CTL_KERN_NAMES { \
 { 0, 0 }, \
@@ -267,6 +268,7 @@
 { "maxsockbuf", CTLTYPE_INT }, \
 { "ps_strings", CTLTYPE_INT }, \
 { "usrstack", CTLTYPE_INT }, \
+ { "minclsize", CTLTYPE_INT }, \
 }
 
 /*
Index: kern/uipc_socket.c
===================================================================
RCS file: /cdrom/CVSROOT/src/sys/kern/uipc_socket.c,v
retrieving revision 1.20.2.5
diff -u -r1.20.2.5 uipc_socket.c
--- uipc_socket.c 1998/03/02 07:58:12 1.20.2.5
+++ uipc_socket.c 1998/12/17 01:40:26
@@ -53,6 +53,9 @@
 static int somaxconn = SOMAXCONN;
 SYSCTL_INT(_kern, KERN_SOMAXCONN, somaxconn, CTLFLAG_RW, &somaxconn, 0, "");
 
+int minclsize = (MHLEN + MLEN);
+SYSCTL_INT(_kern, KERN_MINCLSIZE, minclsize, CTLFLAG_RW, &minclsize, 0, "");
+
 /*
  * Socket operation routines.
  * These routines are called by the routines in
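
For anyone who wants to play with the time/space tradeoff described
above before touching a kernel, here is a rough userland sketch. It
models only the "cluster or plain mbuf" choice made while a write is
being buffered. The sizes (MSIZE of 128, MCLBYTES of 2048, a 20-byte
m_hdr, an 8-byte pkthdr) are assumed values for a 32-bit layout, and
buffer_cost() and report() are hypothetical names, not kernel functions.
Check your own sys/mbuf.h and param.h for the real numbers.

```c
#include <stdio.h>

/* Assumed sizes for illustration only; the real ones are in the kernel headers. */
#define MSIZE     128                  /* assumed size of one mbuf */
#define MCLBYTES  2048                 /* assumed size of one cluster */
#define M_HDR_SZ  20                   /* assumed sizeof(struct m_hdr) */
#define PKTHDR_SZ 8                    /* assumed sizeof(struct pkthdr) */
#define MLEN      (MSIZE - M_HDR_SZ)   /* data bytes in a plain mbuf */
#define MHLEN     (MLEN - PKTHDR_SZ)   /* data bytes in the first mbuf */

struct cost {
    int allocs;   /* number of allocations (mbuf = 1, mbuf + cluster = 2) */
    int wasted;   /* unused data bytes in the buffers that were chosen */
};

/*
 * Hypothetical model: cost of buffering `len` bytes when data is put in
 * a cluster only if at least `minclsize` bytes remain to be copied.
 * The unused internal data area of an mbuf that points at a cluster is
 * not counted as waste here.
 */
static struct cost buffer_cost(int len, int minclsize)
{
    struct cost c = {0, 0};
    int cap = MHLEN;                   /* first mbuf carries the packet header */

    while (len > 0) {
        int room, used;

        if (len >= minclsize) {
            c.allocs += 2;             /* one mbuf plus one cluster */
            room = MCLBYTES;
        } else {
            c.allocs += 1;             /* plain mbuf */
            room = cap;
        }
        used = len < room ? len : room;
        c.wasted += room - used;
        len -= used;
        cap = MLEN;                    /* later mbufs have no packet header */
    }
    return c;
}

/* Print the modelled cost for one write size and one threshold. */
static void report(int len, int minclsize)
{
    struct cost c = buffer_cost(len, minclsize);

    printf("len=%d minclsize=%d allocs=%d wasted=%d\n",
        len, minclsize, c.allocs, c.wasted);
}
```

Under these assumptions the default threshold MHLEN + MLEN is 208. A
150-byte write then takes two plain mbufs with 58 unused bytes, while
lowering the threshold to 101 gives an mbuf plus a cluster with 1898
unused bytes. Calling report(150, 208) and report(150, 101) from a small
main() shows the two cases side by side.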