From: Terry Lambert
Date: Fri, 19 Sep 2003 02:44:38 -0700
To: Dan Nelson
cc: "freebsd-hackers@FreeBSD.ORG"
Subject: Re: TCP information
Message-ID: <3F6AD006.B9A0B898@mindspring.com>
References: <3F6975BD.14CD05EE@mindspring.com> <20030918150311.GG51544@dan.emsphone.com>
List-Id: Technical Discussions relating to FreeBSD

Dan Nelson wrote:
> > These types of statistics aren't kept.
> >
> > They usually do not make it into commercial product distributions for
> > performance reasons, and because every byte added to a tcpcb
> > structure is one byte less that can be used for something else.
> > In practice, adding 134 bytes of statistics to a tcpcb would double its
> > size and halve the number of simultaneous connections you would be
> > able to support with the same amount of RAM in a given machine (as
> > one example), if all of that memory had to come out of the same
> > space, all other things being equal.
>
> tcpcb is currently 236 bytes though, and I don't imagine adding another
> 8 bytes for an unsigned long "dropped packets" counter is going to kill
> him.

236 is too large.  We do stupid things like not compressing the state.
For example, there is state that is unique to a listen socket and state
that is unique to a connecting socket: this state should be in a union,
so that tcpcbs are smaller.  The kqueue bloat, particularly that for
accept filters, is another issue.  So is the bloated credential and
other information, most of which belongs in application-specific
extension data chains that are *only* used when the application is
active for the TCP connection (e.g. when IPSec is active, when kqueues
have been registered, etc.).

In 4.x, the structure size was 134 bytes (maybe 136; depends on which
4.x, I guess).  The extra 100 bytes are cruft.  Removing the cruft and
compressing the state with a union would get you just under 128 bytes,
so the current structure is almost 100% additional bloat for features
that are rarely used, or that are used but are generally only in effect
on a small number of the open sockets you are dealing with; very, very
annoying.

> Deepak: if you really want stats, try adding a struct tcpstat to tcpcb
> and hack all the netinet/tcp* code to update those whenever the global
> tcpstat gets updated.  You'll get all the info that netstat -s prints,
> for each socket.

*That* will definitely double the size of struct tcpcb :)

The statistics gathering really should be macroized, and a macro
declaration added for this.  You could then make it a compile-time
option as to whether or not you gather the stats (default to off!).
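
A minimal sketch of the compile-time option described above, in C. Every name here (TCPCB_STATS, conn_stats, TCPCB_STAT_INC, toy_tcpcb, toy_drop_packet) is hypothetical and invented for illustration; none of them are FreeBSD identifiers, and the real tcpcb has far more state than the single field shown.

```c
#include <stdint.h>

/*
 * Hypothetical build option: per-connection statistics exist only when
 * TCPCB_STATS is defined at compile time. It is left undefined here,
 * which matches the "default to off" suggestion.
 */
#ifdef TCPCB_STATS
struct conn_stats {
    uint64_t cs_dropped;    /* packets dropped on this connection */
    uint64_t cs_rexmit;     /* segments retransmitted */
};
#define TCPCB_STAT_INC(tp, field)   ((tp)->t_stats.field++)
#else
/* Compiled out: the instrumentation point expands to a no-op. */
#define TCPCB_STAT_INC(tp, field)   ((void)0)
#endif

/* Toy stand-in for a control block; not the real struct tcpcb. */
struct toy_tcpcb {
    int t_state;                    /* placeholder protocol state */
#ifdef TCPCB_STATS
    struct conn_stats t_stats;      /* present only when enabled */
#endif
};

/* An instrumentation point reads the same with stats on or off. */
static void
toy_drop_packet(struct toy_tcpcb *tp)
{
    TCPCB_STAT_INC(tp, cs_dropped);
    (void)tp;   /* silences the unused warning when compiled out */
}
```

Built without -DTCPCB_STATS, the structure carries no statistics bytes at all and the macro costs nothing at the call site; building with it adds the counters without touching the instrumentation points.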
That assumes some FreeBSD committer is willing to stick the macros in
the headers and at the instrumentation points.

If you did the extension structure chaining trick, noted above, you
could even make it runtime adjustable; however, you would need to (1)
add a timestamp to the structure to indicate the start time for
statistics gathering and (2) walk the list of open sockets to add an
extension for each of the already open sockets in the system.

You could even have a separate set of commands (I would suggest a
pseudo device driver for doing it) to enable/start/stop/disable, so you
can leave dormant extension structures lying around to control sample
intervals separated by non-sample intervals of indeterminate length.

Either way, though, I think you would want it to be "off" by default,
just like you want IPSEC to be "off" by default, given that it soaks up
a huge default object per socket just by being compiled in, even if the
socket never actually uses the feature.  8-(.

-- Terry
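
The runtime-adjustable variant described above could look roughly like the following C sketch. It is an illustration under stated assumptions, not kernel code: all types and functions (conn_ext, stats_ext, toy_conn, stats_enable_all, stats_set_active, conn_count_drop) are hypothetical, the connection list is a plain singly linked list, and locking is ignored entirely.

```c
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical extension types; only statistics is sketched here. */
enum ext_type { EXT_STATS = 1 };

struct conn_ext {
    struct conn_ext *ce_next;   /* next extension on this connection */
    enum ext_type    ce_type;
};

struct stats_ext {
    struct conn_ext  se_hdr;    /* must be first: chain linkage */
    time_t           se_start;  /* (1) when gathering started */
    int              se_active; /* 0 = dormant, 1 = sampling */
    uint64_t         se_dropped;
};

struct toy_conn {
    struct toy_conn *c_next;    /* list of open connections */
    struct conn_ext *c_ext;     /* NULL unless some feature is in use */
};

static struct stats_ext *
conn_find_stats(struct toy_conn *c)
{
    struct conn_ext *e;

    for (e = c->c_ext; e != NULL; e = e->ce_next)
        if (e->ce_type == EXT_STATS)
            return ((struct stats_ext *)e);
    return (NULL);
}

/*
 * (2) Walk the already open connections and attach a stats extension to
 * each one that lacks it. Returns the number attached, or -1 if an
 * allocation fails.
 */
static int
stats_enable_all(struct toy_conn *head, time_t now)
{
    struct toy_conn *c;
    struct stats_ext *se;
    int n = 0;

    for (c = head; c != NULL; c = c->c_next) {
        if (conn_find_stats(c) != NULL)
            continue;
        se = calloc(1, sizeof(*se));
        if (se == NULL)
            return (-1);
        se->se_hdr.ce_type = EXT_STATS;
        se->se_hdr.ce_next = c->c_ext;
        se->se_start = now;
        se->se_active = 1;
        c->c_ext = &se->se_hdr;
        n++;
    }
    return (n);
}

/* start/stop: leave the extension attached but toggle sampling. */
static void
stats_set_active(struct toy_conn *c, int active)
{
    struct stats_ext *se = conn_find_stats(c);

    if (se != NULL)
        se->se_active = active;
}

/* Instrumentation point: counts only while an active extension exists. */
static void
conn_count_drop(struct toy_conn *c)
{
    struct stats_ext *se = conn_find_stats(c);

    if (se != NULL && se->se_active)
        se->se_dropped++;
}
```

A connection that never has statistics enabled pays only for the one c_ext pointer, and a stopped extension stays on the chain so a later start can resume sampling without another walk of the socket list.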