Date: Wed, 6 Sep 2006 09:16:03 -0500 (CDT)
From: Mike Silbersack
To: Gleb Smirnoff
Cc: cvs-src@FreeBSD.org, src-committers@FreeBSD.org, cvs-all@FreeBSD.org
Subject: Re: cvs commit: src/sys/netinet in_pcb.c tcp_subr.c tcp_timer.c tcp_var.h
Message-ID: <20060906091204.B6691@odysseus.silby.com>
In-Reply-To: <200609061356.k86DuZ0w016069@repoman.freebsd.org>

On Wed, 6 Sep 2006, Gleb Smirnoff wrote:

> glebius 2006-09-06 13:56:35 UTC
>
> FreeBSD src repository
>
> Modified files:
> sys/netinet in_pcb.c tcp_subr.c tcp_timer.c tcp_var.h
> Log:
> o Backout rev. 1.125 of in_pcb.c. It appeared to behave extremely
> bad under high load. For example with 40k sockets and 25k tcptw
> entries, connect() syscall can run for seconds. Debugging showed
> that it iterates the cycle millions times and purges thousands of
> tcptw entries at a time.
> Besides practical unusability this change is architecturally
> wrong. First, in_pcblookup_local() is used in connect() and bind()
> syscalls. No stale entries purging shouldn't be done here. Second,
> it is a layering violation.

So you're returning to the behavior where the system chokes and stops
all outbound TCP connections because everything is in the timewait
state?

There has to be a way to fix the problem without removing this
heuristic entirely. How did you run your tests?

> o Return back the tcptw purging cycle to tcp_timer_2msl_tw(),
> that was removed in rev. 1.78 by rwatson. The commit log of this
> revision tells nothing about the reason cycle was removed. Now
> we need this cycle, since major cleaner of stale tcptw structures
> is removed.

Looks good, this is probably the reason for the code in in_pcb
behaving so poorly. Did you test just this change alone to see if it
solved the problem that you were seeing?

Mike "Silby" Silbersack
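
For readers following the thread, here is a minimal userland sketch of the kind of timer-driven purging cycle the commit log describes. It is not the code from tcp_timer.c. The names tw_entry, tw_queue, tw_enqueue and tw_purge_expired are hypothetical, and the sketch assumes every entry gets the same timeout, so that appending at the tail keeps the queue in expiry order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a time-wait entry; NOT the kernel's struct tcptw. */
struct tw_entry {
    int expires;               /* tick at which the 2MSL wait ends */
    struct tw_entry *next;
};

/*
 * Queue kept in expiry order. Assumption: all entries use the same
 * timeout, so appending at the tail preserves the ordering.
 */
struct tw_queue {
    struct tw_entry *head;
    struct tw_entry *tail;
    size_t count;
};

/* Append a new entry expiring at now + timeout. Returns 0, or -1 if out of memory. */
static int
tw_enqueue(struct tw_queue *q, int now, int timeout)
{
    struct tw_entry *e;

    assert(q != NULL);
    e = malloc(sizeof(*e));
    if (e == NULL)
        return (-1);
    e->expires = now + timeout;
    e->next = NULL;
    if (q->tail != NULL)
        q->tail->next = e;
    else
        q->head = e;
    q->tail = e;
    q->count++;
    return (0);
}

/*
 * Purging cycle, meant to be called from a periodic timer: free expired
 * entries from the head and stop at the first one that is still live.
 * Returns the number of entries purged.
 */
static size_t
tw_purge_expired(struct tw_queue *q, int now)
{
    struct tw_entry *e;
    size_t purged = 0;

    assert(q != NULL);
    while ((e = q->head) != NULL && e->expires <= now) {
        q->head = e->next;
        if (q->head == NULL)
            q->tail = NULL;
        free(e);
        q->count--;
        purged++;
    }
    return (purged);
}
```

Because the loop stops at the first unexpired entry, each timer tick costs time proportional to the number of entries actually freed. Nothing in the connect() or bind() path has to walk the queue in this arrangement.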