From: Chuck Swiger
Date: Sun, 04 Apr 2004 18:02:55 -0400
To: Brandon Erhart
cc: freebsd-net@freebsd.org
Subject: Re: FIN_WAIT_[1,2] and LAST_ACK
Message-ID: <4070860F.6030701@mac.com>
In-Reply-To: <6.0.2.0.2.20040404152043.01c83320@mx1.erhartgroup.com>
List-Id: Networking and TCP/IP with FreeBSD

Brandon Erhart wrote:
> I am writing a network application that mirrors a given website (such as
> a souped-up "wget"). I use a lot of FDs, and was getting connect() errors
> when I would run out of local_ip:local_port tuples.
> I lowered the MSL so that TIME_WAIT would time out very quickly (yes, I
> know, this is "bad", but I'm going for sheer speed here), and it
> alleviated the problem a bit.
>
> However, I have run into a new problem. I am getting a good amount of
> blocks stuck in FIN_WAIT_1, FIN_WAIT_2 or LAST_ACK that stick around for
> a long while. I have been unable to find much information on a timeout
> for these states.

Well, these are defined in RFC-793 (aka STD-7).

If you want to mirror the content of a given website rapidly, a good approach
would be to use a tool like rsync and duplicate the changed portions at the
filesystem level rather than mirroring via HTTP requests. Also, using HTTP/1.1
pipelining ought to greatly reduce the number of new connections you need to
open, which ought to speed up your program significantly while reducing load
on the servers you're mirroring.

Since I've given some helpful advice (or so I think :-), perhaps you'll be
willing to listen to a word of caution: if your client is pushing so hard that
it exhausts the local machine's resources, you're very probably doing
something that reasonable website administrators would consider abusive, and
you may cause denial-of-service conditions for other users of that site.

Does your tool pay attention to /robots.txt?

-- 
-Chuck
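
To illustrate the point about opening fewer connections, here is a minimal Python sketch. It is an assumption-laden illustration, not a description of the tool under discussion: it shows HTTP/1.1 persistent-connection reuse (keep-alive), not true pipelining, since Python's http.client sends one request at a time. The names DemoHandler, fetch_all and run_demo are hypothetical, and a throwaway server on 127.0.0.1 stands in for the site being mirrored.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the remote web server."""

    protocol_version = "HTTP/1.1"  # lets the server keep connections open
    client_ports = []              # client source port seen for each request

    def do_GET(self):
        body = ("page " + self.path).encode("ascii")
        DemoHandler.client_ports.append(self.client_address[1])
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        # Content-Length is required so the client knows where the body ends
        # and can send the next request on the same socket.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_all(host, port, paths):
    """Fetch every path over a single TCP connection."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    bodies = []
    try:
        for path in paths:
            conn.request("GET", path)
            resp = conn.getresponse()
            # The response must be read fully before the next request.
            bodies.append(resp.read())
    finally:
        conn.close()
    return bodies


def run_demo():
    """Start the local server, fetch three pages, report ports used."""
    DemoHandler.client_ports.clear()
    server = ThreadingHTTPServer(("127.0.0.1", 0), DemoHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        host, port = server.server_address
        bodies = fetch_all(host, port, ["/a", "/b", "/c"])
    finally:
        server.shutdown()
        server.server_close()
    return bodies, set(DemoHandler.client_ports)


if __name__ == "__main__":
    bodies, ports = run_demo()
    print(len(bodies), "responses over", len(ports), "connection(s)")
```

Because all three requests share one socket, only one local_ip:local_port tuple is consumed and only one connection later passes through the closing states, instead of three.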