From owner-freebsd-questions@FreeBSD.ORG Thu Aug 14 07:44:47 2003
Date: Thu, 14 Aug 2003 15:44:46 +0100
From: Jez Hancock <munk@munk.nu>
To: freebsd-questions@freebsd.org
Subject: Re: Script help needed please
Message-ID: <20030814144446.GC69860@users.munk.nu>
In-Reply-To: <3.0.5.32.20030814084949.012f40e8@sage-one.net>
References: <3.0.5.32.20030814084949.012f40e8@sage-one.net>
User-Agent: Mutt/1.4.1i

On Thu, Aug 14, 2003 at 08:49:49AM -0500, Jack L. Stone wrote:
> Server Version: Apache/1.3.27 (Unix) FrontPage/5.0.2.2510 PHP/4.3.1
> The above is typical of the servers in use, with csh shells employed,
> plus IPFW.
>
> My apologies for the length of this question, but the background seems
> necessary, as brief as I can make it, so the question makes sense.
>
> The problem:
> We have several servers that provide online reading of technical
> articles, and each has several hundred MB to a GB of content.
>
> When we started providing the articles 6-7 years ago, folks used
> browsers to read the articles. Now the trend has become a lazier
> approach, and there is increasing use of download utilities which can
> be left unattended to download entire web sites, taking several hours
> to do so. Multiply this by a number of similar downloads and there goes
> the bandwidth, denying the other normal online readers the speed needed
> for loading and browsing in the manner intended. Several hundred will
> be reading at a time, and several thousand daily.

There is no easy solution to this, but one avenue might be bandwidth
throttling with an Apache module. One I've used before is mod_throttle,
which is in the ports tree:

/usr/ports/www/mod_throttle

It lets you throttle clients by IP address, limiting each to a certain
number of documents and/or a certain transfer volume (a rough httpd.conf
example is below). IIRC it's fairly limited, though, in that per-IP
limits can only be applied to _every_ virtual host at once, i.e. in the
global httpd.conf context.

A more fine-grained solution (from what I've read; I haven't tried it)
is mod_bwshare. This one isn't in the ports, but can be found here:

http://www.topology.org/src/bwshare/

This module overcomes some of mod_throttle's shortcomings and gives you
finer control over who consumes how much bandwidth over what time
period.
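To give a very rough idea of the mod_throttle side, here is the sort of
thing that would go in the global httpd.conf context. The directive
names and arguments are from memory, so treat this purely as a sketch
and check the documentation installed with the port before using it:

<IfModule mod_throttle.c>
    # Server-wide policy: cap transfer volume per period. Policy name
    # and arguments as I remember them; the per-client-IP directive
    # (ThrottleClientIP, if memory serves) takes similar arguments, but
    # verify everything against the docs from /usr/ports/www/mod_throttle.
    ThrottlePolicy Volume 10M 1d
</IfModule>

Because this lives in the global context, the same limits apply across
all virtual hosts, which is exactly the limitation I mentioned above.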
> Now, my question: Is it possible to write a script that can constantly
> scan the Apache logs to look for certain footprints of those
> downloaders, perhaps their names, like "HTTRACK", which is one I see a
> lot. Whenever I see one of those sessions, I have been able to abort it
> by adding a rule to the firewall to deny the IP address access to the
> server. This aborts the downloading, but I have seen the attempts
> continue constantly for a day or two, confirming unattended downloads.
>
> Thus, if the script could spot an "offender" and then perhaps make use
> of the firewall to add a rule containing the offender's IP address and
> then flush to reset the firewall, this would at least abort the
> download and free up the bandwidth (I already have a script that
> restarts the firewall).
>
> Is this possible, and how would I go about it?

If you really want to go down this route, there's a script someone wrote
a while back to find 'rude robots' in an httpd logfile, which you could
perhaps adapt to do dynamic filtering in conjunction with your firewall:

http://stein.cshl.org/~lstein/talks/perl_conference/cute_tricks/log9.html

There's also a rough sketch of the sort of thing I mean at the bottom of
this mail, below my sig.

If you have any success let me know.

-- 
Jez
http://www.munk.nu/
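PS: for what it's worth, here is a very rough, untested sketch of the
kind of log-scanning script I mean. It's plain /bin/sh; the log path,
blocked-IP list, ipfw rule number and the "httrack" pattern are all
placeholders you would want to adjust, and it assumes the client IP is
the first field of each access log line (as in the common/combined log
formats):

#!/bin/sh
# Rough sketch: scan an Apache access log for a download-robot footprint
# and block the offending IPs with ipfw.

ACCESS_LOG="/var/log/httpd-access.log"  # adjust to your log location
BLOCKED_LIST="/var/db/blocked_robots"   # IPs blocked on earlier runs
RULE_NUM=500                            # ipfw rule number to add under
PATTERN="httrack"                       # footprint to match (case-insensitive)

touch "${BLOCKED_LIST}"

# The client IP is the first field of each matching log line; de-duplicate.
for ip in $(grep -i "${PATTERN}" "${ACCESS_LOG}" | awk '{print $1}' | sort -u); do
        # Skip addresses already dealt with on a previous run.
        grep -qx "${ip}" "${BLOCKED_LIST}" && continue

        echo "Blocking ${ip} (matched ${PATTERN})"
        /sbin/ipfw add ${RULE_NUM} deny tcp from ${ip} to any 80
        echo "${ip}" >> "${BLOCKED_LIST}"
done

Run it from cron every few minutes and it should catch offenders fairly
quickly. Note that 'ipfw add' takes effect immediately, so there is no
need to flush or restart the whole firewall just to add one rule.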