From: "Jack L. Stone" <jackstone@sage-one.net>
To: Jez Hancock, freebsd-questions@freebsd.org
Date: Thu, 14 Aug 2003 10:36:52 -0500
Subject: Re: Script help needed please

At 03:44 PM 8.14.2003 +0100, Jez Hancock wrote:
>On Thu, Aug 14, 2003 at 08:49:49AM -0500, Jack L.
Stone wrote:
>> Server Version: Apache/1.3.27 (Unix) FrontPage/5.0.2.2510 PHP/4.3.1
>>
>> The above is typical of the servers in use, with csh shells employed,
>> plus IPFW.
>>
>> My apologies for the length of this question, but the background seems
>> necessary, as brief as I can make it, so the question makes sense.
>>
>> The problem:
>> We have several servers that provide online reading of technical
>> articles, and each has several hundred MB to a GB of content.
>>
>> When we started providing the articles 6-7 years ago, folks used
>> browsers to read the articles. Now the trend has become a lazier
>> approach, and there is increasing use of those download utilities that
>> can be left unattended to download entire web sites, taking several
>> hours to do so. Multiply this by a number of similar downloads and
>> there goes the bandwidth, denying the other, normal online readers the
>> speed needed for loading and browsing in the manner intended. Several
>> hundred will be reading at a time, and several thousand daily.
>
>There is no easy solution to this, but one avenue might be to look at
>bandwidth throttling in an Apache module.
>
>One that I've used before is mod_throttle, which is in the ports:
>
>/usr/ports/www/mod_throttle
>
>It allows you to throttle users by IP address to a certain number of
>documents and/or up to a certain transfer limit. IIRC it's fairly
>limited, though, in that you can only apply per-IP limits to _every_
>virtual host - i.e. in the global httpd.conf context.
>
>A more fine-grained solution (from what I've read; I haven't tried it)
>is mod_bwshare - this one isn't in the ports but can be found here:
>
>http://www.topology.org/src/bwshare/
>
>This module overcomes some of the shortfalls of mod_throttle and allows
>you to specify finer control over who consumes how much bandwidth over
>what time period.
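Since IPFW is already running on these servers, a related option to the Apache modules above is dummynet traffic shaping: cap an offender's bandwidth at the firewall instead of blocking outright. A minimal sketch, assuming dummynet is compiled into the kernel; the pipe number, the 16Kbit/s rate, and the address are placeholders, and the commands are echoed rather than executed so they can be reviewed first:

```shell
#!/bin/sh
# Sketch: cap an offender's bandwidth with an IPFW/dummynet pipe instead
# of denying access outright.  Pipe number, rate, and address below are
# placeholders; run() echoes each command so the rules can be reviewed.
run() { echo "$@"; }      # change to run() { "$@"; } on a live firewall
ip=10.1.2.3               # hypothetical offender address

run ipfw pipe 10 config bw 16Kbit/s                  # create a 16 kbit/s pipe
run ipfw add 1000 pipe 10 ip from any to "$ip" out   # shape traffic to that IP
```

Deleting the pipe (`ipfw pipe 10 delete`) restores the reader to full speed, which may be gentler than a hard deny if the address turns out to be a shared proxy.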
>
>> Now, my question: Is it possible to write a script that can constantly
>> scan the Apache logs to look for certain footprints of those
>> downloaders - perhaps the names, like "HTTRACK", being one I see a
>> lot? Whenever I see one of those sessions, I have been able to abort
>> it by adding a rule to the firewall to deny the IP address access to
>> the server. This aborts the downloading, but I have seen the attempts
>> continue constantly for a day or two, confirming unattended downloads.
>>
>> Thus, if the script could spot an "offender" and then perhaps make use
>> of the firewall to add a rule containing the offender's IP address,
>> and then flush to reset the firewall, this would at least abort the
>> download and free up the bandwidth (I already have a script that
>> restarts the firewall).
>>
>> Is this possible, and how would I go about it?
>
>If you really wanted to go down this route, then I found a script
>someone wrote a while back to find 'rude robots' from an httpd logfile,
>which you could perhaps adapt to do dynamic filtering in conjunction
>with your firewall:
>
>http://stein.cshl.org/~lstein/talks/perl_conference/cute_tricks/log9.html
>
>If you have any success, let me know.
>
>--
>Jez

Interesting. Looks like a step in the right direction. Will weigh this
one along with the other possibilities. Many thanks...!

Best regards,

Jack L. Stone,
Administrator
SageOne Net
http://www.sage-one.net
jackstone@sage-one.net
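For the log-scanning idea discussed above, a minimal sh sketch: pull out client IPs whose user-agent field matches a list of known site-grabbers, then emit an ipfw deny rule for each one. The agent list, the 20000 rule-number base, and the embedded sample log lines are illustrative only; on a live server the script would read the real access log, and the leading "echo" would be dropped so the rules are actually installed:

```shell
#!/bin/sh
# Sketch: find client IPs whose user agent matches known site-grabbers
# and emit an ipfw deny rule for each.  The two combined-format log
# lines below stand in for /var/log/httpd-access.log; the agent list
# and rule numbers starting at 20000 are examples only.

AGENTS='HTTrack|WebZIP|Teleport'
RULEBASE=20000

LOG='10.1.2.3 - - [14/Aug/2003:10:00:00 -0500] "GET /a.html HTTP/1.0" 200 512 "-" "HTTrack 3.0"
10.9.8.7 - - [14/Aug/2003:10:00:01 -0500] "GET /b.html HTTP/1.0" 200 512 "-" "Mozilla/4.0"'

n=0
echo "$LOG" | awk -v pat="$AGENTS" '$0 ~ pat { print $1 }' | sort -u |
while read -r ip; do
    # On a live server, drop the leading "echo" to install the rule.
    echo ipfw add $((RULEBASE + n)) deny ip from "$ip" to any
    n=$((n + 1))
done
```

Run from cron every few minutes against the live log, with a check against `ipfw list` to skip already-blocked addresses, this would automate the manual rule-adding described in the question; matching on the user-agent string is fragile, though, since the ruder downloaders can forge it.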