Date: Sun, 14 Oct 2001 03:32:27 -0400 (EDT)
From: Jim Weeks
To: freebsd-isp@freebsd.org
Subject: Re: Being Used! *Update*

I hate to answer my own post, but I thought this might be worth a heads-up
to anyone allowing cgi-bin access to their hosting clients.

It appears that the betsie script (http://www.bbc.co.uk/education/betsie/),
in its original form, already ships with a list of safe URLs in the "@safe"
array:

    my @safe = qw(
        bbc.co.uk
        beeb.com
        bbcworldwide.com
        bbcresources.com
        bbcshop.com
        radiotimes.com
        open.ac.uk
        open2.net
        freebeeb.net
    );

Of course, these URLs should be replaced with those of your client's
approved web sites; in my case, however, the client simply added his own to
the list.

I can now tell you from experience that once one of Google's robots indexes
one of these scripts with the array intact, you can expect to furnish a
*lot* of bandwidth and processor time to help Google index these sites.

A word to the wise!

--
Jim Weeks

On Sun, 14 Oct 2001, Jim Weeks wrote:

> I know this has nothing to do with FreeBSD; I just wondered if any others
> have experienced this.
>
> I noticed quite a lot of perl activity by user nobody on one of my servers
> and set about finding where it was coming from. I quickly discovered that
> one of my virtual hosting clients was running "betsie-1.5.pl". This is a
> script developed by the BBC to convert normal (image-filled) HTML
> documents into simpler, text-based pages. I don't have any problem with
> the concept; however, I also discovered that it was being used to do all of
> the parsing work for a group of web robots owned by "googlebot.com".
>
> Any comments would be appreciated,
>
> --
> Jim Weeks
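
A minimal sketch of the trimmed "@safe" list described at the top of this
message, assuming the rest of betsie-1.5.pl is left untouched. The two
domains are hypothetical placeholders, not real clients; substitute the
client's own approved sites.

```perl
# Hypothetical placeholder domains (not from the original script).
# List only the sites the hosting client is approved to serve, and
# remove the BBC defaults rather than appending to them.
my @safe = qw(
    example.com
    example.org
);
```

With the BBC defaults removed, a robot that finds the script should no
longer be able to reach the original BBC sites through it.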