From: Dror Matalon
Date: Sat, 5 Oct 1996 00:20:18 -0700 (PDT)
To: freebsd-isp@freebsd.org
Subject: How to solve the news server problem
In-Reply-To: <199610031825.NAA07158@brasil.moneng.mei.com>

Hi folks,

All this discussion of RAID, ccd etc. got me thinking again about the news server problem. Our news server has way more problems than any other server we run. I believe that this is typical for most ISPs.

Our population actually reads news less than that of many other ISPs. With around 3000 users I've never seen more than 30 concurrent readers on our news server. Our server runs on:

    128 Meg memory
    4 Quantum XP34300W (Fast wide 4Gig). Yes, I know 8 2 Gigs would be better.
    Pentium 133

Response time is fine, but not spectacular. I suspect that the next step for a speedup would be for us to have separate reader and feed machines. Right now this machine connects to 4 other ISPs to send and receive news.

I'm annoyed with how inefficient the news system is. I know the history (pun intended) of Usenet, and in the context of uucp and store-and-forward on 56K lines it all makes sense to have a news system where everyone keeps all the articles and everyone has a full feed.

Today, with our fast lines and 150 - 200 Megs of news, I believe that my news server is spending most of its time receiving, writing to disk, organizing, and then removing files that NONE OF MY USERS WILL EVER LOOK AT.

To put it another way, the reason that we all have these really full feeds (other than to be able to tell someone who calls and wants to know, "oh yes, we have 500,000 newsgroups") is so that when one of our users wants to subscribe to a new newsgroup, we want them to have the articles there.

We could quite easily figure out which newsgroups our users subscribe to, accept only articles for these newsgroups, and reduce the traffic, the disk space, the memory etc. to ... 5%? 10%? 30%? The problem is that we want to have newsgroups available when our users want to subscribe to something new.

So, it looks like we could have some kind of algorithm that keeps everything in subscribed newsgroups for 14 days, keeps subscribed binary newsgroups for 7 days, and keeps everything else for 1 day. This way, when someone subscribes to a new newsgroup they have something to start with, and they'll see all the new stuff from the point of subscription. The only time they lose is when they subscribe to a new newsgroup: they only get to see 1 day instead of 13 days of articles.

On the other hand, I just checked, and our users only looked at 554 newsgroups out of the 17,000 or so we have (I lied, we don't have 500,000). So even if the binary newsgroups still contain most of the same material, and even if we do keep a day's worth of the other unsubscribed newsgroups, we should be able to handle only 20% or so of all the articles. Our disks will not be working as hard, since they'll have a lot less material to look through, which should make them more reliable, so we need to worry less about disks failing, RAID, ccd etc.

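The retention rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any existing tool does it: the set of read newsgroups is assumed to have already been pulled out of the reader logs, a group is treated as "binary" if one of its name components is "binaries", and the output layout (pattern:modflag:keep:default:purge, with the last matching line winning) reflects my understanding of INN's expire.ctl and should be checked against the man page. The function names and the sample group list are hypothetical.

```python
# Retention in days, taken from the proposal in the message.
SUBSCRIBED_DAYS = 14
SUBSCRIBED_BINARY_DAYS = 7
DEFAULT_DAYS = 1


def is_binary(group):
    """Assumption: a group is 'binary' if a name component is 'binaries'."""
    return "binaries" in group.split(".")


def retention_days(group, read_groups):
    """Return how many days to keep articles in `group`."""
    if group not in read_groups:
        return DEFAULT_DAYS
    if is_binary(group):
        return SUBSCRIBED_BINARY_DAYS
    return SUBSCRIBED_DAYS


def expire_ctl_lines(read_groups):
    """Build expire.ctl-style lines for the groups users actually read.

    Assumed layout: pattern:modflag:keep:default:purge. The catch-all
    line is emitted first on the assumption that the last match wins.
    """
    lines = ["*:A:{0}:{0}:{0}".format(DEFAULT_DAYS)]
    for group in sorted(read_groups):
        days = retention_days(group, read_groups)
        lines.append("{0}:A:{1}:{1}:{1}".format(group, days))
    return lines


if __name__ == "__main__":
    # Hypothetical sample; the real set would come from the reader logs.
    read = {"comp.unix.bsd.freebsd.misc", "alt.binaries.pictures.misc"}
    print("\n".join(expire_ctl_lines(read)))
    # Share of groups actually read, using the figures quoted above.
    print("%.1f%% of groups read" % (100.0 * 554 / 17000))
```

With the figures quoted above, 554 read groups out of roughly 17,000 is about 3.3% of the group list. The share of articles will be higher than that, since the groups people read tend to be the busy ones.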
Now, I know I'm not the only smart person in the world, so I looked around and sure enough I found

    ftp://ftp.math.psu.edu/pub/INN/contrib/actgroups.pl

    #!/usr/local/bin/perl
    # Active Groups -- Detecting actively accessed news groups and setting
    # expire.ctl accordingly
    $progname = "actgroups";
    $version = "Ver 0.03c, 30 August 1994";
    $author = "Yufan Hu ";
    #
    # Slightly modified by Alan Brown (alan@manawatu.planet.co.nz) 15 Dec 1994

But I couldn't find anyone using this.

So folks, is this a good solution?

Dror Matalon                    Voice: 510 649-6110
Direct Network Access           Fax:   510 649-7130
2039 Shattuck Avenue            Modem: 510 649-6116
Berkeley, CA 94704              Email: dror@dnai.com