From owner-freebsd-isp  Mon Sep 23 07:21:19 1996
Return-Path: owner-isp
Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3)
	id HAA14600 for isp-outgoing; Mon, 23 Sep 1996 07:21:19 -0700 (PDT)
Received: from brasil.moneng.mei.com (brasil.moneng.mei.com [151.186.109.160])
	by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id HAA14577
	for ; Mon, 23 Sep 1996 07:21:15 -0700 (PDT)
Received: (from jgreco@localhost) by brasil.moneng.mei.com
	(8.7.Beta.1/8.7.Beta.1) id JAA15753; Mon, 23 Sep 1996 09:20:09 -0500
From: Joe Greco
Message-Id: <199609231420.JAA15753@brasil.moneng.mei.com>
Subject: Re: Thoughts on a news server cluster
To: taob@io.org (Brian Tao)
Date: Mon, 23 Sep 1996 09:20:09 -0500 (CDT)
Cc: freebsd-isp@FreeBSD.ORG
In-Reply-To: from "Brian Tao" at Sep 23, 96 01:31:47 am
X-Mailer: ELM [version 2.4 PL24]
Content-Type: text
Sender: owner-isp@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

> The stuff I've been posting about hardware RAID products will
> ultimately lead to the installation of a news server cluster.  I've
> been running fairly happily so far on a single P133 with 128MB, 3
> NCR53810 controllers and 9 drives.  No RAID, no ccd... just different
> drives mounted at different points in the filesystem.  The concurrency
> is actually fairly decent, looking at iostat.

You get more concurrency if you use ccd too :-)  One of my pets:

	/dev/ccd3e    1971087   765364  1048037    42%    /nov
	/dev/ccd2e    1971087   357161  1456240    20%    /usr/local
	/dev/ccd0e    3899273  2478146  1109185    69%    /news
	/dev/ccd1e    3899273  1705248  1882083    48%    /news/.0
	/dev/ccd4e    8075148  5569491  1859646    75%    /news/.1

> Anyhow, management has decided they want something more robust and
> survivable, and that has led me down the path of redundant and
> high-availability hardware without having to switch to some commercial
> OS vendor. ;-)  I've read a lot of discussion on building scalable,
> reliable news server configurations.  I'd like to know if anyone has
> some wisdom to share on the FreeBSD specifics.

If nothing else, you can get a "SCSI-SCSI" translator (made by Mylex and
others) where you just provide a fast/wide SCSI adaptor (2940, etc.) and
let the black box handle the RAID aspects.

Support is probably going to appear for several host adapter RAID
solutions in the not-too-distant future, if I believe what people are
telling me :-)

> For example, someone mentioned that ccd did not appear to be all
> that stable in -current.  Would using 2.1.5-RELEASE be better?

Proudly running ccd since 2.1.0-R without a hiccup.  Or at least, without
a hiccup that wasn't my own stupid fault for not reading the docs :-)

> Another thread mentioned that heavy NFS client activity causes
> instability.  Should I then avoid NFS altogether and pay a premium for
> a local disk subsystem for each server?

Is there any other way to do it???  You do NOT want to NFS mount!!!

I have done it.  If you have the I/O bandwidth and CPU (but not the RAM)
to spare on a machine, it may be a worthwhile option... but the tax on
the host is high.  And you take a major reliability hit if that host
goes down.

If you add additional disks with each additional slave, you grow your
I/O bandwidth in a nearly linear fashion (the exception: the feeder
machine).  While this is somewhat expensive, it is not terrible... the
df output above is from a slave at Exec.  It is 4 1GB ST31055N's, 4
ST32550N's, and 2 ST15150N's (soon to be Barracuda 9G's).
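For the curious, the ccd side of a slave like that boils down to a few
lines of /etc/ccd.conf plus a ccdconfig run.  A rough sketch only -- the
device names and interleave below are invented for illustration, not my
real config, so check ccdconfig(8) against your own disks:

	# /etc/ccd.conf -- example only; substitute your own disks/interleave
	# ccd		ileave	flags	component devices
	ccd0		32	none	/dev/sd0e /dev/sd1e
	ccd1		32	none	/dev/sd2e /dev/sd3e

	# then, roughly:
	ccdconfig -C		# configure every ccd listed in /etc/ccd.conf
	newfs /dev/rccd0e	# build a filesystem on the concatenated device
	mount /dev/ccd0e /news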
At Exec, we discussed reliability in a lot of detail.  We ended up
agreeing that it was more reliable and more expandable simply to
maintain N+1 servers, where N is the number of systems I felt were
required for their load.  That means I can take ANY one machine out of
the loop (even the central feeder, although obviously only for a short
period of time) and not affect end user service in the least.

For informational purposes, the service afforded by a system such as
the following is excellent:

	ASUS P/E-mumble dual CPU/8 SIMM slot board
	P133
	192MB RAM
	3 NCR 810
	1 SMC EtherPower 10/100
	6 x ST32550N
	4 x ST31055N
	2 x ST15150N

It gives me 9 days of retention on most stuff, 12 on alt, and 2 on
alt.binaries.  It supports 150 users and _flies_, and even at 200 the
performance is "acceptable" (but starts to degrade... RAM and I/O
bandwidth dry up pretty much simultaneously).

For us, it turned out to be cheaper to build these machines than to
build somewhat beefier machines that would support a few more readers
each.  Mostly a cost thing.  But it also gives much higher availability:
as "N" grows, even the loss of two machines is less of a concern.

> This is the configuration I'm looking at.  There will be three
> PPro200 servers on a 100Mbps Ethernet segment.  One will be dedicated to
> incoming and outgoing feeds.  The other two will be for readers.

PPro200?  Heavy iron for news... what exactly are you doing with all
that horsepower... :-)

> The feeder server will have 4x2GB of local storage, holding about
> 2 days of news.  It will handle all transactions with other servers
> and not have to deal with NNRP reading or posting.  One of its feeds
> will be to the primary reader server.  This reader server will be a
> full news server in its own right, except that it has just the one
> single upstream feed.  I shouldn't have to mess around with XREPLIC or
> NFS-mounting a huge spool off a busy feeder server.

That is one way to handle it, but I find that running XREPLIC off of
the feeds system is perfectly acceptable... if I were going to have a
separate "reader" fan-out machine, I would probably STILL run it as an
XREPLIC slave from the feeder machine... convenience.

> The primary reader server will have 16x2GB drives, RAID 5 with hot
> spares, two fast/wide controllers, 8 drives per controller.  It
> exchanges articles with the main feeder server as well as accepting
> NNRP connections.  I figure with just a single feed, I should be able
> to avoid the problem of long newsreader startup delays because innd is
> blocked on one of its connections.  Secondary reader servers will
> simply NFS-mount the spool as read-only and run in.nnrpd -S out of
> inetd.
>
> With the sharedactive patch, each 256MB reader server should be
> able to handle 500 to 600 readers at once, based on my experiences

I don't know, I would think that I would start seeing some I/O
contention with just one machine...  And I have not seen any basis for
supporting that many readers on a single machine... how big is your
active file?  What does "top"'s output look like on one of your
readers?  Enquiring minds want to know :-)

> with my current news server.  128MB on the feeder server should be
> more than enough for a few dozen feeds.  This setup can then be
> replicated to serve different geographical regions, with only the
> feeder servers exchanging traffic to save on WAN bandwidth.

That's actually a case for XREPLIC... if all your news systems run in
sync... think about it :-)
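For the mechanics of that fan-out: on the feeder, each downstream
reader is just another entry in INN's newsfeeds file, and the transmit
side (nntpsend/innxmit, or nntplink) flushes the batches out to it.  A
rough sketch only -- the hostnames are invented, and true XREPLIC
slaving (keeping article numbers identical on master and slave) has its
own extra requirements, so read newsfeeds(5) and the INN install notes
rather than copying this verbatim:

	# newsfeeds on the feeder -- illustrative only, hostnames invented
	# One file feed per downstream reader; Wnm writes the article's
	# spool pathname and Message-ID for the transmit program to use.
	reader1.example.com\
		:*\
		:Tf,Wnm:

	reader2.example.com\
		:*\
		:Tf,Wnm: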
> Any caveats I should look out for, especially with NFS and ccd?
> Any other recommendations (besides getting more RAM ;-))?  Have there
> been any significant improvements to the AHA-2940UW driver in 2.2 that
> aren't in 2.1.5?

... JG