Date: Tue, 24 Sep 1996 11:55:23 -0500 (CDT)
From: Joe Greco <jgreco@brasil.moneng.mei.com>
To: taob@io.org
Cc: freebsd-isp@FreeBSD.ORG
Subject: Re: Thoughts on a news server cluster
Message-ID: <199609241655.LAA06476@brasil.moneng.mei.com>
In-Reply-To: <Pine.NEB.3.92.960923122044.24621R-100000@zap.io.org> from "Brian Tao" at Sep 24, 96 12:06:11 pm

> On Mon, 23 Sep 1996, Joe Greco wrote:
> >
> > You get more concurrency if you use ccd too :-)
>
> True enough, but I didn't feel ccd was stable enough when I first
> built our news server (late last year).
I've been using it for about that long without many problems... but it
was certainly rough around the edges at first.
> > If nothing else, you can get a "SCSI-SCSI" translator (made by Mylex
> > and others) where you just provide a fast/wide SCSI adaptor (2940,
> > etc) and let the black box handle the RAID aspects.
>
> Good news... Open Storage says they will have a 5x4GB CRD-5300
> (might be a bit off on the model number) with 64MB cache available for
> me in the next couple of days. The PPro systems are arriving this
> afternoon, and I'm going to order a bunch of 2GB drives in a rackmount
> chassis for next week. That will give me one system with a single F/W
> drive, a ccd of 2GB drives, a Streamlogic hardware RAID and a CMD
> hardware RAID for benchmark comparisons. The bits will be flying. ;-)
Ahhh nice :-)
> > Support is probably going to appear for several host adapter RAID
> > solutions in the not too distant future, if I believe what people are
> > telling me :-)
>
> Anything happening with the effort to pool some money together to
> pay a programmer to accelerate his port of the DPT drivers? I *might* be
> able to convince the company to toss in some money towards such an
> effort.
I had heard a few words from various people, but IIRC somebody already
"almost" has a DPT driver sitting on a back burner. Rod Grimes might have
said something about looking at this - but you will have to ask him.
> > You do NOT want to NFS mount!!! I have done it. If you have the I/O
> > bandwidth and CPU (but not the RAM) to spare on a machine, it may be a
> > worthwhile option... but the tax on the host is high. And you take a
> > major reliability hit if that host goes down.
>
> I'm trying to do a simple sort of cost-benefit analysis. Two F/W
> controllers and level 5 RAID with 25GB of usable capacity costs in
> the $25000 range. Per machine. For that kind of money, I'm
> definitely willing to give NFS-mounted reader servers a try.
J****!
Let me build this in my mind quickly...
Pentium 133 with ASUS P/E-XXXX????? MB $ 800
3 x NCR 810, 1 x SMC EtherPower 10/100 $ 320
192MB RAM $1200
6 x ST32550N $4100
4 x ST31055N $1200
2 x ST15150N $1850
Ext enclosures (3) $ 660
-----
$10130
That gives you 24GB usable _local_ disk capacity and additional
I/O bandwidth on top of it... and you can build three with your
$25000 plus some change, considering that you can get quantity
pricing on a purchase of so many drives.
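(Quick sanity check on the numbers, in throwaway perl:)

#!/usr/bin/perl
# back-of-the-envelope: the parts list above vs. the $25000 RAID quote
@parts = (800, 320, 1200, 4100, 1200, 1850, 660);
foreach $p (@parts) { $total += $p; }
printf "one box: \$%d   three boxes: \$%d   (vs. \$25000 per RAID setup)\n",
    $total, 3 * $total;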
And you get _complete_ redundancy rather than only disk subsystem
redundancy. That is the part that gets me excited.
> > It gives me 9 days retention on most stuff, 12 on alt, 2 on
> > alt.binaries. It supports 150 users and _flies_, and even at 200 the
> > performance is "acceptable" (but starts to degrade.. pretty much
> > simultaneously RAM and I/O bandwidth start to dry up).
>
> The only performance problem I'm seeing is long delays or timeouts
> when attempting to open an NNRP session. Once I'm in, the server is
> nice and fast. I haven't tried anything special with starting
> in.nnrpd's out of inetd and running innd on a different port, etc. It
> seems to be related to the number of incoming innxmit connections.
Yes. I deal with it by not launching nnrp's out of innd. I have
something (local hackery) called "connectd" which is like an nnrp inetd,
but has additional intelligence and allows me to limit the number of
simultaneous connections or the respawn rate from a particular host.
We have wankers around here who like running crap like NewsBin95.
You also have windows of unavailability when the server is running
news.daily and doing a renumber, etc. etc... spawning out of innd
is not ideal.
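The guts of it are nothing fancy. A very rough sketch of the general idea
(NOT the actual connectd -- port, paths and limits here are invented, and
the respawn-rate throttle is left out):

#!/usr/bin/perl
# Cap simultaneous NNTP reader sessions per client host, handing each
# accepted socket to in.nnrpd.  Rough sketch only.
use Socket;
use POSIX ":sys_wait_h";

$port     = 119;                                   # assumed listen port
$per_host = 4;                                     # assumed per-host cap
$nnrpd    = "/usr/local/news/bin/in.nnrpd";        # assumed path

$SIG{CHLD} = sub {                                 # reap children, free slots
    my $kid;
    while (($kid = waitpid(-1, WNOHANG)) > 0) {
        $count{$host{$kid}}-- if defined $host{$kid};
        delete $host{$kid};
    }
};

$proto = getprotobyname('tcp');
socket(LISTEN, PF_INET, SOCK_STREAM, $proto) or die "socket: $!";
setsockopt(LISTEN, SOL_SOCKET, SO_REUSEADDR, pack("l", 1));
bind(LISTEN, sockaddr_in($port, INADDR_ANY)) or die "bind: $!";
listen(LISTEN, 10) or die "listen: $!";

for (;;) {
    $paddr = accept(CLIENT, LISTEN) or next;       # retry if interrupted
    ($cport, $caddr) = sockaddr_in($paddr);
    $addr = inet_ntoa($caddr);

    if ($count{$addr} >= $per_host) {              # over the cap: turn it away
        print CLIENT "400 too many connections from your host\r\n";
        close CLIENT;
        next;
    }

    $pid = fork;
    if (!defined $pid) { close CLIENT; next; }     # fork failed, drop it
    if ($pid == 0) {                               # child: give socket to nnrpd
        open(STDIN,  "<&CLIENT");
        open(STDOUT, ">&CLIENT");
        exec $nnrpd;
        exit 1;
    }
    $count{$addr}++;                               # parent: note the session
    $host{$pid} = $addr;
    close CLIENT;
}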
> > PPro200? Heavy iron for news... what exactly are you doing with all
> > that horsepower... :-)
>
> They were roughly the same price as Pentium 200's (a couple hundred
> dollars difference). Maybe I'll start playing with on-the-fly
> compression of news articles. ;-)
Why not compute a few prime numbers too. ;-)
> > That is one way to handle it, but I find that running XREPLIC off of
> > the feeds system is perfectly acceptable... if I was going to have a
> > separate "reader" fan-out machine I would probably STILL run it as an
> > XREPLIC slave from the feeder machine... convenience.
>
> I don't want to "lock" myself into using XREPLIC though. If the
> main feeder blows up and I have to newfs the spool, it'll take extra
> work to resync those article numbers. If I just treat the feeder and
Why?
Grab the active off of a slave - if you are really anal, grab the active
off all the slaves and write a little perl script to find the max for
each group (just in case the slaves were a tad out of sync). That is
a bit of work, I agree, but not hard.
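Something like this would do the merge (a quick sketch, assuming the
standard INN active format of "group himark lomark flags"):

#!/usr/bin/perl
# merge-active: read the active file from each slave (given as arguments)
# and print one line per group with the highest article number seen.

while (<>) {
    ($group, $hi, $lo, $flags) = split;
    next unless defined $flags;                    # skip malformed lines
    if (!defined $hi{$group} || $hi > $hi{$group}) {
        $hi{$group}    = $hi;
        $lo{$group}    = $lo;
        $flags{$group} = $flags;
    }
}

foreach $group (sort keys %hi) {
    printf "%s %010d %010d %s\n",
        $group, $hi{$group}, $lo{$group}, $flags{$group};
}

Run it over the actives you grabbed:

% perl merge-active slave1.active slave2.active > active.recovered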
> the primary reader machine as entirely autonomous servers, something
> that goes wrong with one is less likely to affect the other. Also,
> isn't slave mode required for XREPLIC?
Yes.
> If the feeder server is
> unavailable, none of the reader machines will be able to post.
A qualified "Yes." You have the same problem no matter what you do,
since INN has a synchronous posting paradigm that in my opinion bites
the big one.
I got exasperated and did something different. I developed a smart
spooling system to deal with it. Now people can "post" even if the
master and all the other slaves are dead. At the same time I took
the opportunity to add a comprehensive posting accounting system that
records whatever the hell gets posted. It's been useful several times
already...
% cd /var/log/news/logposts/posts/idiot.user@execpc.com/
% grep "^Message-ID: " * | awk '{print $2}' > /tmp/cancelme
% spamcancel /tmp/cancelme
:-)
MUCH easier than in the old days... digging through logs, etc.
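The spooling side is conceptually just as simple. Something along these
lines (a rough sketch with made-up paths, not the real code):

#!/usr/bin/perl
# Take an article on stdin, try to hand it to the master with inews,
# and park it in a local spool if that fails.
$inews = "/usr/local/news/bin/inews";              # assumed path
$spool = "/var/spool/news/deferred";               # assumed spool directory

undef $/;                                          # slurp mode
$article = <STDIN>;

$failed = 0;
if (open(INEWS, "| $inews -h")) {
    print INEWS $article;
    close INEWS;
    $failed = 1 if $?;                             # inews exited non-zero
} else {
    $failed = 1;
}

if ($failed) {
    $file = sprintf("%s/%d.%d", $spool, time, $$); # unique-ish filename
    open(OUT, "> $file") or die "can't spool article: $!";
    print OUT $article;
    close OUT;
}
exit 0;                                            # reader always sees success

A cron job sweeps the spool directory and re-offers whatever is sitting
there once the master answers again.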
> I've
> not played with XREPLIC before, so my understanding may be off.
XREPLIC is a mild form of tying your hands. On the other hand, it
keeps your machines in sync! Which is what an ISP needs.
> > I don't know, I would think that I would start seeing some I/O
> > contention with just one machine..
>
> I don't think we're going to hit 1000 simultaneous readers at this
> POP for a while yet. It will be a gradual curve up, so any
> anticipated I/O bottlenecks can be headed off before they become a
> problem. Do we have any kernel optimizations yet for PPro memory-
> intensive operations?
Dunno
> > And I have not seen any basis for supporting that many readers on a
> > single machine.. how big is your active file? What does "top"'s
> > output look like on one of your readers? Enquiring minds want to know :-)
>
> It's a pretty small active file, just under 9000 groups (407187
> bytes). 'top' looks like this:
Ahhh, that's why. I have 25000+ groups, with an active file well over 1MB in size.
> load averages: 0.36, 0.42, 0.41 11:52:58
> 109 processes: 1 running, 118 sleeping
> Cpu states: 2.7% user, 1.5% nice, 14.2% system, 2.3% interrupt, 79.2% idle
> Mem: 82M Active, 6152K Inact, 20M Wired, 19M Cache, 7785K Buf, 176K Free
> Swap: 262M Total, 8336K Used, 254M Free, 3% Inuse
>
> PID USERNAME PRI NICE SIZE RES STATE TIME WCPU CPU COMMAND
> 27230 news -6 0 24M 24M biowai 95:05 13.08% 11.02% innd.nodebug
> 27238 root 29 0 352K 808K RUN 0:00 3.15% 0.57% top
> 26658 news 2 4 316K 708K select 0:01 0.38% 0.38% in.nnrpd
> 25061 news 2 0 220K 352K sbwait 0:22 0.31% 0.31% innxmit
> 27200 news 2 4 292K 868K sbwait 0:00 0.23% 0.23% in.nnrpd
> 27235 news 2 0 292K 992K select 0:00 0.38% 0.19% in.nnrpd
> 27233 news -6 0 152K 484K piperd 0:00 0.20% 0.15% overchan
> 27150 news 2 4 288K 728K sbwait 0:00 0.08% 0.08% in.nnrpd
> 27190 news 2 4 284K 692K sbwait 0:00 0.08% 0.08% in.nnrpd
> 26803 news 2 4 292K 732K sbwait 0:00 0.04% 0.04% in.nnrpd
> 26480 news 2 0 448K 548K select 0:04 0.04% 0.04% innxmit
> 23024 news 2 0 220K 308K sbwait 0:31 0.04% 0.04% innxmit
> [...]
Looks more like
load averages: 1.48, 0.89, 0.60 11:40:14
96 processes: 2 running, 94 sleeping
Cpu states: 1.8% user, 15.5% nice, 24.1% system, 2.5% interrupt, 56.1% idle
Mem: 92M Active, 396K Inact, 19M Wired, 71M Cache, 5304K Buf, 260K Free
Swap: 369M Total, 40M Used, 329M Free, 11% Inuse
PID USERNAME PRI NICE SIZE RES STATE TIME WCPU CPU COMMAND
24372 news 74 4 47M 21M RUN 727:22 27.89% 27.89% innd
6228 news 2 0 1048K 2368K select 0:02 1.64% 1.64% in.nnrpd
6805 news 2 0 1080K 2384K select 0:00 2.02% 1.07% in.nnrpd
6801 news 2 0 1032K 2340K select 0:01 1.14% 0.99% in.nnrpd
6812 news 2 0 1028K 2324K select 0:00 1.89% 0.95% in.nnrpd
6633 news 2 0 1020K 2332K netio 0:03 0.92% 0.92% in.nnrpd
Here, see the difference in size... :-(
> Assuming 32MB for kernel and OS stuff, 32MB for innd, 150MB for 500
> readers and no feeds, that still leaves ~40MB for disk cache and other
> processes (like expires) on a 256MB machine.
Must be nice to have a small active file. ;-)
... JG
