Date:      Tue, 24 Sep 1996 11:55:23 -0500 (CDT)
From:      Joe Greco <jgreco@brasil.moneng.mei.com>
To:        taob@io.org
Cc:        freebsd-isp@FreeBSD.ORG
Subject:   Re: Thoughts on a news server cluster
Message-ID:  <199609241655.LAA06476@brasil.moneng.mei.com>
In-Reply-To: <Pine.NEB.3.92.960923122044.24621R-100000@zap.io.org> from "Brian Tao" at Sep 24, 96 12:06:11 pm

> On Mon, 23 Sep 1996, Joe Greco wrote:
> >
> > You get more concurrency if you use ccd too :-)
> 
>     True enough, but I didn't feel ccd was stable enough when I first
> built our news server (late last year).

I've been using it for about that long without many problems...  but it
was certainly rough around the edges at first.

> > If nothing else, you can get a "SCSI-SCSI" translator (made by Mylex
> > and others) where you just provide a fast/wide SCSI adaptor (2940,
> > etc) and let the black box handle the RAID aspects.
> 
>     Good news... Open Storage says they will have a 5x4GB CRD-5300
> (might be a bit off on the model number) with 64MB cache available for
> me in the next couple of days.  The PPro systems are arriving this
> afternoon, and I'm going to order a bunch of 2GB drives in a rackmount
> chassis for next week.  That will give me one system with a single F/W
> drive, a ccd of 2GB drives, a Streamlogic hardware RAID and a CMD
> hardware RAID for benchmark comparisons.  The bits will be flying.  ;-)

Ahhh nice :-)

> > Support is probably going to appear for several host adapter RAID
> > solutions in the not too distant future, if I believe what people are
> > telling me :-)
> 
>     Anything happening with the effort to pool some money together to
> pay a programmer to accelerate his port of the DPT drivers?  I *might* be
> able to convince the company to toss in some money towards such an
> effort.

I had heard a few things from people, but IIRC somebody already
"almost" has a DPT driver sitting on a back burner.  Rod Grimes might have
said something about looking at this - but you will have to ask him.

> > You do NOT want to NFS mount!!!  I have done it.  If you have the I/O
> > bandwidth and CPU (but not the RAM) to spare on a machine, it may be a
> > worthwhile option...  but the tax on the host is high.  And you take a
> > major reliability hit if that host goes down.
> 
>     I'm trying to do a simple sort of cost-benefit analysis.  Two F/W
> controllers and level 5 RAID with 25GB of usable capacity costs in
> the $25000 range.  Per machine.  For that kind of money, I'm
> definitely willing to give NFS-mounted reader servers a try.

J****!

Let me build this in my mind quickly...

Pentium 133 with ASUS P/E-XXXX????? MB	$ 800
3 x NCR 810, 1 x SMC EtherPower 10/100	$ 320
192MB RAM				$1200
6 x ST32550N				$4100
4 x ST31055N				$1200
2 x ST15150N				$1850
Ext enclosures (3)			$ 660
					-----
					$10130

That gives you 24GB usable _local_ disk capacity and additional
I/O bandwidth on top of it...  and you can build three with your 
$25000 plus some change, considering that you can get quantity 
pricing on a purchase of so many drives.

And you get _complete_ redundancy rather than only disk subsystem
redundancy.  That is the part that gets me excited.

> > It gives me 9 days retention on most stuff, 12 on alt, 2 on
> > alt.binaries.  It supports 150 users and _flies_, and even at 200 the
> > performance is "acceptable" (but starts to degrade..  pretty much
> > simultaneously RAM and I/O bandwidth start to dry up).
> 
>     The only performance problem I'm seeing is long delays or timeouts
> when attempting to open an NNRP session.  Once I'm in, the server is
> nice and fast.  I haven't tried anything special with starting
> in.nnrpd's out of inetd and running innd on a different port, etc.  It
> seems to be related to the number of incoming innxmit connections.

Yes.  I deal with it by not launching nnrpd's out of innd.  I have
something (local hackery) called "connectd", which is like an inetd for
nnrpd but with additional intelligence: it lets me limit the number of
simultaneous connections, or the respawn rate, from a particular host.
We have wankers around here who like running crap like NewsBin95.
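
The guts of it are nothing exotic -- basically an inetd that counts live
children per host before it will fork another nnrpd.  A bare-bones sketch
in perl (this is NOT the real connectd; the port, the nnrpd path and the
per-host limit are made up for illustration, and the real thing also does
the respawn-rate throttling, which this sketch skips):

#!/usr/bin/perl
# Inetd-style nnrpd launcher that refuses a host once it already has too
# many simultaneous reader sessions.
use strict;
use IO::Socket::INET;
use POSIX ":sys_wait_h";

my $MAX_PER_HOST = 4;                              # illustrative limit
my $NNRPD = "/usr/local/news/bin/in.nnrpd";        # illustrative path
my (%count, %host_of);         # live sessions per host, host per child pid

my $listen = IO::Socket::INET->new(LocalPort => 119, Proto => 'tcp',
                                   Listen => 32, Reuse => 1)
    or die "listen: $!";

$SIG{CHLD} = sub {
    # Reap finished nnrpd's and decrement the per-host counts.
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        my $host = delete $host_of{$pid};
        $count{$host}-- if defined $host;
    }
};

while (1) {
    my $conn = $listen->accept or next;    # accept() may be interrupted
    my $host = $conn->peerhost;

    if (($count{$host} || 0) >= $MAX_PER_HOST) {
        print $conn "400 too many connections from your host, try later\r\n";
        close $conn;
        next;
    }
    my $pid = fork;
    if (!defined $pid) { close $conn; next; }   # fork failed, drop it
    if ($pid == 0) {
        # Child: hand the socket to nnrpd on stdin/stdout, inetd-style.
        open STDIN,  '<&', $conn or die "dup: $!";
        open STDOUT, '>&', $conn or die "dup: $!";
        exec $NNRPD or exit 1;
    }
    $count{$host}++;
    $host_of{$pid} = $host;
    close $conn;
}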

You also have windows of unavailability when the server is running
news.daily and doing a renumber, etc. etc...  spawning out of innd
is not ideal.

> > PPro200?  Heavy iron for news...  what exactly are you doing with all
> > that horsepower...  :-)
> 
>     They were roughly same price as Pentium 200's (a couple hundred
> dollars difference).  Maybe I'll start playing with on-the-fly
> compression of news articles.  ;-)

Why not compute a few prime numbers too?  ;-)

> > That is one way to handle it, but I find that running XREPLIC off of
> > the feeds system is perfectly acceptable... if I was going to have a
> > separate "reader" fan-out machine I would probably STILL run it as an
> > XREPLIC slave from the feeder machine...  convenience.
> 
>     I don't want to "lock" myself into using XREPLIC though.  If the
> main feeder blows up and I have to newfs the spool, it'll take extra
> work to resync those article numbers.  If I just treat the feeder and

Why?

Grab the active off of a slave - if you are really anal, grab the active
off all the slaves and write a little perl script to find the max for
each group (just in case the slaves were a tad out of sync).  That is
a bit of work, I agree, but not hard.
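
The whole thing is maybe twenty lines of perl.  An untested sketch (it
assumes standard INN active lines of "group himark lomark flags"; the
script and file names below are just placeholders):

#!/usr/bin/perl
# Merge several active files, keeping the highest article number seen
# for each group (and the lowest low-water mark, for good measure).
use strict;

my (%hi, %lo, %flags);

while (<>) {
    # INN active format: group himark lomark flags
    my ($group, $himark, $lomark, $flag) = split ' ';
    next unless defined $flag;
    $hi{$group} = $himark if !exists $hi{$group} || $himark > $hi{$group};
    $lo{$group} = $lomark if !exists $lo{$group} || $lomark < $lo{$group};
    $flags{$group} = $flag;
}

# Write the merged active with INN's zero-padded article numbers.
for my $group (sort keys %hi) {
    printf "%s %010d %010d %s\n",
        $group, $hi{$group}, $lo{$group}, $flags{$group};
}

% merge_active active.slave1 active.slave2 > active.new

Install the result as the active on the rebuilt master and the numbering
stays consistent with what the slaves already have.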

> the primary reader machine as entirely autonomous servers, something
> that goes wrong with one is less likely to affect the other.  Also,
> isn't slave mode required for XREPLIC?  

Yes.

> If the feeder server is
> unavailable, none of the reader machines will be able to post. 

A qualified "Yes."  You have the same problem no matter what you do,
since INN has a synchronous posting paradigm that in my opinion bites
the big one.

I got exasperated and did something different.  I developed a smart
spooling system to deal with it.  Now people can "post" even if the
master and all the other slaves are dead.  At the same time I took
the opportunity to add a comprehensive posting accounting system that
records whatever the hell gets posted.  It's been useful several times
already...  

% cd /var/log/news/logposts/posts/idiot.user@execpc.com/
% grep "^Message-ID: " * | awk '{print $2}' > /tmp/cancelme
% spamcancel /tmp/cancelme

:-)

MUCH easier than in the old days... digging through logs, etc.

> I've
> not played with XREPLIC before, so my understanding may be off.

XREPLIC is a form of mildly tying your hands.  On the other hand, it
keeps your machines in sync!  Which is what an ISP needs.

> > I don't know, I would think that I would start seeing some I/O
> > contention with just one machine..
> 
>     I don't think we're going to hit 1000 simultaneous readers at this
> POP for a while yet.  It will be a gradual curve up, so any
> anticipated I/O bottlenecks can be headed off before they become a
> problem.  Do we have any kernel optimizations yet for PPro memory-
> intensive operations?

Dunno

> > And I have not seen any basis for supporting that many readers on a
> > single machine..  how big is your active file?  What does "top"'s
> > output look like on one of your readers?  Enquiring minds want to know :-)
> 
>     It's a pretty small active file, just under 9000 groups (407187
> bytes).  'top' looks like this:

Ahhh, that's why.  I have 25000+++ groups and an active file well over 1MB.

> load averages:   0.36,  0.42,  0.41                                  11:52:58
> 109 processes: 1 running, 118 sleeping
> Cpu states:  2.7% user,  1.5% nice, 14.2% system,  2.3% interrupt, 79.2% idle
> Mem: 82M Active, 6152K Inact, 20M Wired, 19M Cache, 7785K Buf, 176K Free
> Swap: 262M Total, 8336K Used, 254M Free, 3% Inuse
> 
>   PID USERNAME PRI NICE  SIZE   RES STATE    TIME   WCPU    CPU COMMAND
> 27230 news      -6    0   24M   24M biowai  95:05 13.08% 11.02% innd.nodebug
> 27238 root      29    0  352K  808K RUN      0:00  3.15%  0.57% top
> 26658 news       2    4  316K  708K select   0:01  0.38%  0.38% in.nnrpd
> 25061 news       2    0  220K  352K sbwait   0:22  0.31%  0.31% innxmit
> 27200 news       2    4  292K  868K sbwait   0:00  0.23%  0.23% in.nnrpd
> 27235 news       2    0  292K  992K select   0:00  0.38%  0.19% in.nnrpd
> 27233 news      -6    0  152K  484K piperd   0:00  0.20%  0.15% overchan
> 27150 news       2    4  288K  728K sbwait   0:00  0.08%  0.08% in.nnrpd
> 27190 news       2    4  284K  692K sbwait   0:00  0.08%  0.08% in.nnrpd
> 26803 news       2    4  292K  732K sbwait   0:00  0.04%  0.04% in.nnrpd
> 26480 news       2    0  448K  548K select   0:04  0.04%  0.04% innxmit
> 23024 news       2    0  220K  308K sbwait   0:31  0.04%  0.04% innxmit
> [...]

Looks more like

load averages:   1.48,  0.89,  0.60				 11:40:14
96 processes:  2 running, 94 sleeping
Cpu states:  1.8% user, 15.5% nice, 24.1% system,  2.5% interrupt, 56.1% idle
Mem: 92M Active, 396K Inact, 19M Wired, 71M Cache, 5304K Buf, 260K Free
Swap: 369M Total, 40M Used, 329M Free, 11% Inuse

  PID USERNAME PRI NICE  SIZE   RES STATE    TIME   WCPU    CPU COMMAND
24372 news      74    4   47M   21M RUN    727:22 27.89% 27.89% innd
 6228 news       2    0 1048K 2368K select   0:02  1.64%  1.64% in.nnrpd
 6805 news       2    0 1080K 2384K select   0:00  2.02%  1.07% in.nnrpd
 6801 news       2    0 1032K 2340K select   0:01  1.14%  0.99% in.nnrpd
 6812 news       2    0 1028K 2324K select   0:00  1.89%  0.95% in.nnrpd
 6633 news       2    0 1020K 2332K netio    0:03  0.92%  0.92% in.nnrpd

here, see the difference in size...  :-(

>    Assuming 32MB for kernel and OS stuff, 32MB for innd, 150MB for 500
> readers and no feeds, that still leaves ~40MB for disk cache and other
> processes (like expires) on a 256MB machine.

Must be nice to have a small active file.  ;-)

... JG


