From owner-freebsd-isp  Mon Sep 23 07:21:19 1996
Return-Path: owner-isp
Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3)
	id HAA14600 for isp-outgoing; Mon, 23 Sep 1996 07:21:19 -0700 (PDT)
Received: from brasil.moneng.mei.com (brasil.moneng.mei.com [151.186.109.160])
	by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id HAA14577
	for ; Mon, 23 Sep 1996 07:21:15 -0700 (PDT)
Received: (from jgreco@localhost) by brasil.moneng.mei.com
	(8.7.Beta.1/8.7.Beta.1) id JAA15753; Mon, 23 Sep 1996 09:20:09 -0500
From: Joe Greco
Message-Id: <199609231420.JAA15753@brasil.moneng.mei.com>
Subject: Re: Thoughts on a news server cluster
To: taob@io.org (Brian Tao)
Date: Mon, 23 Sep 1996 09:20:09 -0500 (CDT)
Cc: freebsd-isp@FreeBSD.ORG
In-Reply-To: from "Brian Tao" at Sep 23, 96 01:31:47 am
X-Mailer: ELM [version 2.4 PL24]
Content-Type: text
Sender: owner-isp@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

> The stuff I've been posting about hardware RAID products will
> ultimately lead to the installation of a news server cluster.  I've
> been running fairly happily so far on a single P133 with 128MB, 3
> NCR53810 controllers and 9 drives.  No RAID, no ccd... just different
> drives mounted at different points in the filesystem.  The concurrency
> is actually fairly decent, looking at iostat.

You get more concurrency if you use ccd too :-)  One of my pets:

	/dev/ccd3e    1971087   765364  1048037    42%    /nov
	/dev/ccd2e    1971087   357161  1456240    20%    /usr/local
	/dev/ccd0e    3899273  2478146  1109185    69%    /news
	/dev/ccd1e    3899273  1705248  1882083    48%    /news/.0
	/dev/ccd4e    8075148  5569491  1859646    75%    /news/.1

> Anyhow, management has decided they want something more robust and
> survivable, and that has led me down the path of redundant and
> high-availability hardware without having to switch to some commercial
> OS vendor. ;-)  I've read a lot of discussion on building scalable,
> reliable news server configurations.  I'd like to know if anyone has
> some wisdom to share on the FreeBSD specifics.

If nothing else, you can get a "SCSI-SCSI" translator (made by Mylex and
others) where you just provide a fast/wide SCSI adaptor (2940, etc.) and
let the black box handle the RAID aspects.

Support is probably going to appear for several host adapter RAID
solutions in the not-too-distant future, if I believe what people are
telling me :-)

> For example, someone mentioned that ccd did not appear to be all
> that stable in -current.  Would using 2.1.5-RELEASE be better?

Proudly running ccd since 2.1.0-R without a hiccup.  Or at least, without
a hiccup that wasn't my own stupid fault for not reading the docs :-)

> Another thread mentioned that heavy NFS client activity causes
> instability.  Should I then avoid NFS altogether and pay a premium for
> a local disk subsystem for each server?

Is there any other way to do it???  You do NOT want to NFS mount!!!

I have done it.  If you have the I/O bandwidth and CPU (but not the RAM)
to spare on a machine, it may be a worthwhile option... but the tax on
the host is high.  And you take a major reliability hit if that host
goes down.

If you add additional disks with each additional slave, you grow your
I/O bandwidth in a nearly linear fashion (the exception: the feeder
machine).  While this is somewhat expensive, it is not terrible... the
df output above is from a slave at Exec.  It is 4 1GB ST31055N's, 4
ST32550N's, and 2 ST15150N's (soon to be Barracuda 9G's).
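For the curious, the ccd side of a slave like that boils down to a few
lines of /etc/ccd.conf plus a ccdconfig run.  A rough sketch only -- the
device names and interleave below are invented for illustration, not my
real config, so check ccdconfig(8) against your own disks:

	# /etc/ccd.conf -- example only; substitute your own disks/interleave
	# ccd		ileave	flags	component devices
	ccd0		32	none	/dev/sd0e /dev/sd1e
	ccd1		32	none	/dev/sd2e /dev/sd3e

	# then, roughly:
	ccdconfig -C		# configure every ccd listed in /etc/ccd.conf
	newfs /dev/rccd0e	# build a filesystem on the concatenated device
	mount /dev/ccd0e /news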
At Exec, we discussed reliability in a lot of detail.  We ended up
agreeing that it was more reliable and more expandable simply to
maintain N+1 servers, where N is the number of systems I felt were
required for their load.  That means I can take ANY one machine out of
the loop (even the central feeder, although obviously only for a short
period of time) and not affect end user service in the least.

For informational purposes, the service afforded by a system such as
the following is excellent:

	ASUS P/E-mumble dual CPU/8 SIMM slot board
	P133
	192MB RAM
	3 NCR 810
	1 SMC EtherPower 10/100
	6 x ST32550N
	4 x ST31055N
	2 x ST15150N

It gives me 9 days of retention on most stuff, 12 on alt, and 2 on
alt.binaries.  It supports 150 users and _flies_, and even at 200 the
performance is "acceptable" (but starts to degrade... RAM and I/O
bandwidth dry up pretty much simultaneously).

For us, it turned out to be cheaper to build these machines than to
build somewhat beefier machines that would support a few more readers
each.  Mostly a cost thing.  But it also gives much higher availability:
as "N" grows, even the loss of two machines is less of a concern.

> This is the configuration I'm looking at.  There will be three
> PPro200 servers on a 100Mbps Ethernet segment.  One will be dedicated to
> incoming and outgoing feeds.  The other two will be for readers.

PPro200?  Heavy iron for news... what exactly are you doing with all
that horsepower... :-)

> The feeder server will have 4x2GB of local storage, holding about
> 2 days of news.  It will handle all transactions with other servers
> and not have to deal with NNRP reading or posting.  One of its feeds
> will be to the primary reader server.  This reader server will be a
> full news server in its own right, except that it has just the one
> single upstream feed.  I shouldn't have to mess around with XREPLIC or
> NFS-mounting a huge spool off a busy feeder server.

That is one way to handle it, but I find that running XREPLIC off of
the feeds system is perfectly acceptable... if I were going to have a
separate "reader" fan-out machine, I would probably STILL run it as an
XREPLIC slave from the feeder machine... convenience.

> The primary reader server will have 16x2GB drives, RAID 5 with hot
> spares, two fast/wide controllers, 8 drives per controller.  It
> exchanges articles with the main feeder server as well as accepting
> NNRP connections.  I figure with just a single feed, I should be able
> to avoid the problem of long newsreader startup delays because innd is
> blocked on one of its connections.  Secondary reader servers will
> simply NFS-mount the spool as read-only and run in.nnrpd -S out of
> inetd.
>
> With the sharedactive patch, each 256MB reader server should be
> able to handle 500 to 600 readers at once, based on my experiences

I don't know, I would think that I would start seeing some I/O
contention with just one machine...  And I have not seen any basis for
supporting that many readers on a single machine... how big is your
active file?  What does "top"'s output look like on one of your
readers?  Enquiring minds want to know :-)

> with my current news server.  128MB on the feeder server should be
> more than enough for a few dozen feeds.  This setup can then be
> replicated to serve different geographical regions, with only the
> feeder servers exchanging traffic to save on WAN bandwidth.

That's actually a case for XREPLIC... if all your news systems run in
sync... think about it :-)
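For the mechanics of that fan-out: on the feeder, each downstream
reader is just another entry in INN's newsfeeds file, and the transmit
side (nntpsend/innxmit, or nntplink) flushes the batches out to it.  A
rough sketch only -- the hostnames are invented, and true XREPLIC
slaving (keeping article numbers identical on master and slave) has its
own extra requirements, so read newsfeeds(5) and the INN install notes
rather than copying this verbatim:

	# newsfeeds on the feeder -- illustrative only, hostnames invented
	# One file feed per downstream reader; Wnm writes the article's
	# spool pathname and Message-ID for the transmit program to use.
	reader1.example.com\
		:*\
		:Tf,Wnm:

	reader2.example.com\
		:*\
		:Tf,Wnm: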
> Any caveats I should look out for, especially with NFS and ccd?
> Any other recommendations (besides getting more RAM ;-))?  Have there
> been any significant improvements to the AHA-2940UW driver in 2.2 that
> aren't in 2.1.5?

... JG