Date:      Mon, 28 Aug 2000 15:38:31 -0500 (CDT)
From:      BWS - Offwhite <brennan@offwhite.net>
To:        Alfred Perlstein <bright@wintelcom.net>
Cc:        "James E. Pace" <jepace@pobox.com>, freebsd-questions@FreeBSD.ORG
Subject:   Re: Scaling Apache?
Message-ID:  <Pine.BSF.4.21.0008281445140.33533-100000@home.offwhite.net>
In-Reply-To: <Pine.BSF.4.05.10008281243440.22201-100000@greg.ad9.com>

I have had to deal with Apache under high load/traffic for some time
now.  You basically have to treat Apache as just one part of the
system.  That goes for any web server software.

I manage a relatively high traffic site on a PIII server with under 500 MB
of RAM.  It runs great, but I have tweaked many areas.  Here is how I
scale Apache for my purposes.

1) I code all CGI scripts in Perl and therefore I have mod_perl compiled
into Apache statically.  The benefits of mod_perl are essential.  When the
same script is running on the main pages of the site you want the
advantages of Apache::Registry, which keeps the compiled script in memory
instead of starting a new Perl process for every request.
(http://perl.apache.org)

2) I also allocate a great deal of RAM.  With very high load you would
want at least 500 MB of RAM, up to 2 GB or more.  It all depends on how
much you are digging into swap.  Again it comes back to hardware.  Apache
likes to cache its content in memory.  When it does that it is very
fast.  Accessing your files directly every single time will cause Apache
to kill your system, so it needs the memory.  This goes for any web
server.

3) I watch the dynamic content closely.  On the site I manage,
onmilwaukee.com, all of the content is served out of a MySQL
database.  It can be a hog if your scripts are not tweaked.  As an example,
there is a feature on the site which tallies up the most read articles and
displays the top 10.  This can be an intense script to run many times in a
second so I had to alter it after the designer placed the SSI for this
script on the homepage.  I basically created a table in the database to
act as the cache for these "top clicks" which would allow me to expire the
cache after 3 hours.  If the script does the tally and sort routine at 7
am the script would not have to do the heavy lifting for another 3
hours.  That saves a lot of the CPU and the database.  When it was running
many times in a second it started to lock tables in the database and
delay responses.  It was not pretty.  After I tweaked that I found ways
to cache other content, like the dynamic list of headlines on the
homepage.

4) Lastly, it could be your database which is too slow.  We get peak
levels of traffic at certain times, like when the local radio shows are
discussing the website.  That is how I learned that "top
clicks" needed tweaking.  We resolved, perhaps prematurely, that we needed
a database server to sit behind the web server.  It will be a very
powerful machine with lots of RAM for all the database tasks.  That will
allow the web server to focus on serving images and content from CGI
scripts.  You may also want to consider indexing more fields in your
database.  If you sort on a field heavily, it should be indexed for the
sake of the speed benefit.  That gets into database administration, and
is not a web server issue.
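
To illustrate the indexing advice in item 4, here is a small sketch,
again in Python with SQLite as a stand-in for MySQL.  The articles table,
its columns and the index name are made up for the example.  EXPLAIN
QUERY PLAN is SQLite syntax; MySQL has its own EXPLAIN statement for the
same purpose.

```python
import sqlite3


def query_plan(conn, sql):
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
# Hypothetical table; the column names are illustrative only.
conn.execute(
    "CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, published TEXT)"
)

sql = "SELECT title FROM articles ORDER BY published DESC LIMIT 10"

# Without an index the database has to scan the table and sort the rows.
plan_before = query_plan(conn, sql)

# Index the field that the query sorts on.
conn.execute("CREATE INDEX idx_articles_published ON articles (published)")

# With the index the rows can be read already in sorted order.
plan_after = query_plan(conn, sql)
```

Printing plan_before and plan_after shows whether the sort is being
served by the index.  The exact wording of the plan varies between
SQLite versions.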

You can also use the tactics used by many of the large sites.  Do a lookup
of cnn.com:

> nslookup www.cnn.com

You will see many addresses assigned to that web address.  That is round
robin DNS.  They basically have several servers serving up the same
content.  This method spreads the traffic across all of the servers.  I do
not like this method because it likely spreads your access logs across
multiple servers unless you are logging to a database server.  Either
option can be hard to manage and debug.

http://www.netcraft.com/whats/?host=www.cnn.com
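
The same round robin check can be scripted.  This is a minimal sketch in
Python using the standard socket module; the helper names
unique_addresses() and resolve_all() are my own, and the result depends
on whatever DNS returns at the time you run it.

```python
import socket


def unique_addresses(addrinfo):
    """Collapse getaddrinfo() results into an ordered list of distinct IPs."""
    seen = []
    for _family, _type, _proto, _canonname, sockaddr in addrinfo:
        if sockaddr[0] not in seen:
            seen.append(sockaddr[0])
    return seen


def resolve_all(host, port=80):
    # Needs working DNS.  More than one address suggests round robin DNS
    # (or some other form of load distribution).
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    return unique_addresses(infos)
```

Calling resolve_all("www.cnn.com") returns the list of addresses the
resolver hands back for that name, much like the nslookup above.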

Another high traffic site is Slashdot.org, which is very busy all the
time.  They have so much traffic that when they link to another site they
often cause that site to get overloaded with new traffic.  It is called
the Slashdot effect.  They use Apache with mod_perl.  I also read that
they use a cluster with Linux.  Their hardware is very powerful.  If you do
an nslookup you will notice they only have one IP for the site.

http://www.netcraft.com/whats/?host=slashdot.org

The Apache site uses FreeBSD.

http://www.netcraft.com/whats/?host=apache.org

What it comes down to is that you have to look at your system as a whole.
Do not simply blame Apache.  If you say it is slow, that is likely due to
you not configuring it for your level of load.  Sure it may seem that
Apache is slow, but maybe it is the database.  Maybe it is eating into
swap space because you do not have enough RAM.

And remember, if you are getting that much traffic, your hardware has to
handle it first.  It does not matter what software you use... add another
processor, more RAM, a RAID system, etc...  Build a system which can
handle your load.

Brennan Stehling - web developer and sys admin
projects: www.greasydaemon.com | www.onmilwaukee.com | www.sncalumni.com

On Mon, 28 Aug 2000, Steve Lewis wrote:

> On Mon, 28 Aug 2000, Alfred Perlstein wrote:
> 
> > > I imagine that these faster servers would use the hardware in a way that
> > > keeps request overhead lower (logging and caching tricks) but the
> > > trade-offs in server-side scripting support could kill that.
> > 
> > That is true, one should be investigating fast-cgi or some equivalent
> > to deal with that.
> 
> Well, Zeus supports ISAPI, JDBC, etc, and the other options you pointed
> out have apparently designed themselves with migration from apache in mind
> (which doesn't inspire confidence), so it looks like it should be possible
> to use a reasonable server-side scripting tool that supports ISAPI, or
> back-end Java, to make dynamic content scream on these servers.
> 
> Thanks for the tip Alfred (and others), and thanks for bringing it up
> James. I was completely in the dark until James asked.  
> 
> --Steve
> 
> 
> 
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-questions" in the body of the message
> 


