Date: Mon, 28 Aug 2000 15:38:31 -0500 (CDT)
From: BWS - Offwhite <brennan@offwhite.net>
To: Alfred Perlstein <bright@wintelcom.net>
Cc: "James E. Pace" <jepace@pobox.com>, freebsd-questions@FreeBSD.ORG
Subject: Re: Scaling Apache?
Message-ID: <Pine.BSF.4.21.0008281445140.33533-100000@home.offwhite.net>
In-Reply-To: <Pine.BSF.4.05.10008281243440.22201-100000@greg.ad9.com>
I have had to deal with Apache under high load and traffic for some time now. You basically have to treat Apache as just one part of the system, and that goes for any web server software. I manage a relatively high-traffic site on a PIII server with under 500 MB of RAM. It runs great, but I have tweaked many areas. Here is how I scale Apache for my purposes.

1) I write all CGI scripts in Perl, so I have mod_perl compiled into Apache statically. The benefits of mod_perl are essential: when the same script runs on the main pages of the site, you want the advantages of Apache::Registry, which keeps the compiled script in memory instead of starting a fresh Perl interpreter for every request. (http://perl.apache.org)

2) I also allocate a great deal of RAM. Under very high load you would want at least 500 MB of RAM, up to 2 GB or more; it all depends on how much you are digging into swap. Again, it comes back to hardware. Apache likes to cache its content in memory, and when it does, it is very fast. Reading your files from disk on every single request will bring your system to its knees, so Apache needs the memory. This goes for any web server.

3) I watch the dynamic content closely. On the site I manage, onmilwaukee.com, all of the content is served out of a MySQL database, which can be a hog if your scripts are not tuned. As an example, there is a feature on the site which tallies up the most-read articles and displays the top 10. That is an expensive script to run many times a second, so I had to change it after the designer placed the SSI for it on the homepage. I created a table in the database to act as a cache for these "top clicks", which lets me expire the cache after 3 hours. If the script does the tally and sort at 7 am, it does not have to do the heavy lifting again for another 3 hours. That saves a lot of CPU time and database load. When it was running many times a second, it started to lock tables in the database and delay responses. It was not pretty.
After I tweaked that, I found ways to cache other content, like the dynamic list of headlines on the homepage.

4) Lastly, it could be your database that is too slow. At high-traffic times, like when the local radio shows are discussing the website, we get a peak level of traffic. That is how I learned that "top clicks" needed tweaking. We decided, perhaps prematurely, that we needed a database server to sit behind the web server. That machine will be very powerful, with lots of RAM for all the database tasks, and it will allow the web server to focus on serving images and content from CGI scripts. You may also want to consider indexing more fields in your database: if you sort on a field heavily, it should be indexed for the speed benefit. That gets into database administration, though, and is not a web server issue.

You can also use the tactics of many of the large sites. Do a lookup of cnn.com:

  nslookup www.cnn.com

You will see many addresses assigned to that hostname. That is round-robin DNS: they have several servers serving up the same content, and this method spreads the traffic across all of them. I do not like this method because it spreads your access logs across multiple servers unless you are logging to a database server. Either option can be hard to manage and debug.

http://www.netcraft.com/whats/?host=www.cnn.com

Another high-traffic site is Slashdot.org, which is very busy all the time. They have so much traffic that when they link to another site, they often overload it with new visitors; this is called the Slashdot effect. They use Apache with mod_perl, and I have also read that they run a Linux cluster. Their hardware is very powerful. If you do an nslookup, you will notice they have only one IP address for the site.

http://www.netcraft.com/whats/?host=slashdot.org

The Apache site uses FreeBSD.

http://www.netcraft.com/whats/?host=apache.org

What it comes down to is that you have to look at your system as a whole.
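To illustrate the indexing advice in point 4, here is a rough Python sketch, again using sqlite3 as a stand-in for MySQL with a hypothetical articles table. SQLite's EXPLAIN QUERY PLAN reports the strategy it chooses for a top-10 query before and after an index is added on the sort column; the exact wording of the plan varies between SQLite versions.

```python
import sqlite3

def query_plan(conn, sql):
    # The last column of each EXPLAIN QUERY PLAN row is the readable "detail" text.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per article with a read counter.
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, reads INTEGER)")
conn.executemany(
    "INSERT INTO articles (title, reads) VALUES (?, ?)",
    [(f"story {i}", i % 97) for i in range(1000)],
)

TOP_TEN = "SELECT id, reads FROM articles ORDER BY reads DESC LIMIT 10"

before = query_plan(conn, TOP_TEN)  # full table scan plus a temporary sort
conn.execute("CREATE INDEX idx_articles_reads ON articles (reads)")
after = query_plan(conn, TOP_TEN)   # reads the index in order, no separate sort

print(before)
print(after)
```

Without the index, the plan should include a temporary B-tree for the ORDER BY, meaning the rows are sorted on every request; with it, the engine can walk the index from the high end and stop after ten entries. MySQL's EXPLAIN shows the same distinction in its own format, for example "Using filesort" when no usable index exists.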
Do not simply blame Apache. If you say it is slow, that is likely because you have not configured it for your level of load. Sure, it may seem that Apache is slow, but maybe it is the database. Maybe it is eating into swap space because you do not have enough RAM. And remember, if you are getting that much traffic, your hardware has to handle it first. It does not matter what software you use... add another processor, more RAM, a RAID system, etc. Build a system which can handle your load.

Brennan Stehling - web developer and sys admin
projects: www.greasydaemon.com | www.onmilwaukee.com | www.sncalumni.com

On Mon, 28 Aug 2000, Steve Lewis wrote:

> On Mon, 28 Aug 2000, Alfred Perlstein wrote:
>
> > > I imagine that these faster servers would use the hardware in a way that
> > > keeps request overhead lower (logging and caching tricks) but the
> > > trade-offs in server-side scripting support could kill that.
> >
> > That is true, one should be investigating fast-cgi or some equivalent
> > to deal with that.
>
> Well, Zeus supports ISAPI, JDBC, etc, and the other options you pointed
> out have apparently designed themselves with migration from apache in mind
> (which doesn't inspire confidence), so it looks like it should be possible
> to use a reasonable server-side scripting tool that supports ISAPI, or
> back-end Java, to make dynamic content scream on these servers.
>
> Thanks for the tip Alfred (and others), and thanks for bringing it up
> James. I was completely in the dark until James asked.
>
> --Steve
>
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-questions" in the body of the message