Date:      Thu, 13 Apr 2000 15:41:13 -0500 (CDT)
From:      Chris Dillon <cdillon@wolves.k12.mo.us>
To:        mi@privatelabs.com
Cc:        questions@FreeBSD.ORG
Subject:   Re: configuring squid
Message-ID:  <Pine.BSF.4.20.0004131405250.74405-100000@mail.wolves.k12.mo.us>
In-Reply-To: <200004131855.OAA27159@misha.privatelabs.com>

On Thu, 13 Apr 2000 mi@privatelabs.com wrote:

> 13 Apr, Chris Dillon wrote:
> = Snipped from freebsd-scsi, this is only appropriate for -questions.
> 
> Sorry, I thought SCSI people may have something to say about these
> drives and how best to use them :)
> 
> = On Thu, 13 Apr 2000 mi@privatelabs.com wrote:
> =
> = > Hello! I'm setting up a fairly big squid server with two 45GB
> = > (but slow) SCSI SEAGATE ST446452W drives (external).
> = >
> = > I wonder if I should use ccd to make one 90GB interleaved array of
> = > them or use them separately and tell Squid about the two independent
> = > partitions... Speed is the only factor -- I understand that
> = > separately they'd be easier to manage...
> =
> = Keep them separate. Squid  load-balances among multiple cache_dirs. If
> = speed is the  biggest factor, you really should be  using many smaller
> = drives  with a  single cache_dir  on each  one, instead  of two  large
> = drives.
> 
> We are likely to be serving big files rather than many files -- that's
> what I mean by big -- images instead of pages :) The load balancing can
> be done by squid (in user space) or by the ccd driver (in kernel).
> What's more efficient? Squid, because it knows what it's dealing with
> and can adapt, or ccd, because it is simpler and uses a predetermined
> interleave?

Details, now we're getting somewhere. :-)  Yes, larger drives might be
appropriate in such cases, but not necessarily.  If you expect a high
request rate, and all of your users will be fetching different images
at the same time, you'll still need a lot of disk spindles to serve up
those different requests efficiently.  If all of your users will be
requesting the same image at roughly the same time, then you could get
away with just one humungo disk if you really wanted to.  Also, it is
probably still better to let Squid handle the interleaving of objects.  
This has the added advantage that if one of the disks dies, you only
lose one cache filesystem and not your entire cache.
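
In squid.conf terms that just means one cache_dir line per disk
(a sketch -- the paths and sizes here are placeholders, and the "ufs"
storage type assumes Squid 2.3 or later):

    # One cache filesystem per physical disk; Squid spreads new
    # objects across all cache_dirs with room to spare.
    cache_dir ufs /cache1 40000 16 256
    cache_dir ufs /cache2 40000 16 256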

> = Keep in mind you're also going to need a lot of memory for a full 90GB
> = cache. You need  at least 10MB RAM  per 1GB of cache (this  is from my
> = personal  experience with  Squid, and  does not  include OS  overhead,
> = filesystem cache,  or anything else), so  you'll need at least  1GB in
> = there.
> 
> Thanks, that's  very valuable  info... Anything special  I need  to tell
> newfs when building the filesystems?

Since you're going to be caching mostly large objects, that seriously
skews the average object size in your favor.  Fewer objects mean less
overhead required to keep track of them.  If your average object size
is, for example, 100KB or more, you'll need less than half of the RAM
I mentioned.  If you're talking huge 1MB to 10MB objects, well, you
might need a lot of RAM to hold multiple large in-transit objects, but
definitely not for object metadata overhead.
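
To put rough numbers on it (back-of-the-envelope, assuming roughly
100 bytes of in-memory metadata per object, which is about where the
10MB-per-1GB figure comes from at Squid's typical ~13KB average
object size):

    13KB average:  1GB / 13KB  ~= 78,000 objects ~= 8MB RAM per GB
    100KB average: 1GB / 100KB ~= 10,000 objects ~= 1MB RAM per GB
    1MB average:   1GB / 1MB   ~=  1,000 objects ~= 0.1MB RAM per GB

So a 90GB cache of ~100KB objects wants on the order of 100MB for
metadata rather than a full gigabyte.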

As for what to tell newfs when creating the filesystem, I have always
created my cache filesystems with 0% reserved space, and a SPACE
optimization preference.  This helps to prevent object fragmentation
when you're dealing with hundreds of thousands or millions of objects
in a cache filesystem that are constantly being replaced.  This will
probably not matter as much with larger objects or if you expect the
same objects to stay on disk for long periods of time (long object
lifetime).  If you have mostly large objects, you can also increase
the filesystem block size and lower the inode count.
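
For example (a sketch -- the device name is a placeholder, and the
block/frag/inode numbers are just one sensible choice for a cache
full of large objects):

    # 0% reserved space, SPACE optimization, 16K blocks with 2K
    # fragments, and fewer inodes (one per 16KB) than the default:
    newfs -m 0 -o space -b 16384 -f 2048 -i 16384 /dev/da1s1e

    # Or retune an existing (unmounted) filesystem in place:
    tunefs -m 0 -o space /dev/da1s1e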

> = Since you're  also going to be  using two large disks  instead of many
> = smaller ones, you'll  want plenty of RAM available  for the filesystem
> = cache  and  to  increase  Squid's cache_mem  significantly  above  the
> = default of  8MB to  hold the  most popular  objects without  having to
> = fetch them from disk often.
> 
> Fetching them from the proxy's local disk  does not bother me as much as
> having to  re-fetch them from  the source,  which can be  seriously time
> consuming... We also expect fairly uniform popularity among the
> objects, so caching in memory does not buy much vs. caching on disk.

Do you mean that every object is going to be equally popular, and
there won't be a smaller subset of objects that are going to be
fetched more frequently than others?  Even if this is the case, a
large memory cache will still help if you have different users
fetching the same object at roughly the same time, or even if you
think that you'll have enough memory cache to serve up an object at
least twice before it gets flushed out by other objects.
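
That part is a one-line change in squid.conf (the size below is a
placeholder -- tune it against whatever RAM is left over after object
metadata and the filesystem cache):

    # Default is 8 MB; with a gigabyte or more of RAM in the box,
    # something much larger is reasonable.
    cache_mem 512 MB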

> = How  many requests  per second  are you  expecting during  peak times,
> = anyway?
> 
> I  don't even  know :)  But it  will  be in  thousands --  we will  have
> multiple such squids humming next to each other -- with multiple disks.

Hmm... You'll just have to set up a box and see how many requests per
second you can squeeze out of it, because I have never seen any Squid
benchmarks that have involved mostly large, equally popular objects,
just the average "surf-the-internet" object distribution.
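
If you just want a crude first number before setting up a real
benchmark, a shell loop will do (a sketch -- the proxy host, port,
and URL are placeholders; fetch(1) picks the proxy up from the
HTTP_PROXY environment variable):

    #!/bin/sh
    # Time 1000 sequential requests through the proxy; run several
    # copies in parallel to approximate concurrent users.
    HTTP_PROXY=http://proxy.example.com:3128; export HTTP_PROXY
    time sh -c 'i=0; while [ $i -lt 1000 ]; do
        fetch -q -o /dev/null http://www.example.com/images/big.jpg
        i=$((i+1))
    done'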


-- Chris Dillon - cdillon@wolves.k12.mo.us - cdillon@inter-linc.net
   FreeBSD: The fastest and most stable server OS on the planet.
   For Intel x86 and Alpha architectures. ( http://www.freebsd.org )



