Date: Thu, 13 Apr 2000 15:41:13 -0500 (CDT)
From: Chris Dillon <cdillon@wolves.k12.mo.us>
To: mi@privatelabs.com
Cc: questions@FreeBSD.ORG
Subject: Re: configuring squid
Message-ID: <Pine.BSF.4.20.0004131405250.74405-100000@mail.wolves.k12.mo.us>
In-Reply-To: <200004131855.OAA27159@misha.privatelabs.com>
On Thu, 13 Apr 2000 mi@privatelabs.com wrote:

> 13 Apr, Chris Dillon wrote:
> =Snipped from freebsd-scsi, this is only appropriate for -questions.
>
> Sorry, I thought the SCSI people might have something to say about
> these drives and how best to use them :)
>
> = On Thu, 13 Apr 2000 mi@privatelabs.com wrote:
> =
> = > Hello! I'm setting up a fairly big squid server with two 45GB
> = > (but slow) SCSI SEAGATE ST446452W (external).
> = >
> = > I wonder if I should use ccd to make one 90GB interleaved array
> = > of them or use them separately and tell Squid about the two
> = > independent partitions... Speed is the only factor -- I
> = > understand that separately they'd be easier to manage...
> =
> = Keep them separate. Squid load-balances among multiple cache_dirs.
> = If speed is the biggest factor, you really should be using many
> = smaller drives with a single cache_dir on each one, instead of two
> = large drives.
>
> We are likely to be serving big files rather than many files --
> that's what I mean by big -- images instead of pages :) The load
> balancing can be done by Squid (in user space) or by the ccd driver
> (in the kernel). Which is more efficient? Squid, because it knows
> what it's dealing with and can adapt, or ccd, because it is simpler
> and uses a predetermined interleave?

Details, now we're getting somewhere. :-)

Yes, larger drives might be appropriate in such cases, but not
necessarily. If you expect a high request rate, and all of your users
will be fetching different images at the same time, you'll still need
a lot of disk spindles to serve those different requests efficiently.
If all of your users will be requesting the same image at roughly the
same time, then you could get away with just one humungo disk if you
really wanted to.

Also, it is probably still better to let Squid handle the interleaving
of objects (there is a short squid.conf sketch below). This has the
added advantage that if one of the disks dies, you only lose one cache
filesystem and not your entire cache.

> = Keep in mind you're also going to need a lot of memory for a full
> = 90GB cache. You need at least 10MB RAM per 1GB of cache (this is
> = from my personal experience with Squid, and does not include OS
> = overhead, filesystem cache, or anything else), so you'll need at
> = least 1GB in there.
>
> Thanks, that's very valuable info... Anything special I need to tell
> newfs when building the filesystems?

Since you're going to be caching mostly large objects, that seriously
skews the average object size in your favor. Fewer objects mean less
overhead required to keep track of them. If your average object size
is, for example, 100KB or more, you'll need less than half of the RAM
I mentioned. If you're talking huge 1MB to 10MB objects, well, you
might need a lot of RAM to hold multiple large in-transit objects, but
definitely not for object metadata overhead.

As for what to tell newfs when creating the filesystems, I have always
created my cache filesystems with 0% reserved space and a SPACE
optimization preference. This helps prevent object fragmentation when
you're dealing with hundreds of thousands or millions of objects in a
cache filesystem that are constantly being replaced. It will probably
not matter as much with larger objects, or if you expect the same
objects to stay on disk for long periods of time (a long object
lifetime). If you have mostly large objects, you can also increase the
filesystem block size and lower the inode count.
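Something along these lines, for example (untested, and the device
name and the exact numbers here are made up, so adjust them for your
own disks):

    # 0% reserved space, optimize for space instead of time;
    # larger block/frag sizes and fewer inodes suit mostly-large
    # objects
    newfs -m 0 -o space -b 16384 -f 2048 -i 65536 /dev/da1s1e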
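And to show what I mean by letting Squid do the load balancing, here
is a minimal squid.conf sketch (this assumes Squid 2.3-style cache_dir
syntax, and the paths and sizes are just placeholders):

    # one cache_dir per disk; Squid balances objects between them
    cache_dir ufs /cache1 40000 16 256
    cache_dir ufs /cache2 40000 16 256
    # raise cache_mem well above the 8MB default so the most popular
    # objects can be served from memory instead of disk
    cache_mem 512 MB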
> = Since you're also going to be using two large disks instead of
> = many smaller ones, you'll want plenty of RAM available for the
> = filesystem cache, and to increase Squid's cache_mem significantly
> = above the default of 8MB to hold the most popular objects without
> = having to fetch them from disk often.
>
> Fetching them from the proxy's local disk does not bother me as much
> as having to re-fetch them from the source, which can be seriously
> time-consuming... We also expect fairly uniform popularity among the
> objects, so caching in memory does not buy much vs. caching on disk.

Do you mean that every object is going to be equally popular, and
there won't be a smaller subset of objects that are fetched more
frequently than the others? Even if that is the case, a large memory
cache will still help if you have different users fetching the same
object at roughly the same time, or even if you think you'll have
enough memory cache to serve up an object at least twice before it
gets flushed out by other objects.

> = How many requests per second are you expecting during peak times,
> = anyway?
>
> I don't even know :) But it will be in the thousands -- we will have
> multiple such squids humming next to each other -- with multiple
> disks.

Hmm... You'll just have to set up a box and see how many requests per
second you can squeeze out of it, because I have never seen any Squid
benchmarks that involved mostly large, equally popular objects, just
the average "surf-the-internet" object distribution.

--
Chris Dillon - cdillon@wolves.k12.mo.us - cdillon@inter-linc.net
  FreeBSD: The fastest and most stable server OS on the planet.
  For Intel x86 and Alpha architectures. ( http://www.freebsd.org )