Date: Sat, 24 Aug 1996 11:31:26 -0500 (CDT)
From: Joe Greco <jgreco@brasil.moneng.mei.com>
To: michaelv@MindBender.serv.net (Michael L. VanLoon)
Cc: jgreco@brasil.moneng.mei.com, michael@memra.com, craigs@os.com,
    freebsd-isp@freebsd.org, mvanloon@microsoft.com
Subject: Re: Anyone using ccd (FreeBSD disk striper) for news
Message-ID: <199608241631.LAA28292@brasil.moneng.mei.com>
In-Reply-To: <199608240708.AAA02076@MindBender.serv.net> from "Michael L. VanLoon" at Aug 24, 96 00:08:15 am
> >Build it for speed and as close to zero latency as possible.  Use more disks
> >instead of less.  Stripe lots of FAST 1GB disks - like the new Hawk 31055's
> >- instead of going with larger drives.  2 9ms 1G disks are ALWAYS faster
> >than 1 8ms 2G disk, and the price is similar!  Go with more SCSI busses.
> >NCR controllers are $60 apiece.  Get three, and a 10/100 PCI Ethernet
> >controller, and you're still only putting out about $350 for your I/O
> >controllers.

> Have you compared Adaptec 2940UW's with tagged-command-queuing enabled
> to these?  I found tagged-queuing to be a huge win in some benchmarks
> I ran recently when comparing a BusLogic and Adaptec controller.  Does
> the NCR driver do tagged-command-queuing?

# ncrcontrol
T:L  Vendor   Device    Rev   Speed  Max   Wide  Tags
0:0  SEAGATE  ST31055N  0318  10.0   10.0  8     4
1:0  SEAGATE  ST31055N  0318  10.0   10.0  8     4
2:0  SEAGATE  ST31055N  0318  10.0   10.0  8     4
3:0  SEAGATE  ST31055N  0318  10.0   10.0  8     4
4:0  SEAGATE  ST31055N  0318  10.0   10.0  8     4

I believe that you can set the number; I haven't seen any reason yet to
bother with it.

> >Use a large stripe size.  I use 1 cylinder group.  You are not striping for
> >bandwidth.  You are striping for CONCURRENCY.  You _want_ one mechanism to
> >be able to handle an _entire_ file access on its own.

> Is this something you just deduced, or have you proven this under real
> newsfeed conditions?

It's something that some simple filesystem concurrency tests did favor,
it's the traditional news wisdom, and if you think about it, it makes a
lot of sense.  Your traditional striping paradigm is designed to double
the BANDWIDTH off the disks...  i.e. combine two drives that peak out at
2.5MB/s to get an aggregate 5MB/s.  This is done by a combination of
small stripe sizes, the fact that the drive will tend to read ahead, and
concurrent read requests.  You end up with multiple mechanisms whose
heads are moving closely in sync.
This is stupid for news, where your average transaction is very small, and
in reality what you want is not greater bandwidth, but greater transactions
per second.  You engineer for this by engineering your disk I/O subsystem
for concurrency: if you open a particular file, you want (best case) ONE
mechanism to do the directory lookup and data fetch for that file.

This is hopeless in reality:

	/news/comp/protocols/tcp-ip/domains/12345

because each directory will, in general, be in a different area of the
disk.  So the best optimization you can make is to hope that you can
arrange for "domains/12345" to be accessed by a single mechanism, which
you can do by setting a LARGE stripe size.

Incidentally, you often end up getting a free ride for the
"/news/comp/protocols/tcp-ip" portion, because a good amount of that is
likely to be already cached by the system.  Your terminal node directories
("domains" in this case) are the least likely to be cached and the most
likely to be read.  You see how it works?  :-)  It ain't perfect, but
there's no obviously better solution unless you move to a news-specialized
FS.

> I'm still slightly skeptical -- I think I'd
> start by trying smaller interleaves to increase the likelihood of
> randomizing the drive accessed per file, going with maybe cluster size
> (16K) up to a physical drive cylinder (~600K, probably) per
> interleave.  But, if you've done extensive testing (and only if you've
> done extensive testing) of these alternatives, I'll take your advice
> as the direction to go in.

You have the right idea (randomize the drive accessed per file!!!!), but
you have to remember that you often are forced to do that lookup in the
"domains" directory and then fetch the data.  A smaller interleave means
an increased likelihood that one mechanism will do the directory lookup
and the other gets the data.  This is inefficient because the first
mechanism was already in the neighborhood and the second mechanism's time
is being wasted.
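As a concrete sketch of the "one cylinder" interleave idea: ccd takes its
interleave in sectors, so a one-cylinder stripe is sectors/track times
tracks/cylinder.  The geometry figures and device names below are
assumptions for illustration only, not taken from this message; read the
real numbers from your disklabel output.

```shell
# Compute a ccd interleave of roughly one physical cylinder.
SECTORS_PER_TRACK=108     # assumed sectors/track for the drive
TRACKS_PER_CYL=10         # assumed heads (tracks per cylinder)
INTERLEAVE=$((SECTORS_PER_TRACK * TRACKS_PER_CYL))   # in 512-byte sectors
echo "interleave: $INTERLEAVE sectors"

# With that figure, the ccd might then be assembled like this
# (hypothetical device names; not executed here):
# ccdconfig ccd0 $INTERLEAVE 0 /dev/sd0e /dev/sd1e /dev/sd2e /dev/sd3e /dev/sd4e
```

With the assumed geometry this works out to about 540KB per stripe per
spindle, so a terminal-directory lookup plus the data fetch has a decent
chance of landing on a single mechanism.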
It is not up to me to convince you, however.  Do your own tests and draw
your own conclusions.  Then go look with DejaNews through
news.software.nntp for discussions of this in the past.

> How many drives per controller, and controllers per machine would you
> say is "optimum"?

At $60 a controller, I say stuff the machine with controllers and spread
your disks out over them!  (On a PCI system that means 3 SCSI controllers.
This is better than your average Sun with its $800 SCSI controllers, which
is why most Sun news servers have one or maybe two SCSI busses.)  It only
costs you $120 more (two extras) to get one third the SCSI bus contention
of using a single controller.  At that price, why bother figuring out
whether two or three is optimal...  "just do it".  Your drives then
obviously get spread out among the busses.

Note: I stripe _across_ busses because I intuitively believe that this may
give me better response.

> >Don't compromise on RAM.  Stuff it.  My feeds box has 128MB RAM.  The
> >readers have 256MB (we had some fun with that though).

> What special tricks did you need to do to FreeBSD to make it run in
> 128MB of RAM?

128MB works fine with Triton-I and Triton-II.  RTFMM (motherboard manual)
for recommendations on RAM, though.  You then set "options MAXMEM" because
your standard PC BIOS apparently reports memory > 64MB in some odd fashion
that FreeBSD doesn't comprehend yet.

> 256MB?  Anything?

We had a summer-long adventure with 256MB.  You need a Triton-II board.
If you really plan to do this, contact me in e-mail and I'll talk to some
people and give you some more details.

> Did I understand that you're running a 2.2 snapshot?  Is there a
> particular reason you're using this and not 2.1.5?

I'm using 2.1.5R.  I do not use snapshots on production systems.

> Also, what ethernet card has given you the best results (specific
> model, please)?
I've used the Kingston PCI 10bT cheapies with great success; maybe the
KNE40T, but I don't recall the model # for sure (my supplier knows what I
mean when I order one).  The SMC EtherPower 10/100 (9332?) works great as
well; I've seen these hooked up to a SynOptics switch, and you can really
shovel data around.

> I'm going to be setting up a killer newsfeed-sucking machine at work
> to do performance testing against, and I want to wring as much
> performance as I can out of this box (It'll be a Dell OptiPlex P5
> 133MHz -- the rest is up to me).
>
> Any other tips you (or anyone else) would like to share?

"The more, the merrier".  That applies to every resource: RAM, disks,
SCSI busses, etc.

... JG