From owner-freebsd-hackers Tue Mar  3 22:16:50 1998
Date: Tue, 03 Mar 1998 22:23:32 -0800 (PST)
From: Simon Shapiro
Reply-To: shimon@simon-shapiro.org
Organization: The Simon Shapiro Foundation
To: Karl Denninger
Cc: Wilko Bulte, sbabkin@dcn.att.com, tlambert@primenet.com,
    jdn@acp.qiv.com, blkirk@float.eli.net, hackers@FreeBSD.ORG,
    grog@lemis.com
Subject: Re: SCSI Bus redundancy...
In-Reply-To: <19980303232444.59397@mcs.net>
X-Mailer: XFMail 1.3-alpha-021598 [p0] on FreeBSD

On 04-Mar-98 Karl Denninger wrote:
...
> The problem is getting enough cache to matter.
>
> We have a 25GB RAID 0+1 news spool. It's about half full right now, on
> its way upward (we keep tuning expiration times). There is basically
> ZERO locality of reference by the readers, which means that you'd need
> at least a couple of GB of RAM to make any difference at all.

A well-known problem. Similar problems arise with RDBMS code in the CPU
proper: how much L1/L2 cache can you throw at it?

> Now the RAID adapter helps - a lot - by striping the writes and reads,
> primarily. The ultra SCSI bus ends up being the controlling factor.

Yup. And fast it is not. This is what I like about the DPT setup; I have
three independent SCSI busses to work with.

...

> Yep. The other problem is that a kernel RAID *cannot* do writeback
> caching. If it does, you're f*d if the power goes out or the OS goes
> down dirty.

Yup. Layering, modularity, and compartmentalization take many shapes.

> The standalones CAN do writeback, because they can have a battery on
> them AND if the CPU dies they keep running.

They are a true I/O channel, in the mainframe tradition.

> RAID 5, in particular, benefits enormously from writeback, as it allows
> it to defer writes until an entire stripe is ready, which means no
> read/compute/write cycle. This is a monstrous win for performance.

I played, on the DPT, with RAID{0,1,5} stripe size vs. performance. The
numbers really move around. I used to even know how to compute this
stuff...
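The read/compute/write cycle Karl mentions is plain XOR arithmetic. A
minimal sketch of the two parity paths follows; this is the textbook
math only, not the DPT firmware, and STRIPE_UNIT/NDATA are made-up
example values:

/*
 * RAID 5 parity, textbook version.  STRIPE_UNIT and NDATA are
 * illustrative values, not the DPT's.
 */
#include <stddef.h>
#include <string.h>

#define STRIPE_UNIT (64 * 1024)  /* bytes per disk per stripe row */
#define NDATA       4            /* data disks in the group       */

/*
 * Full-stripe write: every data block of the row is in the writeback
 * cache, so parity is one XOR pass over cached data.  Zero disk
 * reads; NDATA + 1 writes go out.
 */
static void
parity_full_stripe(unsigned char *parity, unsigned char *data[NDATA])
{
    size_t i;
    int d;

    memset(parity, 0, STRIPE_UNIT);
    for (d = 0; d < NDATA; d++)
        for (i = 0; i < STRIPE_UNIT; i++)
            parity[i] ^= data[d][i];
}

/*
 * Partial write of one block: the old data and old parity must first
 * be READ from disk, the new parity COMPUTED, then both WRITTEN back:
 *
 *     new_parity = old_parity ^ old_data ^ new_data
 *
 * Two reads plus two writes for every one block of new data.
 */
static void
parity_rmw(unsigned char *parity, const unsigned char *old_data,
    const unsigned char *new_data)
{
    size_t i;

    for (i = 0; i < STRIPE_UNIT; i++)
        parity[i] ^= old_data[i] ^ new_data[i];
}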
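Stripe size is also exactly what decides when a write qualifies for the
cheap path above. The textbook mapping from logical block to (disk,
offset), again with made-up values (the DPT's real layout, parity
rotation included, is not reproduced here):

/*
 * Textbook striping: map a logical block number to (disk, offset)
 * for a given stripe unit.  Illustrative only.
 */
#define NDISKS      4    /* data disks (example)           */
#define STRIPE_BLKS 128  /* blocks per disk per stripe row */

struct loc {
    int  disk;
    long offset;
};

static struct loc
map_block(long lbn)
{
    long row    = lbn / (STRIPE_BLKS * NDISKS);  /* which stripe row */
    long within = lbn % (STRIPE_BLKS * NDISKS);  /* place in the row */
    struct loc l;

    l.disk   = (int)(within / STRIPE_BLKS);
    l.offset = row * STRIPE_BLKS + within % STRIPE_BLKS;
    return l;
}

A write smaller than STRIPE_BLKS lands on one disk and, on RAID 5, pays
the read-modify-write toll; an aligned write of STRIPE_BLKS * NDISKS
blocks covers the whole row and gets the cheap parity path. Shuffling
the stripe unit against the workload's typical request size is why the
numbers move around so much.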
...

> The best I've seen off our RAID systems right now is about 11MB/sec
> (that's megaBYTES, not bits). That's on an Ultra bus, with 2 ultra
> busses going to the RAID disks.

About right. SCSI-II used to be 4-5 MB/sec per bus. Ultra-wide is about
5-6 MB/sec for small O/S-type blocks. I see about 18 MB/sec on the DPT
across three busses. The difficulty is in getting FreeBSD to produce
this traffic on small blocks (dd if=/dev/zero of=/dev/something bs=64k
is NOT a typical application).

> Neither the disk buses nor the RAID controller CPU are saturated. I
> believe this is pretty much the wall on one SCSI channel, at least with
> 16 SCBs. I'm going to try it with SCBPAGING turned on and see if that
> helps, but for sequential reads it probably won't matter much.

Hook up a DPT to one of these boxes. It will be interesting to see what
happens. Seriously.

> I could run two host channels on this thing across two RAID sets into
> two Adaptec adapters. That might be a big win.
>
> I suspect the bottleneck is in the AIC code at this point, or the bus
> itself, or the interrupt latency on the DMA completion is killing me.
> There is no appreciable difference between running at 40MB/sec (ultra
> full-bore) and 20MB/sec, indicating that perhaps the hold-up is in the
> Adaptec microcode, driver, and/or the Adaptec/PCI bus interface.

I read the AIC code quite carefully when writing the DPT code. There is
nothing obviously wrong with it; Justin is a very careful engineer. It
is either the sequencer itself, the SCSI layer, or FreeBSD.

To saturate a DPT (with 4KB transfers, random across the entire array),
you need about 1,900 transactions per second. To reach 1,740 or so from
userspace, I have to run about 256 copies of st.c. The load average is
about 220 at that point, which is not very good for interactive work.
Trying the same with SMP either crashes, or drops to about 880 TPS.

Simon
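P.S. Note the arithmetic: 1,900 transactions/sec at 4KB is only about
7.5 MB/sec, so the wall here is per-transaction overhead, not raw bus
bandwidth. st.c itself is not attached; a minimal sketch of that kind
of load generator, with the device path and array size as placeholders:

/*
 * Minimal random-4KB-read load generator, in the spirit of st.c
 * (this is NOT the actual st.c source).  /dev/rda0 and NBLKS are
 * placeholders; adjust both.  Run N copies concurrently and count
 * completed reads per second across all of them to get TPS.
 */
#include <sys/types.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define BLKSZ 4096              /* 4KB transfers                    */
#define NBLKS (2 * 1024 * 1024) /* blocks in the array (8GB @ 4KB)  */

int
main(void)
{
    char  buf[BLKSZ];
    off_t off;
    int   fd, i;

    srandom(getpid());                /* distinct stream per copy */
    fd = open("/dev/rda0", O_RDONLY); /* placeholder device       */
    if (fd < 0) {
        perror("open");
        return 1;
    }
    for (i = 0; i < 100000; i++) {
        /* random block, 4KB-aligned, anywhere in the array */
        off = (off_t)(random() % NBLKS) * BLKSZ;
        if (lseek(fd, off, SEEK_SET) == -1 ||
            read(fd, buf, BLKSZ) != BLKSZ) {
            perror("read");
            return 1;
        }
    }
    close(fd);
    return 0;
}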