From owner-freebsd-hackers Sat Apr 22 21:05:06 1995
Return-Path: hackers-owner
Received: (from majordom@localhost) by freefall.cdrom.com (8.6.10/8.6.6) id VAA12779 for hackers-outgoing; Sat, 22 Apr 1995 21:05:06 -0700
Received: from cs.weber.edu (cs.weber.edu [137.190.16.16]) by freefall.cdrom.com (8.6.10/8.6.6) with SMTP id VAA12769 for ; Sat, 22 Apr 1995 21:05:05 -0700
Received: by cs.weber.edu (4.1/SMI-4.1.1) id AA10300; Sat, 22 Apr 95 21:55:59 MDT
From: terry@cs.weber.edu (Terry Lambert)
Message-Id: <9504230355.AA10300@cs.weber.edu>
Subject: Re: large filesystems/multiple disks [RAID]
To: rgrimes@gndrsh.aac.dev.com (Rodney W. Grimes)
Date: Sat, 22 Apr 95 21:55:59 MDT
Cc: jgreco@brasil.moneng.mei.com, freebsd-hackers@FreeBSD.org
In-Reply-To: <199504221758.KAA02014@gndrsh.aac.dev.com> from "Rodney W. Grimes" at Apr 22, 95 10:58:16 am
X-Mailer: ELM [version 2.4dev PL52]
Sender: hackers-owner@FreeBSD.org
Precedence: bulk

[ ... striping code ... ]

> > Did you ever make any progress on this?  If not, I will (try to) look at
> > it, but I'd prefer that somebody that knows what the heck they're doing
> > down within the device driver code putz with it..  :-)
>
> Yes, I played with that code (in fact I have a kernel with /dev/ilv in
> it).  I never made it work completely.  Then I remembered the sys/dev/cd.c
> driver that came with 4.4 Lite and went and looked at it.  I also have
> that working (renamed to concat.c to eliminate the conflict) partially,
> enough to say that I took 2 4MB/sec drives and interleaved them and
> got a 5.2MB/sec transfer rate for reads (I can't write due to bugs)
> *without* spindle sync.

This is nearly spot-on the theoretical performance of 5.3333MB/sec for
two devices replacing a single device with a 100% random distribution of
stripes between the media (assuming the 4MB/sec and 5.2MB/sec numbers are
correct).  Congratulations!

For anyone who is interested, the expected speedup is +33% for two units
with N (N >= 2) outstanding operations, or +79% for three units with N
(N >= 3) outstanding operations.

> I have done a bunch of aggregate bandwidth testing now using from 1 to
> 4 NCR810 SCSI controllers on a P54C-90 and found I can actually hit
> 12-14MB/sec using 4MB/sec drives.  We seem to have a bottleneck in
> the ncr.c driver when trying to run multiple drives on one controller.
> I have run single drives on that controller at 6.6MB/sec, but two 4MB
> drives only get 5.3MB/sec.

This would be indicative of command queueing not working quite as
expected, or of there being a maximum of two outstanding requests at any
one time (5.3MB/sec is the maximum you would expect if you didn't have
drive interleave latency to consider).

> My first pass through concat.c was a ``mechanical conversion, just make
> the bloody thing compile and do *something*''.  I am now onto the task
> of actually going through it and cleaning it up to work correctly.

I don't think you will get better than your first shot for random I/O.

Unfortunately, the algorithm I have for calculating expected performance
using equivalent drive/interface combinations only works when the number
of requests to satisfy is less than or equal to the number of disks.
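As a rough illustration only, here is a brute-force enumeration of the
simplest possible model: each of R outstanding requests lands on one of
D identical drives at random, and the batch completes when the busiest
drive finishes.  It reproduces the 5.3333MB/sec two-drive figure, but it
is just a sketch of that simple model, not the algorithm referred to
above, and all of the names in it are made up.

/*
 * Simple striping model: R outstanding requests each land on one of D
 * identical drives at random; the batch completes when the busiest
 * drive finishes, so the expected aggregate throughput is
 * R / E[max per-drive load] times the rate of a single drive.
 */
#include <stdio.h>

#define	MAXDRIVES	16

static double
expected_speedup(int drives, int requests)
{
	long	assignments, i, n;
	double	expected_max;
	int	load[MAXDRIVES], j, max;

	assignments = 1;
	for (j = 0; j < requests; j++)
		assignments *= drives;

	expected_max = 0.0;
	for (i = 0; i < assignments; i++) {
		/* Enumerate one way the requests can fall on the drives. */
		for (j = 0; j < drives; j++)
			load[j] = 0;
		for (n = i, j = 0; j < requests; j++, n /= drives)
			load[n % drives]++;
		for (max = 0, j = 0; j < drives; j++)
			if (load[j] > max)
				max = load[j];
		expected_max += (double)max / (double)assignments;
	}
	return ((double)requests / expected_max);
}

int
main(void)
{
	double	speedup;

	/* Two 4MB/sec drives with two outstanding requests. */
	speedup = expected_speedup(2, 2);
	printf("speedup x%.4f -> %.4fMB/sec\n", speedup, 4.0 * speedup);
	return (0);
}

Run as shown, it prints a 1.3333 speedup, which is where the 5.3333MB/sec
number for two 4MB/sec drives comes from.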
> > Having recently seen Solaris' Online: DiskSuite, which suffers from
> > fairly significant performance degradations, I'm curious to see what
> > a real operating system can do.  ;-)
>
> It will be at least another week, but you'll know I have made serious
> progress when you see a cvs commit message for the import of
> sys/dev/concat.

For truly random stripe placement, there will be a potential for
performance degradation based on the file system mechanism used to
address the blocks themselves, and on whether that mechanism is a high
percentage of the overhead on the attempted I/O.

The other consideration is that you are not typically going to see the
performance increase unless you either split the drives between SCSI
controllers or actually get command queueing working.

Typically, I would expect that spindle sync would do nothing for you
unless your stripe lengths are on the order of a single cluster size,
and you divide the actual rotational latency of the drive by the number
of synced spindles before using it, then scale it by the relative sync
notification time added to the rotational period, adjusting all of this
for possible ZBR variations in the effective rotational period based on
distance from the spindle.  Then use round-robin allocation of
sequential blocks from disk to disk to ensure linear ordering of the
distribution.

Or, you could get complicated.  8^).


					Terry Lambert
					terry@cs.weber.edu
---
Any opinions in this posting are my own and not those of my present
or previous employers.
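P.S.: A back-of-the-envelope sketch of the spindle sync arithmetic
described above.  Every number and parameter name here is made up for
illustration; this is an assumed reading of the description, not code
from any driver.

/*
 * Effective rotational latency seen by a striped transfer when the
 * stripe unit is a single cluster and the member spindles are synced.
 */
#include <stdio.h>

static double
effective_latency(double rot_latency_ms, double rot_period_ms,
    double sync_notify_ms, double zbr_scale, int nsynced)
{
	double	latency;

	/* Adjust for ZBR variation in the effective rotational period. */
	latency = rot_latency_ms * zbr_scale;

	/* Divide the rotational latency by the number of synced spindles. */
	latency /= (double)nsynced;

	/* Scale by the sync notification time added to the period. */
	return (latency * (rot_period_ms + sync_notify_ms) / rot_period_ms);
}

int
main(void)
{
	/* A 5400rpm drive: 11.1ms period, 5.6ms average latency. */
	printf("2 synced spindles: %.2fms effective latency\n",
	    effective_latency(5.6, 11.1, 0.5, 1.0, 2));
	return (0);
}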