From: Scott Long <scottl@freebsd.org>
Date: Mon, 25 Oct 2004 16:41:51 -0600
To: Charles Swiger
Cc: freebsd-current@freebsd.org
Subject: Re: FreeBSD 5.3b7 and poor ata performance

Charles Swiger wrote:
> On Oct 25, 2004, at 5:39 PM, Brad Knowles wrote:
>
>> At 3:25 PM -0600 2004-10-25, Scott Long wrote:
>>
>>> But as was said, there is always a performance vs. reliability
>>> tradeoff.
>>
>> Well, more like "Pick two: performance, reliability, price" ;)
>
> That sounds familiar. :-)
>
> If you prefer... ...consider using:
> ----------------------------------------------
> performance, reliability: RAID-1 mirroring
> performance, cost:        RAID-0 striping
> reliability, performance: RAID-1 mirroring (+ hot spare, if possible)
> reliability, cost:        RAID-5 (+ hot spare)
> cost, reliability:        RAID-5
> cost, performance:        RAID-0 striping

It's more complex than that.  Are you talking about software RAID, PCI
RAID, or external RAID?  That affects all three quite a bit.  Also, how
do you define reliability?  Do you verify reads on RAID-1 and RAID-5?
And what about error recovery?

>>> And when you are talking about RAID-10 with a bunch of disks, you
>>> will indeed start seeing bottlenecks in the bus.
>>
>> When you're talking about using a lot of disks, that's going to be
>> true for any disk subsystem that you're trying to get a lot of
>> performance out of.
>
> That depends on your hardware, of course. :-)
>
> There's a Sun E450 with ten disks over 5 SCSI channels in the room
> next door: one UW channel native on the MB, and two U160 channels
> apiece from two dual-channel cards which come with each 8-drive-bay
> extender kit.  It's running Solaris and DiskSuite (ODS) now, but it
> would be interesting to put FreeBSD on it and see how that does, if I
> ever get the chance.
>
>> The old rule was that if you had more than four disks per channel,
>> you were probably hitting saturation.  I don't know if that specific
>> rule-of-thumb is still valid, but I'd be surprised if disk controller
>> performance hasn't roughly kept up with disk performance over time.
>
> That rule dates back to the early days of SCSI-2, where you could fit
> about four drives' worth of aggregate throughput over a 40 MB/s
> ultra-wide bus.  The idea behind it is still sound, although the
> number of drives you can fit obviously changes depending on whether
> you talk about ATA-100 or SATA-150.

The formula here is simple: at most 2 drives per channel for ATA, 1 for
SATA.  So the channel transport starts becoming irrelevant now (except
when you talk about SAS and having bonded channels going to switches).
The limiting factor again becomes PCI.  An easy example is the software
RAID cards that are based on the Marvell 8-channel SATA chip.  It can
drive all 8 drives at max platter speed if you have enough PCI
bandwidth (and I've tested this recently with FreeBSD 5.3, getting
more than 200 MB/s across 4 drives).  However, you're talking about
PCI-X-100 bandwidth at that point, which is not what most people have
in their desktop systems.
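To put rough numbers on why PCI becomes the wall (ballpark figures
only, not measurements: I'm assuming ~60 MB/s sustained per drive,
which is in line with the 200+ MB/s over 4 drives above, and quoting
theoretical bus peaks), a quick back-of-envelope comparison looks like
this:

/*
 * Back-of-envelope only: the per-drive and bus numbers below are
 * assumed ballpark/theoretical figures, not measured results.
 */
#include <stdio.h>

int
main(void)
{
        const double drive_mbs   = 60.0;   /* assumed sustained MB/s per drive */
        const double pci_mbs     = 133.0;  /* 32-bit/33 MHz PCI, theoretical peak */
        const double pcix100_mbs = 800.0;  /* 64-bit/100 MHz PCI-X, theoretical peak */
        int n;

        for (n = 1; n <= 8; n++)
                printf("%d drives ~ %3.0f MB/s aggregate (plain PCI: %s, PCI-X-100: %s)\n",
                    n, n * drive_mbs,
                    n * drive_mbs > pci_mbs ? "saturated" : "fits",
                    n * drive_mbs > pcix100_mbs ? "saturated" : "fits");
        return (0);
}

Eight drives land around 480 MB/s, far beyond plain 32-bit/33 MHz PCI
(~133 MB/s theoretical) but well within 64-bit/100 MHz PCI-X
(~800 MB/s).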
And for reliability reasons, I wouldn't base server-class storage on
software RAID for anything other than mirroring the boot drive, so that
a failure there doesn't immediately bring you down.

Anyway, it sounds like the original poster found that at least part of
the problem was with his local ATA setup.

In the longer term, I'd like to see people who care about performance
focus on things like I/Os per second, not raw bandwidth.  As I
mentioned above, I've seen that a software RAID driver on FreeBSD can
sustain line rate with the drives on large transfers.  That makes
sense, because the overhead of setting up the DMA is dwarfed by the
time it takes to do the DMA.

I'd also like to see more 'apples-to-apples' comparisons.  It doesn't
mean a whole lot to say, for example, that software RAID on SCSI
doesn't perform as well as a single ATA drive, regardless of how
'common sense' that argument might sound.  The performance
characteristics of ATA and SCSI really are quite different.  With SCSI
you get the ability to do lots of parallel requests via tagged
queueing, and ATA just can't touch that.  With ATA you tend to get
large caches and aggressive read-ahead, so sequential performance is
always good.  In my opinion those qualities can have a detrimental
impact on reliability, but again, my focus has always been on
reliability first.

What is interesting is measuring how many single-sector transfers can
be done per second and how much CPU that consumes.  I used to be able
to get about 11,000 io/s on an aac card on a 5.2-CURRENT system from
last winter.  Now I can only get about 7,000.  I'm not sure where the
problem is yet, unfortunately.  I'm using KSE pthreads to generate a
lot of parallel requests with as little overhead as possible, so maybe
something there has changed, or maybe something in the I/O path above
the driver has changed, or maybe something in interrupt handling or
scheduling has changed.
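For reference, the kind of load generator I mean can be as simple as
the sketch below.  This is a minimal illustration, not the actual test
program; the device path, thread count, and run time are placeholders,
and it assumes a raw disk device that you can safely read from.

/*
 * Single-sector random-read load generator (sketch).
 *
 *   cc -o sectorload sectorload.c -pthread
 *   ./sectorload /dev/ad0 16 10     (device, threads, seconds: placeholders)
 */
#include <sys/types.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define SECTOR  512
#define SPAN    (1024ULL * 1024 * 1024)         /* seek across the first 1 GB */

static volatile int stop;
static int fd;

struct worker {
        pthread_t       tid;
        unsigned long   ios;
};

static void *
io_loop(void *arg)
{
        struct worker *w = arg;
        char buf[SECTOR];
        off_t off;

        while (!stop) {
                /* pick a random sector-aligned offset and read one sector */
                off = (off_t)(arc4random() % (SPAN / SECTOR)) * SECTOR;
                if (pread(fd, buf, SECTOR, off) != SECTOR)
                        break;
                w->ios++;
        }
        return (NULL);
}

int
main(int argc, char **argv)
{
        struct worker *w;
        unsigned long total = 0;
        int i, nthreads, seconds;

        if (argc != 4 || (nthreads = atoi(argv[2])) < 1 ||
            (seconds = atoi(argv[3])) < 1) {
                fprintf(stderr, "usage: %s device nthreads seconds\n", argv[0]);
                return (1);
        }
        if ((fd = open(argv[1], O_RDONLY)) < 0) {
                perror(argv[1]);
                return (1);
        }
        w = calloc(nthreads, sizeof(*w));
        for (i = 0; i < nthreads; i++)
                pthread_create(&w[i].tid, NULL, io_loop, &w[i]);

        sleep(seconds);
        stop = 1;

        for (i = 0; i < nthreads; i++) {
                pthread_join(w[i].tid, NULL);
                total += w[i].ios;
        }
        printf("%lu single-sector reads in %d s = %lu io/s\n",
            total, seconds, total / seconds);
        close(fd);
        return (0);
}

Run something like that against the raw device rather than a file on a
mounted file system, and watch systat -vmstat alongside it to see where
the CPU time is going.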
It would be interesting to figure this out, since this definitely shows
a problem.

Scott