From: Sandy Rutherford
Date: Thu, 23 Jun 2005 01:14:56 -0700
To: "Ted Mittelstaedt"
Cc: freebsd-questions@freebsd.org
Subject: RE: Yet another RAID Question (YARQ)

>>>>> On Wed, 22 Jun 2005 23:37:20 -0700,
>>>>> "Ted Mittelstaedt" said:

> Seagate wrote a paper on this titled:
> "Seagate Technology Paper 338.1 Estimating Drive Reliability in
> Desktop Computers and Consumer Electronic Systems"
> that explains how they define MTBF. Basically, they define MTBF as
> what percentage of disks will fail in the FIRST year.

Is this in the public domain? I wouldn't mind having a look at it.

> What they are saying is if you purchase 160 Cheetahs and run them at
> 100% duty cycle for 1 year then there is a 100% chance that 1 out of the
> 160 will fail.

> Thus, if you only purchase 80 disks and run them at 100% duty cycle for 1
> year, then you only have a 50% chance that 1 will fail. And so on.

> Ain't statistics grand? You can make them say anything! For an encore
> Seagate went on to prove that their CEO would live 3 centuries
> by statistical grouping. :-)

Now don't knock statistics. The problem does not lie with statistics,
but with its misuse by people who do not understand what they are
doing. No, I am not a statistician; however, I am a mathematician.
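To make that misuse concrete, here is a minimal Python sketch (not from
the original thread). It assumes the quoted 1-in-160 figure is an
annualized per-drive failure rate and that drives fail independently:
with 160 such drives the expected number of failures in a year is 1, but
the probability of seeing at least one failure is only about 63%, not
100%, and with 80 drives it is about 39%, not 50%.

    # Assumed annualized failure rate (AFR) per drive: the quoted 1-in-160 figure.
    afr = 1.0 / 160

    def p_at_least_one_failure(n_drives, p=afr):
        # Probability that at least one of n independent drives fails within a year.
        return 1.0 - (1.0 - p) ** n_drives

    for n in (160, 80):
        print("%3d drives: expected failures = %.2f, "
              "P(at least one failure) = %.1f%%"
              % (n, n * afr, 100 * p_at_least_one_failure(n)))

    # Output:
    # 160 drives: expected failures = 1.00, P(at least one failure) = 63.3%
    #  80 drives: expected failures = 0.50, P(at least one failure) = 39.4%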
> So, in getting back to the gist of what I was saying, the issue is
> as you mentioned standard deviation. I think we all understand that
> in a disk drive assembly line it's all robotic, and that there
> is an extremely high chance that disk drives that are within a few
> serial numbers of each other are going to have virtually identical
> characteristics. In fact I would say, using the Seagate MTBF definition,
> that 1 in every 160 drives manufactured in a particular run is going
> to have a significant enough deviation to fail at a significantly
> different period of time, given identical workload.

I am not so sure. If we were talking about can openers, I would agree.
However, a disk drive is basically a mechanical object which performs
huge numbers of mechanical actions over the course of a number of years.
Even extremely minute variations in the physical characteristics of the
materials could lead to substantive variations over time. However, the
operative word here is "could". Real data is required. I tried to google
for a relevant study, but came up empty. This surprised me, as it seems
like the sort of thing for which masses of data should have been
collected.

> In short you have better than a 99% chance that if you install 2 brand
> new Cheetahs that are from the same production run, they will have
> virtually identical characteristics. And failure due to wear is going to be
> very similar - there's only so many times the disk head can seek
> before its bearings are worn out - and you're proposing to give them
> the exact same usage.

> I think the reason you're seeing alternation is that the disks are
> so damn fast that they complete their reads well before their internal
> buffers have finished emptying themselves over the SCSI bus to the
> array card. In other words, you wasted your money on your fast
> disks,

Not much money. After having been burned by failures of lower-end
drives, I bought high-end stuff on eBay. It made me nervous at the
beginning, because who knows how many flights of stairs the drive
bounced down before it was popped into the mail, and, for that matter,
who knows how many flights of stairs it bounced down while it was in the
mail. However, so far it has worked out quite well.

> if you had used slower disks you would see identical read performance
> but you would see less alternating flickering
> and more simultaneous and continuous activity.

> If you got a faster array card you wouldn't see the alternating
> flickering.

> Or, it could be the PCI bus not being fast enough for the array card.

It's almost certainly the PCI bus. The DAC1100, although not
state-of-the-art, is still reasonably fast. It has 3 U2W channels, and
it could certainly max out my PCI bus. (Three Ultra2 Wide channels at
80 MB/s each is 240 MB/s of potential throughput, well beyond the
~133 MB/s of a standard 32-bit/33 MHz PCI bus.)

> Ah well, a computer just wouldn't be a computer without blinking
> lights on it!!! ;-)

Gotta agree there ;-) Once upon a time I had the DIP switch settings
required to boot a PDP-11 from the front panel memorized, because I had
to do it so often. Our data runs extended far beyond the typical uptime,
so we did checkpoints by dumping the relevant bits of core to a
teletype, and I used to have to re-type the data from the teletype when
we brought the machine back up after a crash. Even on an old PDP-11,
this took a while. We needed 3+ months of uptime, and we did well if we
could keep that thing up for longer than a week. I became
well-acquainted with those DIP switches.

Sandy