From owner-freebsd-stable@FreeBSD.ORG Fri Feb 25 11:05:45 2005
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 5FE7A16A4CE
	for ; Fri, 25 Feb 2005 11:05:45 +0000 (GMT)
Received: from obh.snafu.de (obh.snafu.de [213.73.92.34])
	by mx1.FreeBSD.org (Postfix) with ESMTP id ACDF243D2F
	for ; Fri, 25 Feb 2005 11:05:44 +0000 (GMT)
	(envelope-from ob@gruft.de)
Received: from ob by obh.snafu.de with local (Exim 4.44 (FreeBSD))
	id 1D4dHz-000K9k-RX
	for freebsd-stable@freebsd.org; Fri, 25 Feb 2005 12:05:43 +0100
Date: Fri, 25 Feb 2005 12:05:43 +0100
From: Oliver Brandmueller
To: freebsd-stable@freebsd.org
Message-ID: <20050225110543.GA70464@e-Gitt.NET>
Mail-Followup-To: freebsd-stable@freebsd.org
References: <20050223204202.31797.qmail@web11604.mail.yahoo.com>
	<421DFC73.7060602@cs.tu-berlin.de>
	<013301c51a8b$ed2ad230$0300000a@Uzi>
	<7f6570e5997b376fb8f75d812d386264@elhombre.us>
	<6.2.1.2.0.20050224153015.02be6df0@64.7.153.2>
	<20050224233122.GD41951@e-Gitt.NET>
	<20050225083339.GA5014@aoi.wolfpond.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-15
Content-Disposition: inline
In-Reply-To: <20050225083339.GA5014@aoi.wolfpond.org>
User-Agent: Mutt/1.5.8i
Sender: Oliver Brandmueller
Subject: Re: SATA RAID Support
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Production branch of FreeBSD source code
X-List-Received-Date: Fri, 25 Feb 2005 11:05:45 -0000

Hi.

On Fri, Feb 25, 2005 at 09:33:39AM +0100, Francois Tigeot wrote:
> On Fri, Feb 25, 2005 at 12:31:22AM +0100, Oliver Brandmueller wrote:
> > We had problems here with 3ware + 72GB Raptors (10 krpm), so we moved to
>
> What sort of problems ?
> I was planning to use some sort of 3Ware/Raptor combination with amd64
> -STABLE machines in the near future, and I am very interested by your
> experience...

Under heavy load (I/O load on the disks constantly over 200 tps, averaging
about 250 tps, with peaks over 600 tps) a random drive disconnects from the
RAID 10. After removing the drive from the config and rescanning the bus,
the drive does not show up anymore. The only way to get the drive back is
to unplug it (or switch the computer off, so that power is removed). After
that, rebuilding the RAID with the drive is no problem.

-> It's not reproducible. The error occurs under high load, sometimes
   three times a week, sometimes not at all for 3 months.
-> It happens only with the Raptors.
-> It's always a random drive; no single drive disconnects more often
   than the others.
-> It happens with both the 8506 and 9500 types of SATA 3ware Escalade.
-> It does not depend on the firmware of the controller; we tried
   different versions.
-> With the same drives, same OS, same motherboard and same drive bays,
   but an ICP controller, we never saw the error.
-> FBSD 5.1-CURRENT up to 5-STABLE as of mid-January.

What we did not yet try:

- other OS
- other drives (in fact, the Raptors are the only SATA drives with
  10 krpm available, or at least they were when we bought the machines).
  Slower drives are not an option here.

We did not see this during testing, but the testing phase was very short
(only 2 weeks). During the tests we ran dd's, bonnie++ and various other
things, but evidently none of the usual tools put enough constant load on
the disks.

The machines are spamfilters. As long as we have more machines working
(meaning lower workload for each machine) or the load goes down for other
reasons, the errors don't occur anymore (we almost never see a failed
drive on a weekend, but on weekdays between 10 and 12 local time we see
it more often).
So I guess that most people won't see this error in their setups,
especially if they need the disk performance only during peaks.

My experience with the ICP Vortex controllers has been very good so far.
They are fast and the management software is very comfortable. The only
thing I'm missing is the simplicity of tw_cli (the management tool for the
3wares), which let you query the status of the RAID from a simple script.
The ICP software ("srcd") is more flexible, but only gives you the option
to execute a program on an event or send an SNMP trap. Both are fine, but
a little more complicated to integrate into nagios, for example.

- Oliver

-- 
| Oliver Brandmueller | Offenbacher Str. 1  | Germany       D-14197 Berlin |
| Fon +49-172-3130856 | Fax +49-172-3145027 | WWW:   http://the.addict.de/ |
|             I am the Internet. As truly as I help God.                   |
| Commercial use of any of the addresses contained here is not permitted!  |