From: "Steven Hartland"
To: "Vladislav Prodan"
Cc: current@freebsd.org, fs@freebsd.org
Subject: Re: AHCI timeout when using ZFS + AIO + NCQ
Date: Sun, 27 Jan 2013 18:44:37 -0000

----- Original Message -----
From: "Vladislav Prodan"
To: "Steven Hartland"
Sent: Sunday, January 27, 2013 3:13 PM
Subject: Re[2]: Re[2]: AHCI timeout when using ZFS + AIO + NCQ

>> ----- Original Message -----
>> From: "Vladislav Prodan"
>>
>> >> Is it always the same disk? If so, replace it. SMART helps identify
>> >> issues but doesn't tell you 100% there's no problem.
>> >
>> > Now it has fallen off a different HDD - ada0.
>> > I'm 99% sure that MHDD will not find problems in HDD - ada0 and ada2.
>> > I still have three servers with similar chipsets that have similar
>> > problems with blade ahci times out.
>>
>> I notice your disks are connecting at SATA 3.x, which rings bells. We had
>> a very similar issue on a new Supermicro machine here and after much
>> testing we proved to our satisfaction that the problem was the HW.
>
> I have a motherboard ASUS M5A97 PRO
> http://www.asus.com/Motherboard/M5A97_PRO/#specifications
> Has replacement SATA data cables.
> Putting hard RAID controller does not guarantee data recovery at his death.

Not sure what that has to do with cable / track lengths via things like
a backplane?

Do you or do you not have a hotswap backplane?

>> Essentially the combination of SATA 3 speeds and the midplane / backplane
>> degraded the connection between the MB and HDD enough to cause
>> the disks to randomly drop when under load.
>>
>> If we connected the disks directly to the MB with SATA cables the
>> problem went away. In the end we had the midplanes changed from an
>> AHCI pass-through to an active LSI controller.
>>
>> So if you have any sort of midplane / backplane connecting your disks,
>> try connecting them direct to the MB / controller via known SATA 3.x
>> compliant cables and see if that stops the drops.
>>
>> Another test you can do is to force the disks to connect at SATA 2.x.
>> This also fixed it in our case, but wasn't something we wanted to
>> put into production, hence the controller swap.
>>
>> To force SATA 2 speeds you can use the following in /boot/loader.conf
>> where 'X' is disk identifier e.g. for ada0 X = 0:-
>> hint.ahcich.X.sata_rev=2

This is still worth trying as it could still indicate a problem with
your controller, cables or disks.

    Regards
    Steve