From: "Pegasus Mc Cleaft"
To: "'Alexander Kabaev'", "'C. P. Ghost'"
Cc: 'Alexey Shuvaev', freebsd-current@freebsd.org
Date: Fri, 28 Oct 2011 00:19:26 +0100
Subject: RE: Panics after AHCI timeouts
Message-ID: <005e01cc94fe$dfbe3390$9f3a9ab0$@com>
In-Reply-To: <20111027185957.54ece0ad@kan.dyndns.org>
List-Id: Discussions about the use of FreeBSD-current

>>
>> If it's only one process, the machine (usually) doesn't hang, even
>> when that process is copying big files back and forth for a long
>> period of time (it's a backup process). But interleave that process
>> with another one accessing the same disk, and poof!, almost
>> immediately ahci timeouts occur. Very strange... Maybe a race
>> condition of some sort after all?
>>
>
>No, I cannot say there is any specific correlation to IO load of the machine,
>timeouts I saw happen randomly and seem almost always happen as system uptime
>crosses two weeks boundary. I am suspecting Samsung firmware at this point.

Now that's interesting, as I use a mixture of Samsung, WD, and Seagate
drives, and I do believe the Samsungs tend to do this more.

I see AHCI timeouts from time to time on my machine (10-Current AMD64),
but normally only when I am doing something like a scrub. The machine
has never panicked as a result of this; it normally just FAULTS the
drive in the pool and keeps on going. At that point, doing a
"camcontrol rescan all" does not bring the drive back into existence
(it will normally just hang on that bus for 15-20 seconds and then
carry on without identifying a drive). I have to pull the drive, let it
spin down, and then reinsert it. Once it's reinserted, the drive comes
back on the bus and I can online it again.

The weird thing is this: for me, it only ever seems to happen when I am
writing to the pool/disk. Pure reads don't seem to bother it.

I don't really know at this point if the SATA ports have gone wonky on
the motherboard, or if the processor on the HD has crashed. I tend to
believe it's the drive, because camcontrol stops on that port almost as
if it knows there is a link there but can't talk to it.

Peg