From: nate keegan
To: freebsd-hardware@freebsd.org
Date: Tue, 23 Oct 2012 12:45:12 -0700
Subject: Re: ahcich Timeouts SATA SSD
References: <20121015203229.40280@gmx.com>

Since replacing the SSD disks with good old plain SATA in external
enclosures I have not experienced a single issue.
I can only surmise that something is wonky with the Crucial M4 firmware
with FreeBSD 8.2/9.0 under certain circumstances.

Thanks to everyone who contributed on this, as the information about
debugging kernels, etc., was very helpful from a procedural point of
view.

On Tue, Oct 16, 2012 at 12:48 PM, nate keegan wrote:
> I'm only seeing gstat output of a few percentage points for the OS disks.
>
> I am using ECC memory (both the Kingston and the new Crucial memory)
> and went ahead and swapped out the SSD for SATA disks this morning.
>
> Since both SSD were the same firmware and type/manufacturer I figured
> it was a good time to address this variable.
>
> I also went ahead and put in a serial console server this morning so I
> have proper console access instead of relying on the Supermicro iLO
> utility.
>
> Will keep an eye on the pure SATA setup to see if it barfs or not.
> Will try to gather some ddb(4) information if it does barf again.
>
>
> On Mon, Oct 15, 2012 at 1:32 PM, Dieter BSD wrote:
>>> SSD are connected to on-board SATA port on motherboard
>>
>> Presumably to controllers provided by the Intel Tylersburg 5520 chipset.
>>
>>> This system was commissioned in February of 2012 and ran without issue
>>> as a ZFS backup system on our network until about 3 weeks ago.
>>
>>> The system is dual PSU behind a UPS so I don't think that this is an issue.
>>
>> No changes? e.g. no added hardware to increase power load.
>> Overloading the power supply and/or the wiring (with too many splitters)
>> can result in flaky problems like this.
>>
>>> OS will respond to ping requests after the issue and if you have an
>>> active SSH session you will remain connected to the system until you
>>> attempt to do something like 'ls', 'ps', etc.
>>
>>> I am not able to drop into DDB when the issue happens as the system is
>>> locked up completely.
>>> Could be a failure on my part to
>>> understand/engage in how to do this, will try if the issue happens
>>> again (should on Wednesday AM unless setting camcontrol apm to off for
>>> the disks somehow fixes the issue).
>>
>> If the system is alive enough to respond to ping, I'd expect you
>> should be able to get into DDB? Can you get into DDB when the system
>> is working normally?
>>
>>> 2 x Crucial M4 64 Gb SATA SSD for FreeBSD OS (zroot)
>>> 2 x Intel 320 MLC 80 Gb SATA SSD for L2ARC and swap
>>
>>> I ran the Crucial firmware update ISO and it did not see any firmware
>>> updates as necessary on the SSD disks.
>>
>> Does the problem happen with both the Crucial and the Intel SSDs?
>>
>>> If software I agree that it would not make sense that this would
>>> suddenly pop-up after months of operation with no issues.
>>
>> If something causes the software/firmware to take a different
>> path, new issues can appear. E.g. error handling or even timing.
>> Infrequently used code paths might not have been tested sufficiently.
>>
>> Does the controller have firmware? Part of the BIOS I suppose.
>> Is there a BIOS update available? Have you considered connecting the
>> SSDs to a different controller?
>>
>>> the on-board AHCI portion of the BIOS does
>>> not always see the disks after the event without a hard system power
>>> reset.
>>
>> That's at least one bug somewhere, probably the hardware isn't getting reset
>> properly. Does Supermicro know about this bug?
>>
>>> I have 48 Gb of Crucial memory that I will put in this system today to
>>> replace the 24 Gb or so of Kingston memory I have in the system.
>>
>> Which in addition to being different memory, should reduce swap activity.
>>
>> Suggestion: move everything to conventional drives. Keep at least one
>> SSD connected to system, but normally unused. Now you can beat on the
>> SSD in a controlled manner to debug the problem. Does reading trigger
>> the problem? Writing?
>> Try dd with different blocksizes, accessing
>> multiple SSDs at once, etc. I have to wonder if there is a timing problem,
>> or missing interrupt, or...
>>
>>> * Ditch FreeBSD for Solaris so I can keep ZFS lovin for the intended
>>> purpose of this system
>>
>> If it fails with FreeBSD but works with Solaris on the same hardware,
>> then it is almost certainly a problem with the device driver. (Or
>> at least a problem that Solaris has a workaround for.)
>> _______________________________________________
>> freebsd-hardware@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-hardware
>> To unsubscribe, send any mail to "freebsd-hardware-unsubscribe@freebsd.org"
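
For anyone picking this thread up later: the suggestion quoted above to
read from an otherwise idle SSD at different block sizes could be
sketched roughly as below. This is only an illustration in Python, not
dd itself and not anything that was run in this thread. The function
names timed_read and sweep are made up for the example, and any device
path you pass in (for instance /dev/ada0) is an assumption about your
own system. It only reads, never writes. On a raw disk device, keep
total_bytes a multiple of every block size so that each request stays
sector-aligned.

```python
import time


def timed_read(path, block_size, total_bytes):
    """Read up to total_bytes from path in block_size requests.

    Returns (bytes_read, elapsed_seconds). Hypothetical helper for
    illustration only.
    """
    bytes_read = 0
    start = time.monotonic()
    # buffering=0 opens the file unbuffered, so each read() call asks
    # the OS for at most the number of bytes we pass in.
    with open(path, "rb", buffering=0) as f:
        while bytes_read < total_bytes:
            chunk = f.read(min(block_size, total_bytes - bytes_read))
            if not chunk:  # end of file or device
                break
            bytes_read += len(chunk)
    return bytes_read, time.monotonic() - start


def sweep(path, block_sizes, total_bytes):
    """Run timed_read once per block size; return {block_size: (bytes, secs)}."""
    return {bs: timed_read(path, bs, total_bytes) for bs in block_sizes}
```

A call such as sweep(path, [512, 4096, 65536, 1048576], 256 * 1024 * 1024)
against the test SSD (typically as root) gives one timing per block
size. A request size that stalls or is far slower than its neighbours
would be the place to start looking, ideally with a serial console
attached in case the machine locks up again. Reads from a regular file
may be served from cache, so timings there say little about the disk.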