Date: Tue, 23 Oct 2012 12:45:12 -0700
From: nate keegan <nate.keegan@gmail.com>
To: freebsd-hardware@freebsd.org
Subject: Re: ahcich Timeouts SATA SSD
Message-ID: <CABVjXffrpqhr-JwJb%2BKjzDGwjGEKFgXZVkrXvsKdRdkeHeL6xw@mail.gmail.com>
In-Reply-To: <CABVjXfePQvNs8NZnUgO5ZCBT0dAcn1SfkihtCE1wQjwou-Oj7A@mail.gmail.com>
References: <20121015203229.40280@gmx.com> <CABVjXfePQvNs8NZnUgO5ZCBT0dAcn1SfkihtCE1wQjwou-Oj7A@mail.gmail.com>
Since replacing the SSD disks with good old plain SATA in external enclosures I have not experienced a single issue.

I can only surmise that something is wonky with the Crucial M4 firmware with FreeBSD 8.2/9.0 under certain circumstances.

Thanks to everyone who contributed on this as the information about debugging kernels, etc was very helpful from a procedural point of view.

On Tue, Oct 16, 2012 at 12:48 PM, nate keegan <nate.keegan@gmail.com> wrote:
> I'm only seeing gstat output of a few percentage points for the OS disks.
>
> I am using ECC memory (both the Kingston and the new Crucial memory)
> and went ahead and swapped out the SSD for SATA disks this morning.
>
> Since both SSD were the same firmware and type/manufacturer I figured
> it was a good time to address this variable.
>
> I also went ahead and put in a serial console server this morning so I
> have proper console access instead of relying on the Supermicro iLO
> utility.
>
> Will keep an eye on the pure SATA setup to see if it barfs or not.
> Will try to gather some ddb(4) information if it does barf again.
>
>
> On Mon, Oct 15, 2012 at 1:32 PM, Dieter BSD <dieterbsd@engineer.com> wrote:
>>> SSD are connected to on-board SATA port on motherboard
>>
>> Presumably to controllers provided by the Intel Tylersburg 5520 chipset.
>>
>>> This system was commissioned in February of 2012 and ran without issue
>>> as a ZFS backup system on our network until about 3 weeks ago.
>>
>>> The system is dual PSU behind a UPS so I don't think that this is an issue.
>>
>> No changes? e.g. no added hardware to increase power load.
>> Overloading the power supply and/or the wiring (with too many splitters)
>> can result in flaky problems like this.
>>
>>> OS will respond to ping requests after the issue and if you have an
>>> active SSH session you will remain connected to the system until you
>>> attempt to do something like 'ls', 'ps', etc.
>>
>>> I am not able to drop into DDB when the issue happens as the system is
>>> locked up completely. Could be a failure on my part to
>>> understand/engage in how to do this, will try if the issue happens
>>> again (should on Wednesday AM unless setting camcontrol apm to off for
>>> the disks somehow fixes the issue).
>>
>> If the system is alive enough to respond to ping, I'd expect you
>> should be able to get into DDB? Can you get into DDB when the system
>> is working normally?
>>
>>> 2 x Crucial M4 64 Gb SATA SSD for FreeBSD OS (zroot)
>>> 2 x Intel 320 MLC 80 Gb SATA SSD for L2ARC and swap
>>
>>> I ran the Crucial firmware update ISO and it did not see any firmware
>>> updates as necessary on the SSD disks.
>>
>> Does the problem happen with both the Crucial and the Intel SSDs?
>>
>>> If software I agree that it would not make sense that this would
>>> suddenly pop-up after months of operation with no issues.
>>
>> If something causes the software/firmware to take a different
>> path, new issues can appear. E.g. error handling or even timing.
>> Infrequently used code paths might not have been tested sufficiently.
>>
>> Does the controller have firmware? Part of the BIOS I suppose.
>> Is there a BIOS update available? Have you considered connecting the
>> SSDs to a different controller?
>>
>>> the on-board AHCI portion of the BIOS does
>>> not always see the disks after the event without a hard system power
>>> reset.
>>
>> That's at least one bug somewhere, probably the hardware isn't getting reset
>> properly. Does Supermicro know about this bug?
>>
>>> I have 48 Gb of Crucial memory that I will put in this system today to
>>> replace the 24 Gb or so of Kingston memory I have in the system.
>>
>> Which in addition to being different memory, should reduce swap activity.
>>
>> Suggestion: move everything to conventional drives. Keep at least one
>> SSD connected to system, but normally unused. Now you can beat on the
>> SSD in a controlled manner to debug the problem. Does reading trigger
>> the problem? Writing? Try dd with different blocksizes, accessing
>> multiple SSDs at once, etc. I have to wonder if there is a timing problem,
>> or missing interrupt, or...
>>
>>> * Ditch FreeBSD for Solaris so I can keep ZFS lovin for the intended
>>> purpose of this system
>>
>> If it fails with FreeBSD but works with Solaris on the same hardware,
>> then it is almost certainly a problem with the device driver. (Or
>> at least a problem that Solaris has a workaround for.)
>> _______________________________________________
>> freebsd-hardware@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-hardware
>> To unsubscribe, send any mail to "freebsd-hardware-unsubscribe@freebsd.org"
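
For anyone wanting to try the controlled test suggested in the quoted message (reading from a spare SSD with different blocksizes), here is a rough sketch of how that could be scripted. It is not something anyone in this thread ran. The function names time_reads and sweep, the list of block sizes, and the 64 MiB read length are all made up for illustration, and the device path is whatever you pass on the command line. It only reads, so the "Writing?" half of the suggestion is not covered.

```python
import sys
import time


def time_reads(path, block_size, total_bytes):
    """Read up to total_bytes from path in block_size chunks.

    Returns (bytes_read, elapsed_seconds). Hypothetical helper, read-only.
    """
    bytes_read = 0
    start = time.perf_counter()
    # buffering=0 gives a raw file object, so each read() is one read(2) call
    with open(path, "rb", buffering=0) as dev:
        while bytes_read < total_bytes:
            chunk = dev.read(min(block_size, total_bytes - bytes_read))
            if not chunk:  # end of file or device reached
                break
            bytes_read += len(chunk)
    return bytes_read, time.perf_counter() - start


def sweep(path, block_sizes, total_bytes):
    """Run time_reads once per block size; returns (block_size, bytes, seconds) tuples."""
    results = []
    for bs in block_sizes:
        n, secs = time_reads(path, bs, total_bytes)
        results.append((bs, n, secs))
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    target = sys.argv[1]  # device or file to read, supplied by the user
    # Block sizes and read length are arbitrary choices for illustration
    for bs, n, secs in sweep(target, [512, 4096, 65536, 1048576], 64 * 1048576):
        rate = n / secs / 1e6 if secs > 0 else float("inf")
        print(f"bs={bs:>8} read={n} bytes in {secs:.3f}s ({rate:.1f} MB/s)")
```

Watching the console for ahcich timeout messages while this runs against the otherwise idle SSD would be the point of the exercise. Running several copies at once against different disks would approximate the "accessing multiple SSDs at once" case. If pointed at a regular file instead of a disk device, repeat reads may be served from cache and the timings will mean little.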