Date: Mon, 15 Oct 2012 07:54:21 -0700
From: nate keegan <nate.keegan@gmail.com>
To: freebsd-hardware@freebsd.org
Subject: Re: ahcich Timeouts SATA SSD
Message-ID: <CABVjXfceHC3s0u6pMBWcPb1XqTrvVW52FN9G3A1Oh1F-UUVqNQ@mail.gmail.com>
In-Reply-To: <20121015095858.GC33428@server.rulingia.com>
References: <CABVjXfeV9VvF6sJC3Tb78z=jP%2B2sF%2BOJ2q0euCZkNqN_Yjs9ag@mail.gmail.com> <20121015095858.GC33428@server.rulingia.com>
The system has dual PSUs behind a UPS, so I don't think power is the issue.

My notes show that we replaced one of the DIMMs in this system a few months ago after it was detected as bad during POST. During the cycle of reboots I have gone through while testing fixes for this issue, I have seen the BIOS detect a bad DIMM only once. I do have a complete set of replacement memory (Crucial, versus the Kingston that is in the system now) and will swap out the memory in case one of the DIMMs is flaky but not bad enough for the BIOS to notice consistently.

I am not able to drop into DDB when the issue happens because the system locks up completely. This could be a failure on my part to understand how to do this; I will try if the issue happens again (it should on Wednesday AM, unless setting camcontrol apm to off for the disks somehow fixes the issue).

I am running a GENERIC kernel and have not set any loader tunables or sysctls other than those related to addressing this issue (SATA power management, AHCI, etc.).

The problem first started around the time we set up pool scrubbing, and at that time it was a single instance that seemed to be tied to the bad DIMM. I have not run pool scrubbing since then.

I will get the output of "gstat -a" and post it here.

I will upgrade to FreeBSD 9.1-RC2 today and compile a kernel with the options you suggested. I have already removed the L2ARC and one of the OS SSDs to simplify things; now I have 1 x SSD for the OS and 1 x SSD for swap, and that is it. I ran the Crucial firmware update ISO and it did not find any firmware updates necessary for the SSDs.

I appreciate the feedback, as part of the difficulty here has been determining whether this is a software/driver problem or a hardware problem. If it were software, I agree that it would not make sense for this to suddenly pop up after months of operation with no issues.

> Are you running a GENERIC kernel? If not, what changes have you made?
> Have you set any loader tunables or sysctls?
> Have you scrubbed the pools?
> If you run "gstat -a", do any devices have anomalous readings?
>
> I can't offer any definite fixes but can suggest a few more things to
> try:
> 1) Try FreeBSD-9.1RC2 and see if the problem persists.
> 2) Try a new kernel with
>      options WITNESS
>      options WITNESS_SKIPSPIN
>    this may make a software bug more obvious (but will somewhat increase
>    kernel overheads)
> 3) If you can afford it, detach the L2ARC, which removes one potential issue.
> 4) If you haven't already, build a kernel with
>      makeoptions DEBUG=-g
>      options KDB
>      options KDB_TRACE
>      options KDB_UNATTENDED
>      options DDB
>    this won't have any impact on normal operation but will simplify debugging.
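Suggestions 2) and 4) could be combined into a single custom kernel configuration. The following is a minimal sketch, not taken from the thread, assuming an amd64 system with the source tree in /usr/src and a hypothetical config name DEBUGKERNEL saved as /usr/src/sys/amd64/conf/DEBUGKERNEL. Some of these options may already be enabled in GENERIC on a given release.

```
# Hypothetical debug kernel config (name DEBUGKERNEL is an assumption):
# GENERIC plus the debugging options suggested above.
include     GENERIC
ident       DEBUGKERNEL

makeoptions DEBUG=-g            # build the kernel with debug symbols
options     KDB                 # kernel debugger framework
options     KDB_TRACE           # print a stack trace on panic
options     KDB_UNATTENDED      # do not wait in the debugger on panic
options     DDB                 # interactive in-kernel debugger
options     WITNESS             # lock-order checking (adds some overhead)
options     WITNESS_SKIPSPIN    # do not run WITNESS on spin mutexes
```

It would then be built and installed from /usr/src with "make buildkernel KERNCONF=DEBUGKERNEL" followed by "make installkernel KERNCONF=DEBUGKERNEL", and the machine rebooted into the new kernel.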
