Date:      Tue, 16 Oct 2012 08:54:51 +1100
From:      Peter Jeremy <peter@rulingia.com>
To:        nate keegan <nate.keegan@gmail.com>
Cc:        freebsd-hardware@freebsd.org
Subject:   Re: ahcich Timeouts SATA SSD
Message-ID:  <20121015215451.GE33428@server.rulingia.com>
In-Reply-To: <CABVjXffVSFvtgNfMX3BsHqDe-ntqC1rwPw2-HpPGgaoFG6js2w@mail.gmail.com>
References:  <CABVjXfeV9VvF6sJC3Tb78z=jP%2B2sF%2BOJ2q0euCZkNqN_Yjs9ag@mail.gmail.com> <20121015095858.GC33428@server.rulingia.com> <CABVjXfceHC3s0u6pMBWcPb1XqTrvVW52FN9G3A1Oh1F-UUVqNQ@mail.gmail.com> <CABVjXffVSFvtgNfMX3BsHqDe-ntqC1rwPw2-HpPGgaoFG6js2w@mail.gmail.com>


--lMM8JwqTlfDpEaS6
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On 2012-Oct-15 07:54:21 -0700, nate keegan <nate.keegan@gmail.com> wrote:
>The system is dual PSU behind a UPS so I don't think that this is an issue.

OK

>I do have a complete set of replacement memory (Crucial vs Kingston
>that is in the system now) and will swap out the memory in case one of
>the DIMMs is flaky but not poor enough for the BIOS to notice on a
>consistent basis.

I presume this is registered ECC RAM - which makes it more robust.
Non-ECC RAM can develop pattern-sensitive faults - which are virtually
impossible to test for.  And BIOS RAM 'tests' generally can't be
relied on to do much more than verify that something is responding.
Swapping RAM is the best way to rule out RAM issues.

>I am not able to drop into DDB when the issue happens as the system is
>locked up completely.

That's surprising.  I haven't seen a failure mode where the kernel
will respond to pings but not the console.

>Will get the output of gstat -a and post it up here.

"gstat -a" gives a dynamic picture of disk activity.  I was hoping
you could watch it for a minute or so (on a tall window) whilst
the system was running and see if any disks look odd - significantly
higher or lower than expected I/O volume or long ms/r or ms/w.
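
If watching the live display is impractical, a rough filter over gstat
output can flag slow devices.  This is only a sketch: the function name
flag_slow and the 100 ms default are arbitrary illustrations, and it
assumes gstat's default column order (L(q), ops/s, r/s, kBps, ms/r, w/s,
kBps, ms/w, %busy, Name), which changes if extra flags such as -d are
added.

```sh
# Print devices whose read or write latency exceeds a limit (in ms).
# Reads gstat-style lines on stdin; header lines are skipped because
# their first field is not a plain integer queue length.
flag_slow() {
    awk -v limit="${1:-100}" '
        $1 ~ /^[0-9]+$/ && NF >= 10 &&
        ($5 + 0 > limit + 0 || $8 + 0 > limit + 0) {
            # $5 = ms/r, $8 = ms/w, $10 = device name
            printf "%s ms/r=%s ms/w=%s\n", $10, $5, $8
        }'
}
```

On a live system something like "gstat -a -b -I 5s | flag_slow 100"
should work, assuming your gstat supports batch mode (-b).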

On 2012-Oct-15 10:21:06 -0700, nate keegan <nate.keegan@gmail.com> wrote:
>I took a look at the DDB man page and I am not able to do this when
>the issue happens as the system is completely blown up (meaning no
>keyboard input on IPMI console, existing SSH sessions, etc.

Note that I'm referring to ddb(4), not ddb(8).  The former is
entered via a "magic" key sequence on the console and should work
even if the system won't react to normal commands.  To enter ddb,
use Ctrl-Alt-ESC on a graphical console or the character sequence
CR ~ Ctrl-B on a serial console (in the latter case, the sysctl
debug.kdb.alt_break_to_debugger also needs to be set to 1).
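
For reference, a minimal sketch of setting that sysctl, assuming the
running kernel has ddb support compiled in:

```sh
# Run as root.  Takes effect immediately but is lost on reboot.
sysctl debug.kdb.alt_break_to_debugger=1

# To make it persistent, add this line to /etc/sysctl.conf:
#   debug.kdb.alt_break_to_debugger=1
```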

If you do get into ddb, a useful set of initial commands is:
show all procs
show alllocks
show allpcpu
show lockedvnods
call doadump

Note that the first 4 commands will generate lots of output - ideally
you would have a serial console with logging.  The last command
generates a crashdump and needs 'dumpdev="AUTO"' in /etc/rc.conf (run
"service dumpon start" after editing rc.conf to enable it without
rebooting).
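
Put together, the crashdump setup looks roughly like this (a sketch; it
assumes a swap device listed in /etc/fstab that is large enough to hold
the dump):

```sh
# /etc/rc.conf
dumpdev="AUTO"    # let rc pick a swap device from /etc/fstab for kernel dumps
```

After "service dumpon start", "call doadump" from ddb should write the
dump to that device, and savecore(8) should recover it on the next boot
(into /var/crash unless dumpdir has been changed).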

>The amount of monkeying that I have had to do via /boot/loader.conf
>and the camcontrol script I run is telling me that the SSD, the
>firmware on the SSD, etc is somehow causing the issue as we have
>plenty of other FreeBSD 8.x and 9.x systems that use non-SSD SATA
>drives without this issue popping up in their daily workloads.

Are you able to move the SSD(s) to a different type of SATA port?  One
(not especially likely) possibility is it's an interaction between the
SSD and the SATA controller.

-- 
Peter Jeremy

--lMM8JwqTlfDpEaS6
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (FreeBSD)

iEYEARECAAYFAlB8hisACgkQ/opHv/APuIeFVQCfbV8Oj+V1KFHTq0mutiGBBWLl
kYcAnR7gP4OFXOzvUl8Y/ZIajZN1Wy9N
=qzoM
-----END PGP SIGNATURE-----

--lMM8JwqTlfDpEaS6--


