Date:      Mon, 15 Oct 2012 10:21:06 -0700
From:      nate keegan <nate.keegan@gmail.com>
To:        freebsd-hardware@freebsd.org
Subject:   Re: ahcich Timeouts SATA SSD
Message-ID:  <CABVjXffVSFvtgNfMX3BsHqDe-ntqC1rwPw2-HpPGgaoFG6js2w@mail.gmail.com>
In-Reply-To: <CABVjXfceHC3s0u6pMBWcPb1XqTrvVW52FN9G3A1Oh1F-UUVqNQ@mail.gmail.com>
References:  <CABVjXfeV9VvF6sJC3Tb78z=jP%2B2sF%2BOJ2q0euCZkNqN_Yjs9ag@mail.gmail.com> <20121015095858.GC33428@server.rulingia.com> <CABVjXfceHC3s0u6pMBWcPb1XqTrvVW52FN9G3A1Oh1F-UUVqNQ@mail.gmail.com>

I took a look at the DDB man page, but I am not able to do this when
the issue happens because the system is completely unresponsive (meaning
no keyboard input on the IPMI console, in existing SSH sessions, etc.).

No changes have been seen in the ZFS load on the system. The nature of
this system (backup) is such that the heaviest load comes in the first
week or so after going online: we use rsync to copy files down from our
Windows servers, and during that first week the system has to 'seed'
the initial copies. That is much heavier on I/O than anything after
that first week, when things are relatively constant in terms of I/O.

I have 48 GB of Crucial memory that I will put in this system today to
replace the 24 GB or so of Kingston memory currently in it. If the
issue happens again after the memory change, I plan on replacing both
SSDs (Crucial M4) with two non-SSD SATA disks, on the idea that the
Crucial firmware on the disks (002 on both disks) is somehow the
culprit.

If neither change solves the issue, I will move on to 9.1-RC2 (or
9.1-RELEASE if it is out by then) and add the kernel options requested.

The amount of monkeying that I have had to do via /boot/loader.conf
and the camcontrol script I run tells me that the SSD, the firmware on
the SSD, etc. is somehow causing the issue, as we have plenty of other
FreeBSD 8.x and 9.x systems that use non-SSD SATA drives without this
issue popping up in their daily workloads.

My /boot/loader.conf looks like this currently:

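# Load the ahci(4) driver (AHCI mode is set in the BIOS as well)
ahci_load="YES"

# Should be auto-negotiated in FreeBSD 9.x; see ahci(4).
# Limit both channels to SATA revision 1 (1.5 Gbps)
hint.ahcich.0.sata_rev=1
hint.ahcich.1.sata_rev=1

# Interface power management level 1: the device may initiate
# PM state changes, the host stays passive (see ahci(4))
hint.ahcich.0.pm_level=1
hint.ahcich.1.pm_level=1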
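
To confirm after a reboot that the loader actually passed these hints to
the kernel, something like the following could be used. This is only a
sketch: the show_ahci_hints function name and the KENV override are my
own illustration, and it assumes kenv(1) lives at /bin/kenv.

```sh
#!/bin/sh
# Sketch: print the ahcich hints as seen in the kernel environment.
# KENV can be overridden from the environment (illustrative assumption).
KENV=${KENV:-/bin/kenv}

show_ahci_hints() {
    for ch in 0 1; do
        for knob in sata_rev pm_level; do
            name="hint.ahcich.$ch.$knob"
            # kenv -q suppresses the warning for an unset variable
            value=$($KENV -q "$name" 2>/dev/null) || value="(not set)"
            echo "$name=$value"
        done
    done
}

show_ahci_hints
```

A hint that shows up as "(not set)" would suggest loader.conf was not
read the way it was expected to be.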


And /usr/local/etc/rc.d/camcontrol:

#!/bin/sh
CAMCONTROL=/sbin/camcontrol

# Disable NCQ by limiting each disk to a single tag
$CAMCONTROL tags ada0 -N 1 > /dev/null
$CAMCONTROL tags ada1 -N 1 > /dev/null

# Disable APM: ATA SET FEATURES (0xEF), subcommand 0x85
$CAMCONTROL cmd ada0 -a "EF 85 00 00 00 00 00 00 00 00 00 00" > /dev/null
$CAMCONTROL cmd ada1 -a "EF 85 00 00 00 00 00 00 00 00 00 00" > /dev/null
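
The same two tweaks could also be written as a loop with a dry-run
switch, which makes it easier to see exactly what gets sent to each
disk before touching a production box. This is a sketch, not the script
I actually run: the DISKS and DRY_RUN variables and the run and
apply_workarounds helpers are names made up for illustration.

```sh
#!/bin/sh
# Sketch: apply the NCQ and APM workarounds to each disk in turn.
# CAMCONTROL, DISKS and DRY_RUN are overridable (illustrative assumptions).
CAMCONTROL=${CAMCONTROL:-/sbin/camcontrol}
DISKS=${DISKS:-"ada0 ada1"}

# Run a command quietly, or just print it when DRY_RUN is set.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$@"
    else
        "$@" > /dev/null
    fi
}

apply_workarounds() {
    for disk in $DISKS; do
        # One tag per disk effectively disables NCQ
        run "$CAMCONTROL" tags "$disk" -N 1 \
            || echo "tags failed on $disk" >&2
        # ATA SET FEATURES (0xEF), subcommand 0x85: disable APM
        run "$CAMCONTROL" cmd "$disk" -a "EF 85 00 00 00 00 00 00 00 00 00 00" \
            || echo "APM disable failed on $disk" >&2
    done
}

apply_workarounds
```

Running it with DRY_RUN=1 in the environment only prints the camcontrol
command lines instead of executing them.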

Without both of these shims in place I get maybe 1.5 to 2 hours before
the system goes kablooie, and that is without the system doing any real
I/O work: it is just running FreeBSD during the business day plus a few
scripts from cron that check for data and shuffle it around.


