Date:      Sat, 20 Feb 2010 15:35:46 -0800
From:      Jeremy Chadwick <freebsd@jdc.parodius.com>
To:        freebsd-stable@freebsd.org
Subject:   Re: panic - sleeping thread on FreeBSD 8.0-stable / amd64
Message-ID:  <20100220233546.GA36973@icarus.home.lan>
In-Reply-To: <20100220224959.c424dd9e.torfinn.ingolfsen@broadpark.no>
References:  <20100131144217.ca08e965.torfinn.ingolfsen@broadpark.no> <20100131175639.86ba9aee.torfinn.ingolfsen@broadpark.no> <20100207163631.da7205fc.torfinn.ingolfsen@broadpark.no> <20100213192404.5e15b5eb.torfinn.ingolfsen@broadpark.no> <20100217091625.d0e74570.torfinn.ingolfsen@broadpark.no> <20100220202108.e1dd1b74.torfinn.ingolfsen@broadpark.no> <20100220193718.GA33214@icarus.home.lan> <20100220224959.c424dd9e.torfinn.ingolfsen@broadpark.no>

On Sat, Feb 20, 2010 at 10:49:59PM +0100, Torfinn Ingolfsen wrote:
> On Sat, 20 Feb 2010 11:37:18 -0800
> Jeremy Chadwick <freebsd@jdc.parodius.com> wrote:
> 
> > Can you re-run smartctl -a instead of -H?  Some of the SMART attributes
> > may help determine what's going on, or there may be related errors in
> > the SMART error log.
> 
> smartctl -a output attached. Test sequence: ad4 - ad12, ada0.

Most of your disks look to be in decent shape.  Well, that is to say,
all of them should be working fine; I don't see anything that's of
major, or even minor, concern.  Others might focus on Attributes 191 or
195, but neither of those is absurdly high given the number of hours
these disks have been in use (see Attribute 9).
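
To look at just those three attributes without wading through the full
report, something along these lines may help.  It is only a sketch: it
assumes the usual smartmontools attribute table (ID in the first column,
raw value in the tenth), and the function name smart_pick is made up for
illustration.

```shell
# Print ID, attribute name and raw value for SMART attributes 9, 191 and 195.
# Reads "smartctl -A" (or "smartctl -a") output on stdin.
# Assumption: attribute rows have at least 10 whitespace-separated columns.
smart_pick() {
    awk 'NF >= 10 && ($1 == 9 || $1 == 191 || $1 == 195) { print $1, $2, $10 }'
}

# Example (device name taken from this thread):
#   smartctl -A /dev/ad4 | smart_pick
```

Comparing the raw values of 191 and 195 against the power-on hours in
Attribute 9 gives a rough sense of whether they are out of proportion.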

> > Otherwise I'd say what's happening is a SATA controller lock-up of some
> > sort, since it happens on any of your channels.  Could be a quirk of
> > some kind in the SATA->CAM stuff (unless it also happens when using pure
> > ata(4)).
> 
> I am running a quite recent 8.0-stable:
> root@kg-f2# uname -a
> FreeBSD kg-f2.kg4.no 8.0-STABLE FreeBSD 8.0-STABLE #2: Sun Jan 31 18:39:17 CET 2010     root@kg-f2.kg4.no:/usr/obj/usr/src/sys/GENERIC  amd64
> 
> Perhaps I should upgrade.
> 
> > What controller are these disks hooked to again?
> 
> Six  of the disks (ad4, ad6, ad8, ad10, ad12) are connected to the SATA ports on the motherboard:
> root@kg-f2# pciconf -lv | grep ata -A 4
> atapci0@pci0:0:17:0:	class=0x010601 card=0xb0021458 chip=0x43911002 rev=0x00 hdr=0x00
>     vendor     = 'ATI Technologies Inc. / Advanced Micro Devices, Inc.'
>     device     = 'SB700 SATA Controller [AHCI mode]'
>     class      = mass storage
>     subclass   = SATA

Let's backtrack a bit.  I've gone back and read through all of your
previous posts on this matter, and so far all the problems are happening
on ata5 and ata6.  No timeouts or anomalies have appeared on any other
ports -- just those two.  The kernel error messages indicate that
commands submitted to the controller took longer than 10 seconds to get
a response, so the OS does a force-reset of the ports in an attempt to
get things working again.

We can safely rule out the Silicon Image controller (otherwise "ataX"
wouldn't be involved), which leaves the AMD SB700 SATA controller and
the AMD SB700 PATA controller.

What exact disks (e.g. adX) are attached to ata5 and ata6?  You haven't
provided dmesg output in any of your posts, and atacontrol/pciconf
output alone is not sufficient to tell.  (I should really improve
atacontrol to print this information; I'll work on that in a few
minutes.)
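
One way to get at that mapping from the boot messages is sketched below.
It assumes the ata(4) probe lines have the form "adX: ... at ataN-master"
or "... at ataN-slave"; the function name is made up for illustration.

```shell
# Show the adX probe lines for devices attached to ata5 or ata6.
# Reads dmesg-style text on stdin.
# Assumption: probe lines look like "adX: <size> <model> at ataN-master ...".
disks_on_ata5_ata6() {
    grep -E '^ad[0-9]+: .* at ata[56]-(master|slave)'
}

# Example, using the boot-time copy of the kernel messages:
#   disks_on_ata5_ata6 < /var/run/dmesg.boot
```

If the probe lines on this system are formatted differently, the pattern
will need adjusting; the full dmesg output is still the most useful thing
to post.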

Some Linux users have reported AHCI-related issues with the SB600
southbridge, but the core of the problem turned out to be MSI on certain
AMD northbridges (specifically RS480, RS400, and RS200).  By disabling
MSI entirely they were able to achieve stability.  The FreeBSD
equivalent would be to set the following in loader.conf and reboot:

hw.pci.enable_msix="0"
hw.pci.enable_msi="0"

The Linux quirk fix for this:

http://git.kernel.org/?p=linux/kernel/git/stable/stable-queue.git;a=blob_plain;f=queue-2.6.21/pci-quirks-disable-msi-on-rs400-200-and-rs480.patch;hb=05ab505f2909acf3a614d3e6a32271c4c1f8a69d

Your board has an AMD 740G northbridge, but it might be worth trying the
MSI disable trick anyway.  If it doesn't fix the problem then definitely
re-enable MSI.  Isn't hardware fun?  ;-)
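
To check whether the change took effect after the reboot, a rough sketch
follows.  It relies on the assumption that MSI/MSI-X vectors on x86
FreeBSD show up in "vmstat -i" as irq256 and above; the function name is
made up for illustration.

```shell
# Print the interrupt lines that look like MSI/MSI-X vectors.
# Reads "vmstat -i" output on stdin.
# Assumption: MSI vectors are numbered from irq256 upward.
msi_irqs() {
    awk -F'[: ]+' '/^irq[0-9]+:/ { n = substr($1, 4) + 0; if (n >= 256) print }'
}

# Example:
#   sysctl hw.pci.enable_msi hw.pci.enable_msix
#   vmstat -i | msi_irqs
```

With both tunables set to 0, the sysctl values should read 0 and the
filter would be expected to print nothing.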

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |



