Date:      Sat, 20 Feb 2010 11:37:18 -0800
From:      Jeremy Chadwick <freebsd@jdc.parodius.com>
To:        freebsd-stable@freebsd.org
Subject:   Re: panic - sleeping thread on FreeBSD 8.0-stable / amd64
Message-ID:  <20100220193718.GA33214@icarus.home.lan>
In-Reply-To: <20100220202108.e1dd1b74.torfinn.ingolfsen@broadpark.no>
References:  <20100131144217.ca08e965.torfinn.ingolfsen@broadpark.no> <20100131175639.86ba9aee.torfinn.ingolfsen@broadpark.no> <20100207163631.da7205fc.torfinn.ingolfsen@broadpark.no> <20100213192404.5e15b5eb.torfinn.ingolfsen@broadpark.no> <20100217091625.d0e74570.torfinn.ingolfsen@broadpark.no> <20100220202108.e1dd1b74.torfinn.ingolfsen@broadpark.no>

On Sat, Feb 20, 2010 at 08:21:08PM +0100, Torfinn Ingolfsen wrote:
> Another day, another crash. 
> From /var/log/messages:
> Feb 20 08:52:26 kg-f2 ntpd[58609]: time reset +1.169751 s
> Feb 20 08:54:57 kg-f2 kernel: ata5: port is not ready (timeout 10000ms) tfd = 0000007f
> Feb 20 08:54:57 kg-f2 kernel: ata5: hardware reset timeout
> Feb 20 19:18:51 kg-f2 syslogd: kernel boot file is /boot/kernel/kernel
> 
> The drives are as follows:
> root@kg-f2# atacontrol list;camcontrol devlist
> ATA channel 0:
>     Master:      no device present
>     Slave:       no device present
> ATA channel 2:
>     Master:  ad4 <SAMSUNG HD252HJ/1AC01118> SATA revision 2.x
>     Slave:       no device present
> ATA channel 3:
>     Master:  ad6 <SAMSUNG HD252HJ/1AC01118> SATA revision 2.x
>     Slave:       no device present
> ATA channel 4:
>     Master:  ad8 <SAMSUNG HD103SJ/1AJ100E4> SATA revision 2.x
>     Slave:       no device present
> ATA channel 5:
>     Master: ad10 <SAMSUNG HD103SJ/1AJ100E4> SATA revision 2.x
>     Slave:       no device present
> ATA channel 6:
>     Master: ad12 <SAMSUNG HD103SJ/1AJ100E4> SATA revision 2.x
>     Slave:       no device present
> ATA channel 7:
>     Master: ad14 <SAMSUNG HD103SJ/1AJ100E4> SATA revision 2.x
>     Slave:       no device present
> <SAMSUNG HD103SJ 1AJ100E4>         at scbus0 target 0 lun 0 (pass0,ada0)
> 
> Smartctl is happy, too:
> root@kg-f2# smartctl -H /dev/ad4
> smartctl 5.39 2009-12-09 r2995 [FreeBSD 8.0-STABLE amd64] (local build)
> Copyright (C) 2002-9 by Bruce Allen, http://smartmontools.sourceforge.net
> 
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> 
> root@kg-f2# smartctl -H /dev/ad6
> smartctl 5.39 2009-12-09 r2995 [FreeBSD 8.0-STABLE amd64] (local build)
> Copyright (C) 2002-9 by Bruce Allen, http://smartmontools.sourceforge.net
> 
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> 
> root@kg-f2# smartctl -H /dev/ad8
> smartctl 5.39 2009-12-09 r2995 [FreeBSD 8.0-STABLE amd64] (local build)
> Copyright (C) 2002-9 by Bruce Allen, http://smartmontools.sourceforge.net
> 
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> 
> root@kg-f2# smartctl -H /dev/ad10
> smartctl 5.39 2009-12-09 r2995 [FreeBSD 8.0-STABLE amd64] (local build)
> Copyright (C) 2002-9 by Bruce Allen, http://smartmontools.sourceforge.net
> 
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> 
> root@kg-f2# smartctl -H /dev/ad12
> smartctl 5.39 2009-12-09 r2995 [FreeBSD 8.0-STABLE amd64] (local build)
> Copyright (C) 2002-9 by Bruce Allen, http://smartmontools.sourceforge.net
> 
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> 
> root@kg-f2# smartctl -H /dev/ada0
> smartctl 5.39 2009-12-09 r2995 [FreeBSD 8.0-STABLE amd64] (local build)
> Copyright (C) 2002-9 by Bruce Allen, http://smartmontools.sourceforge.net
> 
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> 
> Maybe the hardware is just plain broken.

Can you re-run smartctl -a instead of -H?  Some of the SMART attributes
may help determine what's going on, or there may be related errors in
the SMART error log.
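
Something along these lines would collect the full output for every disk
in one go. This is only a sketch: the device names are the ones from the
smartctl runs quoted above, while the helper name smart_cmds, the RUN
switch and the smart-<disk>.txt file names are made up for illustration.
By default it only prints the commands; set RUN=1 (as root) to execute.

```shell
#!/bin/sh
# Device names taken from the smartctl -H runs quoted above.
DISKS="ad4 ad6 ad8 ad10 ad12 ada0"

# Hypothetical helper: print the full-report smartctl command per disk.
smart_cmds() {
    for d in "$@"; do
        echo "smartctl -a /dev/$d"
    done
}

if [ "${RUN:-0}" = "1" ]; then
    # Real run: save attributes plus error log for each disk.
    # Output file names are an assumption, pick whatever suits you.
    for d in $DISKS; do
        smartctl -a "/dev/$d" > "smart-$d.txt" 2>&1
    done
else
    # Dry run: just show what would be executed.
    smart_cmds $DISKS
fi
```

The -a output includes both the vendor attribute table and the SMART
error log, which -H alone does not show.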

Otherwise I'd say what's happening is a SATA controller lock-up of some
sort, since it happens on any of your channels.  Could be a quirk of
some kind in the SATA->CAM stuff (unless it also happens when using pure
ata(4)).
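
To check whether the timeouts really are spread across channels, a rough
tally per ataN channel from the kernel log may help. This is a sketch
only: the function name count_ata_errors is invented here, and the pattern
is based purely on the two message formats quoted above, so other error
strings would be missed.

```shell
#!/bin/sh
# Hypothetical helper: read syslog lines on stdin and print
# "<channel> <count>" for ataN timeout / not-ready kernel messages.
count_ata_errors() {
    grep -E 'kernel: ata[0-9]+: .*(timeout|not ready)' |
        sed -E 's/.*kernel: (ata[0-9]+):.*/\1/' |
        sort | uniq -c |
        awk '{print $2, $1}'
}

# Usage (path as quoted in the original report):
#   count_ata_errors < /var/log/messages
```

If every count lands on a single channel, that points more at one port,
cable or disk than at the controller as a whole.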

What controller are these disks hooked to again?

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |