Date:      Sat, 06 Mar 2010 14:19:44 +0100
From:      Torfinn Ingolfsen <torfing@broadpark.no>
To:        freebsd-stable@freebsd.org
Subject:   Re: panic - sleeping thread on FreeBSD 8.0-stable / amd64
Message-ID:  <20100306141944.95ec8cb6.torfinn.ingolfsen@broadpark.no>
In-Reply-To: <20100131144217.ca08e965.torfinn.ingolfsen@broadpark.no>
References:  <20100131144217.ca08e965.torfinn.ingolfsen@broadpark.no>

Ok, a new development in this story.
Note that as of yet, I haven't changed SATA cables or done anything else
with the hardware. However, I did upgrade to latest FreeBSD
8.0-stable / amd64 yesterday.
The machine is still up (it hasn't crashed yet), and today I found this in
/var/log/messages:
Mar  6 06:25:34 kg-f2 kernel: ata5: port is not ready (timeout 10000ms) tfd = 0000007f
Mar  6 06:25:34 kg-f2 kernel: ata5: hardware reset timeout
Mar  6 06:25:45 kg-f2 kernel: ata6: port is not ready (timeout 10000ms) tfd = 0000007f
Mar  6 06:25:45 kg-f2 kernel: ata6: hardware reset timeout
Mar  6 06:25:45 kg-f2 root: ZFS: vdev failure, zpool=storage type=vdev.no_replicas
Mar  6 06:25:56 kg-f2 kernel: ata5: port is not ready (timeout 10000ms) tfd = 00000080
Mar  6 06:25:56 kg-f2 kernel: ata5: hardware reset timeout
Mar  6 06:26:06 kg-f2 kernel: ata6: port is not ready (timeout 10000ms) tfd = 0000007f
Mar  6 06:26:06 kg-f2 kernel: ata6: hardware reset timeout
Mar  6 06:26:08 kg-f2 root: ZFS: zpool I/O failure, zpool=storage error=28
Mar  6 06:26:08 kg-f2 last message repeated 2 times
Mar  6 06:26:08 kg-f2 root: ZFS: vdev I/O failure, zpool=storage path= offset= size= error=
Mar  6 06:26:16 kg-f2 kernel: ata5: port is not ready (timeout 10000ms) tfd = 0000007f
Mar  6 06:26:16 kg-f2 kernel: ata5: hardware reset timeout
Mar  6 06:26:27 kg-f2 kernel: ata6: port is not ready (timeout 10000ms) tfd = 0000007f
Mar  6 06:26:27 kg-f2 kernel: ata6: hardware reset timeout
Mar  6 06:26:37 kg-f2 kernel: ata5: port is not ready (timeout 10000ms) tfd = 00000080
Mar  6 06:26:37 kg-f2 kernel: ata5: hardware reset timeout
Mar  6 06:26:47 kg-f2 kernel: ata6: port is not ready (timeout 10000ms) tfd = 0000007f
Mar  6 06:26:47 kg-f2 kernel: ata6: hardware reset timeout
Mar  6 06:26:58 kg-f2 kernel: ata5: port is not ready (timeout 10000ms) tfd = 0000007f
Mar  6 06:26:58 kg-f2 kernel: ata5: hardware reset timeout
Mar  6 06:27:08 kg-f2 kernel: ata6: port is not ready (timeout 10000ms) tfd = 00000080
Mar  6 06:27:08 kg-f2 kernel: ata6: hardware reset timeout
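
To see at a glance which channels are affected, the reset timeouts in a log excerpt like the one above can be tallied per ATA channel. This is only a minimal sketch: the function name count_reset_timeouts and the regular expression are illustrative assumptions based on the message format shown here, not part of any FreeBSD tool.

```python
import re
from collections import Counter

# Assumed message format, taken from the log lines quoted above:
#   "<date> <host> kernel: ataN: hardware reset timeout"
RESET_RE = re.compile(r"kernel: (ata\d+): hardware reset timeout")


def count_reset_timeouts(lines):
    """Return a Counter mapping ATA channel name to its number of reset timeouts."""
    counts = Counter()
    for line in lines:
        match = RESET_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# A few lines copied from the log above, used as sample input.
sample = [
    "Mar  6 06:25:34 kg-f2 kernel: ata5: port is not ready (timeout 10000ms) tfd = 0000007f",
    "Mar  6 06:25:34 kg-f2 kernel: ata5: hardware reset timeout",
    "Mar  6 06:25:45 kg-f2 kernel: ata6: hardware reset timeout",
    "Mar  6 06:25:45 kg-f2 root: ZFS: vdev failure, zpool=storage type=vdev.no_replicas",
]

for channel, n in sorted(count_reset_timeouts(sample).items()):
    print(channel, n)
```

To run it against the real log, pass an open file instead of the sample list, e.g. count_reset_timeouts(open("/var/log/messages")).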

Before the upgrade, messages such as these would (AFAICT) result in a panic and reboot.
Uptime:
root@kg-f2# uptime
 2:11PM  up 19:38, 3 users, load averages: 0.00, 0.00, 0.00
The boot / root mirror pool is okay:
root@kg-f2# zpool status zroot
  pool: zroot
 state: ONLINE
 scrub: scrub completed after 0h8m with 0 errors on Fri Mar  5 18:45:24 2010
config:

	NAME           STATE     READ WRITE CKSUM
	zroot          ONLINE       0     0     0
	  mirror       ONLINE       0     0     0
	    gpt/disk0  ONLINE       0     0     0
	    gpt/disk1  ONLINE       0     0     0

errors: No known data errors

However, the storage pool is not:
root@kg-f2# zpool status storage
  pool: storage
 state: UNAVAIL
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-HC
 scrub: scrub completed after 0h0m with 0 errors on Fri Mar  5 18:36:17 2010
config:

	NAME        STATE     READ WRITE CKSUM
	storage     UNAVAIL      0     3     0  insufficient replicas
	  raidz1    UNAVAIL      0     0     0  insufficient replicas
	    ad8     ONLINE       0     0     0
	    ad10    REMOVED      0     0     0
	    ad12    REMOVED      0     0     0
	    ad14    ONLINE       0     0     0
	    ada0    ONLINE       0     0     0

errors: 2 data errors, use '-v' for a list
Currently, this pool isn't in use, so I am not concerned about data loss (luckily).
Note that before this upgrade, with all panics and reboots, both zfs pools have always been
clean and trouble-free after a reboot.
atacontrol confirms that ad10 and ad12 are "gone" (i.e. disconnected):
root@kg-f2# atacontrol list
ATA channel 0:
    Master:      no device present
    Slave:       no device present
ATA channel 2:
    Master:  ad4 <SAMSUNG HD252HJ/1AC01118> SATA revision 2.x
    Slave:       no device present
ATA channel 3:
    Master:  ad6 <SAMSUNG HD252HJ/1AC01118> SATA revision 2.x
    Slave:       no device present
ATA channel 4:
    Master:  ad8 <SAMSUNG HD103SJ/1AJ100E4> SATA revision 2.x
    Slave:       no device present
ATA channel 5:
    Master:      no device present
    Slave:       no device present
ATA channel 6:
    Master:      no device present
    Slave:       no device present
ATA channel 7:
    Master: ad14 <SAMSUNG HD103SJ/1AJ100E4> SATA revision 2.x
    Slave:       no device present

What happens if I just reboot the server now? I think that ad10 and ad12 will be detected and connected,
but what will zfs do with the 'storage' pool?

As always, more info (including verbose dmesgs etc.) on the FreeBSD page[1] for this machine.

References:
1) FreeBSD on this machine: http://sites.google.com/site/tingox/ga-ma74gm-s2h_freebsd
-- 
Torfinn



