Date:      Mon, 12 Aug 2013 14:07:02 +0100
From:      Karl Pielorz <kpielorz_lst@tdx.co.uk>
To:        freebsd-geom@FreeBSD.org
Subject:   Onboard RAID panic / reboot after CAM timeout?
Message-ID:  <4C7053FCE24480BF96DF525A@Mail-PC.tdx.co.uk>


Hi,

I've got an amd64 '9.1-STABLE' box running with the system's 'onboard' RAID, 
i.e.

ahci0: <Intel ICH8 AHCI SATA controller> port 0xf070-0xf077,0xf060-0xf063,0xf050-0xf057,0xf040-0xf043,0xf000-0xf01f mem 0xdfa22000-0xdfa227ff irq 19 at device 31.2 on pci0
ahci0: AHCI v1.30 with 6 3Gbps ports, Port Multiplier not supported


This is set up, and has been running fine:

   Name   Status  Components
raid/r0  OPTIMAL  ada0 (ACTIVE (ACTIVE))
                  ada1 (ACTIVE (ACTIVE))


The other day the machine picked up a CAM timeout, and rebooted:

"
ahcich1: Timeout on slot 31 port 0
ahcich1: is 00000000 cs 00000000 ss 80000000 rs 80000000 tfd 40 serr 00000000 cmd 0004df17
(ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 08 c0 4a e9 40 03 00 00 00 00 00
(ada1:ahcich1:0:0:0): CAM status: Command timeout
(ada1:ahcich1:0:0:0): Retrying command
"
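
For reference, a minimal sketch of how the hex fields in the log above could be decoded. The function names are hypothetical, and the ACB byte order used here (command, features, lba_low, lba_mid, lba_high, device, lba_low_exp, lba_mid_exp, lba_high_exp, features_exp, sector_count, sector_count_exp) is an assumption about how CAM prints the command block, not something stated in this message. For NCQ commands such as WRITE_FPDMA_QUEUED the transfer length is carried in the features field.

```python
# Sketch only: decode values from the ahcich timeout log quoted above.
# The ACB byte layout is an ASSUMPTION (see lead-in), not confirmed here.

def active_slots(mask_hex):
    """Return the slot numbers whose bits are set in a 32-bit register dump."""
    mask = int(mask_hex, 16)
    return [bit for bit in range(32) if mask & (1 << bit)]

def decode_acb(acb_hex):
    """Decode a 12-byte ACB string under the assumed byte order."""
    b = [int(x, 16) for x in acb_hex.split()]
    if len(b) != 12:
        raise ValueError("expected 12 ACB bytes")
    # 48-bit LBA: low three bytes, then the three 'exp' bytes.
    lba = (b[2] | (b[3] << 8) | (b[4] << 16)
           | (b[6] << 24) | (b[7] << 32) | (b[8] << 40))
    # NCQ: sector count lives in features / features_exp.
    sectors = b[1] | (b[9] << 8)
    return {"command": b[0], "sectors": sectors, "lba": lba}

if __name__ == "__main__":
    print(active_slots("80000000"))
    print(decode_acb("61 08 c0 4a e9 40 03 00 00 00 00 00"))
```

Under those assumptions, the 'ss' and 'rs' values of 80000000 correspond to slot 31 (matching "Timeout on slot 31"), and the timed-out command would be an 8-sector write.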

By the time we'd gotten onto the box it had restarted, and had started 
rebuilding the RAID array. This completed OK - and it has been OK since.

Presumably graid should have either recovered from this, or at least just 
failed ada1 and continued?

Are there any known issues with CAM timeouts on graid'ed drives not being 
survivable?

Cheers,

-Karl




