From owner-freebsd-hardware@FreeBSD.ORG  Mon Jul 22 14:36:13 2013
Return-Path: <owner-freebsd-hardware@FreeBSD.ORG>
Delivered-To: freebsd-hardware@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id 3A274D3C
 for <freebsd-hardware@freebsd.org>; Mon, 22 Jul 2013 14:36:13 +0000 (UTC)
 (envelope-from Bob.Bawn@nirvanix.com)
Received: from db9outboundpool.messaging.microsoft.com
 (mail-db9lp0250.outbound.messaging.microsoft.com [213.199.154.250])
 (using TLSv1 with cipher AES128-SHA (128/128 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id A24F82219
 for <freebsd-hardware@freebsd.org>; Mon, 22 Jul 2013 14:36:12 +0000 (UTC)
Received: from mail128-db9-R.bigfish.com (10.174.16.244) by
 DB9EHSOBE033.bigfish.com (10.174.14.96) with Microsoft SMTP Server id
 14.1.225.22; Mon, 22 Jul 2013 14:36:04 +0000
Received: from mail128-db9 (localhost [127.0.0.1])	by
 mail128-db9-R.bigfish.com (Postfix) with ESMTP id D229240252	for
 <freebsd-hardware@freebsd.org>; Mon, 22 Jul 2013 14:36:04 +0000 (UTC)
Received: from mail128-db9 (localhost.localdomain [127.0.0.1]) by mail128-db9
 (MessageSwitch) id 1374503762848179_9687;
 Mon, 22 Jul 2013 14:36:02 +0000 (UTC)
Received: from DB9EHSMHS019.bigfish.com (unknown [10.174.16.240])	by
 mail128-db9.bigfish.com (Postfix) with ESMTP id CADE32201FF	for
 <freebsd-hardware@freebsd.org>; Mon, 22 Jul 2013 14:36:02 +0000 (UTC)
Received: from CORPEX001.nirvanix.com (208.84.97.55) by
 DB9EHSMHS019.bigfish.com (10.174.14.29) with Microsoft SMTP Server (TLS) id
 14.16.227.3; Mon, 22 Jul 2013 14:36:00 +0000
Received: from CORPEX001.nirvanix.com ([::1]) by CORPEX001.nirvanix.com
 ([::1]) with mapi id 14.01.0355.002; Mon, 22 Jul 2013 07:35:58 -0700
From: Bob Bawn <Bob.Bawn@nirvanix.com>
To: "freebsd-hardware@freebsd.org" <freebsd-hardware@freebsd.org>
Subject: Reset Problem with SATA Port Multiplier
Thread-Topic: Reset Problem with SATA Port Multiplier
Thread-Index: Ac6G6MSbMPl46FAoQOey01iNxNo2cA==
Date: Mon, 22 Jul 2013 14:35:57 +0000
Message-ID: <94969AC586B81A4BBD2484F9862736A80CDAE28E@CORPEX001.nirvanix.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-originating-ip: [208.14.191.60]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-OriginatorOrg: nirvanix.com
X-BeenThere: freebsd-hardware@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: General discussion of FreeBSD hardware <freebsd-hardware.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hardware>, 
 <mailto:freebsd-hardware-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hardware>
List-Post: <mailto:freebsd-hardware@freebsd.org>
List-Help: <mailto:freebsd-hardware-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hardware>, 
 <mailto:freebsd-hardware-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 22 Jul 2013 14:36:13 -0000

Hello,

I'm testing high-density SATA storage with FreeBSD 9.1-STABLE. The
hardware is:

Drives: 45 * Seagate Altos ST3000NC002
Port Multipliers: 9 * SiI3826
SATA Controller: 3 * Marvell 88SX7042


After a few hours of a database-like workload over ZFS (NCQ enabled, disk
write caches disabled), a disk becomes unresponsive (we think because of a
drive firmware problem):

Jun 14 21:39:54 adlax12st002 root: sysbench tests are now underway
Jun 15 12:12:07 adlax12st002 kernel: mvsch1: SNTF 15
Jun 15 12:12:37 adlax12st002 kernel: mvsch1: Timeout on slot 12
Jun 15 12:12:37 adlax12st002 kernel: mvsch1: iec 00000000 sstat 00000123 serr 00400000 edma_s 00000024 dma_c 10000708 dma_s 00000008 rs 08c81408 status 40
Jun 15 12:12:37 adlax12st002 kernel: mvsch1:  ... waiting for slots 08c80408
Jun 15 12:12:37 adlax12st002 kernel: mvsch1: Timeout on slot 3
Jun 15 12:12:37 adlax12st002 kernel: mvsch1: iec 00000000 sstat 00000123 serr 00400000 edma_s 00000024 dma_c 10000708 dma_s 00000008 rs 08c81408 status 40
Jun 15 12:12:37 adlax12st002 kernel: mvsch1:  ... waiting for slots 08c80400
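For anyone reading the timeout messages above, here is a small Python sketch that decodes their fields. It assumes, as the numbers suggest but the post does not state, that "rs" is a bitmask of running command slots and that "waiting for slots" is that mask with the timed-out slots cleared; it also assumes sstat and serr are the standard SATA SStatus and SError registers, decoded with the bit layout from the AHCI specification. Under those assumptions, clearing bit 12 and then bit 3 from rs reproduces both "waiting for slots" values exactly, SStatus 0x123 reads as a device present and communicating at Gen2 (3.0 Gbps) in the active power state, and SError 0x00400000 has only the H (handshake error) diagnostic bit set.

```python
# Hedged decoder for the mvsch1 timeout lines quoted above. Assumptions (not
# from the post): rs / "waiting for slots" are 32-bit masks of command slots,
# and sstat / serr are standard SATA SStatus / SError register values.

def slots(mask: int) -> list[int]:
    """Slot numbers whose bits are set in a 32-bit mask."""
    return [bit for bit in range(32) if mask & (1 << bit)]

def decode_sstatus(sstat: int) -> dict[str, int]:
    """Split SStatus into DET (bits 3:0), SPD (bits 7:4) and IPM (bits 11:8)."""
    return {"DET": sstat & 0xF, "SPD": (sstat >> 4) & 0xF, "IPM": (sstat >> 8) & 0xF}

# SError DIAG bits 16-26, named as in the AHCI specification.
SERR_DIAG = {
    16: "N: PhyRdy change", 17: "I: Phy internal error", 18: "W: Comm wake",
    19: "B: 10b-to-8b decode error", 20: "D: disparity error", 21: "C: CRC error",
    22: "H: handshake error", 23: "S: link sequence error",
    24: "T: transport state transition error", 25: "F: unknown FIS type",
    26: "X: exchanged",
}

def decode_serror_diag(serr: int) -> list[str]:
    """Names of the SError DIAG bits set in serr."""
    return [name for bit, name in SERR_DIAG.items() if serr & (1 << bit)]

rs = 0x08C81408                    # "rs" value printed in both timeout reports
after12 = rs & ~(1 << 12)          # "Timeout on slot 12": clear bit 12
after3 = after12 & ~(1 << 3)       # "Timeout on slot 3": clear bit 3
print("running:", slots(rs))
print(f"waiting: {after12:08x}", slots(after12))
print(f"waiting: {after3:08x}", slots(after3))
print("sstat:", decode_sstatus(0x00000123))
print("serr: ", decode_serror_diag(0x00400000))
```

The rs value is identical in both reports, which fits the picture of no outstanding command completing between the two timeouts.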

After a few timeout/reset cycles, the afflicted device is removed:

Jun 15 12:13:41 adlax12st002 kernel: (aprobe1:mvsch1:0:1:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Jun 15 12:13:41 adlax12st002 kernel: (aprobe1:mvsch1:0:1:0): CAM status: Command timeout
Jun 15 12:13:41 adlax12st002 kernel: (aprobe1:mvsch1:0:1:0): Error 5, Retry was blocked
Jun 15 12:13:41 adlax12st002 kernel: (ada6:mvsch1:0:1:0): lost device
Jun 15 12:13:41 adlax12st002 kernel: (pass7:mvsch1:0:1:0): lost device
Jun 15 12:13:41 adlax12st002 kernel: (pass7:mvsch1:0:1:0): removing device entry
Jun 15 12:13:41 adlax12st002 kernel: mvsch1: MVS reset: device ready after 500ms


All of that seems like reasonable OS behavior when a drive is
unresponsive. In fact, Linux (CentOS with ZFS on Linux) behaves much the
same up to this point.

The problem is that the other four drives behind the port multiplier
then start timing out and are removed one at a time, in target order
(targets 2, 3, 4, then 0), over the next few minutes:

# grep "lost device" adlax12st002-messages.log
Jun 15 12:13:41 adlax12st002 kernel: (ada6:mvsch1:0:1:0): lost device
Jun 15 12:13:41 adlax12st002 kernel: (pass7:mvsch1:0:1:0): lost device
Jun 15 12:16:16 adlax12st002 kernel: (ada7:mvsch1:0:2:0): lost device
Jun 15 12:16:16 adlax12st002 kernel: (pass8:mvsch1:0:2:0): lost device
Jun 15 12:18:50 adlax12st002 kernel: (ada8:mvsch1:0:3:0): lost device
Jun 15 12:18:50 adlax12st002 kernel: (pass9:mvsch1:0:3:0): lost device
Jun 15 12:22:23 adlax12st002 kernel: (ada9:mvsch1:0:4:0): lost device
Jun 15 12:22:23 adlax12st002 kernel: (pass10:mvsch1:0:4:0): lost device
Jun 15 12:26:57 adlax12st002 kernel: (ada5:mvsch1:0:0:0): lost device
Jun 15 12:26:57 adlax12st002 kernel: (pass6:mvsch1:0:0:0): lost device
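The removal order and spacing can be pulled out of these lines with a short Python sketch. It reads the CAM path in each line as (peripheral:SIM:bus:target:LUN) and assumes all entries fall on the same day, so it parses only the time of day. On the lines quoted above it yields targets 1, 2, 3, 4, 0, with roughly two and a half to four and a half minutes between losses.

```python
import re
from datetime import datetime

# The ada "lost device" lines quoted above; the pass lines repeat the same
# timestamps and targets, so they are left out.
LOG = """\
Jun 15 12:13:41 adlax12st002 kernel: (ada6:mvsch1:0:1:0): lost device
Jun 15 12:16:16 adlax12st002 kernel: (ada7:mvsch1:0:2:0): lost device
Jun 15 12:18:50 adlax12st002 kernel: (ada8:mvsch1:0:3:0): lost device
Jun 15 12:22:23 adlax12st002 kernel: (ada9:mvsch1:0:4:0): lost device
Jun 15 12:26:57 adlax12st002 kernel: (ada5:mvsch1:0:0:0): lost device
"""

# syslog prefix, then a CAM path of the form (periph:sim:bus:target:lun).
LOST = re.compile(
    r"^\w{3} +\d+ (\d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"\((ada\d+):(\w+):(\d+):(\d+):(\d+)\): lost device$"
)

def timeline(text):
    """Return (time, device, target, seconds since previous loss) tuples.

    Only the time of day is parsed, so all lines must come from one day.
    """
    rows, prev = [], None
    for line in text.splitlines():
        m = LOST.match(line)
        if m is None:
            continue
        when = datetime.strptime(m.group(1), "%H:%M:%S")
        gap = None if prev is None else int((when - prev).total_seconds())
        rows.append((m.group(1), m.group(2), int(m.group(5)), gap))
        prev = when
    return rows

for when, dev, target, gap in timeline(LOG):
    print(when, dev, "target", target, "" if gap is None else f"+{gap}s")
```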

It looks like the timeout/reset/recovery sequence for the initial frozen
disk has somehow broken connectivity to all the drives behind the port
multiplier. This part does not happen on Linux. Sometimes the entire
machine locks up after the "lost device" sequence. In all cases, a
full power cycle is required to make the devices available again. When I
soft-reset the box over IPMI, the boot process gets stuck in a loop of
"mvsch2: MVS reset" and "mvsch2: Wait status d0" messages.

Full /var/log/messages are at:

http://pastebin.com/xCJyfvSN

Unfortunately, I failed to grab the dmesg output and the box has since
been re-imaged. Here is a dmesg from a machine which I believe to be
identical to the test box:

http://pastebin.com/NYjezuMX

/var/log/messages for the CentOS/Linux case is at:

http://pastebin.com/qrWm0HJ0

Maybe this is a topic for a different post, but has anybody successfully
used high-density, port-multiplied SATA platforms with FreeBSD? I've
heard lots of anecdotes about hardware and/or driver flakiness (like the
above), undocumented hardware, etc. (Actually, I've heard similar
complaints from Linux folks.) SAS machines seem to handle this workload
without any problems. We have also tried 9.1-RELEASE, and the behavior
was worse.

We're actually more interested in archive-type workloads than in this
database workload, and we have not observed the problem with an archive
workload. However, we're worried that any single-drive failure could
make all five drives behind the same port multiplier unavailable,
regardless of workload.
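To put the five-drive concern in numbers, here is a hypothetical Python sketch; the post does not describe the pool layout, so both layouts below are illustrations, not the configuration under test. With 45 drives behind 9 port multipliers, each multiplier carries 5 drives. If every vdev were built from the drives behind one multiplier, losing that multiplier would take out a whole vdev; if each vdev instead takes one drive from every multiplier, a multiplier-wide loss costs each vdev only one drive, which a raidz1 or raidz2 vdev can tolerate.

```python
# Hypothetical failure-domain check; neither layout is the poster's pool.
PMS = 9                          # SiI3826 port multipliers
DRIVES = 45                      # ST3000NC002 drives
DRIVES_PER_PM = DRIVES // PMS    # 5, matching targets 0-4 behind mvsch1

# A drive is identified by (port multiplier index, target behind it).
def layout_per_pm():
    """Each vdev is the five drives behind a single port multiplier."""
    return [[(pm, t) for t in range(DRIVES_PER_PM)] for pm in range(PMS)]

def layout_across_pms():
    """Each vdev takes one drive (same target) from every port multiplier."""
    return [[(pm, t) for pm in range(PMS)] for t in range(DRIVES_PER_PM)]

def worst_case_loss(vdevs):
    """Most drives any one vdev loses when a whole port multiplier drops out."""
    return max(
        sum(1 for pm, _ in vdev if pm == failed)
        for failed in range(PMS)
        for vdev in vdevs
    )

print(worst_case_loss(layout_per_pm()))      # 5
print(worst_case_loss(layout_across_pms()))  # 1
```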

Any guidance would be appreciated.

Thanks!
Bob Bawn