From: Bob Bawn
To: "freebsd-hardware@freebsd.org"
Subject: Reset Problem with SATA Port Multiplier
Date: Mon, 22 Jul 2013 14:35:57 +0000
Message-ID: <94969AC586B81A4BBD2484F9862736A80CDAE28E@CORPEX001.nirvanix.com>

Hello,

I'm testing high-density SATA storage with FreeBSD 9.1-STABLE. The
hardware is:

Drives:           45 * Seagate Altos ST3000NC002
Port Multipliers:  9 * SiI3826
SATA Controller:   3 * Marvell 88SX7042

After a few hours of a database-like workload over ZFS (NCQ enabled,
disk write caches disabled), a disk becomes unresponsive (we think due
to a drive firmware problem):

Jun 14 21:39:54 adlax12st002 root: sysbench tests are now underway
Jun 15 12:12:07 adlax12st002 kernel: mvsch1: SNTF 15
Jun 15 12:12:37 adlax12st002 kernel: mvsch1: Timeout on slot 12
Jun 15 12:12:37 adlax12st002 kernel: mvsch1: iec 00000000 sstat 00000123 serr 00400000 edma_s 00000024 dma_c 10000708 dma_s 00000008 rs 08c81408 status 40
Jun 15 12:12:37 adlax12st002 kernel: mvsch1: ... waiting for slots 08c80408
Jun 15 12:12:37 adlax12st002 kernel: mvsch1: Timeout on slot 3
Jun 15 12:12:37 adlax12st002 kernel: mvsch1: iec 00000000 sstat 00000123 serr 00400000 edma_s 00000024 dma_c 10000708 dma_s 00000008 rs 08c81408 status 40
Jun 15 12:12:37 adlax12st002 kernel: mvsch1: ... waiting for slots 08c80400

After a few timeout/reset cycles, the afflicted device is removed:

Jun 15 12:13:41 adlax12st002 kernel: (aprobe1:mvsch1:0:1:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Jun 15 12:13:41 adlax12st002 kernel: (aprobe1:mvsch1:0:1:0): CAM status: Command timeout
Jun 15 12:13:41 adlax12st002 kernel: (aprobe1:mvsch1:0:1:0): Error 5, Retry was blocked
Jun 15 12:13:41 adlax12st002 kernel: (ada6:mvsch1:0:1:0): lost device
Jun 15 12:13:41 adlax12st002 kernel: (pass7:mvsch1:0:1:0): lost device
Jun 15 12:13:41 adlax12st002 kernel: (pass7:mvsch1:0:1:0): removing device entry
Jun 15 12:13:41 adlax12st002 kernel: mvsch1: MVS reset: device ready after 500ms

All of that seems like reasonable OS behavior when a drive is
unresponsive. In fact Linux/CentOS/ZoL behaves pretty much the same up
to this point.
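
As a reading aid for the timeout lines above, here is a minimal Python sketch that decodes the hexadecimal slot masks. It assumes that the "rs" and "waiting for slots" values are bitmasks with one bit per command slot, which is an interpretation of the log and not something the post confirms. The helper name slots_in_mask is hypothetical and is not part of the mvs driver.

```python
# Hypothetical helper for reading the log; not part of the mvs(4) driver.
def slots_in_mask(mask):
    """Return the bit positions set in a 32-bit slot bitmask."""
    return [bit for bit in range(32) if mask & (1 << bit)]

# Values copied from the timeout log lines.
running = 0x08C81408        # "rs" field reported with the first timeout
after_slot_12 = 0x08C80408  # "waiting for slots" after "Timeout on slot 12"
after_slot_3 = 0x08C80400   # "waiting for slots" after "Timeout on slot 3"

# Assumption: each set bit is one outstanding command slot.
print(slots_in_mask(running))

# The bit that disappears between consecutive masks should match the
# slot number named in the preceding "Timeout on slot N" line.
print(slots_in_mask(running ^ after_slot_12))
print(slots_in_mask(after_slot_12 ^ after_slot_3))
```

Under that assumption, seven commands were outstanding on the channel when the first timeout fired, and each "Timeout on slot N" line clears exactly bit N from the mask, which is consistent with the slot numbers in the log.
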

The problem is that the other four drives behind the port multiplier
start timing out and get removed, one at a time, in target order, over
the next few minutes:

# grep "lost device" adlax12st002-messages.log
Jun 15 12:13:41 adlax12st002 kernel: (ada6:mvsch1:0:1:0): lost device
Jun 15 12:13:41 adlax12st002 kernel: (pass7:mvsch1:0:1:0): lost device
Jun 15 12:16:16 adlax12st002 kernel: (ada7:mvsch1:0:2:0): lost device
Jun 15 12:16:16 adlax12st002 kernel: (pass8:mvsch1:0:2:0): lost device
Jun 15 12:18:50 adlax12st002 kernel: (ada8:mvsch1:0:3:0): lost device
Jun 15 12:18:50 adlax12st002 kernel: (pass9:mvsch1:0:3:0): lost device
Jun 15 12:22:23 adlax12st002 kernel: (ada9:mvsch1:0:4:0): lost device
Jun 15 12:22:23 adlax12st002 kernel: (pass10:mvsch1:0:4:0): lost device
Jun 15 12:26:57 adlax12st002 kernel: (ada5:mvsch1:0:0:0): lost device
Jun 15 12:26:57 adlax12st002 kernel: (pass6:mvsch1:0:0:0): lost device

It looks like the timeout/reset/recovery sequence for the initial
frozen disk has somehow broken connectivity to all the drives behind
the port multiplier. This part does not happen on Linux.

Sometimes the entire machine is locked up after the "lost device"
sequence. In all cases, a full power cycle is required to make the
devices available again. When I soft reset the box over IPMI, the boot
process gets stuck in a loop with "mvsch2: MVS reset" and
"mvsch2: Wait status d0".

Full /var/log/messages are at:
http://pastebin.com/xCJyfvSN

Unfortunately, I failed to grab the dmesg output and the box has since
been re-imaged. Here is a dmesg from a machine which I believe to be
identical to the test box:
http://pastebin.com/NYjezuMX

/var/log/messages for the CentOS/Linux case is at:
http://pastebin.com/qrWm0HJ0

Maybe this is a topic for a different post, but has anybody
successfully used high-density port-multiplied SATA platforms with
FreeBSD? I've heard lots of anecdotes about hardware and/or driver
flakiness (like the above), undocumented hardware, etc.
(Actually, I've heard similar complaints from Linux folks.) SAS
machines seem to handle this workload without any problems. We have
tried 9.1-RELEASE and the behavior was worse.

We're actually more interested in archive-type workloads than this
database workload and we have not observed the problem with an archive
workload. However, we're worried that general single-drive failures
could turn into unavailability of five drives regardless of workload.

Any guidance would be appreciated.

Thanks!
Bob Bawn