From owner-freebsd-scsi@freebsd.org Sun Dec 11 09:44:29 2016
From: Alexander Motin
To: Alan Somers, FreeBSD-scsi
Subject: Re: Fwd: frequent timeouts with mvs(4) SATA controller, GELI, and ZFS
Date: Sun, 11 Dec 2016 11:44:25 +0200
Message-ID: <106f66f2-90a8-884d-40d1-b202163c9eb4@FreeBSD.org>

This controller uses a proprietary Marvell API. Like most of Marvell's products, that API is not publicly documented. This family of chips is also known for a long errata history, which is not publicly documented either. On top of that, this line of chips was discontinued years ago, when Marvell switched to a new line of AHCI-compatible 6Gbps chips.

"iec 02000000" means a device error reported by the EDMA engine. It should be handled properly and should not cause timeouts, but something seems to have gone wrong. Either the chip failed to generate the interrupt, or the driver mishandled it.

As a workaround, you may try disabling NCQ for those drives using `camcontrol negotiate` and see what happens. That may let you see the actual error reported by the drive, or at least allow error recovery to work.
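A minimal sh(1) sketch of the NCQ workaround suggested above. It assumes that camcontrol(8)'s `-T disable` option is the relevant setting, since it turns off tagged queueing (NCQ on SATA disks). It also assumes that mvs(4) honors that setting, which is not verified here. `disable_ncq` is a hypothetical helper name. Only ada3 comes from the log; the other pool members would be named differently.

```sh
#!/bin/sh
# Hypothetical helper: disable tagged queueing (NCQ) on each ada(4) disk
# given as an argument. Assumes camcontrol(8) negotiate -T disable applies
# to disks behind mvs(4); must be run as root.
disable_ncq() {
    for dev in "$@"; do
        # Stop at the first device that camcontrol refuses to change.
        camcontrol negotiate "$dev" -T disable || return 1
    done
}
```

For example, run `disable_ncq ada3` as root for each disk on the controller. Then repeat the write load and check dmesg for an error reported by the drive instead of the slot timeouts. This is a runtime change, so presumably it has to be reapplied after a reboot.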
On 11.12.2016 02:03, Alan Somers wrote:
> I have an 11.0-RELEASE machine with a Via Nano CPU and a Marvell SATA
> 88SX7042 controller. I have a GELI-encrypted triple-mirror zpool with
> disks on that controller. But the number doesn't matter; I have the
> same problems even when only one disk is connected. Whenever I write
> to this pool, after a few GB of writes I get a timeout on one of the
> mvs(4) slots, followed shortly by timeouts on every disk on that
> controller. From this point until I reboot, no command sent to any
> disk on that controller will ever complete. CAM tries to reprobe the
> disks, fails, and their ada nodes disappear. This is repeatable.
> Does anybody have any ideas what's going on?
> Anybody know any dirt about this SATA controller?
>
> pciconf -lv
> ...
> atapci0@pci0:0:15:0: class=0x01018f card=0xaa241106 chip=0x90011106 rev=0x00
> hdr=0x00
>     vendor     = 'VIA Technologies, Inc.'
>     device     = 'VX900 Serial ATA Controller'
>     class      = mass storage
>     subclass   = ATA
> mvs0@pci0:1:0:0: class=0x010000 card=0x11ab11ab chip=0x704211ab rev=0x02
> hdr=0x00
>     vendor     = 'Marvell Technology Group Ltd.'
>     device     = '88SX7042 PCI-e 4-port SATA-II'
>     class      = mass storage
>     subclass   = SCSI
> ...
>
> dmesg
> ...
> mvsch3: Timeout on slot 7
> mvsch3: iec 02000000 sstat 00000123 serr 00000000 edma_s 000000e1
> dma_c 20000708 dma_s 00000008 rs 000000f2 status 40
> mvsch3: ... waiting for slots 00000072
> mvsch3: Timeout on slot 6
> mvsch3: iec 02000000 sstat 00000123 serr 00000000 edma_s 000000e1
> dma_c 20000708 dma_s 00000008 rs 000000f2 status 40
> mvsch3: ... waiting for slots 00000032
> mvsch3: Timeout on slot 5
> mvsch3: iec 02000000 sstat 00000123 serr 00000000 edma_s 000000e1
> dma_c 20000708 dma_s 00000008 rs 000000f2 status 40
> mvsch3: ... waiting for slots 00000012
> mvsch3: Timeout on slot 4
> mvsch3: iec 02000000 sstat 00000123 serr 00000000 edma_s 000000e1
> dma_c 20000708 dma_s 00000008 rs 000000f2 status 40
> mvsch3: ... waiting for slots 00000002
> mvsch3: Timeout on slot 1
> mvsch3: iec 02000000 sstat 00000123 serr 00000000 edma_s 000000e1
> dma_c 20000708 dma_s 00000008 rs 000000f2 status 40
> (ada3:mvsch3:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 95 e4 11 40 4d 00 00 01 00 00
> (ada3:mvsch3:0:0:0): CAM status: Command timeout
> (ada3:mvsch3:0:0:0): Retrying command
> (ada3:mvsch3:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 f2 5f 00 40 21 00 00 01 00 00
> (ada3:mvsch3:0:0:0): CAM status: Command timeout
> (ada3:mvsch3:0:0:0): Retrying command
> (ada3:mvsch3:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 f2 61 00 40 21 00 00 01 00 00
> (ada3:mvsch3:0:0:0): CAM status: Command timeout
> (ada3:mvsch3:0:0:0): Retrying command
> (ada3:mvsch3:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 f2 63 00 40 21 00 00 01 00 00
> (ada3:mvsch3:0:0:0): CAM status: Command timeout
> (ada3:mvsch3:0:0:0): Retrying command
> (ada3:mvsch3:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 f2 67 00 40 21 00 00 01 00 00
> (ada3:mvsch3:0:0:0): CAM status: Command timeout
> (ada3:mvsch3:0:0:0): Retrying command
> ...
>
> -Alan

-- 
Alexander Motin