From: Alan Somers
Date: Sun, 11 Dec 2016 13:09:14 -0700
Subject: Re: Fwd: frequent timeouts with mvs(4) SATA controller, GELI, and ZFS
To: Alexander Motin
Cc: FreeBSD-scsi
In-Reply-To: <106f66f2-90a8-884d-40d1-b202163c9eb4@FreeBSD.org>

I was afraid you'd say something like that. Sadly, disabling NCQ
didn't help. For good measure, I tried disabling interrupt coalescing
too, but that didn't help either. The error message did change
slightly: the iec field is now zero.

mvsch2: Timeout on slot 0
mvsch2: iec 00000000 sstat 00000123 serr 00000000 edma_s 000000c0 dma_c 20000700 dma_s 00000008 rs 00000001 status 50
(ada1:mvsch2:0:0:0): WRITE_DMA. ACB: ca 00 18 72 60 49 00 00 00 00 00 00
(ada1:mvsch2:0:0:0): CAM status: Command timeout
(ada1:mvsch2:0:0:0): Retrying command
mvsch0: Timeout on slot 0

Eventually I get a "Retry was blocked" error like this, but the CAM
status is always "Command timeout".

mvsch0: Timeout on slot 0
mvsch0: iec 00000000 sstat 00000123 serr 00000000 edma_s 00001140 dma_c 00000000 dma_s 00000008 rs 00000001 status 58
(aprobe1:mvsch0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
(aprobe1:mvsch0:0:0:0): CAM status: Command timeout
(aprobe1:mvsch0:0:0:0): Error 5, Retry was blocked

What's your recommendation? Is there any way to make this hardware
work, or do I need to buy a new SATA card? That would be a
disappointment. The 88SX7042 got generally positive reviews.

-Alan

On Sun, Dec 11, 2016 at 2:44 AM, Alexander Motin wrote:
> This controller uses a Marvell proprietary API and, like most of their
> products, is not publicly documented. This family of chips is also
> known for a long errata history, which is also not publicly documented.
> In addition, this line of chips has been discontinued for years, since
> Marvell switched to a new line of AHCI-compatible 6Gbps chips.
>
> "iec 02000000" means a device error reported by the EDMA engine. It
> should be handled properly, without causing timeouts, but it seems
> something went wrong. Either the chip forgot to generate the interrupt,
> or the driver did something wrong with it.
>
> As a workaround you may try to disable NCQ for those drives using
> `camcontrol negotiate` and see what happens. Maybe that will allow you
> to see some real error reported by the drive, or at least allow error
> recovery.
>
> On 11.12.2016 02:03, Alan Somers wrote:
>> I have an 11.0-RELEASE machine with a Via Nano CPU and a Marvell SATA
>> 88SX7042 controller. I have a GELI-encrypted triple-mirror zpool with
>> disks on that controller. The number of disks doesn't matter; I have
>> the same problems even when only one disk is connected. Whenever I
>> write to this pool, after a few GB of writes I get a timeout on one of
>> the mvs(4) slots, followed shortly by timeouts on every disk on that
>> controller. From this point until I reboot, no command sent to any
>> disk on that controller will ever complete. CAM tries to reprobe the
>> disks, fails, and their ada nodes disappear. This is repeatable.
>> Does anybody have any ideas what's going on?
>> Anybody know any dirt about this SATA controller?
>>
>> pciconf -lv
>> ...
>> atapci0@pci0:0:15:0: class=0x01018f card=0xaa241106 chip=0x90011106 rev=0x00
>> hdr=0x00
>>     vendor     = 'VIA Technologies, Inc.'
>>     device     = 'VX900 Serial ATA Controller'
>>     class      = mass storage
>>     subclass   = ATA
>> mvs0@pci0:1:0:0: class=0x010000 card=0x11ab11ab chip=0x704211ab rev=0x02
>> hdr=0x00
>>     vendor     = 'Marvell Technology Group Ltd.'
>>     device     = '88SX7042 PCI-e 4-port SATA-II'
>>     class      = mass storage
>>     subclass   = SCSI
>> ...
>>
>> dmesg
>> ...
>> mvsch3: Timeout on slot 7
>> mvsch3: iec 02000000 sstat 00000123 serr 00000000 edma_s 000000e1
>> dma_c 20000708 dma_s 00000008 rs 000000f2 status 40
>> mvsch3: ... waiting for slots 00000072
>> mvsch3: Timeout on slot 6
>> mvsch3: iec 02000000 sstat 00000123 serr 00000000 edma_s 000000e1
>> dma_c 20000708 dma_s 00000008 rs 000000f2 status 40
>> mvsch3: ... waiting for slots 00000032
>> mvsch3: Timeout on slot 5
>> mvsch3: iec 02000000 sstat 00000123 serr 00000000 edma_s 000000e1
>> dma_c 20000708 dma_s 00000008 rs 000000f2 status 40
>> mvsch3: ... waiting for slots 00000012
>> mvsch3: Timeout on slot 4
>> mvsch3: iec 02000000 sstat 00000123 serr 00000000 edma_s 000000e1
>> dma_c 20000708 dma_s 00000008 rs 000000f2 status 40
>> mvsch3: ... waiting for slots 00000002
>> mvsch3: Timeout on slot 1
>> mvsch3: iec 02000000 sstat 00000123 serr 00000000 edma_s 000000e1
>> dma_c 20000708 dma_s 00000008 rs 000000f2 status 40
>> (ada3:mvsch3:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 95 e4 11 40 4d 00 00 01 00 00
>> (ada3:mvsch3:0:0:0): CAM status: Command timeout
>> (ada3:mvsch3:0:0:0): Retrying command
>> (ada3:mvsch3:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 f2 5f 00 40 21 00 00 01 00 00
>> (ada3:mvsch3:0:0:0): CAM status: Command timeout
>> (ada3:mvsch3:0:0:0): Retrying command
>> (ada3:mvsch3:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 f2 61 00 40 21 00 00 01 00 00
>> (ada3:mvsch3:0:0:0): CAM status: Command timeout
>> (ada3:mvsch3:0:0:0): Retrying command
>> (ada3:mvsch3:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 f2 63 00 40 21 00 00 01 00 00
>> (ada3:mvsch3:0:0:0): CAM status: Command timeout
>> (ada3:mvsch3:0:0:0): Retrying command
>> (ada3:mvsch3:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 f2 67 00 40 21 00 00 01 00 00
>> (ada3:mvsch3:0:0:0): CAM status: Command timeout
>> (ada3:mvsch3:0:0:0): Retrying command
>> ...
>>
>> -Alan
>>
>
> --
> Alexander Motin
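
For anyone comparing register dumps like the ones quoted above, here is a rough Python sketch that pulls the hex fields out of an mvs(4) timeout line and decodes two of them. The helper names (parse_mvs_dump, decode_sstatus, slots_in_mask) are made up for illustration and are not part of any FreeBSD tool. The sstat decoding follows the standard SATA SStatus layout (DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8). Treating "rs" and the "waiting for slots" value as one-bit-per-slot masks is an assumption that happens to be consistent with the log, not something confirmed in this thread.

```python
import re

# Labels exactly as they appear in the timeout lines quoted in this thread.
# The \b anchor keeps "rs" from matching inside "serr" and "dma_s" from
# matching inside "edma_s".
FIELD_RE = re.compile(
    r"\b(iec|sstat|serr|edma_s|dma_c|dma_s|rs|status)\s+([0-9a-fA-F]+)"
)


def parse_mvs_dump(text):
    """Return {label: int} for every 'label hexvalue' pair found in text."""
    return {name: int(value, 16) for name, value in FIELD_RE.findall(text)}


def decode_sstatus(sstat):
    """Split a SATA SStatus value into its DET, SPD and IPM fields."""
    return {
        "det": sstat & 0xF,         # 3 = device present, PHY communication established
        "spd": (sstat >> 4) & 0xF,  # 2 = Gen2 (3.0 Gbps) negotiated
        "ipm": (sstat >> 8) & 0xF,  # 1 = interface in active state
    }


def slots_in_mask(mask):
    """List the bit positions set in mask (assumed here: one bit per slot)."""
    return [bit for bit in range(32) if mask & (1 << bit)]


line = ("mvsch3: iec 02000000 sstat 00000123 serr 00000000 edma_s 000000e1 "
        "dma_c 20000708 dma_s 00000008 rs 000000f2 status 40")
regs = parse_mvs_dump(line)
print(decode_sstatus(regs["sstat"]))
print(slots_in_mask(regs["rs"]))
print(slots_in_mask(0x72))  # value from "... waiting for slots 00000072"
```

On the first dump this gives DET=3, SPD=2, IPM=1, so the link itself is up at 3.0 Gbps and active. The rs value 0xf2 expands to slots 1, 4, 5, 6, 7, and 0x72 expands to slots 1, 4, 5, 6, which lines up with the order in which the slots time out in the log.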