Date: Sun, 11 Dec 2016 13:09:14 -0700
From: Alan Somers <asomers@freebsd.org>
To: Alexander Motin <mav@freebsd.org>
Cc: FreeBSD-scsi <freebsd-scsi@freebsd.org>
Subject: Re: Fwd: frequent timeouts with mvs(4) SATA controller, GELI, and ZFS
Message-ID: <CAOtMX2jQisAf1ZnuP3FrOWv2ONUhr3Db0Wq%2B7J%2ByZZeaGUystg@mail.gmail.com>
In-Reply-To: <106f66f2-90a8-884d-40d1-b202163c9eb4@FreeBSD.org>
References: <CAOtMX2jYzMatN5WSZjBL5hi%2B_EMpa4bv9QsVxeHthMkaSR9FNw@mail.gmail.com>
 <CAOtMX2ghs_KwQDJQ4hyqb0mebZw_hvRVBS_48DYm=DvekVP=rw@mail.gmail.com>
 <CAOtMX2g7pjAVhRFcxOVN%2BucMVvMzH%2B5ZnVDo17eTPNAaPC86tA@mail.gmail.com>
 <106f66f2-90a8-884d-40d1-b202163c9eb4@FreeBSD.org>
I was afraid you'd say something like that. Sadly, disabling NCQ didn't
help. For good measure, I tried disabling interrupt coalescing too, but
that didn't help either. The error message did change slightly: the iec
field is now zero.

mvsch2: Timeout on slot 0
mvsch2: iec 00000000 sstat 00000123 serr 00000000 edma_s 000000c0 dma_c 20000700 dma_s 00000008 rs 00000001 status 50
(ada1:mvsch2:0:0:0): WRITE_DMA. ACB: ca 00 18 72 60 49 00 00 00 00 00 00
(ada1:mvsch2:0:0:0): CAM status: Command timeout
(ada1:mvsch2:0:0:0): Retrying command
mvsch0: Timeout on slot 0

Eventually I get a "Retry was blocked" error like this, but the CAM status
is always "Command timeout".

mvsch0: Timeout on slot 0
mvsch0: iec 00000000 sstat 00000123 serr 00000000 edma_s 00001140 dma_c 00000000 dma_s 00000008 rs 00000001 status 58
(aprobe1:mvsch0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
(aprobe1:mvsch0:0:0:0): CAM status: Command timeout
(aprobe1:mvsch0:0:0:0): Error 5, Retry was blocked

What's your recommendation? Is there any way to make this hardware work,
or do I need to buy a new SATA card? That would be a disappointment. The
88SX7042 got generally positive reviews.

-Alan

On Sun, Dec 11, 2016 at 2:44 AM, Alexander Motin <mav@freebsd.org> wrote:
> This controller uses a Marvell proprietary API and, like most of their
> products, is not publicly documented. This family of chips is also known
> for a long errata history, which is also not publicly documented. In
> addition to that, this line of chips has been discontinued for years,
> since Marvell switched to a new line of AHCI-compatible 6Gbps chips.
>
> "iec 02000000" means a device error reported by the EDMA engine. It should
> be properly handled, not cause timeouts, but it seems something went
> wrong. Either the chip forgot to generate the interrupt, or the driver did
> something wrong about it.
>
> As a workaround you may try to disable NCQ for those drives using
> `camcontrol negotiate` and see what happens.
> Maybe that will allow you to see some real error reported by the drive,
> or at least allow error recovery.
>
> On 11.12.2016 02:03, Alan Somers wrote:
>> I have an 11.0-RELEASE machine with a Via Nano CPU and a Marvell SATA
>> 88SX7042 controller. I have a GELI-encrypted triple-mirror zpool with
>> disks on that controller. But the number doesn't matter; I have the
>> same problems even when only one disk is connected. Whenever I write
>> to this pool, after a few GB of writes I get a timeout on one of the
>> mvs(4) slots, followed shortly by timeouts on every disk on that
>> controller. From this point until I reboot, no command sent to any
>> disk on that controller will ever complete. CAM tries to reprobe the
>> disks, fails, and their ada nodes disappear. This is repeatable.
>> Does anybody have any ideas what's going on?
>> Anybody know any dirt about this SATA controller?
>>
>> pciconf -lv
>> ...
>> atapci0@pci0:0:15:0: class=0x01018f card=0xaa241106 chip=0x90011106 rev=0x00 hdr=0x00
>>     vendor     = 'VIA Technologies, Inc.'
>>     device     = 'VX900 Serial ATA Controller'
>>     class      = mass storage
>>     subclass   = ATA
>> mvs0@pci0:1:0:0: class=0x010000 card=0x11ab11ab chip=0x704211ab rev=0x02 hdr=0x00
>>     vendor     = 'Marvell Technology Group Ltd.'
>>     device     = '88SX7042 PCI-e 4-port SATA-II'
>>     class      = mass storage
>>     subclass   = SCSI
>> ...
>>
>> dmesg
>> ...
>> mvsch3: Timeout on slot 7
>> mvsch3: iec 02000000 sstat 00000123 serr 00000000 edma_s 000000e1 dma_c 20000708 dma_s 00000008 rs 000000f2 status 40
>> mvsch3: ... waiting for slots 00000072
>> mvsch3: Timeout on slot 6
>> mvsch3: iec 02000000 sstat 00000123 serr 00000000 edma_s 000000e1 dma_c 20000708 dma_s 00000008 rs 000000f2 status 40
>> mvsch3: ... waiting for slots 00000032
>> mvsch3: Timeout on slot 5
>> mvsch3: iec 02000000 sstat 00000123 serr 00000000 edma_s 000000e1 dma_c 20000708 dma_s 00000008 rs 000000f2 status 40
>> mvsch3: ... waiting for slots 00000012
>> mvsch3: Timeout on slot 4
>> mvsch3: iec 02000000 sstat 00000123 serr 00000000 edma_s 000000e1 dma_c 20000708 dma_s 00000008 rs 000000f2 status 40
>> mvsch3: ... waiting for slots 00000002
>> mvsch3: Timeout on slot 1
>> mvsch3: iec 02000000 sstat 00000123 serr 00000000 edma_s 000000e1 dma_c 20000708 dma_s 00000008 rs 000000f2 status 40
>> (ada3:mvsch3:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 95 e4 11 40 4d 00 00 01 00 00
>> (ada3:mvsch3:0:0:0): CAM status: Command timeout
>> (ada3:mvsch3:0:0:0): Retrying command
>> (ada3:mvsch3:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 f2 5f 00 40 21 00 00 01 00 00
>> (ada3:mvsch3:0:0:0): CAM status: Command timeout
>> (ada3:mvsch3:0:0:0): Retrying command
>> (ada3:mvsch3:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 f2 61 00 40 21 00 00 01 00 00
>> (ada3:mvsch3:0:0:0): CAM status: Command timeout
>> (ada3:mvsch3:0:0:0): Retrying command
>> (ada3:mvsch3:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 f2 63 00 40 21 00 00 01 00 00
>> (ada3:mvsch3:0:0:0): CAM status: Command timeout
>> (ada3:mvsch3:0:0:0): Retrying command
>> (ada3:mvsch3:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 f2 67 00 40 21 00 00 01 00 00
>> (ada3:mvsch3:0:0:0): CAM status: Command timeout
>> (ada3:mvsch3:0:0:0): Retrying command
>> ...
>>
>> -Alan
>>
>
> --
> Alexander Motin
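The `rs` field and the "waiting for slots" values in the mvsch3 dmesg excerpt look like 32-bit masks with one bit per command slot. Assuming that layout (it is inferred from the logged values themselves, not from Marvell documentation), the Python sketch below checks that `rs 000000f2` names exactly the slots that then time out one by one, and that each "waiting for slots" value equals that mask with the already-timed-out slots cleared. `slots_in` is a hypothetical helper written for this illustration.

```python
def slots_in(mask: int) -> list[int]:
    """Slot numbers whose bits are set, assuming bit n stands for slot n."""
    return [n for n in range(32) if mask & (1 << n)]


# Values copied from the mvsch3 excerpt in the original report.
running = 0x000000F2              # "rs" field, identical in every dump
timed_out_order = [7, 6, 5, 4]    # "Timeout on slot N", in the order logged
waiting = [0x72, 0x32, 0x12, 0x02]  # "... waiting for slots" after each timeout

print("running slots:", slots_in(running))

done = 0
for slot, reported in zip(timed_out_order, waiting):
    done |= 1 << slot
    remaining = running & ~done   # running slots that have not timed out yet
    print(f"after slot {slot}: waiting for {slots_in(remaining)} (log: {reported:#04x})")
    assert remaining == reported
```

Under this reading, the last mask, 0x02, leaves only slot 1, which is the final slot reported as timing out before the READ_FPDMA_QUEUED retries. None of the five outstanding commands on that channel (slots 1, 4, 5, 6 and 7) appear to have completed, which matches the five retried reads in the log.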
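The 12-byte "ACB" dumps in these logs are CAM's rendering of the ATA command block. The Python sketch below assumes the bytes are printed in the order command, features, LBA low/mid/high, device, LBA low/mid/high (exp), features (exp), sector count, sector count (exp). That order is an assumption, although it is consistent with the `01` in the tenth byte of every READ_FPDMA_QUEUED line being the high byte of a 256-sector NCQ transfer length. The opcode table covers only the three commands seen in this thread, and `decode_acb` and `DecodedACB` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

# Only the opcodes that appear in this thread (standard ATA command codes).
OPCODES = {0xCA: "WRITE DMA", 0x60: "READ FPDMA QUEUED", 0xEC: "IDENTIFY DEVICE"}


@dataclass
class DecodedACB:
    name: str
    lba: Optional[int]      # None when the command carries no LBA
    sectors: Optional[int]  # transfer length in logical sectors, None if not applicable


def decode_acb(text: str) -> DecodedACB:
    """Decode a 12-byte ACB dump; the byte order below is an assumption."""
    (cmd, feat, lba_lo, lba_mid, lba_hi, device,
     lba_lo_x, lba_mid_x, lba_hi_x, feat_x, count, count_x) = (int(b, 16) for b in text.split())
    name = OPCODES.get(cmd, f"opcode {cmd:#04x}")
    if cmd == 0x60:
        # NCQ (FPDMA) read: 48-bit LBA; the transfer length sits in the features
        # registers, and a length of 0 means 65536 sectors.
        lba = (lba_lo | lba_mid << 8 | lba_hi << 16
               | lba_lo_x << 24 | lba_mid_x << 32 | lba_hi_x << 40)
        return DecodedACB(name, lba, (feat | feat_x << 8) or 65536)
    if cmd == 0xCA:
        # 28-bit WRITE DMA: LBA bits 27:24 are the low nibble of the device
        # register, and a sector count of 0 means 256 sectors.
        lba = lba_lo | lba_mid << 8 | lba_hi << 16 | (device & 0x0F) << 24
        return DecodedACB(name, lba, count or 256)
    # IDENTIFY DEVICE (and anything unrecognized) is reported without an LBA.
    return DecodedACB(name, None, None)


if __name__ == "__main__":
    for acb in ("ca 00 18 72 60 49 00 00 00 00 00 00",   # WRITE_DMA on ada1
                "60 00 95 e4 11 40 4d 00 00 01 00 00",   # READ_FPDMA_QUEUED on ada3
                "ec 00 00 00 00 40 00 00 00 00 00 00"):  # ATA_IDENTIFY on aprobe1
        d = decode_acb(acb)
        print(d.name, hex(d.lba) if d.lba is not None else "-", d.sectors)
```

With this byte order, the timed-out WRITE_DMA targets LBA 0x9607218 and every timed-out command that moves data requests 256 sectors, so nothing unusual about the requests themselves stands out.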

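Two fields in the mvs(4) dumps may follow public specifications rather than Marvell's private layout. `sstat` looks like the SATA SStatus register (DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8), and `status` looks like the ATA status register. Treating them that way is an assumption about how the driver labels its output. The Python sketch below decodes both; `decode_sstatus`, `decode_status` and the lookup tables are hypothetical names, and the tables list only the common field values.

```python
# SStatus fields (SATA/AHCI layout); only the common values are named.
DET = {0: "no device", 1: "device present, no Phy communication",
       3: "device present, Phy communication established", 4: "Phy offline"}
SPD = {0: "no speed negotiated", 1: "Gen1 1.5 Gb/s", 2: "Gen2 3.0 Gb/s", 3: "Gen3 6.0 Gb/s"}
IPM = {0: "not present/not established", 1: "active", 2: "partial",
       6: "slumber", 8: "DevSleep"}

# ATA status register bits, most significant first; bit 4 is command-dependent.
STATUS_BITS = [(0x80, "BSY"), (0x40, "DRDY"), (0x20, "DF"),
               (0x10, "bit4 (historically DSC)"), (0x08, "DRQ"), (0x01, "ERR")]


def decode_sstatus(value: int) -> tuple[str, str, str]:
    """Split an SStatus value into its DET, SPD and IPM descriptions."""
    det, spd, ipm = value & 0xF, (value >> 4) & 0xF, (value >> 8) & 0xF
    return (DET.get(det, f"DET={det}"), SPD.get(spd, f"SPD={spd}"),
            IPM.get(ipm, f"IPM={ipm}"))


def decode_status(value: int) -> list[str]:
    """Names of the ATA status bits set in value."""
    return [name for bit, name in STATUS_BITS if value & bit]


if __name__ == "__main__":
    print("sstat 00000123:", decode_sstatus(0x123))
    for status in (0x40, 0x50, 0x58):
        print(f"status {status:02x}:", decode_status(status))
```

Read this way, `sstat 00000123` means the drive is present, Phy communication is established at Gen2 (3.0 Gb/s) and the interface is active, so the SATA link itself was up in every dump. `status 58` on the timed-out ATA_IDENTIFY has DRDY and DRQ set with BSY clear; in ATA terms DRQ signals that the device is ready to transfer data.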