Date: Wed, 01 Mar 2017 22:07:46 -0800
From: "Chris H" <bsd-lists@bsdforge.com>
To: <freebsd-hackers@freebsd.org>
Subject: Re: Disk controller heizenbug.
Message-ID: <b9ece7e7a92b371fa8afa77404756403@ultimatedns.net>
In-Reply-To: <CACpH0Mdu7g2YCUphtZ_2P0T7-Ju9XH0QGoL-pSGei6nDQtpnvA@mail.gmail.com>
References: <CACpH0Mdu7g2YCUphtZ_2P0T7-Ju9XH0QGoL-pSGei6nDQtpnvA@mail.gmail.com>
On Thu, 2 Mar 2017 00:16:58 -0500 Zaphod Beeblebrox <zbeeble@gmail.com> wrote

> I have a disk controller. I works in a modern AMD motherboard at home
> (9590 processor),

Nice thing about these processors, is the ability to cook your meals
on them, too. :-)

> but when connected to a sunfire 4140 (opteron 2345 based
> machine vintage 2008-ish) the disks spontaneously detach by just doing a
> "zfs import"
>
> The board has it's own mounting for the flash disks (two of them) and
> probes as:
>
> ahci0: <Marvell 88SE9230 AHCI SATA controller> port
> 0x8c00-0x8c07,0x8880-0x8883,0x8800-0x8807,0x8480-0x8483,0x8400-0x841f mem
> 0xdfbff800-0xdfbfffff irq 16 at device 0.0 numa-domain 0 on pci3
>
> The disks show up as:
>
> ada0 at ahcich0 bus 0 scbus6 target 0 lun 0
> ada0: <Samsung SSD 850 EVO mSATA 250GB EMT41B6Q> ACS-2 ATA SATA 3.x device
> ada0: Serial Number S248NXAH112465B
> ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 512bytes)
> ada0: Command Queueing enabled
> ada0: 238475MB (488397168 512 byte sectors)
> ada0: quirks=0x3<4K,NCQ_TRIM_BROKEN>
>
> Under heavy bonnie++, they work in the AMD 9590 system. On the opteron
> machine, the following occurs:
>
> ahcich1: Timeout on slot 11 port 0
> ahcich1: is ffffffff cs ffffffff ss ffffffff rs 00000800 tfd ffffffff serr
> ffffffff cmd ffffffff
> (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 90 e0 20 a0 40 17 00 00
> 00 00 00
> (ada1:ahcich1:0:0:0): CAM status: Command timeout
> (ada1:ahcich1:0:0:0): Retrying command
> ahcich1: stopping AHCI engine failed
> ahcich0: ada1 at ahcich1 bus 0 scbus7 target 0 lun 0
> Timeout on slot 31 port 0
> ada1: ahcich0: <Samsung SSD 850 EVO mSATA 250GB EMT41B6Q>is ffffffff cs
> ffffffff ss ffffffff rs 80000000 tfd ffffffff serr ffffffff cmd ffffffff
> s/n S248NXAH112471L detached
> (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 90 e0 20 a0 40 17 00 00
> 00 00 00
> (ada0:ahcich0:0:0:0): CAM status: Command timeout
> (ada0:ahcich0:0:0:0): Retrying command
> ahcich0: stopping AHCI engine failed
> ada0 at ahcich0 bus 0 scbus6 target 0 lun 0
> ada0: <Samsung SSD 850 EVO mSATA 250GB EMT41B6Q> s/n S248NXAH112465B
> detached
> [2:43:343]root@yak:/usr/ports/net-mgmt/net-snmp> less /var/run/dmesg.boot
> [2:44:344]root@yak:/usr/ports/net-mgmt/net-snmp> dmesg
> pid 78200 (httpd), uid 80: exited on signal 11
> ahcich1: Timeout on slot 11 port 0
> ahcich1: is ffffffff cs ffffffff ss ffffffff rs 00000800 tfd ffffffff serr
> ffffffff cmd ffffffff
> (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 90 e0 20 a0 40 17 00 00
> 00 00 00
> (ada1:ahcich1:0:0:0): CAM status: Command timeout
> (ada1:ahcich1:0:0:0): Retrying command
> ahcich1: stopping AHCI engine failed
> ahcich0: ada1 at ahcich1 bus 0 scbus7 target 0 lun 0
> Timeout on slot 31 port 0
> ada1: ahcich0: <Samsung SSD 850 EVO mSATA 250GB EMT41B6Q>is ffffffff cs
> ffffffff ss ffffffff rs 80000000 tfd ffffffff serr ffffffff cmd ffffffff
> s/n S248NXAH112471L detached
> (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 90 e0 20 a0 40 17 00 00
> 00 00 00
> (ada0:ahcich0:0:0:0): CAM status: Command timeout
> (ada0:ahcich0:0:0:0): Retrying command
> ahcich0: stopping AHCI engine failed
> ada0 at ahcich0 bus 0 scbus6 target 0 lun 0
> ada0: <Samsung SSD 850 EVO mSATA 250GB EMT41B6Q> s/n S248NXAH112465B
> detached
>
> I'm posting here to hackers because this seems to violate layers --- on the
> AMD machine ... it runs fine... even under load. The SATA bus is local to
> the card (and so travels with it to the server), yet the error looks like a
> SATA BUS or drive error.
>
> What gives?

I may be misunderstanding your question. But this smells like a
BUS timing issue. E.g., maybe your sunfire isn't quite synced --
BIOS settings for bus, ram && cpu?
I have the same issue on one of my boards. In my case, I OC'd the
CPU ~500MHz over spec.

Just thought I'd mention it.

--Chris

> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"
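
For anyone reading the quoted register dump: a 32-bit read that comes back as ffffffff usually means the device did not answer the read at all, which would point at the host-to-card bus rather than the SATA link behind the controller. The Python sketch below only parses a dump line copied from this thread and checks for that pattern. The helper names are hypothetical, and treating "rs" as the driver's own slot bookkeeping is an assumption drawn from the log (its set bit matches the timed-out slot number), not from the driver source.

```python
import re

# Field names as they appear in the ahcich timeout line quoted in this thread.
FIELDS = ("is", "cs", "ss", "rs", "tfd", "serr", "cmd")
ALL_ONES = 0xFFFFFFFF


def parse_ahci_dump(text):
    """Return {field: value} for each 'name XXXXXXXX' pair found in text."""
    regs = {}
    for name in FIELDS:
        match = re.search(r"\b%s ([0-9a-fA-F]{8})\b" % name, text)
        if match:
            regs[name] = int(match.group(1), 16)
    return regs


def hardware_reads_all_ones(regs, software_fields=("rs",)):
    """True if every field not listed in software_fields reads 0xffffffff.

    Assumption: 'rs' is kept by the driver in memory, so it is excluded
    from the check by default.
    """
    hw_values = [v for k, v in regs.items() if k not in software_fields]
    return bool(hw_values) and all(v == ALL_ONES for v in hw_values)


if __name__ == "__main__":
    line = ("ahcich1: is ffffffff cs ffffffff ss ffffffff rs 00000800 "
            "tfd ffffffff serr ffffffff cmd ffffffff")
    regs = parse_ahci_dump(line)
    print("timed-out slot bit set in rs:", regs["rs"] == 1 << 11)
    print("all hardware fields read as all-ones:",
          hardware_reads_all_ones(regs))
```

If the check holds, as it does for both dumps in the quoted log, comparing PCIe link and BIOS bus settings between the two machines seems a more promising next step than swapping drives or cables.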