Date: Thu, 13 Apr 2017 20:59:32 +0000
From: heasley
To: freebsd-hardware@freebsd.org
Subject: SSD errors
Message-ID: <20170413205932.GJ2149@shrubbery.net>

I have 4 SSDs in zfs raidz2 on 11.0-RELEASE-p2. They're on QD sleds
rated for SATA 6Gb/s, which adapt them from 2.5" to 3.5" slots in a
Supermicro SC733TQ chassis (2012) with a Supermicro X10SRi-F mb, using
the on-board controller (either one). I swapped the SATA cables for some
with "extra shielding". The 4 HDs used previously (a mix of WD and
Seagate, 750G and 2T) worked flawlessly.

The BIOS has a "SATA Device Type" option for SSD/HD, which is set to SSD
and seems to apply only to spin-up signals. The chassis manual makes no
reference to SSDs, and I have found no fbsd configuration or
recommendations specific to SSDs.

When I push a lot of data to them, such as with an rsync, I receive
errors like those below. If I move drives between slots, the errors seem
to follow the chassis slots, those closest to the power supply, but I'm
not positive about this.

I suppose the questions for the list are:

- have I missed any fbsd SSD-specific configuration?
- all 4 have non-zero UDMA_CRC_Error_Count counters; not many, and about
  the same number on each, which I believe implies electrical
  interference, most likely in the cable or chassis backplane. Should I
  buy some specific model of cable? Other recommendations?

tia

(ada2:ahcich6:0:0:0): READ_FPDMA_QUEUED. ACB: 60 80 d0 c2 cf 40 06 00 00 00 00 00
(ada2:ahcich6:0:0:0): CAM status: Uncorrectable parity/CRC error
(ada2:ahcich6:0:0:0): Retrying command
(ada2:ahcich6:0:0:0): READ_FPDMA_QUEUED. ACB: 60 80 d8 c3 cf 40 06 00 00 00 00 00
(ada2:ahcich6:0:0:0): CAM status: Uncorrectable parity/CRC error
(ada2:ahcich6:0:0:0): Retrying command
(ada3:ahcich7:0:0:0): READ_FPDMA_QUEUED. ACB: 60 80 18 1d fb 40 03 00 00 00 00 00
(ada3:ahcich7:0:0:0): CAM status: Uncorrectable parity/CRC error
(ada3:ahcich7:0:0:0): Retrying command
(ada3:ahcich7:0:0:0): READ_FPDMA_QUEUED. ACB: 60 80 90 31 40 40 50 00 00 00 00 00

Device Model:     Samsung SSD 850 EVO 2TB
LU WWN Device Id: 5 002538 c4042fdb8
Firmware Version: EMT02B6Q
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4c
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Apr 13 20:43:52 2017 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (    0) seconds.
Offline data collection
capabilities:                    (0x53) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 265) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       2552
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       14
177 Wear_Leveling_Count     0x0013   100   100   000    Pre-fail  Always       -       0
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   074   062   000    Old_age   Always       -       26
195 Hardware_ECC_Recovered  0x001a   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   099   099   000    Old_age   Always       -       33
235 Unknown_Attribute       0x0012   099   099   000    Old_age   Always       -       2
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       5911167739

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
  255        0    65535  Read_scanning was never started
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
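
To tell whether a cable or slot change actually helps, the raw
UDMA_CRC_Error_Count can be recorded per drive before and after the
change and compared. The following is only a minimal sketch, not
something from the setup described above: it assumes Python 3 and the
smartctl attribute row layout shown in this message, and
crc_error_count is a hypothetical helper name.

```python
import re

# Matches one smartctl attribute row for ID 199 and captures RAW_VALUE.
# Assumed column order (as in the output above):
# ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
CRC_ROW = re.compile(
    r"\b199\s+UDMA_CRC_Error_Count\s+0x[0-9a-fA-F]+"
    r"\s+\d+\s+\d+\s+\d+"       # VALUE, WORST, THRESH
    r"\s+\S+\s+\S+\s+\S+"       # TYPE, UPDATED, WHEN_FAILED
    r"\s+(\d+)"                 # RAW_VALUE
)


def crc_error_count(smart_text):
    """Return the raw UDMA_CRC_Error_Count as an int, or None if absent."""
    match = CRC_ROW.search(smart_text)
    return int(match.group(1)) if match else None


# Two rows copied from the attribute table in this message.
SAMPLE = """\
195 Hardware_ECC_Recovered  0x001a   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   099   099   000    Old_age   Always       -       33
"""

if __name__ == "__main__":
    print(crc_error_count(SAMPLE))
```

The text would come from saving the output of "smartctl -A" for each
drive (for example /dev/ada2) at two points in time. A counter that
keeps climbing on the same chassis slot regardless of which drive is in
it would point at the backplane or cable rather than the SSD.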