From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 281528] SCSI tag-queue error with Samsung 870 EVO SSD [incl. workaround]
Date: Mon, 16 Sep 2024 06:46:39 +0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=281528

            Bug ID: 281528
           Summary: SCSI tag-queue error with Samsung 870 EVO SSD [incl. workaround]
           Product: Base System
           Version: Unspecified
          Hardware: Any
                OS: Any
            Status: New
          Keywords: cam, performance
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: wbe@psr.com

SUMMARY:

Problem: A brand new, 4/2024 Samsung 870 EVO SSD connected via AHCI and SATA II to a Supermicro motherboard gets parity/CRC, ATA Status, and other errors. The SSD itself, via smartctl, reports no read/write or bad sector errors, just interface/CRC errors.

Solution/workaround: "camcontrol negotiate $theSSD -T disable" to disable command queueing/tagging.

DESCRIPTION:

[This is an edited version of articles I posted to comp.unix.bsd.freebsd.misc.]

On a system running FreeBSD 14.1-RELEASE (though I don't think that matters), I connected a Samsung 870 EVO SSD via AMD-AHCI and SATA II (3.0Gb/s). The SSD is rated for SATA III (6.0Gb/s).
Temperature is fine (~29C). Lots of errors occurred (see log extracts below). ZFS, for example, got about 180 write errors while resilvering ~80GB to the new/empty drive. Most errors seemed to be retryable and succeeded on the second try. My reading of the error messages and the output from smartctl -x indicated some kind of interface problem.

[Ignore the ada0/ada1 difference: that's my doing.]

----------
[sample log entries for some read errors:] [edited]

Aug 21 03:01:24: (ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 08 78 ff 64 40 13 00 00 00 00 00
Aug 21 03:01:24: (ada0:ahcich0:0:0:0): CAM status: Auto-Sense Retrieval Failed
Aug 21 03:01:24: (ada0:ahcich0:0:0:0): Error 5, Unretryable error
Aug 21 03:01:25: ahcich0: Timeout on slot 9 port 0
Aug 21 03:01:25: ahcich0: is 04000000 cs 00000200 ss 00000000 rs 00000200 tfd 451 serr 00400000 cmd 0000e917
Aug 21 03:01:25: (ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 08 58 00 65 40 13 00 00 00 00 00
Aug 21 03:01:25: (ada0:ahcich0:0:0:0): CAM status: Auto-Sense Retrieval Failed
Aug 21 03:01:25: (ada0:ahcich0:0:0:0): Error 5, Unretryable error
Aug 21 03:01:25: (ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 08 c0 36 65 40 13 00 00 00 00 00
Aug 21 03:01:25: (ada0:ahcich0:0:0:0): CAM status: ATA Status Error
Aug 21 03:01:25: (ada0:ahcich0:0:0:0): ATA status: 00 ()
Aug 21 03:01:25: (ada0:ahcich0:0:0:0): RES: 00 00 00 00 00 00 00 00 00 00 00
Aug 21 03:01:25: (ada0:ahcich0:0:0:0): Retrying command, 3 more tries remain
Aug 21 03:01:25: (ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 08 c8 ff 64 40 13 00 00 00 00 00
Aug 21 03:01:25: (ada0:ahcich0:0:0:0): CAM status: ATA Status Error
Aug 21 03:01:25: (ada0:ahcich0:0:0:0): ATA status: 00 ()
Aug 21 03:01:25: (ada0:ahcich0:0:0:0): RES: 00 00 00 00 00 00 00 00 00 00 00
Aug 21 03:01:25: (ada0:ahcich0:0:0:0): Retrying command, 3 more tries remain
Aug 21 03:01:25 ZFS[1332]: vdev I/O failure, path=/dev/ada0p3 offset=149417648128 size=4096 error=5
Aug 21 03:01:26: (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 38 88 e4 80 40 13 00 00 00 00 00
Aug 21 03:01:26: (ada0:ahcich0:0:0:0): CAM status: Uncorrectable parity/CRC error
Aug 21 03:01:26: (ada0:ahcich0:0:0:0): Retrying command, 3 more tries remain
Aug 21 03:01:26: (ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 08 10 48 29 40 05 00 00 00 00 00
Aug 21 03:01:26: (ada0:ahcich0:0:0:0): CAM status: Uncorrectable parity/CRC error
Aug 21 03:01:26: (ada0:ahcich0:0:0:0): Retrying command, 3 more tries remain
Aug 21 03:01:26: (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 20 c0 e4 80 40 13 00 00 00 00 00
Aug 21 03:01:26: (ada0:ahcich0:0:0:0): CAM status: Uncorrectable parity/CRC error
Aug 21 03:01:26: (ada0:ahcich0:0:0:0): Retrying command, 3 more tries remain.

----------
[sample log entries for the write errors during resilvering:] [edited]

Aug 21 00:33:01: (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 08 e0 b2 e7 40 02 00 00 00 00 00
Aug 21 00:33:01: (ada1:ahcich1:0:0:0): CAM status: Auto-Sense Retrieval Failed
Aug 21 00:33:01: (ada1:ahcich1:0:0:0): Error 5, Unretryable error
Aug 21 00:33:02: ahcich1: Timeout on slot 19 port 0
Aug 21 00:33:02: ahcich1: is 04000000 cs 00080000 ss 00000000 rs 00080000 tfd 451 serr 00400000 cmd 0000f317
Aug 21 00:33:02: (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 48 f0 b2 e7 40 02 00 00 00 00 00
Aug 21 00:33:02: (ada1:ahcich1:0:0:0): CAM status: Auto-Sense Retrieval Failed
Aug 21 00:33:02: (ada1:ahcich1:0:0:0): Error 5, Unretryable error
Aug 21 00:33:02 ZFS[1322]: vdev I/O failure, path=/dev/ada1p3 offset=7774244864 size=36864 error=5
Aug 21 00:33:05: (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 20 78 2e f4 40 02 00 00 00 00 00
Aug 21 00:33:05: (ada1:ahcich1:0:0:0): CAM status: Uncorrectable parity/CRC error
Aug 21 00:33:05: (ada1:ahcich1:0:0:0): Retrying command, 3 more tries remain
Aug 21 00:33:05: (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 20 58 2e f4 40 02 00 00 00 00 00
Aug 21 00:33:05: (ada1:ahcich1:0:0:0): CAM status: Uncorrectable parity/CRC error
Aug 21 00:33:05: (ada1:ahcich1:0:0:0): Retrying command, 3 more tries remain

[end of log entries]
----------

Here's smartctl -x output, keeping only what looked "interesting"/relevant:

SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 3.0 Gb/s)

199 CRC_Error_Count         -OSRCK   099   099   000    -    64
235 POR_Recovery_Count      -O--C-   099   099   000    -    7
241 Total_LBAs_Written      -O--CK   099   099   000    -    213396105

0x06  0x018  4              64  ---  Number of Interface CRC Errors

[WBE note: the 65535+ numbers below may be the result of my not knowing about the "-F samsung2" option to smartctl at the time. Currently (as I submit this), those numbers are 0s.]

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            2  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0005  2       65535+  R_ERR response for non-data FIS
0x0006  2       65535+  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0008  2            0  Device-to-host non-data FIS retries
0x0009  2            5  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2            5  Device-to-host register FISes sent due to a COMRESET
0x000b  2            0  CRC errors within host-to-device FIS
0x000d  2       65535+  Non-CRC errors within host-to-device FIS
0x000f  2            0  R_ERR response for host-to-device data FIS, CRC
0x0010  2            0  R_ERR response for host-to-device data FIS, non-CRC
0x0012  2            0  R_ERR response for host-to-device non-data FIS, CRC
0x0013  2       65535+  R_ERR response for host-to-device non-data FIS, non-CRC

SCT Error Recovery Control:
           Read: Disabled
          Write: Disabled

----------

The significant lines from the errors above were:

> Aug 21 03:01:24: (ada0:ahcich0:0:0:0): CAM status: Auto-Sense Retrieval Failed
> Aug 21 03:01:24: (ada0:ahcich0:0:0:0): Error 5, Unretryable error
...
> Aug 21 03:01:25: (ada0:ahcich0:0:0:0): CAM status: ATA Status Error
...
> Aug 21 03:01:26: (ada0:ahcich0:0:0:0): CAM status: Uncorrectable parity/CRC error
...
> 0x06  0x018  4              64  ---  Number of Interface CRC Errors

Results:

* Test 1: It's not a bad data cable: Some people suggested the problem might be a bad cable. I ordered two new ones (SATA III). Tried both. Result: no improvement. It was cheap to try.

* Test 2: Leave queueing enabled and reduce the number of tags from 32 to 2. Didn't help: the errors continued to happen.

* Fix 1: Disable command queueing ("camcontrol negotiate $theSSD -T disable").

* Fix 2: Connect the SSD with a USB-to-SATA adapter cable. Perhaps this works because there's no command queueing over USB?
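
For anyone who wants to compare error counts before and after trying one of the tests or fixes above, here is a minimal Python sketch that tallies the "CAM status" lines per device from log text in the format shown in the extracts. The helper names (tally_cam_status, slots_in_mask) are hypothetical and not part of any FreeBSD tool. The second helper assumes that the "cs" value on the ahcich register-dump line is a bitmask with one bit per command slot; that reading is an assumption, though it is consistent with the two timeouts logged above (slot 9 with cs 00000200, slot 19 with cs 00080000).

```python
import re
from collections import Counter

# Sample lines copied from the log extracts in this report.
SAMPLE_LOG = """\
Aug 21 03:01:24: (ada0:ahcich0:0:0:0): CAM status: Auto-Sense Retrieval Failed
Aug 21 03:01:24: (ada0:ahcich0:0:0:0): Error 5, Unretryable error
Aug 21 03:01:25: ahcich0: Timeout on slot 9 port 0
Aug 21 03:01:25: ahcich0: is 04000000 cs 00000200 ss 00000000 rs 00000200 tfd 451 serr 00400000 cmd 0000e917
Aug 21 03:01:25: (ada0:ahcich0:0:0:0): CAM status: ATA Status Error
Aug 21 03:01:26: (ada0:ahcich0:0:0:0): CAM status: Uncorrectable parity/CRC error
Aug 21 00:33:05: (ada1:ahcich1:0:0:0): CAM status: Uncorrectable parity/CRC error
"""

# Matches e.g. "(ada0:ahcich0:0:0:0): CAM status: ATA Status Error"
CAM_STATUS = re.compile(r"\((\w+):ahcich\d+:[\d:]+\): CAM status: (.+)$")


def tally_cam_status(log_text):
    """Count (device, CAM status) pairs found in the log text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = CAM_STATUS.search(line)
        if match:
            counts[(match.group(1), match.group(2).strip())] += 1
    return counts


def slots_in_mask(hex_mask):
    """Return the bit positions set in a hex bitmask.

    Assumption: the 'cs' field is a per-slot bitmask (bit N = slot N).
    """
    value = int(hex_mask, 16)
    return [bit for bit in range(32) if value & (1 << bit)]


if __name__ == "__main__":
    for (device, status), count in sorted(tally_cam_status(SAMPLE_LOG).items()):
        print(f"{device}: {status}: {count}")
    print("cs 00000200 ->", slots_in_mask("00000200"))
    print("cs 00080000 ->", slots_in_mask("00080000"))
```

Under the stated assumption, slots_in_mask("00000200") gives [9] and slots_in_mask("00080000") gives [19], matching the "Timeout on slot" lines. Running the tally over the full system log before and after "camcontrol negotiate $theSSD -T disable" would show whether the per-status counts stop growing.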

It was suggested that I post this here, as perhaps FreeBSD can add a quirk for these drives. Even if that's not appropriate, anyone else having this problem can now find this workaround here on bugzilla (current USENET articles are no longer archived by Google).

Of course, Samsung may some day come out with new firmware that fixes this problem, in which case the quirk test might need to become "with firmware older than ____".

HTH,
-WBE