Date: Thu, 10 Jul 2025 17:03:56 GMT
Message-Id: <202507101703.56AH3uEj078985@gitrepo.freebsd.org>
To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-main@FreeBSD.org
From: Warner Losh
Subject: git: d78d04b17cb2 - main - cam: Fail the disk if READ CAPACITY returns 4/2 asc/ascq
List-Id: Commit messages for the main branch of the src repository
List-Archive: https://lists.freebsd.org/archives/dev-commits-src-main
X-Git-Committer: imp
X-Git-Repository: src
X-Git-Refname: refs/heads/main
X-Git-Reftype: branch
X-Git-Commit: d78d04b17cb2498186e8fd2681f224a760e75b28

The branch main has been updated by imp:

URL: https://cgit.FreeBSD.org/src/commit/?id=d78d04b17cb2498186e8fd2681f224a760e75b28

commit d78d04b17cb2498186e8fd2681f224a760e75b28
Author:     Warner Losh
AuthorDate: 2025-07-10 15:56:26 +0000
Commit:     Warner Losh
CommitDate: 2025-07-10 16:17:01 +0000

    cam: Fail the disk if READ CAPACITY returns 4/2 asc/ascq

    HGST disks that are sick return 44/0 for START UNIT (which we ignore)
    and then 4/2 on READ CAPACITY. After START UNIT, READ CAPACITY should
    either succeed or return UNIT ATTENTION. Instead, we get NOT READY +
    4/2 back. I've seen this on several models of HGST drives.

    Invalidate the peripheral when we detect this condition. This is likely
    the least bad thing we can do: it removes access to daX, but leaves passY
    so logs may be extracted (if awkwardly). Removing daX access removes the
    disk device that causes the geom problems outlined below.

    Although the timeout is 5s for READ_CAPACITY, we wait the full 30s for
    READ_CAPACITY_16. This stalls booting, since we start to taste as soon
    as we release the final hold... but the tasting means g_wait_idle() now
    takes over 5 minutes to clear, since we do this on every open. Even
    with a timeout of 3s instead of 30s, boot takes almost 5 minutes in
    these cases, so other downstream operations are also slow; it's not
    just a matter of adjusting the timeout. Failing the periph early solves
    the bulk of this problem (the tasting-related delays).

    What the HBA does is HBA-specific, and some HBAs have firmware that is
    also confused by this when they enumerate or discover the drive, leading
    to long (but still shorter than 5-minute) delays.
    This patch won't solve that aspect of startup delays with sick disks.

    Perhaps we should fail the periph when START UNIT fails with the same
    codes we check in the READ CAPACITY path. I'm reluctant to make such a
    global change, since it's in cam_periph and there seems to be no good
    way to flag that we want this behavior. It's also a bit magical when it
    runs (some drives report 44/0 always, some report it only on START UNIT,
    and these HGST drives fall into the latter category).

    Sponsored by:           Netflix
    Differential Revision:  https://reviews.freebsd.org/D51218
---
 sys/cam/scsi/scsi_da.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/sys/cam/scsi/scsi_da.c b/sys/cam/scsi/scsi_da.c
index 9eda664ee7b0..d02750aaacaf 100644
--- a/sys/cam/scsi/scsi_da.c
+++ b/sys/cam/scsi/scsi_da.c
@@ -5073,6 +5073,18 @@ dadone_proberc(struct cam_periph *periph, union ccb *done_ccb)
 			 * behind a SATL translation that's fallen into a
 			 * terminally fatal state.
 			 *
+			 * 4/2 happens on some HGST drives that are quite
+			 * ill. We've already sent the start unit command (for
+			 * which we ignore a 44/0 asc/ascq, which I'm hesitant
+			 * to change since it's so basic and there are other
+			 * error conditions to the START UNIT we should ignore).
+			 * So to require initialization at this point when it
+			 * should be fine implies to me, at least, that we should
+			 * invalidate. Since we do read capacity in geom tasting
+			 * a lot, and since this timeout is long, this leads to
+			 * up to a 10-minute delay in booting.
+			 *
+			 * 4/2: LOGICAL UNIT NOT READY, INITIALIZING COMMAND REQUIRED
 			 * 25/0: LOGICAL UNIT NOT SUPPORTED
 			 * 44/0: INTERNAL TARGET FAILURE
 			 * 44/1: PERSISTENT RESERVATION INFORMATION LOST
@@ -5080,6 +5092,7 @@ dadone_proberc(struct cam_periph *periph, union ccb *done_ccb)
 			 */
 			if ((have_sense)
 			 && (asc != 0x25) && (asc != 0x44)
+			 && !(asc == 0x04 && ascq == 0x02)
 			 && (error_code == SSD_CURRENT_ERROR
 			  || error_code == SSD_DESC_CURRENT_ERROR)) {
 				const char *sense_key_desc;