From nobody Tue Dec  3 05:15:24 2024
X-Original-To: questions@mlmmj.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
	by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4Y2TPf5ykmz5fm13
	for <questions@mlmmj.nyi.freebsd.org>; Tue, 03 Dec 2024 05:15:34 +0000 (UTC)
	(envelope-from dpchrist@holgerdanske.com)
Received: from holgerdanske.com (holgerdanske.com [IPv6:2001:470:0:19b::b869:801b])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange ECDHE (P-256) server-signature RSA-PSS (4096 bits) server-digest SHA256
	 client-signature RSA-PSS (4096 bits) client-digest SHA256)
	(Client CN "holgerdanske.com", Issuer "R10" (verified OK))
	by mx1.freebsd.org (Postfix) with ESMTPS id 4Y2TPc6kp9z43xF
	for <questions@freebsd.org>; Tue,  3 Dec 2024 05:15:32 +0000 (UTC)
	(envelope-from dpchrist@holgerdanske.com)
Authentication-Results: mx1.freebsd.org;
	dkim=pass header.d=holgerdanske.com header.s=nov-20210719-112354 header.b=lcP39eQW;
	spf=pass (mx1.freebsd.org: domain of dpchrist@holgerdanske.com designates 2001:470:0:19b::b869:801b as permitted sender) smtp.mailfrom=dpchrist@holgerdanske.com;
	dmarc=pass (policy=none) header.from=holgerdanske.com
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=holgerdanske.com;
	s=nov-20210719-112354; t=1733202924;
	bh=V5DUN+JJ5G6qhjrEFPER0eIqjBvSwMZorDFGsx8Op8A=;
	h=Received:Message-ID:Date:MIME-Version:User-Agent:Subject:To:
	 References:Content-Language:From:In-Reply-To:Content-Type:
	 Content-Transfer-Encoding;
	b=lcP39eQW1TquB2FUum1w235D68BDfCdmLmq94nIJireMYEe2JvYMx+BQHAW6vz6ud
	 Q78qQuPFRBlNYhhucx598rWlL59Q33D0ZG66PXo8YixtKFPUI23i3HdJlwRaYJ9tXF
	 hWBAXgpJNZhvnJexWXKRjXRpBLhZ4IElGZolotLzMoBmV5+9MTUzEL+7nD0h86N4OT
	 c3oM4mgMu/iZEOcTc6W5PAA1RgZYlUc9XSG2jMKkTcjKkkhE4vxWWlUERetRD+44He
	 Cd1dl+VFtMlK6tnKUba1kSAA4flKmSOtj/TC10T+GVamutQiwM2GBkXNDkKCSDCvtu
	 mn8tl+a7hTzCJf/68RJV5v7aWKQajyqeW+5sGh+W0IFcC3NKk81u0Bnm55BOFYK45B
	 b4/SvhKVfhXzH+40dsPMRq688V62VEFPbvfmWgE+btDWDkZhYwhDQR29/YD2WQnlFn
	 /obTNnGW4v6mmvJGF0lzVco48I2zJTAl2L2Nssb0eWH0kq6vXxu9q8Wnvi5gkPB5kt
	 53X5O7MVSgxjiTvutPF70JAYT5Cv8WAMj1A67G1IKq8Y8cas8BX3mYTsY5F1Vm73Yt
	 cQCmdZl35aYAaa88LJx+umq1be96jnlUGSWWuyh1wFTG+RPAHdSkRFPk/rpqSiE/if
	 kE1wsOiCTgI39wLr/tzhHukc=
Received: from 99.100.19.101 (99-100-19-101.lightspeed.frokca.sbcglobal.net [99.100.19.101])
	by holgerdanske.com with ESMTPSA (TLS_AES_128_GCM_SHA256:TLSv1.3:Kx=any:Au=any:Enc=AESGCM(128):Mac=AEAD) (SMTP-AUTH username dpchrist@holgerdanske.com, mechanism PLAIN)
	for <questions@freebsd.org>; Mon, 2 Dec 2024 21:15:24 -0800
Message-ID: <c5291a54-36e7-40fa-ac65-0e4347b05306@holgerdanske.com>
Date: Mon, 2 Dec 2024 21:15:24 -0800
List-Id: User questions <freebsd-questions.freebsd.org>
List-Archive: https://lists.freebsd.org/archives/freebsd-questions
List-Help: <mailto:questions+help@freebsd.org>
List-Post: <mailto:questions@freebsd.org>
List-Subscribe: <mailto:questions+subscribe@freebsd.org>
List-Unsubscribe: <mailto:questions+unsubscribe@freebsd.org>
X-BeenThere: freebsd-questions@freebsd.org
Sender: owner-freebsd-questions@FreeBSD.org
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: CAM status: SCSI Status Error
To: questions@freebsd.org
References: <665ca364-6538-4ef7-bb8b-260dd86ca0bb@app.fastmail.com>
 <20721bcf-7c99-4918-bbb0-53d6c8e9cda7@holgerdanske.com>
 <3a9549fa-c8e1-479e-8492-6dd812462731@app.fastmail.com>
Content-Language: en-US
From: David Christensen <dpchrist@holgerdanske.com>
In-Reply-To: <3a9549fa-c8e1-479e-8492-6dd812462731@app.fastmail.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

On 12/2/24 08:15, Dan Langille wrote:
> On Fri, Nov 22, 2024, at 1:14 PM, David Christensen wrote:
>> On 11/22/24 05:11, Dan Langille wrote:
>>> On FreeBSD 14.1, is this a server issue (e.g. cable/hardware) as opposed to a drive issue?
>>>
>>> Nov 21 05:28:48 r730-03 kernel: (da7:mrsas0:1:7:0): READ(10). CDB: 28 00 aa d9 5b 5f 00 00 20 00
>>> Nov 21 05:28:48 r730-03 kernel: (da7:mrsas0:1:7:0): CAM status: SCSI Status Error
>>> Nov 21 05:28:48 r730-03 kernel: (da7:mrsas0:1:7:0): SCSI status: OK
>>> Nov 21 05:28:54 r730-03 kernel: (da7:mrsas0:1:7:0): READ(10). CDB: 28 00 aa d9 6b 08 00 00 10 00
>>> Nov 21 05:28:54 r730-03 kernel: (da7:mrsas0:1:7:0): CAM status: SCSI Status Error
>>> Nov 21 05:28:54 r730-03 kernel: (da7:mrsas0:1:7:0): SCSI status: OK
>>> Nov 21 05:55:34 r730-03 smartd[17215]: Device: /dev/da7 [SAT], ATA error count increased from 4 to 8
>>
>>
>> I believe those errors are related to the connection between the drive
>> and the host -- e.g. cables, connectors, and/or interface chips.  I
>> would replace the cable with a known good cable.
> 
> This drive is in a drive bay. Perhaps a re-seat is called for.


Yes.  I might clean whatever electrical contacts are accessible with a 
cotton swab and rubbing alcohol, then re-seat the connection a couple of 
times to wipe the pins and sleeves.


>> A failing power supply can cause all sorts of problems.  I would check
>> the PSU with a hardware tester.
> 
> I don't have that option. It is a Dell R730 with dual PSU.


Understood.  Do the PSUs and/or the server have PSU test buttons and/or 
status LEDs?


>>> Followed by this from time to time:
>>>
>>> Nov 21 16:55:33 r730-03 smartd[17215]: Device: /dev/da7 [SAT], Self-Test Log error count increased from 0 to 1
>>> Nov 22 11:25:35 r730-03 smartd[17215]: Device: /dev/da7 [SAT], 1 Currently unreadable (pending) sectors
>>
>>
>> STFW I found a good explanation for pending sectors:
>>
>> https://superuser.com/questions/384095/how-to-force-a-remap-of-sectors-reported-in-s-m-a-r-t-c5-current-pending-sector
>>
>>
>> If you can identify the address (LBA) of the bad sector, you could use
>> dd(1) to overwrite the bad sector.  If the drive is in an operating
>> pool, this could be risky.  Shutting down and using live media would be
>> safer.  In either case, you will want to scrub afterwards.
> 
> Sounds like RMA is much easier. ;)


If the warranty covers "unreadable (pending) sectors", perhaps so.


Otherwise, I think failing sectors on magnetic HDDs have become a fact 
of life, given that disk drives have become so large and contain so 
many sectors.  With ZFS, sufficient redundancy, regular scrubs, and 
system administrator intervention, there should be no data loss as long 
as the quantity and frequency of failed sectors stay small enough. 
Continued use of such drives may be justified.  Of course, continue to 
back up and archive regularly.
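One way to judge whether the quantity and frequency of failed sectors 
stay "small enough" is to track the smartd(8) messages over time.  Below 
is a minimal Python sketch, assuming smartd logs to syslog in the format 
quoted earlier in this thread.  The regexes and the helper names 
`summarize` and `is_growing` are hypothetical, not part of smartmontools.

```python
import re
from collections import defaultdict

# Hypothetical patterns for the smartd(8) syslog messages quoted in this thread.
PENDING_RE = re.compile(
    r"Device: (\S+) \[\w+\], (\d+) Currently unreadable \(pending\) sectors")
ATA_ERR_RE = re.compile(
    r"Device: (\S+) \[\w+\], ATA error count increased from \d+ to (\d+)")


def summarize(lines):
    """Collect pending-sector and ATA error counts per device, in log order."""
    summary = defaultdict(lambda: {"pending": [], "ata_errors": []})
    for line in lines:
        m = PENDING_RE.search(line)
        if m:
            summary[m.group(1)]["pending"].append(int(m.group(2)))
            continue
        m = ATA_ERR_RE.search(line)
        if m:
            # Keep the new ("to") value of the ATA error counter.
            summary[m.group(1)]["ata_errors"].append(int(m.group(2)))
    return dict(summary)


def is_growing(counts):
    """True if the latest count exceeds the first one recorded."""
    return len(counts) >= 2 and counts[-1] > counts[0]


if __name__ == "__main__":
    log = [
        "Nov 21 05:55:34 r730-03 smartd[17215]: Device: /dev/da7 [SAT], "
        "ATA error count increased from 4 to 8",
        "Nov 22 11:25:35 r730-03 smartd[17215]: Device: /dev/da7 [SAT], "
        "1 Currently unreadable (pending) sectors",
    ]
    print(summarize(log))
```

What counts as "small enough" remains the administrator's call; a 
steadily growing pending count is the usual sign to replace the drive.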


> There is a replacement drive here now. I'm just waiting for other hardware to arrive. All the drive bays are full. I'm going to move 2x 2.5" drives to the read via PCIe slots.


What is "read via PCIe slots"?  Please clarify.


> That will allow me to install the new drive, add it as a replacement to the mirror. When resilvered, the old drive will be dropped out of the filesystem.
> 
> Then I can play with zeroing the whole drive. 


I would add the replacement drive to the pool, allow it to resilver, 
remove the drive in question from the pool, physically remove it, and 
put it into a workbench machine for testing and troubleshooting.  I 
would overwrite the problematic sector and then run a SMART long test.
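The kernel messages at the top of this thread include the READ(10) CDBs 
of the failed requests; bytes 2-5 hold the starting LBA and bytes 7-8 
the transfer length in blocks.  Below is a minimal Python sketch that 
decodes them and builds (but does not run) a dd(1) command.  It assumes 
a 512-byte logical sector size and that mrsas(4) passes LBAs through 
unchanged; `decode_read10` and `dd_overwrite_cmd` are hypothetical 
helpers.  The CDB only gives the start of the failed request, so the 
bad sector may be anywhere in that range; after a failed self-test, 
`smartctl -l selftest` reports LBA_of_first_error, which is more precise.

```python
# Decode a SCSI READ(10) CDB as logged by CAM and build (but do not run)
# a dd(1) command.  Assumes 512-byte logical sectors.


def decode_read10(cdb_hex):
    """Return (lba, num_blocks) for a READ(10) CDB like '28 00 aa d9 5b 5f 00 00 20 00'."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")         # bytes 2-5: starting LBA
    num_blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length
    return lba, num_blocks


def dd_overwrite_cmd(device, lba, sector_size=512, count=1):
    """Return a dd(1) command line that would write zeros over `count` sectors at `lba`.

    DESTRUCTIVE if run.  sector_size must match the drive's logical sector size.
    """
    return (f"dd if=/dev/zero of={device} bs={sector_size} "
            f"seek={lba} count={count}")


if __name__ == "__main__":
    lba, n = decode_read10("28 00 aa d9 5b 5f 00 00 20 00")
    print(lba, n)  # start of the failed request and its length in blocks
    print(dd_overwrite_cmd("/dev/da7", lba))
```

For the first CDB above this yields LBA 2866371423 and a length of 32 
blocks.  Only run such a command once the drive is out of the pool; 
`smartctl -t long` on the device then starts the long self-test.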


> If energetic, I may then add the drive back as a single drive filesystem (for testing purposes). Then fill it up with data and see how that goes.
> 
> Thank you.


YW.  Let us know how it turns out.


David