From: David Christensen
Date: Mon, 2 Dec 2024 21:15:24 -0800
Subject: Re: CAM status: SCSI Status Error
To: questions@freebsd.org
List-Archive: https://lists.freebsd.org/archives/freebsd-questions

On 12/2/24 08:15, Dan Langille wrote:
> On Fri, Nov 22, 2024, at 1:14 PM, David Christensen wrote:
>> On 11/22/24 05:11, Dan Langille wrote:
>>> On FreeBSD 14.1, is this a server issue (e.g. cable/hardware) as opposed to a drive issue?
>>>
>>> Nov 21 05:28:48 r730-03 kernel: (da7:mrsas0:1:7:0): READ(10). CDB: 28 00 aa d9 5b 5f 00 00 20 00
>>> Nov 21 05:28:48 r730-03 kernel: (da7:mrsas0:1:7:0): CAM status: SCSI Status Error
>>> Nov 21 05:28:48 r730-03 kernel: (da7:mrsas0:1:7:0): SCSI status: OK
>>> Nov 21 05:28:54 r730-03 kernel: (da7:mrsas0:1:7:0): READ(10). CDB: 28 00 aa d9 6b 08 00 00 10 00
>>> Nov 21 05:28:54 r730-03 kernel: (da7:mrsas0:1:7:0): CAM status: SCSI Status Error
>>> Nov 21 05:28:54 r730-03 kernel: (da7:mrsas0:1:7:0): SCSI status: OK
>>> Nov 21 05:55:34 r730-03 smartd[17215]: Device: /dev/da7 [SAT], ATA error count increased from 4 to 8
>>
>>
>> I believe those errors are related to the connection between the drive
>> and the host -- e.g. cables, connectors, and/or interface chips.  I
>> would replace the cable with a known good cable.
>
> This drive is in a drive bay. Perhaps a re-seat is called for.

Yes.  I might clean whatever electrical contacts are accessible with a
cotton swab and rubbing alcohol, then re-seat the connection a couple of
times to wipe the pins and sleeves.

>> A failing power supply can cause all sorts of problems.  I would check
>> the PSU with a hardware tester.
>
> I don't have that option. It is a Dell R730 with dual PSU.

Understood.  Do the PSUs and/or server have PSU test buttons and/or
status LEDs?

>>> Followed by this from time to time:
>>>
>>> Nov 21 16:55:33 r730-03 smartd[17215]: Device: /dev/da7 [SAT], Self-Test Log error count increased from 0 to 1
>>> Nov 22 11:25:35 r730-03 smartd[17215]: Device: /dev/da7 [SAT], 1 Currently unreadable (pending) sectors
>>
>>
>> STFW I found a good explanation for pending sectors:
>>
>> https://superuser.com/questions/384095/how-to-force-a-remap-of-sectors-reported-in-s-m-a-r-t-c5-current-pending-sector
>>
>>
>> If you can identify the address (LBA) of the bad sector, you could use
>> dd(1) to overwrite the bad sector.  If the drive is in an operating
>> pool, this could be risky.  Shutting down and using live media would be
>> safer.  In either case, you will want to scrub afterwards.
>
> Sounds like RMA is much easier. ;)

If the warranty covers "unreadable (pending) sectors", perhaps so.
Otherwise, I think failing sectors on magnetic HDDs have become a fact
of life, given that disk drives have become so large and contain so many
sectors.  With ZFS, sufficient redundancy, regular scrubs, and system
administrator intervention, there should be no data loss as long as the
quantity and frequency of failed sectors stay small enough.  Continued
use of such drives may be justified.  Of course, continue to back up and
archive regularly.

> There is a replacement drive here now. I'm just waiting for other hardware to arrive. All the drive bays are full. I'm going to move 2x 2.5" drives to the read via PCIe slots.

What is "read via PCIe slots"?  Please clarify.

> That will allow me to install the new drive, add it as a replacement to the mirror. When resilvered, the old drive will be dropped out of the filesystem.
>
> Then I can play with zeroing the whole drive.

I would add the replacement drive to the pool, allow it to resilver,
remove the drive in question from the pool, physically remove the drive
in question, and put the drive in question into a workbench machine for
testing and troubleshooting.
I would overwrite the problematic sector and then run a SMART long test.

> If energetic, I may then add the drive back as a single drive filesystem (for testing purposes). Then fill it up with data and see how that goes.
>
> Thank you.

YW.  Let us know how it turns out.

David
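
For reference, the block range touched by each failed read can be recovered from the CDB bytes in the kernel log quoted above. The following is a minimal Python sketch, assuming the standard READ(10) layout (opcode 0x28, big-endian LBA in bytes 2-5, big-endian transfer length in bytes 7-8). The function names, the 512-byte sector size, and the /dev/da7 target are illustrative assumptions, not something confirmed in this thread. The dd(1) command is only built as a string, never executed.

```python
def parse_read10(cdb_hex):
    """Decode a READ(10) CDB given as space-separated hex bytes.

    Returns (lba, block_count). Assumes the standard 10-byte layout.
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: starting LBA
    count = int.from_bytes(cdb[7:9], "big")   # bytes 7-8: blocks to read
    return lba, count


def dd_overwrite_command(device, lba, sector_size=512):
    """Build (but do not run) a dd command that zeroes one sector.

    sector_size is an assumption; check the drive's logical sector
    size before using anything like this on real hardware.
    """
    return f"dd if=/dev/zero of={device} bs={sector_size} seek={lba} count=1"


if __name__ == "__main__":
    # CDBs copied from the two READ(10) errors in the log above.
    for cdb in ("28 00 aa d9 5b 5f 00 00 20 00",
                "28 00 aa d9 6b 08 00 00 10 00"):
        lba, count = parse_read10(cdb)
        print(f"read of {count} blocks starting at LBA {lba} "
              f"(last LBA {lba + count - 1})")
    # Hypothetical LBA, shown only to illustrate the command shape.
    print(dd_overwrite_command("/dev/da7", 12345))
```

A failed read only says that the error occurred somewhere within the requested range, so the decoded LBA is a starting point rather than the exact bad sector. The SMART self-test log may report a more precise failing LBA.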