Date: Tue, 12 Dec 2017 19:21:53 +0100
From: "O. Hartmann" <ohartmann@walstatt.org>
To: FreeBSD CURRENT <freebsd-current@freebsd.org>
Subject: SMART: disk problems on RAIDZ1 pool: (ada6:ahcich6:0:0:0): CAM status: ATA Status Error
Message-ID: <20171212192220.119ca2d3@thor.intern.walstatt.dynvpn.de>
Hello,
running CURRENT (recent r326769), I noticed that smartd sends some console
messages when booting the box:
[...]
Dec 12 14:14:33 <3.2> box1 smartd[68426]: Device: /dev/ada6, 1 Currently unreadable (pending) sectors
Dec 12 14:14:33 <3.2> box1 smartd[68426]: Device: /dev/ada6, 1 Offline uncorrectable sectors
[...]
Checking the drive's SMART log with smartctl (it is one of four 3 TB disk
drives), I gathered this information:
[... smartctl -x /dev/ada6 ...]
Error 42 [17] occurred at disk power-on lifetime: 25335 hours (1055 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER -- ST COUNT LBA_48 LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 c2 7a 72 98 40 00 Error: UNC at LBA = 0xc27a7298 = 3262804632
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
60 00 b0 00 88 00 00 c2 7a 73 20 40 08 23:38:12.195 READ FPDMA QUEUED
60 00 b0 00 80 00 00 c2 7a 72 70 40 08 23:38:12.195 READ FPDMA QUEUED
2f 00 00 00 01 00 00 00 00 00 10 40 08 23:38:12.195 READ LOG EXT
60 00 b0 00 70 00 00 c2 7a 73 20 40 08 23:38:09.343 READ FPDMA QUEUED
60 00 b0 00 68 00 00 c2 7a 72 70 40 08 23:38:09.343 READ FPDMA QUEUED
[...]
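As a cross-check of the error record above, the six LBA_48 register bytes can be combined into the reported sector number, and the power-on hours converted to days. This is only a minimal Python sketch: the helper name `lba48_from_registers` is made up for illustration, and the byte order (most significant byte first, as printed by smartctl) is assumed from the output above.

```python
def lba48_from_registers(hex_bytes):
    """Combine LBA_48 register bytes (most significant first) into an integer."""
    lba = 0
    for byte in hex_bytes:
        lba = (lba << 8) | int(byte, 16)
    return lba


# LBA_48 columns of the error record above: 00 00 c2 7a 72 98
error_lba = lba48_from_registers(["00", "00", "c2", "7a", "72", "98"])
print(hex(error_lba), error_lba)

# Power-on lifetime reported in the error record: 25335 hours
days, hours = divmod(25335, 24)
print(days, "days +", hours, "hours")
```

Both results agree with what smartctl printed: LBA 0xc27a7298 = 3262804632 and 1055 days + 15 hours.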
and
[...]
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    64
  3 Spin_Up_Time            POS--K   178   170   021    -    6075
  4 Start_Stop_Count        -O--CK   098   098   000    -    2406
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         -OSR-K   200   200   000    -    0
  9 Power_On_Hours          -O--CK   066   066   000    -    25339
 10 Spin_Retry_Count        -O--CK   100   100   000    -    0
 11 Calibration_Retry_Count -O--CK   100   100   000    -    0
 12 Power_Cycle_Count       -O--CK   098   098   000    -    2404
192 Power-Off_Retract_Count -O--CK   200   200   000    -    154
193 Load_Cycle_Count        -O--CK   001   001   000    -    2055746
194 Temperature_Celsius     -O---K   122   109   000    -    28
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    1
198 Offline_Uncorrectable   ----CK   200   200   000    -    1
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   200   200   000    -    5
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning
[...]
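The attribute table above can also be scanned mechanically for non-zero raw values on the sector-health attributes. The following is a rough Python sketch, not part of smartmontools: the function name `nonzero_watched` and the choice of watched IDs (5, 196, 197, 198) are assumptions made for illustration, and the input is a hand-copied excerpt of the table.

```python
# Excerpt of the attribute table above (whitespace-separated columns).
SMART_TABLE = """\
5 Reallocated_Sector_Ct PO--CK 200 200 140 - 0
196 Reallocated_Event_Count -O--CK 200 200 000 - 0
197 Current_Pending_Sector -O--CK 200 200 000 - 1
198 Offline_Uncorrectable ----CK 200 200 000 - 1
199 UDMA_CRC_Error_Count -O--CK 200 200 000 - 0
"""

# Attribute IDs to watch; this selection is an assumption for illustration,
# not an official list.
WATCHED_IDS = {5, 196, 197, 198}


def nonzero_watched(table):
    """Return {attribute_name: raw_value} for watched attributes with raw > 0."""
    hits = {}
    for line in table.splitlines():
        fields = line.split()
        # Expect: ID NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
        if len(fields) < 8 or not fields[0].isdigit():
            continue
        attr_id, name, raw = int(fields[0]), fields[1], fields[7]
        if attr_id in WATCHED_IDS and raw.isdigit() and int(raw) > 0:
            hits[name] = int(raw)
    return hits


print(nonzero_watched(SMART_TABLE))
```

For the excerpt above this reports Current_Pending_Sector and Offline_Uncorrectable, each with a raw value of 1, matching the smartd console messages.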
The ZFS pool is a RAIDZ1 consisting of three WD Green 3 TB HDDs and one WD RED
3 TB HDD. The failure occurred on one of the WD Green 3 TB HDDs.
The pool is marked as "resilvered" - I scrub on a regular basis, and the
"resilvering" message has now appeared for the second time in a row. Searching
the net for advice on SMART attribute 197 errors (in my case the count is one),
together with the problems that have occurred, suggests that I should replace
the disk.
Well, here comes the problem. The box is built from "electronic waste" made by
ASRock - it is a Socket 1150/IvyBridge board, which got its last firmware/BIOS
update in 2013, and since then UEFI booting FreeBSD from a HDD isn't possible
(just to indicate that I'm aware of having issues with crap, but that is some
other issue right now). The board's SATA connectors are all populated.
So: due to the lack of adequate backup space I can only selectively back up
portions; most of the space is occupied by scientific modelling data that I
have worked on. So a backup exists, in one way or another. My concern is how to
replace the faulty HDD! Most HowTos describe preparing a replacement disk
alongside the old one and then swapping it in via ZFS's replace command. This
isn't applicable here, because there is no free SATA port for the new disk.
Question: is it possible to simply pull the faulty disk (which implies I know
exactly which one to pull!), then prepare and add the replacement HDD and let
the system do its job resilvering the pool?
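To make the question concrete, the in-place swap asked about here could look roughly like the sequence below. This is only a sketch under assumptions and has not been tried on this box: the pool name "tank" is hypothetical, the replacement disk is assumed to show up again as ada6, and the Python code merely prints the commands rather than running them. It relies on `zpool replace` defaulting the new device to the old one when only one device is given, which is the form meant for a disk that has been physically replaced in the same location.

```python
def in_place_replace_plan(pool, device):
    """Build (without running) the zpool commands for swapping a disk in the same slot."""
    return [
        # Take the failing disk out of service before pulling it.
        ["zpool", "offline", pool, device],
        # (Power down, swap the physical disk on the same SATA port, boot.)
        # With new_device omitted, the disk now at the old location is used.
        ["zpool", "replace", pool, device],
        # Watch the resilver progress.
        ["zpool", "status", pool],
    ]


# "tank" is a hypothetical pool name; ada6 is the device from the logs above.
for cmd in in_place_replace_plan("tank", "ada6"):
    print(" ".join(cmd))
```

During the resilver the RAIDZ1 pool runs without redundancy, so any selective backup would have to be taken before the old disk is pulled.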
Next question: I'm about to replace the 3 TB HDD with a more recent 4 TB HDD
(WD RED 4 TB). I'm aware that I can only use 3 TB of it, as the other disks are
3 TB, but I'd like to know whether FreeBSD's ZFS is capable of handling this.
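The capacity expectation stated above can be written as simple arithmetic: a RAIDZ vdev is limited by its smallest member, so nominal usable space is roughly (number of disks - parity) times the smallest disk. A small Python sketch, with the function name `raidz_usable_tb` invented for illustration and metadata/padding overhead deliberately ignored:

```python
def raidz_usable_tb(disk_sizes_tb, parity=1):
    """Nominal usable capacity of one RAIDZ vdev, ignoring ZFS overhead."""
    smallest = min(disk_sizes_tb)
    return (len(disk_sizes_tb) - parity) * smallest


print(raidz_usable_tb([3, 3, 3, 3]))  # current pool: four 3 TB disks
print(raidz_usable_tb([3, 3, 3, 4]))  # after swapping one disk for a 4 TB model
```

Both calls give the same nominal figure, 9 TB, which reflects that the extra terabyte on the new disk stays unused as long as the other three members are 3 TB.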
This is the first time I have had issues with ZFS and a faulty drive, so if
some of my questions sound naive, please forgive me.
Thanks in advance,
Oliver
-- 
O. Hartmann
I object to the use or transmission of my data for advertising purposes or for
market or opinion research (§ 28 para. 4 BDSG).
