Date:      Tue, 12 Dec 2017 19:21:53 +0100
From:      "O. Hartmann" <ohartmann@walstatt.org>
To:        FreeBSD CURRENT <freebsd-current@freebsd.org>
Subject:   SMART: disk problems on RAIDZ1 pool: (ada6:ahcich6:0:0:0): CAM status: ATA Status Error
Message-ID:  <20171212192220.119ca2d3@thor.intern.walstatt.dynvpn.de>

Hello,

running CURRENT (recent r326769), I realised that smartd sends out some console
messages when booting the box:

[...]
Dec 12 14:14:33 <3.2> box1 smartd[68426]: Device: /dev/ada6, 1 Currently unreadable (pending) sectors
Dec 12 14:14:33 <3.2> box1 smartd[68426]: Device: /dev/ada6, 1 Offline uncorrectable sectors
[...]

Checking the drive's SMART log with smartctl (it is one of four 3 TB disk drives), I
gathered this information:

[... smartctl -x /dev/ada6 ...]
Error 42 [17] occurred at disk power-on lifetime: 25335 hours (1055 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 c2 7a 72 98 40 00  Error: UNC at LBA = 0xc27a7298 = 3262804632

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 b0 00 88 00 00 c2 7a 73 20 40 08     23:38:12.195  READ FPDMA QUEUED
  60 00 b0 00 80 00 00 c2 7a 72 70 40 08     23:38:12.195  READ FPDMA QUEUED
  2f 00 00 00 01 00 00 00 00 00 10 40 08     23:38:12.195  READ LOG EXT
  60 00 b0 00 70 00 00 c2 7a 73 20 40 08     23:38:09.343  READ FPDMA QUEUED
  60 00 b0 00 68 00 00 c2 7a 72 70 40 08     23:38:09.343  READ FPDMA QUEUED
[...]
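
The numbers in the error entry above can be cross-checked by hand. The following
is only a small sketch: it rebuilds the LBA from the LBA_48 register bytes and
re-derives the "days + hours" figure. The 512-byte logical sector size is an
assumption (it is not shown in the excerpt) and only affects the byte offset.

```python
# Values copied from the smartctl error entry quoted above.
LBA_48_BYTES = "00 00 c2 7a 72 98"   # LBA_48 column, most significant byte first
POWER_ON_HOURS_AT_ERROR = 25335

# Assumption: 512-byte logical sectors (check "Sector Sizes" in `smartctl -i`).
LOGICAL_SECTOR_SIZE = 512


def lba_from_registers(hex_bytes: str) -> int:
    """Combine space-separated hex bytes (MSB first) into one integer LBA."""
    return int(hex_bytes.replace(" ", ""), 16)


lba = lba_from_registers(LBA_48_BYTES)
byte_offset = lba * LOGICAL_SECTOR_SIZE
days, hours = divmod(POWER_ON_HOURS_AT_ERROR, 24)

print(f"LBA           = {lba} (0x{lba:x})")
print(f"byte offset   = {byte_offset} (assuming {LOGICAL_SECTOR_SIZE}-byte sectors)")
print(f"power-on time = {days} days + {hours} hours")
```

Both results agree with what smartctl printed: LBA 3262804632 and 1055 days + 15
hours.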

and

[...]
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    64
  3 Spin_Up_Time            POS--K   178   170   021    -    6075
  4 Start_Stop_Count        -O--CK   098   098   000    -    2406
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         -OSR-K   200   200   000    -    0
  9 Power_On_Hours          -O--CK   066   066   000    -    25339
 10 Spin_Retry_Count        -O--CK   100   100   000    -    0
 11 Calibration_Retry_Count -O--CK   100   100   000    -    0
 12 Power_Cycle_Count       -O--CK   098   098   000    -    2404
192 Power-Off_Retract_Count -O--CK   200   200   000    -    154
193 Load_Cycle_Count        -O--CK   001   001   000    -    2055746
194 Temperature_Celsius     -O---K   122   109   000    -    28
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    1
198 Offline_Uncorrectable   ----CK   200   200   000    -    1
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   200   200   000    -    5
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

[...]
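
To make the relevant rows of the table stand out, here is a minimal sketch that
parses a few attribute lines (copied from the output above) and lists those
with a non-zero raw value. The choice of attribute IDs to watch (5, 196, 197,
198) is my own assumption about what matters for media errors, not something
smartctl defines.

```python
# Rows copied verbatim from the smartctl attribute table above.
SMART_ROWS = """\
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    1
198 Offline_Uncorrectable   ----CK   200   200   000    -    1
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
"""

# Assumption: attribute IDs considered indicators of media problems.
WATCHED_IDS = {5, 196, 197, 198}


def parse_rows(text):
    """Parse 'ID NAME FLAGS VALUE WORST THRESH FAIL RAW' lines into dicts."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 8 or not fields[0].isdigit():
            continue  # skip headers, legends and anything unexpected
        attr_id, name, _flags, value, _worst, thresh, _fail, raw = fields
        rows.append({
            "id": int(attr_id),
            "name": name,
            "value": int(value),
            "thresh": int(thresh),
            "raw": int(raw),
        })
    return rows


def suspicious(rows):
    """Return watched attributes whose raw value is greater than zero."""
    return [r for r in rows if r["id"] in WATCHED_IDS and r["raw"] > 0]


for row in suspicious(parse_rows(SMART_ROWS)):
    print(f"{row['id']:>3} {row['name']}: raw={row['raw']}")
```

For the data above this reports attributes 197 and 198, matching the two
messages smartd logged at boot.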

The ZFS pool is a RAIDZ1 made up of three WD Green 3 TB HDDs and one WD Red 3 TB HDD. The
failure occurred on one of the WD Green 3 TB HDDs.

The pool is marked as "resilvered" - I scrub on a regular basis and the
"resilvering" message has now appeared for the second time in a row. Searching the net
for advice on SMART attribute 197 errors (in my case the raw value is one) suggests
that, in combination with the problems that occurred, I should replace the disk.

Well, here comes the problem. The box is built from "electronic waste" made by
ASRock - it is a Socket 1150/IvyBridge board, which got its last firmware/BIOS update
in 2013, and since then UEFI booting FreeBSD from an HDD isn't possible (just to indicate
that I'm aware of having issues with crap, but that is a separate issue right now). The
board's SATA connectors are all populated.

So: Due to the lack of adequate backup space I can only selectively back up portions; most
of the space is occupied by scientific modelling data which I have worked on. So a backup
exists, in one way or another. My concern is how to replace the faulty HDD! Most
HowTos assume that a replacement disk is prepared and attached first and then swapped in
via ZFS's replace command. This isn't applicable here, since there is no free SATA port.

Question: is it possible to simply pull the faulty disk (which implies I know exactly which
one to pull!), then prepare and add the replacement HDD and let the system do its job
resilvering the pool?
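
To make the question concrete, this is the sequence I have in mind, written as
a sketch that only prints the commands and runs nothing. The pool name `tank`
is hypothetical, and the device may have to be given by its GPT label or gptid
instead of `ada6`, depending on how the pool was created. According to
zpool(8), `zpool replace` with only the old device named replaces it with
whatever now sits at the same path, which is the case when the new disk goes
into the same SATA port.

```python
import shlex


def replacement_plan(pool, old_dev, new_dev=None):
    """Return the zpool commands for a same-slot disk swap, in order.

    Nothing is executed here. The physical swap (power down, exchange the
    disk, boot again) happens between the 'offline' and 'replace' steps.
    """
    replace_cmd = ["zpool", "replace", pool, old_dev]
    if new_dev is not None:
        # Only needed when the new disk appears under a different name.
        replace_cmd.append(new_dev)
    return [
        ["zpool", "status", pool],            # confirm which device is affected
        ["zpool", "offline", pool, old_dev],  # take the faulty disk out of service
        replace_cmd,                          # after the swap: start resilvering
        ["zpool", "status", pool],            # watch the resilver progress
    ]


# 'tank' is a placeholder pool name; 'ada6' is the device from the logs above.
for cmd in replacement_plan("tank", "ada6"):
    print(shlex.join(cmd))
```

With RAIDZ1 the pool has no redundancy left while the disk is offline and
during the resilver, so a second failure in that window would be fatal.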

Next question: I'm about to replace the 3 TB HDD with a more recent 4 TB
HDD (WD Red 4 TB). I'm aware of the fact that I can only use 3 TB of it as the other disks
are 3 TB, but I'd like to know whether FreeBSD's ZFS is capable of handling that.
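
My expectation about the capacity, as a rough sketch: a RAIDZ vdev treats every
member as if it had the size of the smallest one, and one disk's worth of space
goes to parity. The function below is only back-of-the-envelope arithmetic; it
ignores metadata, padding and the TB/TiB difference.

```python
def raidz_usable_tb(disk_sizes_tb, parity=1):
    """Rough usable capacity of one RAIDZ vdev, in the same unit as the input.

    Every member counts as the smallest disk; 'parity' disks' worth of space
    is used for redundancy. Overheads are deliberately ignored.
    """
    if len(disk_sizes_tb) <= parity:
        raise ValueError("need more disks than parity devices")
    return (len(disk_sizes_tb) - parity) * min(disk_sizes_tb)


before = raidz_usable_tb([3, 3, 3, 3])   # current pool: four 3 TB disks
after = raidz_usable_tb([3, 3, 3, 4])    # one disk swapped for a 4 TB model

print(f"before: ~{before} TB usable, after: ~{after} TB usable")
```

So the extra 1 TB on the new disk would simply stay unused as long as the other
three members are 3 TB disks.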

This is the first time I have had issues with ZFS and a faulty drive, so if some of my
questions sound naive, please forgive me.

Thanks in advance,

Oliver

-- 
O. Hartmann

I object to the use or transmission of my data for
advertising purposes or for market or opinion research (§ 28 Abs. 4 BDSG).



