From owner-freebsd-current@freebsd.org  Tue Dec 12 18:22:36 2017
Return-Path: <owner-freebsd-current@freebsd.org>
Delivered-To: freebsd-current@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9E721EA1D2C
 for <freebsd-current@mailman.ysv.freebsd.org>;
 Tue, 12 Dec 2017 18:22:36 +0000 (UTC)
 (envelope-from ohartmann@walstatt.org)
Received: from mout.gmx.net (mout.gmx.net [212.227.15.19])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client CN "mout.gmx.net", Issuer "TeleSec ServerPass DE-2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 17B3469FED
 for <freebsd-current@freebsd.org>; Tue, 12 Dec 2017 18:22:35 +0000 (UTC)
 (envelope-from ohartmann@walstatt.org)
Received: from thor.intern.walstatt.dynvpn.de ([85.182.112.82]) by
 mail.gmx.com (mrgmx001 [212.227.17.190]) with ESMTPSA (Nemesis) id
 0Me8di-1efuU12t2M-00PwKX for <freebsd-current@freebsd.org>; Tue, 12 Dec 2017
 19:22:27 +0100
Date: Tue, 12 Dec 2017 19:21:53 +0100
From: "O. Hartmann" <ohartmann@walstatt.org>
To: FreeBSD CURRENT <freebsd-current@freebsd.org>
Subject: SMART: disk problems on RAIDZ1 pool: (ada6:ahcich6:0:0:0): CAM
 status: ATA Status Error
Message-ID: <20171212192220.119ca2d3@thor.intern.walstatt.dynvpn.de>
Organization: WALSTATT
User-Agent: OutScare 3.1415926
X-Operating-System: ImNotAnOperatingSystem 3.141592527
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha512;
 boundary="Sig_/MAPZTDQbOAZbQS7jpzj0YDm"; protocol="application/pgp-signature"
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.25
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
 <freebsd-current.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-current>, 
 <mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current/>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-current>, 
 <mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 12 Dec 2017 18:22:36 -0000

--Sig_/MAPZTDQbOAZbQS7jpzj0YDm
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Hello,

Running CURRENT (recent r326769), I noticed that smartd sends some console
messages when booting the box:

[...]
Dec 12 14:14:33 <3.2> box1 smartd[68426]: Device: /dev/ada6, 1 Currently unreadable (pending) sectors
Dec 12 14:14:33 <3.2> box1 smartd[68426]: Device: /dev/ada6, 1 Offline uncorrectable sectors
[...]

Checking the drive's SMART log with smartctl (it is one of four 3 TB disk drives), I
get the following information:

[... smartctl -x /dev/ada6 ...]
Error 42 [17] occurred at disk power-on lifetime: 25335 hours (1055 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 c2 7a 72 98 40 00  Error: UNC at LBA = 0xc27a7298 = 3262804632

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 b0 00 88 00 00 c2 7a 73 20 40 08     23:38:12.195  READ FPDMA QUEUED
  60 00 b0 00 80 00 00 c2 7a 72 70 40 08     23:38:12.195  READ FPDMA QUEUED
  2f 00 00 00 01 00 00 00 00 00 10 40 08     23:38:12.195  READ LOG EXT
  60 00 b0 00 70 00 00 c2 7a 73 20 40 08     23:38:09.343  READ FPDMA QUEUED
  60 00 b0 00 68 00 00 c2 7a 72 70 40 08     23:38:09.343  READ FPDMA QUEUED
[...]
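
As a cross-check of the error entry above, the LBA that smartctl prints can be
recomputed from the register dump. This is only a minimal sketch in Python,
assuming the six bytes under LBA_48 and LH LM LL are listed most significant
first, as the quoted line suggests; the helper name lba_from_registers is made
up for illustration.

```python
def lba_from_registers(lba_bytes):
    """Combine LBA register bytes (most significant first) into one integer."""
    lba = 0
    for byte in lba_bytes:
        # Shift what we have so far by one byte and append the next one.
        lba = (lba << 8) | int(byte, 16)
    return lba


# The six LBA bytes from the "registers were" line quoted above.
registers = "00 00 c2 7a 72 98".split()
lba = lba_from_registers(registers)
print(hex(lba), lba)
```

For the line quoted above this prints 0xc27a7298 3262804632, matching the
value smartctl reports.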

and

[...]
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    64
  3 Spin_Up_Time            POS--K   178   170   021    -    6075
  4 Start_Stop_Count        -O--CK   098   098   000    -    2406
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         -OSR-K   200   200   000    -    0
  9 Power_On_Hours          -O--CK   066   066   000    -    25339
 10 Spin_Retry_Count        -O--CK   100   100   000    -    0
 11 Calibration_Retry_Count -O--CK   100   100   000    -    0
 12 Power_Cycle_Count       -O--CK   098   098   000    -    2404
192 Power-Off_Retract_Count -O--CK   200   200   000    -    154
193 Load_Cycle_Count        -O--CK   001   001   000    -    2055746
194 Temperature_Celsius     -O---K   122   109   000    -    28
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    1
198 Offline_Uncorrectable   ----CK   200   200   000    -    1
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   200   200   000    -    5
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

[...]
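
The table above can also be checked mechanically. The following is a rough
Python sketch, not part of smartmontools: it assumes the column layout shown
above (ID, name, flags, VALUE, WORST, THRESH, FAIL, RAW_VALUE with a plain
integer raw value), and the choice of attributes 5, 197 and 198 as the ones to
watch is an assumption made for this example. The function name
nonzero_watched is hypothetical.

```python
import re

# Assumption for this example: sector-related attributes worth watching.
WATCHED = {5, 197, 198}

# Columns: ID, name, flags, VALUE, WORST, THRESH, FAIL, RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S+)\s+(\d+)"
)


def nonzero_watched(table_text):
    """Return {attribute_name: raw_value} for watched attributes with raw > 0."""
    hits = {}
    for line in table_text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header, legend and empty lines do not match
        attr_id, name, raw = int(m.group(1)), m.group(2), int(m.group(8))
        if attr_id in WATCHED and raw > 0:
            hits[name] = raw
    return hits


sample = """\
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    1
198 Offline_Uncorrectable   ----CK   200   200   000    -    1
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
"""
print(nonzero_watched(sample))
```

With the rows above it reports Current_Pending_Sector and
Offline_Uncorrectable with a raw value of 1 each.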

The ZFS pool is a RAIDZ1 made up of three WD Green 3 TB HDDs and one WD Red 3 TB HDD.
The failure occurred on one of the WD Green 3 TB HDDs.

The pool is marked as "resilvered". I scrub on a regular basis, and the
"resilvering" message has now appeared for the second time in a row. What I found
on the net about SMART attribute 197 errors (in my case the count is one) suggests
that, in combination with the problems that occurred, I should replace the disk.

Well, here comes the problem. The box is built from "electronic waste" made by
ASRock: it is a Socket 1150/IvyBridge board that got its last firmware/BIOS update
in 2013, and since then UEFI booting FreeBSD from an HDD has not been possible (just
to indicate that I'm aware of having issues with crap, but that is a separate issue
right now). The board's SATA connectors are all populated.

So: due to the lack of adequate backup space I can only back up selected portions;
most of the space is occupied by scientific modelling data that I have worked on. So
a backup exists, in one way or another. My concern is how to replace the faulty HDD.
Most HowTos describe preparing a replacement disk and then swapping it in via ZFS's
replace command. This isn't applicable here, because all SATA connectors are in use.

Question: is it possible to simply pull the faulty disk (assuming I know exactly
which one to pull!), then prepare and add the replacement HDD and let the system do
its job of resilvering the pool?

Next question: I'm about to replace the 3 TB HDD with a more recent 4 TB HDD (WD Red
4 TB). I'm aware that I can only use 3 TB of it, as the other disks are 3 TB, but
I'd like to know whether FreeBSD's ZFS can handle the mixed sizes.
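
The capacity reasoning can be sketched as a small calculation. This is a
simplification under stated assumptions: a single RAIDZ1 vdev treats every
member as if it had the size of the smallest disk and spends one disk's worth
of space on parity; metadata, padding and TB/TiB differences are ignored. The
function name raidz1_usable_tb is made up for this example.

```python
def raidz1_usable_tb(disk_sizes_tb):
    """Rough usable capacity of one RAIDZ1 vdev (overheads ignored)."""
    if len(disk_sizes_tb) < 2:
        raise ValueError("RAIDZ1 needs more than one disk")
    # Every member counts as the smallest disk; one disk's worth is parity.
    return (len(disk_sizes_tb) - 1) * min(disk_sizes_tb)


print(raidz1_usable_tb([3, 3, 3, 3]))  # pool as it is now
print(raidz1_usable_tb([3, 3, 3, 4]))  # after swapping in the 4 TB disk
```

Both calls give 9, i.e. the extra 1 TB of the new disk stays unused as long as
the other members are 3 TB.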

This is the first time I have had issues with ZFS and a faulty drive, so if some of
my questions sound naive, please forgive me.

Thanks in advance,

Oliver

-- 
O. Hartmann

I object to the use or transfer of my data for advertising purposes or for
market or opinion research (§ 28 Abs. 4 BDSG).

--Sig_/MAPZTDQbOAZbQS7jpzj0YDm
Content-Type: application/pgp-signature
Content-Description: OpenPGP digital signature

-----BEGIN PGP SIGNATURE-----

iLUEARMKAB0WIQQZVZMzAtwC2T/86TrS528fyFhYlAUCWjAeXAAKCRDS528fyFhY
lI+BAf0XT3r8xc0Q7Sk907xI7WlEieVKtoQAGh675oWEUMMSDXWHhTpJNjcqfLfJ
8L1cerPxaJs935Kx9HO/pPDB1chdAf9QExo1rzvExWa7LKU0xKLig3Z9+kCytwdh
avY+STsj2LSW7DJZqUq7H74oLv5wA4XVWakchMR8ffTux93f124p
=ZXU2
-----END PGP SIGNATURE-----

--Sig_/MAPZTDQbOAZbQS7jpzj0YDm--