Date: Tue, 12 Dec 2017 19:21:53 +0100
From: "O. Hartmann"
To: FreeBSD CURRENT
Subject: SMART: disk problems on RAIDZ1 pool: (ada6:ahcich6:0:0:0): CAM status: ATA Status Error
Message-ID: <20171212192220.119ca2d3@thor.intern.walstatt.dynvpn.de>
Organization: WALSTATT

Hello,

running CURRENT (recent r326769), I noticed that smartd sends some console
messages when booting the box:

[...]
Dec 12 14:14:33 <3.2> box1 smartd[68426]: Device: /dev/ada6, 1 Currently unreadable (pending) sectors
Dec 12 14:14:33 <3.2> box1 smartd[68426]: Device: /dev/ada6, 1 Offline uncorrectable sectors
[...]

Checking the drive's SMART log with smartctl (it is one of four 3 TB disk
drives), I get the following information:

[... smartctl -x /dev/ada6 ...]
Error 42 [17] occurred at disk power-on lifetime: 25335 hours (1055 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 c2 7a 72 98 40 00  Error: UNC at LBA = 0xc27a7298 = 3262804632

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 b0 00 88 00 00 c2 7a 73 20 40 08     23:38:12.195  READ FPDMA QUEUED
  60 00 b0 00 80 00 00 c2 7a 72 70 40 08     23:38:12.195  READ FPDMA QUEUED
  2f 00 00 00 01 00 00 00 00 00 10 40 08     23:38:12.195  READ LOG EXT
  60 00 b0 00 70 00 00 c2 7a 73 20 40 08     23:38:09.343  READ FPDMA QUEUED
  60 00 b0 00 68 00 00 c2 7a 72 70 40 08     23:38:09.343  READ FPDMA QUEUED
[...]

and

[...]
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    64
  3 Spin_Up_Time            POS--K   178   170   021    -    6075
  4 Start_Stop_Count        -O--CK   098   098   000    -    2406
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         -OSR-K   200   200   000    -    0
  9 Power_On_Hours          -O--CK   066   066   000    -    25339
 10 Spin_Retry_Count        -O--CK   100   100   000    -    0
 11 Calibration_Retry_Count -O--CK   100   100   000    -    0
 12 Power_Cycle_Count       -O--CK   098   098   000    -    2404
192 Power-Off_Retract_Count -O--CK   200   200   000    -    154
193 Load_Cycle_Count        -O--CK   001   001   000    -    2055746
194 Temperature_Celsius     -O---K   122   109   000    -    28
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    1
198 Offline_Uncorrectable   ----CK   200   200   000    -    1
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   200   200   000    -    5
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning
[...]
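
As a cross-check of the figures in the smartctl output above, here is a
minimal Python sketch. It only re-derives numbers already quoted there: the
failing LBA from the register bytes, the power-on lifetime in days, and
whether the failing LBA lies inside one of the queued reads. The helper name
join_bytes is made up for this sketch, and reading the FEATR bytes as the
sector count of a READ FPDMA QUEUED command is an assumption about the NCQ
field layout, not something the log states.

```python
# Cross-check of figures quoted in the smartctl output above.
# All constants are copied from that output; the helper name is
# hypothetical and not part of smartmontools.


def join_bytes(hex_bytes):
    """Join register bytes (most significant first) into one integer."""
    return int("".join(hex_bytes), 16)


# LBA_48 + LH/LM/LL bytes of the error register dump: 00 00 c2 7a 72 98
error_lba = join_bytes(["00", "00", "c2", "7a", "72", "98"])

# Power-on lifetime at the time of the error, split into days and hours.
days, hours = divmod(25335, 24)

# Second READ FPDMA QUEUED row: 60 00 b0 00 80 00 00 c2 7a 72 70 40 08.
# Assumption: for this NCQ command the FEATR bytes carry the sector count.
read_start = join_bytes(["00", "00", "c2", "7a", "72", "70"])
read_count = join_bytes(["00", "b0"])
in_range = read_start <= error_lba < read_start + read_count

print(f"error LBA: {hex(error_lba)} = {error_lba}")
print(f"power-on lifetime: {days} days + {hours} hours")
print(f"error LBA inside queued read of {read_count} sectors: {in_range}")
```

The first two results agree with what smartctl printed (0xc27a7298 =
3262804632, and 1055 days + 15 hours). Under the stated assumption, the
unreadable sector also falls inside the read that started at LBA 0xc27a7270.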

The ZFS pool is a RAIDZ1 consisting of three WD Green 3 TB HDDs and one WD RED
3 TB HDD. The failure occurred on one of the WD Green 3 TB HDDs.

The pool is marked as "resilvered" - I scrub on a regular basis, and the
"resilvering" message has now appeared for the second time in a row. Searching
the net on SMART attribute 197 errors (in my case the raw value is one), the
recommendation, in combination with the problems that have occurred, is that I
should replace the disk.

Well, here comes the problem. The box is built from "electronic waste" made by
ASRock - it is a Socket 1150/IvyBridge board, which got its last firmware/BIOS
update in 2013, and since then UEFI booting FreeBSD from a HDD isn't possible
(just to indicate that I'm aware of having issues with crap, but that is some
other issue right now). The board's SATA connectors are all populated.

So: due to the lack of adequate backup space I can only selectively back up
portions; most of the space is occupied by scientific modelling data which I
have worked on. So a backup exists, in one way or the other. My concern is how
to replace the faulty HDD! Most HowTos assume that a replacement disk is
prepared first and then "replaced" via ZFS's replace command. This isn't
applicable here.

Question: is it possible to simply pull the faulty disk (which implies I know
exactly which one to pull!), then prepare and add the replacement HDD and let
the system do its job resilvering the pool?

Next question: I'm about to replace the 3 TB HDD with a more recent and modern
4 TB HDD (WD RED 4 TB). I'm aware of the fact that I can only use 3 TB of it,
as the other disks are 3 TB, but I'd like to know whether FreeBSD's ZFS is
capable of handling it?

This is the first time I have had issues with ZFS and a faulty drive, so if
some of my questions sound naive, please forgive me.

Thanks in advance,

Oliver

-- 
O.
Hartmann

I object to the use or transmission of my data for advertising purposes or
for market or opinion research (§ 28 Abs. 4 BDSG).