Date: Tue, 12 Dec 2017 19:21:53 +0100
From: "O. Hartmann" <ohartmann@walstatt.org>
To: FreeBSD CURRENT <freebsd-current@freebsd.org>
Subject: SMART: disk problems on RAIDZ1 pool: (ada6:ahcich6:0:0:0): CAM status: ATA Status Error
Message-ID: <20171212192220.119ca2d3@thor.intern.walstatt.dynvpn.de>
Hello,

running CURRENT (recent r326769), I realised that smartd sends out some console messages when booting the box:

[...]
Dec 12 14:14:33 <3.2> box1 smartd[68426]: Device: /dev/ada6, 1 Currently unreadable (pending) sectors
Dec 12 14:14:33 <3.2> box1 smartd[68426]: Device: /dev/ada6, 1 Offline uncorrectable sectors
[...]

Checking the drive's SMART log with smartctl (it is one of four 3 TB disk drives), I gathered the following information:

[... smartctl -x /dev/ada6 ...]
Error 42 [17] occurred at disk power-on lifetime: 25335 hours (1055 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 c2 7a 72 98 40 00  Error: UNC at LBA = 0xc27a7298 = 3262804632

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 b0 00 88 00 00 c2 7a 73 20 40 08     23:38:12.195  READ FPDMA QUEUED
  60 00 b0 00 80 00 00 c2 7a 72 70 40 08     23:38:12.195  READ FPDMA QUEUED
  2f 00 00 00 01 00 00 00 00 00 10 40 08     23:38:12.195  READ LOG EXT
  60 00 b0 00 70 00 00 c2 7a 73 20 40 08     23:38:09.343  READ FPDMA QUEUED
  60 00 b0 00 68 00 00 c2 7a 72 70 40 08     23:38:09.343  READ FPDMA QUEUED
[...]

and

[...]
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    64
  3 Spin_Up_Time            POS--K   178   170   021    -    6075
  4 Start_Stop_Count        -O--CK   098   098   000    -    2406
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         -OSR-K   200   200   000    -    0
  9 Power_On_Hours          -O--CK   066   066   000    -    25339
 10 Spin_Retry_Count        -O--CK   100   100   000    -    0
 11 Calibration_Retry_Count -O--CK   100   100   000    -    0
 12 Power_Cycle_Count       -O--CK   098   098   000    -    2404
192 Power-Off_Retract_Count -O--CK   200   200   000    -    154
193 Load_Cycle_Count        -O--CK   001   001   000    -    2055746
194 Temperature_Celsius     -O---K   122   109   000    -    28
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    1
198 Offline_Uncorrectable   ----CK   200   200   000    -    1
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   200   200   000    -    5
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning
[...]

The ZFS pool is a RAIDZ1 comprising three WD Green 3 TB HDDs and one WD RED 3 TB HDD. The failure occurred on one of the WD Green 3 TB HDDs.

The pool is reported as "resilvered": I scrub on a regular basis, and the "resilvering" message has now appeared for the second time in a row. Searching the net, the recommendation for SMART attribute 197 errors (in my case there is one), combined with the problems that occurred, is that I should replace the disk.

Well, here comes the problem. The box is built from "electronic waste" made by ASRock: it is a Socket 1150/IvyBridge board whose last firmware/BIOS update dates from 2013, and since then UEFI-booting FreeBSD from an HDD hasn't been possible (just to indicate that I'm aware I'm dealing with crap hardware, but that is a separate issue right now). All of the board's SATA connectors are populated.
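The smartctl excerpts quoted above can be cross-checked mechanically. The following Python sketch is hypothetical (it is not part of smartmontools or FreeBSD): it decodes the 48-bit LBA from the error-register row, finds which logged READ FPDMA QUEUED command covered that LBA, and flags non-zero raw counts for a rule-of-thumb set of sector-health attributes. It assumes the column layout printed in smartctl's header lines and the ATA NCQ convention that, for READ FPDMA QUEUED, the FEATURES field carries the transfer length in sectors; the attribute set in `WATCHED` is likewise an assumption, not smartd's configuration.

```python
# Hypothetical cross-check of the smartctl -x excerpts quoted in the mail.
# Rows are copied from the mail; column positions follow smartctl's headers.

ERROR_ROW = "40 -- 51 00 00 00 00 c2 7a 72 98 40 00"
COMMAND_ROWS = [
    "60 00 b0 00 88 00 00 c2 7a 73 20 40 08",
    "60 00 b0 00 80 00 00 c2 7a 72 70 40 08",
    "2f 00 00 00 01 00 00 00 00 00 10 40 08",
    "60 00 b0 00 70 00 00 c2 7a 73 20 40 08",
    "60 00 b0 00 68 00 00 c2 7a 72 70 40 08",
]
# Subset of the attribute table (ID NAME FLAGS VALUE WORST THRESH FAIL RAW).
ATTRIBUTES = """\
  1 Raw_Read_Error_Rate     POSR-K  200  200  051  -  64
  5 Reallocated_Sector_Ct   PO--CK  200  200  140  -  0
193 Load_Cycle_Count        -O--CK  001  001  000  -  2055746
197 Current_Pending_Sector  -O--CK  200  200  000  -  1
198 Offline_Uncorrectable   ----CK  200  200  000  -  1
"""
# Assumed rule-of-thumb set; 197 and 198 are the attributes behind the two
# smartd console messages quoted in the mail.
WATCHED = {5, 196, 197, 198}


def lba48(fields):
    """Join six hex register bytes (three high '==' bytes, then LH LM LL)."""
    value = 0
    for byte in fields:
        value = (value << 8) | int(byte, 16)
    return value


def error_lba(row):
    # ER -- ST COUNT(hi lo) LBA_48(3 high bytes) LH LM LL DV DC
    return lba48(row.split()[5:11])


def fpdma_read_range(row):
    """Half-open LBA range [start, end) of a READ FPDMA QUEUED row, else None."""
    f = row.split()
    if f[0] != "60":  # 0x60 = READ FPDMA QUEUED
        return None
    sectors = int(f[1] + f[2], 16)  # assumed: FEATR hi/lo holds the NCQ sector count
    start = lba48(f[5:11])
    return start, start + sectors


def attribute_warnings(text):
    """Flag watched attributes with non-zero raw counts, or VALUE <= non-zero THRESH."""
    out = []
    for line in text.splitlines():
        f = line.split()
        if len(f) < 8 or not f[0].isdigit():
            continue
        attr_id, name = int(f[0]), f[1]
        value, thresh, raw = int(f[3]), int(f[5]), int(f[7])
        if attr_id in WATCHED and raw > 0:
            out.append(f"{attr_id} {name}: raw={raw}")
        if thresh > 0 and value <= thresh:
            out.append(f"{attr_id} {name}: value {value} <= thresh {thresh}")
    return out


if __name__ == "__main__":
    bad = error_lba(ERROR_ROW)
    print(f"failing LBA: {bad:#x} = {bad}")
    for row in COMMAND_ROWS:
        rng = fpdma_read_range(row)
        if rng and rng[0] <= bad < rng[1]:
            print(f"  inside read {rng[0]:#x}..{rng[1]:#x}, offset {bad - rng[0]} sectors")
    for warning in attribute_warnings(ATTRIBUTES):
        print(warning)
```

Run against the rows from the mail, it reports the failing LBA as 0xc27a7298 = 3262804632, which lies 40 sectors into the 176-sector (0xb0) read starting at 0xc27a7270. That read appears twice in the log, at 23:38:09.343 and 23:38:12.195. The sketch also flags attributes 197 and 198 with a raw count of 1, matching the two smartd messages.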
So, due to the lack of adequate backup space, I can only selectively back up portions; most of the space is occupied by scientific modelling data I have worked on. So a backup does exist, in one form or another. My concern is how to replace the faulty HDD!

Most HowTos describe preparing the replacement disk alongside the old one and then swapping it in with ZFS's replace command. That isn't applicable here, since there is no free SATA port to attach the new disk to.

Question: is it possible to simply pull the faulty disk (which implies I know exactly which one to pull!), then prepare and add the replacement HDD and let the system do its job resilvering the pool?

Next question: I'm about to replace the 3 TB HDD with a more recent 4 TB HDD (WD RED 4TB). I'm aware that I can only use 3 TB of it, since the other disks are 3 TB, but I'd like to know whether FreeBSD's ZFS can handle that.

This is the first time I have had issues with ZFS and a faulty drive, so if some of my questions sound naive, please forgive me.

Thanks in advance,

Oliver

--
O. Hartmann

I object to the use or transmission of my data for advertising purposes or for market or opinion research (§ 28 Abs. 4 BDSG).