From: "O. Hartmann"
To: "Rodney W. Grimes"
Cc: Cy Schubert, "O. Hartmann", FreeBSD CURRENT, Freddie Cash, Alan Somers
Date: Wed, 13 Dec 2017 20:39:08 +0100
Subject: Re: SMART: disk problems on RAIDZ1 pool: (ada6:ahcich6:0:0:0): CAMstatus: ATA Status Error
Message-ID: <20171213203935.270e5f65@thor.intern.walstatt.dynvpn.de>
In-Reply-To: <201712131647.vBDGlrf2092528@pdx.rh.CN85.dnsmgr.net>
List-Id: Discussions about the use of FreeBSD-current

On Wed, 13 Dec 2017 08:47:53 -0800 (PST)
"Rodney W. Grimes" wrote:

> > On Tue, 12 Dec 2017 14:58:28 -0800
> > Cy Schubert wrote:
> >
> > > There are a couple of ways you can address this. You'll need to
> > > offline the vdev first. If you've done a smartctl -t long and the
> > > test failed, smartctl -a will tell you which block it had an issue
> > > with. You can use dd, ddrescue or dd_rescue to dd the block over
> > > itself. The drive may rewrite the (weak) block or, if that fails,
> > > it will remap it (subsequently showing as reallocated).
> > >
> > > Of course there is a risk. If the sector is any of the boot blocks
> > > there is a good chance the server will hang.
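A minimal Python sketch of the "dd the block over itself" step, assuming 512-byte logical sectors and an LBA taken from the LBA_of_first_error column of the self-test log printed by smartctl -a. The helper name rewrite_sector is hypothetical. It reads one sector with os.pread and writes the same bytes back with os.pwrite at byte offset LBA x sector size, roughly what "dd if=/dev/ada6 of=/dev/ada6 bs=512 count=1 skip=LBA seek=LBA" would do. If the sector cannot be read at all, the sketch raises instead of writing zeros, because overwriting the sector with anything else destroys its contents (the data-loss risk Cy warns about). It assumes the vdev has already been taken offline.

```python
import os

# Assumption: 512-byte logical sectors. Check the real value (e.g. with
# `diskinfo -v /dev/ada6` on FreeBSD) before pointing this at a disk.
SECTOR_SIZE = 512


def rewrite_sector(path, lba, sector_size=SECTOR_SIZE):
    """Read the sector at `lba` and write the identical bytes back in place.

    Hypothetical helper. Rewriting gives the drive a chance to refresh a
    weak sector or, if the write fails internally, to remap it.
    Returns the bytes that were rewritten.
    """
    offset = lba * sector_size  # byte offset of the sector on the device
    fd = os.open(path, os.O_RDWR)
    try:
        # An unreadable sector makes this raise OSError (typically EIO);
        # the sketch deliberately stops there instead of writing zeros.
        data = os.pread(fd, sector_size, offset)
        if len(data) != sector_size:
            raise OSError(f"short read at LBA {lba}: {len(data)} bytes")
        if os.pwrite(fd, data, offset) != sector_size:
            raise OSError(f"short write at LBA {lba}")
        os.fsync(fd)  # ask the OS to flush the write to the device
    finally:
        os.close(fd)
    return data
```

A call would look like rewrite_sector("/dev/ada6", lba), with lba taken from the failed long self-test; the "be absolutely sure which sector" caveat applies unchanged.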
> >
> > The drive is part of a dedicated storage-only pool. The boot drive is a
> > fast SSD. So I do not care about this; well, to put it more politely:
> > I do not have to take care of that aspect.
> >
> > >
> > > You have to be *absolutely* sure which one the bad sector is. And there
> > > may be more. There is a risk of data loss.
> > >
> > > I've used this technique many times. Most times it works perfectly.
> > > Other times the affected file is lost but the rest of the file system
> > > is recovered. And again, there is always the risk.
> > >
> > > Replace the disk immediately if you experience a growing succession
> > > of pending sectors. Otherwise replace the disk at your earliest
> > > convenience.
> >
> > The ZFS scrub of the volume finished this morning, leaving the pool in
> > a healthy state. After a reboot, there was no sign of the CAM errors.
> >
> > But there is something else I'm worried about. The mainboard I use is an
> > ASRock Z77 Pro4-M.
> > The board has a crippled Intel MCP with 6 SATA ports from the chipset,
> > two of them SATA 6 Gb/s and four SATA II, plus one additional chip with
> > two SATA 6 Gb/s ports:
> >
> > [...]
> > ahci0@pci0:2:0:0: class=0x010601 card=0x06121849 chip=0x06121b21 rev=0x01 hdr=0x00
> >     vendor     = 'ASMedia Technology Inc.'
> >     device     = 'ASM1062 Serial ATA Controller'
> >     class      = mass storage
> >     subclass   = SATA
> >     bar   [10] = type I/O Port, range 32, base 0xe050, size 8, enabled
> >     bar   [14] = type I/O Port, range 32, base 0xe040, size 4, enabled
> >     bar   [18] = type I/O Port, range 32, base 0xe030, size 8, enabled
> >     bar   [1c] = type I/O Port, range 32, base 0xe020, size 4, enabled
> >     bar   [20] = type I/O Port, range 32, base 0xe000, size 32, enabled
> >     bar   [24] = type Memory, range 32, base 0xf7b00000, size 512, enabled
> > [...]
> >
> > Attached to that ASM1062 SATA chip via the eSATA connector is a backup
> > drive, a WD 4 TB Red.
> > It seems that whenever I attach this drive
> > and it is online, I experience problems on the ZFS pool, which is
> > attached to the MCP SATA ports.
>
> How does this external drive get its power? Are the earth grounds of
> both the system and the external drive power supply closely tied
> together? A plug/unplug event with a slight ground creep can
> wreak havoc with device operation.

The external drive is housed in an external enclosure. Its PSU effectively
shares the same earth ground as the server's PSU: both are plugged in at
the same wall outlet, so to speak.

>
> > Is this possible? I mean, as I asked before, weird/defective cabling
> > would trigger a different error pattern (CRC errors). Since the
> > external drive is physically decoupled and cannot transmit
> > vibrations, bad sector errors seem unlikely to me. But this
> > is simply the thought of someone without special knowledge of the
> > physics of HDDs.
>
> Even if left cabled, does this drive get powered up/down?

The drive is cabled (eSATA) all the time, but is switched off for long
periods (4 to 8 weeks, it depends; I switch it on for scrubbing or for
backing up important data).

>
> > I think people responding to my thread made it clear that the WD Green
> > isn't the first-choice solution for a 20/6 (not 24/7) duty drive, and
> > given that the drives have now been in service for more than 25000
> > hours, it would be wise to replace them with alternatives.
>
> I think someone had an apm command that turns off the head park;
> that would do wonders for drive life. On the other hand, I think
> if it was my data and I saw that the drive had 2M head load cycles
> I would be looking to get off that drive any data I could
> not easily replace. If it was well backed up or easily replaced
> my worries would be less.
>
> ... 275 lines removed ...
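To track the two warning signs raised in this thread, a growing number of pending sectors and a high head load cycle count, one could parse the attribute table that smartctl -A prints. This is an illustrative sketch under assumptions: the function names are hypothetical, it relies on the usual ten-column layout (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE), and attribute names such as Current_Pending_Sector (197), Power_On_Hours (9) and Load_Cycle_Count (193) may differ between vendors and smartmontools versions. As a worked example of the rate, 2,000,000 load cycles over 25,000 power-on hours would be 2,000,000 / 25,000 = 80 cycles per hour.

```python
import re


def parse_smart_attributes(text):
    """Map ATTRIBUTE_NAME to the leading integer of RAW_VALUE in `smartctl -A` output."""
    attrs = {}
    for line in text.splitlines():
        # Split into at most 10 fields so a RAW_VALUE such as
        # "35 (Min/Max 20/45)" stays in one piece.
        parts = line.split(None, 9)
        # Attribute rows start with a numeric ID; headers and banners do not.
        if len(parts) < 10 or not parts[0].isdigit():
            continue
        match = re.match(r"\d+", parts[9])
        if match:
            attrs[parts[1]] = int(match.group())
    return attrs


def wear_summary(attrs):
    """Summarize sector health and head-parking wear from parsed attributes."""
    hours = attrs.get("Power_On_Hours", 0)
    cycles = attrs.get("Load_Cycle_Count", 0)
    return {
        "pending_sectors": attrs.get("Current_Pending_Sector", 0),
        "reallocated_sectors": attrs.get("Reallocated_Sector_Ct", 0),
        # None when no power-on hours were reported, to avoid dividing by 0.
        "load_cycles_per_hour": cycles / hours if hours else None,
    }


def pending_sectors_growing(earlier, later):
    """True if Current_Pending_Sector rose between two parsed snapshots."""
    return (later.get("Current_Pending_Sector", 0)
            > earlier.get("Current_Pending_Sector", 0))
```

With two snapshots taken, say, before and after a scrub, pending_sectors_growing(before, after) flags the "growing succession of pending sectors" case that Cy suggests should trigger an immediate replacement.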
As stated, I'm already prepared to replace the drive(s), one by one.
Hopefully, ZFS will be as reliable for me as it has been for others ;-)

Kind regards,

Oliver

-- 
O. Hartmann

I object to the use or transfer of my data for advertising purposes or
for market or opinion research (§ 28 Abs. 4 BDSG).