From: Daniel Kalchev
Date: Thu, 14 Dec 2017 00:07:24 +0200
Subject: Re: SMART: disk problems on RAIDZ1 pool: (ada6:ahcich6:0:0:0): CAMstatus: ATA Status Error
To: "O. Hartmann"
Cc: "Rodney W. Grimes", Cy Schubert, "O. Hartmann", FreeBSD CURRENT, Freddie Cash, Alan Somers
In-Reply-To: <20171213203935.270e5f65@thor.intern.walstatt.dynvpn.de>
References: <20171213161116.1889f178@hermann> <201712131647.vBDGlrf2092528@pdx.rh.CN85.dnsmgr.net> <20171213203935.270e5f65@thor.intern.walstatt.dynvpn.de>
List-Id: Discussions about the use of FreeBSD-current

> On 13 Dec 2017, at 21:39, O.
Hartmann wrote:
>
> On Wed, 13 Dec 2017 08:47:53 -0800 (PST)
> "Rodney W. Grimes" wrote:
>
>>> On Tue, 12 Dec 2017 14:58:28 -0800
>>> Cy Schubert wrote:
>>>
>>>> There are a couple of ways you can address this. You'll need to
>>>> offline the vdev first. If you've done a smartctl -t long and if the
>>>> test failed, smartctl -a will tell you which block it had an issue
>>>> with. You can use dd, ddrescue or dd_rescue to dd the block over
>>>> itself. The drive may rewrite the (weak) block or, if it fails to, it
>>>> will remap it (subsequently showing as reallocated).
>>>>
>>>> Of course there is a risk. If the sector is any of the boot blocks
>>>> there is a good chance the server will hang.
>>>
>>> The drive is part of a dedicated storage-only pool. The boot drive is a
>>> fast SSD. So I do not care about this - well, to say it more politely:
>>> I do not have to take care of that aspect.
>>>
>>>>
>>>> You have to be *absolutely* sure which sector is the bad one. And there
>>>> may be more. There is a risk of data loss.
>>>>
>>>> I've used this technique many times. Most times it works perfectly.
>>>> Other times the affected file is lost but the rest of the file system
>>>> is recovered. And again there is always the risk.
>>>>
>>>> Replace the disk immediately if you experience a growing succession
>>>> of pending sectors. Otherwise replace the disk at your earliest
>>>> convenience.
>>>
>>> The ZFS scrubbing of the volume ended this morning, leaving the pool in
>>> a healthy state. After reboot, there was no sign of CAM errors again.
>>>
>>> But there is something else I'm worried about. The mainboard I use is an
>>> ASRock Z77 Pro4-M.
>>> The board has a crippled Intel MCP with 6 SATA ports from the chipset,
>>> two of them SATA 6 Gb/s, 4 SATA II, and one additional chip with two SATA
>>> 6 Gb/s ports:
>>>
>>> [...]
>>> ahci0@pci0:2:0:0: class=0x010601 card=0x06121849 chip=0x06121b21 rev=0x01 hdr=0x00
>>>     vendor     = 'ASMedia Technology Inc.'
>>>     device     = 'ASM1062 Serial ATA Controller'
>>>     class      = mass storage
>>>     subclass   = SATA
>>>     bar   [10] = type I/O Port, range 32, base 0xe050, size 8, enabled
>>>     bar   [14] = type I/O Port, range 32, base 0xe040, size 4, enabled
>>>     bar   [18] = type I/O Port, range 32, base 0xe030, size 8, enabled
>>>     bar   [1c] = type I/O Port, range 32, base 0xe020, size 4, enabled
>>>     bar   [20] = type I/O Port, range 32, base 0xe000, size 32, enabled
>>>     bar   [24] = type Memory, range 32, base 0xf7b00000, size 512, enabled
>>> [...]
>>>
>>> Attached to that ASM1062 SATA chip is a backup drive via an eSATA
>>> connector, a WD 4 TB RED drive. It seems that whenever I attach this drive
>>> and it is online, I experience problems on the ZFS pool, which is
>>> attached to the MCP SATA ports.
>>
>> How does this external drive get its power? Are the earth grounds of
>> both the system and the external drive power supply closely tied
>> together? A plug/unplug event with a slight ground creep can
>> wreak havoc with device operation.
>
> The external drive is housed in an external casing. Its PSU has de facto the
> same "grounding" (earth ground) as the server's PSU; they share the same power
> plug at the point where the plug comes out of the wall, so to speak.

Most external drive power supplies are not grounded. At least none I ever saw
had grounded plugs for the mains cable. Yours might have one... Worth checking
anyway.

Daniel
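
For reference, the "dd the block over itself" step that Cy describes in the
quoted text could look roughly like the sketch below. It only builds the dd
command line and does not run it. The helper name dd_rewrite_command, the
512-byte logical sector size and the LBA value are assumptions made for
illustration; only the device name ada6 comes from the subject line. The real
LBA would have to be taken from the smartctl -a self-test log, and the vdev
offlined first, as Cy says.

```python
def dd_rewrite_command(device, lba, sector_size=512):
    """Build (but do not run) a dd command that copies one sector onto itself.

    skip= and seek= are counted in blocks of bs= bytes, so with bs set to the
    logical sector size they can be given the LBA directly.
    """
    if lba < 0:
        raise ValueError("lba must be non-negative")
    if sector_size <= 0:
        raise ValueError("sector_size must be positive")
    return [
        "dd",
        f"if={device}",      # read from the affected disk
        f"of={device}",      # write back to the same disk
        f"bs={sector_size}", # one logical sector per block (assumed 512 bytes)
        "count=1",           # touch exactly one sector
        f"skip={lba}",       # input offset, in blocks
        f"seek={lba}",       # output offset, in blocks
    ]


if __name__ == "__main__":
    # Hypothetical LBA; substitute the value reported by the failed self-test.
    print(" ".join(dd_rewrite_command("/dev/ada6", 123456789)))
```

Printing the command instead of executing it leaves room to double-check the
device and the sector number, which matters given the warning above about
being absolutely sure which sector is bad.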