From owner-freebsd-current@freebsd.org  Wed Dec 13 22:07:37 2017
Content-Type: text/plain;
	charset=us-ascii
Mime-Version: 1.0 (1.0)
Subject: Re: SMART: disk problems on RAIDZ1 pool: (ada6:ahcich6:0:0:0):
 CAMstatus: ATA Status Error
From: Daniel Kalchev <daniel@digsys.bg>
X-Mailer: iPhone Mail (15C114)
In-Reply-To: <20171213203935.270e5f65@thor.intern.walstatt.dynvpn.de>
Date: Thu, 14 Dec 2017 00:07:24 +0200
Cc: "Rodney W. Grimes" <freebsd-rwg@pdx.rh.CN85.dnsmgr.net>,
 Cy Schubert <Cy.Schubert@komquats.com>,
 "O. Hartmann" <ohartmann@walstatt.org>,
 FreeBSD CURRENT <freebsd-current@freebsd.org>,
 Freddie Cash <fjwcash@gmail.com>, Alan Somers <asomers@freebsd.org>
Content-Transfer-Encoding: quoted-printable
Message-Id: <E18C1AE8-0450-4563-9093-5C84E937BD5C@digsys.bg>
References: <20171213161116.1889f178@hermann>
 <201712131647.vBDGlrf2092528@pdx.rh.CN85.dnsmgr.net>
 <20171213203935.270e5f65@thor.intern.walstatt.dynvpn.de>
To: "O. Hartmann" <o.hartmann@walstatt.org>


> On 13 Dec 2017, at 21:39, O. Hartmann <o.hartmann@walstatt.org> wrote:
>
> On Wed, 13 Dec 2017 08:47:53 -0800 (PST)
> "Rodney W. Grimes" <freebsd-rwg@pdx.rh.CN85.dnsmgr.net> wrote:
>
>>> On Tue, 12 Dec 2017 14:58:28 -0800
>>> Cy Schubert <Cy.Schubert@komquats.com> wrote:
>>>
>>>> There are a couple of ways you can address this. You'll need to
>>>> offline the vdev first. If you've done a smartctl -t long and if the
>>>> test failed, smartctl -a will tell you which block it had an issue
>>>> with. You can use dd, ddrescue or dd_rescue to dd the block over
>>>> itself. The drive may rewrite the (weak) block or, if it fails to, it
>>>> will remap it (subsequently showing as reallocated).
>>>>
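
To make the "dd the block over itself" step concrete, here is a minimal sketch in Python that only builds the dd operands and runs nothing. The function name, the example LBA and the 4096-byte physical sector size are my own assumptions for illustration; /dev/ada6 is simply the device from the subject line. It assumes the LBA reported by smartctl is counted in logical sectors, and it rounds down to the start of the physical sector so the whole physical sector is rewritten.

```python
def dd_rewrite_args(device, lba, logical=512, physical=512):
    """Build dd operands that read the block containing `lba` and write it
    back to the same position on the same device (hypothetical helper)."""
    if physical % logical != 0:
        raise ValueError("physical sector size must be a multiple of the logical size")
    per_block = physical // logical          # logical sectors per physical sector
    first = (lba // per_block) * per_block   # align down to the physical sector start
    return [
        "dd",
        f"if={device}",
        f"of={device}",
        f"bs={logical}",
        f"skip={first}",       # input offset, in units of bs
        f"seek={first}",       # output offset, same position as the input
        f"count={per_block}",  # rewrite the whole physical sector
    ]


if __name__ == "__main__":
    # The LBA here is made up; take the real one from the self-test log.
    print(" ".join(dd_rewrite_args("/dev/ada6", 1000003, physical=4096)))
```

Print and check the command before running it by hand, with the vdev offlined first as described above.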
>>>> Of course there is a risk. If the sector is any of the boot blocks
>>>> there is a good chance the server will hang.
>>>
>>> The drive is part of a dedicated storage-only pool. The boot drive is a
>>> fast SSD. So I do not care about this - well, to say it more politely:
>>> I do not have to take care of that aspect.
>>>
>>>>
>>>> You have to be *absolutely* sure which the bad sector is. And, there
>>>> may be more. There is a risk of data loss.
>>>>
>>>> I've used this technique many times. Most times it works perfectly.
>>>> Other times the affected file is lost but the rest of the file system
>>>> is recovered. And again there is always the risk.
>>>>
>>>> Replace the disk immediately if you experience a growing succession
>>>> of pending sectors. Otherwise replace the disk at your earliest
>>>> convenience.
>>>
>>> The ZFS scrubbing of the volume ended this morning, leaving the pool in
>>> a healthy state. After reboot, there was no sign of CAM errors again.
>>>
>>> But there is something else I'm worried about. The mainboard I use is an
>>> ASRock Z77 Pro4-M.
>>> The board has a crippled Intel MCP with 6 SATA ports from the chipset,
>>> two of them SATA 6 Gb/s, four SATA II, and one additional chip with two
>>> SATA 6 Gb/s ports:
>>>
>>> [...]
>>> ahci0@pci0:2:0:0:       class=0x010601 card=0x06121849 chip=0x06121b21 rev=0x01 hdr=0x00
>>>    vendor     = 'ASMedia Technology Inc.'
>>>    device     = 'ASM1062 Serial ATA Controller'
>>>    class      = mass storage
>>>    subclass   = SATA
>>>    bar   [10] = type I/O Port, range 32, base 0xe050, size 8, enabled
>>>    bar   [14] = type I/O Port, range 32, base 0xe040, size 4, enabled
>>>    bar   [18] = type I/O Port, range 32, base 0xe030, size 8, enabled
>>>    bar   [1c] = type I/O Port, range 32, base 0xe020, size 4, enabled
>>>    bar   [20] = type I/O Port, range 32, base 0xe000, size 32, enabled
>>>    bar   [24] = type Memory, range 32, base 0xf7b00000, size 512, enabled
>>> [...]
>>>
>>> Attached to that ASM1062 SATA chip, via the eSATA connector, is a
>>> backup drive, a WD Red 4 TB. It seems that whenever this drive is
>>> attached and online, I experience problems on the ZFS pool, which is
>>> attached to the MCP SATA ports.
>>
>> How does this external drive get its power?  Are the earth grounds of
>> both the system and the external drive power supply closely tied
>> together?  A plug/unplug event with a slight ground creep can
>> wreak havoc with device operation.
>
> The external drive is housed in an external casing. Its PSU has de facto the
> same "grounding" (earth ground) as the server's PSU: they share the same power
> plug at the point where it comes out of the wall, so to speak.

Most external drive power supplies are not grounded. At least none I have ever seen had a grounded plug on the mains cable. Yours might have one...

Worth checking anyway.

Daniel