Date:      Thu, 14 Dec 2017 00:07:24 +0200
From:      Daniel Kalchev <daniel@digsys.bg>
To:        "O. Hartmann" <o.hartmann@walstatt.org>
Cc:        "Rodney W. Grimes" <freebsd-rwg@pdx.rh.CN85.dnsmgr.net>, Cy Schubert <Cy.Schubert@komquats.com>, "O. Hartmann" <ohartmann@walstatt.org>, FreeBSD CURRENT <freebsd-current@freebsd.org>, Freddie Cash <fjwcash@gmail.com>, Alan Somers <asomers@freebsd.org>
Subject:   Re: SMART: disk problems on RAIDZ1 pool: (ada6:ahcich6:0:0:0): CAMstatus: ATA Status Error
Message-ID:  <E18C1AE8-0450-4563-9093-5C84E937BD5C@digsys.bg>
In-Reply-To: <20171213203935.270e5f65@thor.intern.walstatt.dynvpn.de>
References:  <20171213161116.1889f178@hermann> <201712131647.vBDGlrf2092528@pdx.rh.CN85.dnsmgr.net> <20171213203935.270e5f65@thor.intern.walstatt.dynvpn.de>



> On 13 Dec 2017, at 21:39, O. Hartmann <o.hartmann@walstatt.org> wrote:
>
> Am Wed, 13 Dec 2017 08:47:53 -0800 (PST)
> "Rodney W. Grimes" <freebsd-rwg@pdx.rh.CN85.dnsmgr.net> schrieb:
>
>>> On Tue, 12 Dec 2017 14:58:28 -0800
>>> Cy Schubert <Cy.Schubert@komquats.com> wrote:
>>>
>>>> There are a couple of ways you can address this. You'll need to
>>>> offline the vdev first. If you've done a smartctl -t long and if the
>>>> test failed, smartctl -a will tell you which block it had an issue
>>>> with. You can use dd, ddrescue or dd_rescue to dd the block over
>>>> itself. The drive may rewrite the (weak) block or if it fails to it
>>>> will remap it (subsequently showing as reallocated).
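
The "dd the block over itself" step can be sketched as a small Python helper that only builds the command line and does not run it. This is a minimal sketch under assumptions: the function name is hypothetical, the device name is taken from the subject line, the LBA is a made-up placeholder, and the 512-byte sector size assumes smartctl reports the LBA in 512-byte logical sectors. With conv=noerror,sync, dd pads an unreadable block with zeros, so whatever was stored in that sector is lost.

```python
def rewrite_sector_command(device, lba, sector_size=512):
    """Build (but do not run) a dd command that copies one sector onto itself."""
    if lba < 0:
        raise ValueError("LBA must be non-negative")
    return [
        "dd",
        f"if={device}",
        f"of={device}",
        f"bs={sector_size}",
        f"skip={lba}",        # input offset, counted in blocks of bs
        f"seek={lba}",        # output offset, counted in blocks of bs
        "count=1",            # touch exactly one sector
        "conv=noerror,sync",  # on a read error, pad with zeros and continue
    ]


# Hypothetical values: replace the LBA with the one smartctl -a reports.
print(" ".join(rewrite_sector_command("/dev/ada6", 123456789)))
```

Printing the command instead of executing it leaves room to double-check the device and LBA before anything is written to the disk.
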
>>>>
>>>> Of course there is a risk. If the sector is any of the boot blocks
>>>> there is a good chance the server will hang.
>>>
>>> The drive is part of a dedicated storage-only pool. The boot drive is a
>>> fast SSD. So I do not care about this - well, to say it more politely:
>>> I do not have to take care of that aspect.
>>>
>>>>
>>>> You have to be *absolutely* sure which the bad sector is. And, there
>>>> may be more. There is a risk of data loss.
>>>>
>>>> I've used this technique many times. Most times it works perfectly.
>>>> Other times the affected file is lost but the rest of the file system
>>>> is recovered. And again there is always the risk.
>>>>
>>>> Replace the disk immediately if you experience a growing succession
>>>> of pending sectors. Otherwise replace the disk at your earliest
>>>> convenience.
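
Watching for a "growing succession of pending sectors" could look roughly like the sketch below. It assumes the usual ten-column attribute table printed by smartctl -A; the function names and the sample row are made up for illustration, and "growing" is simply taken to mean that the latest reading is higher than the first.

```python
def pending_sectors(smartctl_output):
    """Return the raw Current_Pending_Sector value from smartctl -A text, or None."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have ten columns: ID, name, flag, value, worst,
        # thresh, type, updated, when_failed, raw_value.
        if len(fields) >= 10 and fields[1] == "Current_Pending_Sector":
            return int(fields[9])
    return None


def is_growing(samples):
    """True if the latest reading is higher than the first one."""
    return len(samples) >= 2 and samples[-1] > samples[0]


# Made-up sample row, not taken from the drive discussed in this thread.
sample = "197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       8"
print(pending_sectors(sample))
print(is_growing([0, 2, 8]))
```

Collecting one reading per day and feeding the list to is_growing is enough to tell a single weak sector from a drive that keeps getting worse.
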
>>>
>>> The ZFS scrubbing of the volume ended this morning, leaving the pool in
>>> a healthy state. After reboot, there was no sign of CAM errors again.
>>>
>>> But there is something else I'm worried about. The mainboard I use is an
>>> ASRock Z77 Pro4-M.
>>> The board has a crippled Intel MCP with 6 SATA ports from the chipset,
>>> two of them SATA 6 Gb/s, 4 SATA II, and one additional chip with two SATA
>>> 6 Gb/s ports:
>>>
>>> [...]
>>> ahci0@pci0:2:0:0:       class=0x010601 card=0x06121849 chip=0x06121b21 rev=0x01 hdr=0x00
>>>    vendor     = 'ASMedia Technology Inc.'
>>>    device     = 'ASM1062 Serial ATA Controller'
>>>    class      = mass storage
>>>    subclass   = SATA
>>>    bar   [10] = type I/O Port, range 32, base 0xe050, size 8, enabled
>>>    bar   [14] = type I/O Port, range 32, base 0xe040, size 4, enabled
>>>    bar   [18] = type I/O Port, range 32, base 0xe030, size 8, enabled
>>>    bar   [1c] = type I/O Port, range 32, base 0xe020, size 4, enabled
>>>    bar   [20] = type I/O Port, range 32, base 0xe000, size 32, enabled
>>>    bar   [24] = type Memory, range 32, base 0xf7b00000, size 512, enabled
>>> [...]
>>>
>>> Attached to that ASM1062 SATA chip is a backup drive via eSATA
>>> connector, a WD 4 TB RED drive. It seems that whenever I attach this drive
>>> and it is online, I experience problems on the ZFS pool, which is
>>> attached to the MCP SATA ports.
>>
>> How does this external drive get its power?  Are the earth grounds of
>> both the system and the external drive power supply closely tied
>> together?  A plug/unplug event with a slight ground creep can
>> wreak havoc with device operation.
>
> The external drive is housed in an external casing. Its PSU de facto has the same
> "grounding" (earth ground) as the server's PSU; they share the same power plug at
> the point where the plug comes out of the wall - so to speak.

Most external drive power supplies are not grounded. At least none I ever saw
had grounded plugs for the mains cable. Yours might have one...

Worth checking anyway.

Daniel




