Date:      Fri, 4 Mar 2016 09:02:25 +0100
From:      Borja Marcos <borjam@sarenet.es>
To:        Scott Long <scott4long@yahoo.com>
Cc:        Steven Hartland <killing@multiplay.co.uk>, FreeBSD-scsi <freebsd-scsi@freebsd.org>
Subject:   Re: mpr(4) SAS3008 Repeated Crashing
Message-ID:  <B2147AEC-2831-443C-8FA0-4148B37AAF95@sarenet.es>
In-Reply-To: <F5E05621-FF84-4BED-B1A7-3252715CD53B@yahoo.com>
References:  <56D5FDB8.8040402@freebsd.org> <56D612FA.6090909@multiplay.co.uk> <A8859ECA-0B58-42A8-AA49-DF6AA3D52CC6@sarenet.es> <E74F5225-1EA8-4B60-ADDC-7B13E1003184@yahoo.com> <D7E0BCCE-EB44-4EF9-8F17-474C162F7D7C@sarenet.es> <56D805FD.50500@multiplay.co.uk> <F9B68610-12C6-4D32-88CA-A34A185F9AD1@sarenet.es> <F5E05621-FF84-4BED-B1A7-3252715CD53B@yahoo.com>


> On 03 Mar 2016, at 18:09, Scott Long <scott4long@yahoo.com> wrote:
>
> SYNC CACHE seems to have been involved this time, and while it’s
> sometimes a source of trouble with SATA disks, I’m very hesitant to
> blame it.  Given the seemingly random nature of your problems, I’m not
> as certain anymore to rule out a fault of the disk enclosure.  This
> looks to be a different disk than your last report, and your statement
> that a sibling system exhibits no problems is very interesting.  Maybe
> there’s an issue with the power supply, and the disks are getting
> under-voltage conditions periodically.  If you can run smartctl against
> the disks, the output might be useful.  Also, if you’re able, could you
> make sure that both this system and the one that is working well are
> being fed with sufficient and similar AC power?  And if the power
> supply modules in your enclosures are swappable, maybe swap them
> between systems and see if the problem follows the module?  If that
> doesn’t fix it then I’ll think of ways to provide more instrumentation.

The affected disks are completely random. I didn’t copy a lot of
instances to avoid too much litter, but each time it’s a different disk.

Both systems are in the same datacenter, and yes, the power
infrastructure is working. Swapping modules can be done if the dealer
sends us another one, because I prefer not to mess with a working system.

The fact that it’s a different disk each time, and that the other system
works perfectly, is what makes me quite certain that it’s a hardware
problem: either some trouble with the backplane or a power problem.

I am tempted to go the oscilloscope route (monitoring the internal power
rails). But if the problem is in the power distribution of the backplane
itself, I’ll need to destroy a broken disk to build a backplane power
probe :)




Borja.



