Date: Fri, 4 Mar 2016 09:02:25 +0100
From: Borja Marcos <borjam@sarenet.es>
To: Scott Long <scott4long@yahoo.com>
Cc: Steven Hartland <killing@multiplay.co.uk>, FreeBSD-scsi <freebsd-scsi@freebsd.org>
Subject: Re: mpr(4) SAS3008 Repeated Crashing
Message-ID: <B2147AEC-2831-443C-8FA0-4148B37AAF95@sarenet.es>
In-Reply-To: <F5E05621-FF84-4BED-B1A7-3252715CD53B@yahoo.com>
References: <56D5FDB8.8040402@freebsd.org> <56D612FA.6090909@multiplay.co.uk>
 <A8859ECA-0B58-42A8-AA49-DF6AA3D52CC6@sarenet.es>
 <E74F5225-1EA8-4B60-ADDC-7B13E1003184@yahoo.com>
 <D7E0BCCE-EB44-4EF9-8F17-474C162F7D7C@sarenet.es>
 <56D805FD.50500@multiplay.co.uk>
 <F9B68610-12C6-4D32-88CA-A34A185F9AD1@sarenet.es>
 <F5E05621-FF84-4BED-B1A7-3252715CD53B@yahoo.com>

> On 03 Mar 2016, at 18:09, Scott Long <scott4long@yahoo.com> wrote:
>
> SYNC CACHE seems to have been involved this time, and while it's sometimes a source of trouble with SATA disks, I'm very hesitant to blame it. Given the seemingly random nature of your problems, I'm not as certain anymore to rule out a fault of the disk enclosure. This looks to be a different disk than your last report, and your statement that a sibling system exhibits no problems is very interesting. Maybe there's an issue with the power supply, and the disks are getting under-voltage conditions periodically. If you can run smartctl against the disks, the output might be useful. Also, if you're able, could you make sure that both this system and the one that is working well are being fed with sufficient and similar AC power? And if the power supply modules in your enclosures are swappable, maybe swap them between systems and see if the problem follows the module? If that doesn't fix it then I'll think of ways to provide more instrumentation.

The affected disks are completely random. I didn't copy a lot of instances to avoid too much litter, but each time it's a different disk.

Both systems are in the same datacenter, and yes, the power infrastructure is working. Swapping modules can be done if the dealer sends us another one because I prefer not to mess with a working system.

The fact that it's a different disk each time, and that the other system works perfectly is what makes me quite certain that it's a hardware problem. Either some trouble with the backplane or a power problem.

I am tempted to go the oscilloscope route (monitoring the internal power rails). But if the problem is in the power distribution of the backplane itself I'll need to destroy a broken disk to build a backplane power probe :)

Borja.