Date:      Sun, 16 Apr 2017 09:49:00 +0100
From:      Frank Leonhardt <freebsd-doc@fjl.co.uk>
To:        freebsd-hardware@freebsd.org
Subject:   Re: SSD errors
Message-ID:  <02898e76-9285-03e7-e76a-77a5290376b9@fjl.co.uk>
In-Reply-To: <20170413205932.GJ2149@shrubbery.net>
References:  <20170413205932.GJ2149@shrubbery.net>

On 13/04/2017 21:59, heasley wrote:
> <snip>
> When I push a lot of data to them, such as an rsync, I receive errors like
> the below.  If I move drives between slots, it seems to follow the chassis
> slots, those closest to the power supply, but I'm not positive about this.
>
> I suppose the questions for list are:
> - have I missed any fbsd ssd-specific configuration?
>
> - all 4 have non-zero UDMA_CRC_Error_Count counters; not many, about the
>    same number, which I believe implies electrical interference - most
>    likely in the cable or chassis backplane.  Should I buy some specific
>    model cable?  other recommendations?
<snip>

I'm not aware of any SSD-specific stuff you've missed. The SSD option on 
the initialisation code in the BIOS is probably just there because 
there's no need to wait for spin-up time (as you probably thought too).

So I don't have an answer, but here are a few thoughts:

I think it's the CRC error (out of that lot) that you should be worried 
about. It means that the drive wrote data, but when it read it back it 
didn't match. With ST506 this could be (and often was) a cable fault but 
not with IDE. This doesn't mean dodgy cables can't cause you problems 
with IDE; only that they'd manifest differently. If the drive wrote the 
data to the flash with a CRC and then the CRC didn't match later, it 
doesn't make any difference if the data was corrupted on its way to the 
drive, or even if it was corrupted on its way back (ZFS would pick that 
up). So it must have been corrupted on-drive. Right? (I could be wrong 
about where your CRC errors are being tested/detected, so not 
necessarily right).
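
Wherever the CRC is actually being checked, it would help to know whether 
the counter climbs during a heavy transfer or is just left over from some 
earlier event. Here is a rough sketch of how the before/after comparison 
could be scripted. It assumes the usual column layout of `smartctl -A` 
output; the SAMPLE text is made up for illustration (not taken from the 
original poster's drives), and the function names are my own.

```python
import re

# Made-up sample in the column layout that `smartctl -A` normally prints.
# These numbers are illustrative only, not real readings.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1234
194 Temperature_Celsius     0x0022   067   052   000    Old_age   Always       -       33
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       17
"""


def raw_value(smart_output, attribute):
    """Return the integer RAW_VALUE for `attribute`, or None if absent."""
    for line in smart_output.splitlines():
        fields = line.split()
        # A data row has at least 10 columns; the name is the second one.
        if len(fields) >= 10 and fields[1] == attribute:
            match = re.match(r"\d+", fields[9])
            return int(match.group()) if match else None
    return None


def crc_delta(before_output, after_output):
    """Difference in UDMA_CRC_Error_Count between two snapshots."""
    before = raw_value(before_output, "UDMA_CRC_Error_Count")
    after = raw_value(after_output, "UDMA_CRC_Error_Count")
    if before is None or after is None:
        return None
    return after - before


if __name__ == "__main__":
    # Pretend the count rose from 17 to 21 during a transfer.
    later = SAMPLE.replace("       17\n", "       21\n")
    print("CRC errors added during the run:", crc_delta(SAMPLE, later))
```

In practice you'd save the output of `smartctl -A` for each drive before 
and after the rsync and feed the two snapshots to crc_delta(). A count 
that only rises under load, and only in certain slots, points somewhere 
quite different from one that never moves.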

So with this in mind, why should the drive's location on the shelf 
matter (if it does make a difference)? I can think of two reasons - 
electromagnetic interference from adjacent circuits or PSU problems.

So if it were me, I'd check the interference theory by using longer 
cables and spreading the drives out. Serial transfer on long cables 
isn't really a problem like it was with parallel. That's the easy check.

Then it's on to PSU issues. Does an SSD use more or less power than 
spinning rust? Really? Most people assume they'll use less but it's not 
as much less as you think, and it varies in different ways. What matters 
is whether the PSU can cope with the peak (e.g. while the drive is writing).

IT people will know all about watts. Add up the number of watts on all 
your drives and if it's <= the number of watts written on your PSU, cushty.

Wrong! An engineer will tell you you can't add watts together and get 
anything meaningful. And believing the label on a PSU is a mug's game. 
So, if you've got a decent oscilloscope take a look at the supply rails 
where they enter the drives. Try writing, and if you get so much as a 
blip on the voltage then do something about it.
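
To illustrate why the total-watts sum is misleading, here is a rough 
sketch that budgets current per rail instead. Every figure in it is a 
hypothetical placeholder (read the real ones off the drive labels and the 
PSU's per-rail ratings), and the device names are invented. It assumes 
the common arrangement where 2.5" SSDs draw from the 5V rail only while 
3.5" drives draw from both 5V and 12V.

```python
# Hypothetical per-rail current limits for a PSU, in amps.
RAIL_LIMIT_A = {"5V": 8.0, "12V": 18.0}

# Hypothetical peak current draw per device, per rail, in amps.
DEVICES = {
    "ssd0": {"5V": 1.5},
    "ssd1": {"5V": 1.5},
    "ssd2": {"5V": 1.5},
    "ssd3": {"5V": 1.5},
    "hdd0": {"5V": 0.75, "12V": 2.0},
    "hdd1": {"5V": 0.75, "12V": 2.0},
}


def rail_headroom(devices, limits):
    """Return {rail: limit minus summed peak draw} for each rated rail."""
    totals = {}
    for draws in devices.values():
        for rail, amps in draws.items():
            totals[rail] = totals.get(rail, 0.0) + amps
    return {rail: limits[rail] - totals.get(rail, 0.0) for rail in limits}


if __name__ == "__main__":
    for rail, spare in rail_headroom(DEVICES, RAIL_LIMIT_A).items():
        state = "OK" if spare >= 0 else "OVERLOADED"
        print(f"{rail}: {spare:+.2f} A spare ({state})")
```

With these made-up numbers the 12V rail has plenty in hand while the 5V 
rail is nearly used up, which a single watts total would hide. Even this 
is only steady-state arithmetic; it says nothing about how the supply 
behaves during a sudden step in load, which is what the 'scope is for.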

If you haven't got a 'scope to hand, I'd try running (some of) the drives 
off a different PSU and see if that makes a difference.

Although I haven't hit this problem myself, I'd be surprised if the same 
PSU design intended to power spinning rust at a relatively constant 
current could cope well with an SSD going from nothing much to lots to 
nothing much again over a very short space of time. If I was connecting 
a different PSU to the SSD I'd load it with some real drives just to 
stabilise the current output a bit (i.e. plug an old drive or two on to 
some of the other spare outlets).

Then there's always the chance it's over-cooking, but I think you'd have 
mentioned if they were getting very hot.

Regards, Frank.



