Date: Mon, 15 Jul 2019 18:09:25 +0200
From: hw <hw@adminart.net>
To: "Kevin P. Neal" <kpn@neutralgood.org>
Cc: Karl Denninger <karl@denninger.net>, freebsd-questions@freebsd.org
Subject: Re: dead slow update servers
Message-ID: <87blxvjn4a.fsf@toy.adminart.net>
In-Reply-To: <20190715151621.GB31450@neutralgood.org> (Kevin P. Neal's message of "Mon, 15 Jul 2019 11:16:21 -0400")
References: <20190712171910.GA25091@neutralgood.org> <871ryuj3ex.fsf@toy.adminart.net> <CAGLDxTW8zw2d+aBGOmBgEhipjq6ocn536fH_NScMiDD7hD=eSw@mail.gmail.com> <874l3qfvqw.fsf@toy.adminart.net> <20190714011303.GA25317@neutralgood.org> <87v9w58apd.fsf@toy.adminart.net> <f7d8acd6-6adb-2b4b-38ef-dc988d7d96a7@denninger.net> <87v9w4qjy8.fsf@toy.adminart.net> <20190715014129.GA62729@neutralgood.org> <87ftn8otem.fsf@toy.adminart.net> <20190715151621.GB31450@neutralgood.org>
"Kevin P. Neal" <kpn@neutralgood.org> writes: > On Mon, Jul 15, 2019 at 05:42:25AM +0200, hw wrote: >> "Kevin P. Neal" <kpn@neutralgood.org> writes: >> > Oh, and my Dell machines are old enough that I'm stuck with the hardware >> > RAID controller. I use ZFS and have raid0 arrays configured with single >> > drives in each. I _hate_ it. When a drive fails the machine reboots and >> > the controller hangs the boot until I drive out there and dump the card's >> > cache. It's just awful. >> >> That doesn't sound like a good setup. Usually, nothing reboots when a >> drive fails. >> >> Would it be a disadvantage to put all drives into a single RAID10 (or >> each half of them into one) and put ZFS on it (or them) if you want to >> keep ZFS? > > Well, it still leaves me with the overhead of dealing with creating arrays > in the hardware. Didn't you need to create the RAID0s having a single disk, too? > And it costs me loss of the scrubbing/verification of the end-to-end > checksumming. So I'm less safe there with no less work. If you're worried about the controller giving results that lead to the correct check sums and data ending up on the disk not matching these check sums when the controller reads it later, what difference does it make which kind of RAID you use? You can always run a scrub to verify the check sums, and if errors are being found, you may need to replace the controller. > It would probably eliminate the reboots, though. But that's only if my > theory about the reboots is correct. > > The failures I've seen involve the circuit board on the drive failing and > the drive not responding to any commands ever again. My guess is that the > ZFS watchdog timer is rebooting because commands don't complete within > the timeout period. I could change that by changing the setting that keeps > ZFS from writing to a drive when a drive vanishes, but then I lose the > safety of pausing the system when a drive pops out of the slot. Yes, that > has happened before. Do the drives pop back into the slots all by themselves before the timeout expires? When a drive becomes unresponsive, ZFS should just fail it and continue to work with the remaining ones. I've seen it doing that. > Maybe I should just go ahead and change it. I've got a drive about to > fail on me. It's a three way mirror so I'm not worried about it. It would > be, uh, _nice_ if it didn't bring down the machine, though. If you were using two or more disks each in a RAID1 or RAID10 to create one disk exposed to ZFS, you wouldn't have a problem when one disk becomes unresponsive. If there's someone around who is used to quickly popping the disks back into their slots, that someone could as well replace a failed disk by simply taking it out and plugging a new one in. Hardware RAID does have advantages, so why not use them when you're stuck with it anyway?