Date: Sat, 25 Jun 2011 14:11:11 -0700
From: perryh@pluto.rain.com
To: freebsd-geom@freebsd.org
Subject: gmirror robustness
Message-ID: <4e064eef.qVQOy1VCTJp2wI/g%perryh@pluto.rain.com>

How would I go about making gmirror more robust WRT transient errors?
Once in a while I get a sequence like this (reformatted):

  Jun 25 15:55:30 fbsd81 kernel: ad8: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=615769530
  Jun 25 15:55:30 fbsd81 kernel: ad8: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> error=4<ABORTED> LBA=615769530
  Jun 25 15:55:30 fbsd81 kernel: GEOM_MIRROR: Request failed (error=5). ad8s2a[WRITE(offset=315265765888, length=78336)]
  Jun 25 15:55:30 fbsd81 kernel: GEOM_MIRROR: Device gm0: provider ad8s2a disconnected.

It's always the same 4 messages: a retried WRITE_DMA48 UDMA ICRC error, a WRITE_DMA48 "FAILURE" on the same LBA with status=51 and error=4, a gmirror "Request failed (error=5)", and a disconnect. The LBA, offset, and length vary from one instance to another.

It's unclear why the ad8 driver is returning an error indication after a single retryable error -- I'll be asking about that on drivers@ -- but the question here is how to improve gmirror's handling of the situation. I'd prefer to have gmirror retry before giving up and disconnecting, or at least deactivate instead of disconnecting (so that I can reactivate, and have it update the mirror, rather than having to re-insert the disconnected provider and have gmirror spend the next couple of hours recopying everything).

Are there any configuration settings that would affect this behavior, or would I have to hack the code?
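
For anyone wanting to cross-check the four log lines, here is a rough sketch that pulls out the LBA, offset, and length and converts the gmirror byte offset into sectors. It assumes 512-byte sectors, which the logs do not confirm, and the `summarize` helper and its field names are made up for illustration. If that assumption holds, the difference between the disk LBA and the provider-relative offset would be the sector at which ad8s2a starts on ad8, but that is an inference, not something the logs state.

```python
import re

# The four kernel messages quoted in this message, verbatim.
LOG = """\
Jun 25 15:55:30 fbsd81 kernel: ad8: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=615769530
Jun 25 15:55:30 fbsd81 kernel: ad8: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> error=4<ABORTED> LBA=615769530
Jun 25 15:55:30 fbsd81 kernel: GEOM_MIRROR: Request failed (error=5). ad8s2a[WRITE(offset=315265765888, length=78336)]
Jun 25 15:55:30 fbsd81 kernel: GEOM_MIRROR: Device gm0: provider ad8s2a disconnected.
"""

SECTOR_SIZE = 512  # ASSUMPTION: 512-byte sectors; not stated in the logs


def summarize(log, sector_size=SECTOR_SIZE):
    """Hypothetical helper: relate the ATA LBA to the gmirror byte offset."""
    lbas = [int(n) for n in re.findall(r"LBA=(\d+)", log)]
    req = re.search(r"offset=(\d+), length=(\d+)", log)
    if not lbas or req is None:
        raise ValueError("expected ATA and GEOM_MIRROR messages not found")

    offset, length = int(req.group(1)), int(req.group(2))
    if offset % sector_size or length % sector_size:
        raise ValueError("offset/length not a multiple of the sector size")

    offset_sectors = offset // sector_size
    return {
        # True when the retried and the failed command name the same LBA.
        "same_lba": len(set(lbas)) == 1,
        "lba": lbas[-1],
        # Offset and length of the failed write, relative to ad8s2a.
        "offset_sectors": offset_sectors,
        "length_sectors": length // sector_size,
        # Disk LBA minus provider-relative sector: where ad8s2a would
        # begin on ad8, IF the sector-size assumption is right.
        "implied_provider_start": lbas[-1] - offset_sectors,
    }


if __name__ == "__main__":
    for key, value in summarize(LOG).items():
        print(f"{key}: {value}")
```

Running the same helper over several occurrences would show whether the implied start stays constant while the LBA varies, which is what one would expect if the ATA error and the gmirror failure always refer to the same request.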