From: perryh@pluto.rain.com
To: freebsd-geom@freebsd.org
Date: Sat, 25 Jun 2011 14:11:11 -0700
Subject: gmirror robustness

How would I go about making gmirror more robust WRT transient errors?

Once in a while I get a sequence like this (reformatted):

  Jun 25 15:55:30 fbsd81 kernel: ad8: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=615769530
  Jun 25 15:55:30 fbsd81 kernel: ad8: FAILURE - WRITE_DMA48 status=51 error=4 LBA=615769530
  Jun 25 15:55:30 fbsd81 kernel: GEOM_MIRROR: Request failed (error=5). ad8s2a[WRITE(offset=315265765888, length=78336)]
  Jun 25 15:55:30 fbsd81 kernel: GEOM_MIRROR: Device gm0: provider ad8s2a disconnected.

It's always the same 4 messages: a retried WRITE_DMA48 UDMA ICRC error, a WRITE_DMA48 "FAILURE" on the same LBA with status=51 and error=4, a gmirror "Request failed (error=5)", and a disconnect. The LBA, offset, and length vary from one instance to another.

It's unclear why the ad8 driver is returning an error indication after a single retryable error -- I'll be asking about that on drivers@ -- but the question here is how to improve gmirror's handling of the situation. I'd prefer to have gmirror retry before giving up and disconnecting, or at least deactivate instead of disconnecting (so that I can reactivate, and have it update the mirror, rather than having to re-insert the disconnected provider and have gmirror spend the next couple of hours recopying everything).

Are there any configuration settings that would affect this behavior, or would I have to hack the code?
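
For concreteness, the two recovery paths contrasted above can be sketched roughly as follows. This is only an illustration: the mirror name gm0 and provider ad8s2a are taken from the log, the wrapper functions (run, reinsert, reactivate) and the DRY_RUN variable are hypothetical, and the forget-then-insert sequence is assumed to be the usual way of re-adding a disconnected provider. By default the script only prints the gmirror commands rather than running them.

```shell
#!/bin/sh
# Sketch only: names below are illustrative, not part of gmirror itself.
MIRROR=${MIRROR:-gm0}          # mirror name from the log above
PROVIDER=${PROVIDER:-ad8s2a}   # provider that was disconnected

# Print the command unless DRY_RUN=0 is set explicitly.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Current situation: the provider was disconnected, so it has to be
# forgotten and inserted again, which triggers a full resynchronization.
reinsert() {
    run gmirror forget "$MIRROR"
    run gmirror insert "$MIRROR" "$PROVIDER"
}

# Preferred situation: the provider was merely deactivated, so it can
# simply be activated again.
reactivate() {
    run gmirror activate "$MIRROR" "$PROVIDER"
}
```

Calling reinsert or reactivate with DRY_RUN unset shows the commands that would be issued; setting DRY_RUN=0 would run them for real (as root).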