From: Ulrich Spoerlein
Date: Fri, 28 Dec 2007 15:55:47 +0100
To: stable@freebsd.org, Hidetoshi Shimokawa
Subject: Re: sbp(4) write error wedging GEOM mirror
Message-ID: <20071228145547.GC1532@roadrunner.spoerlein.net>
In-Reply-To: <20071228125437.GB1532@roadrunner.spoerlein.net>

On Fri, 28.12.2007 at 13:54:37 +0100, Ulrich Spoerlein wrote:
> [Ramblings about sbp(4) wedging geom mirror]

Ok, it looks like sbp(4) is off the hook. I tried the rebuild again, this
time attaching da0 via umass(4) instead of sbp(4). It also eventually
wedges, but umass can recover from this situation on its own:

umass0: Prolific PL-3507C USB Storage Device, rev 2.00/0.01, addr 2
da0 at umass-sim0 bus 0 target 0 lun 0
da0: Fixed Direct Access SCSI-0 device
da0: 40.000MB/s transfers
da0: 238475MB (488397168 512 byte sectors: 255H 63S/T 30401C)
GEOM_MIRROR: Component da0s1 (device gm0) broken, skipping.
GEOM_MIRROR: Cannot add disk da0s1 to gm0 (error=22).
GEOM_MIRROR: Component da0s2 (device gm1) broken, skipping.
GEOM_MIRROR: Cannot add disk da0s2 to gm1 (error=22).
GEOM_MIRROR: Component da0s1 (device gm0) broken, skipping.
GEOM_MIRROR: Cannot add disk da0s1 to gm0 (error=22).
GEOM_MIRROR: Component da0s1 (device gm0) broken, skipping.
GEOM_MIRROR: Cannot add disk da0s1 to gm0 (error=22).
GEOM_MIRROR: Device gm0: provider da0s1 detected.
GEOM_MIRROR: Device gm0: provider da0s1 is stale.
GEOM_MIRROR: Device gm1: provider da0s2 detected.
GEOM_MIRROR: Device gm1: provider da0s2 is stale.
GEOM_MIRROR: Device gm0: provider da0s1 disconnected.
GEOM_MIRROR: Device gm0: provider da0s1 detected.
GEOM_MIRROR: Device gm0: rebuilding provider da0s1.
fwohci0: BUS reset
fwohci0: node_id=0xc800ffc1, gen=2, CYCLEMASTER mode
firewire0: 2 nodes, maxhop <= 1, cable IRM = 1 (me)
firewire0: bus manager 1 (me)
fwohci0: txd err=14 ack busy_X
fwohci0: txd err=14 ack busy_X
fwohci0: txd err=14 ack busy_X
fwohci0: BUS reset
fwohci0: node_id=0xc800ffc1, gen=3, CYCLEMASTER mode
firewire0: 2 nodes, maxhop <= 1, cable IRM = 1 (me)
firewire0: bus manager 1 (me)
firewire0: New S400 device ID:0050770e013023f0
da1 at sbp0 bus 0 target 0 lun 0
da1: Fixed Simplified Direct Access SCSI-4 device
da1: 50.000MB/s transfers
da1: 381554MB (781422768 512 byte sectors: 255H 63S/T 48641C)
GEOM_MIRROR: Device gm2: provider da1 detected.
GEOM_MIRROR: Device gm2: rebuilding provider da1.
GEOM_MIRROR: Device gm0: rebuilding provider da0s1 finished.
GEOM_MIRROR: Device gm0: provider da0s1 activated.
GEOM_MIRROR: Device gm1: provider da0s2 disconnected.
GEOM_MIRROR: Device gm1: provider da0s2 detected.
GEOM_MIRROR: Device gm1: rebuilding provider da0s2.

(14:08:27) root@coyote: ~# gmirror status
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
GEOM_MIRROR: CannotGEOM_MIRROR: Synchronization request failed (error=5). da0s2[WRITE(offset=23111270 write metadata on da0s1 (device=gm0, error=5).
GEOM_MIRROR: Cannot update metada400, length=131072)]
GEOM_MIRROR: Device gm1: provider da0s2 disconnected.
GEOta on disk da0s1 (error=5).
M_MIRROR: Device gm1: rebuilding provider da0s2 stopped.
GEOM_MIRROR: Device gm0: provider da0s1 disconnected.
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
Expumass0: BBB reset failed, IOERROR
eumass0: BBB bulk-in clear stall failed, IOERROR
nsumass0: BBB bulk-out clear stall failed, IOERROR
ive timeout(9) function: 0xc09623a9(0xc32de800) 0.006188295 s
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
... (multiple pages)
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
(da0:umass-sim0:0:0:0): Synchronize cache failed, status == 0x4, scsi status == 0x0
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
...
(multiple pages)
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
      Name    Status  Components
mirror/gm2  DEGRADED  ad1
                      da1 (12%)
mirror/gm0  DEGRADED  ad0s1
mirror/gm1  DEGRADED  ad0s2
(14:14:46) root@coyote: ~#
(14:14:46) root@coyote: ~# gmirror status
      Name    Status  Components
mirror/gm2  DEGRADED  ad1
                      da1 (16%)
mirror/gm0  DEGRADED  ad0s1
mirror/gm1  DEGRADED  ad0s2

(14:41:22) root@coyote: ~# fdisk -s /dev/da0
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
Expensive timeout(9) function: 0xc0690e74(0xc342a000) 0.007737115 s
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
fdisk: can't open device /dev/da0
fdisk: cannot open disk /dev/da0: Input/output error
Exit 1

(14:41:54) root@coyote: ~# camcontrol rescan 1
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
(da0:umass-sim0:0:0:0): lost device
(da0:umass-sim0:0:0:0): removing device entry
Re-scan of bus 1 was successful

So as you can see, after lots of stalled transfers GEOM mirror does the
right thing and kicks out the failing components, something it cannot do
when the disk is attached via sbp(4).

Is this behaviour of sbp(4) tweakable?

Cheers,
Ulrich Spoerlein
-- 
It is better to remain silent and be thought a fool,
than to speak, and remove all doubt.