Date: Mon, 4 Oct 2004 09:44:42 +0100
From: Chris Elsworth
To: freebsd-geom@freebsd.org
Message-ID: <20041004084442.GA65504@shagged.org>
Subject: SCSI disk getting disconnected on boot

Hello,

After having a two-way gmirror happily working for a few days, upon
rebooting both machines, they both seem to have lost half the mirror.
Here's the debug output from bootup on one of them:

Waiting 5 seconds for SCSI devices to settle
GEOM_MIRROR[2]: Tasting acd0.
da0 at ahc0 bus 0 target 0 lun 0
da0: Fixed Direct Access SCSI-3 device
da0: 160.000MB/s transfers (80.000MHz, offset 127, 16bit), Tagged Queueing Enabd
da0: 34732MB (71132959 512 byte sectors: 255H 63S/T 4427C)
da1 at ahc0 bus 0 target 1 lun 0
da1: Fixed Direct Access SCSI-3 device
da1: 160.000MB/s transfers (80.000MHz, offset 127, 16bit), Tagged Queueing Enabd
da1: 34732MB (71132959 512 byte sectors: 255H 63S/T 4427C)
GEOM_MIRROR[2]: Tasting da0.
SMP: AP CPU #1 Launched!
magic: GEOM::MIRROR
version: 1
name: gm
mid: 1573691141
did: 1965364196
all: 2
syncid: 3
priority: 0
slice: 4096
balance: split
mediasize: 36420074496
sectorsize: 512
syncoffset: 12766412800
mflags: NONE
dflags: DIRTY SYNCHRONIZING
hcprovider: da0
MD5 hash: ad3dd443dde332bde5d63b262571dcc9
GEOM_MIRROR[1]: Creating device gm (id=1573691141).
GEOM_MIRROR[0]: Device gm created (id=1573691141).
GEOM_MIRROR[1]: Adding disk da0 to gm.
GEOM_MIRROR[2]: Adding disk da0.
GEOM_MIRROR[2]: Disk da0 connected.
GEOM_MIRROR[1]: Disk da0 state changed from NONE to NEW (device gm).
GEOM_MIRROR[0]: Device gm: provider da0 detected.
GEOM_MIRROR[2]: Tasting da1.
magic: GEOM::MIRROR
version: 1
name: gm
mid: 1573691141
did: 4008348218
all: 2
syncid: 3
priority: 0
slice: 4096
balance: split
mediasize: 36420074496
sectorsize: 512
syncoffset: 0
mflags: NONE
dflags: NONE
hcprovider: da1
MD5 hash: e6584ea109907134ce7285853c7bbcb1
GEOM_MIRROR[1]: Adding disk da1 to gm.
GEOM_MIRROR[2]: Adding disk da1.
GEOM_MIRROR[2]: Disk da1 connected.
GEOM_MIRROR[1]: Disk da1 state changed from NONE to NEW (device gm).
GEOM_MIRROR[0]: Device gm: provider da1 detected.
GEOM_MIRROR[1]: Device gm state changed from STARTING to RUNNING.
GEOM_MIRROR[1]: Disk da1 state changed from NEW to ACTIVE (device gm).
GEOM_MIRROR[2]: Access da1 r0w1e1 = 0
GEOM_MIRROR[2]: Tasting da0a.
GEOM_MIRROR[2]: Access da1 r0w-1e-1 = 0
GEOM_MIRROR[2]: Metadata on da1 updated.
GEOM_MIRROR[0]: Device gm: provider da1 activated.
GEOM_MIRROR[1]: Disk da0 state changed from NEW to SYNCHRONIZING (device gm).
GEOM_MIRROR[0]: Device gm: provider mirror/gm launched.
GEOM_MIRROR[0]: Device gm: rebuilding provider da0.
GEOM_MIRROR[2]: Access da0 r0w1e1 = 1
GEOM_MIRROR[1]: Disk da0 state changed from SYNCHRONIZING to DISCONNECTED (devi.
GEOM_MIRROR[0]: Device gm: provider da0 disconnected.
GEOM_MIRROR[2]: Disk da0 disconnected.
GEOM_MIRROR[2]: Consumer da0 destroyed.
GEOM_MIRROR[2]: Tasting da0b.
GEOM_MIRROR[2]: Tasting da0c.
GEOM_MIRROR[2]: Tasting da0d.
GEOM_MIRROR[2]: Tasting da0e.
GEOM_MIRROR[2]: Tasting da0f.
GEOM_MIRROR[2]: Tasting da0g.
GEOM_MIRROR[2]: Tasting da1a.
GEOM_MIRROR[2]: Tasting da1b.
GEOM_MIRROR[2]: Tasting da1c.
GEOM_MIRROR[2]: Tasting da1d.
GEOM_MIRROR[2]: Tasting da1e.
GEOM_MIRROR[2]: Tasting da1f.
GEOM_MIRROR[2]: Tasting da1g.
GEOM_MIRROR[2]: Tasting mirror/gm.
GEOM_MIRROR[2]: Access request for mirror/gm: r1w0e0.
GEOM_MIRROR[2]: Access da1 r1w0e1 = 0
GEOM_MIRROR[2]: Access request for mirror/gm: r-1w0e0.
GEOM_MIRROR[2]: Access da1 r-1w0e-1 = 0
...

After this there are lots of access requests for various da1 partitions,
all of which succeed. The system boots normally from here, using the
gmirror device with just one provider left. I have to activate da0 in
order to get it to resync.

You'll notice that in this particular case, da0 was still resyncing when
I rebooted the machine, but this is reproducible even if both halves of
the mirror are synced. I'd expected that even in the case of rebooting
during a resync, the resync should restart after a boot, not disconnect
the drive.

The only explanation I can think of is that da0 is the boot device; is
this locking the metadata against being updated somehow? Would using
mirror devices of da0s1 and da1s1 get round this?

Appreciate any input :)

-- 
Chris
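
For anyone comparing against their own boot output, the interesting
events in a log like the one above can be picked out mechanically. The
following is only a rough sketch in Python, not part of gmirror: the
function name summarize() is made up for illustration, and it assumes
that the number after "=" on an "Access <disk> rNwNeN = K" line is the
result of the access request, with nonzero meaning the request failed.
It lists every disk state change and every access that did not return 0.

```python
import re

# "Disk da0 state changed from NEW to SYNCHRONIZING (device gm)."
STATE_RE = re.compile(r"Disk (\S+) state changed from (\w+) to (\w+)")

# "Access da0 r0w1e1 = 1"; the r/w/e deltas may be negative.
# Lines of the form "Access request for mirror/gm: r1w0e0." do not match.
ACCESS_RE = re.compile(r"Access (\S+) r(-?\d+)w(-?\d+)e(-?\d+) = (-?\d+)")


def summarize(log_text):
    """Return (transitions, failed_accesses) found in GEOM_MIRROR debug output.

    transitions:     list of (disk, old_state, new_state)
    failed_accesses: list of (disk, (r, w, e), result) where result != 0
                     (assumption: nonzero result means the access failed)
    """
    transitions = [m.groups() for m in STATE_RE.finditer(log_text)]
    failed = []
    for m in ACCESS_RE.finditer(log_text):
        disk = m.group(1)
        r, w, e, result = (int(m.group(i)) for i in range(2, 6))
        if result != 0:
            failed.append((disk, (r, w, e), result))
    return transitions, failed


if __name__ == "__main__":
    import sys

    # Usage: python summarize.py < boot.log
    transitions, failed = summarize(sys.stdin.read())
    for disk, old, new in transitions:
        print(f"{disk}: {old} -> {new}")
    for disk, (r, w, e), result in failed:
        print(f"{disk}: access r{r}w{w}e{e} returned {result}")
```

Run over the output quoted in this message, it would flag the
"Access da0 r0w1e1 = 1" line that immediately precedes da0 going from
SYNCHRONIZING to DISCONNECTED.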