From: Paul Mather
To: freebsd-geom@freebsd.org
Date: Wed, 16 Feb 2005 12:55:32 -0500
Message-Id: <1108576532.887.19.camel@zappa.Chelsea-Ct.Org>
List-Id: GEOM-specific discussions and implementations
Subject: geom_mirror was stale, now broken

Because of the
annoying ATA regression that crept into 5.3, I semi-regularly get
"TIMEOUT - WRITE_DMA" errors that ultimately cause a drive to be removed
from my geom_mirror configuration. :-(

Previously, the drive suffering the WRITE_DMA problem would be marked as
a "stale" provider during boot. Recently, this appears to have changed,
and the provider is now listed as "broken." E.g.:

Feb 16 05:21:45 zappa kernel: ad2: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=49981679
Feb 16 05:21:50 zappa kernel: ad2: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=49981679
Feb 16 05:21:50 zappa kernel: ad2: FAILURE - WRITE_DMA timed out
Feb 16 05:21:50 zappa kernel: GEOM_MIRROR: Cannot update metadata on disk ad2 (error=5).
Feb 16 05:21:50 zappa kernel: GEOM_MIRROR: Device raid1: provider ad2 disconnected.
[[...]]
Feb 16 11:48:37 zappa kernel: FreeBSD 6.0-CURRENT #0: Fri Feb 11 09:03:49 EST 2005
[[...]]
Feb 16 11:48:37 zappa kernel: GEOM_MIRROR: Device raid1 created (id=723259611).
Feb 16 11:48:37 zappa kernel: GEOM_MIRROR: Device raid1: provider ad0 detected.
Feb 16 11:48:37 zappa kernel: GEOM_MIRROR: Device raid1: provider ad2 detected.
Feb 16 11:48:37 zappa kernel: GEOM_MIRROR: Component ad2 (device raid1) broken, skipping.
Feb 16 11:48:37 zappa kernel: GEOM_MIRROR: Device raid1: provider ad0 activated.
Feb 16 11:48:37 zappa kernel: GEOM_MIRROR: Device raid1: provider mirror/raid1 launched.

One consequence of the mirror provider being marked as "broken" is that
I can no longer simply rebuild onto it. Now I have to "gmirror forget"
the "broken" provider, "gmirror insert" it back into the mirror, and
then rebuild onto it.

Under what circumstances is a mirror provider considered "broken" as
opposed to "stale"?

BTW, here is the current status of my geom_mirror (it is currently
rebuilding):

Geom name: raid1
State: DEGRADED
Components: 2
Balance: split
Slice: 4096
Flags: NOAUTOSYNC
GenID: 4
SyncID: 12
ID: 723259611
Providers:
1. Name: mirror/raid1
   Mediasize: 25590619648 (24G)
   Sectorsize: 512
   Mode: r6w5e5
Consumers:
1. Name: ad0
   Mediasize: 25590620160 (24G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: DIRTY
   GenID: 4
   SyncID: 12
   ID: 1971505175
2. Name: ad2
   Mediasize: 25590620160 (24G)
   Sectorsize: 512
   Mode: r1w1e1
   State: SYNCHRONIZING
   Priority: 1
   Flags: DIRTY, SYNCHRONIZING, FORCE_SYNC
   GenID: 4
   SyncID: 12
   Synchronized: 89%
   ID: 3025777059

Geom name: raid1.sync
Consumers:
1. Name: mirror/raid1
   Mediasize: 25590619648 (24G)
   Sectorsize: 512
   Mode: r1w0e0

I notice that the "GenID" of the providers increases at every breakage.
What is the GenID? (It seems like a relatively recent addition.)

I've also noticed that geom_mirror appears less resilient to drive
"failures" nowadays than it did before. I had to do a hard reset this
morning because my system "froze," apparently unable to do any disk
I/O. :-( Alas, I can't get any crashdumps (I'm using GBDE-encrypted swap
on my geom_mirror), and I can't set up a serial console at this time
because the system has only one serial port and it's being used for my
olde Apple LaserWriter II right now. :-)

BTW, I run X. Is it possible to break to the debugger from the regular
console should the system freeze again like it did this morning? If so,
how do I do that?

Cheers,

Paul.
-- 
e-mail: paul@gromit.dlib.vt.edu

"Without music to decorate it, time is just a bunch of boring production
 deadlines or dates by which bills must be paid."
        --- Frank Vincent Zappa