From owner-freebsd-geom@FreeBSD.ORG Mon Nov 15 18:04:30 2004
From: Paul Mather
To: freebsd-geom@freebsd.org
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Date: Mon, 15 Nov 2004 13:04:20 -0500
Message-Id: <1100541860.31778.36.camel@zappa.Chelsea-Ct.Org>
Mime-Version: 1.0
X-Mailer: Evolution 2.0.2 FreeBSD GNOME Team Port
Subject: Panic after trying to recover from drive failure with geom_vinum
X-BeenThere: freebsd-geom@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: GEOM-specific discussions and implementations
X-List-Received-Date: Mon, 15 Nov 2004 18:04:30 -0000

I have a 5.3-STABLE system upgraded from a 5.2.1 system that used a
root-on-vinum mirrored setup.  Both under 5.2.1 and 5.3, the system
periodically gets those "TIMEOUT - WRITE_DMA retrying" errors you
sometimes hear people mention.  Usually, it is nothing, but it seems the
one that happened last night caused geom_vinum to mark the drive as down
and flag all its plexes and subdisks down, too:

Nov 15 04:34:14 handle kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=1581375
Nov 15 04:34:15 handle kernel: ad0: FAILURE - WRITE_DMA timed out
Nov 15 04:34:15 handle kernel: GEOM_VINUM: subdisk swap.p0.s0 is down
Nov 15 04:34:15 handle kernel: GEOM_VINUM: plex swap.p0 is down
Nov 15 04:34:15 handle kernel: GEOM_VINUM: subdisk root.p0.s0 is down
Nov 15 04:34:15 handle kernel: GEOM_VINUM: plex root.p0 is down
Nov 15 04:34:15 handle kernel: GEOM_VINUM: subdisk var.p0.s0 is down
Nov 15 04:34:15 handle kernel: GEOM_VINUM: plex var.p0 is down
Nov 15 04:34:15 handle kernel: GEOM_VINUM: subdisk usr.p0.s0 is down
Nov 15 04:34:15 handle kernel: GEOM_VINUM: plex usr.p0 is down

Of course, the drive wasn't actually down, but how to tell geom_vinum
that?  I tried "gvinum start laurel" (laurel is the name for the ad0
drive), but geom_vinum said it couldn't.  So, I thought I'd try and
start the plexes individually.  Unfortunately, "gvinum start root.p0"
caused the machine to reboot.  (I was logged in via SSH so I couldn't
see what happened on the console; I'm presuming there was a panic
followed by a reboot.)  Luckily, when the system came back, "laurel" was
now flagged as "up" and so a "gvinum start" of each plex synchronised
them and brought them all back up.

My question is this: what would be a better way to recover from this in
the future, i.e., how to let geom_vinum know the drive was in fact "up"?
With classic vinum, "setstate" could have been used as a last resort.
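
The kernel messages quoted above follow a regular pattern, so the
affected objects can be listed mechanically before deciding which plexes
to start.  What follows is only a minimal Python sketch: it assumes
syslog lines shaped exactly like the ones quoted, and the names
down_objects and SAMPLE are made up for illustration.  It reads log text
and does not talk to gvinum.

```python
import re

# Matches lines such as:
#   "... kernel: GEOM_VINUM: subdisk swap.p0.s0 is down"
# The pattern is an assumption based only on the excerpt quoted above.
DOWN_RE = re.compile(r"GEOM_VINUM: (subdisk|plex) (\S+) is down")


def down_objects(log_text):
    """Group the names of objects reported as down by their kind."""
    result = {"subdisk": [], "plex": []}
    for kind, name in DOWN_RE.findall(log_text):
        result[kind].append(name)
    return result


# Shortened copy of the excerpt from the message (hypothetical test input).
SAMPLE = """\
Nov 15 04:34:15 handle kernel: ad0: FAILURE - WRITE_DMA timed out
Nov 15 04:34:15 handle kernel: GEOM_VINUM: subdisk swap.p0.s0 is down
Nov 15 04:34:15 handle kernel: GEOM_VINUM: plex swap.p0 is down
Nov 15 04:34:15 handle kernel: GEOM_VINUM: subdisk root.p0.s0 is down
Nov 15 04:34:15 handle kernel: GEOM_VINUM: plex root.p0 is down
"""

if __name__ == "__main__":
    print(down_objects(SAMPLE))
```

Grouping by kind keeps plexes separate from subdisks, which matches the
way the plexes were restarted one at a time in the account above.
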
I thought in retrospect that perhaps an "atacontrol detach" followed by
an "atacontrol attach" might have brought the drive's real state to
geom_vinum's attention.  Does this sound likely?  I'm just trying to
avoid another unnecessary panic+reboot in the future, here. :-)

Cheers,

Paul.
-- 
e-mail: paul@gromit.dlib.vt.edu

"Without music to decorate it, time is just a bunch of boring production
 deadlines or dates by which bills must be paid."
        --- Frank Vincent Zappa

From owner-freebsd-geom@FreeBSD.ORG Wed Nov 17 03:44:45 2004
From: Suleiman Souhlal
To: freebsd-geom@FreeBSD.org
Content-Type: text/plain
Message-Id: <1100663076.6480.76.camel@localhost>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.4.6
Date: Tue, 16 Nov 2004 22:44:36 -0500
Content-Transfer-Encoding: 7bit
Subject: Severe gbde problem
X-List-Received-Date: Wed, 17 Nov 2004 03:44:45 -0000

Hello,

I was able to verify that there is a pretty severe problem with gbde, as
noted in http://www.freebsd.org/cgi/query-pr.cgi?pr=72812 :

I enabled gbde_swap in a 5.3 installation running inside qemu, and after
running a few processes making large mallocs, the
following message appeared on the console:

swap_pager: I/O error - pageout failed; blkno 5610,size 4096,error 0

After rebooting, the system was unbootable (stuck at boot0).
Apparently, it has overwritten parts of the disklabel, or something
similar.
The swap partition was right at the beginning of the disk.

In a previous (failed) attempt to reproduce this problem, the swap
partition was actually the second partition on the disk, and none of the
symptoms happened.

So, it seems that gbde is overwriting bytes that are just before its
partition.

Any ideas on how to track this bug down?

Bye,
-- 
Suleiman Souhlal     | ssouhlal@vt.edu
The FreeBSD Project  | ssouhlal@FreeBSD.org

From owner-freebsd-geom@FreeBSD.ORG Wed Nov 17 06:31:19 2004
To: Suleiman Souhlal
From: "Poul-Henning Kamp"
In-Reply-To: Your message of "Tue, 16 Nov 2004 22:44:36 EST."
 <1100663076.6480.76.camel@localhost>
Date: Wed, 17 Nov 2004 07:31:14 +0100
Message-ID: <20023.1100673074@critter.freebsd.dk>
Sender: phk@critter.freebsd.dk
cc: freebsd-geom@freebsd.org
Subject: Re: Severe gbde problem
X-List-Received-Date: Wed, 17 Nov 2004 06:31:19 -0000

In message <1100663076.6480.76.camel@localhost>, Suleiman Souhlal writes:
>
>Any ideas on how to track this bug down?

Do not put your swap partition over the first 16 sectors of the disk,
those sectors contain bootcode.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

From owner-freebsd-geom@FreeBSD.ORG Wed Nov 17 11:34:45 2004
In-Reply-To: <20023.1100673074@critter.freebsd.dk>
References: <20023.1100673074@critter.freebsd.dk>
Mime-Version: 1.0 (Apple Message framework v619)
Content-Type: multipart/mixed; boundary=Apple-Mail-3--22496007
Message-Id:
From: Suleiman Souhlal
Date: Wed, 17 Nov 2004 06:34:31 -0500
To: "Poul-Henning Kamp"
X-Mailer:
 Apple Mail (2.619)
cc: freebsd-geom@FreeBSD.org
Subject: Re: Severe gbde problem
X-List-Received-Date: Wed, 17 Nov 2004 11:34:45 -0000

--Apple-Mail-3--22496007
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=US-ASCII; format=flowed

Hello,

On Nov 17, 2004, at 1:31 AM, Poul-Henning Kamp wrote:
> Do not put your swap partition over the first 16 sectors of the disk,
> those sectors contain bootcode.

Don't you mean "do not put gbde partitions over the first 16 sectors of
the disk"?  I'm pretty sure any gbde partition would eventually
overwrite the first 16 sectors.
If so, shouldn't we mention this in gbde(8), as proposed in the
following patch?

--Apple-Mail-3--22496007
Content-Transfer-Encoding: 7bit
Content-Type: application/octet-stream; x-unix-mode=0644; name="gbde.8.diff"
Content-Disposition: attachment; filename=gbde.8.diff

Index: /home/refugee/freebsd/src/sbin/gbde/gbde.8
===================================================================
--- src/sbin/gbde/gbde.8	(revision 48)
+++ /home/refugee/freebsd/src/sbin/gbde/gbde.8	(working copy)
@@ -213,3 +213,8 @@
 .Sh BUGS
 The cryptographic algorithms and the overall design have not been
 attacked mercilessly for over 10 years by a gang of cryptoanalysts.
+.Pp
+.Nm
+should not be used on partitions at the beginning of the disk, as it might
+overwrite the first 16 sectors, which contain bootcode, rendering the disk
+unbootable.
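
The rule discussed in this thread comes down to simple arithmetic on the
partition's starting offset.  A rough Python sketch of that check
follows.  The 16-sector figure is the one quoted in the thread; the
512-byte sector size, the function names, and the choice to measure
offsets from the start of the area that holds the bootcode are
assumptions made for illustration only.

```python
SECTOR_SIZE = 512        # bytes per sector; an assumption, not from the thread
BOOT_AREA_SECTORS = 16   # figure quoted in the thread


def overlaps_boot_area(start_sector, boot_sectors=BOOT_AREA_SECTORS):
    """True if a non-empty partition starting at start_sector covers any
    of the first boot_sectors sectors (sectors 0 .. boot_sectors - 1)."""
    return start_sector < boot_sectors


def boot_area_bytes(sector_size=SECTOR_SIZE, boot_sectors=BOOT_AREA_SECTORS):
    """Size in bytes of the region that holds the bootcode."""
    return sector_size * boot_sectors


if __name__ == "__main__":
    print(overlaps_boot_area(0))    # partition at the very start of the disk
    print(overlaps_boot_area(16))   # partition starting just past the boot area
    print(boot_area_bytes())
```

With these assumptions, a partition like the swap partition in the
failing test (right at the beginning of the disk) is flagged, while one
that starts at or beyond sector 16 is not.
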
--Apple-Mail-3--22496007
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=US-ASCII; format=flowed

-- 
Suleiman Souhlal     | ssouhlal@vt.edu
The FreeBSD Project  | ssouhlal@FreeBSD.org

--Apple-Mail-3--22496007--

From owner-freebsd-geom@FreeBSD.ORG Fri Nov 19 04:49:34 2004
From: Paul Mather
To: freebsd-geom@freebsd.org
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Date: Thu, 18 Nov 2004 23:49:21 -0500
Message-Id: <1100839762.5421.21.camel@zappa.Chelsea-Ct.Org>
Mime-Version: 1.0
X-Mailer: Evolution 2.0.2 FreeBSD GNOME Team Port
Subject: geom_mirror synchronisation hangs at boot
X-List-Received-Date: Fri, 19 Nov 2004 04:49:34 -0000

I had another one of those occasional "TIMEOUT - WRITE_DMA" messages.
This time it was on a 6.0-CURRENT system (last built 2004-11-12) with a
geom_mirror setup, and it caused one of the providers to be removed from
the mirror and to operate in degraded mode:

Nov 18 14:15:07 zappa kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=49981679
Nov 18 14:15:10 zappa kernel: ad0: FAILURE - WRITE_DMA timed out
Nov 18 14:15:10 zappa kernel: GEOM_MIRROR: Cannot update metadata on disk ad0 (error=5).
Nov 18 14:15:10 zappa kernel: GEOM_MIRROR: Device raid1: provider ad0 disconnected.
Nov 18 14:15:10 zappa kernel: GEOM_MIRROR: Request failed (error=1). ad0[WRITE(offset=809639936, length=2048)]

As with my geom_vinum episode, the drive wasn't really down.  So, like
my geom_vinum case, I decided to reboot to make the drive magically be
recognised as alive.  (In retrospect, an atacontrol detach/attach would
have been better.  I'll remember that in future.:)

When the system rebooted, all appeared well.  The drive was recognised
and I could hear reconstruction kick in as ad0 was rebuilt.
Unfortunately, a moment later, when it came to probe my atapicam devices
(cd0 and da0 ZIP drive), the sounds of furious reconstruction vanished
and the machine locked up.

I am wondering if the atapicam device probing interfered with/was
affected by the rebuilding that had kicked off on ad0.  Is there some
way of delaying reconstruction such that it begins after some timed
delay---rather like the way the background fsck is delayed for 60
seconds to get most of the boot sequence completed before it begins
pounding on the drives?  I've never yet had any problems reconstructing
a mirror when the OS is up and running.

In the end, the only way I seemed to be able to render my system
bootable was to download and burn the latest FreeSBIE BETA CD, boot
that, and then "kldload geom_mirror" to initiate reconstruction once the
system was running. :-(

I know there are various gmirror options to disable
auto-synchronisation.  Is there anything that can be twiddled in the
loader to enable/disable auto-synchronisation?  That would be really
handy.

Here is the typical order of probing of ATA devices in my system.  Note
that cd0 and da0 come after the GEOM_MIRROR discovery/initialisation:

Nov 18 19:36:44 zappa kernel: ad0: 24405MB [49585/16/63] at ata0-master UDMA33
Nov 18 19:36:44 zappa kernel: acd0: DVDR at ata0-slave UDMA33
Nov 18 19:36:44 zappa kernel: ata1-slave: FAILURE - SETFEATURES SET TRANSFER MODE status=1 error=4
Nov 18 19:36:44 zappa kernel: ad2: 24405MB [49585/16/63] at ata1-master UDMA33
Nov 18 19:36:44 zappa kernel: afd0: REMOVABLE at ata1-slave BIOSPIO
Nov 18 19:36:44 zappa kernel: GEOM_MIRROR: Device raid1 created (id=1030107361).
Nov 18 19:36:44 zappa kernel: GEOM_MIRROR: Device raid1: provider ad0 detected.
Nov 18 19:36:44 zappa kernel: GEOM_MIRROR: Device raid1: provider ad2 detected.
Nov 18 19:36:44 zappa kernel: GEOM_MIRROR: Device raid1: provider ad2 activated.
Nov 18 19:36:44 zappa kernel: GEOM_MIRROR: Device raid1: provider ad0 activated.
Nov 18 19:36:44 zappa kernel: GEOM_MIRROR: Device raid1: provider mirror/raid1 launched.
Nov 18 19:36:44 zappa kernel: da0 at ata1 bus 0 target 1 lun 0
Nov 18 19:36:44 zappa kernel: da0: Removable Direct Access SCSI-0 device
Nov 18 19:36:44 zappa kernel: da0: 3.300MB/s transfers
Nov 18 19:36:44 zappa kernel: da0: 96MB (196608 512 byte sectors: 64H 32S/T 96C)
Nov 18 19:36:44 zappa kernel: cd0 at ata0 bus 0 target 1 lun 0
Nov 18 19:36:44 zappa kernel: cd0: Removable CD-ROM SCSI-0 device
Nov 18 19:36:44 zappa kernel: cd0: 33.000MB/s transfers
Nov 18 19:36:44 zappa kernel: cd0: Attempt to query device size failed: NOT READY, Medium not present
Nov 18 19:36:44 zappa kernel: Mounting root from ufs:/dev/mirror/raid1a

In the boot hang on auto-synchronisation, the hang would occur just
before the "Mounting root" line.

Cheers,

Paul.
-- 
e-mail: paul@gromit.dlib.vt.edu

"Without music to decorate it, time is just a bunch of boring production
 deadlines or dates by which bills must be paid."
        --- Frank Vincent Zappa

From owner-freebsd-geom@FreeBSD.ORG Fri Nov 19 16:05:23 2004
Message-ID: <419E19BE.20007@att.net>
Date: Fri, 19 Nov 2004 11:05:18 -0500
From: Duane Winner
User-Agent: Mozilla Thunderbird 0.9 (X11/20041108)
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: freebsd-geom@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: GEOM: create disk during runtime?
 (security run output)
X-List-Received-Date: Fri, 19 Nov 2004 16:05:24 -0000

Hello,

I'm hoping somebody on this list can shed some light on this.

My boss sent me a copy of his daily cron security run output, which
contained this:

localhost.local kernel log messages:
GEOM: create disk ad0 dp=0xc6b77d60
GEOM: create disk cd0 dp=0xc69a8600

We're all running FreeBSD 5.2.1-p12.

I've seen the "GEOM: create disk" messages plenty of times on boot and
in my dmesg's, but never really paid much attention to them, since I
don't really understand how GEOM works or how to interpret these kinds
of messages.  But I don't recall ever seeing it in a security run
cronjob.

My first question is "why is this happening during a security run
cronjob?  If this system is already booted and running, why is GEOM
creating disks?"

My second question is "is this something bad?  Is it a red flag?"

The only reasonable (and non-threatening) answer I could come up with is
maybe it's because the machine went into or came out of suspend mode
near the time the cronjob ran.  (APM)

The bottom line is I don't really know anything about GEOM, and would
like to know what this means so preventative action can be taken if
necessary.

Thanks for any info,
Duane
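
One way to investigate entries like these is to pull the disk names and
dp values out of the report, so they can be compared by hand with the
boot-time dmesg output.  The Python sketch below is an illustration
only: the regular expression is modelled on the two lines quoted above,
and the names created_disks and SAMPLE are hypothetical.  It does not
explain why the kernel logged the messages.

```python
import re

# Matches lines such as "GEOM: create disk ad0 dp=0xc6b77d60".
# The pattern is an assumption based only on the two lines quoted above.
CREATE_RE = re.compile(r"GEOM: create disk (\S+) dp=(0x[0-9a-fA-F]+)")


def created_disks(log_text):
    """Return (disk name, dp value as int) pairs in order of appearance."""
    return [(name, int(dp, 16)) for name, dp in CREATE_RE.findall(log_text)]


# The excerpt from the security run output (hypothetical test input).
SAMPLE = """\
localhost.local kernel log messages:
GEOM: create disk ad0 dp=0xc6b77d60
GEOM: create disk cd0 dp=0xc69a8600
"""

if __name__ == "__main__":
    for name, dp in created_disks(SAMPLE):
        print(name, hex(dp))
```
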