From: Jeremy Chadwick
To: Karl Pielorz
Cc: freebsd-fs@freebsd.org
Date: Tue, 19 Oct 2010 08:16:02 -0700
Subject: Re: ZFS 'read-only' device / pool scan / import?
Message-ID: <20101019151602.GA61733@icarus.home.lan>

On Tue, Oct 19, 2010 at 03:52:52PM +0100, Karl Pielorz wrote:
> On FreeBSD if I bring the system up 'single user' - the first time I
> do (for example):
>
> "
> zpool status
> "
>
> There's a pause, a flurry of disk activity - and the system appears
> to import any pools it finds, and the status appears.
>
> I'd guess at this point - some data is written to the disks? - Is
> there any way of avoiding that, i.e. a kind of "If you were to
> import the pools/display a status, what would you show, without
> actually writing any data?" (or importing it) - i.e. to ensure the
> devices are opened read only?

The activity you see is almost certainly the result of kernel modules
being loaded dynamically and so on, plus disk tasting and metadata
analysis.  This might be different if you had opensolaris_load="yes"
and zfs_load="yes" in /boot/loader.conf, but I'm guessing that isn't
the case.

BTW, in single-user, I always do the following before doing *any*
ZFS-related work.  This assumes the system uses UFS for /, /var, /tmp,
and /usr.

mount -t ufs -a
/etc/rc.d/hostid start
/etc/rc.d/zfs start

> The reason I ask is we recently had a pool that had a mishap.  The
> backing RAID controller dropped a drive (we were using them in
> JBOD).  This happened cleanly.  The system got shut down, and I think
> the wrong drive was replaced.
>
> When it came up, the RAID controller 'collapsed' the device list (so
> there was no gap where the old drive was) - that, plus the wrong
> drive being replaced, meant when we looked at the system we had:
>
> "
>  pool: vol
> state: UNAVAIL
> status: One or more devices could not be used because the label is missing
>         or invalid.  There are insufficient replicas for the pool to
>         continue functioning.
> "
>
> A number of devices were listed as 'corrupted data' - some devices
> were listed twice as members of the pool - i.e. pretty screwed up.
>
> 'undoing' the damage and restarting the server just threw up the
> same status.
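One read-only way to see what ZFS has recorded on each disk, without
importing anything, is to dump the on-disk labels with zdb.  A rough
sketch only; da0/da1 below are placeholder device names for your actual
JBOD members, and the exact path may differ if the pool was built on
partitions rather than whole disks:

# zdb only reads the four ZFS labels stored on the device; nothing is
# imported or written.  Repeat for each suspect disk.
zdb -l /dev/da0
zdb -l /dev/da1

Comparing the pool GUID and vdev children shown in each label against
what "zpool status" reports can indicate whether the labels themselves
were rewritten or whether the devices merely changed position.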
>
> I'm wondering if through the action of having the pool
> imported/mounted etc. - ZFS has actually *written* to the drives
> that were available that other drives aren't available / corrupt -
> and basically, because that info was written, and check-summed
> correctly - it takes that as gospel now, rather than actually
> 're-checking' the drives (or is simply unable to re-check the drives
> - because the meta data has been changed from the previous boot).
>
> If you see what I mean :)
>
> In the end, we're fortunate - we have backups (and they're currently
> restoring now) - but I was just interested in if you 'attempt' to
> mount/import a messed up pool - it could potentially screwup any
> chances of mounting that pool cleanly again, even if you were to
> 'undo' the hardware changes.
>
> I have a feeling that a zpool import or 'initial' zpool status has
> to be a read/write operation (i.e. would fail anyway if you could
> magically make the underlying devices read-only?)

Experts here might be able to help, but you're really going to need to
provide every little detail, in chronological order: what commands were
run, what output was seen, what physical actions took place, etc.

1) Restoring from backups is probably your best bet (IMHO; this is what
   I would do as well).

2) Take inventory of your hardware: get disk serial numbers (smartctl
   can show you this; see the sketch at the end of this message) and
   correlate them with specific device IDs (adaX, daX, etc.).  I also
   label my drive bays with the device ID they associate with.

3) You didn't disclose what kind of ZFS pool setup you have.  I'm
   guessing raidz1.  Or is it a pool of mirrors?  Or raidz2?  Or...?
   It obviously matters.

4) If it's raidz1, I'm guessing you're hurt, and here's why: the disk
   falling off the bus during shut-down (or whatever happened, meaning
   the incident that occurred *prior* to the wrong disk being replaced)
   would almost certainly have resulted in ZFS saying "hey! the array
   is degraded!  Replace the disk!  If anything else happens before
   that disk is replaced, you'll experience data loss!"  Then somehow
   the wrong disk was replaced.  At this point you have one disk which
   may be broken/wrong/whatever (the one which disappeared from the bus
   during the system shut-down), and one disk which is brand new and
   needs to be resilvered... so you're basically down to one disk.
   raidz1 isn't going to help you in this case: you can lose one disk,
   regardless of situation, period.

There are a lot of other things I could add to the item list here
(probably reaching 9 or 10 if I tried), but in general the above sounds
like it's what happened.  raidz2 would have been able to save you in
this situation, but would require at least 4 disks.

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
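A rough sketch of the inventory step in item 2 above; da0/da1 are
placeholder device names, and disks sitting behind a RAID controller
may also need smartctl's -d option for that controller:

# Print identity info, including the serial number, for each disk
# (da0/da1 are placeholders; adjust for your actual device IDs)
smartctl -i /dev/da0 | grep -i 'serial number'
smartctl -i /dev/da1 | grep -i 'serial number'

Matching those serials against the stickers on the physical drives (and
on the drive bays) helps avoid pulling the wrong disk next time.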