From owner-freebsd-current@FreeBSD.ORG Fri May 8 03:46:10 2009
From: Richard Todd
To: freebsd-current@freebsd.org, Martin
Date: Thu, 07 May 2009 22:06:21 -0500
Subject: Re: ZFS panic space_map.c line 110
In-Reply-To: (Martin's message of "Thu, 7 May 2009 21:05:16 +0200")
User-Agent: Gnus/5.1008 (Gnus v5.10.8) XEmacs/21.4.22 (berkeley-unix)
List-Id: Discussions about the use of FreeBSD-current

Martin writes:

> Hi,
>
> I have a file server running ZFS on -CURRENT. Someone tried to
> transfer a file of several gigabytes onto the system. The kernel
> crashed with a panic and froze up while spewing the panic message.
> I only wrote down the most important lines:
>
>   solaris assert: ss == NULL
>   zfs/space_map.c, line 110
>
>   process: 160 spa_zio
>
> I've heard that I can try to move the zpool cache away and import the
> zpool with force once again. Will this help?

I kinda doubt it.
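For what it's worth, here is roughly the invariant behind that assert. This is a minimal sketch, not the real ZFS code: a space map is a set of non-overlapping segments of free space, and adding a range is only legal if the lookup for an overlapping segment (the `ss` in the panic message) comes back NULL. An overlap means the same space is being freed twice, i.e. corrupt metadata.

```python
# Sketch of the "ss == NULL" invariant in space_map.c (simplified,
# hypothetical names; the real code keeps segments in an AVL tree).

class SpaceMapError(Exception):
    pass

class SpaceMap:
    def __init__(self):
        self.segments = []          # sorted list of (start, end) ranges

    def _find_overlap(self, start, end):
        # Stand-in for the avl_find() whose result must be NULL.
        for s, e in self.segments:
            if start < e and s < end:
                return (s, e)
        return None

    def add(self, start, end):
        ss = self._find_overlap(start, end)
        if ss is not None:
            # This is where the kernel's VERIFY-style assert would panic.
            raise SpaceMapError(
                "overlapping segment %r while adding (%d, %d)"
                % (ss, start, end))
        self.segments.append((start, end))
        self.segments.sort()

sm = SpaceMap()
sm.add(0, 100)
sm.add(200, 300)
try:
    sm.add(50, 150)     # overlaps (0, 100): a double free of that space
except SpaceMapError as e:
    print("panic:", e)
```

The point is that the assert isn't about the cache file at all; it fires when the free-space bookkeeping itself is inconsistent.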
> zpool with force once again. Will this help? I am asking because I
> don't know if the panic is caused by a corrupt cache or by corrupt
> file system metadata. Maybe someone can explain it. (I had to switch the

This panic wouldn't have anything to do with zpool.cache; that's just a
file that helps the system find the devices it should expect to find
zpools on during boot. This is a problem with the free space map, which
is part of the filesystem metadata. If you're lucky, only the in-core
copy of the free space map was bogus and there's a valid map on disk. If
you're unlucky, the map on disk is trashed, and there's no really easy
way to recover that pool.

> Is this issue with inconsistent zpools well known? I've seen some posts
> from 2007 and January 2009 that reported similar problems. Apparently
> some people have lost their entire zpools multiple times already, as
> far as I understood it.

Mine was probably one of those messages; I managed to get an error like
that once, by Seriously Provoking the system (repeatedly unmounting and
mounting the main filesystem on one pool) while attempting to debug a
different, unrelated problem. It's not something I've ever seen in any
sort of "normal" usage, and just copying a few gigabytes to the
filesystem shouldn't cause this sort of problem.

I managed to recover the data without having to resort to backups, by
hacking the kernel to disable some of the asserts in space_map.c,
iterating until I reached a point where I had a kernel that could import
the pool without panicking. Once I did that, I mounted the filesystem
read-only and copied everything off to a different device. Like I said,
not an *easy* way to recover that data.

> One more piece of information I can give is that every hour the ZFS
> file systems create snapshots. Maybe that triggered some inconsistency
> between the writes to a file system and the snapshot; I cannot tell,
> because I don't understand the condition.

I doubt this had anything to do with the problem.
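In case it helps anyone hitting the same panic: the recovery path I described above looks roughly like the following. Treat this as a hypothetical sketch, not a recipe -- the pool name "tank" and the rescue mountpoint are made up, and it all presumes you are already running a kernel patched to skip the failing space_map.c asserts.

```shell
# Hypothetical recovery sketch (pool name "tank" and /mnt/rescue are
# placeholders). Requires a kernel with the space_map.c asserts disabled.

# Move the cache file aside so nothing auto-imports the pool at boot.
# (On FreeBSD the cache lives in /boot/zfs/zpool.cache.)
mv /boot/zfs/zpool.cache /boot/zfs/zpool.cache.bad

# Force-import the damaged pool on the patched kernel.
zpool import -f tank

# Make the filesystems read-only so nothing else gets written.
zfs set readonly=on tank

# Copy everything off to a different device before doing anything else.
rsync -a /tank/ /mnt/rescue/
```

The essential part is the read-only mount: you want zero writes to the damaged pool while you pull the data off, then destroy and recreate it from the copy.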