From: Harold Paulson
Date: Sat, 29 Oct 2011 23:46:15 -0700
To: freebsd-fs@freebsd.org
Subject: Re: Damaged directory on ZFS
Message-Id: <0D7D4701-925D-4BFC-A2BE-51892CD08B45@internal.org>
In-Reply-To: <20111023140222.GG1697@garage.freebsd.pl>

Pawel,

On Oct 23, 2011, at 7:02 AM, Pawel Jakub Dawidek wrote:

> On Mon, Oct 17, 2011 at 05:17:31PM -0700, Harold Paulson wrote:
>> Hello,
>>
>> I've had a server that boots from ZFS panicking for a couple days. I have worked around the problem for now, but I hope someone can give me some insight into what's going on, and how I can solve it properly.
>>
>> The server is running 8.2-STABLE (zfs v28) with 8G of RAM and 4 SATA disks in a raid10 type arrangement:
>>
>> # uname -a
>> FreeBSD jane.sierraweb.com 8.2-STABLE-201105 FreeBSD 8.2-STABLE-201105 #0: Tue May 17 05:18:48 UTC 2011 root@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64
>>
>> And zpool status:
>>
>>         NAME           STATE     READ WRITE CKSUM
>>         tank           ONLINE       0     0     0
>>           mirror       ONLINE       0     0     0
>>             gpt/disk0  ONLINE       0     0     0
>>             gpt/disk1  ONLINE       0     0     0
>>           mirror       ONLINE       0     0     0
>>             gpt/disk2  ONLINE       0     0     0
>>             gpt/disk3  ONLINE       0     0     0
>>
>> It started panicking under load a couple days ago. We replaced RAM and motherboard, but problems persisted. I don't know if a hardware issue originally caused the problem or what. When it panics, I get the usual panic message, but I don't get a core file, and it never reboots itself.
>>
>> http://pastebin.com/F1J2AjSF
>>
>> While I was trying to figure out the source of the problem, I noticed various stuck processes that peg a CPU and can't be killed, such as:
>>
>>   PID JID USERNAME THR PRI NICE   SIZE   RES STATE C   TIME    WCPU COMMAND
>> 48735   0 root       1  46    0 11972K  924K CPU3  3 415:14 100.00% find
>>
>> They are not marked zombie, but I can't kill them, and restarting the jail they are in won't even get rid of them. truss just hangs with no output on them. On different occasions, I noticed pop3d processes for the same user getting stuck in this way. On a hunch I ran a "find" through the files in the user's Maildir and got a panic. I disabled this account and now the server is stable again. At least until locate.updatedb walks through that directory, I suppose. Evidently, there is some kind of hole in the file system below that directory tree causing the panic.
>>
>> I can move that directory out of the way, and carry on, but is there anything I can do to really *repair* the problem?
>
> Could you run these commands:
>
> objdump -D /boot/kernel/zfs.ko.symbols | egrep '^[0-9a-f]{8,16} ' | awk '{printf("0x%s\n", $1)}' | xargs -J ADDR printf "%u + %u\n" ADDR 0x111 | bc | xargs printf "0x%x\n" | xargs addr2line -e /boot/kernel/zfs.ko.symbols
>
> They should convert fzap_cursor_retrieve+0x111 into file:line. Send it
> here once you obtain it.

% objdump -D /boot/kernel/zfs.ko.symbols | egrep '^[0-9a-f]{8,16} ' | awk '{printf("0x%s\n", $1)}' | xargs -J ADDR printf "%u + %u\n" ADDR 0x111 | bc | xargs printf "0x%x\n" | xargs addr2line -e /boot/kernel/zfs.ko.symbols
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zap.c:1158

- H
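The pipeline Pawel suggested does three things. It takes the start address of fzap_cursor_retrieve from the disassembly, adds the 0x111 offset from the panic backtrace (the printf/bc/printf stage), and passes the sum to addr2line. Below is a minimal POSIX sh sketch of the same idea, offered as an illustration only. The function names add_offset and resolve and the sample address are hypothetical. It also assumes that objdump prints symbol headers as "<hex address> <symbol>:" and that addr2line is installed.

```shell
#!/bin/sh
# Sketch only: resolve SYMBOL+OFFSET from a panic backtrace to file:line
# using a module's debug symbols. Function names are hypothetical; assumes
# objdump prints symbol headers as "<hex address> <symbol>:".
set -eu

# Add a hex offset (e.g. 0x111) to a bare hex start address
# (e.g. 0000000000061230) and print the sum as 0x-prefixed hex.
# Shell arithmetic replaces the printf | bc | printf round trip.
add_offset() {
    printf '0x%x\n' $(( 0x$1 + $2 ))
}

# Find the symbol's start address in the disassembly, add the offset,
# and hand the resulting address to addr2line.
resolve() {
    file=$1 sym=$2 off=$3
    start=$(objdump -d "$file" | awk -v s="<$sym>:" '$2 == s { print $1; exit }')
    if [ -z "$start" ]; then
        echo "symbol $sym not found in $file" >&2
        return 1
    fi
    addr2line -e "$file" "$(add_offset "$start" "$off")"
}
```

For the case in this thread, running `resolve /boot/kernel/zfs.ko.symbols fzap_cursor_retrieve 0x111` should print a file:line pair similar to the zap.c line shown above. This assumes the symbol name appears exactly once in the disassembly.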