From owner-freebsd-fs@FreeBSD.ORG  Tue Oct 18 00:54:50 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 7CBC01065674
	for <freebsd-fs@freebsd.org>; Tue, 18 Oct 2011 00:54:50 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from qmta07.emeryville.ca.mail.comcast.net
	(qmta07.emeryville.ca.mail.comcast.net [76.96.30.64])
	by mx1.freebsd.org (Postfix) with ESMTP id 63A168FC12
	for <freebsd-fs@freebsd.org>; Tue, 18 Oct 2011 00:54:50 +0000 (UTC)
Received: from omta14.emeryville.ca.mail.comcast.net ([76.96.30.60])
	by qmta07.emeryville.ca.mail.comcast.net with comcast
	id m0ty1h0041HpZEsA70ujiE; Tue, 18 Oct 2011 00:54:43 +0000
Received: from koitsu.dyndns.org ([67.180.84.87])
	by omta14.emeryville.ca.mail.comcast.net with comcast
	id m0ti1h00k1t3BNj8a0tijs; Tue, 18 Oct 2011 00:53:43 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
	id 545D1102C1C; Mon, 17 Oct 2011 17:54:48 -0700 (PDT)
Date: Mon, 17 Oct 2011 17:54:48 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Harold Paulson <haroldp@internal.org>
Message-ID: <20111018005448.GA2855@icarus.home.lan>
References: <4D8047A6-930E-4DE8-BA55-051890585BFE@internal.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4D8047A6-930E-4DE8-BA55-051890585BFE@internal.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-fs@freebsd.org
Subject: Re: Damaged directory on ZFS
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 18 Oct 2011 00:54:50 -0000

On Mon, Oct 17, 2011 at 05:17:31PM -0700, Harold Paulson wrote:
> I've had a server that boots from ZFS panicking for a couple days.  I have worked around the problem for now, but I hope someone can give me some insight into what's going on, and how I can solve it properly.  
> 
> The server is running 8.2-STABLE (zfs v28) with 8G of ram and 4 SATA disks in a raid10 type arrangement:
> 
> # uname -a              
> FreeBSD jane.sierraweb.com 8.2-STABLE-201105 FreeBSD 8.2-STABLE-201105 #0: Tue May 17 05:18:48 UTC 2011     root@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  amd64

First thing to do is to consider upgrading to a newer RELENG_8 date.
There have been *many* ZFS fixes since May.

> And zpool status: 
> 
> 	NAME           STATE     READ WRITE CKSUM
> 	tank           ONLINE       0     0     0
> 	  mirror       ONLINE       0     0     0
> 	    gpt/disk0  ONLINE       0     0     0
> 	    gpt/disk1  ONLINE       0     0     0
> 	  mirror       ONLINE       0     0     0
> 	    gpt/disk2  ONLINE       0     0     0
> 	    gpt/disk3  ONLINE       0     0     0
> 
> It started panicking under load a couple days ago.  We replaced RAM and motherboard, but problems persisted.  I don't know if a hardware issue originally caused the problem or what.  When it panics, I get the usual panic message, but I don't get a core file, and it never reboots itself.  
> 
> http://pastebin.com/F1J2AjSF

ZFS developers will need to comment on the state of the backtrace.  You
may be requested to examine the core using kgdb and be given some
commands to run on it.

> While I was trying to figure out the source of the problem, I noticed various stuck processes that peg a CPU and can't be killed, such as:
> 
>   PID JID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
> 48735   0 root        1  46    0 11972K   924K CPU3    3 415:14 100.00% find

Had you run procstat -k -k 48735 (the "double -k" is not a typo), you
probably would have seen that the process was "stuck" in ZFS-related
kernel code.  The kernel is holding on to these processes and will not
let go of them.  A signal is only acted on once the thread gets back out
of the kernel, so even kill -9 won't kill them.
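A minimal sketch of that check, wrapped in a made-up shell function (zfs_stuck is not a FreeBSD tool).  The grep pattern is an assumption: it treats kernel frames whose names contain zfs, zio_, dmu_, dbuf_, arc_, txg_ or spa_ as ZFS code, which is a heuristic and not an exhaustive list.  Also note that procstat(1) does not show the stack of a thread that is on a CPU at that instant, so with a process spinning at 100% you may need to run it a few times.

```sh
#!/bin/sh
# zfs_stuck PID: hypothetical helper, not part of FreeBSD.
# Dumps the kernel stacks of the threads in PID with "procstat -k -k"
# (the second -k adds function offsets) and reports whether any frame
# looks like ZFS code.  The pattern below is a heuristic guess.
zfs_stuck() {
    pid=$1
    if procstat -k -k "$pid" | grep -Eq 'zfs|zio_|dmu_|dbuf_|arc_|txg_|spa_'; then
        echo "PID $pid: kernel stack contains ZFS frames"
    else
        echo "PID $pid: no ZFS frames found"
    fi
}
```

Usage would be "zfs_stuck 48735", followed by reading the full procstat -k -k output yourself; the function only saves you from eyeballing it.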

It would also have been worthwhile to get the "process tree" of whatever
spawned the PID.  (Solaris has ptree; I think we have something similar
under FreeBSD, but I forget what.)  That matters because it's probably
a periodic job (many of them use find) traversing your ZFS filesystems
and tickling a bug/issue somewhere.  You even hint at this in your next
paragraph, re: locate.updatedb.
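Lacking a ptree equivalent, a rough substitute is to walk the parent-PID chain with ps(1).  This is only a sketch: the function name is made up, and it relies on "ps -o field= -p PID" (the trailing "=" suppresses the header), which both FreeBSD and Linux ps accept.  If the chain for the stuck find passes through cron and periodic, that would support the periodic-job theory.

```sh
#!/bin/sh
# ancestry PID: hypothetical helper.  Prints PID, its parent, its
# grandparent and so on up to init, one "pid ppid command" line each.
ancestry() {
    pid=$1
    while [ -n "$pid" ] && [ "$pid" -gt 0 ]; do
        # A trailing "=" on each field name suppresses the header line.
        ps -o pid= -o ppid= -o command= -p "$pid" || break
        ppid=$(ps -o ppid= -p "$pid" | tr -d ' ')
        # Guard against a process that reports itself as its own parent.
        [ "$ppid" = "$pid" ] && break
        pid=$ppid
    done
}
```

Usage: "ancestry 48735".  The walk stops at PID 1, whose parent is reported as 0.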

> They are not marked zombie, but I can't kill them, and restarting the jail they are in won't even get rid of them.  truss just hangs with no output on them.  On different occasions, I noticed pop3d processes for the same user getting stuck in this way.  On a hunch I ran a "find" through the files in the user's Maildir and got a panic.  I disabled this account and now the server is stable again.  At least until locate.updatedb walks through that directory, I suppose.  Evidently, there is some kind of hole in the file system below that directory tree causing the panic.

The fact that jails are involved complicates things even more.

truss and ktrace won't show anything going on, for the reason I gave
above: the process is hung or spinning inside the kernel, partway
through a syscall, so there is no syscall or userland activity for
them to report.

Furthermore, truss on FreeBSD is basically worthless; use ktrace.
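For a process that is still entering and leaving system calls, a minimal ktrace(1)/kdump(1) session looks roughly like the sketch below.  The trace_pid wrapper, its default 10-second window and the default output path are assumptions for illustration; "ktrace -f file -p pid", "ktrace -c -p pid" and "kdump -f file" are the real invocations.  For a thread stuck inside the kernel, as described above, expect little or nothing in the dump.

```sh
#!/bin/sh
# trace_pid PID [SECONDS] [FILE]: hypothetical wrapper around ktrace(1).
trace_pid() {
    pid=$1
    secs=${2:-10}
    out=${3:-/tmp/ktrace.$1.out}
    # Start tracing the already-running process, writing records to $out.
    ktrace -f "$out" -p "$pid" || return 1
    sleep "$secs"
    # Clear the trace points on that process again.
    ktrace -c -p "$pid"
    # Decode the binary trace file into readable text.
    kdump -f "$out"
}
```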

> I can move that directory out of the way, and carry on, but is there anything I can do to really *repair* the problem?

I would recommend starting with "zpool scrub" on the pool that holds
the Maildir/ directory of the account you disabled.  I would not be
surprised if it comes back 100% clean.
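A sketch of that, assuming the pool is "tank" as in your zpool status output.  The polling loop and its 60-second interval are my own choices, not anything zpool requires: it simply waits until zpool status no longer reports a scrub in progress, then prints the verbose status, where -v lists any files with permanent errors.

```sh
#!/bin/sh
# scrub_and_wait [POOL]: hypothetical helper around zpool(8).
scrub_and_wait() {
    pool=${1:-tank}
    zpool scrub "$pool" || return 1
    # While a scrub runs, the status output contains "scrub in progress".
    while zpool status "$pool" | grep -q "scrub in progress"; do
        sleep 60
    done
    # -v also lists any files with permanent (unrecoverable) errors.
    zpool status -v "$pool"
}
```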

Judging by the backtrace, I suspect that Maildir/ has a ton of files
in it.  Is that the case?  Does "echo *" say something about the
argument list being too long?
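Counting the entries with find avoids building an argument list at all.  This is a sketch; count_entries is a made-up name and the Maildir path in the example line is a placeholder.  Keep in mind that walking this very Maildir with find is what panicked the box before, so this may well do the same until the underlying problem is fixed.

```sh
#!/bin/sh
# count_entries DIR: hypothetical helper.  Counts the entries directly
# inside DIR without a shell glob, so no argument list is ever built.
count_entries() {
    # -mindepth 1 skips DIR itself; -maxdepth 1 stays out of subdirectories.
    find "$1" -mindepth 1 -maxdepth 1 | wc -l | tr -d ' '
}

# Example, for the three standard Maildir subdirectories:
# for d in cur new tmp; do echo "$d: $(count_entries /path/to/Maildir/$d)"; done
```

Names containing newlines would be miscounted, but Maildir file names do not normally contain them.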

You should also be aware that Maildir on ZFS performs horribly.  I've
experienced this, and there are old discussions about it as well.  Here
are some of my findings.

http://koitsu.wordpress.com/2009/06/01/freebsd-and-zfs-horrible-raidz1-read-speed/
http://koitsu.wordpress.com/2009/06/01/freebsd-and-zfs-horrible-raidz1-speed-part-2/
http://koitsu.wordpress.com/2009/10/29/unix-mail-format-annoyances/

The state of mail spools on UNIX is a complete disgrace, and everyone
involved in it should feel ashamed.  MIX is probably the best solution
to this problem, but it's not being adopted by all the major players,
which is very sad.  I realise that doesn't solve your problem, but my
strong recommendation is to use classic UNIX mail spools (one file for
many messages) when the filesystem is ZFS-based.

Regardless, as I said, someone familiar with the ZFS internals should
investigate the crash you're experiencing.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |