From owner-freebsd-fs@FreeBSD.ORG  Sat Oct  9 11:12:44 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id C2C871065672
	for <freebsd-fs@freebsd.org>; Sat,  9 Oct 2010 11:12:44 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from qmta06.emeryville.ca.mail.comcast.net
	(qmta06.emeryville.ca.mail.comcast.net [76.96.30.56])
	by mx1.freebsd.org (Postfix) with ESMTP id A83B18FC18
	for <freebsd-fs@freebsd.org>; Sat,  9 Oct 2010 11:12:43 +0000 (UTC)
Received: from omta02.emeryville.ca.mail.comcast.net ([76.96.30.19])
	by qmta06.emeryville.ca.mail.comcast.net with comcast
	id GbCe1f0070QkzPwA6bCjiZ; Sat, 09 Oct 2010 11:12:43 +0000
Received: from koitsu.dyndns.org ([98.248.41.155])
	by omta02.emeryville.ca.mail.comcast.net with comcast
	id GbCi1f0013LrwQ28NbCi51; Sat, 09 Oct 2010 11:12:43 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
	id 083C99B418; Sat,  9 Oct 2010 04:12:42 -0700 (PDT)
Date: Sat, 9 Oct 2010 04:12:42 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Kai Gallasch <gallasch@free.de>
Message-ID: <20101009111241.GA58948@icarus.home.lan>
References: <39F05641-4E46-4BE0-81CA-4DEB175A5FBE@free.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <39F05641-4E46-4BE0-81CA-4DEB175A5FBE@free.de>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-fs@freebsd.org
Subject: Re: Locked up processes after upgrade to ZFS v15
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 09 Oct 2010 11:12:44 -0000

On Wed, Oct 06, 2010 at 02:28:31PM +0200, Kai Gallasch wrote:
> Two days ago I upgraded my server to 8.1-STABLE (amd64) and upgraded ZFS from v14 to v15.
> After zpool & zfs upgrade the server was running stable for about half a day, but then apache processes running inside jails would lock up and could not be terminated any more.
> 
> In the end apache (both worker and prefork) itself locked up, because it lost control of its child processes.
> 
> - only webserver jails with a prefork or worker apache do lock up
> - non-apache processes in other jails do not show this problem
> - locked httpd processes will not terminate when rebooting.
> 
> in 'top' the stuck processes show up with state zfs or zfsmrb:
> 
>   PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
>  2341 root        1  44    0   112M 12760K select  3   0:04  0.00% httpd
>  2365 root        1  44    0 12056K  4312K select  0   0:00  0.00% sendmail
>  2376 root        1  48    0  7972K  1628K nanslp  4   0:00  0.00% cron
>  2214 root        1  44    0  6916K  1440K select  0   0:00  0.00% syslogd
> 24731 www         1  44    0   114M 13464K zfsmrb  6   0:00  0.00% httpd
> 12111 www         1  44    0   114M 13520K zfs     5   0:00  0.00% httpd
> 24729 www         1  44    0   114M 13408K zfsmrb  4   0:00  0.00% httpd
> 24728 www         1  47    0   114M 13404K zfsmrb  5   0:00  0.00% httpd
> 11051 www         1  44    0   114M 13456K zfs     1   0:00  0.00% httpd
> 26368 www         1  44    0   114M 13460K zfsmrb  6   0:00  0.00% httpd
> 24730 www         1  44    0   114M 13444K zfsmrb  5   0:00  0.00% httpd
> 88803 www         1  44    0   114M 13388K zfs     1   0:00  0.00% httpd
> 10887 www         1  44    0   114M 13436K zfs     6   0:00  0.00% httpd
> 16493 www         1  44    0   114M 13528K zfs     5   0:00  0.00% httpd
> 12461 www         1  44    0   114M 13340K zfs     1   0:00  0.00% httpd
> 89018 www         1  51    0   114M 13260K zfs     1   0:00  0.00% httpd
> 48699 www         1  52    0   114M 13308K zfs     3   0:00  0.00% httpd
> 31090 www         1  44    0   114M 13404K zfs     3   0:00  0.00% httpd
> 18094 www         1  44    0   114M 13312K zfs     2   0:00  0.00% httpd
> 69479 www         1  46    0   114M 13424K zfs     4   0:00  0.00% httpd
> 12890 www         1  44    0   114M 13336K zfs     5   0:00  0.00% httpd
> 67204 www         1  44    0   114M 13328K zfs     5   0:00  0.00% httpd
> 69402 www         1  60    0   114M 13432K zfs     4   0:00  0.00% httpd
> 91162 www         1  56    0   114M 13408K zfs     0   0:00  0.00% httpd
> 89781 www         1  45    0   114M 13428K zfs     4   0:00  0.00% httpd
> 48663 www         1  45    0   114M 13388K zfs     4   0:00  0.00% httpd
> 12112 www         1  44    0   114M 13340K zfs     6   0:00  0.00% httpd
> 91161 www         1  54    0   114M 13280K zfs     5   0:00  0.00% httpd
> 88839 www         1  44    0   114M 13592K zfsmrb  5   0:00  0.00% httpd
> 89144 www         1  58    0   114M 13304K zfs     0   0:00  0.00% httpd
> 78946 www         1  45    0   114M 13420K zfs     0   0:00  0.00% httpd
> 81984 www         1  44    0   114M 13396K zfs     5   0:00  0.00% httpd
> 93431 www         1  61    0   114M 13340K zfs     5   0:00  0.00% httpd
> 91179 www         1  76    0   114M 13360K zfs     4   0:00  0.00% httpd
> 69400 www         1  53    0   114M 13324K zfs     0   0:00  0.00% httpd
> 54211 www         1  45    0   114M 13404K zfs     6   0:00  0.00% httpd
> 36335 www         1  45    0   114M 13400K zfs     4   0:00  0.00% httpd
> 31093 www         1  44    0   114M 13348K zfs     2   0:00  0.00% httpd
> 
> I compiled a debug kernel with following options:
> 
> options         KDB                     # Enable kernel debugger support.
> options         DDB                     # Support DDB.
> options         GDB                     # Support remote GDB.
> options         INVARIANTS              # Enable calls of extra sanity checking
> options         INVARIANT_SUPPORT       # Extra sanity checks of internal structures, required by INVARIANTS
> options         WITNESS                 # Enable checks to detect deadlocks and cycles
> options         WITNESS_SKIPSPIN        # Don't run witness on spinlocks for speed
> #
> options         SW_WATCHDOG
> options         DEBUG_LOCKS
> options         DEBUG_VFS_LOCKS
> 
> After process lockups only output on console was:
> witness_lock_list_get: witness exhausted
> 
> I also moved the jails with the stuck httpd processes to another server (also 8.1-STABLE, ZFS v15) - but the lockup also ouccured there.
> 
> How can I debug this and get further information? At the moment I am thinking about reverting from zfs to ufs - to save some nerves. Would be a big disappointment for me, after all the time and effort trying to use zfs in production.

We're seeing (what we believe) to be this exact situation.

We just migrated our main RELENG_7 system to RELENG_8 last night (full
OS reinstall, kernel/world built from sources dated October 8 @ 01:30
PDT).  The system is dedicated to httpd + postfix + named.  We do not
use jails.  Memory/RAM is ECC, and the system does have serial console.
SMP is in use (physical CPU is an Intel Core 2 Duo).

ZFS is in use on this system (a very simple configuration).  Filesystems
which use ZFS are mapped to /home and /var/mail (thus heavily accessed
given the hosts' role):

  pool: pool
 state: ONLINE
 scrub: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        pool         ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            ada0s1g  ONLINE       0     0     0
            ada1     ONLINE       0     0     0

errors: No known data errors

httpd exclusively uses prefork and *was not* built with any threading
support.

On RELENG_7, the system used ZFS v14, had the same tunings, and had an
uptime of 221 days w/out issue.

With RELENG_8, the system lasted approximately 12 hours (about half a
day) before getting into a state that looks almost identical to Kai's
system: existing processes were stuck (unkillable, even with -9).  New
processes could be spawned (including ones which used the ZFS
filesystems), and commands executed successfully.

init complained about wedged processes when the system was rebooted:

Oct  9 02:00:56 init: some processes would not die; ps axl advised

No indication of any hardware issues on the console.

The administrator who was handling the issue did not use "ps -l", "top",
nor "procstat -k", so we don't have any indication of what the process
state was in, nor what the kernel calling stack looked like that lead up
to the wedging.  All he stated was that the processes were in D/I
states, which doesn't help since that's what they're in normally anyway.
If I was around I would have forced DDB and done "call doadump" to
investigate things post-mortem.

Monitoring graphs of the system during this time don't indicate any
signs of memory thrashing (though bsnmp-ucd doesn't provide as much
granularity as top does); the system looks normal except for a slightly
decreased load average (probably as a result of the deadlocked
processes).

/boot/loader.conf tunings on the system.  Please note some of these may
already be the defaults:

kern.maxdsiz="1536M"
kern.dfldsiz="1536M"
kern.maxssiz="256M"
vm.kmem_size="4096M"
vfs.zfs.arc_max="3584M"
vfs.zfs.prefetch_disable="1"
vfs.zfs.zio.use_uma="0"
vfs.zfs.txg.timeout="5"
vfs.zfs.txg.write_limit_override="268435456"

/etc/sysctl.conf tunings:

kern.ipc.shm_use_phys=1
net.inet.icmp.icmplim=1500
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.recvbuf_max=16777216
net.inet.tcp.sendspace=65536
net.inet.tcp.recvspace=131072
vfs.ufs.dirhash_maxmem=16777216
kern.maxvnodes=200000

Aside from the top/procstat/kernel dump aspect, what other information
would kernel folks be interested in?  Is "call doadump" sufficient for
post-mortem investigation?  I need to know since if/when this happens
again (likely), I want to get folks as much information as possible.

Also, a question for Kai: what did you end up doing to resolve this
problem?  Did you roll back to an older FreeBSD, or...?

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |