From owner-freebsd-fs@FreeBSD.ORG Sat Oct 9 11:12:44 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C2C871065672 for ; Sat, 9 Oct 2010 11:12:44 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta06.emeryville.ca.mail.comcast.net (qmta06.emeryville.ca.mail.comcast.net [76.96.30.56]) by mx1.freebsd.org (Postfix) with ESMTP id A83B18FC18 for ; Sat, 9 Oct 2010 11:12:43 +0000 (UTC) Received: from omta02.emeryville.ca.mail.comcast.net ([76.96.30.19]) by qmta06.emeryville.ca.mail.comcast.net with comcast id GbCe1f0070QkzPwA6bCjiZ; Sat, 09 Oct 2010 11:12:43 +0000 Received: from koitsu.dyndns.org ([98.248.41.155]) by omta02.emeryville.ca.mail.comcast.net with comcast id GbCi1f0013LrwQ28NbCi51; Sat, 09 Oct 2010 11:12:43 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 083C99B418; Sat, 9 Oct 2010 04:12:42 -0700 (PDT) Date: Sat, 9 Oct 2010 04:12:42 -0700 From: Jeremy Chadwick To: Kai Gallasch Message-ID: <20101009111241.GA58948@icarus.home.lan> References: <39F05641-4E46-4BE0-81CA-4DEB175A5FBE@free.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <39F05641-4E46-4BE0-81CA-4DEB175A5FBE@free.de> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org Subject: Re: Locked up processes after upgrade to ZFS v15 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 09 Oct 2010 11:12:44 -0000 On Wed, Oct 06, 2010 at 02:28:31PM +0200, Kai Gallasch wrote: > Two days ago I upgraded my server to 8.1-STABLE (amd64) and upgraded ZFS from v14 to v15. > After zpool & zfs upgrade the server was running stable for about half a day, but then apache processes running inside jails would lock up and could not be terminated any more. > > In the end apache (both worker and prefork) itself locked up, because it lost control of its child processes. > > - only webserver jails with a prefork or worker apache do lock up > - non-apache processes in other jails do not show this problem > - locked httpd processes will not terminate when rebooting. > > in 'top' the stuck processes show up with state zfs or zfsmrb: > > PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND > 2341 root 1 44 0 112M 12760K select 3 0:04 0.00% httpd > 2365 root 1 44 0 12056K 4312K select 0 0:00 0.00% sendmail > 2376 root 1 48 0 7972K 1628K nanslp 4 0:00 0.00% cron > 2214 root 1 44 0 6916K 1440K select 0 0:00 0.00% syslogd > 24731 www 1 44 0 114M 13464K zfsmrb 6 0:00 0.00% httpd > 12111 www 1 44 0 114M 13520K zfs 5 0:00 0.00% httpd > 24729 www 1 44 0 114M 13408K zfsmrb 4 0:00 0.00% httpd > 24728 www 1 47 0 114M 13404K zfsmrb 5 0:00 0.00% httpd > 11051 www 1 44 0 114M 13456K zfs 1 0:00 0.00% httpd > 26368 www 1 44 0 114M 13460K zfsmrb 6 0:00 0.00% httpd > 24730 www 1 44 0 114M 13444K zfsmrb 5 0:00 0.00% httpd > 88803 www 1 44 0 114M 13388K zfs 1 0:00 0.00% httpd > 10887 www 1 44 0 114M 13436K zfs 6 0:00 0.00% httpd > 16493 www 1 44 0 114M 13528K zfs 5 0:00 0.00% httpd > 12461 www 1 44 0 114M 13340K zfs 1 0:00 0.00% httpd > 89018 www 1 51 0 114M 13260K zfs 1 0:00 0.00% httpd > 48699 www 1 52 0 114M 13308K zfs 3 0:00 0.00% httpd > 31090 www 1 44 0 114M 13404K zfs 3 0:00 0.00% httpd > 18094 www 1 44 0 114M 13312K zfs 2 0:00 0.00% httpd > 69479 www 1 46 0 114M 13424K zfs 4 0:00 0.00% httpd > 12890 www 1 44 0 114M 13336K zfs 5 0:00 0.00% httpd > 67204 www 1 44 0 114M 13328K zfs 5 0:00 0.00% httpd > 69402 www 1 60 0 114M 13432K zfs 4 0:00 0.00% httpd > 91162 www 1 56 0 114M 13408K zfs 0 0:00 0.00% httpd > 89781 www 1 45 0 114M 13428K zfs 4 0:00 0.00% httpd > 48663 www 1 45 0 114M 13388K zfs 4 0:00 0.00% httpd > 12112 www 1 44 0 114M 13340K zfs 6 0:00 0.00% httpd > 91161 www 1 54 0 114M 13280K zfs 5 0:00 0.00% httpd > 88839 www 1 44 0 114M 13592K zfsmrb 5 0:00 0.00% httpd > 89144 www 1 58 0 114M 13304K zfs 0 0:00 0.00% httpd > 78946 www 1 45 0 114M 13420K zfs 0 0:00 0.00% httpd > 81984 www 1 44 0 114M 13396K zfs 5 0:00 0.00% httpd > 93431 www 1 61 0 114M 13340K zfs 5 0:00 0.00% httpd > 91179 www 1 76 0 114M 13360K zfs 4 0:00 0.00% httpd > 69400 www 1 53 0 114M 13324K zfs 0 0:00 0.00% httpd > 54211 www 1 45 0 114M 13404K zfs 6 0:00 0.00% httpd > 36335 www 1 45 0 114M 13400K zfs 4 0:00 0.00% httpd > 31093 www 1 44 0 114M 13348K zfs 2 0:00 0.00% httpd > > I compiled a debug kernel with following options: > > options KDB # Enable kernel debugger support. > options DDB # Support DDB. > options GDB # Support remote GDB. > options INVARIANTS # Enable calls of extra sanity checking > options INVARIANT_SUPPORT # Extra sanity checks of internal structures, required by INVARIANTS > options WITNESS # Enable checks to detect deadlocks and cycles > options WITNESS_SKIPSPIN # Don't run witness on spinlocks for speed > # > options SW_WATCHDOG > options DEBUG_LOCKS > options DEBUG_VFS_LOCKS > > After process lockups only output on console was: > witness_lock_list_get: witness exhausted > > I also moved the jails with the stuck httpd processes to another server (also 8.1-STABLE, ZFS v15) - but the lockup also ouccured there. > > How can I debug this and get further information? At the moment I am thinking about reverting from zfs to ufs - to save some nerves. Would be a big disappointment for me, after all the time and effort trying to use zfs in production. We're seeing (what we believe) to be this exact situation. We just migrated our main RELENG_7 system to RELENG_8 last night (full OS reinstall, kernel/world built from sources dated October 8 @ 01:30 PDT). The system is dedicated to httpd + postfix + named. We do not use jails. Memory/RAM is ECC, and the system does have serial console. SMP is in use (physical CPU is an Intel Core 2 Duo). ZFS is in use on this system (a very simple configuration). Filesystems which use ZFS are mapped to /home and /var/mail (thus heavily accessed given the hosts' role): pool: pool state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM pool ONLINE 0 0 0 mirror ONLINE 0 0 0 ada0s1g ONLINE 0 0 0 ada1 ONLINE 0 0 0 errors: No known data errors httpd exclusively uses prefork and *was not* built with any threading support. On RELENG_7, the system used ZFS v14, had the same tunings, and had an uptime of 221 days w/out issue. With RELENG_8, the system lasted approximately 12 hours (about half a day) before getting into a state that looks almost identical to Kai's system: existing processes were stuck (unkillable, even with -9). New processes could be spawned (including ones which used the ZFS filesystems), and commands executed successfully. init complained about wedged processes when the system was rebooted: Oct 9 02:00:56 init: some processes would not die; ps axl advised No indication of any hardware issues on the console. The administrator who was handling the issue did not use "ps -l", "top", nor "procstat -k", so we don't have any indication of what the process state was in, nor what the kernel calling stack looked like that lead up to the wedging. All he stated was that the processes were in D/I states, which doesn't help since that's what they're in normally anyway. If I was around I would have forced DDB and done "call doadump" to investigate things post-mortem. Monitoring graphs of the system during this time don't indicate any signs of memory thrashing (though bsnmp-ucd doesn't provide as much granularity as top does); the system looks normal except for a slightly decreased load average (probably as a result of the deadlocked processes). /boot/loader.conf tunings on the system. Please note some of these may already be the defaults: kern.maxdsiz="1536M" kern.dfldsiz="1536M" kern.maxssiz="256M" vm.kmem_size="4096M" vfs.zfs.arc_max="3584M" vfs.zfs.prefetch_disable="1" vfs.zfs.zio.use_uma="0" vfs.zfs.txg.timeout="5" vfs.zfs.txg.write_limit_override="268435456" /etc/sysctl.conf tunings: kern.ipc.shm_use_phys=1 net.inet.icmp.icmplim=1500 net.inet.tcp.sendbuf_max=16777216 net.inet.tcp.recvbuf_max=16777216 net.inet.tcp.sendspace=65536 net.inet.tcp.recvspace=131072 vfs.ufs.dirhash_maxmem=16777216 kern.maxvnodes=200000 Aside from the top/procstat/kernel dump aspect, what other information would kernel folks be interested in? Is "call doadump" sufficient for post-mortem investigation? I need to know since if/when this happens again (likely), I want to get folks as much information as possible. Also, a question for Kai: what did you end up doing to resolve this problem? Did you roll back to an older FreeBSD, or...? -- | Jeremy Chadwick jdc@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP: 4BD6C0CB |