From: Kai Gallasch
Date: Wed, 6 Oct 2010 14:28:31 +0200
To: freebsd-fs@freebsd.org
Subject: Locked up processes after upgrade to ZFS v15
Message-Id: <39F05641-4E46-4BE0-81CA-4DEB175A5FBE@free.de>

Hi.

Two days ago I upgraded my server to 8.1-STABLE (amd64) and upgraded ZFS from v14 to v15.

After the zpool and zfs upgrade the server ran stably for about half a day, but then apache processes running inside jails began to lock up and could no longer be terminated.

In the end apache itself (both worker and prefork) locked up, because it lost control of its child processes.

- only webserver jails running a prefork or worker apache lock up
- non-apache processes in other jails do not show this problem
- locked httpd processes do not terminate when rebooting.

In 'top' the stuck processes show up with state zfs or zfsmrb:

  PID USERNAME THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
 2341 root       1  44    0   112M 12760K select  3   0:04  0.00% httpd
 2365 root       1  44    0 12056K  4312K select  0   0:00  0.00% sendmail
 2376 root       1  48    0  7972K  1628K nanslp  4   0:00  0.00% cron
 2214 root       1  44    0  6916K  1440K select  0   0:00  0.00% syslogd
24731 www        1  44    0   114M 13464K zfsmrb  6   0:00  0.00% httpd
12111 www        1  44    0   114M 13520K zfs     5   0:00  0.00% httpd
24729 www        1  44    0   114M 13408K zfsmrb  4   0:00  0.00% httpd
24728 www        1  47    0   114M 13404K zfsmrb  5   0:00  0.00% httpd
11051 www        1  44    0   114M 13456K zfs     1   0:00  0.00% httpd
26368 www        1  44    0   114M 13460K zfsmrb  6   0:00  0.00% httpd
24730 www        1  44    0   114M 13444K zfsmrb  5   0:00  0.00% httpd
88803 www        1  44    0   114M 13388K zfs     1   0:00  0.00% httpd
10887 www        1  44    0   114M 13436K zfs     6   0:00  0.00% httpd
16493 www        1  44    0   114M 13528K zfs     5   0:00  0.00% httpd
12461 www        1  44    0   114M 13340K zfs     1   0:00  0.00% httpd
89018 www        1  51    0   114M 13260K zfs     1   0:00  0.00% httpd
48699 www        1  52    0   114M 13308K zfs     3   0:00  0.00% httpd
31090 www        1  44    0   114M 13404K zfs     3   0:00  0.00% httpd
18094 www        1  44    0   114M 13312K zfs     2   0:00  0.00% httpd
69479 www        1  46    0   114M 13424K zfs     4   0:00  0.00% httpd
12890 www        1  44    0   114M 13336K zfs     5   0:00  0.00% httpd
67204 www        1  44    0   114M 13328K zfs     5   0:00  0.00% httpd
69402 www        1  60    0   114M 13432K zfs     4   0:00  0.00% httpd
91162 www        1  56    0   114M 13408K zfs     0   0:00  0.00% httpd
89781 www        1  45    0   114M 13428K zfs     4   0:00  0.00% httpd
48663 www        1  45    0   114M 13388K zfs     4   0:00  0.00% httpd
12112 www        1  44    0   114M 13340K zfs     6   0:00  0.00% httpd
91161 www        1  54    0   114M 13280K zfs     5   0:00  0.00% httpd
88839 www        1  44    0   114M 13592K zfsmrb  5   0:00  0.00% httpd
89144 www        1  58    0   114M 13304K zfs     0   0:00  0.00% httpd
78946 www        1  45    0   114M 13420K zfs     0   0:00  0.00% httpd
81984 www        1  44    0   114M 13396K zfs     5   0:00  0.00% httpd
93431 www        1  61    0   114M 13340K zfs     5   0:00  0.00% httpd
91179 www        1  76    0   114M 13360K zfs     4   0:00  0.00% httpd
69400 www        1  53    0   114M 13324K zfs     0   0:00  0.00% httpd
54211 www        1  45    0   114M 13404K zfs     6   0:00  0.00% httpd
36335 www        1  45    0   114M 13400K zfs     4   0:00  0.00% httpd
31093 www        1  44    0   114M 13348K zfs     2   0:00  0.00% httpd

I compiled a debug kernel with the following options:

options KDB                 # Enable kernel debugger support.
options DDB                 # Support DDB.
options GDB                 # Support remote GDB.
options INVARIANTS          # Enable calls of extra sanity checking
options INVARIANT_SUPPORT   # Extra sanity checks of internal structures, required by INVARIANTS
options WITNESS             # Enable checks to detect deadlocks and cycles
options WITNESS_SKIPSPIN    # Don't run witness on spinlocks for speed
# options SW_WATCHDOG
options DEBUG_LOCKS
options DEBUG_VFS_LOCKS

After the process lockups, the only output on the console was:

witness_lock_list_get: witness exhausted

I also moved the jails with the stuck httpd processes to another server (also 8.1-STABLE, ZFS v15), but the lockup occurred there as well.

How can I debug this and get further information? At the moment I am thinking about reverting from ZFS to UFS, to save some nerves. That would be a big disappointment for me, after all the time and effort spent trying to use ZFS in production.

Regards,
Kai.
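
The 'top' listing above can be summarized per wait state, which makes it easier to compare the two affected servers. What follows is only a minimal sketch: it assumes the listing was saved as plain text with the column order shown above (PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND), and the helper name tally_states is hypothetical, not part of any FreeBSD tool.

```python
from collections import Counter

# Column positions assumed from the listing above:
# PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
STATE_COLUMN = 7
COMMAND_COLUMN = 11


def tally_states(top_output, command="httpd"):
    """Count processes per wait state for one command name (hypothetical helper)."""
    counts = Counter()
    for line in top_output.splitlines():
        fields = line.split()
        # Skip the header, blank lines, and anything that is not a process row.
        if len(fields) <= COMMAND_COLUMN or not fields[0].isdigit():
            continue
        if fields[COMMAND_COLUMN] == command:
            counts[fields[STATE_COLUMN]] += 1
    return counts


# A few rows copied from the listing above, used as sample input.
sample = """\
  PID USERNAME THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
 2341 root       1  44    0   112M 12760K select  3   0:04  0.00% httpd
24731 www        1  44    0   114M 13464K zfsmrb  6   0:00  0.00% httpd
12111 www        1  44    0   114M 13520K zfs     5   0:00  0.00% httpd
11051 www        1  44    0   114M 13456K zfs     1   0:00  0.00% httpd
 2376 root       1  48    0  7972K  1628K nanslp  4   0:00  0.00% cron
"""

print(dict(tally_states(sample)))
```

Passing command="cron" (or any other name from the COMMAND column) tallies a different process; rows whose first field is not a numeric PID are ignored.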