Date: Wed, 6 Oct 2010 14:28:31 +0200
From: Kai Gallasch <gallasch@free.de>
To: freebsd-fs@freebsd.org
Subject: Locked up processes after upgrade to ZFS v15
Message-ID: <39F05641-4E46-4BE0-81CA-4DEB175A5FBE@free.de>
Hi.

Two days ago I upgraded my server to 8.1-STABLE (amd64) and upgraded ZFS from v14 to v15. After running zpool upgrade and zfs upgrade, the server ran stable for about half a day. Then apache processes running inside jails began to lock up and could no longer be terminated. In the end apache itself (both worker and prefork) locked up, because it had lost control of its child processes.

- only webserver jails running a prefork or worker apache lock up
- non-apache processes in other jails do not show this problem
- locked httpd processes do not terminate when rebooting

In 'top' the stuck processes show up with state zfs or zfsmrb:

    PID USERNAME THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
   2341 root       1  44    0   112M 12760K select  3   0:04  0.00% httpd
   2365 root       1  44    0 12056K  4312K select  0   0:00  0.00% sendmail
   2376 root       1  48    0  7972K  1628K nanslp  4   0:00  0.00% cron
   2214 root       1  44    0  6916K  1440K select  0   0:00  0.00% syslogd
  24731 www        1  44    0   114M 13464K zfsmrb  6   0:00  0.00% httpd
  12111 www        1  44    0   114M 13520K zfs     5   0:00  0.00% httpd
  24729 www        1  44    0   114M 13408K zfsmrb  4   0:00  0.00% httpd
  24728 www        1  47    0   114M 13404K zfsmrb  5   0:00  0.00% httpd
  11051 www        1  44    0   114M 13456K zfs     1   0:00  0.00% httpd
  26368 www        1  44    0   114M 13460K zfsmrb  6   0:00  0.00% httpd
  24730 www        1  44    0   114M 13444K zfsmrb  5   0:00  0.00% httpd
  88803 www        1  44    0   114M 13388K zfs     1   0:00  0.00% httpd
  10887 www        1  44    0   114M 13436K zfs     6   0:00  0.00% httpd
  16493 www        1  44    0   114M 13528K zfs     5   0:00  0.00% httpd
  12461 www        1  44    0   114M 13340K zfs     1   0:00  0.00% httpd
  89018 www        1  51    0   114M 13260K zfs     1   0:00  0.00% httpd
  48699 www        1  52    0   114M 13308K zfs     3   0:00  0.00% httpd
  31090 www        1  44    0   114M 13404K zfs     3   0:00  0.00% httpd
  18094 www        1  44    0   114M 13312K zfs     2   0:00  0.00% httpd
  69479 www        1  46    0   114M 13424K zfs     4   0:00  0.00% httpd
  12890 www        1  44    0   114M 13336K zfs     5   0:00  0.00% httpd
  67204 www        1  44    0   114M 13328K zfs     5   0:00  0.00% httpd
  69402 www        1  60    0   114M 13432K zfs     4   0:00  0.00% httpd
  91162 www        1  56    0   114M 13408K zfs     0   0:00  0.00% httpd
  89781 www        1  45    0   114M 13428K zfs     4   0:00  0.00% httpd
  48663 www        1  45    0   114M 13388K zfs     4   0:00  0.00% httpd
  12112 www        1  44    0   114M 13340K zfs     6   0:00  0.00% httpd
  91161 www        1  54    0   114M 13280K zfs     5   0:00  0.00% httpd
  88839 www        1  44    0   114M 13592K zfsmrb  5   0:00  0.00% httpd
  89144 www        1  58    0   114M 13304K zfs     0   0:00  0.00% httpd
  78946 www        1  45    0   114M 13420K zfs     0   0:00  0.00% httpd
  81984 www        1  44    0   114M 13396K zfs     5   0:00  0.00% httpd
  93431 www        1  61    0   114M 13340K zfs     5   0:00  0.00% httpd
  91179 www        1  76    0   114M 13360K zfs     4   0:00  0.00% httpd
  69400 www        1  53    0   114M 13324K zfs     0   0:00  0.00% httpd
  54211 www        1  45    0   114M 13404K zfs     6   0:00  0.00% httpd
  36335 www        1  45    0   114M 13400K zfs     4   0:00  0.00% httpd
  31093 www        1  44    0   114M 13348K zfs     2   0:00  0.00% httpd

I compiled a debug kernel with the following options:

  options   KDB                 # Enable kernel debugger support.
  options   DDB                 # Support DDB.
  options   GDB                 # Support remote GDB.
  options   INVARIANTS          # Enable calls of extra sanity checking
  options   INVARIANT_SUPPORT   # Extra sanity checks of internal structures, required by INVARIANTS
  options   WITNESS             # Enable checks to detect deadlocks and cycles
  options   WITNESS_SKIPSPIN    # Don't run witness on spinlocks for speed
  # options SW_WATCHDOG
  options   DEBUG_LOCKS
  options   DEBUG_VFS_LOCKS

After the process lockups, the only output on the console was:

  witness_lock_list_get: witness exhausted

I also moved the jails with the stuck httpd processes to another server (also 8.1-STABLE, ZFS v15), but the lockup occurred there as well.

How can I debug this and get further information?

At the moment I am thinking about reverting from ZFS to UFS to save some nerves. That would be a big disappointment for me after all the time and effort I have put into using ZFS in production.

Regards,
Kai.
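The top output in this report can also be checked mechanically by grouping processes on their STATE column, which separates the hung httpd children from the rest. Below is a minimal Python sketch that assumes the snapshot has been pasted in as text using the column layout from its header line. The helper names `parse_top`, `tally_states` and `stuck_in_zfs` are hypothetical. Only the state names zfs and zfsmrb are taken from the report, and the sketch does not query a live system.

```python
from collections import Counter

# A few rows copied from the top(1) snapshot in the report. The column
# layout (PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND)
# is taken from that snapshot's header line.
SNAPSHOT = """\
  PID USERNAME THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
 2341 root       1  44    0   112M 12760K select  3   0:04  0.00% httpd
 2365 root       1  44    0 12056K  4312K select  0   0:00  0.00% sendmail
24731 www        1  44    0   114M 13464K zfsmrb  6   0:00  0.00% httpd
12111 www        1  44    0   114M 13520K zfs     5   0:00  0.00% httpd
24729 www        1  44    0   114M 13408K zfsmrb  4   0:00  0.00% httpd
11051 www        1  44    0   114M 13456K zfs     1   0:00  0.00% httpd
"""

# States the stuck httpd children showed in the report (assumption: these
# are the only ZFS-related states of interest here).
ZFS_STATES = {"zfs", "zfsmrb"}


def parse_top(text):
    """Hypothetical helper: yield one dict per row, keyed by header names."""
    lines = text.splitlines()
    header = lines[0].split()
    for line in lines[1:]:
        fields = line.split()
        # Skip any row whose field count does not match the header.
        if len(fields) == len(header):
            yield dict(zip(header, fields))


def tally_states(rows, command="httpd"):
    """Hypothetical helper: count rows for `command`, grouped by STATE."""
    return Counter(r["STATE"] for r in rows if r["COMMAND"] == command)


def stuck_in_zfs(rows):
    """Hypothetical helper: PIDs of rows whose STATE is a ZFS state."""
    return [int(r["PID"]) for r in rows if r["STATE"] in ZFS_STATES]


if __name__ == "__main__":
    rows = list(parse_top(SNAPSHOT))
    print(tally_states(rows))
    print(stuck_in_zfs(rows))
```

With the full snapshot as input, the same functions would report the root-owned parent httpd in select and every www-owned httpd child in zfs or zfsmrb. That matches the pattern described in the report.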
