Date:      Wed, 6 Oct 2010 14:28:31 +0200
From:      Kai Gallasch <gallasch@free.de>
To:        freebsd-fs@freebsd.org
Subject:   Locked up processes after upgrade to ZFS v15
Message-ID:  <39F05641-4E46-4BE0-81CA-4DEB175A5FBE@free.de>

Hi.

Two days ago I upgraded my server to 8.1-STABLE (amd64) and upgraded ZFS from v14 to v15.
After the zpool and zfs upgrades the server ran stable for about half a day, but then Apache processes running inside jails locked up and could no longer be terminated.

In the end Apache itself (both worker and prefork) locked up, because it lost control of its child processes.

- only webserver jails running a prefork or worker Apache lock up
- non-Apache processes in other jails do not show this problem
- locked httpd processes do not terminate when rebooting

In 'top' the stuck processes show up with state zfs or zfsmrb:

  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
 2341 root        1  44    0   112M 12760K select  3   0:04  0.00% httpd
 2365 root        1  44    0 12056K  4312K select  0   0:00  0.00% sendmail
 2376 root        1  48    0  7972K  1628K nanslp  4   0:00  0.00% cron
 2214 root        1  44    0  6916K  1440K select  0   0:00  0.00% syslogd
24731 www         1  44    0   114M 13464K zfsmrb  6   0:00  0.00% httpd
12111 www         1  44    0   114M 13520K zfs     5   0:00  0.00% httpd
24729 www         1  44    0   114M 13408K zfsmrb  4   0:00  0.00% httpd
24728 www         1  47    0   114M 13404K zfsmrb  5   0:00  0.00% httpd
11051 www         1  44    0   114M 13456K zfs     1   0:00  0.00% httpd
26368 www         1  44    0   114M 13460K zfsmrb  6   0:00  0.00% httpd
24730 www         1  44    0   114M 13444K zfsmrb  5   0:00  0.00% httpd
88803 www         1  44    0   114M 13388K zfs     1   0:00  0.00% httpd
10887 www         1  44    0   114M 13436K zfs     6   0:00  0.00% httpd
16493 www         1  44    0   114M 13528K zfs     5   0:00  0.00% httpd
12461 www         1  44    0   114M 13340K zfs     1   0:00  0.00% httpd
89018 www         1  51    0   114M 13260K zfs     1   0:00  0.00% httpd
48699 www         1  52    0   114M 13308K zfs     3   0:00  0.00% httpd
31090 www         1  44    0   114M 13404K zfs     3   0:00  0.00% httpd
18094 www         1  44    0   114M 13312K zfs     2   0:00  0.00% httpd
69479 www         1  46    0   114M 13424K zfs     4   0:00  0.00% httpd
12890 www         1  44    0   114M 13336K zfs     5   0:00  0.00% httpd
67204 www         1  44    0   114M 13328K zfs     5   0:00  0.00% httpd
69402 www         1  60    0   114M 13432K zfs     4   0:00  0.00% httpd
91162 www         1  56    0   114M 13408K zfs     0   0:00  0.00% httpd
89781 www         1  45    0   114M 13428K zfs     4   0:00  0.00% httpd
48663 www         1  45    0   114M 13388K zfs     4   0:00  0.00% httpd
12112 www         1  44    0   114M 13340K zfs     6   0:00  0.00% httpd
91161 www         1  54    0   114M 13280K zfs     5   0:00  0.00% httpd
88839 www         1  44    0   114M 13592K zfsmrb  5   0:00  0.00% httpd
89144 www         1  58    0   114M 13304K zfs     0   0:00  0.00% httpd
78946 www         1  45    0   114M 13420K zfs     0   0:00  0.00% httpd
81984 www         1  44    0   114M 13396K zfs     5   0:00  0.00% httpd
93431 www         1  61    0   114M 13340K zfs     5   0:00  0.00% httpd
91179 www         1  76    0   114M 13360K zfs     4   0:00  0.00% httpd
69400 www         1  53    0   114M 13324K zfs     0   0:00  0.00% httpd
54211 www         1  45    0   114M 13404K zfs     6   0:00  0.00% httpd
36335 www         1  45    0   114M 13400K zfs     4   0:00  0.00% httpd
31093 www         1  44    0   114M 13348K zfs     2   0:00  0.00% httpd

I compiled a debug kernel with the following options:

options         KDB                     # Enable kernel debugger support.
options         DDB                     # Support DDB.
options         GDB                     # Support remote GDB.
options         INVARIANTS              # Enable calls of extra sanity checking
options         INVARIANT_SUPPORT       # Extra sanity checks of internal structures, required by INVARIANTS
options         WITNESS                 # Enable checks to detect deadlocks and cycles
options         WITNESS_SKIPSPIN        # Don't run witness on spinlocks for speed
#
options         SW_WATCHDOG
options         DEBUG_LOCKS
options         DEBUG_VFS_LOCKS
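Some of these options only make sense together, as the config comments themselves note (INVARIANTS needs INVARIANT_SUPPORT, and WITNESS_SKIPSPIN tunes WITNESS). A sketch of a quick sanity check over a kernel config is below; the dependency table is an assumption taken only from those comments, not from FreeBSD's build system, and `missing_dependencies` is a hypothetical helper.

```python
# Assumed dependency table, derived only from the comments in the config
# above; it is NOT read from FreeBSD's sys/conf files.
REQUIRES = {
    "INVARIANTS": {"INVARIANT_SUPPORT"},
    "WITNESS_SKIPSPIN": {"WITNESS"},
}


def enabled_options(config_text):
    """Collect option names from `options NAME` lines, ignoring comments."""
    opts = set()
    for line in config_text.splitlines():
        # Drop everything after '#', then split into whitespace tokens.
        code = line.split("#", 1)[0].split()
        if len(code) >= 2 and code[0] == "options":
            # Handle `options NAME=value` by keeping only NAME.
            opts.add(code[1].split("=", 1)[0])
    return opts


def missing_dependencies(config_text, requires=REQUIRES):
    """Map each enabled option to the required options it is missing."""
    opts = enabled_options(config_text)
    missing = {}
    for opt in sorted(opts):
        lacking = requires.get(opt, set()) - opts
        if lacking:
            missing[opt] = sorted(lacking)
    return missing
```

Run against the option list above, the check finds nothing missing; dropping INVARIANT_SUPPORT would report it under INVARIANTS.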

After the processes locked up, the only output on the console was:
witness_lock_list_get: witness exhausted

I also moved the jails with the stuck httpd processes to another server (also 8.1-STABLE, ZFS v15), but the lockup occurred there as well.

How can I debug this and get further information? At the moment I am thinking about reverting from ZFS to UFS to save some nerves. That would be a big disappointment for me, after all the time and effort I have put into using ZFS in production.

Regards,
Kai.