Date: Tue, 22 Jan 2013 11:19:29 +0100
From: Kai Gallasch <gallasch@free.de>
To: freebsd-stable <freebsd-stable@freebsd.org>
Subject: FreeBSD 9.1 - openldap slapd lockups, mutex problems
Message-ID: <D9280700-5105-4068-82E2-5E353C07EC2F@free.de>
Hi.

(I am sending this to the "stable" list because it may be kernel related.)

On 9.1-RELEASE I am seeing lockups of the OpenLDAP slapd daemon. slapd runs
for some days and then hangs, consuming large amounts of CPU. In this state
slapd can only be restarted by killing it with SIGKILL.

# procstat -kk 71195
  PID    TID COMM    TDNAME  KSTACK
71195 149271 slapd   -       mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d do_wait+0x678 __umtx_op_wait+0x68 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 194998 slapd   -       mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _cv_wait_sig+0x12e seltdwait+0x110 kern_select+0x6ef sys_select+0x5d amd64_syscall+0x546 Xfast_syscall+0xf7
71195 195544 slapd   -       mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 196183 slapd   -       mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_timedwait_sig+0x19 _sleep+0x2d4 userret+0x9e doreti_ast+0x1f
71195 197966 slapd   -       mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 198446 slapd   -       mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 198453 slapd   -       mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 198563 slapd   -       mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 199520 slapd   -       mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 200038 slapd   -       mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 200670 slapd   -       mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 200674 slapd   -       mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 200675 slapd   -       mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 201179 slapd   -       mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 201180 slapd   -       mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 201181 slapd   -       mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 201183 slapd   -       mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 201189 slapd   -       mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
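Almost all of these threads are sleeping in __umtx_op_wait_umutex, i.e.
blocked on a userland (pthread) mutex. Next time it hangs I plan to also
grab userland backtraces of all threads, to see which mutex they are
actually waiting on - roughly like this (the slapd path is the one used by
the openldap-server port; the backtraces are only useful if slapd and the
BDB library still have their debug symbols):

  # kernel-side view of all threads of the hung process
  procstat -kk 71195

  # userland backtraces of all threads
  gdb /usr/local/libexec/slapd 71195
  (gdb) thread apply all bt
  (gdb) detach
  (gdb) quit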
When I try to stop slapd through the rc script I can see in the logs that
the process is waiting for a thread to terminate - indefinitely.

Other multithreaded server processes (apache-worker, mysqld, bind, etc.)
are running on the server without problems.

On UFS2 slapd runs fine, without showing the error.

Things I have already tried to stop the lockups:

- running openldap-server23 and openldap24, each with different BDB
  backend versions
- tuning the BDB init file
- reducing the number of threads used by slapd through slapd.conf
  (see the P.S. at the end for the kind of settings I mean)

What I have not tried so far: mounting a UFS-formatted ZFS zvol into the
jail, so that BDB stores its data on UFS. (I don't like the idea.)

Environment:

- FreeBSD 9.1-RELEASE amd64 multi-jail server with the CPU resource limit
  patch [1], which did not make it into 9.1-RELEASE
- filesystem: ZFS only, swap on ZFS
- active jail limits through rctl.conf (memory, maxproc, open files)
- a handful of openldap-server jails that all show the same slapd lockup
  tendency
- slapd started through daemontools (supervise)

Some ideas:

- openldap-server with the BDB backend uses sparse files for storing its
  data - in this case on top of ZFS.

Has anyone else running openldap-server on FreeBSD 9.1 inside a jail seen
similar problems? How can I debug this further?

Any hints appreciated :-)

Regards.

[1] https://wiki.freebsd.org/JailResourceLimits
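P.S. To make "tuning" and "reducing threads" a bit more concrete, the kind
of settings I have been experimenting with look roughly like the following.
The values are only illustrative, not my exact production settings.

In slapd.conf:

  # cap the number of slapd worker threads (the default is 16)
  threads 8

In the DB_CONFIG file of the BDB environment:

  # 256 MB BDB environment cache in a single region
  set_cachesize 0 268435456 1
  # raise the lock table limits, to rule out lock exhaustion
  set_lk_max_locks 3000
  set_lk_max_objects 3000
  set_lk_max_lockers 3000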