Date:      Tue, 22 Jan 2013 11:19:29 +0100
From:      Kai Gallasch <gallasch@free.de>
To:        freebsd-stable <freebsd-stable@freebsd.org>
Subject:   FreeBSD 9.1 - openldap slapd lockups, mutex problems
Message-ID:  <D9280700-5105-4068-82E2-5E353C07EC2F@free.de>

Hi.

(I am sending this to the "stable" list because it may be kernel related.)

On 9.1-RELEASE I am witnessing lockups of the openldap slapd daemon.

The slapd process runs for some days and then hangs, consuming large amounts of CPU.
In this state it can only be restarted by killing it with SIGKILL.

 # procstat -kk 71195
  PID    TID COMM             TDNAME           KSTACK
71195 149271 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d do_wait+0x678 __umtx_op_wait+0x68 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 194998 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _cv_wait_sig+0x12e seltdwait+0x110 kern_select+0x6ef sys_select+0x5d amd64_syscall+0x546 Xfast_syscall+0xf7
71195 195544 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 196183 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_timedwait_sig+0x19 _sleep+0x2d4 userret+0x9e doreti_ast+0x1f
71195 197966 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 198446 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 198453 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 198563 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 199520 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 200038 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 200670 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 200674 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 200675 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 201179 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 201180 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 201181 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 201183 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 201189 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7

When I try to stop slapd through the rc script, I can see in the logs that the process waits for a thread to terminate - indefinitely.
Other multithreaded server processes (apache-worker, mysqld, bind, etc.) run on the same server without problems.
On UFS2 slapd runs fine and never shows this behaviour.


Things I have already tried to stop the lockups:

- running openldap-server23 and openldap24, each with different BDB backend versions
- tuning the BDB init file (DB_CONFIG)
- reducing the number of threads used by slapd through slapd.conf
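
For illustration, the tuning I mean was along these lines (the values below are placeholders, not my production settings):

  # slapd.conf - shrink the worker thread pool
  threads 8

  # DB_CONFIG in the BDB database directory
  set_cachesize       0 268435456 1
  set_lk_max_locks    3000
  set_lk_max_objects  3000
  set_lk_max_lockers  3000
  set_lg_bsize        2097152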

What I haven't tried yet:

Mounting a UFS-formatted ZFS volume (zvol) into the jail, so that BDB stores its data on UFS instead of ZFS (I don't like the idea).
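
If I were to try it, I suppose it would look roughly like this (pool name, size and mountpoint are made up):

  # create a zvol, put UFS on it and mount it over the jail's BDB directory
  zfs create -V 8G zroot/ldap-bdb
  newfs -U /dev/zvol/zroot/ldap-bdb
  mount /dev/zvol/zroot/ldap-bdb /jails/ldap1/var/db/openldap-data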


Environment:

- FreeBSD 9.1-RELEASE amd64 multi-jail server with the CPU resource limit patch [1], which didn't make it into 9.1-RELEASE
- filesystem: ZFS only, swap on ZFS
- active jail limits through rctl.conf (memory, maxprocs, open files) - see the sketch after this list
- a handful of openldap-server jails that all show the same slapd lockup tendency
- slapd started through daemontools (supervise)
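
The rctl.conf rules are of this general form (jail name and amounts here are placeholders, not my actual limits):

  jail:ldap1:memoryuse:deny=2g
  jail:ldap1:maxproc:deny=200
  jail:ldap1:openfiles:deny=8192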

Some ideas:
- openldap-server with the BDB backend stores its data in sparse files - on top of ZFS.
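
The sparseness can be checked by comparing apparent size and allocated blocks of the BDB files, e.g. (path and file name are just an example):

  cd /jails/ldap1/var/db/openldap-data
  ls -ls id2entry.bdb    # first column: blocks actually allocated
  du -Ah id2entry.bdb    # -A: apparent size
  du -h  id2entry.bdb    # space actually used on ZFS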

Has anyone else running openldap-server on FreeBSD 9.1 inside a jail seen similar problems?
How can I debug this further?

Any hints appreciated :-)

Regards.


[1] https://wiki.freebsd.org/JailResourceLimits


