From: Kai Gallasch <gallasch@free.de>
Date: Tue, 22 Jan 2013 11:19:29 +0100
To: freebsd-stable <freebsd-stable@freebsd.org>
Subject: FreeBSD 9.1 - openldap slapd lockups, mutex problems

Hi.

(I am sending this to the "stable" list because it may be kernel related.)

On 9.1-RELEASE I am seeing lockups of the OpenLDAP slapd daemon. slapd runs for some days and then hangs, consuming large amounts of CPU. In this state slapd can only be stopped with SIGKILL.

# procstat -kk 71195
  PID    TID COMM             TDNAME           KSTACK
71195 149271 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d do_wait+0x678 __umtx_op_wait+0x68 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 194998 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _cv_wait_sig+0x12e seltdwait+0x110 kern_select+0x6ef sys_select+0x5d amd64_syscall+0x546 Xfast_syscall+0xf7
71195 195544 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 196183 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_timedwait_sig+0x19 _sleep+0x2d4 userret+0x9e doreti_ast+0x1f
71195 197966 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 198446 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 198453 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 198563 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 199520 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 200038 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc
                                               sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 200670 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 200674 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 200675 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 201179 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 201180 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 201181 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 201183 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 201189 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7

When I try to stop slapd through the rc script, I can see in the logs that the process waits for a thread to terminate - indefinitely.

Other multithreaded server processes on this machine (apache-worker, mysqld, bind, etc.) run without problems.

On UFS2 slapd runs fine, without showing the error.

Things I have already tried to stop the lockups:

- running openldap-server23 and openldap24, both with different BDB backend versions
- tuning the BDB init file
- reducing the number of threads slapd uses via slapd.conf

What I have not tried yet: creating a UFS filesystem on a ZFS zvol and mounting it into the jail, so that BDB keeps its data on UFS. (I don't like the idea - see the sketch at the end of this mail.)

Environment:

- FreeBSD 9.1-RELEASE amd64 multi-jail server with the CPU resource limit patch [1], which didn't make it into 9.1-RELEASE
- filesystem: ZFS only, swap on ZFS
- active jail limits through rctl.conf (memory, max processes, open files)
- a handful of openldap-server jails, all showing the same slapd lockup tendency
- slapd started through daemontools (supervise)

One idea: openldap-server with the BDB backend uses sparse files for storing its data - here on top of ZFS.

Has anyone else running openldap-server inside a jail on FreeBSD 9.1 seen similar problems? How can I debug this further?

Any hints appreciated :-)

Regards.

[1] https://wiki.freebsd.org/JailResourceLimits
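P.S. Two concrete follow-ups, for completeness.

The zvol/UFS workaround mentioned above would look roughly like this - pool name, volume size and jail path are only placeholders, not my actual setup:

  # zfs create -V 10G tank/ldap-bdb                                   (new 10 GB zvol)
  # newfs -U /dev/zvol/tank/ldap-bdb                                  (UFS2 with soft updates on the zvol)
  # mount /dev/zvol/tank/ldap-bdb /jails/ldap1/var/db/openldap-data   (mount at the jail's BDB data directory)

That would at least show whether ZFS really is the differentiator, since slapd behaves fine on plain UFS2 here.

The next time one of the slapd processes hangs I also plan to grab userland backtraces of all threads, roughly like this (assuming the slapd binary still has usable debug symbols; 71195 is the PID from above):

  # gdb /usr/local/libexec/slapd 71195
  (gdb) thread apply all bt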