From: Kai Gallasch <gallasch@free.de>
Date: Tue, 22 Jan 2013 11:19:29 +0100
To: freebsd-stable <freebsd-stable@freebsd.org>
Subject: FreeBSD 9.1 - openldap slapd lockups, mutex problems

Hi.

(I am sending this to the "stable" list because it may be kernel related.)

On 9.1-RELEASE I am seeing lockups of the OpenLDAP slapd daemon. slapd runs for some days and then hangs, consuming large amounts of CPU. In this state slapd can only be stopped with SIGKILL.

# procstat -kk 71195
  PID    TID COMM             TDNAME           KSTACK
71195 149271 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d do_wait+0x678 __umtx_op_wait+0x68 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 194998 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _cv_wait_sig+0x12e seltdwait+0x110 kern_select+0x6ef sys_select+0x5d amd64_syscall+0x546 Xfast_syscall+0xf7
71195 195544 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 196183 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_timedwait_sig+0x19 _sleep+0x2d4 userret+0x9e doreti_ast+0x1f
71195 197966 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 198446 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 198453 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 198563 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 199520 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 200038 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc
                                               sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 200670 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 200674 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 200675 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 201179 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 201180 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 201181 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 201183 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7
71195 201189 slapd            -                mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0x16 _sleep+0x29d _do_lock_umutex+0x5e8 do_lock_umutex+0x17c __umtx_op_wait_umutex+0x63 amd64_syscall+0x546 Xfast_syscall+0xf7

When I try to stop slapd through the rc script, I can see in the logs that the process waits for a thread to terminate - indefinitely.

Other multithreaded server processes on this machine (apache-worker, mysqld, bind, etc.) run without problems.

On UFS2 slapd runs fine, without showing the error.

Things I have already tried to stop the lockups:

- running openldap-server23 and openldap24, both with different BDB backend versions
- tuning the BDB init file
- reducing the number of threads slapd uses via slapd.conf

What I have not tried yet: creating a UFS filesystem on a ZFS zvol and mounting it into the jail, so that BDB keeps its data on UFS. (I don't like the idea - see the sketch at the end of this mail.)

Environment:

- FreeBSD 9.1-RELEASE amd64 multi-jail server with the CPU resource limit patch [1], which didn't make it into 9.1-RELEASE
- filesystem: ZFS only, swap on ZFS
- active jail limits through rctl.conf (memory, max processes, open files)
- a handful of openldap-server jails, all showing the same slapd lockup tendency
- slapd started through daemontools (supervise)

One idea: openldap-server with the BDB backend uses sparse files for storing its data - here on top of ZFS.

Has anyone else running openldap-server inside a jail on FreeBSD 9.1 seen similar problems? How can I debug this further?

Any hints appreciated :-)

Regards.

[1] https://wiki.freebsd.org/JailResourceLimits
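P.S. Two concrete follow-ups, for completeness.

The zvol/UFS workaround mentioned above would look roughly like this - pool name, volume size and jail path are only placeholders, not my actual setup:

  # zfs create -V 10G tank/ldap-bdb                                   (new 10 GB zvol)
  # newfs -U /dev/zvol/tank/ldap-bdb                                  (UFS2 with soft updates on the zvol)
  # mount /dev/zvol/tank/ldap-bdb /jails/ldap1/var/db/openldap-data   (mount at the jail's BDB data directory)

That would at least show whether ZFS really is the differentiator, since slapd behaves fine on plain UFS2 here.

The next time one of the slapd processes hangs I also plan to grab userland backtraces of all threads, roughly like this (assuming the slapd binary still has usable debug symbols; 71195 is the PID from above):

  # gdb /usr/local/libexec/slapd 71195
  (gdb) thread apply all bt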