Date: Tue, 23 Feb 2010 12:35:22 +1100
From: John Marshall
To: freebsd-stable@freebsd.org
Subject: sleep(3) sometimes too sleepy on FreeBSD 8.0?

Environment: sendmail 8.14.4 on FreeBSD 8.0-RELEASE-p2

Since upgrading a few local servers to FreeBSD 8.0-RELEASE (and
subsequently 8.0-RELEASE-p2), I have been seeing VERY intermittent
problems with sendmail persistent queue runners. One or more queue
runners will fail to wake up (having been told to sleep for either 1 or
5 seconds) and mail accumulates in their queue group queues.

I have only seen this about 4 times, but at least once on each of the
three 8.0 servers. Overall I've been seeing something like one
occurrence per fortnight. The first few times I restarted sendmail. On
Saturday I spent longer looking at it:

- attached to each of the stuck queue runner processes via gdb to try
  to see where they were stuck
- backtraces from both processes were identical and looked sane
- attached to a happy queue runner process and got an identical
  backtrace
- exited gdb and discovered that the stuck queue runners had woken up
  and flushed their queues!

The stuck queue runner processes had been stuck for several hours
(judging by the timestamps on the queued mail messages), but the gdb
attach apparently woke them up!

PROCESS STATES BEFORE DEBUG (stuck runners are in 'I' state)

  PID  TT  STAT      TIME COMMAND
80298  ??  Ss     0:17.68 sendmail: accepting connections (sendmail)
80299  ??  I      0:46.62 sendmail: running queue: /var/spool/mqueue/qd1/df (sendmail)
80300  ??  I      0:08.83 sendmail: running queue: /var/spool/mqueue/mby/df (sendmail)
80301  ??  S      0:31.58 sendmail: running queue: /var/spool/mqueue/oz/df (sendmail)
80302  ??  S      0:30.71 sendmail: running queue: /var/spool/mqueue/rw2/df (sendmail)
80303  ??  S      0:33.29 sendmail: running queue: /var/spool/mqueue/hold/df (sendmail)
80304  ??  S      0:30.55 sendmail: running queue: /var/spool/mqueue/pgp/df (sendmail)

BACKTRACE OF STUCK PROCESS 80299

(gdb) bt
#0  0x28346547 in sigsuspend () from /lib/libc.so.7
#1  0x28344e98 in sigpause () from /lib/libc.so.7
#2  0x2833be3e in pause () from /lib/libc.so.7
#3  0x080cc7c8 in sleep ()
#4  0x08099c51 in run_work_group ()
#5  0x08099ebf in runqueue ()
#6  0x0805538d in main ()

BACKTRACE OF HAPPY PROCESS 80301

(gdb) bt
#0  0x28346547 in sigsuspend () from /lib/libc.so.7
#1  0x28344e98 in sigpause () from /lib/libc.so.7
#2  0x2833be3e in pause () from /lib/libc.so.7
#3  0x080cc7c8 in sleep ()
#4  0x08099c51 in run_work_group ()
#5  0x08099ebf in runqueue ()
#6  0x0805538d in main ()

PROCESS STATES AFTER DEBUG

  PID  TT  STAT      TIME COMMAND
80298  ??  Ss     0:17.69 sendmail: accepting connections (sendmail)
80299  ??  S      0:46.66 sendmail: running queue: /var/spool/mqueue/qd1/df (sendmail)
80300  ??  S      0:08.85 sendmail: running queue: /var/spool/mqueue/mby/df (sendmail)
80301  ??  S      0:31.60 sendmail: running queue: /var/spool/mqueue/oz/df (sendmail)
80302  ??  S      0:30.73 sendmail: running queue: /var/spool/mqueue/rw2/df (sendmail)
80303  ??  S      0:33.32 sendmail: running queue: /var/spool/mqueue/hold/df (sendmail)
80304  ??  S      0:30.58 sendmail: running queue: /var/spool/mqueue/pgp/df (sendmail)

SENDMAIL DETAILS

Version 8.14.4
 Compiled with: DNSMAP LOG MAP_REGEX MATCHGECOS MILTER MIME7TO8 MIME8TO7
                NAMED_BIND NETINET NETUNIX NEWDB NIS PIPELINING SASLv2 SCANF
                STARTTLS USERDB XDEBUG

/usr/sbin/sendmail:
        libsasl2.so.2 => /usr/local/lib/libsasl2.so.2 (0x28154000)
        libssl.so.7 => /usr/local/lib/libssl.so.7 (0x2816a000)
        libcrypto.so.7 => /usr/local/lib/libcrypto.so.7 (0x281ad000)
        libutil.so.8 => /lib/libutil.so.8 (0x282f2000)
        libc.so.7 => /lib/libc.so.7 (0x28300000)
        libz.so.5 => /lib/libz.so.5 (0x2840c000)

I posted about this in comp.mail.sendmail and was told...

> sleep() should be one of these calls:
>
>     if (njobs == 0 && WorkGrp[wgrp].wg_lowqintvl < MIN_SLEEP_TIME)
>         sleep(MIN_SLEEP_TIME);
>     else if (WorkGrp[wgrp].wg_lowqintvl <= 0)
>         sleep(QueueIntvl > 0 ? QueueIntvl : MIN_SLEEP_TIME);
>     else
>         sleep(WorkGrp[wgrp].wg_lowqintvl);
>
> Unless you have a really large value for one of these, the process
> should continue after a while.

The above code snippet is from sendmail/queue.c, which fixes
MIN_SLEEP_TIME at 5. QueueIntvl defaults to 1 and wg_lowqintvl defaults
to 0. I have not set any configuration or runtime options to override
these defaults, so my persistent queue runners should be sleeping for
either 1s or 5s only (not hours!).

-- 
John Marshall