From: Kostik Belousov
To: freebsd-stable@freebsd.org
Date: Tue, 23 Feb 2010 11:36:16 +0200
Message-ID: <20100223093616.GO50403@deviant.kiev.zoral.com.ua>
In-Reply-To: <20100223013522.GE2303@rwpc12.mby.riverwillow.net.au>
Subject: Re: sleep(3) sometimes too sleepy on FreeBSD 8.0?

On Tue, Feb 23, 2010 at 12:35:22PM +1100, John Marshall wrote:
> Environment: sendmail 8.14.4 on FreeBSD 8.0-RELEASE-p2
>
> Since upgrading a few local servers to FreeBSD 8.0-RELEASE (and
> subsequently 8.0-RELEASE-p2), I have been seeing VERY intermittent
> problems with sendmail persistent queue runners. One or more queue
> runners will fail to wake up (having been told to sleep for either 1 or
> 5 seconds) and mail accumulates in their queue group queues.
>
> I have only seen this about 4 times but at least once on each of the
> three 8.0 servers. I've been seeing something like one occurrence per
> fortnight overall. The first few times I re-started sendmail. On
> Saturday I spent longer looking at it.
>
>  - attached to each of the stuck queue runner processes via gdb to
>    try to see where they were stuck
>  - backtraces from both process were identical and looked sane
>  - attached to a happy queue runner process and got an identical
>    backtrace
>  - exited gdb and discovered that the stuck queue runners had woken
>    up and flushed their queues!
>
> The stuck queue runner processes had been stuck for several hours
> (judging by the timestamps on the queued mail messages) but the gdb
> attach apparently woke them up!
>
> PROCESS STATES BEFORE DEBUG (stuck runners are in 'I' state)
>
>   PID  TT  STAT      TIME COMMAND
> 80298  ??  Ss     0:17.68 sendmail: accepting connections (sendmail)
> 80299  ??  I      0:46.62 sendmail: running queue: /var/spool/mqueue/qd1/df (sendmail)
> 80300  ??  I      0:08.83 sendmail: running queue: /var/spool/mqueue/mby/df (sendmail)
> 80301  ??  S      0:31.58 sendmail: running queue: /var/spool/mqueue/oz/df (sendmail)
> 80302  ??  S      0:30.71 sendmail: running queue: /var/spool/mqueue/rw2/df (sendmail)
> 80303  ??  S      0:33.29 sendmail: running queue: /var/spool/mqueue/hold/df (sendmail)
> 80304  ??  S      0:30.55 sendmail: running queue: /var/spool/mqueue/pgp/df (sendmail)
>
> BACKTRACE OF STUCK PROCESS 80299
>
> (gdb) bt
> #0  0x28346547 in sigsuspend () from /lib/libc.so.7
> #1  0x28344e98 in sigpause () from /lib/libc.so.7
> #2  0x2833be3e in pause () from /lib/libc.so.7
> #3  0x080cc7c8 in sleep ()
> #4  0x08099c51 in run_work_group ()
> #5  0x08099ebf in runqueue ()
> #6  0x0805538d in main ()
>
> BACKTRACE OF HAPPY PROCESS 80301
>
> (gdb) bt
> #0  0x28346547 in sigsuspend () from /lib/libc.so.7
> #1  0x28344e98 in sigpause () from /lib/libc.so.7
> #2  0x2833be3e in pause () from /lib/libc.so.7
> #3  0x080cc7c8 in sleep ()
> #4  0x08099c51 in run_work_group ()
> #5  0x08099ebf in runqueue ()
> #6  0x0805538d in main ()
>
> PROCESS STATES AFTER DEBUG
>
>   PID  TT  STAT      TIME COMMAND
> 80298  ??  Ss     0:17.69 sendmail: accepting connections (sendmail)
> 80299  ??  S      0:46.66 sendmail: running queue: /var/spool/mqueue/qd1/df (sendmail)
> 80300  ??  S      0:08.85 sendmail: running queue: /var/spool/mqueue/mby/df (sendmail)
> 80301  ??  S      0:31.60 sendmail: running queue: /var/spool/mqueue/oz/df (sendmail)
> 80302  ??  S      0:30.73 sendmail: running queue: /var/spool/mqueue/rw2/df (sendmail)
> 80303  ??  S      0:33.32 sendmail: running queue: /var/spool/mqueue/hold/df (sendmail)
> 80304  ??  S      0:30.58 sendmail: running queue: /var/spool/mqueue/pgp/df (sendmail)
>
> SENDMAIL DETAILS
>
> Version 8.14.4
>  Compiled with: DNSMAP LOG MAP_REGEX MATCHGECOS MILTER MIME7TO8 MIME8TO7
>                 NAMED_BIND NETINET NETUNIX NEWDB NIS PIPELINING SASLv2 SCANF
>                 STARTTLS USERDB XDEBUG
>
> /usr/sbin/sendmail:
>         libsasl2.so.2 => /usr/local/lib/libsasl2.so.2 (0x28154000)
>         libssl.so.7 => /usr/local/lib/libssl.so.7 (0x2816a000)
>         libcrypto.so.7 => /usr/local/lib/libcrypto.so.7 (0x281ad000)
>         libutil.so.8 => /lib/libutil.so.8 (0x282f2000)
>         libc.so.7 => /lib/libc.so.7 (0x28300000)
>         libz.so.5 => /lib/libz.so.5 (0x2840c000)
>
> I posted about this in comp.mail.sendmail and was told...
>
> > sleep() should be one of these calls:
> >
> >         if (njobs == 0 && WorkGrp[wgrp].wg_lowqintvl < MIN_SLEEP_TIME)
> >                 sleep(MIN_SLEEP_TIME);
> >         else if (WorkGrp[wgrp].wg_lowqintvl <= 0)
> >                 sleep(QueueIntvl > 0 ? QueueIntvl : MIN_SLEEP_TIME);
> >         else
> >                 sleep(WorkGrp[wgrp].wg_lowqintvl);
> >
> > Unless you have a really large value for one of these, the process
> > should continue after a while.
>
> The above code snippet is from sendmail/queue.c which fixes
> MIN_SLEEP_TIME at 5. QueueIntvl defaults to 1. wg_lowqintvl defaults
> to 0. I have not set any configuration or runtime options to override
> these defaults, so my persistent queue runners should be sleeping for
> either 1s or 5s only (not hours!).

I think the best way to collect the data would be to ktrace the queue
runners, preferably starting the ktrace before they get stuck.