Date: Tue, 23 Feb 2010 11:36:16 +0200 From: Kostik Belousov <kostikbel@gmail.com> To: freebsd-stable@freebsd.org Subject: Re: sleep(3) sometimes too sleepy on FreeBSD 8.0? Message-ID: <20100223093616.GO50403@deviant.kiev.zoral.com.ua> In-Reply-To: <20100223013522.GE2303@rwpc12.mby.riverwillow.net.au> References: <20100223013522.GE2303@rwpc12.mby.riverwillow.net.au>
next in thread | previous in thread | raw e-mail | index | archive | help
--5KObZtj3Gjpm0OUY Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Feb 23, 2010 at 12:35:22PM +1100, John Marshall wrote: > Environment: sendmail 8.14.4 on FreeBSD 8.0-RELEASE-p2 >=20 > Since upgrading a few local servers to FreeBSD 8.0-RELEASE (and > subsequently 8.0-RELEASE-p2), I have been seeing VERY intermittent > problems with sendmail persistent queue runners. One or more queue > runners will fail to wake up (having been told to sleep for either 1 or > 5 seconds) and mail accumulates in their queue group queues. >=20 > I have only seen this about 4 times but at least once on each of the > three 8.0 servers. I've been seeing something like one occurrence per > fortnight overall. The first few times I re-started sendmail. On > Saturday I spent longer looking at it. >=20 > - attached to each of the stuck queue runner processes via gdb to > try to see where they were stuck > - backtraces from both process were identical and looked sane > - attached to a happy queue runner process and got an identical > backtrace > - exited gdb and discovered that the stuck queue runners had woken > up and flushed their queues! >=20 > The stuck queue runner processes had been stuck for several hours > (judging by the timestamps on the queued mail messages) but the gdb > attach apparently woke them up! >=20 > PROCESS STATES BEFORE DEBUG (stuck runners are in 'I' state) >=20 > PID TT STAT TIME COMMAND > 80298 ?? Ss 0:17.68 sendmail: accepting connections (sendmail) > 80299 ?? I 0:46.62 sendmail: running queue: /var/spool/mqueue/qd1/= df (sendmail) > 80300 ?? I 0:08.83 sendmail: running queue: /var/spool/mqueue/mby/= df (sendmail) > 80301 ?? S 0:31.58 sendmail: running queue: /var/spool/mqueue/oz/d= f (sendmail) > 80302 ?? S 0:30.71 sendmail: running queue: /var/spool/mqueue/rw2/= df (sendmail) > 80303 ?? S 0:33.29 sendmail: running queue: /var/spool/mqueue/hold= /df (sendmail) > 80304 ?? S 0:30.55 sendmail: running queue: /var/spool/mqueue/pgp/= df (sendmail) >=20 > BACKTRACE OF STUCK PROCESS 80299 >=20 > (gdb) bt > #0 0x28346547 in sigsuspend () from /lib/libc.so.7 > #1 0x28344e98 in sigpause () from /lib/libc.so.7 > #2 0x2833be3e in pause () from /lib/libc.so.7 > #3 0x080cc7c8 in sleep () > #4 0x08099c51 in run_work_group () > #5 0x08099ebf in runqueue () > #6 0x0805538d in main () >=20 > BACKTRACE OF HAPPY PROCESS 80301 >=20 > (gdb) bt > #0 0x28346547 in sigsuspend () from /lib/libc.so.7 > #1 0x28344e98 in sigpause () from /lib/libc.so.7 > #2 0x2833be3e in pause () from /lib/libc.so.7 > #3 0x080cc7c8 in sleep () > #4 0x08099c51 in run_work_group () > #5 0x08099ebf in runqueue () > #6 0x0805538d in main () >=20 > PROCESS STATES AFTER DEBUG >=20 > PID TT STAT TIME COMMAND > 80298 ?? Ss 0:17.69 sendmail: accepting connections (sendmail) > 80299 ?? S 0:46.66 sendmail: running queue: /var/spool/mqueue/qd1/= df (sendmail) > 80300 ?? S 0:08.85 sendmail: running queue: /var/spool/mqueue/mby/= df (sendmail) > 80301 ?? S 0:31.60 sendmail: running queue: /var/spool/mqueue/oz/d= f (sendmail) > 80302 ?? S 0:30.73 sendmail: running queue: /var/spool/mqueue/rw2/= df (sendmail) > 80303 ?? S 0:33.32 sendmail: running queue: /var/spool/mqueue/hold= /df (sendmail) > 80304 ?? S 0:30.58 sendmail: running queue: /var/spool/mqueue/pgp/= df (sendmail) >=20 > SENDMAIL DETAILS >=20 > Version 8.14.4 > Compiled with: DNSMAP LOG MAP_REGEX MATCHGECOS MILTER MIME7TO8 MIME8TO7 > NAMED_BIND NETINET NETUNIX NEWDB NIS PIPELINING SASLv2 SCANF > STARTTLS USERDB XDEBUG >=20 > /usr/sbin/sendmail: > libsasl2.so.2 =3D> /usr/local/lib/libsasl2.so.2 (0x28154000) > libssl.so.7 =3D> /usr/local/lib/libssl.so.7 (0x2816a000) > libcrypto.so.7 =3D> /usr/local/lib/libcrypto.so.7 (0x281ad000) > libutil.so.8 =3D> /lib/libutil.so.8 (0x282f2000) > libc.so.7 =3D> /lib/libc.so.7 (0x28300000) > libz.so.5 =3D> /lib/libz.so.5 (0x2840c000) >=20 > I posted about this in comp.mail.sendmail and was told... >=20 > > sleep() should be one of these calls: > >=20 > > if (njobs =3D=3D 0 && WorkGrp[wgrp].wg_lowqintvl < MIN_SLEEP_TI= ME) > > sleep(MIN_SLEEP_TIME); > > else if (WorkGrp[wgrp].wg_lowqintvl <=3D 0) > > sleep(QueueIntvl > 0 ? QueueIntvl : MIN_SLEEP_TIME); > > else > > sleep(WorkGrp[wgrp].wg_lowqintvl); > >=20 > > Unless you have a really large value for one of these, the process > > should continue after a while. >=20 > The above code snippet is from sendmail/queue.c which fixes > MIN_SLEEP_TIME at 5. QueueIntvl defaults to 1. wg_lowqintvl defaults > to 0. I have not set any configuration or runtime options to override > these defaults, so my persistent queue runners should be sleeping for > either 1s or 5s only (not hours!). I think the best way to collect the data would be ktrace the queue runners, preferrably starting the ktrace before they are stuck. --5KObZtj3Gjpm0OUY Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (FreeBSD) iEYEARECAAYFAkuDoY8ACgkQC3+MBN1Mb4gGwACg1LUYnPIOmnFJ3QUdcyhU0qzy 7E8AoNXDtvy2Y8zoVd3cnR2Hm19lqwib =kO8c -----END PGP SIGNATURE----- --5KObZtj3Gjpm0OUY--
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20100223093616.GO50403>