Date: Tue, 23 Feb 2010 12:35:22 +1100
From: John Marshall <john.marshall@riverwillow.com.au>
To: freebsd-stable@freebsd.org
Subject: sleep(3) sometimes too sleepy on FreeBSD 8.0?
Message-ID: <20100223013522.GE2303@rwpc12.mby.riverwillow.net.au>
Environment: sendmail 8.14.4 on FreeBSD 8.0-RELEASE-p2

Since upgrading a few local servers to FreeBSD 8.0-RELEASE (and
subsequently 8.0-RELEASE-p2), I have been seeing VERY intermittent
problems with sendmail persistent queue runners. One or more queue
runners will fail to wake up (having been told to sleep for either 1 or
5 seconds) and mail accumulates in their queue group queues.

I have only seen this about 4 times, but at least once on each of the
three 8.0 servers. I've been seeing something like one occurrence per
fortnight overall.

The first few times I re-started sendmail. On Saturday I spent longer
looking at it.

 - attached to each of the stuck queue runner processes via gdb to try
   to see where they were stuck
 - backtraces from both processes were identical and looked sane
 - attached to a happy queue runner process and got an identical
   backtrace
 - exited gdb and discovered that the stuck queue runners had woken up
   and flushed their queues!

The stuck queue runner processes had been stuck for several hours
(judging by the timestamps on the queued mail messages) but the gdb
attach apparently woke them up!

PROCESS STATES BEFORE DEBUG (stuck runners are in 'I' state)

  PID  TT  STAT      TIME COMMAND
80298  ??  Ss     0:17.68 sendmail: accepting connections (sendmail)
80299  ??  I      0:46.62 sendmail: running queue: /var/spool/mqueue/qd1/df (sendmail)
80300  ??  I      0:08.83 sendmail: running queue: /var/spool/mqueue/mby/df (sendmail)
80301  ??  S      0:31.58 sendmail: running queue: /var/spool/mqueue/oz/df (sendmail)
80302  ??  S      0:30.71 sendmail: running queue: /var/spool/mqueue/rw2/df (sendmail)
80303  ??  S      0:33.29 sendmail: running queue: /var/spool/mqueue/hold/df (sendmail)
80304  ??  S      0:30.55 sendmail: running queue: /var/spool/mqueue/pgp/df (sendmail)

BACKTRACE OF STUCK PROCESS 80299

(gdb) bt
#0  0x28346547 in sigsuspend () from /lib/libc.so.7
#1  0x28344e98 in sigpause () from /lib/libc.so.7
#2  0x2833be3e in pause () from /lib/libc.so.7
#3  0x080cc7c8 in sleep ()
#4  0x08099c51 in run_work_group ()
#5  0x08099ebf in runqueue ()
#6  0x0805538d in main ()

BACKTRACE OF HAPPY PROCESS 80301

(gdb) bt
#0  0x28346547 in sigsuspend () from /lib/libc.so.7
#1  0x28344e98 in sigpause () from /lib/libc.so.7
#2  0x2833be3e in pause () from /lib/libc.so.7
#3  0x080cc7c8 in sleep ()
#4  0x08099c51 in run_work_group ()
#5  0x08099ebf in runqueue ()
#6  0x0805538d in main ()

PROCESS STATES AFTER DEBUG

  PID  TT  STAT      TIME COMMAND
80298  ??  Ss     0:17.69 sendmail: accepting connections (sendmail)
80299  ??  S      0:46.66 sendmail: running queue: /var/spool/mqueue/qd1/df (sendmail)
80300  ??  S      0:08.85 sendmail: running queue: /var/spool/mqueue/mby/df (sendmail)
80301  ??  S      0:31.60 sendmail: running queue: /var/spool/mqueue/oz/df (sendmail)
80302  ??  S      0:30.73 sendmail: running queue: /var/spool/mqueue/rw2/df (sendmail)
80303  ??  S      0:33.32 sendmail: running queue: /var/spool/mqueue/hold/df (sendmail)
80304  ??  S      0:30.58 sendmail: running queue: /var/spool/mqueue/pgp/df (sendmail)

SENDMAIL DETAILS

Version 8.14.4
 Compiled with: DNSMAP LOG MAP_REGEX MATCHGECOS MILTER MIME7TO8 MIME8TO7
                NAMED_BIND NETINET NETUNIX NEWDB NIS PIPELINING SASLv2
                SCANF STARTTLS USERDB XDEBUG

/usr/sbin/sendmail:
        libsasl2.so.2 => /usr/local/lib/libsasl2.so.2 (0x28154000)
        libssl.so.7 => /usr/local/lib/libssl.so.7 (0x2816a000)
        libcrypto.so.7 => /usr/local/lib/libcrypto.so.7 (0x281ad000)
        libutil.so.8 => /lib/libutil.so.8 (0x282f2000)
        libc.so.7 => /lib/libc.so.7 (0x28300000)
        libz.so.5 => /lib/libz.so.5 (0x2840c000)

I posted about this in comp.mail.sendmail and was told...

> sleep() should be one of these calls:
>
>     if (njobs == 0 && WorkGrp[wgrp].wg_lowqintvl < MIN_SLEEP_TIME)
>         sleep(MIN_SLEEP_TIME);
>     else if (WorkGrp[wgrp].wg_lowqintvl <= 0)
>         sleep(QueueIntvl > 0 ? QueueIntvl : MIN_SLEEP_TIME);
>     else
>         sleep(WorkGrp[wgrp].wg_lowqintvl);
>
> Unless you have a really large value for one of these, the process
> should continue after a while.

The above code snippet is from sendmail/queue.c which fixes
MIN_SLEEP_TIME at 5. QueueIntvl defaults to 1. wg_lowqintvl defaults
to 0. I have not set any configuration or runtime options to override
these defaults, so my persistent queue runners should be sleeping for
either 1s or 5s only (not hours!).

--
John Marshall
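
As a sanity check on the "1s or 5s only" conclusion, the quoted branch can be restated as a small standalone C function. This is only a sketch under stated assumptions: pick_sleep_seconds is a hypothetical helper, not a sendmail function, and its parameters stand in for njobs, WorkGrp[wgrp].wg_lowqintvl and QueueIntvl with the values reported in the message above.

```c
/* MIN_SLEEP_TIME is reported in the message as fixed at 5 in sendmail/queue.c. */
#define MIN_SLEEP_TIME 5

/*
 * Hypothetical helper (not part of sendmail): returns the number of
 * seconds that the quoted if/else chain would pass to sleep().
 *   njobs        - jobs handled in this pass
 *   wg_lowqintvl - stands in for WorkGrp[wgrp].wg_lowqintvl (reported default 0)
 *   queue_intvl  - stands in for QueueIntvl (reported default 1)
 */
static unsigned int
pick_sleep_seconds(int njobs, int wg_lowqintvl, int queue_intvl)
{
    if (njobs == 0 && wg_lowqintvl < MIN_SLEEP_TIME)
        return MIN_SLEEP_TIME;
    else if (wg_lowqintvl <= 0)
        return queue_intvl > 0 ? queue_intvl : MIN_SLEEP_TIME;
    else
        return wg_lowqintvl;
}
```

With the reported defaults, pick_sleep_seconds(0, 0, 1) gives 5 and pick_sleep_seconds(3, 0, 1) gives 1, so neither branch requests a sleep anywhere near the hours observed on the stuck runners.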