Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 23 Feb 2010 11:36:16 +0200
From:      Kostik Belousov <kostikbel@gmail.com>
To:        freebsd-stable@freebsd.org
Subject:   Re: sleep(3) sometimes too sleepy on FreeBSD 8.0?
Message-ID:  <20100223093616.GO50403@deviant.kiev.zoral.com.ua>
In-Reply-To: <20100223013522.GE2303@rwpc12.mby.riverwillow.net.au>
References:  <20100223013522.GE2303@rwpc12.mby.riverwillow.net.au>

next in thread | previous in thread | raw e-mail | index | archive | help

--5KObZtj3Gjpm0OUY
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Tue, Feb 23, 2010 at 12:35:22PM +1100, John Marshall wrote:
> Environment: sendmail 8.14.4 on FreeBSD 8.0-RELEASE-p2
>=20
> Since upgrading a few local servers to FreeBSD 8.0-RELEASE (and
> subsequently 8.0-RELEASE-p2), I have been seeing VERY intermittent
> problems with sendmail persistent queue runners.  One or more queue
> runners will fail to wake up (having been told to sleep for either 1 or
> 5 seconds) and mail accumulates in their queue group queues.
>=20
> I have only seen this about 4 times but at least once on each of the
> three 8.0 servers.  I've been seeing something like one occurrence per
> fortnight overall.  The first few times I re-started sendmail.  On
> Saturday I spent longer looking at it.
>=20
>  - attached to each of the stuck queue runner processes via gdb to
>    try to see where they were stuck
>  - backtraces from both process were identical and looked sane
>  - attached to a happy queue runner process and got an identical
>    backtrace
>  - exited gdb and discovered that the stuck queue runners had woken
>    up and flushed their queues!
>=20
> The stuck queue runner processes had been stuck for several hours
> (judging by the timestamps on the queued mail messages) but the gdb
> attach apparently woke them up!
>=20
> PROCESS STATES BEFORE DEBUG (stuck runners are in 'I' state)
>=20
>   PID  TT  STAT      TIME COMMAND
> 80298  ??  Ss     0:17.68 sendmail: accepting connections (sendmail)
> 80299  ??  I      0:46.62 sendmail: running queue: /var/spool/mqueue/qd1/=
df (sendmail)
> 80300  ??  I      0:08.83 sendmail: running queue: /var/spool/mqueue/mby/=
df (sendmail)
> 80301  ??  S      0:31.58 sendmail: running queue: /var/spool/mqueue/oz/d=
f (sendmail)
> 80302  ??  S      0:30.71 sendmail: running queue: /var/spool/mqueue/rw2/=
df (sendmail)
> 80303  ??  S      0:33.29 sendmail: running queue: /var/spool/mqueue/hold=
/df (sendmail)
> 80304  ??  S      0:30.55 sendmail: running queue: /var/spool/mqueue/pgp/=
df (sendmail)
>=20
> BACKTRACE OF STUCK PROCESS 80299
>=20
> (gdb) bt
> #0  0x28346547 in sigsuspend () from /lib/libc.so.7
> #1  0x28344e98 in sigpause () from /lib/libc.so.7
> #2  0x2833be3e in pause () from /lib/libc.so.7
> #3  0x080cc7c8 in sleep ()
> #4  0x08099c51 in run_work_group ()
> #5  0x08099ebf in runqueue ()
> #6  0x0805538d in main ()
>=20
> BACKTRACE OF HAPPY PROCESS 80301
>=20
> (gdb) bt
> #0  0x28346547 in sigsuspend () from /lib/libc.so.7
> #1  0x28344e98 in sigpause () from /lib/libc.so.7
> #2  0x2833be3e in pause () from /lib/libc.so.7
> #3  0x080cc7c8 in sleep ()
> #4  0x08099c51 in run_work_group ()
> #5  0x08099ebf in runqueue ()
> #6  0x0805538d in main ()
>=20
> PROCESS STATES AFTER DEBUG
>=20
>   PID  TT  STAT      TIME COMMAND
> 80298  ??  Ss     0:17.69 sendmail: accepting connections (sendmail)
> 80299  ??  S      0:46.66 sendmail: running queue: /var/spool/mqueue/qd1/=
df (sendmail)
> 80300  ??  S      0:08.85 sendmail: running queue: /var/spool/mqueue/mby/=
df (sendmail)
> 80301  ??  S      0:31.60 sendmail: running queue: /var/spool/mqueue/oz/d=
f (sendmail)
> 80302  ??  S      0:30.73 sendmail: running queue: /var/spool/mqueue/rw2/=
df (sendmail)
> 80303  ??  S      0:33.32 sendmail: running queue: /var/spool/mqueue/hold=
/df (sendmail)
> 80304  ??  S      0:30.58 sendmail: running queue: /var/spool/mqueue/pgp/=
df (sendmail)
>=20
> SENDMAIL DETAILS
>=20
> Version 8.14.4
>  Compiled with: DNSMAP LOG MAP_REGEX MATCHGECOS MILTER MIME7TO8 MIME8TO7
> 		NAMED_BIND NETINET NETUNIX NEWDB NIS PIPELINING SASLv2 SCANF
> 		STARTTLS USERDB XDEBUG
>=20
> /usr/sbin/sendmail:
> 	libsasl2.so.2 =3D> /usr/local/lib/libsasl2.so.2 (0x28154000)
> 	libssl.so.7 =3D> /usr/local/lib/libssl.so.7 (0x2816a000)
> 	libcrypto.so.7 =3D> /usr/local/lib/libcrypto.so.7 (0x281ad000)
> 	libutil.so.8 =3D> /lib/libutil.so.8 (0x282f2000)
> 	libc.so.7 =3D> /lib/libc.so.7 (0x28300000)
> 	libz.so.5 =3D> /lib/libz.so.5 (0x2840c000)
>=20
> I posted about this in comp.mail.sendmail and was told...
>=20
> > sleep() should be one of these calls:
> >=20
> >         if (njobs =3D=3D 0 && WorkGrp[wgrp].wg_lowqintvl < MIN_SLEEP_TI=
ME)
> >                 sleep(MIN_SLEEP_TIME);
> >         else if (WorkGrp[wgrp].wg_lowqintvl <=3D 0)
> >                 sleep(QueueIntvl > 0 ? QueueIntvl : MIN_SLEEP_TIME);
> >         else
> >                 sleep(WorkGrp[wgrp].wg_lowqintvl);
> >=20
> > Unless you have a really large value for one of these, the process
> > should continue after a while.
>=20
> The above code snippet is from sendmail/queue.c which fixes
> MIN_SLEEP_TIME at 5.  QueueIntvl defaults to 1.  wg_lowqintvl defaults
> to 0.  I have not set any configuration or runtime options to override
> these defaults, so my persistent queue runners should be sleeping for
> either 1s or 5s only (not hours!).

I think the best way to collect the data would be ktrace the queue runners,
preferrably starting the ktrace before they are stuck.

--5KObZtj3Gjpm0OUY
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (FreeBSD)

iEYEARECAAYFAkuDoY8ACgkQC3+MBN1Mb4gGwACg1LUYnPIOmnFJ3QUdcyhU0qzy
7E8AoNXDtvy2Y8zoVd3cnR2Hm19lqwib
=kO8c
-----END PGP SIGNATURE-----

--5KObZtj3Gjpm0OUY--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20100223093616.GO50403>