Date: Wed, 24 Feb 2010 18:53:59 +1100
From: Peter Jeremy <peterjeremy@acm.org>
To: freebsd-stable@freebsd.org
Cc: gshapiro@freebsd.org
Subject: Re: sleep(3) sometimes too sleepy on FreeBSD 8.0?
Message-ID: <20100224075359.GA61876@server.vk2pj.dyndns.org>
In-Reply-To: <20100223013522.GE2303@rwpc12.mby.riverwillow.net.au>
References: <20100223013522.GE2303@rwpc12.mby.riverwillow.net.au>

Updates following some off-line discussions and debugging with John
on IRC.  I've cc'd gshapiro@ because the problem appears to be in
sendmail rather than the FreeBSD kernel.

On 2010-Feb-23 12:35:22 +1100, John Marshall <john.marshall@riverwillow.com.au> wrote:
>Environment: sendmail 8.14.4 on FreeBSD 8.0-RELEASE-p2

Note that this is stock ISC sendmail, not the sendmail in either the
base system or the port.

>I posted about this in comp.mail.sendmail and was told...
>
>> sleep() should be one of these calls:
>>
>>         if (njobs == 0 && WorkGrp[wgrp].wg_lowqintvl < MIN_SLEEP_TIME)
>>                 sleep(MIN_SLEEP_TIME);
>>         else if (WorkGrp[wgrp].wg_lowqintvl <= 0)
>>                 sleep(QueueIntvl > 0 ? QueueIntvl : MIN_SLEEP_TIME);
>>         else
>>                 sleep(WorkGrp[wgrp].wg_lowqintvl);

Whilst it's true that the code calls sleep(), it's not calling
sleep(3) in the FreeBSD libc.  Instead it's calling a sleep() defined
in libsm/clock.c - which is a horrible maze of #ifdefs.  John has
pre-processed that code and the result is at:
http://www.riverwillow.net.au/~john/sm/clock.preprocessed

At a quick look, the code is broken: sm_seteventm() generates a
one-off timer using setitimer(2), which will send SIGALRM when it
expires.  sm_releasesignal() then unblocks SIGALRM.  In theory, the
SIGALRM could be delivered anywhere after the (!SmSleepDone) test and
before pause() is called - in which case the signal is consumed before
pause() starts waiting, no further signal is coming, and pause() will
sleep forever.

On 2010-Feb-24 08:13:06 +1100, John Marshall <john.marshall@riverwillow.com.au> wrote:
>My ktrace file was created with 'ktrace -g 48501'.  I have the result of
>'kdump -R -p 48504' available at:
>
>  <http://www.riverwillow.net.au/~john/8_0/rwsrv04_201002240725.kdump.gz>

The syscall pattern near the end of this file is significantly
different from that elsewhere in the file - with gettimeofday(),
sigprocmask() and sigsuspend() looping fairly rapidly.  Interestingly,
sigsuspend() is returning EINTR but no signal is reported.  I'm not
sure what could cause this.

This syscall pattern looks like the while() loop in sendmail's
sleep().  It appears that the loop is exited on that occasion but not
on the following occasion, though the reason for this difference is
unclear.

Overall, it appears that there is a race condition in sendmail and
that something in the 8.0 signal handling makes this race easier to
lose.

Going back to the original clock.c source code, the other thing that
is obvious is the HAVE_NANOSLEEP block - if this was active, sleep()
would call nanosleep(2) and the whole signal mess would be avoided.
It's not clear when that code was added but clock.c has not been
touched for many years.  The sendmail in FreeBSD-8.0 contains no other
reference to HAVE_NANOSLEEP.  sendmail 8.14.4 (in 8-STABLE) has
HAVE_NANOSLEEP enabled on Solaris 11 only.

Is there any reason why HAVE_NANOSLEEP is not defined for FreeBSD?
Looking back through the commit logs, nanosleep(2) was implemented in
sys/kern/kern_time.c v1.23 on Thu May 8 14:16:25 1997 UTC - that's
just before RELENG_2_2.

-- 
Peter Jeremy
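
For illustration only, here is a minimal sketch of two ways a timer-based sleep can avoid the lost-wakeup window described above. It is not the libsm/clock.c code: the names sleep_sigsuspend(), sleep_nanosleep(), on_alarm() and sleep_done are hypothetical stand-ins invented for this example. The first function keeps SIGALRM blocked until sigsuspend(2) atomically unblocks it and waits; the second sidesteps signals entirely with nanosleep(2), which is roughly what the HAVE_NANOSLEEP path is expected to do.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

/* Set by the handler; hypothetical stand-in for a flag like SmSleepDone. */
static volatile sig_atomic_t sleep_done;

static void on_alarm(int signo)
{
    (void)signo;
    sleep_done = 1;
}

/*
 * Timer-based sleep without the test-then-pause() race.
 * SIGALRM is blocked before the timer is armed and is only unblocked
 * inside sigsuspend(), which swaps the mask and waits atomically.
 * Returns 0 on success, -1 on error.
 */
int sleep_sigsuspend(unsigned int seconds)
{
    struct sigaction sa, old_sa;
    struct itimerval timer;
    sigset_t block, old_mask, wait_mask;
    int rc = 0;

    if (seconds == 0)
        return 0;   /* a zero it_value would disarm the timer */

    sigemptyset(&block);
    sigaddset(&block, SIGALRM);
    if (sigprocmask(SIG_BLOCK, &block, &old_mask) == -1)
        return -1;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, &old_sa) == -1) {
        sigprocmask(SIG_SETMASK, &old_mask, NULL);
        return -1;
    }

    sleep_done = 0;
    memset(&timer, 0, sizeof(timer));
    timer.it_value.tv_sec = seconds;    /* one-shot: it_interval stays 0 */
    if (setitimer(ITIMER_REAL, &timer, NULL) == -1) {
        rc = -1;
    } else {
        /* Wait with the caller's mask, minus SIGALRM. */
        wait_mask = old_mask;
        sigdelset(&wait_mask, SIGALRM);
        while (!sleep_done)
            sigsuspend(&wait_mask);     /* returns with EINTR after a handler runs */
    }

    sigaction(SIGALRM, &old_sa, NULL);
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    return rc;
}

/*
 * Signal-free alternative: nanosleep(2), restarted with the remaining
 * time if an unrelated signal interrupts it.
 * Returns 0 on success, -1 on error.
 */
int sleep_nanosleep(unsigned int seconds)
{
    struct timespec req, rem;

    req.tv_sec = seconds;
    req.tv_nsec = 0;
    while (nanosleep(&req, &rem) == -1) {
        if (errno != EINTR)
            return -1;
        req = rem;
    }
    return 0;
}
```

Either function could be called where the quoted code calls sleep(), e.g. sleep_nanosleep(MIN_SLEEP_TIME). The nanosleep(2) variant is the simpler of the two because it needs no handler, flag or signal mask at all.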