Date: Wed, 24 Feb 2010 18:53:59 +1100
From: Peter Jeremy
To: freebsd-stable@freebsd.org
Cc: gshapiro@freebsd.org
Subject: Re: sleep(3) sometimes too sleepy on FreeBSD 8.0?
Message-ID: <20100224075359.GA61876@server.vk2pj.dyndns.org>
References: <20100223013522.GE2303@rwpc12.mby.riverwillow.net.au>
In-Reply-To: <20100223013522.GE2303@rwpc12.mby.riverwillow.net.au>

Updates following some off-line discussions and debugging with John on
IRC.  I've cc'd gshapiro@ because the problem appears to be sendmail,
rather than the FreeBSD kernel.

On 2010-Feb-23 12:35:22 +1100, John Marshall wrote:
>Environment: sendmail 8.14.4 on FreeBSD 8.0-RELEASE-p2

Note that this is stock ISC sendmail, not the sendmail in either the
base system or the port.

>I posted about this in comp.mail.sendmail and was told...
>
>> sleep() should be one of these calls:
>>
>>         if (njobs == 0 && WorkGrp[wgrp].wg_lowqintvl < MIN_SLEEP_TIME)
>>                 sleep(MIN_SLEEP_TIME);
>>         else if (WorkGrp[wgrp].wg_lowqintvl <= 0)
>>                 sleep(QueueIntvl > 0 ? QueueIntvl : MIN_SLEEP_TIME);
>>         else
>>                 sleep(WorkGrp[wgrp].wg_lowqintvl);

Whilst it's true that the code calls sleep(), it's not calling sleep(3)
in the FreeBSD libc.  Instead it's calling a sleep() defined in
libsm/clock.c - which is a horrible maze of #ifdefs.  John has
pre-processed that code and the result is at:
http://www.riverwillow.net.au/~john/sm/clock.preprocessed

At a quick look, the code is broken: sm_seteventm() generates a one-off
timer using setitimer(2), which will send SIGALRM when it expires.
sm_releasesignal() then unblocks SIGALRM.  In theory, the SIGALRM could
be delivered anywhere after the (!SmSleepDone) test and before pause()
is called - in which case, the signal is lost and pause() will sleep
forever.

On 2010-Feb-24 08:13:06 +1100, John Marshall wrote:
>My ktrace file was created with 'ktrace -g 48501'.
>I have the result of
>'kdump -R -p 48504' available at:
>
>

The syscall pattern near the end of this file is significantly different
from that elsewhere in the file - with gettimeofday(), sigprocmask() and
sigsuspend() looping fairly rapidly.  Interestingly, sigsuspend() is
returning EINTR but no signal is reported.  I'm not sure what could
cause this.  This syscall pattern looks like the while() loop in
sendmail's sleep(), though it does appear that the loop is exited on
that occasion but not on the following occasion (though the reason for
this behaviour is unclear).

Overall, it appears that there is a race condition in sendmail and
something in the 8.0 signal handling appears to make this race easier
to lose.

Going back to the original clock.c source code, the other thing that is
obvious is the HAVE_NANOSLEEP block - if this was active, sleep() would
call nanosleep(2) and the whole signal mess would be avoided.  It's not
clear when that code was added but clock.c has not been touched for many
years.  In the sendmail in FreeBSD-8.0, there is no other reference to
HAVE_NANOSLEEP within sendmail.  sendmail 8.14.4 (in 8-STABLE) has
HAVE_NANOSLEEP enabled on Solaris 11 only.

Is there any reason why HAVE_NANOSLEEP is not defined for FreeBSD?
Looking back through the commit logs, nanosleep(2) was implemented in
sys/kern/kern_time.c v1.23 on Thu May 8 14:16:25 1997 UTC - that's just
before RELENG_2_2.

-- 
Peter Jeremy