From owner-cvs-all Sun Dec 10 20:48:24 2000
From owner-cvs-all@FreeBSD.ORG Sun Dec 10 20:48:20 2000
Return-Path:
Delivered-To: cvs-all@freebsd.org
Received: from mail1.rdc3.on.home.com (mail1.rdc3.on.home.com [24.2.9.40])
	by hub.freebsd.org (Postfix) with ESMTP
	id C125937B401; Sun, 10 Dec 2000 20:48:19 -0800 (PST)
Received: from cr237535-a.bloor1.on.wave.home.com ([24.156.35.39])
	by mail1.rdc3.on.home.com (InterMail vM.4.01.03.00 201-229-121)
	with ESMTP
	id <20001211044818.YAIO20018.mail1.rdc3.on.home.com@cr237535-a.bloor1.on.wave.home.com>;
	Sun, 10 Dec 2000 20:48:18 -0800
Received: from james by cr237535-a.bloor1.on.wave.home.com with local
	(Exim 3.15 #1) id 145Kss-000JDx-00; Sun, 10 Dec 2000 23:48:18 -0500
Date: Sun, 10 Dec 2000 23:48:18 -0500
From: James FitzGibbon
To: David O'Brien
Cc: Robert Watson , cvs-committers@FreeBSD.org, cvs-all@FreeBSD.org
Subject: Re: cvs commit: src/etc crontab
Message-ID: <20001210234818.A73780@ehlo.com>
References: <200012110043.eBB0hYV06366@hak.lan.Awfulhak.org>
	<20001210165229.A84706@dragon.nuxi.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.4i
In-Reply-To: <20001210165229.A84706@dragon.nuxi.com>; from obrien@FreeBSD.org
	on Sun, Dec 10, 2000 at 04:52:29PM -0800
Sender: owner-cvs-all@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

* David O'Brien (obrien@FreeBSD.org) [001210 20:23]:

> I don't understand what your position is.  I think queueing the jobs is
> the correct thing to do.  My daily run can take 2+ hours occasionally.
> If that happens at the end of the week, I don't want to skip the weekly
> run, just delay it until after the daily run.

This can lead to problems, depending on the local programs that the site
runs.  CVSup is notorious for this -- if the server it is attempting to
connect to is unavailable, it will just sit and retry for hours.
I've had systems that were found to have 20+ copies of cvsup running, all
retrying a server whose name had inadvertently been removed from local
DNS.  Thankfully, it was the highest-numbered script in the daily
directory, so it wasn't preventing anything else from running (save the
mailing of the entire daily output to root).

I'm not sure what the happy medium is here.  I agree that running lockf
with a 'lock-or-quit' behaviour is bad, but if a program in the middle of
the daily sequence gets buggered, you could have major problems.  Take
320.rdist for example.  If it hangs, then 450.status-security never runs,
so you would end up getting x copies of the security output (where x is
the number of days you took to learn of the problem), all based upon the
same state, without any interim reports being sent out.

Granted, any good admin should notice that a machine isn't sending
reports, but at large sites there are always a few machines not properly
monitored that slip through the cracks.

What about putting a variable in rc.conf (is that the right place for
non-startup-related variables?) that represents the timeout value?  Have
it default to 0 (or some other acceptable limit).  That gives acceptable
behaviour plus an easily accessible place for admins to increase or even
remove the timeout.

To implement this, a shell variable extractor util would be handy, i.e.
something along the lines of "variable=`confvar periodic_daily_timeout`",
which would return the string "-t 3600 " in a default install, but a null
string in your environment to queue concurrent runs of the script.

The crontab entry would then become something like this:

    lockf `confvar periodic_daily_timeout` periodic daily 2>&1 | sendmail root
    lockf `confvar periodic_weekly_timeout` periodic weekly 2>&1 | sendmail root

modulo the other timing changes we are discussing.

Thoughts?

--
j.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe cvs-all" in the body of the message
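
For illustration, here is a minimal sketch of what the "confvar" extractor proposed in the message might look like.  No such utility exists in the base system as far as the message indicates; the function body, the CONFVAR_FILE override, and the lock file path in the closing comment are all assumptions made for this example, not an existing interface.

```sh
#!/bin/sh
# confvar: print the value of one variable from an rc.conf-style file.
# Hypothetical helper: the name comes from the message above, the
# behaviour shown here is assumed.
# Usage: confvar NAME [FILE]
#   FILE defaults to $CONFVAR_FILE (an assumed override), else /etc/rc.conf.
confvar() {
    _name=$1
    _file=${2:-${CONFVAR_FILE:-/etc/rc.conf}}

    # The name is handed to eval below, so accept plain identifiers only.
    case $_name in
        ''|[0-9]*|*[!A-Za-z0-9_]*) return 2 ;;
    esac

    # Source the file in a subshell so its settings do not leak into
    # the caller, then print the requested variable (empty if unset).
    (
        [ -r "$_file" ] && . "$_file"
        eval "printf '%s\n' \"\${$_name-}\""
    )
}

# Possible crontab use, once installed as a script on cron's PATH.
# lockf(1) expects a lock file before the command; the path is an assumption:
#   lockf `confvar periodic_daily_timeout` /var/run/periodic.daily.lock \
#       periodic daily 2>&1 | sendmail root
```

With periodic_daily_timeout="-t 3600 " in the configuration file, the unquoted backquote expansion splits into the two words -t and 3600; with the variable unset or empty it expands to nothing, so lockf waits for the lock and concurrent runs queue.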