From owner-cvs-all  Sun Dec 10 20:48:24 2000
From owner-cvs-all@FreeBSD.ORG  Sun Dec 10 20:48:20 2000
Return-Path: <owner-cvs-all@FreeBSD.ORG>
Delivered-To: cvs-all@freebsd.org
Received: from mail1.rdc3.on.home.com (mail1.rdc3.on.home.com [24.2.9.40])
	by hub.freebsd.org (Postfix) with ESMTP
	id C125937B401; Sun, 10 Dec 2000 20:48:19 -0800 (PST)
Received: from cr237535-a.bloor1.on.wave.home.com ([24.156.35.39])
          by mail1.rdc3.on.home.com (InterMail vM.4.01.03.00 201-229-121)
          with ESMTP
          id <20001211044818.YAIO20018.mail1.rdc3.on.home.com@cr237535-a.bloor1.on.wave.home.com>;
          Sun, 10 Dec 2000 20:48:18 -0800
Received: from james by cr237535-a.bloor1.on.wave.home.com with local (Exim 3.15 #1)
	id 145Kss-000JDx-00; Sun, 10 Dec 2000 23:48:18 -0500
Date: Sun, 10 Dec 2000 23:48:18 -0500
From: James FitzGibbon <james@ehlo.com>
To: David O'Brien <obrien@FreeBSD.org>
Cc: Robert Watson <rwatson@FreeBSD.org>, cvs-committers@FreeBSD.org,
	cvs-all@FreeBSD.org
Subject: Re: cvs commit: src/etc crontab
Message-ID: <20001210234818.A73780@ehlo.com>
References: <rwatson@FreeBSD.org> <200012110043.eBB0hYV06366@hak.lan.Awfulhak.org> <20001210165229.A84706@dragon.nuxi.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.4i
In-Reply-To: <20001210165229.A84706@dragon.nuxi.com>; from obrien@FreeBSD.org on Sun, Dec 10, 2000 at 04:52:29PM -0800
Sender: owner-cvs-all@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

* David O'Brien (obrien@FreeBSD.org) [001210 20:23]:

> I don't understand what your position is.  I think queueing the jobs is
> the correct thing to do.  My daily run can take 2+ hours occasionally.
> If that happens at the end of the week, I don't want to skip the weekly
> run, just delay it until after the daily run.

This can lead to problems, depending on which local programs the site runs.
CVSup is notorious for this -- if the server it is attempting to connect to
is unavailable, it will just sit and retry for hours.  I've had systems that
were found to have 20+ copies of cvsup running, all retrying a server whose
name had inadvertently been removed from local DNS.  Thankfully, it was the
highest-numbered script in the daily directory, so it wasn't preventing
anything else from running (save the mailing of the entire daily output to
root).

I'm not sure what the happy medium is here.  I agree that running lockf with
a 'lock-or-quit' behaviour is bad, but if a program in the middle of the
daily sequence gets buggered, you could have major problems.  Take 320.rdist
for example.  If it hangs, then 450.status-security never runs, and the
queued runs pile up behind it.  Once the hang clears you would get x (where
x is the number of days you took to learn of the problem) copies of the
security output, all based upon the same state, with no interim reports
having been sent out.

Granted, any good admin should notice that a machine isn't sending reports,
but at large sites there are always a few machines not properly monitored
that slip through the cracks.

What about putting a variable in rc.conf (is that the right place for
non-startup-related variables?) that represents the timeout value?  Have it
default to 0 (or some other acceptable limit).  That gives acceptable
behaviour plus an easily accessible place for admins to increase or even
remove the timeout.

To implement this, a shell variable extractor util would be handy, i.e.
something along the lines of "variable=`confvar periodic_daily_timeout`",
which would return the string "-t 3600 " in a default install, but a null
string in your environment to queue concurrent runs of the script.  The
crontab entry would then become something like this:

lockf `confvar periodic_daily_timeout` periodic daily 2>&1 | sendmail root
lockf `confvar periodic_weekly_timeout` periodic weekly 2>&1 | sendmail root

modulo the other timing changes we are discussing.
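
For the sake of discussion, here is a rough sketch of what such a confvar
helper might look like as a shell function.  Nothing like it exists in the
tree as far as I know: the name confvar, the CONFVAR_FILES override and the
periodic_*_timeout variables are all assumptions made for illustration.  It
simply sources the rc.conf files in a subshell and prints the one variable
asked for.

```shell
#!/bin/sh
# Hypothetical "confvar" helper: print the value of one variable taken from
# rc.conf-style files.  CONFVAR_FILES is an assumed override so that the
# list of files can be changed (e.g. for testing); it is not a real knob.
confvar() {
    name=$1

    # Only accept plain shell identifiers, since the name is passed to eval.
    case $name in
        ''|[!A-Za-z_]*|*[!A-Za-z0-9_]*)
            echo "confvar: bad variable name" >&2
            return 2
            ;;
    esac

    # Source the files in a subshell so nothing leaks into the caller.
    (
        for f in ${CONFVAR_FILES:-/etc/defaults/rc.conf /etc/rc.conf}; do
            if [ -r "$f" ]; then
                . "$f"
            fi
        done
        # Prints an empty line if the variable is unset or empty.
        eval "printf '%s\n' \"\${$name}\""
    )
}
```

With periodic_daily_timeout="-t 3600" set in one of the files, the backquoted
`confvar periodic_daily_timeout` expands to "-t 3600"; with the variable
unset it expands to nothing, so lockf waits indefinitely.  Note that lockf
also expects a lock file path ahead of the command, which the crontab lines
above leave out.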

Thoughts?

-- 
j.

