Date: Tue, 24 Oct 2017 09:25:29 -0600
From: Ian Lepore <ian@freebsd.org>
To: Borja Marcos <borjam@sarenet.es>, Alan Somers <asomers@freebsd.org>
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>, freebsd-security@freebsd.org
Subject: Re: Periodic jobs lockf timeout
Message-ID: <1508858729.34364.32.camel@freebsd.org>
In-Reply-To: <EAE33C61-BC70-4A09-86A0-0C5F62D993ED@sarenet.es>
References: <AEF2CF7D-BFAC-4ACE-95F2-EF5026E89959@sarenet.es> <CAOtMX2hb_Ur8XtTdoPju3ZQGMfJ_pApUKsZiaocxaG9n%2BDVycA@mail.gmail.com> <EAE33C61-BC70-4A09-86A0-0C5F62D993ED@sarenet.es>
On Tue, 2017-10-24 at 17:06 +0200, Borja Marcos wrote:
> > On 24 Oct 2017, at 16:41, Alan Somers <asomers@freebsd.org> wrote:
> >
> > On Tue, Oct 24, 2017 at 3:07 AM, Borja Marcos <borjam@sarenet.es> wrote:
> > Are you talking about the lockf in /usr/sbin/periodic?  It already has
> > a timeout of 0, which should prevent overlapping periodic jobs.  Or is
> > there some other lockf involved?  Without knowing which lockf you're
> > talking about, I can't understand your problem.
>
> Sorry, my explanation was awful now that I read it again. Yes, I mean the lockf in /usr/sbin/periodic. And
> no, I didn’t mean that jobs overlap (certainly they don’t thanks to the lockf) but they can pile up. Today I had
> a machine with three daily jobs waiting to start because the first one had been running for four days (a combination
> of lots of files and datasets, heavy system load, ZFS pool almost full…)
>
> The problem with a timeout of 0 is that it’s unlimited.

No, lockf -t 0 means to exit without waiting, with status EX_TEMPFAIL,
if the lock cannot be acquired immediately.

In light of that, the rest of your report/request doesn't make sense.
Jobs won't stack up, they'll fail if the prior one is still running.

-- Ian

> In case something is wrong you can end up with a growing queue of
> daily periodic jobs waiting to run. Imagine you have a very high system load for several days and for some reason the daily job
> won’t complete. Next day a new daily job will try to start but it will have to wait for the first one to finish. And so on.
>
> The proposal is to replace the “0” timeout for lockf with a sane timeout so that it will attempt to run it, but give up in
> case it can’t be done in a reasonable time. The timeout shouldn’t be long actually. If periodic must wait in order to
> start a job it means that you have a serious performance problem and it’s pointless to keep your machine doing “find”
> 24/7.
>
> Given the nature of the periodic jobs I don’t think it should be a problem to attempt to run them on a best-effort basis
> rather than guaranteeing that they will eventually run, even if awfully late.
>
> I would add a configurable timeout for /usr/sbin/periodic. I think it’s better done with a different variable for each
> class, and their default values can be 0 so that nothing changes.
>
> daily_start_timeout
> weekly_start_timeout
> monthly_start_timeout
>
> > The anticongestion_sleeptime variable is unrelated to lockf.
>
> Understood, I stand corrected. I assumed it was.
>
> Hope it’s better now. It’s pretty easy to do but I’m interested in opinions on this matter :)
>
> Thank you!
>
> Borja.
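
For illustration only, the zero-timeout behaviour described in the reply above (give up at once if the lock is held, rather than queue behind it) can be sketched with a non-blocking flock(2) call. This is not the code of lockf(1) or /usr/sbin/periodic; the helper name try_lock and the lock file name are made up for the example, and mapping a failed attempt to EX_TEMPFAIL simply mirrors the exit status mentioned in the reply.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Try to take an exclusive lock without waiting (hypothetical helper).

    Returns an open file descriptor holding the lock, or None if some
    other holder already has it. Loosely models the "-t 0" case: fail
    immediately instead of waiting for the current holder to finish.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # LOCK_NB makes flock() fail at once instead of blocking.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


# Made-up lock file in a scratch directory, not the path periodic uses.
lock_path = os.path.join(tempfile.mkdtemp(), "daily.lock")

first = try_lock(lock_path)   # stands in for a long-running daily job
second = try_lock(lock_path)  # the next day's job: refused immediately

# A refused job reports a temporary failure instead of queueing up.
status = 0 if second is not None else os.EX_TEMPFAIL

# Once the first job releases the lock, a later attempt succeeds.
os.close(first)
third = try_lock(lock_path)
```

With these semantics a second job never waits behind the first, so nothing accumulates; a positive timeout would instead mean waiting up to that many seconds before giving up.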