Subject: Re: Periodic jobs lockf timeout
From: Ian Lepore
To: Borja Marcos, Alan Somers
Cc: freebsd-hackers@freebsd.org, freebsd-security@freebsd.org
Date: Tue, 24 Oct 2017 09:25:29 -0600
On Tue, 2017-10-24 at 17:06 +0200, Borja Marcos wrote:
> 
> > On 24 Oct 2017, at 16:41, Alan Somers wrote:
> > 
> > On Tue, Oct 24, 2017 at 3:07 AM, Borja Marcos wrote:
> > Are you talking about the lockf in /usr/sbin/periodic?  It already has
> > a timeout of 0, which should prevent overlapping periodic jobs.  Or is
> > there some other lockf involved?  Without knowing which lockf you're
> > talking about, I can't understand your problem.
> 
> Sorry, my explanation was awful now that I read it again. Yes, I mean the lockf in /usr/sbin/periodic. And
> no, I didn't mean that jobs overlap (they certainly don't, thanks to the lockf), but that they can pile up. Today I had
> a machine with three daily jobs waiting to start because the first one had been running for four days (a combination
> of lots of files and datasets, heavy system load, and a ZFS pool that was almost full…)
> 
> The problem with a timeout of 0 is that it's unlimited.

No, lockf -t 0 means to exit without waiting, with status EX_TEMPFAIL,
if the lock cannot be acquired immediately.  In light of that, the rest
of your report/request doesn't make sense.  Jobs won't stack up; they'll
fail if the prior one is still running.

-- Ian

> In case something is wrong, you can end up with a growing queue of
> daily periodic jobs waiting to run. Imagine you have a very high system load for several days and for some reason the daily job
> won't complete. The next day a new daily job will try to start, but it will have to wait for the first one to finish. And so on.
> 
> The proposal is to replace the “0” timeout for lockf with a sane timeout, so that periodic will attempt to run the job but give up in
> case it can't start within a reasonable time. The timeout shouldn't be long, actually. If periodic must wait in order to
> start a job, it means that you have a serious performance problem and it's pointless to keep your machine doing “find”
> 24/7.
> 
> Given the nature of the periodic jobs, I don't think it should be a problem to run them on a best-effort basis
> rather than guaranteeing that they will eventually run, even if awfully late.
> 
> I would add a configurable timeout for /usr/sbin/periodic. I think it's better done with a different variable for each
> class, and their default values can be 0 so that nothing changes:
> 
> daily_start_timeout
> weekly_start_timeout
> monthly_start_timeout
> 
> > The anticongestion_sleeptime variable is unrelated to lockf.
> 
> Understood, I stand corrected. I assumed it was.
> 
> Hope it's better now. It's pretty easy to do, but I'm interested in opinions on this matter :)
> 
> Thank you!
> 
> Borja.
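To make the lockf -t semantics discussed in this thread concrete, here is a minimal Python sketch. It is not FreeBSD's lockf(1) source; it only models the behavior with flock(2). A timeout of 0 makes a single non-blocking attempt and gives up with EX_TEMPFAIL (75 in <sysexits.h>), a positive timeout keeps retrying until the deadline, and no timeout blocks, which is what lockf does when -t is omitted. The start_timeout() helper shows how the per-class variables Borja proposes (daily_start_timeout and friends, which are hypothetical and not existing periodic.conf settings) might be read with a default of 0. The function names and the polling interval are assumptions for illustration.

```python
import fcntl
import os
import time

EX_OK = 0
EX_TEMPFAIL = 75  # value from <sysexits.h>


def acquire(path, timeout=None, poll=0.1):
    """Take an exclusive flock(2) lock on path, modeled on lockf(1) -t.

    timeout=None: block until the lock is free (lockf without -t).
    timeout=0:    one non-blocking attempt (lockf -t 0).
    timeout>0:    keep retrying until timeout seconds have elapsed.
    Returns (fd, EX_OK) on success; the lock is held until fd is closed.
    Returns (None, EX_TEMPFAIL) if the lock could not be taken in time.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    if timeout is None:
        fcntl.flock(fd, fcntl.LOCK_EX)  # waits indefinitely
        return fd, EX_OK
    deadline = time.monotonic() + timeout
    while True:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return fd, EX_OK
        except BlockingIOError:  # another open file holds the lock
            if time.monotonic() >= deadline:
                os.close(fd)
                return None, EX_TEMPFAIL
            time.sleep(poll)


def start_timeout(conf, period):
    """Read the hypothetical <period>_start_timeout variable (for example
    daily_start_timeout) from a dict of periodic.conf-style settings.
    Missing, non-numeric or negative values fall back to 0, the timeout
    periodic already passes to lockf."""
    try:
        value = int(conf.get(f"{period}_start_timeout", "0"))
    except ValueError:
        return 0
    return value if value >= 0 else 0
```

In this model, a daily run that starts while the previous one still holds the lock gets EX_TEMPFAIL immediately with a timeout of 0, which is why, as Ian points out, runs cannot pile up. A nonzero daily_start_timeout would only let a new run wait longer before giving up.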