Date:      Mon, 11 Feb 2019 05:46:11 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 235657] /usr/libexec/atrun race causes missed jobs
Message-ID:  <bug-235657-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235657

            Bug ID: 235657
           Summary: /usr/libexec/atrun race causes missed jobs
           Product: Base System
           Version: 12.0-STABLE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: bin
          Assignee: bugs@FreeBSD.org
          Reporter: karl@denninger.net

Created attachment 201915
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=201915&action=edit
Diff against /usr/src/libexec/atrun directory

I have no idea why this hasn't bitten people before, or isn't biting people
now... but it is biting me.

/usr/libexec/atrun is the at/"batch" job executor; it is run from cron and,
by default, runs every 5 minutes.

The code has an unlink() call in it that attempts to remove old jobs from the
queue. Unfortunately, the queue code can select a job to run and call fork()
to start it, and after fork() the child can give up the CPU before it opens
the file containing the job. The queue code (which runs in the parent) can
then execute the unlink() before the child process has the file open. If this
happens, you get a "file not found" error in the cron log and the job doesn't
run.

The attached patch fixes the race by moving the unlink into the child; it may
not be the most elegant fix, but it works. Unfortunately, because of the
code's structure (it performs multiple tests on the file to be run, for
security reasons), there are multiple error exits, and on any of them you
must unlink the file as well or atrun will try to run it repeatedly. Yet you
can't unlink it immediately after it is opened, because some of the tests
require that it still be on the filesystem.
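A minimal sketch of the child-side cleanup idea, assuming a heavily simplified job runner rather than the attached patch itself: the parent only forks and then waits (waiting is a simplification so the result is observable), while the child opens the file, runs a stand-in check that only passes while the file is still linked, and removes the file on every exit path, success or failure. The names run_job() and child_run_job() and the st_nlink check are hypothetical; they are not atrun's real security tests.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child side: open, validate, (run), then unlink on every outcome.
   Returns 0 on success, 1 if open() failed, 2 if the check failed,
   3 if the final unlink() failed after an otherwise successful run. */
static int child_run_job(const char *path)
{
    int status = 0;
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        status = 1;
    } else {
        struct stat st;
        /* Hypothetical stand-in for atrun's tests. It needs the name to
           still exist: after unlink() st_nlink would drop to 0. */
        if (fstat(fd, &st) != 0 || !S_ISREG(st.st_mode) || st.st_nlink != 1)
            status = 2;
        /* A real executor would run the job here when status == 0. */
        close(fd);
    }
    /* Single exit path: remove the job whether it ran or failed a check,
       so the next pass does not pick it up again. */
    if (unlink(path) != 0 && errno != ENOENT && status == 0)
        status = 3;
    return status;
}

/* Parent side: fork and never touch the job file itself, so there is no
   window in which it can disappear before the child opens it. */
int run_job(const char *path)
{
    assert(path != NULL);
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(child_run_job(path));

    int wstatus;
    while (waitpid(pid, &wstatus, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(wstatus) ? WEXITSTATUS(wstatus) : -1;
}
```

Funnelling every outcome through one unlink() at the end is one way to avoid repeating the call on each error exit while still keeping the file on disk until the checks are done; the attached patch may be structured differently.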

-- 
You are receiving this mail because:
You are the assignee for the bug.