Date:      Sat, 13 Jan 2001 16:09:17 +0100
From:      Gerhard Sittig <Gerhard.Sittig@gmx.net>
To:        Greg Black <gjb@gbch.net>
Cc:        Doug Barton <DougB@gorean.org>, freebsd-hackers@FreeBSD.org
Subject:   Re: how to test out cron.c changes? (was: cvs commit: src/etc crontab)
Message-ID:  <20010113160917.Q253@speedy.gsinet>
In-Reply-To: <nospam-3a5d53d57a04368@maxim.gbch.net>; from gjb@gbch.net on Thu, Jan 11, 2001 at 04:33:57PM +1000
References:  <20001120143658.B4415@netmode.ece.ntua.gr> <20001120193326.C27042@speedy.gsinet> <20001205225656.Z27042@speedy.gsinet> <20001220211548.T253@speedy.gsinet> <3A513799.75EAB470@FreeBSD.org> <20010102133239.V253@speedy.gsinet> <20010107170840.G253@speedy.gsinet> <3A5AE490.D251F590@gorean.org> <20010110233907.L253@speedy.gsinet> <nospam-3a5d53d57a04368@maxim.gbch.net>

[ for the impatient there's a summary at the bottom ("/summarize") ]

On Thu, Jan 11, 2001 at 16:33 +1000, Greg Black wrote:
> Gerhard Sittig wrote:
> 
> > I take notice of your (and Greg Black's) reservation / being
> > opposed, respect it and conclude that the change will have to
> > - default to the current behaviour (something quite usual for
> >   expanding changes)
> 
> We'd need some guarantees that the attempt to maintain current
> behaviour was done correctly -- i.e., without introducing bugs
> that broke things.

The only way I can think of to make sure there won't be new
bugs is peer review.  The reason we're not yet at that stage is
my personal failure to make any (visible) progress.  There seems
to be a catch-22:  I don't want to bother you with untested code,
but I fail to test what I have here as long as I'm fighting it
alone.  But that's OK with me as it "only" slows things down a
bit.  Compared to the years this topic has already been
discussed, a few more weeks don't count that much ...

> > - be well documented (something absolutely clear to all of us,
> >   strictly speaking it's way out of imagination for us how
> >   somebody could contribute undocumented stuff ... :)
> 
> One of the things that would need to be documented is what will
> happen to the new algorithm in the situation where cron is
> stopped and re-started during one of the time periods that gets
> repeated.

Oh, it seems you bring up an absolutely new data point!  Vixie
cron (in its original form as well as the OpenBSD variant) holds
all of its state in memory:  the time it went to sleep, the time
it expects to wake up again, and the time it is collecting jobs
for (usually somewhere between the time it went to sleep and the
time it woke up, catching up towards the current time).  So if
cron is restarted in between, it cannot know that it is repeating
an hour it passed just a few minutes ago.

This lack of persistency is a (mis)behaviour the OpenBSD
version has, too.  And it isn't documented there, either.  It
looks like this point should be fed back to their project, too
(to be solved by extending the documentation).

A solution to make this state persistent would be to
- have cron.c:set_time() read state from a file which
- cron.c:cron_sleep() will write to before going to sleep.

The file probably should live somewhere next to the crontab spool
directory.  But continually writing state to disc just to get
this kind of persistency is what we have learnt to be wrong from
the latest /etc/rc and Yarrow thread on cvs-all ...
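
A minimal sketch of what such a pair of helpers might look like,
purely as an illustration:  save_state(), load_state() and the
file format (a single decimal minute count) are assumptions made
up for this example and exist in neither the FreeBSD nor the
OpenBSD cron sources.  The value is written to a temporary file
and then renamed, so that a restarted cron would not read a
half-written file.

```c
#include <stdio.h>

/* Hypothetical helper: record the minute cron has caught up to.
 * Returns 0 on success, -1 on any error. */
int save_state(const char *path, long virtual_minutes)
{
        char tmp[1024];
        FILE *fp;
        int n;

        /* build "<path>.tmp" and refuse paths that do not fit */
        n = snprintf(tmp, sizeof(tmp), "%s.tmp", path);
        if (n < 0 || n >= (int)sizeof(tmp))
                return -1;

        if ((fp = fopen(tmp, "w")) == NULL)
                return -1;
        if (fprintf(fp, "%ld\n", virtual_minutes) < 0) {
                fclose(fp);
                remove(tmp);
                return -1;
        }
        if (fclose(fp) != 0) {
                remove(tmp);
                return -1;
        }

        /* rename(2) replaces the old state file in one step */
        if (rename(tmp, path) != 0) {
                remove(tmp);
                return -1;
        }
        return 0;
}

/* Hypothetical helper: read the minute count back.
 * Returns 0 on success, -1 if the file is missing or malformed. */
int load_state(const char *path, long *virtual_minutes)
{
        FILE *fp;
        int ok;

        if ((fp = fopen(path, "r")) == NULL)
                return -1;
        ok = (fscanf(fp, "%ld", virtual_minutes) == 1);
        fclose(fp);
        return ok ? 0 : -1;
}
```

In this sketch cron_sleep() would call save_state() with
virtualTime before sleeping, and set_time() would call
load_state() once at startup, falling back to the current time
when it returns -1.  It does nothing about the cost of one disc
write per minute.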


And yes, now that you brought this up, I can see the (IMO first
technical and valid) point against the proposed change.

It turns out that specifying cronjobs' timings in UTC is the only
real solution.  Would any remaining will to *somehow* improve the
current situation be better spent on a frontend to _this_
approach (some vicron(8) command being a vipw(8) workalike,
translating users' specifications in local time to a unified
coordinate system; or a cron(8) command line option to interpret
the crontab time specification in UTC) instead of on shooting
down the effort I'm making now?  It's not that I insist on
following a dead end.  But I would like to learn _how_ to solve
the problem and how to contribute if my current proposal turns
out to just not work at all, in the foreseeable future or for
some obvious(?) reason.

This is BTW the reason why I asked some seven weeks back in
cvs-all message <20001120193326.C27042@speedy.gsinet> *if* the
OpenBSD approach would be a solution for us, if somebody is
already handling the DST topic, and if I should lend a hand /
jump in on existing work.  Just to find myself welcomed with a
warm and hearty "Go away and do something more useful instead!"
which might be one of the reasons for this discussion being not
wholly technical ... :>

> > - yet be enabled easily for those interested in the change to
> >   work for them and free up some of their resources for more
> >   important tasks
> > - maybe provide knobs (besides the on-off-switch) to customize
> >   behaviour in a more fine grained way
> 
> In the beginning, something like CRON_DST_HACK="NO" in rc.conf
> with a comment pointing to the explanation should cover both
> these items.  If more is needed later, then it can be added.

I would have done some cron_flags setting in rc.conf(5) since
this is already kind of missing in case you want to activate some
-x command line parameters. :)  Of course this would have its
comment in /etc/defaults/rc.conf and its section in the
rc.conf(5) manpage pointing to the fullblown discussion in the
cron(8) manpage.

> > BTW:  There's good news for those with a dislike regarding
> > the change:  While testing I'm stuck again, so there will be
> > some more delay.
> 
> Previously we were told that this stuff had already been tested
> for years under another OS and was therefore robust and
> reliable.  Now we learn that these claims are not correct.  And
> you wonder why people are reluctant to even consider these
> changes ...

"We were told UNIX had been around for some thirty years, is said
to be functional / reliable / flexible / add whatever you use and
love UNIX for.  And now we learn it doesn't even work easily for
those simple tasks as networking / printing / gaming / etc are?"

Excuse me, please?  Could it be that you got more from my
messages than what I actually said?  What I did was to extract a
patch from a sibling *BSD project and try to port it to FreeBSD.
When *I* fail there it doesn't mean that the other project's
approach must be wrong.  I don't question *BSD's networking
capabilities just because I don't get LAN access with three
different PCMCIA cards (3com, Xircom, D-Link) and several OSes
(FreeBSD-STABLE, FreeBSD-CURRENT, NetBSD 1.5, OpenBSD 2.8).  And
I wouldn't dare to say "we never want to get to a state where
this works" just because it doesn't at the moment.  But I'm
getting sidetracked.  And yes, there is a chance that the method
is flawed in principle, see my summary below.  But it could just
as well be my fault.


Well, since you "asked":  the current status is that OpenBSD has
had this extension since Dec '97:

  $ rlog /home/ocvs/src/usr.sbin/cron/cron.c,v
  [ ... ]
  revision 1.4
  date: 1997/12/22 08:10:41;  author: deraadt;  state: Exp; lines: +156 -68
  handle timing normally except when clock jumps between 1 and 3 hours. If it
  jumps, attempt as best as possible to gaurantee that jobs DO run, but only
  run ONCE; patch by thompson@.tgsoft.com
  [ ... ]
  date: 1995/10/18 08:47:30;  author: deraadt;  state: Exp; lines: +0 -0
  initial import of NetBSD tree

OpenBSD imported the NetBSD source, which didn't have the DST
handling code at that time.  Reading the NetBSD 1.5 cron(8)
manpage doesn't give a hint that they introduced such a change
any time later.  BSD/OS is something I didn't check.  FreeBSD
obviously doesn't have special DST treatment.  So of all the *BSD
sibling projects OpenBSD seems to be the only one to have taken
action in this respect (modulo my lack of information about
BSD/OS).

What I did so far is to extract the DST related parts of the cron
trees' diff between FreeBSD and OpenBSD (the latter had many more
changes not related to this topic).  The patches applied cleanly,
the source compiles and runs.  cron now handles time _jumps_ in
the documented way (such as those caused by manual intervention
by means of date(8) - tested - or netdate(8) - not tested, but
expected to result in the same time(3) change).

What has puzzled me since my latest tests is this:  cron.c:main()
has a structure like this (all time counters are held in minutes):

-----------------------------------------------------------------
int main() {
        /* fork() to daemonize */
        ...

        /* init */
        load_database(&database);
        set_time();
        run_reboot_jobs(&database);
        timeRunning = virtualTime = clockTime;

        /* endless main loop
         * clockTime: the "real" time from time(NULL)
         * timeRunning: when we last ran
         * virtualTime: what we expect the time to be
         */
        while (TRUE) {

                load_database(&database);

                /* wait for the time to change */
                do {
                        cron_sleep(timeRunning + 1);
                        set_time();
                } while (clockTime == timeRunning);
                timeRunning = clockTime;

                /* calculate the difference between
                 * the current time and the last time we ran at
                 */
                timeDiff = timeRunning - virtualTime;

|               /* the most common case
|                * here is where the new switch(wakeupKind) comes in
|                */
|               virtualTime = timeRunning;
|               find_jobs(virtualTime, &database, TRUE, TRUE);

                job_runqueue();
        }
}

void set_time() {
        clockTime = time(NULL) / 60;
}

void cron_sleep(int target) {
        /* target is in minutes, time(NULL) in seconds */
        int seconds = target * 60 - time(NULL);
        if (seconds > 0)
                sleep(seconds);
}
-----------------------------------------------------------------

The marked section is the one which (after obtaining the patch
from OpenBSD) would read like

  wakeupKind = f(timeDiff);
  switch (wakeupKind) {
    ...
  }
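
As a rough illustration of what f() could look like:  the sketch
below sorts timeDiff (in minutes) into four cases.  The one and
three hour boundaries are taken only from the wording of the
commit log quoted above; the enum, its members and the name
classify_wakeup() are hypothetical, and the actual OpenBSD code
may well draw the lines differently.

```c
/* Hypothetical classification of a wakeup; names are illustrative. */
enum wakeup_kind {
        WAKEUP_NORMAL,          /* off by less than an hour either way */
        WAKEUP_JUMP_FORWARD,    /* clock moved ahead by 1 to 3 hours */
        WAKEUP_JUMP_BACKWARD,   /* clock moved back by 1 to 3 hours */
        WAKEUP_CLOCK_RESET      /* anything larger */
};

#define HOUR_MINUTES 60

/* timeDiff is timeRunning - virtualTime, counted in minutes. */
enum wakeup_kind classify_wakeup(int timeDiff)
{
        if (timeDiff > -HOUR_MINUTES && timeDiff < HOUR_MINUTES)
                return WAKEUP_NORMAL;
        if (timeDiff >= HOUR_MINUTES && timeDiff <= 3 * HOUR_MINUTES)
                return WAKEUP_JUMP_FORWARD;
        if (timeDiff <= -HOUR_MINUTES && timeDiff >= -3 * HOUR_MINUTES)
                return WAKEUP_JUMP_BACKWARD;
        return WAKEUP_CLOCK_RESET;
}
```

The switch would then run or skip jobs per case, e.g. catching up
minute by minute after a forward jump so that every job still
runs once.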


Now for the problem:  There's a repeated "sleep(); clockTime =
time(NULL);" invocation.  Then the difference between the minutes
we last ran at and the current minutes count is calculated.  In
the most common case cron wakes up often enough to always find
that only one minute has passed since.  What surprises me is:
DST changes won't make the time() result jump!  At least that's
what I get from reading "man 3 time" on several FreeBSD and
OpenBSD (and Linux:) versions.
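
The observation can be checked with a small sketch that compares
the minute count derived from time(3) with the minute of the day
as seen through localtime(3).  The helper names are made up for
this example, and the timezone rule is passed as a POSIX TZ
string so that no zoneinfo files are needed.

```c
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: select a timezone rule for this process. */
int use_timezone(const char *tz)
{
        if (setenv("TZ", tz, 1) != 0)
                return -1;
        tzset();
        return 0;
}

/* What set_time() in the sketch above computes:
 * minutes since the epoch, independent of any timezone. */
long utc_minutes(time_t t)
{
        return (long)(t / 60);
}

/* Minute of the local day as localtime(3) reports it,
 * or -1 if the conversion fails. */
int local_minute_of_day(time_t t)
{
        struct tm tm;

        if (localtime_r(&t, &tm) == NULL)
                return -1;
        return tm.tm_hour * 60 + tm.tm_min;
}
```

After use_timezone("CET-1CEST,M3.5.0,M10.5.0/3") the timestamps
985481999 and 985482000 straddle the switch to summer time on
25 March 2001:  utc_minutes() advances by exactly one, while
local_minute_of_day() goes from 119 (01:59) to 180 (03:00).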

localtime(3) is only used in find_jobs().  So the main loop
doesn't know about DST changes.  Of course localtime could get
introduced in the main loop's calculation, too -- but it wouldn't
be the original OpenBSD approach any longer.  Hmmm, how could
this ever have worked?  Or how could this bug(?) have gone
unnoticed for three years?  More probably:  Where is the error in
my reasoning?  What did I miss?  To find out I had to set up
another machine running OpenBSD and check whether I could get the
expected behaviour at all.  And it looks like OpenBSD's cron(8)
doesn't "handle DST" either -- although the manpage does suggest
it.

Can any OpenBSD user lingering around and following the thread
comment on this?  Does it work and I fail to see it?  Did it
never work and nobody noticed?  Has it never been a concern?


To summarize the current FreeBSD situation:  DST is a "problem"
that keeps popping up, and there is not (cannot be?) a general
solution easy enough to work for everyone and be accepted by all
traditional admins.  Then there's been my
impression (not experience) that a sibling BSD project already
has a solution and has had it for the past three years.  Hence my
try to port their approach over into FreeBSD.  And (after
fiddling with it for a while) their approach seems to not work
for DST changes.  Although it still could be incorporated to
handle the edge cases where
- cron(8) doesn't wake up often or fast enough to cope with its
  scheduled jobs (probably rare, but not impossible)
- time jumps - back and forth - cannot be avoided easily:  The
  "problem" with an NTP daemon is that it depends on network
  connectivity.  Remember that not everyone has permanent
  connections, some dialup users could prefer to use chrony or
  cron'ed netdate(8) or wall clock "sync"ed date(8) invocations.

I understand that having the clock jump is a Bad Idea(TM),
especially when it jumps backward, since this violates the model
we have of time (*always* monotonically increasing, *maybe* or
*usually* flowing continuously)!  But I see an advantage in
supporting delayed cron ticks as well as forward clock
corrections made in jumps.  That's why I will file the PR with
the current state next week.

And I realize that the DST topic is anything but trivial, that it
cannot be handled by my ported patch, and that actions can easily
do harm when done incorrectly.  The only solutions turn out to be
- education by constantly repeating "don't do this on your
  behalf", "correct the possibly wrong defaults installed for
  you" as well as "always keep this in mind whenever things
  should change", maybe supported by some kind of afterboot(7)
  manpage
- avoiding local time representation in job time specifications
  and thus completely eliminating DST influence, maybe supported
  by editing wrappers or database loading converters since humans
  might have problems getting used to calculating in "foreign"
  coordinate systems
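
A minimal sketch of the conversion such a wrapper or converter
would have to perform, assuming it is handed a calendar date
along with the wall-clock time.  local_to_utc_minute() and its
parameters are hypothetical; the timezone rule is passed as a
POSIX TZ string, and the function changes the process's TZ
setting, which a real tool might not want to do.

```c
#include <stdlib.h>
#include <time.h>

/* Hypothetical converter: minute of the UTC day that corresponds
 * to local wall-clock time hour:min on the given date in zone tz.
 * Returns -1 if the time cannot be converted. */
int local_to_utc_minute(const char *tz, int year, int mon, int mday,
                        int hour, int min)
{
        struct tm local = {0}, utc;
        time_t t;

        if (setenv("TZ", tz, 1) != 0)
                return -1;
        tzset();

        local.tm_year = year - 1900;
        local.tm_mon = mon - 1;
        local.tm_mday = mday;
        local.tm_hour = hour;
        local.tm_min = min;
        local.tm_isdst = -1;    /* let mktime(3) decide about DST */

        t = mktime(&local);
        if (t == (time_t)-1 || gmtime_r(&t, &utc) == NULL)
                return -1;
        return utc.tm_hour * 60 + utc.tm_min;
}
```

With "CET-1CEST,M3.5.0,M10.5.0/3" as the rule, 12:00 local time
maps to minute 660 (11:00 UTC) on 15 January 2001 but to minute
600 (10:00 UTC) on 15 July 2001.  So a one-time translation of a
crontab line cannot keep the same local wall-clock time all year
round; the converter has to accept that or be rerun at each
switch.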

So we're back from DST being a "problem" to just an "issue" ...


virtually yours   82D1 9B9C 01DC 4FB4 D7B4  61BE 3F49 4F77 72DE DA76
Gerhard Sittig   true | mail -s "get gpg key" Gerhard.Sittig@gmx.net
-- 
     If you don't understand or are scared by any of the above
             ask your parents or an adult to help you.


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message



