Date:      Wed, 13 Nov 2002 12:02:50 -0500 (EST)
From:      John Baldwin <jhb@FreeBSD.org>
To:        "Michael A. Mackey" <michael-mackey@uiowa.edu>
Cc:        freebsd-alpha@freebsd.org, Terry Lambert <tlambert2@mindspring.com>
Subject:   Re: Extreme time drift in SMP mode
Message-ID:  <XFMail.20021113120250.jhb@FreeBSD.org>
In-Reply-To: <1037142521.27992.41.camel@focaccia.>


On 12-Nov-2002 Michael A. Mackey wrote:
> I guess I don't understand the problem.
> 
> It seemed to me that the problem was that not all the interrupts were
> being delivered because the Lynx architecture expects each processor to
> generate interrupts.  Before the fix, the system lost time by an amount
> which was equivalent to throwing away half of the interrupts. After my
> modification, each processor is allowed to generate clock interrupts,
> and the system receives the complete set of interrupts, yielding the
> result that the system keeps time correctly.
> 
> I'm sure that this is a naive picture of what's going on (and I'm not a
> kernel developer), but it works.  I realize that it is probably specific
> to the Lynx architecture, and I of course would be happy for a 'correct'
> way to allow this old box to happily crank along solving PDE's.
>  
> 
> Anyway, I sure am glad to have such high quality software to run on this
> box.  
> Keep up the great work FreeBSD-Alpha!

I'll try to explain.

For better or for worse, FreeBSD currently uses the following model
of clock interrupts to drive hardclock() (update timecounters, handle
profiling, drive softclock) and statclock() (update statistics):

For each "virtual" system-wide clock interrupt, all CPU's execute
statclock_process() and hardclock_process() (should be renamed to
*_thread() at this point) to perform process-specific updates
(profiling, stats, etc.) that need to happen on all CPU's.  All but
one of these CPU's execute these functions directly from their clock
interrupt.  One CPU executes statclock() and hardclock() directly,
which call the _process() variants as part of their task but also
perform system-wide updates such as updating the timecounters and
driving softclock().
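
As a rough illustration of the model just described (a sketch only, not
FreeBSD's actual code), the split between per-CPU and system-wide work can
be written out as below.  The names tick_process(), tick_master(),
virtual_clock_interrupt(), struct clock_counts and the CPU count MODEL_NCPU
are all hypothetical stand-ins invented for this example.

```c
#define MODEL_NCPU 4    /* hypothetical CPU count for the sketch */

struct clock_counts {
    unsigned global_ticks;               /* system-wide updates (timekeeping) */
    unsigned per_cpu_ticks[MODEL_NCPU];  /* per-CPU updates (profiling, stats) */
};

/* Stand-in for the *_process() variants: per-CPU work only. */
static void tick_process(struct clock_counts *c, int cpu)
{
    c->per_cpu_ticks[cpu]++;
}

/* Stand-in for hardclock()/statclock(): per-CPU work plus system-wide work. */
static void tick_master(struct clock_counts *c, int cpu)
{
    tick_process(c, cpu);
    c->global_ticks++;
}

/* One "virtual" system-wide clock interrupt: every CPU does the per-CPU
 * work, and exactly one CPU (the master) also does the system-wide work. */
static void virtual_clock_interrupt(struct clock_counts *c, int master)
{
    for (int cpu = 0; cpu < MODEL_NCPU; cpu++) {
        if (cpu == master)
            tick_master(c, cpu);
        else
            tick_process(c, cpu);
    }
}
```

The point of the sketch is the invariant: after N virtual interrupts, every
CPU has done N rounds of per-CPU work, but the system-wide work has happened
exactly N times in total, not N times per CPU.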

On i386, clock interrupts are only sent to one processor in a sort
of round-robin fashion.  What we do there is that each time a clock
interrupt occurs, the receiving CPU acts as the "master" CPU and
executes hardclock() and statclock().  It then IPI's all the other
CPU's in the system to simulate a system-wide clock interrupt, and
all the other CPU's in the system then execute the _process()
functions.
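
The i386 scheme above can be sketched the same way.  This is a toy
simulation under stated assumptions, not the real interrupt path: struct
cpu_state, i386_clock_interrupt() and I386_NCPU are hypothetical names, and
an IPI is modelled as nothing more than a counter bump plus a direct update
of the target CPU's per-CPU work.

```c
#define I386_NCPU 4     /* hypothetical CPU count for the sketch */

struct cpu_state {
    unsigned global_work;   /* times this CPU acted as "master" */
    unsigned local_work;    /* times this CPU did the per-CPU work */
    unsigned ipis_sent;     /* IPIs this CPU sent to other CPUs */
};

/* The CPU that receives the hardware clock interrupt acts as master for
 * this tick, then IPIs every other CPU so they run the per-CPU work. */
static void i386_clock_interrupt(struct cpu_state cpus[I386_NCPU], int receiver)
{
    cpus[receiver].global_work++;   /* hardclock()/statclock() stand-in */
    cpus[receiver].local_work++;

    for (int cpu = 0; cpu < I386_NCPU; cpu++) {
        if (cpu == receiver)
            continue;
        cpus[receiver].ipis_sent++; /* simulated IPI to the other CPU */
        cpus[cpu].local_work++;     /* target runs the _process() stand-in */
    }
}
```

Delivering interrupts round-robin (receiver = i % I386_NCPU) spreads the
master role across CPUs, while every CPU still does per-CPU work on every
tick.  The cost is I386_NCPU - 1 IPIs per tick in this model.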

When we did SMP on Alpha we wanted to avoid sending all those IPI's
if possible.  For one thing, IPI's in general are expensive.  On the
Alpha they are a bit worse though as you can only IPI one CPU at a
time whereas on i386 you can send broadcast IPI's to all other CPU's
at once.  On at least the 4100 and DS20 type machines, we found that
the clock interrupt was broadcast to all CPU's, but in a round-robin
fashion.  That is, if we were getting X clock interrupts / sec in UP
on CPU 0, we still got X clock ints / sec on CPU 0, but we also got
X clock ints / sec on CPU 1, 2, etc.  but offset so that they didn't
all get interrupted at once.  Thus, we made the boot processor the
"master" CPU for all clock interrupts and had it call hardclock() and
statclock() while all the other CPU's would call the _process()
variants.  Basically, the system was doing the global IPI for us
except that since the interrupts were staggered, there was less
contention on common locks.

Now enter the 2100 into the picture.  I'm not really sure what it is
doing with its clock interrupts.  I'm not sure if it is acting like an
i386 and round-robin'ing the clock interrupts to all processors or if
it is acting like other Alpha's but slowing down the clock.  Probably
it is acting like an i386 and we might need to just change the 2100
clock interrupt handler to use the i386 model and go and IPI the other
CPU's when a clock interrupt comes in.  If anyone can stick more than 2
CPU's in a 2100 system and see if the clock runs 3x or 4x as slow that
might help.
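
To make the suggested experiment concrete, here is a small sketch of how
many clock interrupts a fixed master CPU (CPU 0) would see under the two
delivery behaviours discussed above.  It is a counting model only; what the
2100 really does is exactly the open question.  The enum values and
master_ticks() are hypothetical names, and ncpu is assumed to be at least 1.

```c
/* Two hypothetical delivery behaviours for the hardware clock interrupt. */
enum delivery {
    BROADCAST_STAGGERED,    /* every CPU receives every clock interrupt */
    ROUND_ROBIN             /* each interrupt goes to just one CPU in turn */
};

/* Out of `interrupts` hardware clock interrupts, count how many reach
 * CPU 0 when CPU 0 is the only CPU doing system-wide timekeeping.
 * Assumes ncpu >= 1. */
static unsigned master_ticks(enum delivery mode, unsigned interrupts,
                             unsigned ncpu)
{
    unsigned seen = 0;

    for (unsigned i = 0; i < interrupts; i++) {
        if (mode == BROADCAST_STAGGERED)
            seen++;             /* CPU 0 gets its own copy of each one */
        else if (i % ncpu == 0)
            seen++;             /* CPU 0 gets only every ncpu-th one */
    }
    return seen;
}
```

Under the round-robin assumption the master sees 1/ncpu of the interrupts,
which matches a clock running 2x slow with two CPUs and would predict 3x or
4x slow with three or four.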

The reason I would prefer to just dink with the timer_freq is that it
is simple and doesn't change the model that system-wide things like
timekeeping only happen once per "virtual" clock interrupt.
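
The arithmetic behind adjusting the frequency can be sketched as follows.
This is an illustration only: elapsed_us() and its assumed_hz parameter are
hypothetical stand-ins, the tick counts used with it are made-up example
figures, and nothing here shows how timer_freq is actually consumed in the
kernel.

```c
/* Convert a count of clock ticks seen by the master CPU into elapsed
 * microseconds, given the tick rate the kernel believes it is getting.
 * Assumes assumed_hz is nonzero. */
static unsigned long long elapsed_us(unsigned long long ticks,
                                     unsigned long long assumed_hz)
{
    return ticks * 1000000ULL / assumed_hz;
}
```

For example, if the master CPU really receives 512 ticks per second but the
kernel assumes 1024, elapsed_us(512, 1024) credits only half a second per
real second.  Assuming 512 instead makes the same 512 ticks count as one
full second, without changing which CPU does the system-wide work.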

-- 

John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-alpha" in the body of the message



