Date: Thu, 21 Sep 2000 00:58:47 +1100 (EST)
From: Bruce Evans
To: John Baldwin
Cc: current@FreeBSD.ORG, "Andrey A. Chernov", smp@FreeBSD.ORG
Subject: Re: recent kernel, microuptime went backwards

On Wed, 20 Sep 2000, John Baldwin wrote:

> On 19-Sep-00 Bruce Evans wrote:
> > It really does go backwards.  This is caused by the giant lock preventing
> > the clock interrupt task from running soon enough.  The giant lock can
> > also prevent the clock interrupt task from running often enough even
> > after booting.  E.g., "dd if=/dev/random of=/dev/null bs=large" does
> > several bad things.
>
> It's not the Giant lock that is at fault.  We give up Giant during mi_switch().
> The scheduling problem is in the way that the top-level scheduler runs.

Then the scheduler is more broken than I thought :-).

Initially there may be a locking problem as well as a scheduling problem.
Giving up Giant in the first mi_switch() is a bit late.  mi_switch() uses
microuptime(), and the clock task needs to be run before then to finish
initialization of the timecounter.

> We decide to schedule another process due to the timeslice ending during
> the clk interrupt thread.

How can this work?  The timeslice accounting stuff doesn't get updated until
the clock task runs.

> In the past, this was not run as a thread, so it ran, set
> the AST_* constant for needing a resched and then exited.  During doreti, we
> notice an AST is pending and call ast().  ast() calls userret() which notices
> that a resched is needed and calls mi_switch().  In the New World Order, when
> the clock interrupt occurs, we set the AST_* constant for every interrupt
> before returning from sched_ithd().  This results in the actual interrupt
> threads being scheduled from ast().

What should happen is for ast() to normally schedule the clock interrupt
(and other interrupts) immediately (unless they are blocked).  This doesn't
seem to be working, and I can't see how it can work, since there is nothing
except the giant lock to tell us whether interrupts are blocked, and the
giant lock is held most of the time in system mode.  Previously, cpl told us,
but cpl is no longer maintained.

> However, when the clk ithread finishes, it
> simply calls mi_switch() to enter the next process in ithd_loop().  The
> need_resched() that it sets isn't handled until the next call to userret()
> either via a hardware interrupt or a syscall return.  Thus, the problem isn't
> due to Giant, but rather to interrupt threads.

I think this is a different problem.  It is similar to a problem for
scheduling netisrs from non-interrupt context.  schednetisr() sets the AST
flag and some other flags.  Nothing looks at these flags until an interrupt
occurs or the process sleeps.  Previously, this was handled in splx():

	s = splnet();		/* s == 0 in process context. */
	queue_net_output(...);
	schednetisr(...);
	splx(s);		/* Since s == 0, the netisr gets run here. */

Will this work again as soon as splx() is replaced by mtx_exit(), etc?  We
only have a few thousand spls to change :(.

> As for the microuptime()
> messages on boot, they only occur here on a UP kernel.  On an SMP kernel I
> don't get them.  Also, they always occur during mi_switch() when an interrupt
> thread is finishing and going back to sleep.
> The first such thread to be run
> to generate the error message is the irq0: clk ithread, so the clk ithread is
> running fine.

They are very timing dependent, and probably also very task-mix dependent.
The primary cause of microuptime() going backwards is tv_nsec overflowing
if the system takes longer than 2^32 nsec (about 4.3 seconds) between the
initialization of the timecounter and the timecounter maintenance for the
first clock interrupt.  On one of my systems, the first thread to call
mi_switch() is the generic thread (proc0?) that executes
run_interrupt_driven_hooks().  mi_switch() is called for the first time
when the ata hook goes to sleep.

Things would be a little different for SMP.  Hopefully another cpu handles
the clock interrupt.

Bruce
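
To illustrate the tv_nsec overflow described above, here is a minimal
sketch of the arithmetic only.  It is not the kernel's timecounter code:
the helper names elapsed_ns_32() and went_backwards() are hypothetical, and
the assumption that the elapsed time is held in a plain 32-bit nanosecond
field is a simplification made for the example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model (an assumption, not the real implementation):
 * the time elapsed since the last timecounter maintenance pass is
 * kept in a 32-bit nanosecond field, so it wraps at 2^32 ns.
 */
static uint32_t
elapsed_ns_32(uint64_t true_elapsed_ns)
{
	/* Truncating to 32 bits discards everything above 2^32 ns. */
	return ((uint32_t)true_elapsed_ns);
}

/*
 * Return 1 if a reading taken later in real time appears earlier
 * than the previous reading once both pass through the 32-bit field.
 */
static int
went_backwards(uint64_t earlier_ns, uint64_t later_ns)
{
	assert(earlier_ns <= later_ns);
	return (elapsed_ns_32(later_ns) < elapsed_ns_32(earlier_ns));
}
```

In this model, a reading taken 4 seconds after initialization is still
represented correctly, but one taken at 5 seconds wraps to roughly 0.7
seconds, so went_backwards(4000000000ULL, 5000000000ULL) reports that
time went backwards.  Readings that both fall before the 2^32 nsec mark
compare normally.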