From: John Baldwin <jhb@FreeBSD.org>
To: Bruce Evans <bde@zeta.org.au>
Date: Tue, 4 May 2004 15:37:50 -0400
User-Agent: KMail/1.6
References: <20040426111754.38a855c4.bm@malepartus.de>
	<20040503094236.6b7dc4a5.bm@malepartus.de>
	<20040504221059.W9822@gamplex.bde.org>
In-Reply-To: <20040504221059.W9822@gamplex.bde.org>
MIME-Version: 1.0
Content-Disposition: inline
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <200405041537.50243.jhb@FreeBSD.org>
X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on server.baldwin.cx
cc: Burkard Meyendriesch <bm@malepartus.de>
cc: freebsd-current@FreeBSD.org
cc: atkin901@yahoo.com
Subject: Re: sio: lots of silo overflows on Asus K8V with Moxa Smartio
	C104H/PCI solved

On Tuesday 04 May 2004 10:04 am, Bruce Evans wrote:
> On Mon, 3 May 2004, Burkard Meyendriesch wrote:
> > On Mon, 3 May 2004 15:00:15 +1000 (EST) Bruce Evans wrote:
> > > ...
> > >
> > > You need to turn PUC_FASTINTR back off to test the patch.
> >
> > I disabled "options PUC_FASTINTR" in my kernel configuration file
> > and made a new kernel. The actual boot message is:
> >
> > May  3 09:18:59 Reineke kernel: puc0: <Moxa Technologies, Smartio C104H/PCI> port 0xa000-0xa00f,0xa400-0xa43f,0xa800-0xa87f irq 19 at device 14.0 on pci0
> > May  3 09:18:59 Reineke kernel: puc0: Reserved 0x40 bytes for rid 0x18 type 4 at 0xa400
> > May  3 09:18:59 Reineke kernel: sio4: <Moxa Technologies, Smartio C104H/PCI> on puc0
> > May  3 09:18:59 Reineke kernel: sio4: type 16550A
> > May  3 09:18:59 Reineke kernel: sio4: unable to activate interrupt in fast mode - using normal mode
> >
> > Doing my sio5 Palm backup test now I again get lots of silo overflows;
> > the input rate is in a range from 100 to 10000 chars/s and the puc
> > interrupt rate behaves similarly: from 12 to 1200 ints/s. After several
> > minutes the Palm backup software gives up due to protocol errors
> > (maybe as a result of massive character losses).
> >
> > If your patch only becomes effective without "options PUC_FASTINTR" it
> > does not seem to solve the sio interrupt problem in my environment.
>
> So much for my theory that the problem is contention with a low priority
> thread.  Since holding a spin lock or otherwise disabling interrupts for
> too long would also break the PUC_FASTINTR case, the problem must be that
> the highest priority runnable thread (which with my patch can only be the
> sio (puc) ithread if that thread is runnable) is not always run.  This is
> quite likely to be just the old bug that handling of interrupts which
> can't be handled immediately might be delayed for too long.  From
> ithread_schedule():
>
> % 	mtx_lock_spin(&sched_lock);
> % 	if (TD_AWAITING_INTR(td)) {
> % 		CTR2(KTR_INTR, "%s: setrunqueue %d", __func__, p->p_pid);
> % 		TD_CLR_IWAIT(td);
> % 		setrunqueue(td);
> % 		if (do_switch &&
> % 		    (ctd->td_critnest == 1) ) {
> % 			KASSERT((TD_IS_RUNNING(ctd)),
> % 			    ("ithread_schedule: Bad state for curthread."));
> % 			if (ctd->td_flags & TDF_IDLETD)
> % 				ctd->td_state = TDS_CAN_RUN; /* XXXKSE */
> % 			mi_switch(SW_INVOL);
> % 		} else {
> % 			curthread->td_flags |= TDF_NEEDRESCHED;
> % 		}
> % 	} else {
>
> When the switch can't be done immediately (which is now always (!!)),
> this just sets TDF_NEEDRESCHED, but TDF_NEEDRESCHED is not checked
> until return to user mode, so the switch is indefinitely delayed.
> critical_enter() now disables interrupts, so the thread that got
> interrupted must not be in a critical section.  When we return to it
> without checking TDF_NEEDRESCHED, it continues until it either gives
> up control or perhaps until it is interrupted by an interrupt whose
> handler is missing the bugs here (a clock interrupt perhaps).
>
> (!!) For hardware interrupts, ithread_schedule() is now always called
> from a critical section.  Interrupts are now disabled in critical
> sections, so ctd->td_critnest is always 0 when
> ithread_execute_handlers() is called, always 1 when ithread_schedule()
> is called, and always 2 when it is checked above.  So the switch can
> only occur when ithread_schedule() is called in contexts that are
> missing this foot-shooting.  There may be a few such calls, but
> software interrupts are often scheduled from critical regions too.

Oof. :(
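
To make the nesting arithmetic above concrete, here is a minimal user-space
sketch, not kernel code.  struct sim_thread and every sim_* function are
hypothetical names invented for illustration, and the sketch assumes that the
extra nesting level at the check comes from taking sched_lock as a spin mutex.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for the kernel's struct thread. */
struct sim_thread {
	int  critnest;     /* models td_critnest (critical section depth) */
	bool needresched;  /* models TDF_NEEDRESCHED */
	int  switches;     /* number of immediate switches performed */
};

static void
sim_critical_enter(struct sim_thread *td)
{
	td->critnest++;
}

static void
sim_critical_exit(struct sim_thread *td)
{
	assert(td->critnest > 0);
	td->critnest--;		/* note: nobody looks at needresched here */
}

/* Models only the quoted test in ithread_schedule(). */
static void
sim_ithread_schedule(struct sim_thread *td, bool do_switch)
{
	sim_critical_enter(td);		/* assumed: mtx_lock_spin(&sched_lock) */
	if (do_switch && td->critnest == 1)
		td->switches++;		/* stands in for mi_switch(SW_INVOL) */
	else
		td->needresched = true;	/* switch is deferred */
	sim_critical_exit(td);		/* assumed: mtx_unlock_spin() */
}

/* Models the hardware interrupt path: nesting is 0 on entry. */
static void
sim_hw_interrupt(struct sim_thread *td)
{
	assert(td->critnest == 0);
	sim_critical_enter(td);		/* nesting is 1 at the call ... */
	sim_ithread_schedule(td, true);	/* ... and 2 at the check */
	sim_critical_exit(td);
}
```

In this model sim_hw_interrupt() never increments switches and always leaves
needresched set, while a direct call to sim_ithread_schedule() from nesting
level 0 does switch, which matches the description quoted above.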

> I've fixed the primary bug in 2 different versions of interrupt handling
> (mine and last October's version in -current).  Rescheduling must
> be considered whenever td_critnest is decremented to 0.  The following
> patch worked last October, but its infrastructure went away last November:

This patch is very similar to the preemptive kernel patches (*) that I finally
resurrected and reworked.  The primary differences are that 1) I use a
separate flag for deferred preemptions (TDF_OWEPREEMPT), and 2) I move the
preemption checking code into sched_add(), so that any time a thread is put
on a run queue we may choose to preempt.  This removes the need for explicit
preemption code in ithread_schedule(), mtx_unlock_sleep(), and boot().  Note
that a fully preemptive kernel doesn't appear to be safe yet, even on UP: I
get corrupted eip values that are off by one in both the kernel (SMP test)
and userland (UP test), and I haven't tracked the cause down yet.

*: http://www.FreeBSD.org/~jhb/patches/preempt.patch
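
As a rough illustration of the deferred-preemption idea only (this is not the
patch itself), the following user-space sketch shows a preemption check made
when a thread is added to the run queue, with the preemption postponed via a
flag if the current thread is in a critical section.  All sim_* names, the
single-slot run queue, and the struct layout are assumptions made up for the
example; only the TDF_OWEPREEMPT and sched_add() concepts come from the text
above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplified thread; lower prio value means more urgent. */
struct sim_thread {
	int  prio;
	int  critnest;     /* models td_critnest */
	bool owepreempt;   /* models TDF_OWEPREEMPT */
};

static struct sim_thread *sim_curthread;  /* thread currently "running" */
static struct sim_thread *sim_pending;    /* toy one-slot run queue */
static int sim_preemptions;               /* count of switches performed */

/* Switch to the pending thread and put the old one in the slot. */
static void
sim_preempt(void)
{
	struct sim_thread *old = sim_curthread;

	assert(sim_pending != NULL);
	old->owepreempt = false;
	sim_curthread = sim_pending;
	sim_pending = old;
	sim_preemptions++;
}

static void
sim_critical_enter(void)
{
	sim_curthread->critnest++;
}

/* Leaving the outermost critical section pays any owed preemption. */
static void
sim_critical_exit(void)
{
	struct sim_thread *td = sim_curthread;

	assert(td->critnest > 0);
	td->critnest--;
	if (td->critnest == 0 && td->owepreempt)
		sim_preempt();
}

/* Models a sched_add() that also decides whether to preempt. */
static void
sim_sched_add(struct sim_thread *td)
{
	/* A real run queue would keep both threads; the toy slot keeps one. */
	if (sim_pending == NULL || td->prio < sim_pending->prio)
		sim_pending = td;
	if (sim_pending->prio >= sim_curthread->prio)
		return;				/* nothing more urgent */
	if (sim_curthread->critnest == 0)
		sim_preempt();			/* preempt right away */
	else
		sim_curthread->owepreempt = true;	/* defer */
}
```

With a low-priority current thread inside a critical section, adding a more
urgent thread only sets owepreempt; the switch then happens in
sim_critical_exit() when the nesting count returns to 0.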

-- 
John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org