From owner-freebsd-stable@FreeBSD.ORG  Mon Jun 30 23:16:14 2008
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id D74F31065673;
	Mon, 30 Jun 2008 23:16:14 +0000 (UTC)
	(envelope-from stef@memberwebs.com)
Received: from mx.npubs.com (mail.zoneseven.net [209.66.100.224])
	by mx1.freebsd.org (Postfix) with ESMTP id AFA338FC1B;
	Mon, 30 Jun 2008 23:16:14 +0000 (UTC)
	(envelope-from stef@memberwebs.com)
Received: from mx.npubs.com (avhost [209.66.100.194])
	by mx.npubs.com (Postfix) with ESMTP id C64D7F181CD;
	Mon, 30 Jun 2008 23:16:13 +0000 (UTC)
Received: from northstar-srv2 (unknown [172.27.2.11])
	by mx.npubs.com (Postfix) with ESMTP id F0245F1817C;
	Mon, 30 Jun 2008 23:16:11 +0000 (UTC)
From: Stef <stef@memberwebs.com>
User-Agent: Thunderbird 2.0.0.14 (X11/20080505)
MIME-Version: 1.0
To: John Baldwin <jhb@freebsd.org>
References: <20080615112318.146C1F18512@mx.npubs.com>
	<200806180917.05689.jhb@freebsd.org>
X-Enigmail-Version: 0.95.0
Content-Type: multipart/mixed; boundary="------------040406060900030904060601"
Message-Id: <20080630231611.F0245F1817C@mx.npubs.com>
X-Virus-Scanned: ClamAV using ClamSMTP
Date: Mon, 30 Jun 2008 23:16:13 +0000 (UTC)
Cc: freebsd-hackers@freebsd.org, freebsd-stable@freebsd.org
Subject: Re: FreeBSD 6.3 deadlock (vm_map?) with DDB output
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 30 Jun 2008 23:16:15 -0000

This is a multi-part message in MIME format.
--------------040406060900030904060601
Content-Type: text/plain; charset=iso-8859-15
Content-Transfer-Encoding: 7bit

John Baldwin wrote:
> On Sunday 15 June 2008 07:23:19 am Stef Walter wrote:
>> I've been trying to track down a deadlock on some newish production
>> servers running FreeBSD 6.3-RELEASE-p2. The deadlock occurs on a
>> specific (although mundane) hardware configuration, and each of several
>> servers running this hardware deadlock about once per week.
> 
> Try this change:
> 
<snip>
> We use it at work on 6.x.  W/o this fix, round-robin stops working on 4BSD 
> when softclock() (swi4: clock) blocks on a lock like Giant.

Just wanted to confirm: that patch did the trick. All the SMP machines
that had this problem have been stable for 11 days now, longer than any
of them had stayed up previously.

I changed the patch slightly to work with FreeBSD 6.3-RELEASE. That's
attached, in case anyone needs this later.
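
For anyone reading this later, here is a rough userland sketch of the
check that the last hunk of the patch adds, in place of the removed
roundrobin() callout. It is only a model under stated assumptions, not
kernel code: struct sim_thread, struct sim_cpu, SIM_IDLETD,
SIM_NEEDRESCHED and both sim_* functions are hypothetical stand-ins for
the thread flags, the per-CPU switchticks value and the per-tick
scheduler hook.

```c
#include <assert.h>

/* Hypothetical stand-ins for TDF_IDLETD and TDF_NEEDRESCHED. */
#define SIM_IDLETD      0x1
#define SIM_NEEDRESCHED 0x2

struct sim_thread {
	int flags;
};

struct sim_cpu {
	int switchticks;	/* tick count at the last context switch */
};

/*
 * Per-tick check modelled on the hunk the patch adds: once a non-idle
 * thread has run for a full quantum, mark it as needing a reschedule.
 * No callout is involved, so nothing here depends on softclock().
 */
static void
sim_sched_clock(struct sim_thread *td, const struct sim_cpu *cpu,
    int ticks, int quantum)
{
	assert(quantum > 0);
	if (!(td->flags & SIM_IDLETD) &&
	    ticks - cpu->switchticks >= quantum)
		td->flags |= SIM_NEEDRESCHED;
}

/*
 * Run one thread starting at tick `start` and return the first tick at
 * which it gets flagged, or -1 if that does not happen within
 * `max_ticks` ticks.
 */
static int
sim_first_resched_tick(int thread_flags, int start, int quantum,
    int max_ticks)
{
	struct sim_thread td = { .flags = thread_flags };
	struct sim_cpu cpu = { .switchticks = start };

	for (int t = start; t <= start + max_ticks; t++) {
		sim_sched_clock(&td, &cpu, t, quantum);
		if (td.flags & SIM_NEEDRESCHED)
			return t;
	}
	return -1;
}
```

With a quantum of 10 ticks and a thread switched in at tick 1000,
sim_first_resched_tick(0, 1000, 10, 100) returns 1010, while a thread
carrying SIM_IDLETD is never flagged. The point of the design is that
the check runs from the clock tick itself, so it keeps working even
when swi4: clock is blocked on a lock.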

Cheers,
Stef

--------------040406060900030904060601
Content-Type: text/x-patch;
 name="kern_sched4bsd_deadlock.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="kern_sched4bsd_deadlock.patch"
--- sys/kern/sched_4bsd.c.orig	2006-06-16 22:11:55.000000000 +0000
+++ sys/kern/sched_4bsd.c	2008-06-18 17:04:34.000000000 +0000
@@ -157,13 +157,10 @@
 static int	sched_quantum;	/* Roundrobin scheduling quantum in ticks. */
 #define	SCHED_QUANTUM	(hz / 10)	/* Default sched quantum */
 
-static struct callout roundrobin_callout;
-
 static void	slot_fill(struct ksegrp *kg);
 static struct kse *sched_choose(void);		/* XXX Should be thread * */
 
 static void	setup_runqs(void);
-static void	roundrobin(void *arg);
 static void	schedcpu(void);
 static void	schedcpu_thread(void);
 static void	sched_priority(struct thread *td, u_char prio);
@@ -316,27 +313,6 @@
 }
 
 /*
- * Force switch among equal priority processes every 100ms.
- * We don't actually need to force a context switch of the current process.
- * The act of firing the event triggers a context switch to softclock() and
- * then switching back out again which is equivalent to a preemption, thus
- * no further work is needed on the local CPU.
- */
-/* ARGSUSED */
-static void
-roundrobin(void *arg)
-{
-
-#ifdef SMP
-	mtx_lock_spin(&sched_lock);
-	forward_roundrobin();
-	mtx_unlock_spin(&sched_lock);
-#endif
-
-	callout_reset(&roundrobin_callout, sched_quantum, roundrobin, NULL);
-}
-
-/*
  * Constants for digital decay and forget:
  *	90% of (kg_estcpu) usage in 5 * loadav time
  *	95% of (ke_pctcpu) usage in 60 seconds (load insensitive)
@@ -618,11 +594,6 @@
 		sched_quantum = SCHED_QUANTUM;
 	hogticks = 2 * sched_quantum;
 
-	callout_init(&roundrobin_callout, CALLOUT_MPSAFE);
-
-	/* Kick off timeout driven events by calling first time. */
-	roundrobin(NULL);
-
 	/* Account for thread0. */
 	sched_load_add();
 }
@@ -697,6 +668,14 @@
 		resetpriority(kg);
 		resetpriority_thread(td, kg);
 	}
+
+	/*
+	 * Force a context switch if the current thread has used up a full
+	 * quantum (default quantum is 100ms).
+	 */
+	if (!((td)->td_flags & TDF_IDLETD) &&
+	    ticks - PCPU_GET(switchticks) >= sched_quantum)
+		td->td_flags |= TDF_NEEDRESCHED;
 }
 
 /*

--------------040406060900030904060601--