From owner-freebsd-current@FreeBSD.ORG Sat Jul 10 07:06:23 2004
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 63C4D16A4CE; Sat, 10 Jul 2004 07:06:23 +0000 (GMT)
Received: from tomoyo.MyBSD.org.my (duke.void.net.my [202.157.186.223])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 8374543D5F; Sat, 10 Jul 2004 07:06:18 +0000 (GMT)
	(envelope-from skywizard@MyBSD.org.my)
Received: from kasumi.MyBSD.org.my (unknown [219.94.117.9])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by tomoyo.MyBSD.org.my (Postfix) with ESMTP
	id 40E646CC1F; Sat, 10 Jul 2004 15:09:04 +0800 (MYT)
Date: Sat, 10 Jul 2004 15:06:20 +0800
From: Ariff Abdullah
To: Robert Watson
Message-Id: <20040710150620.7595b207.skywizard@MyBSD.org.my>
Organization: MyBSD
X-Mailer: /usr/local/lib/ruby/1.8/net/smtp.rb
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
cc: freebsd-current@FreeBSD.org
Subject: Re: Native preemption is the culprit [was Re: today's CURRENT lockups]
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
X-List-Received-Date: Sat, 10 Jul 2004 07:06:23 -0000

On Sat, 10 Jul 2004 01:18:06 -0400 (EDT)
Robert Watson wrote:

>
> On Fri, 9 Jul 2004, Robert Watson wrote:
>
> > I'm now experiencing extremely hard hangs in the following
> > configurations:
> >
> > SMP kernel running SCHED_ULE with hyperthreads
> > SMP kernel running SCHED_4BSD with hyperthreads
> >
> > To generate the load, I'm using the "supersmack" benchmark with the
> > select-key.smack query set with 30 client workers and 10,000
> > transactions.  I am able to reliably hang the system with one or
> > two runs.
> >
> > By disabling the "#define PREEMPTION" entry in param.h with
> > SCHED_4BSD, I'm able to complete the benchmark several times in a
> > row without apparent problems.  However, I'll leave it running for
> > a few more hours and see if I didn't just "get lucky".  I'll then
> > try SCHED_ULE w/o PREEMPTION.
> >
> > By "extremely hard" I mean that I am unable to break into the
> > debugger using a serial break on the serial console.  I have not
> > yet been able to run the test on a system with easily accessible
> > NMI but will attempt to do so in the next few days.
> >
> > I'll give UP a spin with various combinations next.
>
> FYI, UP+SCHED_ULE with PREEMPTION hung within three seconds of
> starting the benchmark.  Without PREEMPTION it seems to run fine.
>
> So it looks like either PREEMPTION has a problem, or it's triggering
> an existing problem elsewhere.  If it's only one problem, it seems
> not to depend on either SMP/UP or the scheduler choice.  If it's
> multiple problems, who knows :-).  As the MySQL test relies on
> threading, we could be looking at an edge case involving threading
> and scheduling/preemption-- the other reports I've seen mention
> X11/KDE, which would also involve threading.  On the other hand, it
> could just be load.  Tomorrow I'll load up a box with non-threaded
> apps and see what happens.
>

I suspect a bad interaction between threaded apps and the current
native preemption; the fault is either in the preemption code itself
or in the threading code. Running a current kernel without any
threaded apps turns up nothing suspicious. Once threaded apps are
started, it's like sending the entire system to death row.

I'm reverting the following files to their pre-July 2 revisions to
achieve solid stability:

sys/sys/interrupt.h           - v1.27
sys/kern/kern_intr.c          - v1.110
sys/i386/i386/intr_machdep.c  - v1.6
sys/kern/sched_ule.c          - v1.109

CPU: AMD Duron(tm)  (1800.08-MHz 686-class CPU)
  Features=0x383fbff
  AMD Features=0xc0400000

-- 
Ariff Abdullah
MyBSD

http://www.MyBSD.org.my (IPv6/IPv4)
http://staff.MyBSD.org.my (IPv6/IPv4)
http://tomoyo.MyBSD.org.my (IPv6/IPv4)