From: Daniel Eischen
To: "Christopher M. Sedore"
Cc: freebsd-threads@freebsd.org
Date: Sat, 11 Oct 2003 21:52:24 -0400 (EDT)
Subject: Re: odd problem(s) with libthr and libkse

On Sat, 11 Oct 2003, Christopher M. Sedore wrote:

> I have a multithreaded program that I've built to run under libc_r, libthr,
> and libkse. I use the libc_r build for debugging and the others for actual
> work (the program is disk/network io intensive and I want the disk io
> concurrency from thr or kse).
>
> Anyway, here is the issue I'm seeing. It may be the same or a related
> problem for both, or may not be.
>
> When running under libthr, everything works fine for an indeterminate
> period, usually between 10 seconds and 30 minutes. Eventually, all program
> function stops. If I watch in top, threads get stuck in "sigwai". First
> one, then a couple, then all.
>
> When running under kse, the program pauses periodically.
> I have one thread that prints out a heartbeat once per second, and prints
> debug info. I get pauses of up to 5 seconds between my heartbeat:

sigwait() may not be behaving as you'd expect in libkse. Its behavior is
slightly different from libc_r's, but it should be POSIX compliant
nonetheless.

I use the following to test libkse for I/O intensive applications:

  http://people.freebsd.org/~deischen/kse/crew.c
  http://people.freebsd.org/~deischen/kse/sched_bug.c

The latter test may be similar to what you are describing. It spawns a
bunch of threads to perform disk I/O, plus one thread that just sleeps and
prints an incrementing number once a second.

Run the first test as "crew node /usr/src" and it will spawn worker threads
to search for the string "node" in all files under /usr/src. It is one of
Butenhof's tests.

Other than that, you'll need to give more info:

  SCHED_4BSD or SCHED_ULE?
  SMP or UP?
  Scope system threads or scope process threads?
  Sample program to demonstrate the problem?

-- 
Dan Eischen
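For anyone chasing the same symptoms, here is a minimal sketch in the spirit of the heartbeat test described above. It is not the contents of sched_bug.c. It assumes the portable POSIX pattern for sigwait(): the signals of interest are blocked with pthread_sigmask() in the initial thread before any other thread is created, so every thread inherits the mask and only one dedicated thread accepts them, synchronously, via sigwait(). Threads are created with an explicit contention scope through pthread_attr_setscope(), which is the system-vs-process scope choice asked about above. The helper names (block_wait_signals, signal_thread, heartbeat_thread, spawn_scoped, heartbeat_demo), the choice of SIGUSR1/SIGTERM, and the 10 ms interval are all hypothetical; a real test would sleep a full second and add disk I/O worker threads.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: block the signals a dedicated thread will sigwait()
 * for.  Call it in the initial thread BEFORE creating any other thread so
 * every thread inherits the mask and none takes them asynchronously. */
static int block_wait_signals(sigset_t *set)
{
    sigemptyset(set);
    sigaddset(set, SIGUSR1);
    sigaddset(set, SIGTERM);
    return pthread_sigmask(SIG_BLOCK, set, NULL);
}

struct sigwait_ctx {
    sigset_t set;   /* signals accepted synchronously */
    int received;   /* signal number returned by sigwait(), or -1 */
};

/* Dedicated signal thread: blocks in sigwait() until one of the blocked
 * signals is pending, then records which one it accepted. */
static void *signal_thread(void *arg)
{
    struct sigwait_ctx *ctx = arg;
    int sig;

    ctx->received = (sigwait(&ctx->set, &sig) == 0) ? sig : -1;
    return NULL;
}

struct heartbeat {
    int interval_ms;   /* the test described above uses one second */
    int beats_wanted;
    int beats_done;
};

/* Heartbeat thread: sleep, then print an incrementing number. */
static void *heartbeat_thread(void *arg)
{
    struct heartbeat *hb = arg;
    struct timespec ts;

    ts.tv_sec = hb->interval_ms / 1000;
    ts.tv_nsec = (long)(hb->interval_ms % 1000) * 1000000L;
    for (int i = 0; i < hb->beats_wanted; i++) {
        nanosleep(&ts, NULL);
        printf("heartbeat %d\n", ++hb->beats_done);
        fflush(stdout);
    }
    return NULL;
}

/* Create a thread with an explicit contention scope (PTHREAD_SCOPE_SYSTEM
 * or PTHREAD_SCOPE_PROCESS).  Returns 0 or an errno value. */
static int spawn_scoped(pthread_t *tid, int scope,
                        void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    int err = pthread_attr_init(&attr);

    if (err != 0)
        return err;
    err = pthread_attr_setscope(&attr, scope);
    if (err == 0)
        err = pthread_create(tid, &attr, fn, arg);
    pthread_attr_destroy(&attr);
    return err;
}

/* Usage sketch: block signals first, then spawn; returns 0 on success. */
static int heartbeat_demo(int beats)
{
    struct sigwait_ctx ctx = { .received = 0 };
    struct heartbeat hb = { .interval_ms = 10, .beats_wanted = beats };
    pthread_t sig_tid, hb_tid;
    int ok;

    if (block_wait_signals(&ctx.set) != 0)
        return -1;
    if (spawn_scoped(&sig_tid, PTHREAD_SCOPE_SYSTEM, signal_thread, &ctx) != 0)
        return -1;
    ok = spawn_scoped(&hb_tid, PTHREAD_SCOPE_SYSTEM, heartbeat_thread, &hb) == 0;
    if (ok)
        pthread_join(hb_tid, NULL);
    /* Thread-directed signal; it stays pending until sigwait() accepts it. */
    pthread_kill(sig_tid, SIGUSR1);
    pthread_join(sig_tid, NULL);
    return (ok && hb.beats_done == beats && ctx.received == SIGUSR1) ? 0 : -1;
}
```

Some implementations accept only one of the two scopes (glibc on Linux, for example, rejects PTHREAD_SCOPE_PROCESS with ENOTSUP), so spawn_scoped() reports the error instead of silently falling back to the default scope.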