From: "Christopher M. Sedore" <cmsedore@maxwell.syr.edu>
To: freebsd-threads@freebsd.org
Date: Sat, 11 Oct 2003 09:45:18 -0400
Subject: odd problem(s) with libthr and libkse

I have a multithreaded program that I've built to run under libc_r, libthr, and libkse. I use the libc_r build for debugging and the other two for actual work. The program is disk and network I/O intensive, and I want the disk I/O concurrency that libthr or libkse provides.

Here is what I'm seeing. The problems under the two libraries may be the same, may be related, or may be unrelated.

Under libthr, everything works fine for an unpredictable period, usually between 10 seconds and 30 minutes.
Eventually the program stops doing any work at all. Watching in top, I see threads get stuck in "sigwai": first one, then a couple, then all of them.

Under libkse, the program pauses periodically. One thread prints a heartbeat (a timestamp) once per second along with some debug info, and I see pauses of up to 5 seconds between heartbeats:

    8 per sec (40/5) - 3739 total, 6047 pce active
    6 per sec (32/5) - 3722 total
    timer 1065864344
    timer 1065864347
    timer 1065864348
    7 per sec (38/5) - 3777 total, 6034 pce active
    7 per sec (39/5) - 3761 total
    timer 1065864350
    timer 1065864351
    accepted on 14
    accepted on 15
    timer 1065864353
    4 per sec (20/5) - 3797 total, 6058 pce active
    6 per sec (30/5) - 3791 total
    timer 1065864354
    timer 1065864355
    timer 1065864357

(Note the skips from 1065864344 to 347, 348 to 350, 351 to 353, and 355 to 357.)

When the pauses occur, network traffic drops to nearly nothing. Overall, program performance under libkse is about 25-50% of what it is under libthr (while libthr is working), or roughly 30-60% of what it is under libc_r.

I have no problems when running under libc_r. Any suggestions on how to debug this?

I'm running 5.1-CURRENT-20030917-JPSNAP.

Thanks,

-Chris