From: Peter Holm
Date: Tue, 22 Feb 2005 16:48:20 +0100
To: current@freebsd.org
Cc: jroberson@chesapeake.net
Subject: Livelock with GENERIC HEAD from Feb 19 13:36 UTC
Message-ID: <20050222154820.GA70179@peter.osted.lan>
List-Id: Discussions about the use of FreeBSD-current

With GENERIC HEAD from Feb 19 13:36 UTC + mpsafe_vfs = 1 I got a new
livelock:

http://www.holm.cc/stress/log/cons118.html

This time I think I have a clue to what the problem is.

One of the stress test programs (swap) works like this pseudo code:

    c = malloc(size);
    page = getpagesize();
    while (done_testing == 0) {
        i = 0;
        while (i < size && done_testing == 0) {
            c[i] = 0;
            i += page;
        }
    }

Could it be that two incarnations of this program can monopolize the
run queue?

$ sort -n +4 < /var/crash/ps.186 | grep " R"
 UID   PID  PPID CPU PRI NI   VSZ RSS MWCHAN STAT TT       TIME COMMAND
   0     5     0   0   8  0     0   0 -      RL   ??    0:00.00 [thread tas
   0 68391 68390   8  97  0   320 120 -      RE   ??    0:00.01 [atrun]
1001 68342 68326 295 131  0 17628   0 -      R+   #C: 192:43.57 [swap]
1001 68354 68326 295 131  0 13268   0 -      R+   #C: 192:42.13 [swap]
1001 68331 68325 288 132  0  1224   0 -      R+   #C:   0:00.02 [creat]
1001 68332 68325 288 132  0  1224   0 -      R+   #C:   0:00.02 [creat]
1001 68333 68325 288 132  0  1224   0 -      R+   #C:   0:00.02 [creat]
1001 68334 68325 288 132  0  1224   0 -      R+   #C:   0:00.03 [creat]
1001 68335 68325 288 132  0  1224   0 -      R+   #C:   0:00.10 [creat]
1001 68336 68325 288 132  0  1224   0 -      R+   #C:   0:00.07 [creat]
1001 68361 68328 290 132  0  1232   0 -      R+   #C:   0:00.42 [tcp]
1001 68362 68329 288 132  0  1252   0 -      R+   #C:   0:00.06 [udp]
1001 68363 68329 288 132  0  1252   0 -      R+   #C:   0:00.04 [udp]
1001 68368 68360 288 132  0  1320   0 -      R+   #C:   0:00.05 [tcp]
1001 68369 68361 290 132  0  1320   0 -      R+   #C:   0:00.56 [tcp]
1001 68387 68338 288 132  0  1656   0 -      R+   #C:   0:00.02 [sh]
1001 68388 68340 288 132  0  1664   0 -      R+   #C:   0:00.02 [sh]
1001 68389 68388 288 132  0     0   0 -      RE+  #C:   0:00.02 [swapinfo]
1001 68390 68388 288 132  0  1204   0 -      R+   #C:   0:00.01 [tail]
   0    11     0 262 171  0     0   0 -      RL   ??  345:02.29 [idle: cpu0

At a later freeze today a "kill 1 " from kdb unfroze the box.

-- 
Peter Holm
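
For anyone wanting to reproduce the access pattern, the swap pseudo code above could be fleshed out roughly as follows. This is only a sketch and not the actual stress test source: the function names touch_pages and swap_test and the max_passes bound are hypothetical additions (the pseudo code loops until done_testing is set, with no pass limit), and sysconf(_SC_PAGESIZE) stands in for getpagesize().

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/* Set to nonzero (e.g. from a signal handler) to stop the test. */
volatile sig_atomic_t done_testing = 0;

/*
 * Hypothetical helper: write one byte in every page of c[0..size),
 * repeating the sweep until done_testing is set or max_passes sweeps
 * have completed. Returns the number of page writes performed.
 * The max_passes bound is an assumption added so the sketch terminates.
 */
size_t touch_pages(volatile char *c, size_t size, size_t page,
                   size_t max_passes)
{
    size_t touched = 0;

    assert(page > 0);   /* a zero stride would never advance */
    for (size_t pass = 0; pass < max_passes && done_testing == 0; pass++) {
        for (size_t i = 0; i < size && done_testing == 0; i += page) {
            c[i] = 0;   /* dirty the page so it needs backing store */
            touched++;
        }
    }
    return touched;
}

/*
 * Hypothetical driver: allocate size bytes and sweep them.
 * Returns the number of page writes, or 0 if setup failed.
 */
size_t swap_test(size_t size, size_t max_passes)
{
    long page = sysconf(_SC_PAGESIZE);
    char *c;
    size_t n;

    if (page <= 0)
        return 0;
    c = malloc(size);
    if (c == NULL)
        return 0;
    n = touch_pages(c, size, (size_t)page, max_passes);
    free(c);
    return n;
}
```

With size chosen larger than physical memory, each sweep keeps dirtying pages that have to be paged out again, and the process never sleeps voluntarily, which would be consistent with the two [swap] processes sitting in the R state in the ps output.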