From owner-freebsd-threads@FreeBSD.ORG  Wed Aug 17 16:18:10 2005
Return-Path: <owner-freebsd-threads@FreeBSD.ORG>
X-Original-To: freebsd-threads@freebsd.org
Delivered-To: freebsd-threads@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 19E9D16A41F
	for <freebsd-threads@freebsd.org>; Wed, 17 Aug 2005 16:18:10 +0000 (GMT)
	(envelope-from ghelmer@palisadesys.com)
Received: from magellan.palisadesys.com (magellan.palisadesys.com
	[192.188.162.211])
	by mx1.FreeBSD.org (Postfix) with ESMTP id BC12B43D45
	for <freebsd-threads@freebsd.org>; Wed, 17 Aug 2005 16:18:09 +0000 (GMT)
	(envelope-from ghelmer@palisadesys.com)
Received: from [172.16.1.108] (cetus.palisadesys.com [192.188.162.7])
	(authenticated bits=0)
	by magellan.palisadesys.com (8.12.11/8.12.11) with ESMTP id
	j7HGHrFv080332; Wed, 17 Aug 2005 11:17:55 -0500 (CDT)
	(envelope-from ghelmer@palisadesys.com)
Message-ID: <43036330.9000501@palisadesys.com>
Date: Wed, 17 Aug 2005 11:17:52 -0500
From: Guy Helmer <ghelmer@palisadesys.com>
User-Agent: Mozilla Thunderbird 1.0.6 (Windows/20050716)
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Julian Elischer <julian@elischer.org>
References: <42D691F2.3030201@palisadesys.com> <42D6BA3E.1000306@elischer.org>
	<42D7BBB8.9050207@palisadesys.com> <42D8199E.1060702@elischer.org>
In-Reply-To: <42D8199E.1060702@elischer.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Palisade-MailScanner-Information: Please contact the ISP for more information
X-Palisade-MailScanner: Found to be clean
X-MailScanner-From: ghelmer@palisadesys.com
Cc: freebsd-threads@freebsd.org
Subject: Re: system scope threads entering STOP state
X-BeenThere: freebsd-threads@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Threading on FreeBSD <freebsd-threads.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-threads>, 
	<mailto:freebsd-threads-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-threads>
List-Post: <mailto:freebsd-threads@freebsd.org>
List-Help: <mailto:freebsd-threads-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-threads>,
	<mailto:freebsd-threads-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 17 Aug 2005 16:18:10 -0000

Julian Elischer wrote:

> Guy Helmer wrote:
>
>> Julian Elischer wrote:
>>
>>> Guy Helmer wrote:
>>>
>>>> I have a long-running multithreaded process on FreeBSD 5.4 (SMP, 
>>>> PREEMPTION, SCHED_4BSD) linked with libpthread and I'm creating the 
>>>> threads with attribute PTHREAD_SCOPE_SYSTEM.  The threads need to 
>>>> be processing input in near-real-time or its input buffers overflow.
>>>>
>>>> I've modified the program so that a thread can fork/execl/waitpid 
>>>> (without WNOHANG) to use an external program for further processing 
>>>> on a batch of input (sometimes via a pipe, other times via writing 
>>>> to a file).  However, even under a light input load, the program is 
>>>> now dropping input.  While running top(1) in thread mode, I 
>>>> occasionally find all the program's threads are in the STOP state 
>>>> for several consecutive seconds.  Is there anything related to the 
>>>> frequent use of fork, execve, or wait4 that would be likely to 
>>>> cause such a situation?  I'm not seeing anything obvious in my 
>>>> reading of the kernel sources.
>>>
>>> During a fork the parent process is in a variant of the "STOPPED"
>>> state; or rather, if you look at top -H you should see that all the
>>> threads except the one doing the fork are in the STOPPED state.
>>>
>>> This is because while a thread is forking, the process needs to be
>>> single-threaded so that there is a consistent image to be copied to
>>> the child.
>>>
>>> The single-threaded state is also entered for exit() and execve(),
>>> though that should not affect your program.
>>>
>>> I can't imagine why the state would persist for any length of time,
>>> unless there is another thread that is in an uninterruptible wait. In
>>> that case the other threads have to wait for it to complete what it is
>>> doing and come back. I have considered whether such a thread should
>>> instead be treated as "already suspended", and in fact some earlier
>>> versions of the code did that; however, it leads to some
>>> inconsistencies and the danger that such a thread will be suspended
>>> holding some resource that it should not hold for any length of time.
>>
>> Thanks for the explanation.  I was [aware] that the other threads 
>> would be stopped during a fork(2) but it looked to me like the STOP 
>> would be brief.
>> Would an "uninterruptible wait" include system calls like a write(2) 
>> of a large buffer?  That would explain it...
>
> It's hard to say. Possibly yes, if it had to allocate buffer space.
> However, this is a question for others.
>
> Is it possible to duplicate this on request?
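A minimal sketch of the pattern described at the top of the thread, assuming a POSIX system with /bin/sh: a thread created with PTHREAD_SCOPE_SYSTEM forks, runs an external command with execl(), and blocks in waitpid() without WNOHANG. The names run_shell_command(), struct job and run_job_in_system_thread() are hypothetical and not taken from the original program; a shell command stands in for the real external program.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run cmd via /bin/sh in a child process and block in
 * waitpid() (no WNOHANG) until it exits. Returns the child's exit status,
 * or -1 if fork/wait failed or the child did not exit normally. */
static int run_shell_command(const char *cmd)
{
    /* While fork() copies the process, the other threads are held stopped. */
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        /* Child: exec immediately; _exit() on failure so the parent's
         * stdio buffers and atexit handlers are not run a second time. */
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);
    }

    int status;
    pid_t w;
    do {
        w = waitpid(pid, &status, 0);   /* blocks: no WNOHANG */
    } while (w == -1 && errno == EINTR);
    if (w == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Hypothetical unit of work handed to a worker thread. */
struct job {
    const char *cmd;   /* shell command standing in for the external program */
    int result;        /* exit status filled in by the worker */
};

static void *worker(void *arg)
{
    struct job *j = arg;
    assert(j != NULL);
    j->result = run_shell_command(j->cmd);
    return NULL;
}

/* Run one job in a system-scope thread and wait for that thread.
 * Returns 0 on success or a pthread error number. */
static int run_job_in_system_thread(struct job *j)
{
    pthread_attr_t attr;
    pthread_t tid;
    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, worker, j);
    pthread_attr_destroy(&attr);
    if (rc != 0)
        return rc;
    return pthread_join(tid, NULL);
}
```

Because the blocking waitpid() happens in the worker thread, the program's other threads can keep running while the child executes; following Julian's explanation, only the fork() itself should force the process briefly into the single-threaded (STOP) state.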

[where did the past month go?]

I think I found the culprit: the process in question was actually
dumping core, and it is a large process (between 50MB and 100MB), so
that would explain the 10+ seconds all the threads were in the STOP
state.  It was difficult to notice while running top(1) since a watchdog
process immediately restarts the multi-threaded process if it exits due
to things like segfaults, and I was paying attention to the state
column, not the PID column.
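A minimal sketch of how a watchdog like the one described above could make each crash-and-restart visible, assuming a POSIX system with /bin/sh: it runs a command once, waits for it, and logs the PID, the exit status or terminating signal, and whether a core was dumped. The names struct exit_info, classify_exit() and supervise_once() are hypothetical. WCOREDUMP() is not part of POSIX, so the code only uses it where <sys/wait.h> defines it (FreeBSD and glibc do).

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical summary of how a supervised child ended. */
struct exit_info {
    bool signaled;     /* killed by a signal, e.g. SIGSEGV */
    int  code;         /* exit status, or signal number if signaled */
    bool core_dumped;  /* kernel reported a core dump (non-POSIX) */
};

static struct exit_info classify_exit(int status)
{
    struct exit_info info = { false, -1, false };
    if (WIFEXITED(status)) {
        info.code = WEXITSTATUS(status);
    } else if (WIFSIGNALED(status)) {
        info.signaled = true;
        info.code = WTERMSIG(status);
#ifdef WCOREDUMP
        info.core_dumped = WCOREDUMP(status) != 0;
#endif
    }
    return info;
}

/* Run cmd once under /bin/sh, wait for it, and log how it ended.
 * A real watchdog would call this in a loop to restart the program.
 * Returns 0 on success, -1 if fork or waitpid failed. */
static int supervise_once(const char *cmd, struct exit_info *out)
{
    assert(out != NULL);
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);   /* exec failed */
    }

    int status;
    pid_t w;
    do {
        w = waitpid(pid, &status, 0);
    } while (w == -1 && errno == EINTR);
    if (w == -1)
        return -1;

    *out = classify_exit(status);
    /* Logging the PID makes every restart obvious in the log. */
    fprintf(stderr, "watchdog: pid %ld %s %d%s\n", (long)pid,
            out->signaled ? "killed by signal" : "exited with status",
            out->code, out->core_dumped ? " (core dumped)" : "");
    return 0;
}
```

With the PID and the "core dumped" flag logged on every restart, a crash that takes many seconds to write a large core shows up in the watchdog's output rather than only as threads sitting in the STOP state in top(1).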

Sorry for what was a bit of a wild-goose chase,
Guy

-- 
Guy Helmer, Ph.D.
Principal System Architect
Palisade Systems, Inc.