Date: Wed, 17 Aug 2005 11:17:52 -0500
From: Guy Helmer <ghelmer@palisadesys.com>
To: Julian Elischer <julian@elischer.org>
Cc: freebsd-threads@freebsd.org
Subject: Re: system scope threads entering STOP state
Message-ID: <43036330.9000501@palisadesys.com>
In-Reply-To: <42D8199E.1060702@elischer.org>
References: <42D691F2.3030201@palisadesys.com> <42D6BA3E.1000306@elischer.org> <42D7BBB8.9050207@palisadesys.com> <42D8199E.1060702@elischer.org>

Julian Elischer wrote:

> Guy Helmer wrote:
>
>> Julian Elischer wrote:
>>
>>> Guy Helmer wrote:
>>>
>>>> I have a long-running multithreaded process on FreeBSD 5.4 (SMP,
>>>> PREEMPTION, SCHED_4BSD) linked with libpthread and I'm creating the
>>>> threads with attribute PTHREAD_SCOPE_SYSTEM. The threads need to
>>>> be processing input in near-real-time or its input buffers overflow.
>>>>
>>>> I've modified the program so that a thread can fork/execl/waitpid
>>>> (without WNOHANG) to use an external program for further processing
>>>> on a batch of input (sometimes via a pipe, other times via writing
>>>> to a file). However, even under a light input load, the program is
>>>> now dropping input. While running top(1) in thread mode, I
>>>> occasionally find all the program's threads are in the STOP state
>>>> for several consecutive seconds. Is there anything related to the
>>>> frequent use of fork, execve, or wait4 that would be likely to
>>>> cause such a situation? I'm not seeing anything obvious in my
>>>> reading of the kernel sources.
>>>
>>> During a fork the parent process is in a variant of the "STOPPED"
>>> state, or, rather, if you look at top -H you should see that all
>>> the threads except for that doing the fork are in the STOPPED state.
>>>
>>> This is because while a thread is forking the process needs to be
>>> single threaded so that there is a consistent image to be copied
>>> to the child.
>>>
>>> The single threaded state is also entered for exit() and execve(),
>>> though that should not affect your program.
>>>
>>> I can't imagine why the state would persist for any length of time,
>>> unless there is another thread that is in an uninterruptible wait.
>>> In that case the other threads have to wait for it to complete
>>> what it is doing and come back. I have considered whether such a
>>> thread should not be considered "already suspended" and in fact
>>> some earlier versions of the code did that, however it leads to
>>> some inconsistencies and the danger that such a thread will be
>>> suspended holding some resource that it should not hold for any
>>> length of time.
>>
>> Thanks for the explanation. I was [aware] that the other threads
>> would be stopped during a fork(2) but it looked to me like the STOP
>> would be brief.
>> Would an "uninterruptible wait" include system calls like a write(2)
>> of a large buffer? That would explain it...
>
> It's hard to say. Possibly yes, if it had to allocate buffer space.
> However this is a question for others.
>
> Is it possible to duplicate this on request?

[where did the past month go?]

I think I found the culprit: the process in question was actually
dumping core, and it is a large process (between 50MB and 100MB), so
that would explain the 10+ seconds all the threads were in the STOP
state. It was difficult to notice while running top(1) because a
watchdog process immediately restarts the multi-threaded process if it
exits due to things like segfaults, and I was paying attention to the
state column, not the PID column.

Sorry for what was a bit of a wild-goose chase,
Guy

--
Guy Helmer, Ph.D.
Principal System Architect
Palisade Systems, Inc.
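
For anyone chasing a similar symptom: a restart that is silent in top(1) becomes visible if the watchdog logs why its child went away. The following is only a minimal sketch of that idea, not the watchdog described above. The helper names run_and_wait() and describe_status() are hypothetical, and WCOREDUMP is a non-POSIX extension (available on FreeBSD and several other systems), so it is guarded with #ifdef.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork, exec the program at `path` with no extra
 * arguments, and block in waitpid() (no WNOHANG) until it terminates.
 * Returns the child's pid and stores its wait status in *status,
 * or returns -1 if fork() or waitpid() fails.
 */
pid_t
run_and_wait(const char *path, int *status)
{
	pid_t pid = fork();

	if (pid == -1)
		return (-1);
	if (pid == 0) {
		execl(path, path, (char *)NULL);
		_exit(127);	/* only reached if execl() failed */
	}
	/* Retry if a signal interrupts the wait. */
	while (waitpid(pid, status, 0) == -1) {
		if (errno != EINTR)
			return (-1);
	}
	return (pid);
}

/*
 * Hypothetical helper: write a one-line description of a wait status
 * into buf. Returns 1 if the child was killed by a signal (the case a
 * watchdog would want to log loudly), 0 otherwise.
 */
int
describe_status(pid_t pid, int status, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);

	if (WIFEXITED(status)) {
		snprintf(buf, len, "pid %ld exited with code %d",
		    (long)pid, WEXITSTATUS(status));
		return (0);
	}
	if (WIFSIGNALED(status)) {
#ifdef WCOREDUMP
		int dumped = WCOREDUMP(status) ? 1 : 0;
#else
		int dumped = 0;	/* cannot tell on this platform */
#endif
		snprintf(buf, len, "pid %ld killed by signal %d%s",
		    (long)pid, WTERMSIG(status),
		    dumped ? " (core dumped)" : "");
		return (1);
	}
	snprintf(buf, len, "pid %ld: unrecognized wait status 0x%x",
	    (long)pid, (unsigned int)status);
	return (0);
}
```

A watchdog loop would call run_and_wait() on the monitored program, pass the status to describe_status(), and log the resulting line before restarting, so that a "(core dumped)" entry with a timestamp lines up with the seconds the threads spent in the STOP state.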
