From owner-freebsd-current@FreeBSD.ORG  Mon Aug  8 19:45:47 2005
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
X-Original-To: freebsd-current@FreeBSD.org
Delivered-To: freebsd-current@FreeBSD.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id BC47716A41F
	for <freebsd-current@FreeBSD.org>; Mon,  8 Aug 2005 19:45:47 +0000 (GMT)
	(envelope-from truckman@FreeBSD.org)
Received: from gw.catspoiler.org (217-ip-163.nccn.net [209.79.217.163])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 632DF43D45
	for <freebsd-current@FreeBSD.org>; Mon,  8 Aug 2005 19:45:47 +0000 (GMT)
	(envelope-from truckman@FreeBSD.org)
Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2])
	by gw.catspoiler.org (8.13.3/8.13.3) with ESMTP id j78JjdaQ095689;
	Mon, 8 Aug 2005 12:45:43 -0700 (PDT)
	(envelope-from truckman@FreeBSD.org)
Message-Id: <200508081945.j78JjdaQ095689@gw.catspoiler.org>
Date: Mon, 8 Aug 2005 12:45:39 -0700 (PDT)
From: Don Lewis <truckman@FreeBSD.org>
To: apelisse@gmail.com
In-Reply-To: <61c746830508080624312e35a4@mail.gmail.com>
MIME-Version: 1.0
Content-Type: TEXT/plain; charset=us-ascii
Cc: freebsd-current@FreeBSD.org
Subject: Re: Fix for some stress panics
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>, 
	<mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 08 Aug 2005 19:45:47 -0000

On  8 Aug, Antoine Pelisse wrote:
> On 8/8/05, Don Lewis <truckman@freebsd.org> wrote:
>> 
>> On 7 Aug, Antoine Pelisse wrote:
>> > http://people.freebsd.org/~pho/stress/log/cons149.html
>> > http://people.freebsd.org/~pho/stress/log/cons130.html
>> >
>> > I've been working on this panic today (the two are obviously
>> > the same) and here is a patch to fix it:
>> > --- sys/kern/kern_proc.c.orig Mon Apr 18 04:10:36 2005
>> > +++ sys/kern/kern_proc.c Sun Aug 7 21:18:03 2005
>> > @@ -884,10 +884,8 @@
>> > _PHOLD(p);
>> > FOREACH_THREAD_IN_PROC(p, td) {
>> > fill_kinfo_thread(td, &kinfo_proc);
>> > - PROC_UNLOCK(p);
>> > error = SYSCTL_OUT(req, (caddr_t)&kinfo_proc,
>> > sizeof(kinfo_proc));
>> > - PROC_LOCK(p);
>> > if (error)
>> > break;
>> > }
>> >
>> > As a matter of fact, if td is removed from the list through 
>> thread_unlink
>> > while
>> > the mutex is released and the next thread is removed just after, the 
>> FOREACH
>> >
>> > is looping through an unlinked list where the td_ksegrp has been set to 
>> NULL
>> >
>> > by thread_exit.
>> > If we absolutely have to release the lock, then it's probably safer to 
>> check
>> > if
>> > td_ksegroup != NULL in the fill_kinfo_thread function.
>> 
>> Calling SYSCTL_OUT(), which calls copyout(), is not allowed while
>> holding a mutex unless the userland buffer is wired. The reason is that
>> if any of the userland buffer is not resident in RAM, the thread can
>> block while the buffer is paged in, and this is not legal while holding
>> a mutex.
>> 
>> There are two ways to fix this:
>> 
>> Allocate a temporary buffer of the appropriate size, grab the
>> mutex, traverse the list and copy the data into the temporary
>> buffer, release the mutex, call SYSCTL_OUT() to copy the
>> contents of the temporary buffer to the user buffer, and free
>> the temporary buffer.
>> 
>> Wire the appropriate amount of the userland buffer using
>> sysctl_wire_old_buffer(), grab the mutex, traverse the list,
>> copying each item to userland with SYSCTL_OUT(), release the
>> mutex, and unwire the userland buffer.
> 
> 
> 
> Is that what you meant ?
> 
> --- sys/kern/kern_proc.c.orig Mon Apr 18 04:10:36 2005
> +++ sys/kern/kern_proc.c Mon Aug 8 15:09:50 2005
> @@ -868,7 +868,7 @@
> {
> struct thread *td;
> struct kinfo_proc kinfo_proc;
> - int error = 0;
> + int error, buffersize = 0;
> struct proc *np;
> pid_t pid = p->p_pid;
> 
> @@ -883,11 +883,23 @@
> } else {
> _PHOLD(p);
> FOREACH_THREAD_IN_PROC(p, td) {
> - fill_kinfo_thread(td, &kinfo_proc);
> - PROC_UNLOCK(p);
> + buffersize += sizeof(struct kinfo_proc);
> + }
> + PROC_UNLOCK(p);
> +
> + error = sysctl_wire_old_buffer(req, buffersize);
> + if (error)
> + {
> + _PRELE(p);
> + return error;
> + }
> +
> + PROC_LOCK(p);
> + FOREACH_THREAD_IN_PROC(p, td) {
> + fill_kinfo_thread(td, &kinfo_proc);
> error = SYSCTL_OUT(req, (caddr_t)&kinfo_proc,
> sizeof(kinfo_proc));
> - PROC_LOCK(p);
> +
> if (error)
> break;
> }

That looks pretty much correct.

> In either case, the appropriate buffer size must be calculated (possibly
>> requiring the mutex to be grabbed and released) before executing an
>> operation that could block, grabbing the mutex, and traversing the list.
>> It is quite possible for the pre-calculated size to not match the actual
>> size needed when the list is traversed to actually copy the data.
>> 
>> 
> How can this issue be addressed ? Maybe some extra memory has to be wired ?

That's probably the easiest thing to do.  You could pad buffersize by
some multiple of sizeof(struct kinfo_proc) in case the process grew some
threads while the buffer was being wired.  It is still possible that
more threads than you expect to be created during this time and the for
the second loop to run out of wired space.  I think SYSCTL_OUT() will
fail with an ENOMEM error in this case, and the user may want to retry
the request.