From: Don Lewis
Date: Fri, 18 Dec 2015 09:44:10 -0800 (PST)
Subject: Re: fork_findpid() - Fatal trap 12: page fault while in kernel mode
To: mjguzik@gmail.com
cc: kostikbel@gmail.com, freebsd-current@freebsd.org
Message-Id: <201512181744.tBIHiASv001733@gw.catspoiler.org>
In-Reply-To: <20151218163810.GB830@dft-labs.eu>

On 18 Dec, Mateusz Guzik wrote:
> On Thu, Dec 17, 2015 at 02:33:46PM -0800, Don Lewis wrote:
>> On 17 Dec, Mateusz Guzik wrote:
>> > On Thu, Dec 17, 2015 at 11:48:08AM -0800, Don Lewis wrote:
>> >> On 17 Dec, Konstantin Belousov wrote:
>> >> > On Wed, Dec 16, 2015 at 11:08:02AM -0800, Don Lewis wrote:
>> >> >> I used to have a patch that deferred linking the new process into
>> >> >> proctree/allproc until it was fully formed.
>> >> >> The motivation was to get
>> >> >> rid of all of the PRS_NEW stuff scattered around the source.
>> >> >> Unfortunately the patch bit-rotted and I'm pretty sure that I lost it.
>> >> >
>> >> > I had a similar thought for a second as one of the possibilities to fix the
>> >> > issue, but rejected it outright due to the way the pid allocator works.
>> >> > The loop which faulted is the allocator, it depends on the new pid being
>> >> > linked early to detect the duplicated alloc.
>> >> >
>> >> > What you wrote could be done, but this restructuring requires a separate
>> >> > pid allocator, and probably it must repeat all quirks and subtle behaviour
>> >> > of the current algorithm. But I do not object, PRS_NEW is a trouble
>> >> > on its own.
>> >>
>> >> I don't think it requires any changes to the allocator. It should only
>> >> be necessary to delay the call to fork_findpid() until we are ready to
>> >> link the new proc into allproc. Basically, drop the locks at the
>> >> beginning of do_fork(), then grab them again somewhere near the end
>> >> (probably where we currently mark the process as PRS_NORMAL) and
>> >> move the call to fork_findpid(), the p2->p_pid assignment, and the list
>> >> manipulation code to a location after that.
>> >>
>> >> It's probably not quite that simple though ...
>> >
>> > That would mean you would need to be able to deconstruct the process
>> > because you cannot guarantee there are any pids left, which may or may
>> > not be easily doable.
>>
>> It doesn't look like we handle that properly in the current code. I
>> think fork_findpid() will loop forever. It shouldn't be possible if
>> maxproc < pid_max / 3, or maybe pid_max / 2. It might be a good idea to
>> enforce this.
>>
>
> Not sure I follow, can you rephrase/elaborate?

The first time through, fork_findpid() will start its search with
trypid=lastpid+1 and search upwards from there.
If it reaches PID_MAX (I think that should be pid_max) without finding
a free pid, it does a goto retry, which resets trypid back to the
beginning and restarts the search. If there are no free pids, then
the search will hit goto retry again and repeat this same loop forever.
There is no error return from fork_findpid() to indicate that the fork
should fail if there are no free pids.

>> > The current method is going to bite us performance-wise anyway and an
>> > allocator which does not require a walk over the tree is necessary in
>> > the long run. Seems like a bitmap (or a bunch of bitmaps) is the way to
>> > go here.
>>
>> I think that separate bitmaps for process, process group, and session
>> ids would be needed. It would waste some space, but it's probably more
>> efficient to use a byte array and store all the bits for the pid
>> together.
>>
>
> Well I had such separate bitmaps in mind with addition of a combined
> "the id is in use" bitmap. This would make the common case of finding a
> new pid reasonably fast. Access to all bitmaps would be protected with
> proctree lock, which matches current locking scheme anyway.

That's also a possibility. Maintaining the bitmaps would be more
complicated because any time one of the individual bitmaps is updated,
the combined bitmap would also have to be recalculated. It would be
possible to use bit_ffc() to find the first free pid, but that would
always find the lowest available free pid and would not emulate the
current default behaviour of allocating pids mostly sequentially.
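
To make the byte-array idea concrete, here is a rough userland sketch,
not kernel code and not a proposed patch. It keeps one byte of flags
per id, treats an id as free only when its byte is zero, searches
upwards from lastpid + 1 with a single wrap, and returns -1 instead of
looping when nothing is free. All of the names here (ID_LIMIT, ID_FIRST,
the ID_* flags, id_hold(), id_release(), pid_alloc()) are made up for
illustration, the id space is tiny on purpose, and locking is left out
entirely.

```c
#include <assert.h>

#define ID_LIMIT 64	/* hypothetical tiny id space for the demo */
#define ID_FIRST 2	/* hypothetical lowest pid handed out */

#define ID_PID   0x01	/* id is in use as a process id */
#define ID_PGRP  0x02	/* id is in use as a process group id */
#define ID_SESS  0x04	/* id is in use as a session id */

static unsigned char id_use[ID_LIMIT];	/* one byte of flags per id */
static int lastpid = ID_FIRST - 1;	/* most recently allocated pid */

/* Mark an id as used in the given role(s). */
static void
id_hold(int id, unsigned char role)
{
	assert(id >= 0 && id < ID_LIMIT);
	id_use[id] |= role;
}

/* Drop the given role(s); the id is free again once no role remains. */
static void
id_release(int id, unsigned char role)
{
	assert(id >= 0 && id < ID_LIMIT);
	id_use[id] &= (unsigned char)~role;
}

/*
 * Allocate a pid, searching upwards from lastpid + 1 and wrapping at
 * most once. Returns the new pid, or -1 if every id is in use, so the
 * caller can fail the fork instead of spinning.
 */
static int
pid_alloc(void)
{
	const int span = ID_LIMIT - ID_FIRST;
	int i, pid;

	for (i = 1; i <= span; i++) {
		pid = ID_FIRST + (lastpid - ID_FIRST + i) % span;
		if (id_use[pid] == 0) {
			id_use[pid] = ID_PID;
			lastpid = pid;
			return (pid);
		}
	}
	return (-1);
}
```

With this layout there is no combined bitmap to keep in sync, at the
cost of scanning bytes rather than words. A caller would treat a -1
return from pid_alloc() as a reason to fail the fork, and a pid that is
released and still below lastpid is not reused until the search wraps,
which roughly matches the mostly sequential behaviour described above.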