From owner-freebsd-current@freebsd.org Fri Dec 18 18:45:58 2015 Return-Path: Delivered-To: freebsd-current@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 37983A4C095 for ; Fri, 18 Dec 2015 18:45:58 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: from mail-wm0-x22e.google.com (mail-wm0-x22e.google.com [IPv6:2a00:1450:400c:c09::22e]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id C3F66159A; Fri, 18 Dec 2015 18:45:57 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: by mail-wm0-x22e.google.com with SMTP id p187so75204767wmp.1; Fri, 18 Dec 2015 10:45:57 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:cc:subject:message-id:mail-followup-to:references :mime-version:content-type:content-disposition:in-reply-to :user-agent; bh=ymC/dE+jqfQqYEpD1+rwlVrbUhqwnNft1CiTsbzkzzA=; b=M2LpSFfQi/kPP2ETq/J6rU5rwweHypM+FGfyrwp7avFpk3CLYd3RpK9qs0D9c85027 gDoLLF8u8ygE5FBjA1P/ixNoU4r0jt5Q1HzO2c3BwdZuhqQOTGmI6d90PTKn455XC3P2 4FcJAOxftdT/76dsi13ZFWpjYVenviAbjkczM3u0G87tcVLs+4IpujTBuMUoUaGlrFlq pTjUWXJgNMXZuDjPc6YT+SDHmht57lov4XYBoI1IGclNgTH/n9FjRo47jGiqcZIvbHyx XO+5nj/lHkw+cegeNcVJqPxUwclpGjjrQmsurHhOeUgzZJZiNmuF7LhW5mes4YznC7f8 GQng== X-Received: by 10.194.104.5 with SMTP id ga5mr5709666wjb.155.1450464355501; Fri, 18 Dec 2015 10:45:55 -0800 (PST) Received: from dft-labs.eu (n1x0n-1-pt.tunnel.tserv5.lon1.ipv6.he.net. [2001:470:1f08:1f7::2]) by smtp.gmail.com with ESMTPSA id l20sm7787661wmd.20.2015.12.18.10.45.54 (version=TLS1_2 cipher=AES128-SHA bits=128/128); Fri, 18 Dec 2015 10:45:54 -0800 (PST) Date: Fri, 18 Dec 2015 19:45:52 +0100 From: Mateusz Guzik To: Don Lewis Cc: kostikbel@gmail.com, freebsd-current@freebsd.org Subject: Re: fork_findpid() - Fatal trap 12: page fault while in kernel mode Message-ID: <20151218184552.GC830@dft-labs.eu> Mail-Followup-To: Mateusz Guzik , Don Lewis , kostikbel@gmail.com, freebsd-current@freebsd.org References: <20151218163810.GB830@dft-labs.eu> <201512181744.tBIHiASv001733@gw.catspoiler.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <201512181744.tBIHiASv001733@gw.catspoiler.org> User-Agent: Mutt/1.5.21 (2010-09-15) X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Dec 2015 18:45:58 -0000 On Fri, Dec 18, 2015 at 09:44:10AM -0800, Don Lewis wrote: > On 18 Dec, Mateusz Guzik wrote: > > On Thu, Dec 17, 2015 at 02:33:46PM -0800, Don Lewis wrote: > >> On 17 Dec, Mateusz Guzik wrote: > >> > On Thu, Dec 17, 2015 at 11:48:08AM -0800, Don Lewis wrote: > >> >> On 17 Dec, Konstantin Belousov wrote: > >> >> > On Wed, Dec 16, 2015 at 11:08:02AM -0800, Don Lewis wrote: > >> >> >> I used to have a patch the deferred linking the new process into > >> >> >> proctree/allproc until it was fully formed. The motivation was to get > >> >> >> rid of all of the PRS_NEW stuff scattered around the source. > >> >> >> Unfortunately the patch bit-rotted and I'm pretty sure that I lost it. > >> >> > > >> >> > I had similar tought for a second as one of the possibilities to fix the > >> >> > issue, but rejected it outright due to the way the pid allocator works. > >> >> > The loop which faulted is the allocator, it depends on the new pid being > >> >> > linked early to detect the duplicated alloc. > >> >> > > >> >> > What you wrote could be done, but this restructuring requires the separate > >> >> > pid allocator, and probably it must repeat all quirks and subtle behaviour > >> >> > of the current algorithm. But I do not object, PRS_NEW is a trouble > >> >> > on its own. > >> >> > >> >> I don't think it requires any changes to the allocater. It should only > >> >> be necessary to delay the call to fork_findpid() until we are ready to > >> >> link the new proc into allproc. Basically, drop the locks at the > >> >> beginning of do_fork(), then grab them again somewhere near the end > >> >> (probably where we are currently mark the process as PRS_NORMAL) and > >> >> move the call to fork_findpid(), the p2->p_pid assignment, and the list > >> >> manipulation code to a location after that. > >> >> > >> >> It's probably not quite that simple though ... > >> > > >> > That would mean you would need to be able to deconstruct the process > >> > because you cannot guarantee there are any pids left, which may or may > >> > not be easily doable. > >> > >> It doesn't look like we handle that properly in the current code. I > >> think fork_findpid() will loop forever. It shouldn't be possible if > >> maxproc < pid_max / 3, or maybe pid_max / 2. It might be a good idea to > >> enforce this. > >> > > > > Not sure I follow, can you rephrase/elaborate? > > The first time through, fork_findpid() will start it's search with > trypid=lastpid+1 and searches upwards from there. If it reaches PID_MAX > (I think that should be pid_max) without finding a free pid, it does a > goto retry, which resets trypid back to the beginning and restarts the > search. IF there are no free pids, then trypid will goto retry and > repeat this same loop forever. > > There is no error return from fork_findpid() to indicate that the fork > should fail if there are no free pids. > Oh, you mean nprocs check used prior to calling do_fork is insufficient. I can't test it right now, indeed sees like a real problem. Bitmap switch fixes this automatically without further hackery. > >> > The current method is going to bite us performance-wise anyway and an > >> > allocater which does not require a walk over the tree is necessary in > >> > the long run. Seems like a bitmap (or a bunch of bitmaps) is the way to > >> > go here. > >> > >> I think that separate bitmaps for process, process group, and session > >> ids would be needed. It would waste some space, but it's probably more > >> efficent to use a byte array and store all the bits for the pid > >> together. > >> > > > > Well I had such separate bitmaps in mind with addition of a combined > > "the id is in use bitmap". This would make the common case of finding a > > new pid reasonably fast. Access to all bitmaps would be protected with > > proctree lock, which matches current locking scheme anyway. > > That's also a possibility. Maintaining the bitmaps would be more > complicated because any time one of the individual bitmaps is updated, > the combined bitmap would also have to be recalculated. It would be > possible to use bit_ffc() to find the first free pid, but that would > always find the lowest available free pid and would not emulate the > current default behaviour of allocating pids mostly sequentually. > What do you mean recalculated? Maybe I got this wrong, but it seems totally trivial. When allocating either a new process group or a new session, set the bit in the combined map. When allocating a new pid, check the combined map. Free entry implies an unused pid, so use that. Set the bit in the combined map and pid map. Finally, when freeing either of these identifiers, unset the bit in the combined map only if no other map has the bit set. -- Mateusz Guzik