From owner-freebsd-arch@FreeBSD.ORG Mon Mar 14 07:06:15 2005 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 67F4C16A4CF; Mon, 14 Mar 2005 07:06:15 +0000 (GMT) Received: from mail.ciam.ru (mail.ciam.ru [213.147.57.66]) by mx1.FreeBSD.org (Postfix) with ESMTP id 272E943D1F; Mon, 14 Mar 2005 07:06:14 +0000 (GMT) (envelope-from sem@FreeBSD.org) Received: from msd-mtu.mbrd.ru ([195.34.35.77] helo=[172.16.4.9]) by mail.ciam.ru with esmtpa (Exim 4.x) id 1DAjeW-0001sv-Nh; Mon, 14 Mar 2005 10:06:12 +0300 Message-ID: <423537E3.3010409@FreeBSD.org> Date: Mon, 14 Mar 2005 10:06:11 +0300 From: Sergey Matveychuk User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041217 X-Accept-Language: ru, en-us, en MIME-Version: 1.0 To: Tim Kientzle References: <42335A52.9060208@freebsd.org> In-Reply-To: <42335A52.9060208@freebsd.org> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit cc: freebsd-arch@freebsd.org Subject: Re: Removing gtar from base X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 14 Mar 2005 07:06:15 -0000 Tim Kientzle wrote: > I'd like to remove gtar from -CURRENT as the next > step in that transition. Can you bump __FreeBSD_version when gtar is gone, please? I found it quite tedious to find the __FreeBSD_version closest to the date when bsdtar appeared. -- Sem. 
From owner-freebsd-arch@FreeBSD.ORG Mon Mar 14 08:00:24 2005 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 6C49816A4CF for ; Mon, 14 Mar 2005 08:00:24 +0000 (GMT) Received: from mail.chesapeake.net (chesapeake.net [208.142.252.6]) by mx1.FreeBSD.org (Postfix) with ESMTP id BF90143D3F for ; Mon, 14 Mar 2005 08:00:23 +0000 (GMT) (envelope-from jroberson@chesapeake.net) Received: from mail.chesapeake.net (localhost [127.0.0.1]) by mail.chesapeake.net (8.12.10/8.12.10) with ESMTP id j2E80Md4023990 for ; Mon, 14 Mar 2005 03:00:22 -0500 (EST) (envelope-from jroberson@chesapeake.net) Received: from localhost (jroberson@localhost)j2E80MR6023985 for ; Mon, 14 Mar 2005 03:00:22 -0500 (EST) (envelope-from jroberson@chesapeake.net) X-Authentication-Warning: mail.chesapeake.net: jroberson owned process doing -bs Date: Mon, 14 Mar 2005 03:00:22 -0500 (EST) From: Jeff Roberson To: arch@freebsd.org Message-ID: <20050314024439.G20708@mail.chesapeake.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Subject: filesystem suspension. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 14 Mar 2005 08:00:24 -0000 The current filesystem suspension mechanism suffers from a few aesthetic and functional problems. I've been talking with Kirk about ways we could replace it, and I'd like to propose a few of those ideas here to see if anyone has useful criticism. First, I'll briefly outline the problems. There is the obvious problem of the rather cumbersome and error-prone addition of vn_start_write calls wherever you may write to the filesystem. I keep finding places where they were not added when new code came in, or were originally lacking. 
It's just yet another call you have to remember to make when dealing with vfs.

Furthermore, there is a real problem with vput(), which may cause VOP_INACTIVE to be called, which may truncate. To solve this, we call vn_start_write from within VOP_INACTIVE after we already have a lock held. This is actually a lock order reversal, as the file system suspension acts as a real lock. rwatson has reported seeing this deadlock on a real system. To avoid the reversal, we could do the INACTIVE from another thread which can call vn_start_write before relocking the vnode, but this would serialize all file deletions! I considered other mechanisms for this as well, but they all have similar problems.

I have two basic proposals. One is to handle all suspension from within ffs's VOP_LOCK routine; the other is to handle all suspension from within every vop that may write.

The ffs_lock method would move the suspension barrier into the ffs_lock routine. A thread would not be suspended if it already held a lockmgr lock, and in this way it would be allowed to continue without leaving any data structures in an inconsistent state. The suspension would proceed once there were no outstanding ufs locks and all new callers would block in ffs_lock. This requires the least effort as virtually all of the code would be in ffs_lock and unlock. It would however prevent threads from issuing read-only calls for the duration of the suspension.

My second proposal involves gating threads within the actual writing VOPs. This would be similar to the vn_start_write mechanism, but it would be contained entirely within ffs/ufs. The big difference would be that some threads would be suspended while holding locks, so the snapshot would have to run lockless, which could be done safely, or use a special locking protocol, like allowing it to recursively acquire locks that are already held. This would allow most read-only VOPs to continue, unless they attempted to lock a vnode which was suspended in a writing vop.
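The ffs_lock proposal above can be pictured with a small single-threaded model. This is only a sketch under stated assumptions: struct susp_gate and the gate_* functions are hypothetical names that do not come from the FreeBSD tree, and real code would take a mutex and sleep where this model simply returns false.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-filesystem suspension state (not a FreeBSD structure). */
struct susp_gate {
	int	holders;	/* locks currently held through the gate */
	bool	suspending;	/* a suspension has been requested */
};

/*
 * Called where the lock routine would be entered.  A thread that already
 * holds a lock is let through so it can finish its operation; a thread
 * holding nothing is stopped once a suspension has been requested.
 * Returns false where real code would sleep.
 */
bool
gate_try_enter(struct susp_gate *g, int locks_held_by_thread)
{
	if (g->suspending && locks_held_by_thread == 0)
		return (false);
	g->holders++;
	return (true);
}

/* Called where the unlock routine would run. */
void
gate_exit(struct susp_gate *g)
{
	assert(g->holders > 0);
	g->holders--;
}

/* Ask for a suspension; it takes effect once the holders drain. */
void
gate_request_suspend(struct susp_gate *g)
{
	g->suspending = true;
}

/* The filesystem is quiescent only when no locks remain outstanding. */
bool
gate_is_suspended(const struct susp_gate *g)
{
	return (g->suspending && g->holders == 0);
}

/* End the suspension; real code would wake the sleeping threads here. */
void
gate_resume(struct susp_gate *g)
{
	g->suspending = false;
}
```

Note that the model stops readers and writers alike, which matches the drawback stated above: read-only calls are held up for the duration of the suspension.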
Comments? Other proposals? I'd like to get this sorted out for 6.0. I may come up with some interim solution for RELENG_5 because the vrele problem has caused deadlocks there. Thanks, Jeff From owner-freebsd-arch@FreeBSD.ORG Mon Mar 14 08:15:11 2005 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 91FE716A4CE for ; Mon, 14 Mar 2005 08:15:11 +0000 (GMT) Received: from critter.freebsd.dk (f170.freebsd.dk [212.242.86.170]) by mx1.FreeBSD.org (Postfix) with ESMTP id BD6D743D2F for ; Mon, 14 Mar 2005 08:15:10 +0000 (GMT) (envelope-from phk@critter.freebsd.dk) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.13.1/8.13.1) with ESMTP id j2E8F4QS023182; Mon, 14 Mar 2005 09:15:05 +0100 (CET) (envelope-from phk@critter.freebsd.dk) To: Jeff Roberson From: "Poul-Henning Kamp" In-Reply-To: Your message of "Mon, 14 Mar 2005 03:00:22 EST." <20050314024439.G20708@mail.chesapeake.net> Date: Mon, 14 Mar 2005 09:15:04 +0100 Message-ID: <23181.1110788104@critter.freebsd.dk> Sender: phk@critter.freebsd.dk cc: arch@freebsd.org Subject: Re: filesystem suspension. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 14 Mar 2005 08:15:11 -0000 In message <20050314024439.G20708@mail.chesapeake.net>, Jeff Roberson writes: >Comments? Other proposals? I'd like to get this sorted out for 6.0. I >may come up with some interim solution for RELENG_5 because the vrele >problem has caused deadlocks there. I agree that the vn_start_write() thing is not a Good Thing and it would certainly be desirable to get the entire snapshot thing into UFS/FFS where it belongs. 
Without digging through the entire issue, I would tend to lean towards the "catch them in the vop_*() which write" model on general principles. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From owner-freebsd-arch@FreeBSD.ORG Tue Mar 15 02:38:52 2005 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id CF05516A4CE for ; Tue, 15 Mar 2005 02:38:52 +0000 (GMT) Received: from mail.chesapeake.net (chesapeake.net [208.142.252.6]) by mx1.FreeBSD.org (Postfix) with ESMTP id 4837E43D46 for ; Tue, 15 Mar 2005 02:38:52 +0000 (GMT) (envelope-from jroberson@chesapeake.net) Received: from mail.chesapeake.net (localhost [127.0.0.1]) by mail.chesapeake.net (8.12.10/8.12.10) with ESMTP id j2F2cpd4078131 for ; Mon, 14 Mar 2005 21:38:51 -0500 (EST) (envelope-from jroberson@chesapeake.net) Received: from localhost (jroberson@localhost)j2F2coLw078104 for ; Mon, 14 Mar 2005 21:38:51 -0500 (EST) (envelope-from jroberson@chesapeake.net) X-Authentication-Warning: mail.chesapeake.net: jroberson owned process doing -bs Date: Mon, 14 Mar 2005 21:38:49 -0500 (EST) From: Jeff Roberson To: arch@freebsd.org Message-ID: <20050314213038.V20708@mail.chesapeake.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Subject: Freeing vnodes. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 15 Mar 2005 02:38:52 -0000 I have a patch at http://www.chesapeake.net/~jroberson/freevnodes.diff that allows us to start reclaiming vnodes from the free list and release their memory. 
It also changes the semantics of wantfreevnodes, and makes getnewvnode() much prettier. The changes attempt to keep some number of vnodes, currently 2.5% of desiredvnodes, that are free in memory. Free vnodes are vnodes which have no references or pages in memory. For example, if an application simply stat's a vnode, it will end up on the free list at the end of the operation. The algorithm that is currently in place will immediately recycle these vnodes once there is enough pressure, which will cause us to do a full lookup and reread the inode, etc. as soon as it is stat'd again. This also removes the recycling from the getnewvnode() path. Instead, it is done by a new helper function that is called from vnlru_proc(). This function just frees vnodes from the head of the list until we reach our wantfreevnodes target. I haven't perf tested this yet, but I have a box that is doing a buildworld with a fairly constant freevnodes count which shows that vnodes are actually being uma_zfree'd. Comments? Anyone willing to do some perf tests for me? 
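The trimming behaviour described above can be sketched in userland C. This is not the code from freevnodes.diff: the demo_* names and the list layout are assumptions made for illustration, and free() stands in for uma_zfree().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a vnode sitting on the free list. */
struct demo_vnode {
	struct demo_vnode *next;
};

struct demo_freelist {
	struct demo_vnode *head;	/* oldest entry, freed first */
	struct demo_vnode *tail;	/* most recently released entry */
	long freevnodes;		/* number of entries on the list */
};

/* Target described in the mail: 2.5% of desiredvnodes. */
long
demo_wantfreevnodes(long desiredvnodes)
{
	return (desiredvnodes / 40);
}

/* A vnode that lost its last reference goes to the tail of the list. */
void
demo_release(struct demo_freelist *fl, struct demo_vnode *vp)
{
	vp->next = NULL;
	if (fl->tail != NULL)
		fl->tail->next = vp;
	else
		fl->head = vp;
	fl->tail = vp;
	fl->freevnodes++;
}

/* Free from the head until only wantfreevnodes entries remain. */
long
demo_trim(struct demo_freelist *fl, long wantfreevnodes)
{
	long freed = 0;

	while (fl->freevnodes > wantfreevnodes && fl->head != NULL) {
		struct demo_vnode *vp = fl->head;

		fl->head = vp->next;
		if (fl->head == NULL)
			fl->tail = NULL;
		free(vp);		/* stands in for uma_zfree() */
		fl->freevnodes--;
		freed++;
	}
	assert(fl->freevnodes >= 0);
	return (freed);
}
```

Because entries are freed from the head and released to the tail, a vnode that was stat'd recently is the last to be freed, which is the effect the change is after.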
Thanks, Jeff From owner-freebsd-arch@FreeBSD.ORG Tue Mar 15 03:15:57 2005 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 84CA916A4CE for ; Tue, 15 Mar 2005 03:15:57 +0000 (GMT) Received: from duchess.speedfactory.net (duchess.speedfactory.net [66.23.201.84]) by mx1.FreeBSD.org (Postfix) with SMTP id CF15843D55 for ; Tue, 15 Mar 2005 03:15:56 +0000 (GMT) (envelope-from ups@tree.com) Received: (qmail 27737 invoked by uid 89); 15 Mar 2005 03:15:55 -0000 Received: from duchess.speedfactory.net (66.23.201.84) by duchess.speedfactory.net with SMTP; 15 Mar 2005 03:15:55 -0000 Received: (qmail 27726 invoked by uid 89); 15 Mar 2005 03:15:55 -0000 Received: from unknown (HELO palm.tree.com) (66.23.216.49) by duchess.speedfactory.net with SMTP; 15 Mar 2005 03:15:55 -0000 Received: from [127.0.0.1] (localhost.tree.com [127.0.0.1]) by palm.tree.com (8.12.10/8.12.10) with ESMTP id j2F3Fsw6087106; Mon, 14 Mar 2005 22:15:54 -0500 (EST) (envelope-from ups@tree.com) From: Stephan Uphoff To: Jeff Roberson In-Reply-To: <20050314213038.V20708@mail.chesapeake.net> References: <20050314213038.V20708@mail.chesapeake.net> Content-Type: text/plain Message-Id: <1110856553.29804.37784.camel@palm> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.4.6 Date: Mon, 14 Mar 2005 22:15:54 -0500 Content-Transfer-Encoding: 7bit cc: arch@freebsd.org Subject: Re: Freeing vnodes. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 15 Mar 2005 03:15:57 -0000 On Mon, 2005-03-14 at 21:38, Jeff Roberson wrote: > I have a patch at http://www.chesapeake.net/~jroberson/freevnodes.diff > that allows us to start reclaiming vnodes from the free list and release > their memory. 
It also changes the semantics of wantfreevnodes, and makes > getnewvnode() much prettier. > > The changes attempt to keep some number of vnodes, currently 2.5% of > desiredvnodes, that are free in memory. Free vnodes are vnodes which > have no references or pages in memory. For example, if an application > simply stat's a vnode, it will end up on the free list at the end of the > operation. The algorithm that is currently in place will immediately > recycle these vnodes once there is enough pressure, which will cause us to > do a full lookup and reread the inode, etc. as soon as it is stat'd again. > > This also removes the recycling from the getnewvnode() path. Instead, it > is done by a new helper function that is called from vnlru_proc(). This > function just frees vnodes from the head of the list until we reach our > wantfreevnodes target. > > I haven't perf tested this yet, but I have a box that is doing a > buildworld with a fairly constant freevnodes count which shows that vnodes > are actually being uma_zfree'd. > > Comments? Anyone willing to do some perf tests for me? > > Thanks, > Jeff Just looked at the raw diff and might have missed it - how are the parent directory "name" cache entries ( vnode fields v_dd, v_ddid) handled? 
Thanks, Stephan From owner-freebsd-arch@FreeBSD.ORG Tue Mar 15 05:39:35 2005 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3208016A4CE for ; Tue, 15 Mar 2005 05:39:35 +0000 (GMT) Received: from mail.chesapeake.net (chesapeake.net [208.142.252.6]) by mx1.FreeBSD.org (Postfix) with ESMTP id AA09B43D46 for ; Tue, 15 Mar 2005 05:39:34 +0000 (GMT) (envelope-from jroberson@chesapeake.net) Received: from mail.chesapeake.net (localhost [127.0.0.1]) by mail.chesapeake.net (8.12.10/8.12.10) with ESMTP id j2F5dXd4027062; Tue, 15 Mar 2005 00:39:33 -0500 (EST) (envelope-from jroberson@chesapeake.net) Received: from localhost (jroberson@localhost)j2F5dX7T027059; Tue, 15 Mar 2005 00:39:33 -0500 (EST) (envelope-from jroberson@chesapeake.net) X-Authentication-Warning: mail.chesapeake.net: jroberson owned process doing -bs Date: Tue, 15 Mar 2005 00:39:33 -0500 (EST) From: Jeff Roberson To: Stephan Uphoff In-Reply-To: <1110856553.29804.37784.camel@palm> Message-ID: <20050315003915.C20708@mail.chesapeake.net> References: <20050314213038.V20708@mail.chesapeake.net> <1110856553.29804.37784.camel@palm> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: arch@freebsd.org Subject: Re: Freeing vnodes. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 15 Mar 2005 05:39:35 -0000 On Mon, 14 Mar 2005, Stephan Uphoff wrote: > On Mon, 2005-03-14 at 21:38, Jeff Roberson wrote: > > I have a patch at http://www.chesapeake.net/~jroberson/freevnodes.diff > > that allows us to start reclaiming vnodes from the free list and release > > their memory. It also changes the semantics of wantfreevnodes, and makes > > getnewvnode() much prettier. 
> > > > The changes attempt to keep some number of vnodes, currently 2.5% of > > desiredvnodes, that are free in memory. Free vnodes are vnodes which > > have no references or pages in memory. For example, if an application > > simply stat's a vnode, it will end up on the free list at the end of the > > operation. The algorithm that is currently in place will immediately > > recycle these vnodes once there is enough pressure, which will cause us to > > do a full lookup and reread the inode, etc. as soon as it is stat'd again. > > > > This also removes the recycling from the getnewvnode() path. Instead, it > > is done by a new helper function that is called from vnlru_proc(). This > > function just frees vnodes from the head of the list until we reach our > > wantfreevnodes target. > > > > I haven't perf tested this yet, but I have a box that is doing a > > buildworld with a fairly constant freevnodes count which shows that vnodes > > are actually being uma_zfree'd. > > > > Comments? Anyone willing to do some perf tests for me? > > > > Thanks, > > Jeff > > Just looked at the raw diff and might have missed it - how are the > parent directory "name" cache entries ( vnode fields v_dd, v_ddid) > handled? Just as they were before, by calling cache_purge. 
> > Thanks, > Stephan > From owner-freebsd-arch@FreeBSD.ORG Tue Mar 15 07:29:42 2005 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id CBF2016A4CE for ; Tue, 15 Mar 2005 07:29:42 +0000 (GMT) Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id 7CBA043D54 for ; Tue, 15 Mar 2005 07:29:42 +0000 (GMT) (envelope-from dillon@apollo.backplane.com) Received: from apollo.backplane.com (localhost [127.0.0.1]) j2F7Td0e052413; Mon, 14 Mar 2005 23:29:40 -0800 (PST) (envelope-from dillon@apollo.backplane.com) Received: (from dillon@localhost) by apollo.backplane.com (8.12.9p2/8.12.9/Submit) id j2F7Tdxl052412; Mon, 14 Mar 2005 23:29:39 -0800 (PST) (envelope-from dillon) Date: Mon, 14 Mar 2005 23:29:39 -0800 (PST) From: Matthew Dillon Message-Id: <200503150729.j2F7Tdxl052412@apollo.backplane.com> To: Jeff Roberson References: <20050314213038.V20708@mail.chesapeake.net> cc: arch@freebsd.org Subject: Re: Freeing vnodes. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 15 Mar 2005 07:29:43 -0000 :I have a patch at http://www.chesapeake.net/~jroberson/freevnodes.diff :that allows us to start reclaiming vnodes from the free list and release :their memory. It also changes the semantics of wantfreevnodes, and makes :getnewvnode() much prettier. Hmm. I'm not sure your logic is correct in this bit: + /* + * We try to get half way to wantfreevnodes each time we run. This + * slows down the process slightly, giving vnodes a greater chance + * of being lru'd to the back of the list. + */ + count = (freevnodes - wantfreevnodes) / 2; + for (; count > 0; count--) { ... 
This seems to be indicating that you are going to try to destroy *MOST* of the vnodes on the free list. freevnodes is typically a fairly high number while wantfreevnodes is typically a very low number, so this calculation is typically going to be, well, a big number.

I think you did not intend this. Didn't you just want to destroy enough vnodes to have 'wantfreevnodes' worth of slop so getnewvnode() could allocate new vnodes? In that case the calculation would be:

    count = numvnodes - desiredvnodes + wantfreevnodes;

    while (count > 0) {
        ...
        physically free a vnode and reduce the numvnodes count
        ...
        --count;
    }

:This also removes the recycling from the getnewvnode() path. Instead, it
:is done by a new helper function that is called from vnlru_proc(). This
:function just frees vnodes from the head of the list until we reach our
:wantfreevnodes target.
:
:I haven't perf tested this yet, but I have a box that is doing a
:buildworld with a fairly constant freevnodes count which shows that vnodes
:are actually being uma_zfree'd.
:
:Comments? Anyone willing to do some perf tests for me?
:
:Thanks,
:Jeff

There is a second issue when you remove the recycling from the getnewvnode() path. Generally speaking, when the system runs out of vnodes you wind up with a LOT of processes sleeping in the getnewvnode() procedure all at once. You need to make very sure that you are actually freeing up a sufficient number of vnodes to allow *all* the processes waiting for a new vnode to proceed all at once. This would argue against only going 'half way' to the goal, and would also argue for having a more dynamic goal that is based on the load on getnewvnode().

Even worse, since processes open and close files all the time, there can be a HUGE load on getnewvnode() (think of cvsupd and find, or a cvs update, etc...). This load can easily outstrip vnlru_proc()'s new ability to free vnodes and potentially cause a lot of unnecessary blockages.
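The two count calculations discussed in this mail can be put side by side as plain arithmetic. The function names below are hypothetical, and the numbers in the note that follows are made up for illustration; they are not measurements from any system.

```c
/* Count from the quoted patch hunk: go half way to the target. */
long
demo_patch_count(long freevnodes, long wantfreevnodes)
{
	return ((freevnodes - wantfreevnodes) / 2);
}

/* Count from the alternative suggested in this mail. */
long
demo_alt_count(long numvnodes, long desiredvnodes, long wantfreevnodes)
{
	return (numvnodes - desiredvnodes + wantfreevnodes);
}
```

With 4500 free vnodes and a small wantfreevnodes such as 25, the patch formula gives (4500 - 25) / 2 = 2237, about half of the free list. If wantfreevnodes is instead 2500, as it would be at 2.5% of a desiredvnodes of 100000, the same formula gives 1000. The alternative, with numvnodes = 99000 and the same other values, gives 99000 - 100000 + 2500 = 1500. In both loops nothing is freed unless the count is positive.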
I love the idea of being able to free vnodes in vnlru_proc() rather than free-and-reuse them in allocvnode(), but I cannot figure out how vnlru_proc() could possibly adapt to the huge load range that getnewvnode() has to deal with. Plus keep in mind that the vnodes being reused at that point are basically already dead except for the vgonel().

This brings up the true crux of the problem, where the true overhead of reusing a vnode inline with the getnewvnode() call is... and that is that vgonel() potentially has to update the related inode and could cause an unrelated process to block inside getnewvnode(). But even this case is a hard sell because the buffer cache for the filesystem almost certainly has cached the inode's disk block and a stat() alone does not actually dirty an inode; you usually have to read() from the file too. I would argue that if anything needs fixing here it would be the 'late' inode sync. [note: inode syncing can occur in INACTIVE as well as RECLAIM, and I'm not sure which routine it occurs in 'most of the time'].
-Matt Matthew Dillon From owner-freebsd-arch@FreeBSD.ORG Tue Mar 15 08:57:51 2005 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2F69A16A4CE for ; Tue, 15 Mar 2005 08:57:51 +0000 (GMT) Received: from mail.chesapeake.net (chesapeake.net [208.142.252.6]) by mx1.FreeBSD.org (Postfix) with ESMTP id 8439643D5C for ; Tue, 15 Mar 2005 08:57:50 +0000 (GMT) (envelope-from jroberson@chesapeake.net) Received: from mail.chesapeake.net (localhost [127.0.0.1]) by mail.chesapeake.net (8.12.10/8.12.10) with ESMTP id j2F8vld4072284; Tue, 15 Mar 2005 03:57:47 -0500 (EST) (envelope-from jroberson@chesapeake.net) Received: from localhost (jroberson@localhost)j2F8vl4X072281; Tue, 15 Mar 2005 03:57:47 -0500 (EST) (envelope-from jroberson@chesapeake.net) X-Authentication-Warning: mail.chesapeake.net: jroberson owned process doing -bs Date: Tue, 15 Mar 2005 03:57:47 -0500 (EST) From: Jeff Roberson To: Matthew Dillon In-Reply-To: <200503150729.j2F7Tdxl052412@apollo.backplane.com> Message-ID: <20050315035032.T20708@mail.chesapeake.net> References: <20050314213038.V20708@mail.chesapeake.net> <200503150729.j2F7Tdxl052412@apollo.backplane.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: arch@freebsd.org Subject: Re: Freeing vnodes. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 15 Mar 2005 08:57:51 -0000 On Mon, 14 Mar 2005, Matthew Dillon wrote: > :I have a patch at http://www.chesapeake.net/~jroberson/freevnodes.diff > :that allows us to start reclaiming vnodes from the free list and release > :their memory. It also changes the semantics of wantfreevnodes, and makes > :getnewvnode() much prettier. > > Hmm. 
I'm not sure your logic is correct in this bit: > > + /* > + * We try to get half way to wantfreevnodes each time we run. This > + * slows down the process slightly, giving vnodes a greater chance > + * of being lru'd to the back of the list. > + */ > + count = (freevnodes - wantfreevnodes) / 2; > + for (; count > 0; count--) { > ... > > This seems to be indicating that you are going to try to destroy > *MOST* of the vnodes on the free list. freevnodes is typically > a fairly high number while wantfreevnodes is typically a very low > number, so this calculation is typically going to be, well, a big > number. > > I think you did not intend this. Didn't you just want to destroy > enough vnodes to have 'wantfreevnodes' worth of slop so getnewvnode() > could allocate new vnodes? In that case the calculation would be: On my system wantfreevnodes is at 2500. Let's say I have 4500 free vnodes. 4500 - 2500 = 2000. Dividing by 2 gives you 1000. I don't think you read the whole patch. > > count = numvnodes - desiredvnodes + wantfreevnodes; > > while (count > 0) { > ... > physically free a vnode and reduce the numvnodes count > ... > --count; > } > > :This also removes the recycling from the getnewvnode() path. Instead, it > :is done by a new helper function that is called from vnlru_proc(). This > :function just frees vnodes from the head of the list until we reach our > :wantfreevnodes target. > : > :I haven't perf tested this yet, but I have a box that is doing a > :buildworld with a fairly constant freevnodes count which shows that vnodes > :are actually being uma_zfree'd. > : > :Comments? Anyone willing to do some perf tests for me? > : > :Thanks, > :Jeff > > There is a second issue when you remove the recycling from the > getnewvnode() path. Generally speaking when the system runs out of > vnodes you wind up with a LOT of processes sleeping in the getnewvnode() > procedure all at once. 
You need to make very sure that you are actually > freeing up a sufficient number of vnodes to allow *all* the processes > waiting for a new vnode to proceed all at once. This would argue > against only going 'half way' to the goal, and would also argue for > having a more dynamic goal that is based on the load on getnewvnode(). > > Even worse, since processes open and close files all the time, there > can be a HUGE load on getnewvnode() (think of cvsupd and find, or > a cvs update, etc...). This load can easily outstrip vnlru_proc()'s > new ability to free vnodes and potentially cause a lot of unnecessary > blockages. We have one buf daemon, one page daemon, one syncer, one vnlru proc, etc. In all these cases it would be nice if they gained new contexts when they had a lot of work to do, but they don't, and it doesn't seem to be a huge problem today. On my system one vnlru proc easily keeps up with the job of freeing free vnodes. Remember these vnodes have no pages associated with them, so at most you're freeing an inode for a deleted file, and in the common case the whole operation runs on memory without blocking for io. We presently single thread the most critical case, where we have no free vnodes and are not allowed to allocate any more while we wait for vnlru_proc() to do io on vnodes with cached pages to reclaim some. I'm not convinced this is a real problem. > > I love the idea of being able to free vnodes in vnlru_proc() rather > than free-and-reuse them in allocvnode(), but I cannot figure out how > vnlru_proc() could possibly adapt to the huge load range that > getnewvnode() has to deal with. Plus keep in mind that the vnodes > being reused at that point are basically already dead except for > the vgonel(). > > This brings up the true crux of the problem, where the true overhead > of reusing a vnode inline with the getnewvnode() call is... 
and that > is that vgonel() potentially has to update the related inode and could > cause an unrelated process to block inside getnewvnode(). But even Yes, this is kind of gross, and would cause lock order problems except that we LK_NOWAIT on the vn lock in vtryrecycle(). It'd be better if we didn't try doing io on unrelated vnodes while this deep in the stack. > this case is a hard sell because the buffer cache for the filesystem > almost certainly has cached the inode's disk block and a stat() alone > does not actually dirty an inode, you usually have to read() from the > file too. I would argue that if anything needs fixing here it would > be the 'late' inode sync. [note: inode syncing can occur in INACTIVE > as well as RECLAIM, and I'm not sure which routine it occurs in > 'most of the time']. > > -Matt > Matthew Dillon > > From owner-freebsd-arch@FreeBSD.ORG Tue Mar 15 12:51:38 2005 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4C5B716A4CE for ; Tue, 15 Mar 2005 12:51:38 +0000 (GMT) Received: from darkness.comp.waw.pl (darkness.comp.waw.pl [195.117.238.136]) by mx1.FreeBSD.org (Postfix) with ESMTP id B5E5843D62 for ; Tue, 15 Mar 2005 12:51:37 +0000 (GMT) (envelope-from pjd@darkness.comp.waw.pl) Received: by darkness.comp.waw.pl (Postfix, from userid 1009) id 1F5A9ACBD6; Tue, 15 Mar 2005 13:51:36 +0100 (CET) Date: Tue, 15 Mar 2005 13:51:36 +0100 From: Pawel Jakub Dawidek To: freebsd-arch@freebsd.org Message-ID: <20050315125136.GH9291@darkness.comp.waw.pl> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="PbNjaMlpH3/3j4oy" Content-Disposition: inline User-Agent: Mutt/1.4.2i X-PGP-Key-URL: http://people.freebsd.org/~pjd/pjd.asc X-OS: FreeBSD 5.2.1-RC2 i386 Subject: System processes recognition. 
X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 15 Mar 2005 12:51:38 -0000 --PbNjaMlpH3/3j4oy Content-Type: text/plain; charset=iso-8859-2 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

Hi.

I found that there is no way to know whether a given process is a system (kernel) process or not:

- the P_SYSTEM flag is also used for userland processes (init),
- the P_KTHREAD flag is not used for the swapper,
- ps(1) assumes it has found a system process when there are no arguments (argv == NULL || argv[0] == NULL), but this is not true:

	char *argv[1] = { NULL };

	execve("/path/to/somewhere", argv, NULL);

  The /path/to/somewhere process will be recognized by ps(1) as a system process.

The easiest way to fix it is to add the P_KTHREAD flag to the swapper, I think:

--- init_main.c	17 Feb 2005 10:00:09 -0000	1.255
+++ init_main.c	15 Mar 2005 12:48:04 -0000
@@ -365,7 +365,7 @@ proc0_init(void *dummy __unused)
 	session0.s_leader = p;
 
 	p->p_sysent = &null_sysvec;
-	p->p_flag = P_SYSTEM;
+	p->p_flag = P_SYSTEM | P_KTHREAD;
 	p->p_sflag = PS_INMEM;
 	p->p_state = PRS_NORMAL;
 	knlist_init(&p->p_klist, &p->p_mtx);

Opinions?

-- 
Pawel Jakub Dawidek http://www.wheel.pl
pjd@FreeBSD.org http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!
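If the observations in this mail are complete, then with the proposed change a test on P_KTHREAD alone would separate kernel processes from userland ones. A minimal sketch of such a test follows. The DEMO_* bit values are made up for illustration (the real definitions live in sys/proc.h) and demo_is_kernel_proc() is a hypothetical helper, not an existing kernel or ps(1) function.

```c
#include <stdbool.h>

/* Illustrative bit values only; see sys/proc.h for the real flags. */
#define DEMO_P_KTHREAD	0x1	/* stands in for P_KTHREAD */
#define DEMO_P_SYSTEM	0x2	/* stands in for P_SYSTEM */

/* Classify a process by its flag word, ignoring its argument vector. */
bool
demo_is_kernel_proc(int p_flag)
{
	return ((p_flag & DEMO_P_KTHREAD) != 0);
}
```

In this model init, which carries only the system flag, is classified as a userland process. The swapper is missed while it carries only the system flag and is recognized once the kthread flag is added, as in the diff.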
--PbNjaMlpH3/3j4oy Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.4 (FreeBSD) iD8DBQFCNtpYForvXbEpPzQRAkxKAJ467sqHyQxGbtCMVNQyO224C7otFACfZO6G ciQAJyoecbw3fmvJD85j76U= =yWsz -----END PGP SIGNATURE----- --PbNjaMlpH3/3j4oy-- From owner-freebsd-arch@FreeBSD.ORG Tue Mar 15 14:28:33 2005 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8653816A4D1 for ; Tue, 15 Mar 2005 14:28:33 +0000 (GMT) Received: from duchess.speedfactory.net (duchess.speedfactory.net [66.23.201.84]) by mx1.FreeBSD.org (Postfix) with SMTP id C5D2743D46 for ; Tue, 15 Mar 2005 14:28:32 +0000 (GMT) (envelope-from ups@tree.com) Received: (qmail 30648 invoked by uid 89); 15 Mar 2005 14:28:30 -0000 Received: from duchess.speedfactory.net (66.23.201.84) by duchess.speedfactory.net with SMTP; 15 Mar 2005 14:28:30 -0000 Received: (qmail 30623 invoked by uid 89); 15 Mar 2005 14:28:30 -0000 Received: from unknown (HELO palm.tree.com) (66.23.216.49) by duchess.speedfactory.net with SMTP; 15 Mar 2005 14:28:30 -0000 Received: from [127.0.0.1] (localhost.tree.com [127.0.0.1]) by palm.tree.com (8.12.10/8.12.10) with ESMTP id j2FESTw6089930; Tue, 15 Mar 2005 09:28:29 -0500 (EST) (envelope-from ups@tree.com) From: Stephan Uphoff To: Jeff Roberson In-Reply-To: <20050315003915.C20708@mail.chesapeake.net> References: <20050314213038.V20708@mail.chesapeake.net> <1110856553.29804.37784.camel@palm> <20050315003915.C20708@mail.chesapeake.net> Content-Type: text/plain Message-Id: <1110896909.29804.39143.camel@palm> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.4.6 Date: Tue, 15 Mar 2005 09:28:29 -0500 Content-Transfer-Encoding: 7bit cc: arch@freebsd.org Subject: Re: Freeing vnodes. 
X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 15 Mar 2005 14:28:33 -0000 On Tue, 2005-03-15 at 00:39, Jeff Roberson wrote: > On Mon, 14 Mar 2005, Stephan Uphoff wrote: > > > On Mon, 2005-03-14 at 21:38, Jeff Roberson wrote: > > > I have a patch at http://www.chesapeake.net/~jroberson/freevnodes.diff > > > that allows us to start reclaiming vnodes from the free list and release > > > their memory. It also changes the semantics of wantfreevnodes, and makes > > > getnewvnode() much prettier. > > > > > > The changes attempt to keep some number of vnodes, currently 2.5% of > > > desiredvnodes, that are free in memory. Free vnodes are vnodes which > > > have no references or pages in memory. For example, if an application > > > simply stat's a vnode, it will end up on the free list at the end of the > > > operation. The algorithm that is currently in place will immediately > > > recycle these vnodes once there is enough pressure, which will cause us to > > > do a full lookup and reread the inode, etc. as soon as it is stat'd again. > > > > > > This also removes the recycling from the getnewvnode() path. Instead, it > > > is done by a new helper function that is called from vnlru_proc(). This > > > function just frees vnodes from the head of the list until we reach our > > > wantfreevnodes target. > > > > > > I haven't perf tested this yet, but I have a box that is doing a > > > buildworld with a fairly constant freevnodes count which shows that vnodes > > > are actually being uma_zfree'd. > > > > > > Comments? Anyone willing to do some perf tests for me? > > > > > > Thanks, > > > Jeff > > > > Just looked at the raw diff and might have missed it - how are the > > parent directory "name" cache entries ( vnode fields v_dd, v_ddid) > > handled? 
> > Just as they were before, by calling cache_purge. This purges the fields of the vnode that will be recycled. I am worried about the v_dd,v_ddid fields of a directory B that has the to be released vnode A as parent. (Obviously in this case there is no namecache entry with the vnode A as the directory (nc_dvp)) Right now A is type stable - but if A is released, access to B->v_dd may cause a page fault. Stephan From owner-freebsd-arch@FreeBSD.ORG Tue Mar 15 19:11:36 2005 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4886216A4CE for ; Tue, 15 Mar 2005 19:11:36 +0000 (GMT) Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id B8B4B43D31 for ; Tue, 15 Mar 2005 19:11:35 +0000 (GMT) (envelope-from dillon@apollo.backplane.com) Received: from apollo.backplane.com (localhost [127.0.0.1]) j2FJBX0e055488; Tue, 15 Mar 2005 11:11:33 -0800 (PST) (envelope-from dillon@apollo.backplane.com) Received: (from dillon@localhost) by apollo.backplane.com (8.12.9p2/8.12.9/Submit) id j2FJBWpd055485; Tue, 15 Mar 2005 11:11:32 -0800 (PST) (envelope-from dillon) Date: Tue, 15 Mar 2005 11:11:32 -0800 (PST) From: Matthew Dillon Message-Id: <200503151911.j2FJBWpd055485@apollo.backplane.com> To: Jeff Roberson References: <20050314213038.V20708@mail.chesapeake.net> <20050315035032.T20708@mail.chesapeake.net> cc: arch@freebsd.org Subject: Re: Freeing vnodes. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 15 Mar 2005 19:11:36 -0000 :> I think you did not intend this. Didn't you just want to destroy :> enough vnodes to have 'wantfreevnodes' worth of slop so getnewvnode() :> could allocate new vnodes? 
In that case the calculation would be: : :On my system wantfreevnodes is at 2500. Let's say I have 4500 free :vnodes. 4500 - 2500 = 2000. Divide by 2 gives you 1000. I don't think :you read the whole patch. I'm not trying to be confrontational here, Jeff. Please remember that I'm the one who has done most of the algorithmic work on these subsystems. I designed the whole 'trigger' mechanism, for example. The wantfreevnodes calculation is: minvnodes / 10. That's a very small number. The 'freevnodes' value is typically a much larger value, especially if a program is running through stat()ing things. It is possible to have tens of thousands of free vnodes. This makes your current count calculation effectively 'freevnodes / 2'. I really don't think you want to destroy half the current freevnodes on each pass, do you? :> can be a HUGE load on getnewvnode() (think of cvsupd and find, or :> a cvs update, etc...). This load can easily outstrip vnlru_proc()'s :> new ability to free vnodes and potentially cause a lot of unnecessary :> blockages. : :We have one buf daemon, one page daemon, one syncer, one vnlru proc, etc. :In all these cases it would be nice if they gained new contexts when they :had a lot of work to do, but they don't, and it doesn't seem to be a huge :problem today. On my system one vnlruproc easily keeps up with the job of That's because they are carefully written (mostly by me) to not be subject to pure cpu loads. buf_daemon: Is primarily only responsible for flushing DIRTY buffers. The buffer allocation code will happily reuse clean buffers in-line. Dirty buffers are subject to the I/O limitations of the system (and they are flushed asynchronously for the most part), which means that one daemon should have no trouble handling the buffer load on an MP system. 
Since a system naturally has many more clean buffers than dirty buffers (even without algorithmic limitations), except in certain particular large-write cases which are handled elsewhere, the buf_daemon usually has very little effect on the buffer cache's ability to allocate a new buffer. page_daemon: Same deal. The page daemon is primarily responsible for flushing out dirty pages and for rebalancing the lists if they get really out of whack. Pages in the VM page cache (PQ_CACHE) can be reused on the fly and there are several *natural* ways for a page to go directly to the VM page cache without having to pass through the page daemon. In fact, MOST of the pages that get onto the PQ_CACHE or PQ_FREE queues are placed there directly by mechanisms unrelated to the page daemon. syncer: I've always wanted to rewrite the syncer to be per-mount or per-physical-device so it could sync out to multiple physical devices simultaneously. vnlru_proc: Prior to your patch, vnlru_proc was only responsible for rebalancing the freevnode list. Typically the ONLY case where a vnode needs to be forcefully put on the freevnode list is if there are a lot of vnodes which have VM objects which still have just one or two VM pages associated with them, because otherwise a vnode either gets put on the freevnode list directly by the vnode release code, or it has enough associated pages for us to not want to recycle it anyway (which is what the trigger code handles). The mechanism that leads to the creation of such vnodes also typically requires a lot of random I/O, which makes vnlru_proc() immune to cpu load. This means that vnlru_proc is only PARTIALLY responsible for maintaining the freevnode list and the part it is responsible for tends to be unrelated to pure cpu loads. There are a ton of ways for a vnode to make it to that list WITHOUT passing through vnlru_proc, which means that prior to your patch getnewvnode() typically only has to wait for vnlru_proc() in the most extreme situations. 
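As a rough illustration of the count arithmetic argued over earlier in this message: assuming the patch computes the number of vnodes to free per pass as (freevnodes - wantfreevnodes) / 2, which is inferred only from the figures quoted in this thread and not checked against the diff itself, the two positions can be compared with a small sketch. The function name trim_count() is hypothetical.

```c
/*
 * Hypothetical helper, not taken from the patch: number of free vnodes
 * released in one pass, assuming half of the excess over the target
 * is freed each time.
 */
static long
trim_count(long freevnodes, long wantfreevnodes)
{
	if (freevnodes <= wantfreevnodes)
		return (0);	/* at or below target: nothing to free */
	return ((freevnodes - wantfreevnodes) / 2);
}
```

With the quoted figures (4500 free, target 2500) this gives 1000. With a small target and tens of thousands of free vnodes, for example 30000 free against a target of 250, it gives 14875, which is close to freevnodes / 2.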
By my read, the changes you are currently contemplating for vnlru_proc change its characteristics such that it is now COMPLETELY responsible for freeing up vnodes for getnewvnode(). This was not the case before. I can only repeat that getnewvnode() has a massive dynamic loading range, one that is not necessarily dependent on or limited by I/O. For example, when you are stat()ing a lot of files over and over again there is a good chance that the related inodes are cached in the VM object representing the backing store for the filesystem. This means that getnewvnode() can cycle very quickly, on the order of tens of thousands of vnodes per second in certain situations. By my read, you are forcing *ALL* the vnode recycling activity to run through vnlru_proc() now. The only way now for getnewvnode() to get a new vnode is by allocating it out of the zone. This was not the case before. :freeing free vnodes. Remember these vnodes have no pages associated with :them, so at most you're freeing an inode for a deleted file, and in the :common case the whole operation runs on memory without blocking for io. :... :We presently single thread the most critical case, where we have no free :vnodes and are not allowed to allocate any more while we wait for :vnlru_proc() to do io on vnodes with cached pages to reclaim some. I'm :not convinced this is a real problem. Which means that systems with a large amount of memory (large VM page cache) doing certain operations (such as stat()ing a large number of files e.g. a find or cvsupd), where the file set is larger than the number of vnodes available, will now have to cycle all of those vnodes through a single thread in order to reuse them. The current pre-patch case is very different. 
With your patch, in addition to the issues already mentioned, the inode synchronization is now being single-threaded and while the writes are asynchronous, the reads are not (if the inode happens to not be in the VM page cache any more because it's been cached so long the system has decided to throw away the page to accommodate other cached data). In the current pre-patch case, that read load was distributed over ALL processes trying to do a getnewvnode(). I.e., it was a parallel read load that actually scaled fairly well to load. :> I love the idea of being able to free vnodes in vnlru_proc() rather :> than free-and-reuse them in allocvnode(), but I cannot figure out how :> vnlru_proc() could possibly adapt to the huge load range that :> getnewvnode() has to deal with. Plus keep in mind that the vnodes :> being reused at that point are basically already dead except for :> the vgonel(). :> :> This brings up the true crux of the problem, where the true overhead :> of reusing a vnode inline with the getnewvnode() call is... and that :> is that vgonel() potentially has to update the related inode and could :> cause an unrelated process to block inside getnewvnode(). But even : :Yes, this is kind of gross, and would cause lock order problems except :that we LK_NOWAIT on the vn lock in vtryrecycle(). It'd be better if we :didn't try doing io on unrelated vnodes while this deep in the stack. I agree. It is gross, though I will note that the fact that the vnode is ON the free list tends to mean that it isn't being referenced by anyone so there should not be any significant lock ordering issues. I haven't 'fixed' this in DragonFly because I haven't been able to figure out how to distribute the recycling load and deal with the huge dynamic loading range that getnewvnode() has. I've been working on the buffer cache code since, what, 1998? These are real issues. 
It's always very easy to design algorithms that work for specific machine configurations, the trick is to make them work across the board. One thing I LIKE about your code is the concept of being able to reuse a vnode (or in your case allocate a new vnode) without having to perform any I/O. The re-use case in the old code always has the potential to block an unrelated process if it has to do I/O recycling the vnode it wants to reuse. But this is a very easy effect to accomplish simply by leaving the recycling code in getnewvnode() intact but STILL adding new code to vnlru_proc() to ensure that a minimum number of vnodes are truely reusable without having to perform any I/O. This would enhance light-load (light getnewvnode() load that is) performance. It would have virtually no effect under heavier loads, which is why the vnode re-use code in getnewvnode() would have to stay, but the light-load benefit is undeniable. -Matt Matthew Dillon From owner-freebsd-arch@FreeBSD.ORG Tue Mar 15 22:07:08 2005 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 971CB16A4CE; Tue, 15 Mar 2005 22:07:08 +0000 (GMT) Received: from smtp3.server.rpi.edu (smtp3.server.rpi.edu [128.113.2.3]) by mx1.FreeBSD.org (Postfix) with ESMTP id A09F743D2F; Tue, 15 Mar 2005 22:07:04 +0000 (GMT) (envelope-from drosih@rpi.edu) Received: from [128.113.24.47] (gilead.netel.rpi.edu [128.113.24.47]) by smtp3.server.rpi.edu (8.13.0/8.13.0) with ESMTP id j2FM71Eo025283; Tue, 15 Mar 2005 17:07:01 -0500 Mime-Version: 1.0 Message-Id: In-Reply-To: <20050315125136.GH9291@darkness.comp.waw.pl> References: <20050315125136.GH9291@darkness.comp.waw.pl> Date: Tue, 15 Mar 2005 17:07:00 -0500 To: Pawel Jakub Dawidek , freebsd-arch@freebsd.org From: Garance A Drosihn Content-Type: text/plain; charset="us-ascii" ; format="flowed" X-CanItPRO-Stream: default X-RPI-SA-Score: undef - spam-scanning disabled 
X-Scanned-By: CanIt (www . canit . ca) on 128.113.2.3 Subject: Re: System processes recognition (Adding P_KTHREAD to swapper) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 15 Mar 2005 22:07:08 -0000 At 1:51 PM +0100 3/15/05, Pawel Jakub Dawidek wrote: >Hi. > >I found, that there is no way to know if the given process is a system >(kernel) process or not: > >- P_SYSTEM flag is used also for userland processes (init), >- P_KTHREAD flag is not used for swapper, >- ps(1) thinks, that it found system process when there are > no arguments, checking: (argv == NULL || argv[0] == NULL) > but this is not true: > > char *argv[1] = { NULL }; > execve("/path/to/somewhere", argv, NULL); > > The /path/to/somewhere process will be recognized by ps(1) > as a system process. > >The easiest way to fix it, is to add P_KTHREAD flag to the >swapper, I think: Something like this would be helpful, but I don't know enough kernel-stuff to know if there would be any side-effects by setting that bit. If that doesn't work, then we could have pkill/pgrep/ps check for 'pid == 0 && uid == 0', and assume any process that matches is also a "kernel thread process". But obviously it would be cleaner if we could just set that bit on the swapper process... 
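A minimal sketch of the three checks being compared here, for illustration only. The struct, the SKETCH_P_* bit values and the function names are all hypothetical; the real P_SYSTEM and P_KTHREAD definitions live in <sys/proc.h>, and the real tools read process data through their own interfaces.

```c
#include <stdbool.h>

/* Illustrative flag bits only; not the real values from <sys/proc.h>. */
#define SKETCH_P_SYSTEM		0x01
#define SKETCH_P_KTHREAD	0x02

/* Hypothetical, trimmed-down view of what a tool like ps(1) sees. */
struct sketch_proc {
	int pid;
	int uid;
	int flag;	/* combination of SKETCH_P_* bits */
	int argc;	/* number of argv entries the process started with */
};

/* Current ps(1)-style heuristic: no arguments means "system process". */
static bool
guess_by_empty_argv(const struct sketch_proc *p)
{
	return (p->argc == 0);
}

/* Proposed check: trust P_KTHREAD, once the swapper also carries it. */
static bool
check_by_kthread_flag(const struct sketch_proc *p)
{
	return ((p->flag & SKETCH_P_KTHREAD) != 0);
}

/* Fallback: additionally treat pid 0 owned by uid 0 as a kernel process. */
static bool
check_with_pid0_fallback(const struct sketch_proc *p)
{
	return (check_by_kthread_flag(p) || (p->pid == 0 && p->uid == 0));
}
```

Under these assumptions, an unpatched swapper (pid 0, P_SYSTEM only) fails the flag check but passes the fallback, init is rejected by both, and a user process started with an empty argv is misclassified only by the argv heuristic.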
I suppose I could just try that on a test system, and see if the system goes haywire :-) -- Garance Alistair Drosehn = gad@gilead.netel.rpi.edu Senior Systems Programmer or gad@freebsd.org Rensselaer Polytechnic Institute or drosih@rpi.edu From owner-freebsd-arch@FreeBSD.ORG Wed Mar 16 01:11:09 2005 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id EB39116A4F6 for ; Wed, 16 Mar 2005 01:11:09 +0000 (GMT) Received: from mail.chesapeake.net (chesapeake.net [208.142.252.6]) by mx1.FreeBSD.org (Postfix) with ESMTP id E50C543D48 for ; Wed, 16 Mar 2005 01:11:08 +0000 (GMT) (envelope-from jroberson@chesapeake.net) Received: from mail.chesapeake.net (localhost [127.0.0.1]) by mail.chesapeake.net (8.12.10/8.12.10) with ESMTP id j2G1B6d4011506; Tue, 15 Mar 2005 20:11:06 -0500 (EST) (envelope-from jroberson@chesapeake.net) Received: from localhost (jroberson@localhost)j2G1B6TI011499; Tue, 15 Mar 2005 20:11:06 -0500 (EST) (envelope-from jroberson@chesapeake.net) X-Authentication-Warning: mail.chesapeake.net: jroberson owned process doing -bs Date: Tue, 15 Mar 2005 20:11:05 -0500 (EST) From: Jeff Roberson To: Matthew Dillon In-Reply-To: <200503151911.j2FJBWpd055485@apollo.backplane.com> Message-ID: <20050315195525.F20708@mail.chesapeake.net> References: <20050314213038.V20708@mail.chesapeake.net> <20050315035032.T20708@mail.chesapeake.net> <200503151911.j2FJBWpd055485@apollo.backplane.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: arch@freebsd.org Subject: Re: Freeing vnodes. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Mar 2005 01:11:10 -0000 On Tue, 15 Mar 2005, Matthew Dillon wrote: > :> I think you did not intend this. 
Didn't you just want to destroy > :> enough vnodes to have 'wantfreevnodes' worth of slop so getnewvnode() > :> could allocate new vnodes? In that case the calculation would be: > : > :On my system wantfreevnodes is at 2500. Let's say I have 4500 free > :vnodes. 4500 - 2500 = 2000. Divide by 2 gives you 1000. I don't think > :you read the whole patch. > > I'm not trying to be confrontational here, Jeff. Please remember that > I'm the one who has done most of the algorithmic work on these > subsystems. I designed the whole 'trigger' mechanism, for example. I wasn't being confrontational, I gave you a real example, that's all. > > The wantfreevnodes calculation is: minvnodes / 10. That's a very small > number. The 'freevnodes' value is typically a much larger value, > especially if a program running through stat()ing things. It is possible > to have tens of thousands of free vnodes. This makes your current > count calculation effectively 'freevnodes / 2'. I really don't think > you want to destroy half the current freevnodes on each pass, do you? I haven't seen my machine even get to 10,000 free vnodes with this patch running. Even while doing a 'find . -exec stat {} \; >> /dev/null'. With the new mechanism all of these vnodes would go away after a second. It might be nice to keep them around for longer if resources permit, but I do think a timeout based approach is the correct one. Please note that with the old code you wouldn't even keep them around for a second if we were above minvnodes. They would be recycled on the next call to getnewvnode(). > > :> can be a HUGE load on getnewvnode() (think of cvsupd and find, or > :> a cvs update, etc...). This load can easily outstrip vnlru_proc()'s > :> new ability to free vnodes and potentially cause a lot of unnecessarily > :> blockages. > : > :We have one buf daemon, one page daemon, one syncer, one vnlru proc, etc. 
> :In all these cases it would be nice if they gained new contexts when they > :had a lot of work to do, but they don't, and it doesn't seem to be a huge > :problem today. On my system one vnlruproc easily keeps up with the job of > > That's because they are carefully written (mostly by me) to not be > subject to pure cpu loads. > > buf_daemon: Is primarily only responsible for flushing DIRTY buffers. > The buffer allocation code will happily reuse clean buffers > in-line. Dirty buffers are subject to the I/O limitations > of the system (and they are flushed asynchronously for the > most part), which means that one daemon should have no > trouble handle the buffer load on an MP sytem. Since a > system naturally has many more clean buffers then dirty > buffers (even without algorithmic limitations), except in > certain particular large-write cases which are handled > elsewhere, the buf_daemon usually has very little effect > on the buffer cache's ability to allocate a new buffer. > > page_daemon: Same deal. The page daemon is primarily responsible for > flushing out dirty pages and for rebalancing the lists > if they get really out of whack. Pages in the VM page > cache (PQ_CACHE) can be reused on the fly and there are > several *natural* ways for a page to go directly to the > VM page cache without having to pass through the page > daemon. In fact, MOST of the pages that get onto the > PQ_CACHE or PQ_FREE queues are placed there directly by > mechanisms unrelated to the page daemon. > > syncer: I've always wanted to rewrite the syncer to be per-mount > or per-physical-device so it could sync out to multiple > physical devices simultaniously. The syncer is kind of bogus anyway, because it mostly just destroys the buf daemon's delayed writes by forcing it all out at once. It does redundant work, except for updating inodes, which should be all it really does. > > vnlru_proc: Prior to your patch, vnlru_proc was only responsible for > rebalancing the freevnode list. 
Typically the ONLY case > where a vnode needs to be forcefully put on the freevnode > list is if there are a lot of vnodes which have VM objects > which still have just one or two VM pages associated with > them, because otherwise a vnode either gets put on the > freevnode list directly by the vnode release code, or it > has enough associated pages for us to not want to recycle > it anyway (which is what the trigger code handles). The > mechanism that leads to the creation of such vnodes also > typically requires a lot of random I/O, which makes > vnlru_proc() immune to cpu load. This means that > vnlru_proc is only PARTIALLY responsible for maintaining > the freevnode list and the part it Is responsible for > tends to be unrelated to pure cpu loads. > > There are a ton of ways for a vnode to make it to that > list WITHOUT passing through vnlru_proc, which means that > prior to your patch getnewvnode() typically only has to > wait for vnlru_proc() in the most extreme situations. I still haven't seen any code wait on vnlru_proc. I've tested it on a big fast system with lots of disks. I'll test it on a little slow system with one disk too. I understand what you're arguing, I'm just not sure if it's presently a real problem. > > By my read, the changes you are currently contemplating for vnlru_proc > changes its characteristics such that it is now COMPLETELY responsible > for freeing up vnodes for getnewvnode(). This was not the case before. > > I can only repeat that getnewvnode() has a massive dynamic loading range, > one that is not necessarily dependant on or limited by I/O. For example, > when you are stat()ing a lot of files over and over again there is a good > chance that the related inodes are cached in the VM object representing > the backing store for the filesystem. This means that getnewvnode() can > cycle very quickly, on the order of tens of thousands of vnodes per > second in certain situations. 
By my read, you are forcing *ALL* the > vnode recycling activity to run through vnlru_proc() now. The only way > now for getnewvnode() to get a new vnode is by allocating it out of > the zone. This was not the case before. > > :freeing free vnodes. Remember these vnodes have no pages associated with > :them, so at most you're freeing an inode for a deleted file, and in the > :common case the whole operation runs on memory without blocking for io. > :... > :We presently single thread the most critical case, where we have no free > :vnodes and are not allowed to allocate any more while we wait for > :vnlru_proc() to do io on vnodes with cached pages to reclaim some. I'm > :not convinced this is a real problem. > > Which means that in systems with a large amount of memory (large VM page > cache) doing certainly operations (such as stat()ing a large number > of files e.g. a find or cvsupd), where the file set is larger then > the number of vnodes available, will now have to cycle all of those > vnodes through a single thread in order to reuse them. Doesn't seem to be a problem with any workload I've been able to devise. > > The current pre-patch case is very different. With your patch, > in addition to the issues already mentioned, the inode synchronization > is now being single-threaded and while the writes are asynchronous, > the reads are not (if the inode happens to not be in the VM page cache > any more because it's been cached so long the system has decided to > throw away the page to accomodate other cached data). If it really ends up being a problem we can spin up one thread per proc. As it doesn't seem to be a problem at all with a single thread, I'm sure that would be sufficient to handle any extreme cases. > > In the current pre-patch case, that read load was distributed over ALL > processes trying to do a getnewvnode(). i.e. it was a parallel read > load that actually scaled fairly well to load. 
> > :> I love the idea of being able to free vnodes in vnlru_proc() rather > :> then free-and-reuse them in allocvnode(), but I cannot figure out how > :> vnlru_proc() could possibly adapt to the huge load range that > :> getnewvnode() has to deal with. Plus keep in mind that the vnodes > :> being reused at that point are basically already dead except for > :> the vgonel(). > :> > :> This brings up the true crux of the problem, where the true overhead > :> of reusing a vnode inline with the getnewvnode() call is... and that > :> is that vgonel() potentially has to update the related inode and could > :> cause an unrelated process to block inside getnewvnode(). But even > : > :Yes, this is kind of gross, and would cause lock order problems except > :that we LK_NOWAIT on the vn lock in vtryrecycle(). It'd be better if we > :didn't try doing io on unrelated vnodes while this deep in the stack. > > I agree. It is gross, though I will note that the fact that the vnode > is ON the free list tends to mean that it isn't being referenced by > anyone so there should not be any significant lock ordering issues. > > I haven't 'fixed' this in DragonFly because I haven't been able to > figure out how to distribute the recycling load and deal with the > huge dynamic loading range that getnewvnode() has. > > I've been working on the buffer cache code since, what, 1998? These > are real issues. It's always very easy to design algorithms that > work for specific machine configurations, the trick is to make them > work across the board. > > One thing I LIKE about your code is the concept of being able to reuse > a vnode (or in your case allocate a new vnode) without having to perform > any I/O. The re-use case in the old code always has the potential to > block an unrelated process if it has to do I/O recycling the vnode it > wants to reuse. 
But this is a very easy effect to accomplish simply by > leaving the recycling code in getnewvnode() intact but STILL adding new > code to vnlru_proc() to ensure that a minimum number of vnodes are > truely reusable without having to perform any I/O. This would enhance > light-load (light getnewvnode() load that is) performance. It would > have virtually no effect under heavier loads, which is why the vnode > re-use code in getnewvnode() would have to stay, but the light-load > benefit is undeniable. > > -Matt > Matthew Dillon > > From owner-freebsd-arch@FreeBSD.ORG Wed Mar 16 01:38:56 2005 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id AFAD816A4CE; Wed, 16 Mar 2005 01:38:56 +0000 (GMT) Received: from daintree.corp.yahoo.com (daintree.corp.yahoo.com [216.145.52.172]) by mx1.FreeBSD.org (Postfix) with ESMTP id 89E2743D3F; Wed, 16 Mar 2005 01:38:56 +0000 (GMT) (envelope-from peter@wemm.org) Received: by daintree.corp.yahoo.com (Postfix, from userid 2154) id 7E11F197A8; Tue, 15 Mar 2005 17:38:56 -0800 (PST) From: Peter Wemm To: freebsd-arch@freebsd.org Date: Tue, 15 Mar 2005 17:38:55 -0800 User-Agent: KMail/1.7.2 References: <42335A52.9060208@freebsd.org> In-Reply-To: <42335A52.9060208@freebsd.org> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200503151738.56185.peter@wemm.org> cc: Tim Kientzle Subject: Re: Removing gtar from base X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Mar 2005 01:38:56 -0000 On Saturday 12 March 2005 01:08 pm, Tim Kientzle wrote: > My plan with bsdtar has been to have both bsdtar and > gtar in 5.x, bsdtar only in 6.x. 
> > I'd like to remove gtar from -CURRENT as the next > step in that transition. > > Proposed timeline: > * 1 week from today (March 19): Disable WITH_GTAR in -CURRENT. > * End of March: Disconnect gtar from build in -CURRENT. > * End of May: Remove gtar from -CURRENT. > > Note: > * gtar will remain in 5.x tree indefinitely. > * WITH_GTAR will continue to be supported in 5.x. > * gtar will continue to be available in ports indefinitely. > > If there are no objections, I'll start this > process 1 week from today. By the way, I keep finding myself suprised that I don't notice that I'm using bsdtar instead of gtar. It works so well as a transparent replacement for gtar (that my fingers know the args/switches/etc) that I keep forgetting that I'm using it. I only notice when I read a help message or the man page. Great job! -- Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com "All of this is for nothing if we don't go to the stars" - JMS/B5 From owner-freebsd-arch@FreeBSD.ORG Wed Mar 16 01:43:50 2005 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 402A116A4CE; Wed, 16 Mar 2005 01:43:50 +0000 (GMT) Received: from daintree.corp.yahoo.com (daintree.corp.yahoo.com [216.145.52.172]) by mx1.FreeBSD.org (Postfix) with ESMTP id 1D26343D54; Wed, 16 Mar 2005 01:43:50 +0000 (GMT) (envelope-from peter@wemm.org) Received: by daintree.corp.yahoo.com (Postfix, from userid 2154) id 20BFD197AC; Tue, 15 Mar 2005 17:43:50 -0800 (PST) From: Peter Wemm To: freebsd-arch@freebsd.org Date: Tue, 15 Mar 2005 17:43:49 -0800 User-Agent: KMail/1.7.2 References: <20050303074242.GA14699@VARK.MIT.EDU> <200503030954.08271.jhb@FreeBSD.org> <20050303153505.GA16964@VARK.MIT.EDU> In-Reply-To: <20050303153505.GA16964@VARK.MIT.EDU> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: 
<200503151743.49851.peter@wemm.org> cc: David Schultz Subject: Re: Removing kernel thread stack swapping X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Mar 2005 01:43:50 -0000 On Thursday 03 March 2005 07:35 am, David Schultz wrote: > On Thu, Mar 03, 2005, John Baldwin wrote: [..] > > Hence, don't kill this whole feature just because someone is too > > lazy to fix a bug. > > Fair enough. I'll defer to you on the extent of the problem. > David seemed to think that it was more widespread. (BTW, does > *anyone* know what the PHOLD() in kern_physio is for? Is it a > holdover from when the PCB was in struct user?) I've wondered about this myself in the past. I went looking once and discovered that it never did anything that I could find. I believe it is a case of 'because it was always done that way' or because the pseudocode in the Bach or bsd books had it. There is certainly no functional need for it in FreeBSD. 
-- Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com "All of this is for nothing if we don't go to the stars" - JMS/B5 From owner-freebsd-arch@FreeBSD.ORG Wed Mar 16 01:46:12 2005 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 0701316A4CE for ; Wed, 16 Mar 2005 01:46:12 +0000 (GMT) Received: from daintree.corp.yahoo.com (daintree.corp.yahoo.com [216.145.52.172]) by mx1.FreeBSD.org (Postfix) with ESMTP id D718D43D31 for ; Wed, 16 Mar 2005 01:46:11 +0000 (GMT) (envelope-from peter@wemm.org) Received: by daintree.corp.yahoo.com (Postfix, from userid 2154) id CA899197A8; Tue, 15 Mar 2005 17:46:11 -0800 (PST) From: Peter Wemm To: freebsd-arch@freebsd.org Date: Tue, 15 Mar 2005 17:46:11 -0800 User-Agent: KMail/1.7.2 References: <200503011650.j21GoI7o018125@peedub.jennejohn.org> <20050302081019.GA25222@neo.redjade.org> <20050302081729.GA76106@xor.obsecurity.org> In-Reply-To: <20050302081729.GA76106@xor.obsecurity.org> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200503151746.11533.peter@wemm.org> cc: Poul-Henning Kamp cc: Gary Jennejohn cc: Sangwoo Shim cc: Kris Kennaway Subject: Re: How about import mpd into base system? 
X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Mar 2005 01:46:12 -0000 On Wednesday 02 March 2005 12:17 am, Kris Kennaway wrote: > On Wed, Mar 02, 2005 at 05:10:19PM +0900, Sangwoo Shim wrote: > > On Tue, Mar 01, 2005 at 06:29:03PM +0100, Poul-Henning Kamp wrote: > > > In message <200503011650.j21GoI7o018125@peedub.jennejohn.org>, > > > Gary Jennejohn w > > > > > > rites: > > > >Sangwoo Shim writes: > > > >> Mpd is likely to be used by FreeBSD (and might DFBSD) > > > >> exclusively. So, how about import mpd into the base tree? Is > > > >> there any stopper to prevent mpd from being included into the > > > >> tree? > > > > > > > >Why? /usr/sbin/ppp supports PPPoE just fine and is already in > > > > the base. > > > > > > They should be merged. > > > > > > mpd-netgraph has functionality missing in the ppp in the base > > > system. > > > > Exactly my opinion. I'm glad to know you think like that! > > Hmm, I'm curious whether there is any commiter planning mpd import. > > He said 'merged', as in 'combined into one program instead of having > two (three, including pppd) separate ppp implementations that do > almost the same thing'. By the way, I think the time has come for pppd/chat/if_ppp.c to leave the base and go back to being a 3rd party port. 
-- Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com "All of this is for nothing if we don't go to the stars" - JMS/B5 From owner-freebsd-arch@FreeBSD.ORG Wed Mar 16 03:19:46 2005 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id D02DE16A4CF for ; Wed, 16 Mar 2005 03:19:46 +0000 (GMT) Received: from duchess.speedfactory.net (duchess.speedfactory.net [66.23.201.84]) by mx1.FreeBSD.org (Postfix) with SMTP id 1233543D49 for ; Wed, 16 Mar 2005 03:19:46 +0000 (GMT) (envelope-from ups@tree.com) Received: (qmail 17458 invoked by uid 89); 16 Mar 2005 03:19:44 -0000 Received: from duchess.speedfactory.net (66.23.201.84) by duchess.speedfactory.net with SMTP; 16 Mar 2005 03:19:44 -0000 Received: (qmail 17440 invoked by uid 89); 16 Mar 2005 03:19:44 -0000 Received: from unknown (HELO palm.tree.com) (66.23.216.49) by duchess.speedfactory.net with SMTP; 16 Mar 2005 03:19:44 -0000 Received: from [127.0.0.1] (localhost.tree.com [127.0.0.1]) by palm.tree.com (8.12.10/8.12.10) with ESMTP id j2G3Jhw6093534; Tue, 15 Mar 2005 22:19:44 -0500 (EST) (envelope-from ups@tree.com) From: Stephan Uphoff To: Peter Wemm In-Reply-To: <200503151743.49851.peter@wemm.org> References: <20050303074242.GA14699@VARK.MIT.EDU> <20050303153505.GA16964@VARK.MIT.EDU> <200503151743.49851.peter@wemm.org> Content-Type: text/plain Message-Id: <1110943183.29804.42558.camel@palm> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.4.6 Date: Tue, 15 Mar 2005 22:19:43 -0500 Content-Transfer-Encoding: 7bit cc: David Schultz cc: "freebsd-arch@freebsd.org" Subject: Re: Removing kernel thread stack swapping X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Mar 2005 03:19:47 -0000 On Tue, 2005-03-15 at 20:43, Peter 
Wemm wrote:
> On Thursday 03 March 2005 07:35 am, David Schultz wrote:
> > On Thu, Mar 03, 2005, John Baldwin wrote:
> [..]
> > > Hence, don't kill this whole feature just because someone is too
> > > lazy to fix a bug.
> >
> > Fair enough. I'll defer to you on the extent of the problem.
> > David seemed to think that it was more widespread. (BTW, does
> > *anyone* know what the PHOLD() in kern_physio is for? Is it a
> > holdover from when the PCB was in struct user?)
>
> I've wondered about this myself in the past. I went looking once and
> discovered that it never did anything that I could find. I believe it
> is a case of 'because it was always done that way' or because the
> pseudocode in the Bach or bsd books had it. There is certainly no
> functional need for it in FreeBSD.

kern_physio prevents chunks of memory needed for IO from being paged out.
Swapping out a thread in kern_physio will prevent it from releasing the
resources soon.
With minphys > stack size I think PHOLD() is still a good idea.
Stephan From owner-freebsd-arch@FreeBSD.ORG Wed Mar 16 03:33:38 2005 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 0990D16A4CE for ; Wed, 16 Mar 2005 03:33:38 +0000 (GMT) Received: from gw.catspoiler.org (217-ip-163.nccn.net [209.79.217.163]) by mx1.FreeBSD.org (Postfix) with ESMTP id 3C80143D66 for ; Wed, 16 Mar 2005 03:33:36 +0000 (GMT) (envelope-from truckman@FreeBSD.org) Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.13.1/8.13.1) with ESMTP id j2G3X4sP061855; Tue, 15 Mar 2005 19:33:10 -0800 (PST) (envelope-from truckman@FreeBSD.org) Message-Id: <200503160333.j2G3X4sP061855@gw.catspoiler.org> Date: Tue, 15 Mar 2005 19:33:04 -0800 (PST) From: Don Lewis To: jroberson@chesapeake.net In-Reply-To: <20050315195525.F20708@mail.chesapeake.net> MIME-Version: 1.0 Content-Type: TEXT/plain; charset=us-ascii cc: arch@FreeBSD.org Subject: Re: Freeing vnodes. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Mar 2005 03:33:38 -0000 On 15 Mar, Jeff Roberson wrote: > On Tue, 15 Mar 2005, Matthew Dillon wrote: >> syncer: I've always wanted to rewrite the syncer to be per-mount >> or per-physical-device so it could sync out to multiple >> physical devices simultaniously. It would be nice to do this on a per-physical-device basis to avoid multiple threads contending for the same device, but this looks like it would be difficult due to the way that devices can be sliced, diced, and merged. It would also be nice if buf_daemon was a per-device or per-mount. I haven't tested it lately, but in the past I was able to deadlock buf_daemon by loopback NFS mounting a local file system and doing a lot of write activity (iozone works well for this). 
> The syncer is kind of bogus anyway, because it mostly just destroys the > buf daemon's delayed writes by forcing it all out at once. It does > redundant work, except for updating inodes, which should be all it really > does. The syncer also sets an upper bound on the time that file modifications go unwritten to disk. Buf_daemon sleeps while numdirtybuffers <= lodirtybuffers, so a file updated on a quiet system would not be written to disk for an arbitrarily long time without the syncer. From owner-freebsd-arch@FreeBSD.ORG Wed Mar 16 08:41:19 2005 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1087816A4D9 for ; Wed, 16 Mar 2005 08:41:19 +0000 (GMT) Received: from mail27.syd.optusnet.com.au (mail27.syd.optusnet.com.au [211.29.133.168]) by mx1.FreeBSD.org (Postfix) with ESMTP id 2729743D46 for ; Wed, 16 Mar 2005 08:41:18 +0000 (GMT) (envelope-from PeterJeremy@optushome.com.au) Received: from cirb503493.alcatel.com.au (c211-30-75-229.belrs2.nsw.optusnet.com.au [211.30.75.229]) j2G8f8pv001616 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Wed, 16 Mar 2005 19:41:08 +1100 Received: from cirb503493.alcatel.com.au (localhost.alcatel.com.au [127.0.0.1])j2G8f77l028480; Wed, 16 Mar 2005 19:41:07 +1100 (EST) (envelope-from pjeremy@cirb503493.alcatel.com.au) Received: (from pjeremy@localhost)j2G8f7hS028479; Wed, 16 Mar 2005 19:41:07 +1100 (EST) (envelope-from pjeremy) Date: Wed, 16 Mar 2005 19:41:06 +1100 From: Peter Jeremy To: Matthew Dillon Message-ID: <20050316084106.GC28328@cirb503493.alcatel.com.au> References: <20050314213038.V20708@mail.chesapeake.net> <20050315035032.T20708@mail.chesapeake.net> <200503151911.j2FJBWpd055485@apollo.backplane.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <200503151911.j2FJBWpd055485@apollo.backplane.com> User-Agent: Mutt/1.4.2i cc: 
arch@freebsd.org Subject: Re: Freeing vnodes. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Mar 2005 08:41:19 -0000 On Tue, 2005-Mar-15 11:11:32 -0800, Matthew Dillon wrote: > syncer: I've always wanted to rewrite the syncer to be per-mount > or per-physical-device so it could sync out to multiple > physical devices simultaniously. My current bitch with syncer is that it can run for up to around 8msec (on an AMD XP-1800). This screws up interrupt latency. -- Peter Jeremy From owner-freebsd-arch@FreeBSD.ORG Wed Mar 16 08:50:30 2005 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1CD2C16A4CE for ; Wed, 16 Mar 2005 08:50:30 +0000 (GMT) Received: from critter.freebsd.dk (f170.freebsd.dk [212.242.86.170]) by mx1.FreeBSD.org (Postfix) with ESMTP id 29CCD43D4C for ; Wed, 16 Mar 2005 08:50:29 +0000 (GMT) (envelope-from phk@critter.freebsd.dk) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.13.1/8.13.1) with ESMTP id j2G8oMwa017571; Wed, 16 Mar 2005 09:50:23 +0100 (CET) (envelope-from phk@critter.freebsd.dk) To: Peter Jeremy From: "Poul-Henning Kamp" In-Reply-To: Your message of "Wed, 16 Mar 2005 19:41:06 +1100." <20050316084106.GC28328@cirb503493.alcatel.com.au> Date: Wed, 16 Mar 2005 09:50:22 +0100 Message-ID: <17570.1110963022@critter.freebsd.dk> Sender: phk@critter.freebsd.dk cc: arch@freebsd.org Subject: Re: Freeing vnodes. 
X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Mar 2005 08:50:30 -0000 In message <20050316084106.GC28328@cirb503493.alcatel.com.au>, Peter Jeremy wri tes: >On Tue, 2005-Mar-15 11:11:32 -0800, Matthew Dillon wrote: >> syncer: I've always wanted to rewrite the syncer to be per-mount >> or per-physical-device so it could sync out to multiple >> physical devices simultaniously. > >My current bitch with syncer is that it can run for up to around 8msec >(on an AMD XP-1800). This screws up interrupt latency. And throw thousands of I/O requests on the queue at once, which screws up I/O performance. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From owner-freebsd-arch@FreeBSD.ORG Wed Mar 16 08:58:49 2005 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 32C3716A4CE for ; Wed, 16 Mar 2005 08:58:49 +0000 (GMT) Received: from gw.catspoiler.org (217-ip-163.nccn.net [209.79.217.163]) by mx1.FreeBSD.org (Postfix) with ESMTP id C628E43D3F for ; Wed, 16 Mar 2005 08:58:48 +0000 (GMT) (envelope-from truckman@FreeBSD.org) Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.13.1/8.13.1) with ESMTP id j2G8wdG6062341; Wed, 16 Mar 2005 00:58:43 -0800 (PST) (envelope-from truckman@FreeBSD.org) Message-Id: <200503160858.j2G8wdG6062341@gw.catspoiler.org> Date: Wed, 16 Mar 2005 00:58:39 -0800 (PST) From: Don Lewis To: phk@phk.freebsd.dk In-Reply-To: <17570.1110963022@critter.freebsd.dk> MIME-Version: 1.0 Content-Type: TEXT/plain; charset=us-ascii cc: PeterJeremy@optushome.com.au cc: arch@FreeBSD.org 
Subject: Re: Freeing vnodes.
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 16 Mar 2005 08:58:49 -0000

On 16 Mar, Poul-Henning Kamp wrote:
> In message <20050316084106.GC28328@cirb503493.alcatel.com.au>, Peter Jeremy writes:
>>On Tue, 2005-Mar-15 11:11:32 -0800, Matthew Dillon wrote:
>>> syncer: I've always wanted to rewrite the syncer to be per-mount
>>> or per-physical-device so it could sync out to multiple
>>> physical devices simultaniously.
>>
>>My current bitch with syncer is that it can run for up to around 8msec
>>(on an AMD XP-1800). This screws up interrupt latency.

This is likely to be the MNT_VNODE_FOREACH loop in VOP_SYNC(). A lot of
CPU cycles can be wasted even when there is no real work to do because
the list of cached vnodes for the file system has to be traversed each
time. This loop should be skipped in the MNT_LAZY case, and the inode
timestamp updates should be handled by putting them on the syncer
worklist.

> And throw thousands of I/O requests on the queue at once, which screws
> up I/O performance.

That is also a problem with the loop in VOP_SYNC(). This behaviour is
very noticeable when a machine modifies a lot of files.
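
The idea in the message above (skip the walk over every cached vnode for
a lazy sync, and queue the vnodes that need attention on a worklist
instead) can be illustrated with a small user-space model. This is only
a sketch under stated assumptions: every demo_* name, the two list links
and the enum are hypothetical and are not FreeBSD kernel interfaces, and
the real code would also have to deal with vnode locking and the
syncer's own data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the "wait for everything" and "lazy" modes. */
enum demo_waitfor { DEMO_WAIT, DEMO_LAZY };

struct demo_vnode {
	bool dirty;                    /* has unwritten data or timestamps */
	bool queued;                   /* already on the worklist */
	struct demo_vnode *mnt_next;   /* link on the list of all cached vnodes */
	struct demo_vnode *work_next;  /* link on the worklist */
};

struct demo_mount {
	struct demo_vnode *all;        /* every cached vnode of this mount */
	struct demo_vnode *worklist;   /* only vnodes that need writing */
};

/* Put a clean vnode on the mount's list of cached vnodes. */
static void
demo_mount_add(struct demo_mount *mp, struct demo_vnode *vp)
{
	vp->dirty = false;
	vp->queued = false;
	vp->work_next = NULL;
	vp->mnt_next = mp->all;
	mp->all = vp;
}

/* Mark a vnode dirty and queue it on the worklist at most once. */
static void
demo_mark_dirty(struct demo_mount *mp, struct demo_vnode *vp)
{
	assert(mp != NULL && vp != NULL);
	vp->dirty = true;
	if (!vp->queued) {
		vp->queued = true;
		vp->work_next = mp->worklist;
		mp->worklist = vp;
	}
}

/*
 * "Sync" the mount and return how many vnodes had to be examined.
 * A lazy sync touches only the worklist; any other mode walks the
 * whole list of cached vnodes, dirty or not.
 */
static size_t
demo_sync(struct demo_mount *mp, enum demo_waitfor waitfor)
{
	struct demo_vnode *vp;
	size_t examined = 0;

	if (waitfor != DEMO_LAZY) {
		for (vp = mp->all; vp != NULL; vp = vp->mnt_next) {
			examined++;
			vp->dirty = false;
		}
	}
	/* Drain the worklist; after a full pass its entries are already clean. */
	while ((vp = mp->worklist) != NULL) {
		mp->worklist = vp->work_next;
		vp->work_next = NULL;
		vp->queued = false;
		if (waitfor == DEMO_LAZY) {
			examined++;
			vp->dirty = false;
		}
	}
	return (examined);
}
```

With 1000 cached vnodes of which two are dirty, a lazy pass in this
model examines two vnodes, while a full pass examines all 1000 even when
nothing is dirty. That difference is the wasted CPU time the message
describes.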
From owner-freebsd-arch@FreeBSD.ORG Wed Mar 16 09:23:38 2005 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id E4E7416A4CE for ; Wed, 16 Mar 2005 09:23:38 +0000 (GMT) Received: from mailout01.sul.t-online.com (mailout01.sul.t-online.com [194.25.134.80]) by mx1.FreeBSD.org (Postfix) with ESMTP id 56F1D43D53 for ; Wed, 16 Mar 2005 09:23:38 +0000 (GMT) (envelope-from Alexander@Leidinger.net) Received: from fwd27.aul.t-online.de by mailout01.sul.t-online.com with smtp id 1DBUkZ-0006Un-01; Wed, 16 Mar 2005 10:23:35 +0100 Received: from Andro-Beta.Leidinger.net (Th4Mk-Z1oeCh-lB31Jk+LBv8fzNehZstP4XeWqPk113RGGRfFjj0Yb@[217.83.27.72]) by fwd27.sul.t-online.de with esmtp id 1DBUkL-0BWx9s0; Wed, 16 Mar 2005 10:23:21 +0100 Received: from localhost (localhost [127.0.0.1])j2G9MCW6012719; Wed, 16 Mar 2005 10:22:12 +0100 (CET) (envelope-from Alexander@Leidinger.net) Received: from 141.113.101.32 ([141.113.101.32]) by netchild.homeip.net (Horde) with HTTP for ; Wed, 16 Mar 2005 10:22:12 +0100 Message-ID: <20050316102212.hyq4ptdcoc4k0s48@netchild.homeip.net> X-Priority: 3 (Normal) Date: Wed, 16 Mar 2005 10:22:12 +0100 From: Alexander Leidinger To: Jeff Roberson References: <20050314213038.V20708@mail.chesapeake.net> <20050315035032.T20708@mail.chesapeake.net> <200503151911.j2FJBWpd055485@apollo.backplane.com> <20050315195525.F20708@mail.chesapeake.net> In-Reply-To: <20050315195525.F20708@mail.chesapeake.net> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-15; format="flowed" Content-Disposition: inline Content-Transfer-Encoding: 7bit User-Agent: Internet Messaging Program (IMP) H3 (4.0.2) / FreeBSD-4.11 X-ID: Th4Mk-Z1oeCh-lB31Jk+LBv8fzNehZstP4XeWqPk113RGGRfFjj0Yb@t-dialin.net X-TOI-MSGID: cd99e4ad-66f2-4b36-b320-f88def01487e cc: arch@freebsd.org Subject: Re: Freeing vnodes. 
X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Mar 2005 09:23:39 -0000 Jeff Roberson wrote: > I haven't seen my machine even get to 10,000 free vnodes with this patch > running. Even while doing a 'find . -exec stat {} \; >> /dev/null'. With > the new mechanism all of these vnodes would go away after a second. It > might be nice to keep them around for longer if resources permit, but I do > think a timeout based approach is the correct one. Please note that with > the old code you wouldn't even keep them around for a second if we were > above minvnodes. They would be recycled on the next call to > getnewvnode(). Does this mean the behavior of find /usr/src -name .\#\* -o -name \*.orig -print followed by find /usr/src -name .\#\* -o -name \*.orig -print -delete after looking at the output on an idle system does change (assuming the system is below minvnodes)? Actually the first command needs a little bit of time with a slow disk, the second one is very fast compared to the first one. Bye, Alexander. -- http://www.Leidinger.net Alexander @ Leidinger.net: PGP ID = B0063FE7 http://www.FreeBSD.org netchild @ FreeBSD.org : PGP ID = 72077137 There's no room in the drug world for amateurs. 
-- Raoul Duke From owner-freebsd-arch@FreeBSD.ORG Wed Mar 16 10:27:43 2005 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id F204B16A4CE for ; Wed, 16 Mar 2005 10:27:42 +0000 (GMT) Received: from rwcrmhc13.comcast.net (rwcrmhc13.comcast.net [204.127.198.39]) by mx1.FreeBSD.org (Postfix) with ESMTP id B726F43D41 for ; Wed, 16 Mar 2005 10:27:42 +0000 (GMT) (envelope-from dougb@freebsd.org) Received: from [192.168.0.4] (c-24-130-110-32.we.client2.attbi.com[24.130.110.32]) by comcast.net (rwcrmhc13) with ESMTP id <200503161027420150041130e>; Wed, 16 Mar 2005 10:27:42 +0000 Message-ID: <42380A1D.1010005@freebsd.org> Date: Wed, 16 Mar 2005 02:27:41 -0800 From: Doug Barton Organization: http://www.FreeBSD.org User-Agent: Mozilla Thunderbird 1.0 (X11/20050316) X-Accept-Language: en-us, en MIME-Version: 1.0 To: freebsd-arch@freebsd.org X-Enigmail-Version: 0.89.6.0 X-Enigmail-Supports: pgp-inline, pgp-mime Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Time to stop buildling named (and friends) by default in 6-current? X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Mar 2005 10:27:43 -0000 Folks, Way back at the bsdcon in Foster City when we first started talking about importing BIND 9 into the base we also talked about adding more knobs to give users finer grained control over which bits of BIND were built, and turning off the build of named (and associated binaries) by default. Well, the first bit is done, so we're now in the position of being able to flip the NO_BIND_NAMED knob (see make.conf(5) for details) to WITH_BIND_NAMED, and turn it off by default. Is this something that we're still interested in doing? 
If so, this would be a good time to do it, since I'll be importing 9.3.1 sometime in the next couple days (first round of make world testing is underway), and we're still early in the life of 6-current. Of course, this would only be for 6-current, we wouldn't change the behavior in RELENG_[45]. What do you think? Doug -- This .signature sanitized for your protection From owner-freebsd-arch@FreeBSD.ORG Wed Mar 16 21:35:53 2005 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 607DF16A4CF; Wed, 16 Mar 2005 21:35:53 +0000 (GMT) Received: from harmony.village.org (rover.village.org [168.103.84.182]) by mx1.FreeBSD.org (Postfix) with ESMTP id C71B443D1D; Wed, 16 Mar 2005 21:35:52 +0000 (GMT) (envelope-from imp@bsdimp.com) Received: from localhost (warner@rover2.village.org [10.0.0.1]) by harmony.village.org (8.13.3/8.13.1) with ESMTP id j2GLX5fh015937; Wed, 16 Mar 2005 14:33:05 -0700 (MST) (envelope-from imp@bsdimp.com) Date: Wed, 16 Mar 2005 14:33:11 -0700 (MST) Message-Id: <20050316.143311.01015387.imp@bsdimp.com> To: peter@wemm.org From: "M. Warner Losh" In-Reply-To: <200503151738.56185.peter@wemm.org> References: <42335A52.9060208@freebsd.org> <200503151738.56185.peter@wemm.org> X-Mailer: Mew version 3.3 on Emacs 21.3 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit cc: kientzle@FreeBSD.org cc: freebsd-arch@FreeBSD.org Subject: Re: Removing gtar from base X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Mar 2005 21:35:53 -0000 In message: <200503151738.56185.peter@wemm.org> Peter Wemm writes: : By the way, I keep finding myself suprised that I don't notice that I'm : using bsdtar instead of gtar. 
It works so well as a transparent : replacement for gtar (that my fingers know the args/switches/etc) that : I keep forgetting that I'm using it. I only notice when I read a help : message or the man page. So far we've only noticed because someone added obscure gnu-tar options to one of the scripts we use to build flashes... Warner From owner-freebsd-arch@FreeBSD.ORG Wed Mar 16 22:48:20 2005 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id A0AE216A4CE for ; Wed, 16 Mar 2005 22:48:20 +0000 (GMT) Received: from mail24.sea5.speakeasy.net (mail24.sea5.speakeasy.net [69.17.117.26]) by mx1.FreeBSD.org (Postfix) with ESMTP id 3A5DC43D39 for ; Wed, 16 Mar 2005 22:48:20 +0000 (GMT) (envelope-from jhb@FreeBSD.org) Received: (qmail 5349 invoked from network); 16 Mar 2005 22:48:19 -0000 Received: from server.baldwin.cx ([216.27.160.63]) (envelope-sender )AES256-SHA encrypted SMTP for ; 16 Mar 2005 22:48:18 -0000 Received: from [10.50.40.202] (gw1.twc.weather.com [216.133.140.1]) (authenticated bits=0) by server.baldwin.cx (8.13.1/8.13.1) with ESMTP id j2GMm5Ku002222; Wed, 16 Mar 2005 17:48:08 -0500 (EST) (envelope-from jhb@FreeBSD.org) From: John Baldwin To: freebsd-arch@FreeBSD.org Date: Wed, 16 Mar 2005 17:48:02 -0500 User-Agent: KMail/1.6.2 References: <20050315125136.GH9291@darkness.comp.waw.pl> In-Reply-To: <20050315125136.GH9291@darkness.comp.waw.pl> MIME-Version: 1.0 Content-Disposition: inline Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <200503161748.02353.jhb@FreeBSD.org> X-Spam-Status: No, score=-102.8 required=4.2 tests=ALL_TRUSTED, USER_IN_WHITELIST autolearn=failed version=3.0.2 X-Spam-Checker-Version: SpamAssassin 3.0.2 (2004-11-16) on server.baldwin.cx cc: Pawel Jakub Dawidek Subject: Re: System processes recognition. 
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 16 Mar 2005 22:48:20 -0000

On Tuesday 15 March 2005 07:51 am, Pawel Jakub Dawidek wrote:
> Hi.
>
> I found, that there is no way to know if the given process is a system
> (kernel) process or not:
>
> - P_SYSTEM flag is used also for userland processes (init),
> - P_KTHREAD flag is not used for swapper,
> - ps(1) thinks, that it found system process when there are no arguments
>   (argv == NULL || argv[0] == NULL), but this is not true:
>
>	char *argv[1] = { NULL };
>
>	execve("/path/to/somewhere", argv, NULL);
>
>   /path/to/somewhere process will be recognized by ps(1) as a system
>   process.
>
> The easiest way to fix it, is to add P_KTHREAD flag to the swapper, I
> think:
>
> --- init_main.c 17 Feb 2005 10:00:09 -0000 1.255
> +++ init_main.c 15 Mar 2005 12:48:04 -0000
> @@ -365,7 +365,7 @@ proc0_init(void *dummy __unused)
>  	session0.s_leader = p;
>
>  	p->p_sysent = &null_sysvec;
> -	p->p_flag = P_SYSTEM;
> +	p->p_flag = P_SYSTEM | P_KTHREAD;
>  	p->p_sflag = PS_INMEM;
>  	p->p_state = PRS_NORMAL;
>  	knlist_init(&p->p_klist, &p->p_mtx);
>
> Opinions?

I think this is ok. Ask bde@, he might say that P_SYSTEM should be removed
from init. (Can't remember if he is in favor of that or not.)
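
The three cases listed in the quoted message can be checked against a
toy model. This is only a sketch: the DEMO_* flag values, struct
demo_proc and both helper functions are made up for illustration and are
not the kernel's definitions or the actual ps(1) source. It compares the
"no arguments" heuristic described for ps(1) with a test of a
P_KTHREAD-style flag, which is what the proposed one-line change would
make possible for the swapper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative bit values only; not the values used by the kernel. */
#define DEMO_P_KTHREAD	0x1u	/* process is a kernel thread */
#define DEMO_P_SYSTEM	0x2u	/* "system" process (also set on init) */

struct demo_proc {
	int pid;
	unsigned int flag;	/* models p_flag */
	size_t argc;		/* number of argv entries visible to ps */
};

/* Heuristic from the message: an empty argument vector means "kernel". */
static bool
demo_is_kernel_by_args(const struct demo_proc *p)
{
	assert(p != NULL);
	return (p->argc == 0);
}

/* Flag test: only the kernel-thread bit decides. */
static bool
demo_is_kernel_by_flag(const struct demo_proc *p)
{
	assert(p != NULL);
	return ((p->flag & DEMO_P_KTHREAD) != 0);
}
```

In this model a userland process started with an empty argv is
misclassified by the argument heuristic but not by the flag test, init
(P_SYSTEM only) is not classified as a kernel process by the flag test,
and the swapper is recognised by the flag test only once it carries the
kernel-thread bit.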
-- John Baldwin <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve" = http://www.FreeBSD.org From owner-freebsd-arch@FreeBSD.ORG Wed Mar 16 22:48:23 2005 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 66E2D16A4E0 for ; Wed, 16 Mar 2005 22:48:23 +0000 (GMT) Received: from mail21.sea5.speakeasy.net (mail21.sea5.speakeasy.net [69.17.117.23]) by mx1.FreeBSD.org (Postfix) with ESMTP id E810143D2F for ; Wed, 16 Mar 2005 22:48:22 +0000 (GMT) (envelope-from jhb@FreeBSD.org) Received: (qmail 8278 invoked from network); 16 Mar 2005 22:48:22 -0000 Received: from server.baldwin.cx ([216.27.160.63]) (envelope-sender )AES256-SHA encrypted SMTP for ; 16 Mar 2005 22:48:21 -0000 Received: from [10.50.40.202] (gw1.twc.weather.com [216.133.140.1]) (authenticated bits=0) by server.baldwin.cx (8.13.1/8.13.1) with ESMTP id j2GMm5Kv002222; Wed, 16 Mar 2005 17:48:16 -0500 (EST) (envelope-from jhb@FreeBSD.org) From: John Baldwin To: freebsd-arch@FreeBSD.org Date: Wed, 16 Mar 2005 17:49:24 -0500 User-Agent: KMail/1.6.2 References: <42380A1D.1010005@freebsd.org> In-Reply-To: <42380A1D.1010005@freebsd.org> MIME-Version: 1.0 Content-Disposition: inline Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <200503161749.24588.jhb@FreeBSD.org> X-Spam-Status: No, score=-102.8 required=4.2 tests=ALL_TRUSTED, USER_IN_WHITELIST autolearn=failed version=3.0.2 X-Spam-Checker-Version: SpamAssassin 3.0.2 (2004-11-16) on server.baldwin.cx cc: Doug Barton Subject: Re: Time to stop buildling named (and friends) by default in 6-current? 
X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Mar 2005 22:48:23 -0000 On Wednesday 16 March 2005 05:27 am, Doug Barton wrote: > Folks, > > Way back at the bsdcon in Foster City when we first started talking about > importing BIND 9 into the base we also talked about adding more knobs to > give users finer grained control over which bits of BIND were built, and > turning off the build of named (and associated binaries) by default. Well, > the first bit is done, so we're now in the position of being able to flip > the NO_BIND_NAMED knob (see make.conf(5) for details) to WITH_BIND_NAMED, > and turn it off by default. Is this something that we're still interested > in doing? If so, this would be a good time to do it, since I'll be > importing 9.3.1 sometime in the next couple days (first round of make world > testing is underway), and we're still early in the life of 6-current. > > Of course, this would only be for 6-current, we wouldn't change the > behavior in RELENG_[45]. > > What do you think? If we are going to do this, then why not just have users install bind from ports and only install the client as part of the base system? This is what we do with DHCP for example. Basically, if it's going to be an optional component, I think it belongs in ports, not the /usr/src. 
-- John Baldwin <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve" = http://www.FreeBSD.org From owner-freebsd-arch@FreeBSD.ORG Thu Mar 17 01:57:26 2005 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id B700A16A4CE for ; Thu, 17 Mar 2005 01:57:26 +0000 (GMT) Received: from wproxy.gmail.com (wproxy.gmail.com [64.233.184.203]) by mx1.FreeBSD.org (Postfix) with ESMTP id 4D02D43D41 for ; Thu, 17 Mar 2005 01:57:26 +0000 (GMT) (envelope-from peadar.edwards@gmail.com) Received: by wproxy.gmail.com with SMTP id 67so87523wri for ; Wed, 16 Mar 2005 17:57:23 -0800 (PST) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:message-id:date:from:reply-to:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:references; b=To3k79h7pMUxxX24x5EySNObLEAWZshmNUt7+P9OlEWSx1bNTUonpKvtB1apz/2zXI8Xzcl+tK0JHtj8+vosFn8XPnH7vCY4Aeb/TOSa9aCywelFPAnaed6gZZGBuCEUc5S6u8bPwnO+MqHnFKnv1VNZRvitc8nCd2AGSosXDx0= Received: by 10.54.62.9 with SMTP id k9mr83272wra; Wed, 16 Mar 2005 17:57:23 -0800 (PST) Received: by 10.54.57.20 with HTTP; Wed, 16 Mar 2005 17:57:23 -0800 (PST) Message-ID: <34cb7c8405031617572074b070@mail.gmail.com> Date: Thu, 17 Mar 2005 01:57:23 +0000 From: Peter Edwards To: "M. 
Warner Losh" In-Reply-To: <20050316.143311.01015387.imp@bsdimp.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit References: <42335A52.9060208@freebsd.org> <200503151738.56185.peter@wemm.org> <20050316.143311.01015387.imp@bsdimp.com> cc: kientzle@freebsd.org cc: freebsd-arch@freebsd.org Subject: Re: Removing gtar from base X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list Reply-To: Peter Edwards List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Mar 2005 01:57:26 -0000 On Wed, 16 Mar 2005 14:33:11 -0700 (MST), M. Warner Losh wrote: > In message: <200503151738.56185.peter@wemm.org> > Peter Wemm writes: > : By the way, I keep finding myself suprised that I don't notice that I'm > : using bsdtar instead of gtar. ... > So far we've only noticed because someone added obscure gnu-tar > options to one of the scripts we use to build flashes... > Someone changed tar? 
:-) From owner-freebsd-arch@FreeBSD.ORG Thu Mar 17 02:01:59 2005 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id E3C7916A4CE; Thu, 17 Mar 2005 02:01:58 +0000 (GMT) Received: from mailout2.pacific.net.au (mailout2.pacific.net.au [61.8.0.85]) by mx1.FreeBSD.org (Postfix) with ESMTP id 371CC43D31; Thu, 17 Mar 2005 02:01:58 +0000 (GMT) (envelope-from bde@zeta.org.au) Received: from mailproxy1.pacific.net.au (mailproxy1.pacific.net.au [61.8.0.86])j2H21uHn007662; Thu, 17 Mar 2005 13:01:56 +1100 Received: from katana.zip.com.au (katana.zip.com.au [61.8.7.246]) j2H21mS5024600; Thu, 17 Mar 2005 13:01:54 +1100 Date: Thu, 17 Mar 2005 13:01:47 +1100 (EST) From: Bruce Evans X-X-Sender: bde@delplex.bde.org To: John Baldwin In-Reply-To: <200503161748.02353.jhb@FreeBSD.org> Message-ID: <20050317124633.M72560@delplex.bde.org> References: <20050315125136.GH9291@darkness.comp.waw.pl> <200503161748.02353.jhb@FreeBSD.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed cc: Pawel Jakub Dawidek cc: freebsd-arch@freebsd.org Subject: Re: System processes recognition. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Mar 2005 02:01:59 -0000 On Wed, 16 Mar 2005, John Baldwin wrote: > On Tuesday 15 March 2005 07:51 am, Pawel Jakub Dawidek wrote: >> I found, that there is no way to know if the given process is a system >> (kernel) process or not: >> ... 
>> The easiest way to fix it, is to add P_KTHREAD flag to the swapper, I
>> think:
>>
>> --- init_main.c 17 Feb 2005 10:00:09 -0000 1.255
>> +++ init_main.c 15 Mar 2005 12:48:04 -0000
>> @@ -365,7 +365,7 @@ proc0_init(void *dummy __unused)
>>  	session0.s_leader = p;
>>
>>  	p->p_sysent = &null_sysvec;
>> -	p->p_flag = P_SYSTEM;
>> +	p->p_flag = P_SYSTEM | P_KTHREAD;
>>  	p->p_sflag = PS_INMEM;
>>  	p->p_state = PRS_NORMAL;
>>  	knlist_init(&p->p_klist, &p->p_mtx);
>>
>> Opinions?
>
> I think this is ok. Ask bde@, he might say that P_SYSTEM should be removed
> from init. (Can't remember if he is in favor of that or not.)

P_SYSTEM for init is bogus since it breaks at least procfs for init.
procfs may need to be disabled for init for security reasons, but it
shouldn't be disabled unconditionally. I mainly noticed /proc/1/map
not existing. There should be flags like P_KTHREAD as needed to make
the properties of init independent.

I use the following patches to mostly just remove the setting of P_SYSTEM
for init.

%%%
Index: init_main.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/init_main.c,v
retrieving revision 1.243
diff -u -2 -r1.243 init_main.c
--- init_main.c 16 Jun 2004 00:26:29 -0000 1.243
+++ init_main.c 16 Jun 2004 05:56:22 -0000
@@ -697,8 +686,8 @@
 		panic("cannot fork init: %d\n", error);
 	KASSERT(initproc->p_pid == 1, ("create_init: initproc->p_pid != 1"));
-	/* divorce init's credentials from the kernel's */
+
+	/* Divorce init's credentials from the kernel's. */
 	newcred = crget();
 	PROC_LOCK(initproc);
-	initproc->p_flag |= P_SYSTEM;
 	oldcred = initproc->p_ucred;
 	crcopy(newcred, oldcred);
@@ -710,7 +699,5 @@
 	crfree(oldcred);
 	cred_update_thread(FIRST_THREAD_IN_PROC(initproc));
-	mtx_lock_spin(&sched_lock);
-	initproc->p_sflag |= PS_INMEM;
-	mtx_unlock_spin(&sched_lock);
+
 	cpu_set_fork_handler(FIRST_THREAD_IN_PROC(initproc), start_init, NULL);
 }
Index: kern_sig.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/kern_sig.c,v
retrieving revision 1.281
diff -u -2 -r1.281 kern_sig.c
--- kern_sig.c 11 Jun 2004 11:16:23 -0000 1.281
+++ kern_sig.c 2 Oct 2004 13:36:09 -0000
@@ -360,5 +348,5 @@
 	 * is forbidden to set SA_NOCLDWAIT.
 	 */
-	if (p->p_pid == 1)
+	if (p == initproc)
 		ps->ps_flag &= ~PS_NOCLDWAIT;
 	else
@@ -1312,5 +1297,5 @@
 	LIST_FOREACH(p, &allproc, p_list) {
 		PROC_LOCK(p);
-		if (p->p_pid <= 1 || p->p_flag & P_SYSTEM ||
+		if (p == initproc || p->p_flag & P_SYSTEM ||
 		    p == td->td_proc) {
 			PROC_UNLOCK(p);
@@ -1343,5 +1328,5 @@
 	LIST_FOREACH(p, &pgrp->pg_members, p_pglist) {
 		PROC_LOCK(p);
-		if (p->p_pid <= 1 || p->p_flag & P_SYSTEM) {
+		if (p == initproc || p->p_flag & P_SYSTEM) {
 			PROC_UNLOCK(p);
 			continue;
@@ -2127,5 +2153,5 @@
 	 * Don't take default actions on system processes.
 	 */
-	if (p->p_pid <= 1) {
+	if (p == initproc || p->p_flag & P_SYSTEM) {
 #ifdef DIAGNOSTIC
 	/*
%%%

PS_INMEM should be inherited on fork() like it is for other user
processes. I think it is always set after fork, so setting it in the
above never did anything; in particular it never forced init to stay
in memory, but setting P_SYSTEM did that and forced PS_INMEM to stay
set as a side effect.

All tests of P_SYSTEM need to be looked at to see if they affect init.
I could only find the ones above that are close to being problems.
Since they already had a separate (slightly wrong) test for init, they
don't need to be changed; however, the tests are bogus if P_SYSTEM is
set for init.
(p->p_pid <= 1) is also satisfied for pid 0, but the P_SYSTEM part of the tests always succeeds for pid 0. Bruce From owner-freebsd-arch@FreeBSD.ORG Thu Mar 17 11:01:19 2005 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 6917016A4CE; Thu, 17 Mar 2005 11:01:19 +0000 (GMT) Received: from rwcrmhc11.comcast.net (rwcrmhc14.comcast.net [216.148.227.89]) by mx1.FreeBSD.org (Postfix) with ESMTP id 04F4343D46; Thu, 17 Mar 2005 11:01:19 +0000 (GMT) (envelope-from dougb@freebsd.org) Received: from [192.168.0.3] (c-24-130-110-32.we.client2.attbi.com[24.130.110.32]) by comcast.net (rwcrmhc14) with ESMTP id <2005031711011801400mv3h0e>; Thu, 17 Mar 2005 11:01:18 +0000 Message-ID: <423963C6.10903@FreeBSD.org> Date: Thu, 17 Mar 2005 03:02:30 -0800 From: Doug Barton Organization: http://www.FreeBSD.org/ User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206) X-Accept-Language: en-us, en MIME-Version: 1.0 To: John Baldwin References: <42380A1D.1010005@freebsd.org> <200503161749.24588.jhb@FreeBSD.org> In-Reply-To: <200503161749.24588.jhb@FreeBSD.org> X-Enigmail-Version: 0.90.1.1 X-Enigmail-Supports: pgp-inline, pgp-mime Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit cc: freebsd-arch@FreeBSD.org Subject: Re: Time to stop buildling named (and friends) by default in 6-current? X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Mar 2005 11:01:19 -0000 John Baldwin wrote: > On Wednesday 16 March 2005 05:27 am, Doug Barton wrote: > If we are going to do this, then why not just have users install bind from > ports and only install the client as part of the base system? This is what > we do with DHCP for example. 
Basically, if it's going to be an optional > component, I think it belongs in ports, not the /usr/src.

I have a certain sympathy with that position; however, the two situations
are a bit different. With dhcp you're only talking about a couple of
binaries. For reference, here's the relevant section of make.conf(5):

     NO_BIND_DNSSEC
             (bool) Set to avoid building or installing the DNSSEC related
             binaries, dnssec-keygen(8) and dnssec-signzone(8).

     NO_BIND_ETC
             (bool) Set to avoid installing the default files to
             /var/named/etc/namedb.

     NO_BIND_LIBS_LWRES
             (bool) Set to avoid installing the lightweight resolver
             library in /usr/lib. The library that is private to the
             build system may still be built as needed.

     NO_BIND_MTREE
             (bool) Set to avoid running mtree(8) to create the chroot
             directory structure under /var/named, and avoid creating an
             /etc/namedb symlink to the chroot directory. This option
             should typically be used together with NO_BIND_ETC.

     NO_BIND_NAMED
             (bool) Set to avoid building or installing named(8),
             named.reload(8), named-checkconf(8), named-checkzone(8),
             rndc(8), and rndc-confgen(8).

     NO_BIND_UTILS
             (bool) Set to avoid building or installing the BIND userland
             utilities, dig(1), host(1), nslookup(1), and nsupdate(8).

     WITH_BIND_LIBS
             (bool) Set to install BIND libraries and include files.

The community has said that they want to keep everything that's in _UTILS
in the base. We've already made installing the libraries optional, except
for lwres which nectar has plans for. So we could lose the binaries under
named, and probably dnssec as well, but because the lwresd daemon uses a
lot of the same code as named, we can't get rid of the sources.

So in the end, my view of the thing is that we're better off having the
whole thing in the tree, but defaulting the parts that are less likely to
be used to off. But, I'm willing to listen to other arguments.
Doug -- This .signature sanitized for your protection From owner-freebsd-arch@FreeBSD.ORG Thu Mar 17 13:14:08 2005 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 05F3F16A4D7; Thu, 17 Mar 2005 13:14:08 +0000 (GMT) Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21]) by mx1.FreeBSD.org (Postfix) with ESMTP id DD8F643D39; Thu, 17 Mar 2005 13:14:07 +0000 (GMT) (envelope-from davidxu@freebsd.org) Received: from [127.0.0.1] (davidxu@localhost [127.0.0.1]) by freefall.freebsd.org (8.13.3/8.13.3) with ESMTP id j2HDDw57048861; Thu, 17 Mar 2005 13:14:02 GMT (envelope-from davidxu@freebsd.org) Message-ID: <4239829D.5030202@freebsd.org> Date: Thu, 17 Mar 2005 21:14:05 +0800 From: David Xu User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.7.2) Gecko/20041004 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Bruce Evans References: <20050315125136.GH9291@darkness.comp.waw.pl> <200503161748.02353.jhb@FreeBSD.org> <20050317124633.M72560@delplex.bde.org> In-Reply-To: <20050317124633.M72560@delplex.bde.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit cc: Pawel Jakub Dawidek cc: freebsd-arch@freebsd.org Subject: Re: System processes recognition. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Mar 2005 13:14:08 -0000 Bruce Evans wrote: > > P_SYSTEM for init is bogus since it breaks at least procfs for init. > procfs may need to be disabled for init for security reasons, but it > shouldn't be disabled unconditionally. I mainly noticed /proc/1/map > not existing. > > There should be flags like P_KTHREAD as needed to make the properties > of init independent. 
>
> I use the following patches to mostly just remove the setting of P_SYSTEM
> for init.

Removing P_SYSTEM for init will cause it to be swapped out under heavy
memory pressure. We are unlikely to want init swapped out; otherwise
zombies cannot be recycled immediately. Does anyone know whether init is
already locked into memory, e.g., by PHOLD?

David Xu
From owner-freebsd-arch@FreeBSD.ORG Thu Mar 17 15:10:38 2005 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4A87716A4CE; Thu, 17 Mar 2005 15:10:38 +0000 (GMT) Received: from pooker.samsco.org (pooker.samsco.org [168.103.85.57]) by mx1.FreeBSD.org (Postfix) with ESMTP id C96D743D39; Thu, 17 Mar 2005 15:10:37 +0000 (GMT) (envelope-from scottl@samsco.org) Received: from [192.168.254.11] (junior-wifi.samsco.home [192.168.254.11]) (authenticated bits=0) by pooker.samsco.org (8.13.1/8.13.1) with ESMTP id j2HFAKHK032004; Thu, 17 Mar 2005 08:10:20 -0700 (MST) (envelope-from scottl@samsco.org) Message-ID: <42399D58.3040000@samsco.org> Date: Thu, 17 Mar 2005 08:08:08 -0700 From: Scott Long User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.7.5) Gecko/20050218 X-Accept-Language: en-us, en MIME-Version: 1.0 To: John Baldwin References: <42380A1D.1010005@freebsd.org> <200503161749.24588.jhb@FreeBSD.org> In-Reply-To: <200503161749.24588.jhb@FreeBSD.org> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-2.8 required=3.8 tests=ALL_TRUSTED autolearn=failed version=3.0.2 X-Spam-Checker-Version: SpamAssassin 3.0.2 (2004-11-16) on pooker.samsco.org cc: Doug Barton cc: freebsd-arch@freebsd.org Subject: Re: Time to stop buildling named (and friends) by default in 6-current?
X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Mar 2005 15:10:38 -0000 John Baldwin wrote: > On Wednesday 16 March 2005 05:27 am, Doug Barton wrote: > >>Folks, >> >>Way back at the bsdcon in Foster City when we first started talking about >>importing BIND 9 into the base we also talked about adding more knobs to >>give users finer grained control over which bits of BIND were built, and >>turning off the build of named (and associated binaries) by default. Well, >>the first bit is done, so we're now in the position of being able to flip >>the NO_BIND_NAMED knob (see make.conf(5) for details) to WITH_BIND_NAMED, >>and turn it off by default. Is this something that we're still interested >>in doing? If so, this would be a good time to do it, since I'll be >>importing 9.3.1 sometime in the next couple days (first round of make world >>testing is underway), and we're still early in the life of 6-current. >> >>Of course, this would only be for 6-current, we wouldn't change the >>behavior in RELENG_[45]. >> >>What do you think? > > > If we are going to do this, then why not just have users install bind from > ports and only install the client as part of the base system? This is what > we do with DHCP for example. Basically, if it's going to be an optional > component, I think it belongs in ports, not the /usr/src. > I agree here, though maybe the argument is moot now that Doug imported 9.3.1 last night? Not changing the status quo is ok too. 
Scott From owner-freebsd-arch@FreeBSD.ORG Thu Mar 17 16:01:59 2005 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 41CA716A4CE; Thu, 17 Mar 2005 16:01:59 +0000 (GMT) Received: from VARK.MIT.EDU (VARK.MIT.EDU [18.95.3.179]) by mx1.FreeBSD.org (Postfix) with ESMTP id D193343D46; Thu, 17 Mar 2005 16:01:58 +0000 (GMT) (envelope-from das@FreeBSD.ORG) Received: from VARK.MIT.EDU (localhost [127.0.0.1]) by VARK.MIT.EDU (8.13.3/8.13.1) with ESMTP id j2HG1skH004844; Thu, 17 Mar 2005 11:01:54 -0500 (EST) (envelope-from das@FreeBSD.ORG) Received: (from das@localhost) by VARK.MIT.EDU (8.13.3/8.13.1/Submit) id j2HG1rWl004843; Thu, 17 Mar 2005 11:01:53 -0500 (EST) (envelope-from das@FreeBSD.ORG) Date: Thu, 17 Mar 2005 11:01:53 -0500 From: David Schultz To: "M. Warner Losh" Message-ID: <20050317160153.GA4766@VARK.MIT.EDU> Mail-Followup-To: "M. Warner Losh" , peter@wemm.org, kientzle@FreeBSD.ORG, freebsd-arch@FreeBSD.ORG References: <42335A52.9060208@freebsd.org> <200503151738.56185.peter@wemm.org> <20050316.143311.01015387.imp@bsdimp.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050316.143311.01015387.imp@bsdimp.com> cc: kientzle@FreeBSD.ORG cc: freebsd-arch@FreeBSD.ORG Subject: Re: Removing gtar from base X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Mar 2005 16:01:59 -0000 On Wed, Mar 16, 2005, M. Warner Losh wrote: > In message: <200503151738.56185.peter@wemm.org> > Peter Wemm writes: > : By the way, I keep finding myself suprised that I don't notice that I'm > : using bsdtar instead of gtar. 
It works so well as a transparent > : replacement for gtar (that my fingers know the args/switches/etc) that > : I keep forgetting that I'm using it. I only notice when I read a help > : message or the man page. > > So far we've only noticed because someone added obscure gnu-tar > options to one of the scripts we use to build flashes... I notice when I accidentally type 'tar xzf foo.zip' or 'tar xzf foo.tar' or 'tar xf foo.tgz' and it works anyway. :-) From owner-freebsd-arch@FreeBSD.ORG Thu Mar 17 19:17:03 2005 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id D23D616A4CE; Thu, 17 Mar 2005 19:17:03 +0000 (GMT) Received: from sccrmhc12.comcast.net (sccrmhc12.comcast.net [204.127.202.56]) by mx1.FreeBSD.org (Postfix) with ESMTP id 44D4143D5E; Thu, 17 Mar 2005 19:17:03 +0000 (GMT) (envelope-from dougb@freebsd.org) Received: from [192.168.0.4] (c-24-130-110-32.we.client2.attbi.com[24.130.110.32]) by comcast.net (sccrmhc12) with ESMTP id <2005031719170201200e5tf5e>; Thu, 17 Mar 2005 19:17:02 +0000 Message-ID: <4239D7AD.7050004@freebsd.org> Date: Thu, 17 Mar 2005 11:17:01 -0800 From: Doug Barton Organization: http://www.FreeBSD.org User-Agent: Mozilla Thunderbird 1.0 (X11/20050316) X-Accept-Language: en-us, en MIME-Version: 1.0 To: Scott Long References: <42380A1D.1010005@freebsd.org> <200503161749.24588.jhb@FreeBSD.org> <42399D58.3040000@samsco.org> In-Reply-To: <42399D58.3040000@samsco.org> X-Enigmail-Version: 0.89.6.0 X-Enigmail-Supports: pgp-inline, pgp-mime Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit cc: freebsd-arch@freebsd.org Subject: Re: Time to stop buildling named (and friends) by default in 6-current? 
X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Mar 2005 19:17:03 -0000 Scott Long wrote: > John Baldwin wrote: >> If we are going to do this, then why not just have users install bind >> from ports and only install the client as part of the base system? >> This is what we do with DHCP for example. Basically, if it's going to >> be an optional component, I think it belongs in ports, not the /usr/src. >> > > I agree here, though maybe the argument is moot now that Doug imported > 9.3.1 last night? Not changing the status quo is ok too. Scott, did you see my response to John's post? I don't consider any of this a done deal, but I had to get 9.3.1 in the tree asap in order to try and make an MFC before 5.4 goes out. If we collectively decide to strip named and friends out of the base, we can still do that. I know how to remove files from the vendor branch now. 
:) Doug -- This .signature sanitized for your protection From owner-freebsd-arch@FreeBSD.ORG Fri Mar 18 09:41:06 2005 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 0F89416A4CE; Fri, 18 Mar 2005 09:41:06 +0000 (GMT) Received: from mailout1.pacific.net.au (mailout1.pacific.net.au [61.8.0.84]) by mx1.FreeBSD.org (Postfix) with ESMTP id 656F943D31; Fri, 18 Mar 2005 09:41:05 +0000 (GMT) (envelope-from bde@zeta.org.au) Received: from mailproxy1.pacific.net.au (mailproxy1.pacific.net.au [61.8.0.86])j2I9f4A6030952; Fri, 18 Mar 2005 20:41:04 +1100 Received: from epsplex.bde.org (katana.zip.com.au [61.8.7.246]) j2I9f1S5022138; Fri, 18 Mar 2005 20:41:02 +1100 Date: Fri, 18 Mar 2005 20:41:00 +1100 (EST) From: Bruce Evans X-X-Sender: bde@epsplex.bde.org To: David Xu In-Reply-To: <4239829D.5030202@freebsd.org> Message-ID: <20050318201454.Q1050@epsplex.bde.org> References: <20050315125136.GH9291@darkness.comp.waw.pl> <200503161748.02353.jhb@FreeBSD.org><4239829D.5030202@freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed cc: Pawel Jakub Dawidek cc: freebsd-arch@freebsd.org Subject: Re: System processes recognition. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Mar 2005 09:41:06 -0000 On Thu, 17 Mar 2005, David Xu wrote: > Bruce Evans wrote: >> >> P_SYSTEM for init is bogus since it breaks at least procfs for init. >> procfs may need to be disabled for init for security reasons, but it >> shouldn't be disabled unconditionally. I mainly noticed /proc/1/map >> not existing. >> >> There should be flags like P_KTHREAD as needed to make the properties >> of init independent. 
>
> Removing P_SYSTEM for init will cause it to be swapped out under heavy
> memory pressure. We are unlikely to want init swapped out; otherwise
> zombies cannot be recycled immediately. Does anyone know whether init is
> already locked into memory, e.g., by PHOLD?

As I said, there should be flags like P_KTHREAD to control this
independently. Perhaps 2 flags to control swapouts and pageouts. Only
the stack pages are swapped out and the stack is a small part of the
process, so for init it is more important to prevent pageouts.

I think PHOLD() only affects swapouts. The comment about it in proc.h
doesn't say what it does -- the comment says that PHOLD() holds the
U-area in memory, but now there isn't even a U-area.

There is an explicit test for init in the pageout daemon. I think this
prevents init being paged out, so with my removal of P_SYSTEM for init,
init has the strange property of being swappable but not being pageable.
The test for init has the same hard-coded assumption on init's pid that
I fixed in kern_sig.c in my previous patch in this thread, and there is
a worse hard-coded assumption on pids in the same expression:

%%%
Index: vm_pageout.c
===================================================================
RCS file: /home/ncvs/src/sys/vm/vm_pageout.c,v
retrieving revision 1.268
diff -u -2 -r1.268 vm_pageout.c
--- vm_pageout.c 7 Jan 2005 02:29:27 -0000 1.268
+++ vm_pageout.c 18 Mar 2005 09:15:09 -0000
@@ -1193,6 +1237,7 @@
                 /*
                  * If this is a system or protected process, skip it.
+                 * XXX: unfixed: all style bugs, some pid magic (48).
                  */
-                if ((p->p_flag & P_SYSTEM) || (p->p_pid == 1) ||
+                if ((p->p_flag & P_SYSTEM) || (p == initproc) ||
                     (p->p_flag & P_PROTECTED) ||
                     ((p->p_pid < 48) && (swap_pager_avail != 0))) {
%%%

Bruce
From owner-freebsd-arch@FreeBSD.ORG Fri Mar 18 12:16:46 2005 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8F13A16A4CE; Fri, 18 Mar 2005 12:16:46 +0000 (GMT) Received: from cyrus.watson.org (cyrus.watson.org [204.156.12.53]) by mx1.FreeBSD.org (Postfix) with ESMTP id 5528A43D55; Fri, 18 Mar 2005 12:16:46 +0000 (GMT) (envelope-from robert@fledge.watson.org) Received: from fledge.watson.org (fledge.watson.org [204.156.12.50]) by cyrus.watson.org (Postfix) with SMTP id 9F7E846B1A; Fri, 18 Mar 2005 07:16:45 -0500 (EST) Date: Fri, 18 Mar 2005 12:14:03 +0000 (GMT) From: Robert Watson X-Sender: robert@fledge.watson.org To: Doug Barton In-Reply-To: <4239D7AD.7050004@freebsd.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: freebsd-arch@freebsd.org Subject: Re: Time to stop buildling named (and friends) by default in 6-current? X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Mar 2005 12:16:46 -0000 On Thu, 17 Mar 2005, Doug Barton wrote: > Scott Long wrote: > > John Baldwin wrote: > > >> If we are going to do this, then why not just have users install bind > >> from ports and only install the client as part of the base system? > >> This is what we do with DHCP for example. Basically, if it's going to > >> be an optional component, I think it belongs in ports, not the /usr/src. > > > > I agree here, though maybe the argument is moot now that Doug imported > > 9.3.1 last night? Not changing the status quo is ok too.
> > Scott, did you see my response to John's post? I don't consider any of > this a done deal, but I had to get 9.3.1 in the tree asap in order to > try and make an MFC before 5.4 goes out. If we collectively decide to > strip named and friends out of the base, we can still do that. I know > how to remove files from the vendor branch now. :) Personally, I'm something of a fan of keeping the complete BIND in the base tree as is -- built by default, but not started at boot by default. It's well-maintained, historically "BSD", and probably widely used as such. Robert N M Watson From owner-freebsd-arch@FreeBSD.ORG Sat Mar 19 22:55:14 2005 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 60B0216A4CE; Sat, 19 Mar 2005 22:55:14 +0000 (GMT) Received: from gateway.nixsys.be (gateway.nixsys.be [195.144.77.33]) by mx1.FreeBSD.org (Postfix) with ESMTP id 9A9FE43D39; Sat, 19 Mar 2005 22:55:13 +0000 (GMT) (envelope-from philip@paeps.cx) Received: from wotan.home.paeps.cx (wotan.home.paeps.cx [IPv6:2001:838:37f:10:a00:20ff:fe9b:138c]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "wotan.home.paeps.cx", Issuer "NixSys CA" (verified OK)) by gateway.nixsys.be (Postfix) with ESMTP id 5EF13C0E8; Sat, 19 Mar 2005 23:55:10 +0100 (CET) Received: from fasolt.home.paeps.cx (fasolt.home.paeps.cx [IPv6:2001:838:37f:10:20a:e6ff:fe7d:c08]) by wotan.home.paeps.cx (Postfix) with ESMTP id 04B6C6199; Sat, 19 Mar 2005 23:55:08 +0100 (CET) Received: from fasolt.home.paeps.cx (philip@localhost [127.0.0.1]) by fasolt.home.paeps.cx (8.13.3/8.13.3) with ESMTP id j2JMt8bq078956; Sat, 19 Mar 2005 23:55:08 +0100 (CET) (envelope-from philip@fasolt.home.paeps.cx) Received: (from philip@localhost) by fasolt.home.paeps.cx (8.13.3/8.13.3/Submit) id j2JMt7ks078955; Sat, 19 Mar 2005 23:55:07 +0100 (CET) (envelope-from philip) Date: Sat, 19 Mar 2005 23:55:07 +0100 From: Philip 
Paeps To: Robert Watson Message-ID: <20050319225507.GH60989@fasolt.home.paeps.cx> Mail-Followup-To: Robert Watson , Doug Barton , freebsd-arch@freebsd.org References: <4239D7AD.7050004@freebsd.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Date-in-Rome: ante diem XIV Kalendas Apriles MMDCCLVIII ab Urbe Condida X-PGP-Fingerprint: FA74 3C27 91A6 79D5 F6D3 FC53 BF4B D0E6 049D B879 X-Message-Flag: Get a proper mailclient! Organization: Happily Disorganized User-Agent: Mutt/1.5.9i cc: Doug Barton cc: freebsd-arch@FreeBSD.org Subject: Re: Time to stop buildling named (and friends) by default in 6-current? X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 19 Mar 2005 22:55:14 -0000 On 2005-03-18 12:14:03 (+0000), Robert Watson wrote: > On Thu, 17 Mar 2005, Doug Barton wrote: > > Scott Long wrote: > > > John Baldwin wrote: > > > > If we are going to do this, then why not just have users install bind > > > > from ports and only install the client as part of the base system? > > > > This is what we do with DHCP for example. Basically, if it's going to > > > > be an optional component, I think it belongs in ports, not the > > > > /usr/src. > > > > > > I agree here, though maybe the argument is moot now that Doug imported > > > 9.3.1 last night? Not changing the status quo is ok too. > > > > Scott, did you see my response to John's post? I don't consider any of > > this a done deal, but I had to get 9.3.1 in the tree asap in order to try > > and make an MFC before 5.4 goes out. If we collectively decide to strip > > named and friends out of the base, we can still do that. I know how to > > remove files from the vendor branch now. 
:) > > Personally, I'm something of a fan of keeping the complete BIND in the base > tree as is -- built by default, but not started at boot by default. It's > well-maintained, historically "BSD", and probably widely used as such. I agree with this. I wasn't very fond of BIND 8, but I've changed my mind after BIND 9 :-) It's a bit like sendmail -- very 'historically' BSD, and just something one expects to 'be there' in a complete way. Like sendmail, it's also very well maintained, which is an argument in favour of keeping it the way it is. - Philip -- Philip Paeps Please don't Cc me, I am philip@freebsd.org subscribed to the list. If you can't measure it, I'm not interested.