From owner-freebsd-arch Sun Feb 24 12:20:36 2002
Delivered-To: freebsd-arch@freebsd.org
Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16])
        by hub.freebsd.org (Postfix) with ESMTP
        id 9D0C537B400; Sun, 24 Feb 2002 12:20:33 -0800 (PST)
Received: from bde.zeta.org.au (bde.zeta.org.au [203.2.228.102])
        by mailman.zeta.org.au (8.9.3/8.8.7) with ESMTP id HAA16123;
        Mon, 25 Feb 2002 07:20:30 +1100
Date: Mon, 25 Feb 2002 07:20:50 +1100 (EST)
From: Bruce Evans
X-X-Sender:
To: Ruslan Ermilov
Cc: , Bruce Evans , Pekka Savola
Subject: Re: MAKEOBJDIRPREFIX
In-Reply-To: <20020222080208.GB81821@sunbay.com>
Message-ID: <20020225065837.F34027-100000@gamplex.bde.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.ORG

On Fri, 22 Feb 2002, Ruslan Ermilov wrote:

> If I am reading the POSIX specs correctly, MAKEOBJDIRPREFIX could
> be easily made to be honoured even if specified on a command line.
> ...
> The attached patch merely moves the `objdir' initialization below
> the MainParseArgs() call, after all command line arguments have
> already been parsed.

Good idea.

> To be honest, the current behavior does not contradict POSIX
> (which does not say anything about MAKEOBJDIR[PREFIX]), but the
> proposed behavior would help users erroneously attempting to
> set MAKEOBJDIRPREFIX on a command line.

POSIX is mainly specifying the effect of macros on user makefiles, but
I think setting macros on the command line should affect all of our make
configuration files if this is not obviously wrong.

> (It still does not work
> if MAKEOBJDIRPREFIX is set as a make's global.)

Do you mean globals in the make configuration files?

Bruce

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message

From owner-freebsd-arch Sun Feb 24 12:43:11 2002
Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2])
        by hub.freebsd.org (Postfix) with ESMTP id 3D1D537B402
        for ; Sun, 24 Feb 2002 12:43:01 -0800 (PST)
Received: (from dillon@localhost)
        by apollo.backplane.com (8.11.6/8.9.1) id g1OKfXt95731;
        Sun, 24 Feb 2002 12:41:33 -0800 (PST)
        (envelope-from dillon)
Date: Sun, 24 Feb 2002 12:41:33 -0800 (PST)
From: Matthew Dillon
Message-Id: <200202242041.g1OKfXt95731@apollo.backplane.com>
To: Seigo Tanimura
Cc: arch@FreeBSD.ORG, Seigo Tanimura
Subject: Re: reclaiming v_data of free vnodes
References: <200202231556.g1NFu9N9040749@silver.carrots.uucp.r.dl.itc.u-tokyo.ac.jp>

:Thanks to vnlru, cvsup.jp.FreeBSD.org is keeping the number of vnodes
:to quite a sane value. (about 330K out of which 190K are in use)
:
:The next problem is the overuse of v_data. vmstat(1) at the uptime of
:about 24 hours says:
:
:         Type  InUse MemUse HighUse   Limit Requests Limit Limit Size(s)
:     FFS node 323549 80888K  80897K 102400K 29347661     0     0 256
:
:which is almost the same as the number of the total vnodes. (in
:cvsup.jp.FreeBSD.org, almost all in-use vnodes are actually inodes)
:
:This seems due to vrele() and vput() not calling VOP_RECLAIM(). One
:solution is to always reclaim a vnode in vrele()/vput(), while we can
:also run a kernel thread to scan the free vnodes and reclaim some of
:them. Which one would be better, or are there any other ways?
:
:Any comments are welcome.
:
:--
:Seigo Tanimura

    Well, you definitely do not want vrele() or vput() to call
    VOP_RECLAIM() directly or you will blow the VFS cache (aka namei
    cache).
    330,000 vnodes and/or inodes is pushing what a kernel with only 1G
    of KVM can handle.  For these machines you may want to change the
    kernel start address from 0xc0000000 (1G of KVM) to 0x80000000 (2G
    of KVM).  I forget exactly how that is done.

    Did kern.maxvnodes auto-size to 330,000 or did you set it up there
    manually?  Or is kern.maxvnodes set lower and it blew it out on its
    own due to load?

                                        -Matt

From owner-freebsd-arch Sun Feb 24 19:26: 8 2002
Received: from rina.r.dl.itc.u-tokyo.ac.jp (rina.r.dl.itc.u-tokyo.ac.jp [133.11.199.247])
        by hub.freebsd.org (Postfix) with ESMTP id 3759037B402
        for ; Sun, 24 Feb 2002 19:26:05 -0800 (PST)
Received: from silver.carrots.uucp.r.dl.itc.u-tokyo.ac.jp (silver.carrots.uucp.r.dl.itc.u-tokyo.ac.jp [IPv6:3ffe:b80:5b0:3:280:c8ff:fe6b:6d73])
        by rina.r.dl.itc.u-tokyo.ac.jp (8.12.2/3.7W-rina.r-Nankai-Koya) with ESMTP id g1P3Q1kg040648 ;
        Mon, 25 Feb 2002 12:26:02 +0900 (JST)
Received: from silver.carrots.uucp.r.dl.itc.u-tokyo.ac.jp (localhost [127.0.0.1])
        by silver.carrots.uucp.r.dl.itc.u-tokyo.ac.jp (8.12.2/3.7W-carrots-Keikyu-Kurihama) with ESMTP id g1P3PVN9092431 ;
        Mon, 25 Feb 2002 12:25:59 +0900 (JST)
Message-Id: <200202250325.g1P3PVN9092431@silver.carrots.uucp.r.dl.itc.u-tokyo.ac.jp>
Date: Mon, 25 Feb 2002 12:25:30 +0900
From: Seigo Tanimura
To: Matthew Dillon
Cc: Seigo Tanimura , arch@FreeBSD.ORG
Subject: Re: reclaiming v_data of free vnodes
In-Reply-To: <200202242041.g1OKfXt95731@apollo.backplane.com>
References: <200202231556.g1NFu9N9040749@silver.carrots.uucp.r.dl.itc.u-tokyo.ac.jp>
        <200202242041.g1OKfXt95731@apollo.backplane.com>
User-Agent: Wanderlust/2.8.1 (Something) SEMI/1.14.3 (Ushinoya)
        FLIM/1.14.3 (=?ISO-8859-1?Q?Unebigory=F2mae?=) APEL/10.3
        MULE XEmacs/21.1 (patch 14) (Cuyahoga Valley) (i386--freebsd)
Organization: Digital Library Research Division, Information Technology Centre,
        The University of Tokyo
MIME-Version: 1.0 (generated by SEMI 1.14.3 - "Ushinoya")
Content-Type: text/plain; charset=US-ASCII

On Sun, 24 Feb 2002 12:41:33 -0800 (PST),
  Matthew Dillon said:

Matthew> cache).  330,000 vnodes and/or inodes is pushing what a kernel
Matthew> with only 1G of KVM can handle.  For these machines you may want
Matthew> to change the kernel start address from 0xc0000000 (1G of KVM) to
Matthew> 0x80000000 (2G of KVM).  I forget exactly how that is done.

Increasing KVM is not likely to help.  The panic message on Friday
night was something like this:

kmem_malloc(256): kmem_map too small: (~=200M) total allocated

in kmem_malloc() called by ffs_vget().

It may help me to expand kmem_map to 512M.  This, however, scales the
number of vnodes/inodes to only about twice the present number.

Matthew> Did kern.maxvnodes auto-size to 330,000 or did you set it up
Matthew> there manually?  Or is kern.maxvnodes set lower and it blew it out
Matthew> on its own due to load?

It is set automatically by the kernel.

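As a rough check of the figures quoted so far, the arithmetic can be sketched as below. This is only a back-of-the-envelope model, not anything taken from the kernel: it assumes every FFS node occupies exactly one 256-byte malloc bucket entry and treats the 102400K Limit column as a hard per-type cap. The function names are hypothetical.

```python
# Back-of-the-envelope check of the vmstat(1) figures quoted in this thread.
# Assumption: each "FFS node" allocation uses one 256-byte malloc bucket entry.

FFS_NODE_BYTES = 256        # Size(s) column for "FFS node"
FFS_NODES_IN_USE = 323_549  # InUse column
TYPE_LIMIT_KIB = 102_400    # Limit column (102400K)


def kib_needed(count: int, item_bytes: int) -> int:
    """KiB consumed by `count` items of `item_bytes` each, rounded up."""
    return -(-(count * item_bytes) // 1024)


def items_that_fit(limit_kib: int, item_bytes: int) -> int:
    """Number of items of `item_bytes` that fit under a limit given in KiB."""
    return (limit_kib * 1024) // item_bytes


def growth_factor(old_mib: int, new_mib: int) -> float:
    """Upper bound on how much further a map grown from old to new can scale."""
    return new_mib / old_mib


if __name__ == "__main__":
    print("FFS node memory (KiB):", kib_needed(FFS_NODES_IN_USE, FFS_NODE_BYTES))
    print("nodes under the limit:", items_that_fit(TYPE_LIMIT_KIB, FFS_NODE_BYTES))
    print("200M -> 512M factor:  ", growth_factor(200, 512))
```

Under these assumptions the computed memory use agrees with the MemUse column (80888K), and a 512M kmem_map gives at most about 2.5 times the headroom of a 200M one, which is consistent with the "about twice" estimate once other consumers of the map are taken into account.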
--
Seigo Tanimura

From owner-freebsd-arch Sun Feb 24 20:44:19 2002
Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2])
        by hub.freebsd.org (Postfix) with ESMTP id 710CC37B400
        for ; Sun, 24 Feb 2002 20:44:15 -0800 (PST)
Received: (from dillon@localhost)
        by apollo.backplane.com (8.11.6/8.9.1) id g1P4i8X29005;
        Sun, 24 Feb 2002 20:44:08 -0800 (PST)
        (envelope-from dillon)
Date: Sun, 24 Feb 2002 20:44:08 -0800 (PST)
From: Matthew Dillon
Message-Id: <200202250444.g1P4i8X29005@apollo.backplane.com>
To: Seigo Tanimura
Cc: arch@FreeBSD.ORG
Subject: Re: reclaiming v_data of free vnodes
References: <200202231556.g1NFu9N9040749@silver.carrots.uucp.r.dl.itc.u-tokyo.ac.jp>
        <200202242041.g1OKfXt95731@apollo.backplane.com>
        <200202250325.g1P3PVN9092431@silver.carrots.uucp.r.dl.itc.u-tokyo.ac.jp>

:On Sun, 24 Feb 2002 12:41:33 -0800 (PST),
:  Matthew Dillon said:
:
:Matthew> cache).  330,000 vnodes and/or inodes is pushing what a kernel
:Matthew> with only 1G of KVM can handle.  For these machines you may want
:Matthew> to change the kernel start address from 0xc0000000 (1G of KVM) to
:Matthew> 0x80000000 (2G of KVM).  I forget exactly how that is done.
:
:Increasing KVM is not likely to help.  The panic message on Friday
:night was something like this:
:
:kmem_malloc(256): kmem_map too small: (~=200M) total allocated
:
:in kmem_malloc() called by ffs_vget().
:
:It may help me to expand kmem_map to 512M.  This, however, scales the
:number of vnodes/inodes to only about twice the present number.

    You can use the boot-time tunable 'kern.vm.kmem.size' to set the
    size of kmem.
    You may have to reduce the size of the buffer cache to make
    everything fit.  Also, if you make kmem_map too large you can run the
    system out of other types of space, like the zalloc memory space
    (which is allocated from the remaining KVA beyond the kmem_map), and
    memory for pipes.  If this gets too tight you will have to increase
    the total amount of KVM for the system (which also decreases the size
    of the user per-process VM).

    At some point the number of vnodes will balance against cacheable
    memory.  Vnodes are reclaimed when they no longer have any backing VM
    pages.  The more memory the machine has, the more vnodes it can cache
    before it starts reclaiming them.  This is why this hasn't been a
    problem before now... machines typically did not have enough physical
    memory to be able to cache backing store for a large number of vnodes.

                                        -Matt
                                        Matthew Dillon

:Matthew> Did kern.maxvnodes auto-size to 330,000 or did you set it up
:Matthew> there manually?  Or is kern.maxvnodes set lower and it blew it out
:Matthew> on its own due to load?
:
:It is set automatically by the kernel.
:
:--
:Seigo Tanimura
:

From owner-freebsd-arch Mon Feb 25 0: 4:40 2002
Received: from whale.sunbay.crimea.ua (whale.sunbay.crimea.ua [212.110.138.65])
        by hub.freebsd.org (Postfix) with ESMTP
        id BFFF437B405; Mon, 25 Feb 2002 00:04:23 -0800 (PST)
Received: (from ru@localhost)
        by whale.sunbay.crimea.ua (8.11.6/8.11.2) id g1P83xT29772;
        Mon, 25 Feb 2002 10:03:59 +0200 (EET)
        (envelope-from ru)
Date: Mon, 25 Feb 2002 10:03:59 +0200
From: Ruslan Ermilov
To: Bruce Evans
Cc: arch@FreeBSD.org, Bruce Evans
Subject: Re: MAKEOBJDIRPREFIX
Message-ID: <20020225080359.GB28900@sunbay.com>
References: <20020222080208.GB81821@sunbay.com>
        <20020225065837.F34027-100000@gamplex.bde.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20020225065837.F34027-100000@gamplex.bde.org>
User-Agent: Mutt/1.3.27i

On Mon, Feb 25, 2002 at 07:20:50AM +1100, Bruce Evans wrote:
> On Fri, 22 Feb 2002, Ruslan Ermilov wrote:
[...]
> > (It still does not work
> > if MAKEOBJDIRPREFIX is set as a make's global.)
>
> Do you mean globals in the make configuration files?
>
Yes.

--
Ruslan Ermilov          Sysadmin and DBA,
ru@sunbay.com           Sunbay Software AG,
ru@FreeBSD.org          FreeBSD committer,
+380.652.512.251        Simferopol, Ukraine

http://www.FreeBSD.org  The Power To Serve
http://www.oracle.com   Enabling The Information Age

From owner-freebsd-arch Mon Feb 25 7: 6:10 2002
Received: from rina.r.dl.itc.u-tokyo.ac.jp (cvsup2.r.dl.itc.u-tokyo.ac.jp [133.11.199.247])
        by hub.freebsd.org (Postfix) with ESMTP id 9843137B405
        for ; Mon, 25 Feb 2002 07:06:05 -0800 (PST)
Received: from sohgo.carrots.uucp.r.dl.itc.u-tokyo.ac.jp (sohgo.carrots.uucp.r.dl.itc.u-tokyo.ac.jp [IPv6:3ffe:b80:5b0:3:200:e8ff:fe14:9f8a])
        by rina.r.dl.itc.u-tokyo.ac.jp (8.12.2/3.7W-rina.r-Nankai-Koya) with ESMTP id g1PF4ukg096523 ;
        Tue, 26 Feb 2002 00:05:00 +0900 (JST)
Received: from sohgo.carrots.uucp.r.dl.itc.u-tokyo.ac.jp (localhost [IPv6:::1])
        by sohgo.carrots.uucp.r.dl.itc.u-tokyo.ac.jp (8.12.2/3.7W-carrots-Tokyu-Meguro) with ESMTP id g1PF4knG004473 ;
        Tue, 26 Feb 2002 00:04:46 +0900 (JST)
Received: (from root@localhost)
        by sohgo.carrots.uucp.r.dl.itc.u-tokyo.ac.jp (8.12.2/3.7W-submit-carrots-Tokyu-Meguro) with UUCP id g1PF4i1k004472 ;
        Tue, 26 Feb 2002 00:04:44 +0900 (JST)
Received: from bunko.nkth.carrots.uucp.r.dl.itc.u-tokyo.ac.jp (localhost [127.0.0.1])
        by bunko (8.12.2/3.7W-nkth.carrots-Saitama-Misono) with ESMTP id g1PEb12R001419 ;
        Mon, 25 Feb 2002 23:37:01 +0900 (JST)
Message-Id: <200202251437.g1PEb12R001419@bunko>
Date: Mon, 25 Feb 2002 23:37:01 +0900
From: Seigo Tanimura
To: Matthew Dillon
Cc: Seigo Tanimura , arch@FreeBSD.ORG
Subject: Re: reclaiming v_data of free vnodes
In-Reply-To: <200202250444.g1P4i8X29005@apollo.backplane.com>
References: <200202231556.g1NFu9N9040749@silver.carrots.uucp.r.dl.itc.u-tokyo.ac.jp>
        <200202242041.g1OKfXt95731@apollo.backplane.com>
        <200202250325.g1P3PVN9092431@silver.carrots.uucp.r.dl.itc.u-tokyo.ac.jp>
        <200202250444.g1P4i8X29005@apollo.backplane.com>
Cc: Seigo Tanimura
User-Agent: Wanderlust/2.8.1 (Something) SEMI/1.14.3 (Ushinoya)
        FLIM/1.14.3 (=?ISO-8859-1?Q?Unebigory=F2mae?=) APEL/10.3
        MULE XEmacs/21.1 (patch 14) (Cuyahoga Valley) (i386--freebsd)
Organization: Digital Library Research Division, Information Technology Centre,
        The University of Tokyo
MIME-Version: 1.0 (generated by SEMI 1.14.3 - "Ushinoya")
Content-Type: text/plain; charset=US-ASCII

On Sun, 24 Feb 2002 20:44:08 -0800 (PST),
  Matthew Dillon said:

dillon> :Matthew> cache).  330,000 vnodes and/or inodes is pushing what a kernel
dillon> :Matthew> with only 1G of KVM can handle.  For these machines you may want
dillon> :Matthew> to change the kernel start address from 0xc0000000 (1G of KVM) to
dillon> :Matthew> 0x80000000 (2G of KVM).  I forget exactly how that is done.
dillon> :
dillon> :Increasing KVM is not likely to help.  The panic message on Friday
dillon> :night was something like this:
dillon> :
dillon> :kmem_malloc(256): kmem_map too small: (~=200M) total allocated

dillon>     You can use the boot-time tunable 'kern.vm.kmem.size' to set
dillon>     the size of kmem.  You may have to reduce the size of the
dillon>     buffer cache to make everything fit.  Also, if you make kmem_map
dillon>     too large you can run the system out of other types of space,
dillon>     like the zalloc memory space (which is allocated from the remaining
dillon>     KVA beyond the kmem_map), and memory for pipes.

I tried that, but the attempt failed because I miscalculated the size
of the space for the buffer.

One question before increasing kern.vm.kmem.size: why does ffs not use
the zone allocator for inodes?

--
Seigo Tanimura

From owner-freebsd-arch Mon Feb 25 7:50:25 2002
Received: from scaup.prod.itd.earthlink.net (scaup.mail.pas.earthlink.net [207.217.120.49])
        by hub.freebsd.org (Postfix) with ESMTP id 09FB837B400
        for ; Mon, 25 Feb 2002 07:50:24 -0800 (PST)
Received: from pool0138.cvx40-bradley.dialup.earthlink.net ([216.244.42.138] helo=mindspring.com)
        by scaup.prod.itd.earthlink.net with esmtp (Exim 3.33 #1)
        id 16fNOB-0005lK-00; Mon, 25 Feb 2002 07:50:08 -0800
Message-ID: <3C7A5D24.E11A6693@mindspring.com>
Date: Mon, 25 Feb 2002 07:49:56 -0800
From: Terry Lambert
X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony} (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Seigo Tanimura
Cc: Matthew Dillon , arch@FreeBSD.ORG, Seigo Tanimura
Subject: Re: reclaiming v_data of free vnodes
References: <200202231556.g1NFu9N9040749@silver.carrots.uucp.r.dl.itc.u-tokyo.ac.jp>
        <200202242041.g1OKfXt95731@apollo.backplane.com>
        <200202250325.g1P3PVN9092431@silver.carrots.uucp.r.dl.itc.u-tokyo.ac.jp>
        <200202250444.g1P4i8X29005@apollo.backplane.com>
        <200202251437.g1PEb12R001419@bunko>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Seigo Tanimura wrote:
> One question before increasing kern.vm.kmem.size: why does ffs not use
> the zone allocator for inodes?

It doesn't need to, so it doesn't.

-- Terry

From owner-freebsd-arch Mon Feb 25 9:58:59 2002
Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2])
        by hub.freebsd.org (Postfix) with ESMTP id 0314A37B404
        for ; Mon, 25 Feb 2002 09:58:58 -0800 (PST)
Received: (from dillon@localhost)
        by apollo.backplane.com (8.11.6/8.9.1) id g1PHsTs50126;
        Mon, 25 Feb 2002 09:54:29 -0800 (PST)
        (envelope-from dillon)
Date: Mon, 25 Feb 2002 09:54:29 -0800 (PST)
From: Matthew Dillon
Message-Id: <200202251754.g1PHsTs50126@apollo.backplane.com>
To: Terry Lambert
Cc: Seigo Tanimura , arch@FreeBSD.ORG, Seigo Tanimura
Subject: Re: reclaiming v_data of free vnodes
References: <200202231556.g1NFu9N9040749@silver.carrots.uucp.r.dl.itc.u-tokyo.ac.jp>
        <200202242041.g1OKfXt95731@apollo.backplane.com>
        <200202250325.g1P3PVN9092431@silver.carrots.uucp.r.dl.itc.u-tokyo.ac.jp>
        <200202250444.g1P4i8X29005@apollo.backplane.com>
        <200202251437.g1PEb12R001419@bunko>
        <3C7A5D24.E11A6693@mindspring.com>

:Seigo Tanimura wrote:
:> One question before increasing kern.vm.kmem.size: why does ffs not use
:> the zone allocator for inodes?
:
:It doesn't need to, so it doesn't.
:
:-- Terry

    I suppose it could.  It doesn't for historical reasons and also
    probably because the size of an 'inode' depends on the filesystem.
    How many new zones do you want to wind up with?

                                        -Matt

From owner-freebsd-arch Mon Feb 25 10:53:16 2002
Received: from hawk.mail.pas.earthlink.net (hawk.mail.pas.earthlink.net [207.217.120.22])
        by hub.freebsd.org (Postfix) with ESMTP id AEABB37B404
        for ; Mon, 25 Feb 2002 10:53:14 -0800 (PST)
Received: from pool0418.cvx40-bradley.dialup.earthlink.net ([216.244.43.163] helo=mindspring.com)
        by hawk.mail.pas.earthlink.net with esmtp (Exim 3.33 #1)
        id 16fQF9-0000Qc-00; Mon, 25 Feb 2002 10:53:00 -0800
Message-ID: <3C7A87FE.6A02E830@mindspring.com>
Date: Mon, 25 Feb 2002 10:52:46 -0800
From: Terry Lambert
X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony} (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Matthew Dillon
Cc: Seigo Tanimura , arch@FreeBSD.ORG, Seigo Tanimura
Subject: Re: reclaiming v_data of free vnodes
References: <200202231556.g1NFu9N9040749@silver.carrots.uucp.r.dl.itc.u-tokyo.ac.jp>
        <200202242041.g1OKfXt95731@apollo.backplane.com>
        <200202250325.g1P3PVN9092431@silver.carrots.uucp.r.dl.itc.u-tokyo.ac.jp>
        <200202250444.g1P4i8X29005@apollo.backplane.com>
        <200202251437.g1PEb12R001419@bunko>
        <3C7A5D24.E11A6693@mindspring.com>
        <200202251754.g1PHsTs50126@apollo.backplane.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Matthew Dillon wrote:
> :Seigo Tanimura wrote:
> :> One question before increasing kern.vm.kmem.size: why does ffs not use
> :> the zone allocator for inodes?
> :
> :It doesn't need to, so it doesn't.
>
>     I suppose it could.  It doesn't for historical reasons and also probably
>     because the size of an 'inode' depends on the filesystem.  How many new
>     zones do you want to wind up with?

It doesn't because it's a bad idea to precommit KVA space when
there isn't an architectural requirement for it.

-- Terry

From owner-freebsd-arch Tue Feb 26 1:24: 2 2002
Received: from rina.r.dl.itc.u-tokyo.ac.jp (rina.r.dl.itc.u-tokyo.ac.jp [133.11.199.247])
        by hub.freebsd.org (Postfix) with ESMTP id A28FD37B405
        for ; Tue, 26 Feb 2002 01:23:51 -0800 (PST)
Received: from rina.r.dl.itc.u-tokyo.ac.jp (localhost [127.0.0.1])
        by rina.r.dl.itc.u-tokyo.ac.jp (8.12.2/3.7W-rina.r-Nankai-Koya) with ESMTP id g1Q9NkVh093544 ;
        Tue, 26 Feb 2002 18:23:46 +0900 (JST)
Message-Id: <200202260923.g1Q9NkVh093544@rina.r.dl.itc.u-tokyo.ac.jp>
Date: Tue, 26 Feb 2002 18:23:45 +0900
From: Seigo Tanimura
To: Matthew Dillon
Cc: Terry Lambert , Seigo Tanimura , arch@FreeBSD.ORG
Subject: Re: reclaiming v_data of free vnodes
In-Reply-To: <200202251754.g1PHsTs50126@apollo.backplane.com>
References: <200202231556.g1NFu9N9040749@silver.carrots.uucp.r.dl.itc.u-tokyo.ac.jp>
        <200202242041.g1OKfXt95731@apollo.backplane.com>
        <200202250325.g1P3PVN9092431@silver.carrots.uucp.r.dl.itc.u-tokyo.ac.jp>
        <200202250444.g1P4i8X29005@apollo.backplane.com>
        <200202251437.g1PEb12R001419@bunko>
        <3C7A5D24.E11A6693@mindspring.com>
        <200202251754.g1PHsTs50126@apollo.backplane.com>
User-Agent: Wanderlust/2.8.1 (Something) SEMI/1.14.3 (Ushinoya)
        FLIM/1.14.3 (=?ISO-8859-1?Q?Unebigory=F2mae?=) APEL/10.3
        MULE XEmacs/21.1 (patch 14) (Cuyahoga Valley) (i386--freebsd)
Organization: Digital Library Research Division, Information Technology Centre,
        The University of Tokyo
MIME-Version: 1.0 (generated by SEMI 1.14.3 - "Ushinoya")
Content-Type: multipart/mixed;
        boundary="Multipart_Tue_Feb_26_18:23:45_2002-1"

--Multipart_Tue_Feb_26_18:23:45_2002-1
Content-Type: text/plain; charset=US-ASCII

On Mon, 25 Feb 2002 09:54:29 -0800 (PST),
  Matthew Dillon said:

dillon> :Seigo Tanimura wrote:
dillon> :> One question before increasing kern.vm.kmem.size: why does ffs not use
dillon> :> the zone allocator for inodes?
dillon> :
dillon> :It doesn't need to, so it doesn't.
dillon> :
dillon> :-- Terry

dillon>     I suppose it could.  It doesn't for historical reasons and also probably
dillon>     because the size of an 'inode' depends on the filesystem.  How many new
dillon>     zones do you want to wind up with?

AFAIK, ffs, ifs and ext2fs all use struct inode and dinode.  They
handle the filesystem-specific data with a union.  All of these
filesystems should thus be able to share a single zone.

Also, since an in-core inode is never allocated during an interrupt,
we can allocate inodes on demand (i.e. ZONE_INTERRUPT is not
required).

For better observation of KVM usage, I attach the results of vmstat -mz
on cvsup.jp.FreeBSD.org.

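To put the attached figures in proportion, the share of malloc memory held by the two largest types can be worked out as in the sketch below. The numbers are copied from the "Memory statistics by type" rows and the "Memory Totals" line of the attachment; the dictionary and function names are made up for illustration.

```python
# Share of in-use malloc memory taken by the largest types in the attached
# vmstat -m output.  All figures are in KiB and copied from the attachment.

MEM_USE_KIB = {
    "FFS node": 81_144,
    "vfscache": 26_533,
}
TOTAL_IN_USE_KIB = 114_399  # "Memory Totals: In Use"


def share_of_total(types, total_kib=TOTAL_IN_USE_KIB):
    """Fraction of the in-use total accounted for by the named types."""
    return sum(MEM_USE_KIB[name] for name in types) / total_kib


if __name__ == "__main__":
    for name in MEM_USE_KIB:
        print(f"{name:10s} {share_of_total([name]):6.1%}")
    print(f"{'both':10s} {share_of_total(list(MEM_USE_KIB)):6.1%}")
```

On these figures FFS nodes alone account for roughly seven tenths of the in-use total, and FFS nodes plus the name cache for well over nine tenths.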
--Multipart_Tue_Feb_26_18:23:45_2002-1
Content-Type: text/plain; charset=US-ASCII
Content-Disposition: inline; filename="vmstat-mz.cvsup.jp.FreeBSD.org.txt"
Content-Transfer-Encoding: 7bit

Memory statistics by bucket size
Size  In Use    Free   Requests HighWater Couldfree
  16    1733    1083  254474736      1280         0
  32    2225    1999   21695880       640      2542
  64  279661   74835  170616113       320      1609
 128   67759  133873    5176585       160    304258
 256  326620   34180   79722025        80     23083
 512    1367  110145    1629327        40    264777
  1K     185    2979    5146563        20   2592719
  2K      21     293     150659        10    123120
  4K     159       1      12619         5         0
  8K       4      22      44191         5     43889
 16K      15       0         15         5         0
 32K       3       0          9         5         0
 64K       1       0          1         5         0
128K       2       0          2         5         0
256K       2       0          2         5         0
512K       6       0          6         5         0

Memory usage type by bucket size
Size  Type(s)
  16  nexusdev, UFS dirhash, newdirblk, p1003.1b, NFSV3 srvdesc,
      ip6_moptions, in6_multi, igmp, routetbl, ether_multi, vnodes, mount,
      pcb, soname, rman, mbufmgr, bus, sysctloid, sysctl, ip6ndp, temp,
      devbuf, linker, atexit, proc-args, acpica, acpidev, DEVFS
  32  atkbddev, UFS dirhash, dirrem, mkdir, diradd, freefile, freefrag,
      indirdep, bmsafemap, newblk, tseg_qent, in_multi, routetbl,
      ether_multi, ifaddr, vnodes, cluster_save buffer, pcb, soname, sbuf,
      mbufmgr, eventhandler, bus, SWAP, sysctloid, sysctl, uidinfo, temp,
      devbuf, lockf, linker, proc-args, sigio, acpica, pfs_vncache
  64  UFS dirhash, allocindir, allocdirect, pagedep, NFS daemon, NFS req,
      in6_multi, routetbl, ether_multi, BPF, vnodes, cluster_save buffer,
      vfscache, pcb, iov, rman, mbufmgr, bus, sysctloid, sysctl, subproc,
      module, acpisem, ip6ndp, temp, devbuf, lockf, ithread, proc-args,
      file, acpica, isadev
 128  ZONE, ppbusdev, UFS dirhash, freeblks, inodedep, NFS srvsock,
      ip_moptions, routetbl, vnodes, mount, vfscache, soname, ttys,
      taskqueue, mbufmgr, eventhandler, bus, timecounter, cred, session,
      pgrp, module, ip6ndp, temp, devbuf, ithread, zombie, proc-args,
      acpica, pfs_nodes, DEVFS
 256  UFS mount, UFS dirhash, FFS node, newblk, NFS daemon, NFSV3 srvdesc,
      routetbl, ifaddr, vnodes, Export Host, vfscache, iov, bus, subproc,
      temp, devbuf, linker, proc-args, kqueue, file desc, dev_t, acpica
 512  UFS dirhash, NFS daemon, NFSV3 diroff, routetbl, lo, ifaddr, mount,
      vfscache, BIO buffer, ptys, ttys, msg, ioctlops, bus, ip6ndp, temp,
      devbuf, file desc, acpica
  1K  UFS dirhash, BIO buffer, sem, ioctlops, MD disk, bus, uidinfo, temp,
      devbuf, kqueue, file desc, acpica
  2K  UFS mount, UFS dirhash, ifaddr, BIO buffer, pcb, bus, temp, devbuf,
      file desc
  4K  memdesc, UFS mount, UFS dirhash, sem, msg, sbuf, kobj, bus, proc,
      temp, devbuf
  8K  UFS mount, UFS dirhash, indirdep, syncache, temp, DEVFS
 16K  shm, msg, devbuf
 32K  mbufmgr, temp, devbuf, pfs_fileno
 64K  temp
128K  pagedep, mbufmgr
256K  VM pgdata, UFS mount
512K  UFS ihash, inodedep, NFS hash, vfscache, SWAP, ISOFS mount

Memory statistics by type                                     Type  Kern
Type                 InUse  MemUse HighUse   Limit  Requests Limit Limit  Size(s)
atkbddev                 2      1K      1K 102400K         2     0     0  32
nexusdev                 7      1K      1K 102400K         7     0     0  16
memdesc                  1      4K      4K 102400K         1     0     0  4K
ZONE                    17      3K      3K 102400K        17     0     0  128
VM pgdata                1    256K    256K 102400K         1     0     0  256K
ppbusdev                 3      1K      1K 102400K         3     0     0  128
UFS mount               18    176K    176K 102400K        18     0     0  256,2K,4K,8K,256K
UFS ihash                1    512K    512K 102400K         1     0     0  512K
UFS dirhash           1752    592K   2157K 102400K    198303     0     0  16,32,64,128,256,512,1K,2K,4K,8K
FFS node            324574  81144K  81159K 102400K  53680385     0     0  256
newdirblk                0      0K      1K 102400K      1050     0     0  16
dirrem                   8      1K     34K 102400K    166236     0     0  32
mkdir                    0      0K     31K 102400K     14476     0     0  32
diradd                  77      3K     34K 102400K    225095     0     0  32
freefile                 0      0K     22K 102400K    108569     0     0  32
freeblks                85     11K    163K 102400K   1258992     0     0  128
freefrag                10      1K     14K 102400K   4297510     0     0  32
allocindir              22      2K    455K 102400K   2239498     0     0  64
indirdep                 8      1K    177K 102400K     73726     0     0  32,8K
allocdirect            307     20K    163K 102400K   8372769     0     0  64
bmsafemap               81      3K     12K 102400K    912675     0     0  32
newblk                   1      1K      1K 102400K  10612268     0     0  32,256
inodedep               222    540K    827K 102400K   1519232     0     0  128,512K
pagedep                 21    130K    168K 102400K     59610     0     0  64,128K
p1003.1b                 1      1K      1K 102400K         1     0     0  16
NFS daemon              69      7K      7K 102400K        69     0     0  64,256,512
NFSV3 srvdesc           18      2K      2K 102400K  51449952     0     0  16,256
NFS srvsock              2      1K      1K 102400K         3     0     0  128
NFS hash                 1    512K    512K 102400K         1     0     0  512K
NFSV3 diroff             3      2K      3K 102400K        28     0     0  512
NFS req                  0      0K      3K 102400K    443556     0     0  64
ip6_moptions             2      1K      1K 102400K         2     0     0  16
in6_multi                8      1K      1K 102400K         8     0     0  16,64
syncache                 1      8K      8K 102400K         1     0     0  8K
tseg_qent                0      0K      5K 102400K   1440028     0     0  32
ip_moptions              1      1K      1K 102400K         1     0     0  128
in_multi                 3      1K      1K 102400K         3     0     0  32
igmp                     1      1K      1K 102400K         1     0     0  16
routetbl               411     58K     91K 102400K     54930     0     0  16,32,64,128,256,512
lo                       1      1K      1K 102400K         1     0     0  512
ether_multi             38      2K      2K 102400K        38     0     0  16,32,64
ifaddr                  25      8K      8K 102400K        25     0     0  32,256,512,2K
BPF                      4      1K      1K 102400K         4     0     0  64
vnodes                  27      7K      7K 102400K       347     0     0  16,32,64,128,256
Export Host             42     11K     11K 102400K        42     0     0  256
mount                   17      9K      9K 102400K       107     0     0  16,128,512
cluster_save buffer      0      0K      1K 102400K   1430745     0     0  32,64
vfscache            341747  26533K  71233K 102400K  68580240     0     0  64,128,256,512,512K
BIO buffer              22     23K   3126K 102400K   5081117     0     0  512,1K,2K
pcb                    161      7K     10K 102400K   1045393     0     0  16,32,64,2K
soname                  39      2K      2K 102400K  35188222     0     0  16,32,128
ptys                     4      2K      2K 102400K         4     0     0  512
ttys                   566     75K     75K 102400K      4739     0     0  128,512
shm                      1     12K     12K 102400K         1     0     0  16K
sem                      3      6K      6K 102400K         3     0     0  1K,4K
msg                      4     25K     25K 102400K         4     0     0  512,4K,16K
iov                      0      0K      1K 102400K         6     0     0  64,256
ioctlops                 0      0K      1K 102400K        23     0     0  512,1K
taskqueue                2      1K      1K 102400K         2     0     0  128
sbuf                     0      0K      5K 102400K         2     0     0  32,4K
rman                    63      4K      4K 102400K       410     0     0  16,64
mbufmgr                648    115K    122K 102400K 190799574     0     0  16,32,64,128,32K,128K
MD disk                  1      1K      1K 102400K         1     0     0  1K
kobj                   124    496K    496K 102400K       124     0     0  4K
eventhandler            29      2K      2K 102400K        29     0     0  32,128
bus                    700     50K     50K 102400K      3697     0     0  16,32,64,128,256,512,1K,2K,4K
SWAP                     2   1097K   1097K 102400K         2     0     0  32,512K
timecounter            100     13K     13K 102400K       100     0     0  128
sysctloid               46      2K      2K 102400K        46     0     0  16,32,64
sysctl                   0      0K      1K 102400K   2841328     0     0  16,32,64
uidinfo                  9      2K      2K 102400K       635     0     0  32,1K
cred                   612     77K     77K 102400K    313395     0     0  128
subproc                262     21K     22K 102400K     95303     0     0  64,256
proc                     2      8K      8K 102400K         2     0     0  4K
session                 46      6K      7K 102400K      3081     0     0  128
pgrp                    52      7K      8K 102400K      3378     0     0  128
module                 170     11K     11K 102400K       170     0     0  64,128
acpisem                 16      1K      1K 102400K        16     0     0  64
ip6ndp                   8      2K      2K 102400K        10     0     0  16,64,128,512
temp                     4     61K     98K 102400K   5892122     0     0  16,32,64,128,256,512,1K,2K,4K,8K,32K,64K
devbuf                 680    520K    521K 102400K      1564     0     0  16,32,64,128,256,512,1K,2K,4K,16K,32K
lockf                  204      7K      8K 102400K    403580     0     0  32,64
linker                  34      2K      2K 102400K        48     0     0  16,32,256
ithread                 45      5K      5K 102400K        45     0     0  64,128
atexit                   2      1K      1K 102400K         2     0     0  16
zombie                   0      0K      2K 102400K     91985     0     0  128
proc-args              146      8K     12K 102400K    153864     0     0  16,32,64,128,256
kqueue                  10     10K     16K 102400K      1942     0     0  256,1K
sigio                    1      1K      1K 102400K        58     0     0  32
file                  1084     68K     73K 102400K  89495821     0     0  64
file desc              274     75K     77K 102400K     93042     0     0  256,512,1K,2K
dev_t                 1477    370K    370K 102400K      1477     0     0  256
acpica                2170    112K    112K 102400K     10915     0     0  16,32,64,128,256,512,1K
acpidev                100      2K      2K 102400K       100     0     0  16
ISOFS mount              1    512K    512K 102400K         1     0     0  512K
isadev                  28      2K      2K 102400K        28     0     0  64
pfs_vncache              1      1K      3K 102400K       575     0     0  32
pfs_fileno               1     20K     20K 102400K         1     0     0  32K
pfs_nodes               20      3K      3K 102400K        20     0     0  128
DEVFS                  149     25K     25K 102400K       149     0     0  16,128,8K

Memory Totals:  In Use    Free    Requests
               114399K  88854K   538668733

ITEM         SIZE     LIMIT     USED     FREE    REQUESTS
PIPE:         192,        0,     128,      42,      48084
SWAPMETA:     160,   509724,     102,    1178,       1641
unpcb:        160,        0,      54,      71,      10441
ripcb:        192,     8232,       3,      39,          7
syncache:     160,    15359,       0,      51,    1607721
tcpcb:        544,     8232,     375,    2441,    2614084
udpcb:        192,     8232,     103,      67,      39930
socket:       224,     8232,     536,    2408,    2664468
DIRHASH:     1024,        0,    1621,     363,     238470
KNOTE:         64,        0,       0,     128,       1349
NFSNODE:      320,        0,     207,     897,      17215
NFSMOUNT:     256,        0,       7,      25,         95
NAMEI:       1024,        0,       1,      79,  641224701
VNODEPOLL:     64,        0,       0,     320,          5
VNODE:        224,        0,  324849,      21,     324849
VMSPACE:      224,        0,     202,      68,      92101
PROC:         768,        0,     239,      21,      92224
DP fakepg:     64,        0,       0,       0,          0
PV ENTRY:      28,  2396042,  277407,  509014,  305744319
MAP ENTRY:     48,        0,    7139,    1234,  450455008
KMAP ENTRY:    48,   193110,     774,     122,     117340
MAP:          108,        0,       8,      39,          8
VM OBJECT:     96,        0,  196349,  124241,    8967708

--Multipart_Tue_Feb_26_18:23:45_2002-1
Content-Type: text/plain; charset=US-ASCII

--
Seigo Tanimura

--Multipart_Tue_Feb_26_18:23:45_2002-1--

From owner-freebsd-arch Tue Feb 26 23: 3:45 2002
Received: from mail.chesapeake.net (chesapeake.net [205.130.220.14])
        by hub.freebsd.org (Postfix) with ESMTP id 0CC1737B417
        for ; Tue, 26 Feb 2002 23:03:40 -0800 (PST)
Received: from localhost (jroberson@localhost)
        by mail.chesapeake.net (8.11.6/8.11.6) with ESMTP id g1R73d807187
        for ; Wed, 27 Feb 2002 02:03:39 -0500 (EST)
        (envelope-from jroberson@chesapeake.net)
Date: Wed, 27 Feb 2002 02:03:39 -0500 (EST)
From: Jeff Roberson
To: arch@freebsd.org
Subject: Slab allocator
Message-ID: <20020227005915.C17591-100000@mail.chesapeake.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

I have patches available that implement a slab allocator.  This was
mostly inspired by the Solaris allocator of the same name.  I have
deviated somewhat from their implementation though.  I will describe
some of the high level features here.

Firstly, it has a zone-like interface, where objects of the same
type/size are allocated from the same zone.  This allows you to do
object caching, so that users of the interface may depend on the state
of the object upon allocation.  This allows them to skip potentially
expensive initialization.  This is sometimes referred to as type stable
storage.

The design also allows you to free memory back to the system on demand.
Currently this is done by the pageout daemon.  All zones are scanned for
whole free pages and may release their pages.  I implemented a 20 second
working set algorithm so that the zone will try to keep enough free
items to satisfy the last 20 seconds worth of load.
This stops the zones from releasing memory they will just ask for again. This feature is disabled via a zone create time flag for objects which depend on what some folks call type stable storage. This is common among objects with a generation count. I call this alternately 'broken' and 'address stable storage'. I intend to fix those objects which rely on this behavior if this implementation is accepted. There are also per cpu queues of items, with a per cpu lock. This allows for very efficient allocation, and also it provides near linear performance as the number of cpus increases. I do still depend on giant to talk to the back end page supplier (kmem_alloc, etc.). Once the VM is locked the allocator will not require giant at all. Using this implementation I have replaced malloc with a wrapper that calls into the slab allocator. There is a zone for every malloc size. This allows us to use non power of two malloc sizes. This could yield significant memory savings. Also, using this approach we automatically get a fine grain locked malloc. I would eventually like to pull other allocators into uma (The slab allocator). We could get rid of some of the kernel submaps and provide a much more dynamic amount of various resources. Something I had in mind were pbufs and mbufs, which could easily come from uma. This gives us the ability to redistribute memory to wherever it is needed, and not lock it in a particular place once it's there. I'm sure you're all wondering about performance. At one point uma was much faster than the standard system, but then I got around to finishing it. ;-) At this point I get virtually no difference in the time it takes to compile a kernel from the original kernel. Once more object initializers are implemented this will only improve. On workloads that cause heavy paging I have noticed considerable improvements due to the release of pages that were previously permanent. I will get some numbers on this soon. 
I have old statistics, but too much has changed for me to post them. There are a few things that need to be fixed right now. For one, the zone statistics don't reflect the items that are in the per cpu queues. I'm thinking about clean ways to collect this without locking every zone and per cpu queue when someone calls sysctl. The other problem is with the per cpu buckets. They are a fixed size right now. I need to define several zones for the buckets to come from and a way to manage growing/shrinking the buckets. There are two things that I would really like comments on. 1) Should I keep the uma_ prefixes on exported functions/types. 2) How much of the malloc_type stats should I keep? They either require atomic ops or a lock in their current state. Also, non power of two malloc sizes breaks their usage tracking. 3) Should I rename the files to vm_zone.c vm_zone.h, etc? Since you've read this far, I'll let you know where the patch is. ;-) http://www.chesapeake.net/~jroberson/uma.tar This includes a patch to the base system that converts several previous vm_zone users to uma users, and it also provides a vm_zone wrapper for those that haven't been converted. I did this to minimize the diffs so it would be easier to review. This also has vm/uma* which you need to extract into your sys/ directory. Any feedback is appreciated. I'd like to know what people expect from this before it is committable. Jeff PS Sorry for the long winded email. 
:-) To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 27 1: 0:45 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mail.chesapeake.net (chesapeake.net [205.130.220.14]) by hub.freebsd.org (Postfix) with ESMTP id C936637B405 for ; Wed, 27 Feb 2002 01:00:43 -0800 (PST) Received: from localhost (jroberson@localhost) by mail.chesapeake.net (8.11.6/8.11.6) with ESMTP id g1R90hH31393 for ; Wed, 27 Feb 2002 04:00:43 -0500 (EST) (envelope-from jroberson@chesapeake.net) Date: Wed, 27 Feb 2002 04:00:43 -0500 (EST) From: Jeff Roberson To: arch@freebsd.org Subject: Re: Slab allocator In-Reply-To: <20020227005915.C17591-100000@mail.chesapeake.net> Message-ID: <20020227040002.L17591-100000@mail.chesapeake.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG This patch seems to break with witness enabled. I'm looking into it now. It seems to be fully safe w/o witness though. Jeff To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 27 8:44: 9 2002 Delivered-To: freebsd-arch@freebsd.org Received: from beastie.mckusick.com (beastie.mckusick.com [209.31.233.184]) by hub.freebsd.org (Postfix) with ESMTP id 774FC37B417 for ; Wed, 27 Feb 2002 08:44:03 -0800 (PST) Received: from beastie.mckusick.com (localhost [127.0.0.1]) by beastie.mckusick.com (8.11.4/8.9.3) with ESMTP id g1RGi1i32186; Wed, 27 Feb 2002 08:44:01 -0800 (PST) (envelope-from mckusick@beastie.mckusick.com) Message-Id: <200202271644.g1RGi1i32186@beastie.mckusick.com> To: Jeff Roberson Subject: Re: Slab allocator Cc: arch@FreeBSD.ORG In-Reply-To: Your message of "Wed, 27 Feb 2002 04:00:43 EST." 
<20020227040002.L17591-100000@mail.chesapeake.net> Date: Wed, 27 Feb 2002 08:44:01 -0800 From: Kirk McKusick Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Your slab allocator work looks very interesting. Given that the performance is comparable to what we are currently running, it seems like a big step forward. The current statistics that I find most useful are (in order), the number of current allocations, the high watermark for allocations, and the total number of allocations made. The number of current allocations needs to be correct and should be done under a lock. The high watermark and total number of allocations need not be precisely correct, so could well be done without a lock. The occasional missed count, or a slightly low high watermark would not much matter. Kirk McKusick To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 27 9:50:24 2002 Delivered-To: freebsd-arch@freebsd.org Received: from angelica.unixdaemons.com (angelica.unixdaemons.com [209.148.64.135]) by hub.freebsd.org (Postfix) with ESMTP id DC7C337B41E for ; Wed, 27 Feb 2002 09:50:02 -0800 (PST) Received: from angelica.unixdaemons.com (bmilekic@localhost.unixdaemons.com [127.0.0.1]) by angelica.unixdaemons.com (8.12.2/8.12.1) with ESMTP id g1RHnjh4032828; Wed, 27 Feb 2002 12:49:45 -0500 (EST) Received: (from bmilekic@localhost) by angelica.unixdaemons.com (8.12.2/8.12.1/Submit) id g1RHnjnw032827; Wed, 27 Feb 2002 12:49:45 -0500 (EST) (envelope-from bmilekic) Date: Wed, 27 Feb 2002 12:49:45 -0500 From: Bosko Milekic To: Jeff Roberson Cc: arch@FreeBSD.ORG Subject: Re: Slab allocator Message-ID: <20020227124945.A29065@unixdaemons.com> References: <20020227005915.C17591-100000@mail.chesapeake.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: 
inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <20020227005915.C17591-100000@mail.chesapeake.net>; from jroberson@chesapeake.net on Wed, Feb 27, 2002 at 02:03:39AM -0500 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Wed, Feb 27, 2002 at 02:03:39AM -0500, Jeff Roberson wrote: [...] > There are two things that I would really like comments on. > > 1) Should I keep the uma_ prefixes on exported functions/types. Yes, I believe that this is generally good practise. > 2) How much of the malloc_type stats should I keep? They either require > atomic ops or a lock in their current state. Also, non power of two > malloc sizes breaks their usage tracking. I suggest that you re-do the stats based on uma, and not based on exactly what malloc exported. Try to group most of the stat modifications (writes) under some existing lock, if that's possible. If not, please don't add another lock just for those stats. I don't think that any statistics should require a lock just to be read (via sysctl()). > 3) Should I rename the files to vm_zone.c vm_zone.h, etc? I think you shouldn't. :-) We shouldn't confuse uma with our old vm_zone. > Since you've read this far, I'll let you know where the patch is. ;-) > > http://www.chesapeake.net/~jroberson/uma.tar > > This includes a patch to the base system that converts several previous > vm_zone users to uma users, and it also provides a vm_zone wrapper for > those that haven't been converted. I did this to minimize the diffs so it > would be easier to review. This also has vm/uma* which you need to > extract into your sys/ directory. > > Any feedback is appreciated. I'd like to know what people expect from > this before it is committable. 
After looking over it, I think that: (i) This allocator should go into -CURRENT (ii) For what concerns malloc and vm_zone, I suggest that they still live in -CURRENT for the next couple of months (since you've already instrumented them to wrap to uma in your patch). I suggest that a sysctl knob determines whether or not malloc and vm_zone allocation calls wrap to uma or call their old selves. This should allow us to quickly switch back and forth for the next couple of months, and it will probably also minimize conflicts for those maintaining large patch sets (*ducks* :-)). (iii) For what concerns mbuf allocations, I don't know what to tell you right now. Ultimately, I would like to see uma eventually replace mb_alloc. However, I would like to make sure that the transition is smooth and painless and that we don't lose any performance. This is why I think that co-existing uma with all existing code (see (ii)) is a good idea. Then, if you're willing, I would be glad to help "attach" mbuf allocations to uma, should that be possible and should we decide, as a group, that that is what we want to do. > Jeff > > PS Sorry for the long winded email. 
:-) Regards, -- Bosko Milekic bmilekic@unixdaemons.com bmilekic@FreeBSD.org To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 27 9:58:59 2002 Delivered-To: freebsd-arch@freebsd.org Received: from avocet.prod.itd.earthlink.net (avocet.mail.pas.earthlink.net [207.217.120.50]) by hub.freebsd.org (Postfix) with ESMTP id 1128D37B405 for ; Wed, 27 Feb 2002 09:58:50 -0800 (PST) Received: from pool0329.cvx22-bradley.dialup.earthlink.net ([209.179.199.74] helo=mindspring.com) by avocet.prod.itd.earthlink.net with esmtp (Exim 3.33 #1) id 16g8Lo-0005ad-00; Wed, 27 Feb 2002 09:58:48 -0800 Message-ID: <3C7D1E31.B13915E7@mindspring.com> Date: Wed, 27 Feb 2002 09:58:09 -0800 From: Terry Lambert X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony} (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Jeff Roberson Cc: arch@freebsd.org Subject: Re: Slab allocator References: <20020227005915.C17591-100000@mail.chesapeake.net> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG First, let me say OUTSTANDING WORK! Jeff Roberson wrote: > There are also per cpu queues of items, with a per cpu lock. This allows > for very efficient allocation, and also it provides near linear > performance as the number of cpus increases. I do still depend on giant to > talk to the back end page supplier (kmem_alloc, etc.). Once the VM is > locked the allocator will not require giant at all. What is the per-CPU lock required for? I think it can be gotten rid of, or at least taken out of the critical path, with more information. > I would eventually like to pull other allocators into uma (The slab > allocator). We could get rid of some of the kernel submaps and provide a > much more dynamic amount of various resources. 
Something I had in mind > were pbufs and mbufs, which could easily come from uma. This gives us the > ability to redistribute memory to wherever it is needed, and not lock it > in a particular place once it's there. How do you handle interrupt-time allocation of mbufs, in this case? The zalloci() handles this by pre-creation of the PTE's for the page mapping in the KVA, and then only has to deal with grabbing free physical pages to back them, which is a non-blocking operation that can occur at interrupt, and which, if it fails, is not fatal (i.e. it's handled; I've considered doing the same for the page mapping and PTE's, but that would make the time-to-run far less deterministic). > There are a few things that need to be fixed right now. For one, the zone > statistics don't reflect the items that are in the per cpu queues. I'm > thinking about clean ways to collect this without locking every zone and > per cpu queue when someone calls sysctl. The easy way around this is to say that these values are snapshots. So you maintain the figures of merit on a per CPU basis in the context of the CPU doing the allocations and deallocations, and treat it as read-only for the purposes of statistics reporting. This means that you don't need locks to get the statistics. For debugging, you could provide a rigid locked interface (e.g. by only enabling locking for the statistics gathering via a sysctl that defaults to "off"). > The other problem is with the per cpu buckets. They are a > fixed size right now. I need to define several zones for > the buckets to come from and a way to manage growing/shrinking > the buckets. I built a "chain" allocator that dealt with this issue, and also the object granularity issue. 
Basically, it calculated the LCM of the object size rounded to a MAX(sizeof(long),8) boundary for processor alignment sensitivity reasons, and the page size (also for processor sensitivity reasons), and then allocated a contiguous region from which it obtained objects of that type. All in all, it meant zero unnecessary space wastage (for 1,000,000 TCP connections, the savings were 1/4 of a Gigabyte for one zone alone). > There are two things that I would really like comments on. > > 1) Should I keep the uma_ prefixes on exported functions/types. Think of an acceptable acronym and use that; if UMA is meaningful, it's as good as any. The real issue is to be able to rip out the old code, and see where the bleeders are so that the switchover can be as painless as possible. > 2) How much of the malloc_type stats should I keep? They either require > atomic ops or a lock in their current state. Also, non power of two > malloc sizes breaks their usage tracking. See above for the locks; I think they are unnecessary, unless you are debugging, and arguably unnecessary then, unless the lock is global to all CPUs. For the power of two stats, we may lose them, but we gain a higher granularity on zone identification for objects that right now get rounded into the same zone. I think that's an acceptable trade-off, if not a net win. > 3) Should I rename the files to vm_zone.c vm_zone.h, etc? This should be last, I think. And thanks again for the most excellent work! 
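The region sizing described for the "chain" allocator above can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from any posted patch: the names sketch_round_up, sketch_gcd and chain_region_size are hypothetical, and the 4096-byte page size is assumed for the example (kernel code would use PAGE_SIZE).

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size for this sketch only; a kernel would use PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096UL

/* Greatest common divisor (Euclid); hypothetical helper. */
static unsigned long
sketch_gcd(unsigned long a, unsigned long b)
{
	while (b != 0) {
		unsigned long t = a % b;
		a = b;
		b = t;
	}
	return (a);
}

/* Round n up to the next multiple of align; hypothetical helper. */
static unsigned long
sketch_round_up(unsigned long n, unsigned long align)
{
	assert(align != 0);
	return (((n + align - 1) / align) * align);
}

/*
 * Region size = LCM(object size rounded to MAX(sizeof(long), 8), page size).
 * A region of this size holds a whole number of objects and spans a whole
 * number of pages, so no bytes are left over at the end of the region.
 */
static unsigned long
chain_region_size(unsigned long objsize)
{
	unsigned long align = sizeof(long) > 8 ? sizeof(long) : 8;
	unsigned long sz = sketch_round_up(objsize, align);

	return (sz / sketch_gcd(sz, SKETCH_PAGE_SIZE) * SKETCH_PAGE_SIZE);
}
```

With the 544-byte tcpcb item size from the zone listing earlier in this thread, and assuming sizeof(long) is at most 8, the rounded size stays 544 and the region comes out at 69632 bytes: 17 pages holding exactly 128 objects. The trade-off is that each region has to be contiguous.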
-- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 27 10: 0:31 2002 Delivered-To: freebsd-arch@freebsd.org Received: from avocet.prod.itd.earthlink.net (avocet.mail.pas.earthlink.net [207.217.120.50]) by hub.freebsd.org (Postfix) with ESMTP id 9855537B41C for ; Wed, 27 Feb 2002 10:00:28 -0800 (PST) Received: from pool0329.cvx22-bradley.dialup.earthlink.net ([209.179.199.74] helo=mindspring.com) by avocet.prod.itd.earthlink.net with esmtp (Exim 3.33 #1) id 16g8NJ-0007jj-00; Wed, 27 Feb 2002 10:00:21 -0800 Message-ID: <3C7D1E8E.98DAAC8C@mindspring.com> Date: Wed, 27 Feb 2002 09:59:42 -0800 From: Terry Lambert X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony} (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Kirk McKusick Cc: Jeff Roberson , arch@FreeBSD.ORG Subject: Re: Slab allocator References: <200202271644.g1RGi1i32186@beastie.mckusick.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Kirk McKusick wrote: > Your slab allocator work looks very interesting. Given that > the performance is comparable to what we are currently running, > it seems like a big step forward. > > The current statistics that I find most useful are (in order), the > number of current allocations, the high watermark for allocations, > and the total number of allocations made. The number of current > allocations needs to be correct and should be done under a lock. > The high watermark and total number of allocations need not be > precisely correct, so could well be done without a lock. The > occasional missed count, or a slightly low high watermark would > not much matter. Totally agree with everything in this post, particularly the stats locks being unnecessary. 
-- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 27 11: 0:19 2002 Delivered-To: freebsd-arch@freebsd.org Received: from rwcrmhc51.attbi.com (rwcrmhc51.attbi.com [204.127.198.38]) by hub.freebsd.org (Postfix) with ESMTP id 19E0737B41A for ; Wed, 27 Feb 2002 11:00:17 -0800 (PST) Received: from InterJet.elischer.org ([12.232.206.8]) by rwcrmhc51.attbi.com (InterMail vM.4.01.03.27 201-229-121-127-20010626) with ESMTP id <20020227190016.RGDA2626.rwcrmhc51.attbi.com@InterJet.elischer.org>; Wed, 27 Feb 2002 19:00:16 +0000 Received: from localhost (localhost.elischer.org [127.0.0.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id KAA01167; Wed, 27 Feb 2002 10:58:14 -0800 (PST) Date: Wed, 27 Feb 2002 10:58:13 -0800 (PST) From: Julian Elischer To: Bosko Milekic Cc: Jeff Roberson , arch@FreeBSD.ORG Subject: Re: mbuf allocation. (was: Slab allocator) In-Reply-To: <20020227124945.A29065@unixdaemons.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Wed, 27 Feb 2002, Bosko Milekic wrote: > > (iii) For what concerns mbuf allocations, I don't know what to tell > you right now. Ultimately, I would like to see uma eventually > replace mb_alloc. However, I would like to make sure that the > transition is smooth and painless and that we don't lose any > performance. This is why I think that co-existing uma with all > existing code (see (ii)) is a good idea. Then, if you're > willing, I would be glad to help "attach" mbuf allocations to > uma, should that be possible and should we decide, as a group, > that that is what we want to do. 
On the topic of mbuf allocation: if we have a slab allocator it wouldn't matter if we have a 3rd size of mbuf, just the size of a mbuf header structure (with packet header) that gets used whenever a cluster-mbuf combo is allocated. This fits in with the comment that someone made recently about having an allocator function for a mbuf+cluster instead of having the caller allocate one and then add a cluster to it. I know of no code that takes an mbuf cluster, strips off the cluster, and then uses it as a normal mbuf, so the extra 200 bytes being carried around is always wasted. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 27 11:26:19 2002 Delivered-To: freebsd-arch@freebsd.org Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2]) by hub.freebsd.org (Postfix) with ESMTP id 7882737B400 for ; Wed, 27 Feb 2002 11:26:13 -0800 (PST) Received: (from dillon@localhost) by apollo.backplane.com (8.11.6/8.9.1) id g1RJQCm29905; Wed, 27 Feb 2002 11:26:12 -0800 (PST) (envelope-from dillon) Date: Wed, 27 Feb 2002 11:26:12 -0800 (PST) From: Matthew Dillon Message-Id: <200202271926.g1RJQCm29905@apollo.backplane.com> To: Jeff Roberson Cc: arch@FreeBSD.ORG Subject: Re: Slab allocator References: <20020227005915.C17591-100000@mail.chesapeake.net> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG :... : :There are also per cpu queues of items, with a per cpu lock. This allows :for very efficient allocation, and also it provides near linear :performance as the number of cpus increases. I do still depend on giant to :talk to the back end page supplier (kmem_alloc, etc.). Once the VM is :locked the allocator will not require giant at all. :... : :Since you've read this far, I'll let you know where the patch is. 
;-) : :http://www.chesapeake.net/~jroberson/uma.tar :... :Any feedback is appreciated. I'd like to know what people expect from :this before it is committable. : :Jeff : :PS Sorry for the long winded email. :-) Well, one thing I've noticed right off the bat is that the code is trying to take advantage of per-cpu queues but is still having to obtain a per-cpu mutex to lock the per-cpu queue. Another thing I noticed is that the code appears to assume that PCPU_GET(cpuid) is stable in certain places, and I don't think that condition necessarily holds unless you explicitly enter a critical section (critical_enter() and critical_exit()). There are some cases where you obtain the per-cpu cache and lock it, which would be safe even if the cpu changed out from under you, and other cases such as in uma_zalloc_internal() where you assume that the cpuid is stable when it isn't. I also noticed that cache_drain() appears to be the only place where you iterate through the per-cpu mutexes. All the other places appear to use the current-cpu's mutex. I would recommend the following: * That you review your code with special attention to the lack of stability of PCPU_GET(cpuid) when you are not in a critical section. * That you consider getting rid of the per-cpu locks and instead use critical_enter() and critical_exit() to obtain a stable cpuid in order to allocate or free from the current cpu's cache without having to obtain any mutexes whatsoever. Theoretically this would allow most calls to allocate and free small amounts of memory to run as fast as a simple procedure call would run (akin to what the kernel malloc() in -stable is able to accomplish). * That you consider an alternative method for draining the per-cpu caches. For example, by having the per-cpu code use a global, shared SX lock along with the critical section to access their per-cpu caches and then have the cache_drain code obtain an exclusive SX lock in order to have full access to all of the per-cpu caches. 
* Documentation. i.e. comment the code more, especially areas where you have to special-case things like for example when you unlock a cpu cache in order to call uma_zfree_internal(). -Matt Matthew Dillon To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 27 11:33:50 2002 Delivered-To: freebsd-arch@freebsd.org Received: from angelica.unixdaemons.com (angelica.unixdaemons.com [209.148.64.135]) by hub.freebsd.org (Postfix) with ESMTP id 4066C37B41D for ; Wed, 27 Feb 2002 11:33:38 -0800 (PST) Received: from angelica.unixdaemons.com (bmilekic@localhost.unixdaemons.com [127.0.0.1]) by angelica.unixdaemons.com (8.12.2/8.12.1) with ESMTP id g1RJXUh4040455; Wed, 27 Feb 2002 14:33:30 -0500 (EST) Received: (from bmilekic@localhost) by angelica.unixdaemons.com (8.12.2/8.12.1/Submit) id g1RJXUcj040454; Wed, 27 Feb 2002 14:33:30 -0500 (EST) (envelope-from bmilekic) Date: Wed, 27 Feb 2002 14:33:30 -0500 From: Bosko Milekic To: Terry Lambert Cc: Jeff Roberson , arch@FreeBSD.ORG Subject: Re: Slab allocator Message-ID: <20020227143330.A34054@unixdaemons.com> References: <20020227005915.C17591-100000@mail.chesapeake.net> <3C7D1E31.B13915E7@mindspring.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <3C7D1E31.B13915E7@mindspring.com>; from tlambert2@mindspring.com on Wed, Feb 27, 2002 at 09:58:09AM -0800 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Wed, Feb 27, 2002 at 09:58:09AM -0800, Terry Lambert wrote: > First, let me say OUTSTANDING WORK! > > Jeff Roberson wrote: > > There are also per cpu queues of items, with a per cpu lock. This allows > > for very efficient allocation, and also it provides near linear > > performance as the number of cpus increases. 
I do still depend on giant to > > talk to the back end page supplier (kmem_alloc, etc.). Once the VM is > > locked the allocator will not require giant at all. > > What is the per-CPU lock required for? I think it can be > gotten rid of, or at least taken out of the critical path, > with more information. Per-CPU caches. Reduces lock contention and trashes caches less often. > > I would eventually like to pull other allocators into uma (The slab > > allocator). We could get rid of some of the kernel submaps and provide a > > much more dynamic amount of various resources. Something I had in mind > > were pbufs and mbufs, which could easily come from uma. This gives us the > > ability to redistribute memory to wherever it is needed, and not lock it > > in a particular place once it's there. > > How do you handle interrupt-time allocation of mbufs, in > this case? The zalloci() handles this by pre-creation of > the PTE's for the page mapping in the KVA, and then only > has to deal with grabbing free physical pages to back them, > which is a non-blocking operation that can occur at interrupt, > and which, if it fails, is not fatal (i.e. it's handled; I've > considered doing the same for the page mapping and PTE's, but > that would make the time-to-run far less deterministic). Terry, how long will you keep thinking that mbufs come through the zone allocator? :-) For G*d's sake man, we've been over this before! > > There are a few things that need to be fixed right now. For one, the zone > > statistics don't reflect the items that are in the per cpu queues. I'm > > thinking about clean ways to collect this without locking every zone and > > per cpu queue when someone calls sysctl. > > The easy way around this is to say that these values are > snapshots. So you maintain the figures of merit on a per > CPU basis in the context of the CPU doing the allocations > and deallocations, and treat it as read-only for the > purposes of statistics reporting. 
This means that you > don't need locks to get the statistics. For debugging, > you could provide a rigid locked interface (e.g. by only > enabling locking for the statistics gathering via a sysctl > that defaults to "off"). Yes, this is exactly what we did with mb_alloc. This is also what I was trying to say in my last Email. > > The other problem is with the per cpu buckets. They are a > > fixed size right now. I need to define several zones for > > the buckets to come from and a way to manage growing/shrinking > > the buckets. > > I built a "chain" allocator that dealt with this issue, and > also the object granularity issue. Basically, it calculated > the LCM of the object size rounded to a MAX(sizeof(long),8) > boundary for processor alignment sensitivity reasons, and > the page size (also for processor sensitivity reasons), and > then allocated a contiguous region from which it obtained > objects of that type. All in all, it meant zero unnecessary > space wastage (for 1,000,000 TCP connections, the savings > were 1/4 of a Gigabyte for one zone alone). That's great, until you run out of pre-allocated contiguous space. [...] > And thanks again for the most excellent work! 
> > -- Terry -- Bosko Milekic bmilekic@unixdaemons.com bmilekic@FreeBSD.org To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 27 11:37:15 2002 Delivered-To: freebsd-arch@freebsd.org Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by hub.freebsd.org (Postfix) with ESMTP id 1E35D37B400 for ; Wed, 27 Feb 2002 11:37:13 -0800 (PST) Received: by elvis.mu.org (Postfix, from userid 1192) id E694FAE2AB; Wed, 27 Feb 2002 11:37:12 -0800 (PST) Date: Wed, 27 Feb 2002 11:37:12 -0800 From: Alfred Perlstein To: Bosko Milekic Cc: Terry Lambert , Jeff Roberson , arch@FreeBSD.ORG Subject: Re: Slab allocator Message-ID: <20020227193712.GQ80761@elvis.mu.org> References: <20020227005915.C17591-100000@mail.chesapeake.net> <3C7D1E31.B13915E7@mindspring.com> <20020227143330.A34054@unixdaemons.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20020227143330.A34054@unixdaemons.com> User-Agent: Mutt/1.3.27i Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG * Bosko Milekic [020227 11:33] wrote: > > On Wed, Feb 27, 2002 at 09:58:09AM -0800, Terry Lambert wrote: > > Terry, how long will you keep thinking that mbufs come through the > zone allocator? :-) For G*d's sake man, we've been over this before! He means sockets. :) -- -Alfred Perlstein [alfred@freebsd.org] 'Instead of asking why a piece of software is using "1970s technology," start asking why software is ignoring 30 years of accumulated wisdom.' 
Tax deductible donations for FreeBSD: http://www.freebsdfoundation.org/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 27 11:40:19 2002 Delivered-To: freebsd-arch@freebsd.org Received: from rwcrmhc53.attbi.com (rwcrmhc53.attbi.com [204.127.198.39]) by hub.freebsd.org (Postfix) with ESMTP id D71EB37B400 for ; Wed, 27 Feb 2002 11:40:16 -0800 (PST) Received: from InterJet.elischer.org ([12.232.206.8]) by rwcrmhc53.attbi.com (InterMail vM.4.01.03.27 201-229-121-127-20010626) with ESMTP id <20020227194016.RNDL2951.rwcrmhc53.attbi.com@InterJet.elischer.org>; Wed, 27 Feb 2002 19:40:16 +0000 Received: from localhost (localhost.elischer.org [127.0.0.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id LAA01420; Wed, 27 Feb 2002 11:34:49 -0800 (PST) Date: Wed, 27 Feb 2002 11:34:48 -0800 (PST) From: Julian Elischer To: Matthew Dillon Cc: Jeff Roberson , arch@FreeBSD.ORG Subject: Re: Slab allocator In-Reply-To: <200202271926.g1RJQCm29905@apollo.backplane.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Wed, 27 Feb 2002, Matthew Dillon wrote: > > :PS Sorry for the long winded email. :-) > > Well, one thing I've noticed right off the bat is that the code > is trying to take advantage of per-cpu queues but is still > having to obtain a per-cpu mutex to lock the per-cpu queue. I was wondering about that myself :-) > > Another thing I noticed is that the code appears to assume > that PCPU_GET(cpuid) is stable in certain places, and I don't > think that condition necessarily holds unless you explicitly > enter a critical section (critical_enter() and critical_exit()). 
> There are some cases where you obtain the per-cpu cache and lock > it, which would be safe even if the cpu changed out from under > you, and other case such as in uma_zalloc_internal() where you > assume that the cpuid is stable when it isn't. It is definitely not OK to assume that PCPU_GET(anything except curthread) is stable unless you have pre-emption disabled (e.g. via the crit_mumble() functions or straight interrupt disablement). When jhb adds the pre-emption code, it is quite possible that you may be pre-empted at any unprotected point, and may be restarted on a different processor. curthread is an obvious exception, as it travels with you. > > * Documentation. i.e. comment the code more, especially > areas where you have to special-case things like for > example when you unlock a cpu cache in order to > call uma_zfree_internal(). Yes, LOTS more comments please. Particularly giving the REASON you do things. From owner-freebsd-arch Wed Feb 27 11:42:58 2002 Delivered-To: freebsd-arch@freebsd.org Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by hub.freebsd.org (Postfix) with ESMTP id 3BB9037B41C for ; Wed, 27 Feb 2002 11:42:56 -0800 (PST) Received: by elvis.mu.org (Postfix, from userid 1192) id 13D01AE27F; Wed, 27 Feb 2002 11:42:56 -0800 (PST) Date: Wed, 27 Feb 2002 11:42:56 -0800 From: Alfred Perlstein To: Julian Elischer Cc: Matthew Dillon , Jeff Roberson , arch@FreeBSD.ORG Subject: Re: Slab allocator Message-ID: <20020227194256.GR80761@elvis.mu.org> References: <200202271926.g1RJQCm29905@apollo.backplane.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.3.27i Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG * Julian Elischer [020227 
11:40] wrote: > > > On Wed, 27 Feb 2002, Matthew Dillon wrote: > > > > > :PS Sorry for the long winded email. :-) > > > > Well, one thing I've noticed right off the bat is that the code > > is trying to take advantage of per-cpu queues but is still > > having to obtain a per-cpu mutex to lock the per-cpu queue. > > I was wondering about that myself :-) It's basically the pre-emption stuff you guys are wondering about, along with the possibility of freeing back to another cpu's cache, that may be an issue. Jeff, are you freeing memory back to the cache it was initially allocated from or not? -Alfred From owner-freebsd-arch Wed Feb 27 11:47: 1 2002 Delivered-To: freebsd-arch@freebsd.org Received: from angelica.unixdaemons.com (angelica.unixdaemons.com [209.148.64.135]) by hub.freebsd.org (Postfix) with ESMTP id 42D4237B400 for ; Wed, 27 Feb 2002 11:46:56 -0800 (PST) Received: from angelica.unixdaemons.com (bmilekic@localhost.unixdaemons.com [127.0.0.1]) by angelica.unixdaemons.com (8.12.2/8.12.1) with ESMTP id g1RJkgh4041423; Wed, 27 Feb 2002 14:46:42 -0500 (EST) Received: (from bmilekic@localhost) by angelica.unixdaemons.com (8.12.2/8.12.1/Submit) id g1RJkgLK041422; Wed, 27 Feb 2002 14:46:42 -0500 (EST) (envelope-from bmilekic) Date: Wed, 27 Feb 2002 14:46:42 -0500 From: Bosko Milekic To: Matthew Dillon Cc: Jeff Roberson , arch@FreeBSD.ORG Subject: Re: Slab allocator Message-ID: <20020227144642.A40638@unixdaemons.com> References: <20020227005915.C17591-100000@mail.chesapeake.net> <200202271926.g1RJQCm29905@apollo.backplane.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <200202271926.g1RJQCm29905@apollo.backplane.com>; from dillon@apollo.backplane.com on Wed, Feb 27, 2002 at 11:26:12AM -0800 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web 
Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Wed, Feb 27, 2002 at 11:26:12AM -0800, Matthew Dillon wrote: > > :... > : > :There are also per cpu queues of items, with a per cpu lock. This allows > :for very effecient allocation, and also it provides near linear > :performance as the number of cpus increase. I do still depend on giant to > :talk to the back end page supplier (kmem_alloc, etc.). Once the VM is > :locked the allocator will not require giant at all. > :... > : > :Since you've read this far, I'll let you know where the patch is. ;-) > : > :http://www.chesapeake.net/~jroberson/uma.tar > :... > :Any feedback is appreciated. I'd like to know what people expect from > :this before it is committable. > : > :Jeff > : > :PS Sorry for the long winded email. :-) > > Well, one thing I've noticed right off the bat is that the code > is trying to take advantage of per-cpu queues but is still > having to obtain a per-cpu mutex to lock the per-cpu queue. Yes, that's normal. One can get pre-empted here. > Another thing I noticed is that the code appears to assume > that PCPU_GET(cpuid) is stable in certain places, and I don't > think that condition necessarily holds unless you explicitly > enter a critical section (critical_enter() and critical_exit()). > There are some cases where you obtain the per-cpu cache and lock > it, which would be safe even if the cpu changed out from under > you, and other case such as in uma_zalloc_internal() where you > assume that the cpuid is stable when it isn't. No, what he does is take PCPU_GET(cpuid) and save it in a variable. If he gets pre-empted (unlikely) and he gets shifted CPUs he still uses the old CPU's cache. That's fine as long as it's done correctly. > I also noticed that cache_drain() appears to be the only > place where you iterate through the per-cpu mutexes. All > the other places appear to use the current-cpu's mutex. That's normal, he drains all PCPU caches. [...] 
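The "read the CPU id once, then stick with that cache" pattern described in the reply above can be sketched in userland C. This is only an illustration under stated assumptions: `fake_curcpu` stands in for PCPU_GET(cpuid), the `locked` flag stands in for the per-CPU mutex, and `cache_free()`, `struct cpu_cache` and the pre-emption hook are hypothetical names, not taken from the uma patch.

```c
#include <assert.h>
#include <stddef.h>

#define NCPU      2
#define CACHE_MAX 4

/* Hypothetical per-CPU cache; "locked" stands in for the per-CPU mutex. */
struct cpu_cache {
	int   locked;
	int   nitems;
	void *items[CACHE_MAX];
};

static struct cpu_cache caches[NCPU];
static int fake_curcpu;		/* stand-in for PCPU_GET(cpuid) */

static void cache_lock(struct cpu_cache *c)   { assert(!c->locked); c->locked = 1; }
static void cache_unlock(struct cpu_cache *c) { assert(c->locked);  c->locked = 0; }

/*
 * Free an item into the cache of the CPU we were on when we started.
 * preempt_hook (may be NULL) simulates being moved to another CPU
 * while the lock is held.
 */
static int cache_free(void *item, void (*preempt_hook)(void))
{
	int cpu = fake_curcpu;			/* read the CPU id exactly once */
	struct cpu_cache *c = &caches[cpu];
	int ok = 0;

	cache_lock(c);
	if (preempt_hook != NULL)
		preempt_hook();
	/* Keep using the saved cache; never re-read the CPU id here. */
	if (c->nitems < CACHE_MAX) {
		c->items[c->nitems++] = item;
		ok = 1;
	}
	cache_unlock(c);
	return ok;
}

/* Simulated migration: the "current CPU" changes under the caller. */
static void migrate_to_cpu1(void) { fake_curcpu = 1; }
```

Calling `cache_free(&x, migrate_to_cpu1)` with `fake_curcpu` initially 0 leaves the item in cache 0 and cache 1 untouched, which is the behaviour the reply argues is acceptable: the lock, not the CPU id, is what protects the cache.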
> * That you consider an alternative method for draining > the per-cpu caches. For example, by having the > per-cpu code use a global, shared SX lock along > with the critical section to access their per-cpu > caches and then have the cache_drain code obtain > an exclusive SX lock in order to have full access > to all of the per-cpu caches. > > * Documentation. i.e. comment the code more, especially > areas where you have to special-case things like for > example when you unlock a cpu cache in order to > call uma_zfree_internal(). > > -Matt > Matthew Dillon > -- Bosko Milekic bmilekic@unixdaemons.com bmilekic@FreeBSD.org To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 27 11:55:14 2002 Delivered-To: freebsd-arch@freebsd.org Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2]) by hub.freebsd.org (Postfix) with ESMTP id BCFFD37B402 for ; Wed, 27 Feb 2002 11:55:10 -0800 (PST) Received: (from dillon@localhost) by apollo.backplane.com (8.11.6/8.9.1) id g1RJtAj30178; Wed, 27 Feb 2002 11:55:10 -0800 (PST) (envelope-from dillon) Date: Wed, 27 Feb 2002 11:55:10 -0800 (PST) From: Matthew Dillon Message-Id: <200202271955.g1RJtAj30178@apollo.backplane.com> To: Alfred Perlstein Cc: Julian Elischer , Jeff Roberson , arch@FreeBSD.ORG Subject: Re: Slab allocator References: <200202271926.g1RJQCm29905@apollo.backplane.com> <20020227194256.GR80761@elvis.mu.org> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG : :* Julian Elischer [020227 11:40] wrote: :> :> :> On Wed, 27 Feb 2002, Matthew Dillon wrote: :> :> > :> > :PS Sorry for the long winded email. 
:-) :> > :> > Well, one thing I've noticed right off the bat is that the code :> > is trying to take advantage of per-cpu queues but is still :> > having to obtain a per-cpu mutex to lock the per-cpu queue. :> :> I was wondering about that myself :-) : :It's basically the pre-emption stuff you guys are wondering about, :along with the possibility of freeing back to another cpu's :cache, that may be an issue. : :Jeff, are you freeing memory back to the cache it was initially :allocated from or not? : :-Alfred I don't know what Jeff is doing there but I do seem to recall a paper from somewhere that indicated it was more efficient to free memory to the current cpu's per-cpu cache rather than back to the original cpu's cache because the current cpu's hardware L1/L2 cache likely already has mastership of the memory. I think Linux does things this way. -Matt Matthew Dillon From owner-freebsd-arch Wed Feb 27 12: 0:28 2002 Delivered-To: freebsd-arch@freebsd.org Received: from rwcrmhc53.attbi.com (rwcrmhc53.attbi.com [204.127.198.39]) by hub.freebsd.org (Postfix) with ESMTP id 9AE6F37B400 for ; Wed, 27 Feb 2002 12:00:23 -0800 (PST) Received: from InterJet.elischer.org ([12.232.206.8]) by rwcrmhc53.attbi.com (InterMail vM.4.01.03.27 201-229-121-127-20010626) with ESMTP id <20020227200023.SBRW2951.rwcrmhc53.attbi.com@InterJet.elischer.org>; Wed, 27 Feb 2002 20:00:23 +0000 Received: from localhost (localhost.elischer.org [127.0.0.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id LAA01466; Wed, 27 Feb 2002 11:41:51 -0800 (PST) Date: Wed, 27 Feb 2002 11:41:50 -0800 (PST) From: Julian Elischer To: Bosko Milekic Cc: Terry Lambert , Jeff Roberson , arch@FreeBSD.ORG Subject: Re: Slab allocator In-Reply-To: <20020227143330.A34054@unixdaemons.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: 
bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Wed, 27 Feb 2002, Bosko Milekic wrote: > > On Wed, Feb 27, 2002 at 09:58:09AM -0800, Terry Lambert wrote: > > First, let me say OUTSTANDING WORK! > > > > Jeff Roberson wrote: > > > There are also per cpu queues of items, with a per cpu lock. This allows > > > for very effecient allocation, and also it provides near linear > > > performance as the number of cpus increase. I do still depend on giant to > > > talk to the back end page supplier (kmem_alloc, etc.). Once the VM is > > > locked the allocator will not require giant at all. > > > > What is the per-CPU lock required for? I think it can be > > gotten rid of, or at least taken out of the critical path, > > with more information. > > Per-CPU caches. Reduces lock contention and trashes caches less often. The idea of Per CPU caches is that only that CPU is accessing it so therefore you shouldn't need a lock at all. unless you are protecting against interrupts on your own processor and pre-emption. There are also ways to implement atomic operations on queues that require no locks at all. (e.g. 
using the test and swap instruction) > To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 27 12: 5:32 2002 Delivered-To: freebsd-arch@freebsd.org Received: from angelica.unixdaemons.com (angelica.unixdaemons.com [209.148.64.135]) by hub.freebsd.org (Postfix) with ESMTP id A8EF937B405 for ; Wed, 27 Feb 2002 12:05:29 -0800 (PST) Received: from angelica.unixdaemons.com (bmilekic@localhost.unixdaemons.com [127.0.0.1]) by angelica.unixdaemons.com (8.12.2/8.12.1) with ESMTP id g1RK5Jh4042996; Wed, 27 Feb 2002 15:05:19 -0500 (EST) Received: (from bmilekic@localhost) by angelica.unixdaemons.com (8.12.2/8.12.1/Submit) id g1RK5JUJ042995; Wed, 27 Feb 2002 15:05:19 -0500 (EST) (envelope-from bmilekic) Date: Wed, 27 Feb 2002 15:05:19 -0500 From: Bosko Milekic To: Julian Elischer Cc: Terry Lambert , Jeff Roberson , arch@FreeBSD.ORG Subject: Re: Slab allocator Message-ID: <20020227150519.A42681@unixdaemons.com> References: <20020227143330.A34054@unixdaemons.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: ; from julian@elischer.org on Wed, Feb 27, 2002 at 11:41:50AM -0800 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Wed, Feb 27, 2002 at 11:41:50AM -0800, Julian Elischer wrote: > The idea of Per CPU caches is that only that CPU is accessing it so > therefore you shouldn't need a lock at all. unless you are protecting > against interrupts on your own processor > and pre-emption. There are also ways to implement atomic > operations on queues that require no locks at all. > (e.g. using the test and swap instruction) Yes, that's exactly the point. You have to protect against pre-emption and interrupts. 
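The lock-free queue idea mentioned in the quoted text ("test and swap") can be sketched with a compare-and-swap loop. This is a minimal userland sketch using C11 atomics, not kernel code and not anything from the uma patch; it is a LIFO free list (a Treiber-style stack) rather than a general queue, and `struct item`, `freelist_push()` and `freelist_pop()` are hypothetical names. It ignores the ABA problem and assumes popped items are not freed while another context may still be reading them, both of which a real allocator would have to address.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct item {
	struct item *next;
	int          value;
};

struct freelist {
	_Atomic(struct item *) head;
};

/* Push: retry until the head we linked against is still the head. */
static void freelist_push(struct freelist *fl, struct item *it)
{
	struct item *old;

	assert(it != NULL);
	old = atomic_load(&fl->head);
	do {
		it->next = old;
		/* On failure, "old" is reloaded with the current head. */
	} while (!atomic_compare_exchange_weak(&fl->head, &old, it));
}

/* Pop: returns NULL when the list is empty. */
static struct item *freelist_pop(struct freelist *fl)
{
	struct item *old = atomic_load(&fl->head);

	while (old != NULL &&
	    !atomic_compare_exchange_weak(&fl->head, &old, old->next))
		;	/* CAS failed: "old" now holds the new head, retry */
	return old;
}
```

If an interrupt or pre-emption changes the head between the load and the compare-and-swap, the swap fails and the loop retries, so no lock is held across the update. Whether this beats a per-CPU mutex in practice is exactly the open question in the thread.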
-- Bosko Milekic bmilekic@unixdaemons.com bmilekic@FreeBSD.org To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 27 12:17:38 2002 Delivered-To: freebsd-arch@freebsd.org Received: from angelica.unixdaemons.com (angelica.unixdaemons.com [209.148.64.135]) by hub.freebsd.org (Postfix) with ESMTP id AD67C37B400 for ; Wed, 27 Feb 2002 12:17:33 -0800 (PST) Received: from angelica.unixdaemons.com (bmilekic@localhost.unixdaemons.com [127.0.0.1]) by angelica.unixdaemons.com (8.12.2/8.12.1) with ESMTP id g1RKHNh4044130; Wed, 27 Feb 2002 15:17:23 -0500 (EST) Received: (from bmilekic@localhost) by angelica.unixdaemons.com (8.12.2/8.12.1/Submit) id g1RKHMFd044129; Wed, 27 Feb 2002 15:17:22 -0500 (EST) (envelope-from bmilekic) Date: Wed, 27 Feb 2002 15:17:22 -0500 From: Bosko Milekic To: Matthew Dillon Cc: Alfred Perlstein , Julian Elischer , Jeff Roberson , arch@FreeBSD.ORG Subject: Re: Slab allocator Message-ID: <20020227151722.B42681@unixdaemons.com> References: <200202271926.g1RJQCm29905@apollo.backplane.com> <20020227194256.GR80761@elvis.mu.org> <200202271955.g1RJtAj30178@apollo.backplane.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <200202271955.g1RJtAj30178@apollo.backplane.com>; from dillon@apollo.backplane.com on Wed, Feb 27, 2002 at 11:55:10AM -0800 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Wed, Feb 27, 2002 at 11:55:10AM -0800, Matthew Dillon wrote: > I don't know what Jeff is doing there but I do seem to recall a > paper from somewhere that indicated it was more efficient to free memory > to the current cpu's per-cpu cache rather then back to the original > cpu's cache because the current cpu's hardware L1/L2 cache likely already > has mastership of the 
memory. I think Linux does things this way. I seem to recall that in general, if you have a writer <--> reader relationship in your code, that it is better to free back to the originating CPU's cache. That is, if you are the thread doing the freeing and you don't write to the object that you're freeing at all (this is often the case), it is better to free to the originating CPU's cache so as to prevent invalidation. > -Matt > Matthew Dillon > -- Bosko Milekic bmilekic@unixdaemons.com bmilekic@FreeBSD.org To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 27 12:19: 7 2002 Delivered-To: freebsd-arch@freebsd.org Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by hub.freebsd.org (Postfix) with ESMTP id 7A32837B41A for ; Wed, 27 Feb 2002 12:19:03 -0800 (PST) Received: by elvis.mu.org (Postfix, from userid 1192) id 476BAAE27F; Wed, 27 Feb 2002 12:19:03 -0800 (PST) Date: Wed, 27 Feb 2002 12:19:03 -0800 From: Alfred Perlstein To: Matthew Dillon Cc: Julian Elischer , Jeff Roberson , arch@FreeBSD.ORG Subject: Re: Slab allocator Message-ID: <20020227201903.GS80761@elvis.mu.org> References: <200202271926.g1RJQCm29905@apollo.backplane.com> <20020227194256.GR80761@elvis.mu.org> <200202271955.g1RJtAj30178@apollo.backplane.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <200202271955.g1RJtAj30178@apollo.backplane.com> User-Agent: Mutt/1.3.27i Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG * Matthew Dillon [020227 11:55] wrote: > > : > :* Julian Elischer [020227 11:40] wrote: > :> > :> > :> On Wed, 27 Feb 2002, Matthew Dillon wrote: > :> > :> > > :> > :PS Sorry for the long winded email. 
:-) > :> > > :> > Well, one thing I've noticed right off the bat is that the code > :> > is trying to take advantage of per-cpu queues but is still > :> > having to obtain a per-cpu mutex to lock the per-cpu queue. > :> > :> I was wondering about that myself :-) > : > :It's basically the pre-emption stuff you guys are wondering about, > :along with the possibility of freeing back to another cpu's > :cache, that may be an issue. > : > :Jeff, are you freeing memory back to the cache it was initially > :allocated from or not? > : > :-Alfred > > I don't know what Jeff is doing there but I do seem to recall a > paper from somewhere that indicated it was more efficient to free memory > to the current cpu's per-cpu cache rather than back to the original > cpu's cache because the current cpu's hardware L1/L2 cache likely already > has mastership of the memory. I think Linux does things this way. Last I checked, Linux does not. The stuff we are both talking about (freeing back to the original cache) is detailed in the 'Hoard' memory allocator, which is hard to find online. -- -Alfred Perlstein [alfred@freebsd.org] 'Instead of asking why a piece of software is using "1970s technology," start asking why software is ignoring 30 years of accumulated wisdom.' 
Tax deductible donations for FreeBSD: http://www.freebsdfoundation.org/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 27 12:20:26 2002 Delivered-To: freebsd-arch@freebsd.org Received: from rwcrmhc54.attbi.com (rwcrmhc54.attbi.com [216.148.227.87]) by hub.freebsd.org (Postfix) with ESMTP id 6EE9037B400 for ; Wed, 27 Feb 2002 12:20:12 -0800 (PST) Received: from InterJet.elischer.org ([12.232.206.8]) by rwcrmhc54.attbi.com (InterMail vM.4.01.03.27 201-229-121-127-20010626) with ESMTP id <20020227202012.YFQJ1214.rwcrmhc54.attbi.com@InterJet.elischer.org>; Wed, 27 Feb 2002 20:20:12 +0000 Received: from localhost (localhost.elischer.org [127.0.0.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id MAA01633; Wed, 27 Feb 2002 12:17:31 -0800 (PST) Date: Wed, 27 Feb 2002 12:17:30 -0800 (PST) From: Julian Elischer To: Bosko Milekic Cc: Terry Lambert , Jeff Roberson , arch@FreeBSD.ORG Subject: Re: Slab allocator In-Reply-To: <20020227150519.A42681@unixdaemons.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Wed, 27 Feb 2002, Bosko Milekic wrote: > > On Wed, Feb 27, 2002 at 11:41:50AM -0800, Julian Elischer wrote: > > The idea of Per CPU caches is that only that CPU is accessing it so > > therefore you shouldn't need a lock at all. unless you are protecting > > against interrupts on your own processor > > and pre-emption. There are also ways to implement atomic > > operations on queues that require no locks at all. > > (e.g. using the test and swap instruction) > > Yes, that's exactly the point. You have to protect against pre-emption > and interrupts. maybe use a critical section instead.. or better, a test/swap or, both... 
but it sounds like you need the lock anyhow because, as you said, it is possible a recently pre-empted thread may continue to use the pool of its old processor for a short moment (I'm not sure I like that idea). > > -- > Bosko Milekic > bmilekic@unixdaemons.com > bmilekic@FreeBSD.org > > From owner-freebsd-arch Wed Feb 27 12:23:49 2002 Delivered-To: freebsd-arch@freebsd.org Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by hub.freebsd.org (Postfix) with ESMTP id 152C237B429 for ; Wed, 27 Feb 2002 12:23:33 -0800 (PST) Received: by elvis.mu.org (Postfix, from userid 1192) id D3BC7AE2BE; Wed, 27 Feb 2002 12:23:32 -0800 (PST) Date: Wed, 27 Feb 2002 12:23:32 -0800 From: Alfred Perlstein To: Julian Elischer Cc: Bosko Milekic , Terry Lambert , Jeff Roberson , arch@FreeBSD.ORG Subject: Re: Slab allocator Message-ID: <20020227202332.GU80761@elvis.mu.org> References: <20020227150519.A42681@unixdaemons.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.3.27i Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG * Julian Elischer [020227 12:20] wrote: > > > On Wed, 27 Feb 2002, Bosko Milekic wrote: > > > > > On Wed, Feb 27, 2002 at 11:41:50AM -0800, Julian Elischer wrote: > > > The idea of Per CPU caches is that only that CPU is accessing it so > > > therefore you shouldn't need a lock at all. unless you are protecting > > > against interrupts on your own processor > > > and pre-emption. There are also ways to implement atomic > > > operations on queues that require no locks at all. > > > (e.g. using the test and swap instruction) > > > > Yes, that's exactly the point. You have to protect against pre-emption > > and interrupts. 
> maybe use a critical section instead.. > or better, a test/swap > or, both... > > but it sounds like you need the lock anyhow because > as you said.. it is possible a recently pre-empted thread may continue > to use the pool of it's old processor for a short moment, > (I'm not sure I like that idea) Leave it alone. The locks are a perfectly fine abstraction for the time being to get what we want and need. I'm for the per-cpu locks, and we can remove/fix them later if it's an issue. Removing locks is easier than adding them imo. -- -Alfred Perlstein [alfred@freebsd.org] 'Instead of asking why a piece of software is using "1970s technology," start asking why software is ignoring 30 years of accumulated wisdom.' Tax deductible donations for FreeBSD: http://www.freebsdfoundation.org/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 27 12:33:49 2002 Delivered-To: freebsd-arch@freebsd.org Received: from gull.prod.itd.earthlink.net (gull.mail.pas.earthlink.net [207.217.120.84]) by hub.freebsd.org (Postfix) with ESMTP id B8A5937B402 for ; Wed, 27 Feb 2002 12:33:43 -0800 (PST) Received: from pool0139.cvx21-bradley.dialup.earthlink.net ([209.179.192.139] helo=mindspring.com) by gull.prod.itd.earthlink.net with esmtp (Exim 3.33 #1) id 16gAlh-0001gi-00; Wed, 27 Feb 2002 12:33:41 -0800 Message-ID: <3C7D4270.F285F888@mindspring.com> Date: Wed, 27 Feb 2002 12:32:48 -0800 From: Terry Lambert X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony} (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Matthew Dillon Cc: Jeff Roberson , arch@FreeBSD.ORG Subject: Re: Slab allocator References: <20020227005915.C17591-100000@mail.chesapeake.net> <200202271926.g1RJQCm29905@apollo.backplane.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: 
List-Unsubscribe: X-Loop: FreeBSD.ORG Matthew Dillon wrote: > Well, one thing I've noticed right off the bat is that the code > is trying to take advantage of per-cpu queues but is still > having to obtain a per-cpu mutex to lock the per-cpu queue. I disliked this as well; I think we can help him work around this problem, though, based on why he feels that it's needed; going off half-cocked about it is not going to solve anything, but knowing the requirements trace for the design decision will. Even so, a per-CPU mutex takes it out of the global contention domain, so that's something. Use of a mutex, which I think is, by definition, supposed to be synchronized between CPUs, is the only real downside, if we assume that a per-CPU lock is in fact necessary for these things... it should be possible to implement that with an intra-CPU primitive that is much, much cheaper than the inter-CPU version that's in the mutex implementation. > Another thing I noticed is that the code appears to assume > that PCPU_GET(cpuid) is stable in certain places, and I don't > think that condition necessarily holds unless you explicitly > enter a critical section (critical_enter() and critical_exit()). > There are some cases where you obtain the per-cpu cache and lock > it, which would be safe even if the cpu changed out from under > you, and other case such as in uma_zalloc_internal() where you > assume that the cpuid is stable when it isn't. I saw this as well. This seemed to be much less of a problem, to me, since I think that the issue of forced kernel preemption resulting in running on another CPU is currently moot. In the long run, I think it will be mostly safe to assume that the CPU you are running on will not change, except under extraordinary conditions (migration), which in turn could be deferred or even prevented, using a "don't migrate bit". Without per CPU scheduler queues, this is currently a danger, but it's a really minor one, in the scheme of things. 
In the per CPU scheduler queue case, the migration should have to be explicit based on a figure of merit. The main code path is therefore lockless. The way a migration occurs is to have the load on a single CPU exceed some watermark relative to the overall system load, which can be calculated using atomic figures of merit that need not be locked to be read per CPU, within a CPU cluster. If migration is a "go", then you grab a lock on the "give away" queue for the CPU you are giving it to, and push it over there. That CPU, in turn, checks this queue at the start of the scheduler cycle to see if it is empty (this check is thus also lockless in the common case). If there are processes pending migration to the CPU, then (and only then) does it acquire the lock, and migrate them to the local (lockless) scheduler queue, after which it releases the lock again. Any contention which occurs will be between 2 CPUs, not N. Obviously, this is future work. > I also noticed that cache_drain() appears to be the only > place where you iterate through the per-cpu mutexes. All > the other places appear to use the current-cpu's mutex. I am not happy with the "cache drain". I expect that the way I would do this is not through stealing, but through notification, which could, similarly, end up being lockless. I think the "stealing" case is also an extraordinary condition, and so I'm not concerned about optimizing it as if it were a common case. Consider that the latency introduced will be no more of a stumbling block for the system than the pool retention limits for the high water mark being perturbed slightly, so as to cause the same behaviour. 
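The "give away" queue scheme outlined in this message can be sketched as follows. This is a userland illustration with C11 atomics under several assumptions: all names (`struct cpu_sched`, `sched_give()`, `sched_adopt()`) are hypothetical, a simple spin flag stands in for the kernel lock, and the load-watermark decision that triggers a migration is not modelled at all.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NCPU 2

struct proc {
	struct proc *next;
	int          pid;
};

struct cpu_sched {
	struct proc *runq;	 /* local queue: only its own CPU touches it, no lock */
	atomic_flag  give_lock;	 /* protects give_head */
	struct proc *give_head;	 /* "give away" queue that other CPUs push onto */
	atomic_int   give_count; /* read without the lock for the cheap empty check */
};

static struct cpu_sched cpus[NCPU];

static void sched_init(void)
{
	for (int i = 0; i < NCPU; i++) {
		cpus[i].runq = NULL;
		cpus[i].give_head = NULL;
		atomic_flag_clear(&cpus[i].give_lock);
		atomic_init(&cpus[i].give_count, 0);
	}
}

static void giveq_lock(struct cpu_sched *cs)
{
	while (atomic_flag_test_and_set_explicit(&cs->give_lock,
	    memory_order_acquire))
		;	/* spin; contention is between two CPUs at most */
}

static void giveq_unlock(struct cpu_sched *cs)
{
	atomic_flag_clear_explicit(&cs->give_lock, memory_order_release);
}

/* Called on the overloaded CPU: hand p to the target's give-away queue. */
static void sched_give(int target, struct proc *p)
{
	struct cpu_sched *cs = &cpus[target];

	giveq_lock(cs);
	p->next = cs->give_head;
	cs->give_head = p;
	atomic_fetch_add(&cs->give_count, 1);
	giveq_unlock(cs);
}

/*
 * Called by a CPU at the start of its scheduler cycle.  Returns the
 * number of processes moved onto the local run queue.
 */
static int sched_adopt(int self)
{
	struct cpu_sched *cs = &cpus[self];
	struct proc *p, *next;
	int n = 0;

	if (atomic_load(&cs->give_count) == 0)
		return 0;		/* common case: no lock taken */

	giveq_lock(cs);
	p = cs->give_head;
	cs->give_head = NULL;
	atomic_store(&cs->give_count, 0);
	giveq_unlock(cs);

	for (; p != NULL; p = next) {	/* local queue needs no lock */
		next = p->next;
		p->next = cs->runq;
		cs->runq = p;
		n++;
	}
	return n;
}
```

The lock is only ever taken by the giving CPU and the receiving CPU, which matches the "between 2 CPUs, not N" point; the empty check costs one atomic load per scheduler cycle.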
-- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 27 12:40:31 2002 Delivered-To: freebsd-arch@freebsd.org Received: from gull.prod.itd.earthlink.net (gull.mail.pas.earthlink.net [207.217.120.84]) by hub.freebsd.org (Postfix) with ESMTP id 6E11337B400 for ; Wed, 27 Feb 2002 12:40:25 -0800 (PST) Received: from pool0139.cvx21-bradley.dialup.earthlink.net ([209.179.192.139] helo=mindspring.com) by gull.prod.itd.earthlink.net with esmtp (Exim 3.33 #1) id 16gAs9-00031a-00; Wed, 27 Feb 2002 12:40:22 -0800 Message-ID: <3C7D4401.9E57D6AD@mindspring.com> Date: Wed, 27 Feb 2002 12:39:29 -0800 From: Terry Lambert X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony} (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Bosko Milekic Cc: Jeff Roberson , arch@FreeBSD.ORG Subject: Re: Slab allocator References: <20020227005915.C17591-100000@mail.chesapeake.net> <3C7D1E31.B13915E7@mindspring.com> <20020227143330.A34054@unixdaemons.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Bosko Milekic wrote: > On Wed, Feb 27, 2002 at 09:58:09AM -0800, Terry Lambert wrote: > > First, let me say OUTSTANDING WORK! > > > > Jeff Roberson wrote: > > > There are also per cpu queues of items, with a per cpu lock. This allows > > > for very effecient allocation, and also it provides near linear > > > performance as the number of cpus increase. I do still depend on giant to > > > talk to the back end page supplier (kmem_alloc, etc.). Once the VM is > > > locked the allocator will not require giant at all. > > > > What is the per-CPU lock required for? I think it can be > > gotten rid of, or at least taken out of the critical path, > > with more information. > > Per-CPU caches. 
Reduces lock contention and trashes caches less often. I think you are misunderstanding. If the caches are per-CPU, then by definition, they will only ever be accessed by a single CPU, and so contention can be eliminated by ordered atomicity of operations, unlike where there is inter-CPU contention. Per CPU resources are really not something you would expect to be contended between CPUs, and within the context of a single CPU, contention is controllable. > > > I would eventually like to pull other allocators into uma (The slab > > > allocator). We could get rid of some of the kernel submaps and provide a > > > much more dynamic amount of various resources. Something I had in mind > > > were pbufs and mbufs, which could easily come from uma. This gives us the > > > ability to redistribute memory to wherever it is needed, and not lock it > > > in a particular place once it's there. > > > > How do you handle interrupt-time allocation of mbufs, in > > this case? The zalloci() handles this by pre-creation of > > the PTE's for the page mapping in the KVA, and then only > > has to deal with grabbing free physical pages to back them, > > which is a non-blocking operation that can occur at interrupt, > > and which, if it fails, is not fatal (i.e. it's handled; I've > > considered doing the same for the page mapping and PTE's, but > > that would make the time-to-run far less deterministic). > > Terry, how long will you keep thinking that mbufs come through the > zone allocator? :-) For G*d's sake man, we've been over this before! Then take that part out, and answer the question about interrupt time allocations. Whether I'm still substituting mbufs in there when I shouldn't be or not is irrelevant to the question. > > > The other problem is with the per cpu buckets. They are a > > > fixed size right now. I need to define several zones for > > > the buckets to come from and a way to manage growing/shrinking > > > the buckets. 
> > > > I built a "chain" allocator that dealt with this issue, and > > also the object granularit issue. Basically, it calculated > > the LCM of the object size rounded to a MAX(sizeof(long),8) > > boundary for processor alignment sensitivity reasons, and > > the page size (also for processor sensitivity reasons), and > > then allocated a contiguous region from which it obtained > > objects of that type. All in all, it meant zero unnecessary > > space wastage (for 1,000,000 TCP connections, the savings > > were 1/4 of a Gigabyte for one zone alone). > > That's great, until you run out of pre-allocated contiguous space. At which point you've reached the load bearing capacity of the system, and will have to stop, no matter what. It's not like you can swap mbufs. -- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 27 12:43:50 2002 Delivered-To: freebsd-arch@freebsd.org Received: from gull.prod.itd.earthlink.net (gull.mail.pas.earthlink.net [207.217.120.84]) by hub.freebsd.org (Postfix) with ESMTP id 17A1837B41E for ; Wed, 27 Feb 2002 12:43:25 -0800 (PST) Received: from pool0139.cvx21-bradley.dialup.earthlink.net ([209.179.192.139] helo=mindspring.com) by gull.prod.itd.earthlink.net with esmtp (Exim 3.33 #1) id 16gAuy-000730-00; Wed, 27 Feb 2002 12:43:17 -0800 Message-ID: <3C7D44AE.820215AB@mindspring.com> Date: Wed, 27 Feb 2002 12:42:22 -0800 From: Terry Lambert X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony} (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Alfred Perlstein Cc: Bosko Milekic , Jeff Roberson , arch@FreeBSD.ORG Subject: Re: Slab allocator References: <20020227005915.C17591-100000@mail.chesapeake.net> <3C7D1E31.B13915E7@mindspring.com> <20020227143330.A34054@unixdaemons.com> <20020227193712.GQ80761@elvis.mu.org> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: 
List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Alfred Perlstein wrote: > * Bosko Milekic [020227 11:33] wrote: > > On Wed, Feb 27, 2002 at 09:58:09AM -0800, Terry Lambert wrote: > > > > Terry, how long will you keep thinking that mbufs come through the > > zone allocator? :-) For G*d's sake man, we've been over this before! > > He means sockets. :) Yes, Bosko knows this; it's what I meant the last time I mistakenly said "mbuf's" in the same allocator discussion context. I took his ":-)" to mean that it was understood, and he was chiding me for my aphasic dyslexia again. 8-). Still doesn't answer the question about interrupt time allocations. -- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 27 12:47:20 2002 Delivered-To: freebsd-arch@freebsd.org Received: from gull.prod.itd.earthlink.net (gull.mail.pas.earthlink.net [207.217.120.84]) by hub.freebsd.org (Postfix) with ESMTP id 1CC8337B420 for ; Wed, 27 Feb 2002 12:45:59 -0800 (PST) Received: from pool0139.cvx21-bradley.dialup.earthlink.net ([209.179.192.139] helo=mindspring.com) by gull.prod.itd.earthlink.net with esmtp (Exim 3.33 #1) id 16gAxX-0002uj-00; Wed, 27 Feb 2002 12:45:55 -0800 Message-ID: <3C7D454D.3B9C3A69@mindspring.com> Date: Wed, 27 Feb 2002 12:45:01 -0800 From: Terry Lambert X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony} (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Alfred Perlstein Cc: Julian Elischer , Matthew Dillon , Jeff Roberson , arch@FreeBSD.ORG Subject: Re: Slab allocator References: <200202271926.g1RJQCm29905@apollo.backplane.com> <20020227194256.GR80761@elvis.mu.org> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Alfred 
Perlstein wrote: > * Julian Elischer [020227 11:40] wrote: > > On Wed, 27 Feb 2002, Matthew Dillon wrote: > > > Well, one thing I've noticed right off the bat is that the code > > > is trying to take advantage of per-cpu queues but is still > > > having to obtain a per-cpu mutex to lock the per-cpu queue. > > > > I was wondering about that myself :-) > > It's basically the pre-emption stuff you guys are wondering about > along with the possibility of free'ing back to another cpu's > cache that may be an issue. > > Jeff, are you free'ing memory back to the cache it was initially > allocated from or not? See my other posting. The way to deal with this is to have a per CPU "work to do" queue, which is only locked when it is written to by another CPU, and only locked by the local CPU when it is non-empty, in order to empty it, where the empty-check can be done without locking. -- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 27 12:52:18 2002 Delivered-To: freebsd-arch@freebsd.org Received: from hawk.mail.pas.earthlink.net (hawk.mail.pas.earthlink.net [207.217.120.22]) by hub.freebsd.org (Postfix) with ESMTP id C2CB137B400 for ; Wed, 27 Feb 2002 12:52:13 -0800 (PST) Received: from pool0139.cvx21-bradley.dialup.earthlink.net ([209.179.192.139] helo=mindspring.com) by hawk.mail.pas.earthlink.net with esmtp (Exim 3.33 #1) id 16gB3W-0004rR-00; Wed, 27 Feb 2002 12:52:07 -0800 Message-ID: <3C7D46BF.CE88CDEA@mindspring.com> Date: Wed, 27 Feb 2002 12:51:11 -0800 From: Terry Lambert X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony} (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Matthew Dillon Cc: Alfred Perlstein , Julian Elischer , Jeff Roberson , arch@FreeBSD.ORG Subject: Re: Slab allocator References: <200202271926.g1RJQCm29905@apollo.backplane.com> <20020227194256.GR80761@elvis.mu.org> <200202271955.g1RJtAj30178@apollo.backplane.com> Content-Type: text/plain; charset=us-ascii 
Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Matthew Dillon wrote: > :It's basically the pre-emption stuff you guys are wondering about > :along with the possibility of free'ing back to another cpu's > :cache that may be an issue. > : > :Jeff, are you free'ing memory back to the cache it was initially > :allocated from or not? > > I don't know what Jeff is doing there but I do seem to recall a > paper from somewhere that indicated it was more efficient to free memory > to the current cpu's per-cpu cache rather than back to the original > cpu's cache because the current cpu's hardware L1/L2 cache likely already > has mastership of the memory. I think Linux does things this way. I think you are thinking of: Experience With an Efficient Parallel Kernel Memory Allocator Paul E. McKenney Jack Slingwine Phil Krueger Sequent Computer Systems, Inc. The paper is available at: http://citeseer.nj.nec.com/484408.html They used a "second chance" three layer coalescing strategy, where a third level coalesced freed objects back into pages for return to the host OS. 
-- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 27 12:58:38 2002 Delivered-To: freebsd-arch@freebsd.org Received: from hawk.mail.pas.earthlink.net (hawk.mail.pas.earthlink.net [207.217.120.22]) by hub.freebsd.org (Postfix) with ESMTP id 53B8537B41A for ; Wed, 27 Feb 2002 12:58:14 -0800 (PST) Received: from pool0139.cvx21-bradley.dialup.earthlink.net ([209.179.192.139] helo=mindspring.com) by hawk.mail.pas.earthlink.net with esmtp (Exim 3.33 #1) id 16gB9O-00000a-00; Wed, 27 Feb 2002 12:58:10 -0800 Message-ID: <3C7D482B.984F6FE7@mindspring.com> Date: Wed, 27 Feb 2002 12:57:15 -0800 From: Terry Lambert X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony} (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Bosko Milekic Cc: Julian Elischer , Jeff Roberson , arch@FreeBSD.ORG Subject: Re: Slab allocator References: <20020227143330.A34054@unixdaemons.com> <20020227150519.A42681@unixdaemons.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Bosko Milekic wrote: > On Wed, Feb 27, 2002 at 11:41:50AM -0800, Julian Elischer wrote: > > The idea of Per CPU caches is that only that CPU is accessing it so > > therefore you shouldn't need a lock at all. unless you are protecting > > against interrupts on your own processor > > and pre-emption. There are also ways to implement atomic > > operations on queues that require no locks at all. > > (e.g. using the test and swap instruction) > > Yes, that's exactly the point. You have to protect against pre-emption > and interrupts. This is ridiculous: 1) Kernel preemption should *never* result in CPU migration. 
2) Allocations at interrupt level are an extreme special case, and should be handled as a special case, so as not to damage the performance of the common case. The degenerate scenario in interrupt allocation can always be handled by simply pre-allocating resources on a per driver instance basis, and then renewing the allocations in the upper level code. This results in a per driver instance used-but-free pool retention, but given the other wastage, it's a comparatively small penalty to pay. UnixWare used this method to allocate driver buffers, at one point in time (SVR4 ES/MP). Given the shared code, I'm pretty sure Solaris did, as well. Personally I think interrupt allocations can be handled at a much lower cost, anyway, without resorting to such tricks. Bruce Evans had a good suggestion in this regard some time back; it's probably time to dust it off. -- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 27 13: 3:28 2002 Delivered-To: freebsd-arch@freebsd.org Received: from hawk.mail.pas.earthlink.net (hawk.mail.pas.earthlink.net [207.217.120.22]) by hub.freebsd.org (Postfix) with ESMTP id 5002537B41C for ; Wed, 27 Feb 2002 13:03:19 -0800 (PST) Received: from pool0139.cvx21-bradley.dialup.earthlink.net ([209.179.192.139] helo=mindspring.com) by hawk.mail.pas.earthlink.net with esmtp (Exim 3.33 #1) id 16gBEG-00013l-00; Wed, 27 Feb 2002 13:03:13 -0800 Message-ID: <3C7D4958.D1B8CD3D@mindspring.com> Date: Wed, 27 Feb 2002 13:02:16 -0800 From: Terry Lambert X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony} (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Bosko Milekic Cc: Matthew Dillon , Alfred Perlstein , Julian Elischer , Jeff Roberson , arch@FreeBSD.ORG Subject: Re: Slab allocator References: <200202271926.g1RJQCm29905@apollo.backplane.com> <20020227194256.GR80761@elvis.mu.org> <200202271955.g1RJtAj30178@apollo.backplane.com> <20020227151722.B42681@unixdaemons.com> 
Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Bosko Milekic wrote: > On Wed, Feb 27, 2002 at 11:55:10AM -0800, Matthew Dillon wrote: > > I don't know what Jeff is doing there but I do seem to recall a > > paper from somewhere that indicated it was more efficient to free memory > > to the current cpu's per-cpu cache rather than back to the original > > cpu's cache because the current cpu's hardware L1/L2 cache likely already > > has mastership of the memory. I think Linux does things this way. > > I seem to recall that in general, if you have a writer <--> reader > relationship in your code, that it is better to free back to the > originating CPU's cache. That is, if you are the thread doing the > freeing and you don't write to the object that you're freeing at all > (this is often the case), it is better to free to the originating CPU's > cache so as to prevent invalidation. This is true, but since it's an exceptional case, you can use a separate queue with a lock, rather than interposing a lock into the global space. This also limits the contention window considerably, and eliminates the locking in the common case. As far as invalidation is concerned, you are already screwed on cache lines when you passed it off to the other CPU. The migration events need to be exceptional, rather than common. That they are common now is a bug in the scheduler. It is very unlikely, unless you rewrite all of the code, that you are going to avoid an mbuf allocated at interrupt on one CPU having the inp->ip_vhl modified on another (for example), if the protocol processing moves. 
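To make the "separate queue with a lock" idea concrete, here is a rough userspace sketch in C. It is only an illustration under stated assumptions, not code from Jeff's allocator: every name in it (struct cpu_cache, cache_free_remote(), and so on) is hypothetical, a C11 atomic_flag spin lock stands in for a kernel mutex, and the owner-only paths assume the caller cannot be preempted or migrated while it runs them.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* All names here are hypothetical; this is a userspace model only. */
struct item {
	struct item *next;
};

struct cpu_cache {
	struct item *local_free;   /* touched only by the owning CPU, no lock */
	atomic_flag remote_lock;   /* spin lock guarding remote_free */
	struct item *remote_free;  /* objects handed back by other CPUs */
	atomic_int remote_pending; /* nonzero while remote_free is non-empty */
};

void cache_init(struct cpu_cache *c)
{
	c->local_free = NULL;
	c->remote_free = NULL;
	atomic_flag_clear(&c->remote_lock);
	atomic_init(&c->remote_pending, 0);
}

static void cache_remote_lock(struct cpu_cache *c)
{
	while (atomic_flag_test_and_set(&c->remote_lock))
		;	/* spin; a kernel would use a real mutex here */
}

static void cache_remote_unlock(struct cpu_cache *c)
{
	atomic_flag_clear(&c->remote_lock);
}

/* Free on the owning CPU: the common case, no locking at all. */
void cache_free_local(struct cpu_cache *c, struct item *it)
{
	it->next = c->local_free;
	c->local_free = it;
}

/* Free from some other CPU: the exceptional case, always locked. */
void cache_free_remote(struct cpu_cache *c, struct item *it)
{
	cache_remote_lock(c);
	it->next = c->remote_free;
	c->remote_free = it;
	atomic_store(&c->remote_pending, 1);
	cache_remote_unlock(c);
}

/* Allocate on the owning CPU. */
struct item *cache_alloc(struct cpu_cache *c)
{
	struct item *it, *batch, *next;

	/* The empty-check is done without taking the lock. */
	if (atomic_load(&c->remote_pending)) {
		cache_remote_lock(c);
		batch = c->remote_free;
		c->remote_free = NULL;
		atomic_store(&c->remote_pending, 0);
		cache_remote_unlock(c);
		/* Move the drained objects onto the private list. */
		for (; batch != NULL; batch = next) {
			next = batch->next;
			batch->next = c->local_free;
			c->local_free = batch;
		}
	}
	it = c->local_free;
	if (it != NULL)
		c->local_free = it->next;
	return it;
}
```

In this sketch the owner pays for the lock only when another CPU has actually handed something back; the usual allocate and free paths touch nothing but the private list.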
-- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 27 13: 8: 6 2002 Delivered-To: freebsd-arch@freebsd.org Received: from hawk.mail.pas.earthlink.net (hawk.mail.pas.earthlink.net [207.217.120.22]) by hub.freebsd.org (Postfix) with ESMTP id 674AD37B405 for ; Wed, 27 Feb 2002 13:08:03 -0800 (PST) Received: from pool0139.cvx21-bradley.dialup.earthlink.net ([209.179.192.139] helo=mindspring.com) by hawk.mail.pas.earthlink.net with esmtp (Exim 3.33 #1) id 16gBIu-0001aw-00; Wed, 27 Feb 2002 13:08:00 -0800 Message-ID: <3C7D4A77.2767A0F7@mindspring.com> Date: Wed, 27 Feb 2002 13:07:03 -0800 From: Terry Lambert X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony} (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Alfred Perlstein Cc: Matthew Dillon , Julian Elischer , Jeff Roberson , arch@FreeBSD.ORG Subject: Re: Slab allocator References: <200202271926.g1RJQCm29905@apollo.backplane.com> <20020227194256.GR80761@elvis.mu.org> <200202271955.g1RJtAj30178@apollo.backplane.com> <20020227201903.GS80761@elvis.mu.org> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Alfred Perlstein wrote: > Last I checked linux does not. > > the stuff we both are talking about (free'ing back to original > cache) is detailed in the 'Horde' memory allocator which is hard > to find online. 
http://www.cs.utexas.edu/users/emery/hoard/ -- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 27 13:40:15 2002 Delivered-To: freebsd-arch@freebsd.org Received: from rwcrmhc53.attbi.com (rwcrmhc53.attbi.com [204.127.198.39]) by hub.freebsd.org (Postfix) with ESMTP id 7B3CB37B402 for ; Wed, 27 Feb 2002 13:40:10 -0800 (PST) Received: from InterJet.elischer.org ([12.232.206.8]) by rwcrmhc53.attbi.com (InterMail vM.4.01.03.27 201-229-121-127-20010626) with ESMTP id <20020227214010.UXUB2951.rwcrmhc53.attbi.com@InterJet.elischer.org>; Wed, 27 Feb 2002 21:40:10 +0000 Received: from localhost (localhost.elischer.org [127.0.0.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id NAA01982; Wed, 27 Feb 2002 13:37:16 -0800 (PST) Date: Wed, 27 Feb 2002 13:37:16 -0800 (PST) From: Julian Elischer To: Terry Lambert Cc: Bosko Milekic , Jeff Roberson , arch@FreeBSD.ORG Subject: Re: Slab allocator In-Reply-To: <3C7D482B.984F6FE7@mindspring.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Wed, 27 Feb 2002, Terry Lambert wrote: > This is ridiculous: > > 1) Kernel preemption should *never* result in CPU > migration. why not? "should not" or "can't"? "shouldn't" can be argued either way, and "can't" is easily proven false. > > 2) Allocations at interrupt level are an extreme > special case, and should be handled as a special > case, so as not to damage the performance of the > common case. that's a whole different (design) issue. 
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 27 14:17:52 2002 Delivered-To: freebsd-arch@freebsd.org Received: from angelica.unixdaemons.com (angelica.unixdaemons.com [209.148.64.135]) by hub.freebsd.org (Postfix) with ESMTP id 2536C37B440 for ; Wed, 27 Feb 2002 14:17:40 -0800 (PST) Received: from angelica.unixdaemons.com (bmilekic@localhost.unixdaemons.com [127.0.0.1]) by angelica.unixdaemons.com (8.12.2/8.12.1) with ESMTP id g1RMHQh4057195; Wed, 27 Feb 2002 17:17:26 -0500 (EST) Received: (from bmilekic@localhost) by angelica.unixdaemons.com (8.12.2/8.12.1/Submit) id g1RMHQCZ057194; Wed, 27 Feb 2002 17:17:26 -0500 (EST) (envelope-from bmilekic) Date: Wed, 27 Feb 2002 17:17:26 -0500 From: Bosko Milekic To: Terry Lambert Cc: Jeff Roberson , arch@FreeBSD.ORG Subject: Re: Slab allocator Message-ID: <20020227171726.A46831@unixdaemons.com> References: <20020227005915.C17591-100000@mail.chesapeake.net> <3C7D1E31.B13915E7@mindspring.com> <20020227143330.A34054@unixdaemons.com> <3C7D4401.9E57D6AD@mindspring.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <3C7D4401.9E57D6AD@mindspring.com>; from tlambert2@mindspring.com on Wed, Feb 27, 2002 at 12:39:29PM -0800 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Wed, Feb 27, 2002 at 12:39:29PM -0800, Terry Lambert wrote: > > > What is the per-CPU lock required for? I think it can be > > > gotten rid of, or at least taken out of the critical path, > > > with more information. > > > > Per-CPU caches. Reduces lock contention and trashes caches less often. > > I think you are misunderstanding. 
If the caches are per-CPU, > then by definition, they will only ever be accessed by a single > CPU, and so contention can be eliminated by ordered atomicity > of operations, unlike where there is inter-CPU contention. > > Per CPU resources are really not something you would expect > to be contended between CPUs, and within the context of a > single CPU, contention is controllable. With all due respect, I think that *you* are mis-understanding. I don't know what book says that only, without exception, CPU A will touch its own cache, without any CPU ever touching it as well. We need those locks for now, if not to deal with preemption then to also enable us to drain all the caches when the time comes to do so. Guys, please do Jeff the favor of reading his code before you go to definition and tell him what to do. > > > > I would eventually like to pull other allocators into uma (The slab > > > > allocator). We could get rid of some of the kernel submaps and provide a > > > > much more dynamic amount of various resources. Something I had in mind > > > > were pbufs and mbufs, which could easily come from uma. This gives us the > > > > ability to redistribute memory to wherever it is needed, and not lock it > > > > in a particular place once it's there. > > > > > > How do you handle interrupt-time allocation of mbufs, in > > > this case? The zalloci() handles this by pre-creation of > > > the PTE's for the page mapping in the KVA, and then only > > > has to deal with grabbing free physical pages to back them, > > > which is a non-blocking operation that can occur at interrupt, > > > and which, if it fails, is not fatal (i.e. it's handled; I've > > > considered doing the same for the page mapping and PTE's, but > > > that would make the time-to-run far less deterministic). > > > > Terry, how long will you keep thinking that mbufs come through the > > zone allocator? :-) For G*d's sake man, we've been over this before! 
> > Then take that part out, and answer the question about > interrupt time allocations. Whether I'm still substituting > mbufs in there when I shouldn't be or not is irrelevant to > the question. Well, if you look at the code, you'll see that it supports setting up, in the constructor, the routine that will take care of allocating slabs, should the allocator require more. That means that uma allows you to have your slabs allocated from whatever map you want, including one such as, say, mb_map, that has the reserved KVA already. > > > > The other problem is with the per cpu buckets. They are a > > > > fixed size right now. I need to define several zones for > > > > the buckets to come from and a way to manage growing/shrinking > > > > the buckets. > > > > > > I built a "chain" allocator that dealt with this issue, and > > > also the object granularity issue. Basically, it calculated > > > the LCM of the object size rounded to a MAX(sizeof(long),8) > > > boundary for processor alignment sensitivity reasons, and > > > the page size (also for processor sensitivity reasons), and > > > then allocated a contiguous region from which it obtained > > > objects of that type. All in all, it meant zero unnecessary > > > space wastage (for 1,000,000 TCP connections, the savings > > > were 1/4 of a Gigabyte for one zone alone). > > > > That's great, until you run out of pre-allocated contiguous space. > > At which point you've reached the load bearing capacity of the > system, and will have to stop, no matter what. It's not like > you can swap mbufs. Well, quite honestly, I could go on and argue that what you're saying makes no sense. But I am hesitant to waste my time discussing hypotheticals when I know that no matter what I say you'll be able to nail me to the floor with a counter because I don't have access to the source code of the allocator you're talking about and, supposedly, you do. 
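For reference, the sizing rule described in the quoted "chain" allocator text (round the object size up to a MAX(sizeof(long),8) boundary, then take the LCM of that and the page size) can be sketched in a few lines of C. This is only an illustration of the arithmetic, not the allocator being discussed, whose source is not available here; the function names and the 4096-byte page size are assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE_ASSUMED 4096u	/* assumption: 4K pages */

static size_t gcd_sz(size_t a, size_t b)
{
	while (b != 0) {
		size_t t = a % b;
		a = b;
		b = t;
	}
	return a;
}

/* Object size rounded up to a MAX(sizeof(long), 8) boundary. */
size_t chain_object_size(size_t objsize)
{
	size_t align = sizeof(long) > 8 ? sizeof(long) : 8;

	assert(objsize > 0);
	return (objsize + align - 1) / align * align;
}

/* LCM of the rounded object size and the page size. */
size_t chain_region_size(size_t objsize)
{
	size_t sz = chain_object_size(objsize);

	return sz / gcd_sz(sz, PAGE_SIZE_ASSUMED) * PAGE_SIZE_ASSUMED;
}

/* Objects that fit in one region; by construction nothing is left over. */
size_t chain_objects_per_region(size_t objsize)
{
	return chain_region_size(objsize) / chain_object_size(objsize);
}
```

With these assumptions a 190-byte object gets a 192-byte slot and a 12288-byte (three page) region holding exactly 64 objects, so no bytes are stranded at the end of the region.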
> -- Terry -- Bosko Milekic bmilekic@unixdaemons.com bmilekic@FreeBSD.org To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 27 14:20: 6 2002 Delivered-To: freebsd-arch@freebsd.org Received: from angelica.unixdaemons.com (angelica.unixdaemons.com [209.148.64.135]) by hub.freebsd.org (Postfix) with ESMTP id 890E237B400 for ; Wed, 27 Feb 2002 14:19:57 -0800 (PST) Received: from angelica.unixdaemons.com (bmilekic@localhost.unixdaemons.com [127.0.0.1]) by angelica.unixdaemons.com (8.12.2/8.12.1) with ESMTP id g1RMJkh4057493; Wed, 27 Feb 2002 17:19:46 -0500 (EST) Received: (from bmilekic@localhost) by angelica.unixdaemons.com (8.12.2/8.12.1/Submit) id g1RMJjEi057488; Wed, 27 Feb 2002 17:19:45 -0500 (EST) (envelope-from bmilekic) Date: Wed, 27 Feb 2002 17:19:45 -0500 From: Bosko Milekic To: Terry Lambert Cc: Matthew Dillon , Alfred Perlstein , Julian Elischer , Jeff Roberson , arch@FreeBSD.ORG Subject: Re: Slab allocator Message-ID: <20020227171945.B46831@unixdaemons.com> References: <200202271926.g1RJQCm29905@apollo.backplane.com> <20020227194256.GR80761@elvis.mu.org> <200202271955.g1RJtAj30178@apollo.backplane.com> <20020227151722.B42681@unixdaemons.com> <3C7D4958.D1B8CD3D@mindspring.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <3C7D4958.D1B8CD3D@mindspring.com>; from tlambert2@mindspring.com on Wed, Feb 27, 2002 at 01:02:16PM -0800 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Wed, Feb 27, 2002 at 01:02:16PM -0800, Terry Lambert wrote: > Bosko Milekic wrote: > > On Wed, Feb 27, 2002 at 11:55:10AM -0800, Matthew Dillon wrote: > > > I don't know what Jeff is doing there but I do seem to recall a > > > paper from somewhere that indicated it was more efficient to 
free memory > > > to the current cpu's per-cpu cache rather than back to the original > > > cpu's cache because the current cpu's hardware L1/L2 cache likely already > > > has mastership of the memory. I think Linux does things this way. > > > > I seem to recall that in general, if you have a writer <--> reader > > relationship in your code, that it is better to free back to the > > originating CPU's cache. That is, if you are the thread doing the > > freeing and you don't write to the object that you're freeing at all > > (this is often the case), it is better to free to the originating CPU's > > cache so as to prevent invalidation. > > This is true, but since it's an exceptional case, you can > use a separate queue with a lock, rather than interposing > a lock into the global space. This also limits the contention > window considerably, and eliminates the locking in the common > case. > > As far as invalidation is concerned, you are already screwed > on cache lines when you passed it off to the other CPU. The > migration events need to be exceptional, rather than common. > That they are common now is a bug in the scheduler. It is > very unlikely, unless you rewrite all of the code, that you > are going to avoid an mbuf allocated at interrupt on one CPU > having the inp->ip_vhl modified on another (for example), if > the protocol processing moves. OK, since you obviously know what you're talking about, how about you sit down and produce some patches for Jeff? I think he would appreciate it very much, instead of the generalizations and "you should not do this, but do X" abstractions. 
> -- Terry -- Bosko Milekic bmilekic@unixdaemons.com bmilekic@FreeBSD.org To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 27 14:39:23 2002 Delivered-To: freebsd-arch@freebsd.org Received: from green.bikeshed.org (freefall.FreeBSD.org [216.136.204.21]) by hub.freebsd.org (Postfix) with ESMTP id 924DA37B41B; Wed, 27 Feb 2002 14:39:08 -0800 (PST) Received: from localhost (green@localhost) by green.bikeshed.org (8.11.6/8.11.6) with ESMTP id g1RMd8i46060; Wed, 27 Feb 2002 17:39:08 -0500 (EST) (envelope-from green@green.bikeshed.org) Message-Id: <200202272239.g1RMd8i46060@green.bikeshed.org> X-Mailer: exmh version 2.5 07/13/2001 with nmh-1.0.4 To: bde@FreeBSD.org Cc: arch@FreeBSD.org Subject: Do we want the _SYS_SYSPROTO_H_ junk? From: Brian Fundakowski Feldman Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Wed, 27 Feb 2002 17:39:07 -0500 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Since obviously, by nature, all the code for syscall declarations inside #ifdef _SYS_SYSPROTO_H_ is bogus, is it truly useful to use it on new system calls, or should we not? I think it's worth having an entry in style(9) for system calls, and want to know what should be there regarding this. It seems the struct foo_args /* structure members stuff */ *uap; stuff is at least also consistent with what is similarly done with vnode operation declarations. What do you think? -- Brian Fundakowski Feldman \'[ FreeBSD ]''''''''''\ <> green@FreeBSD.org <> bfeldman@tislabs.com \ The Power to Serve! \ Opinions expressed are my own. 
\,,,,,,,,,,,,,,,,,,,,,,\ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 27 14:41:46 2002 Delivered-To: freebsd-arch@freebsd.org Received: from falcon.prod.itd.earthlink.net (falcon.mail.pas.earthlink.net [207.217.120.74]) by hub.freebsd.org (Postfix) with ESMTP id 0D4E437B49F for ; Wed, 27 Feb 2002 14:41:29 -0800 (PST) Received: from pool0052.cvx40-bradley.dialup.earthlink.net ([216.244.42.52] helo=mindspring.com) by falcon.prod.itd.earthlink.net with esmtp (Exim 3.33 #1) id 16gClF-0007IU-00; Wed, 27 Feb 2002 14:41:22 -0800 Message-ID: <3C7D604E.45D0D959@mindspring.com> Date: Wed, 27 Feb 2002 14:40:14 -0800 From: Terry Lambert X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony} (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Julian Elischer Cc: Bosko Milekic , Jeff Roberson , arch@FreeBSD.ORG Subject: Re: Slab allocator References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Julian Elischer wrote: > On Wed, 27 Feb 2002, Terry Lambert wrote: > > This is ridiculous: > > > > 1) Kernel preemption should *never* result in CPU > > migration. > > why not? > > "should not" or "can't"? > "shouldn't" can be argued either way, > and "can't" is easily proven false. Should not. A kernel thread should run to completion on a single CPU. Note that this applies to kernel threads, not kernel processes. A kernel thread is a backing object -- a context -- borrowed for a short term operation. Any migration of these things will, by definition, result in cache busting. There's little good that can come from such migration. 
> > 2) Allocations at interrupt level are an extreme > > special case, and should be handled as a special > > case, so as not to damage the performance of the > > common case. > > that's a whole different (design) issue. Maybe. It is none the less true that the set of things that can happen at any time is much larger than the set of things which are permitted to happen at interrupt, and great care should be taken when increasing the size of the latter set. -- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 27 14:49:39 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mail.chesapeake.net (chesapeake.net [205.130.220.14]) by hub.freebsd.org (Postfix) with ESMTP id 509D837B405 for ; Wed, 27 Feb 2002 14:49:32 -0800 (PST) Received: from localhost (jroberson@localhost) by mail.chesapeake.net (8.11.6/8.11.6) with ESMTP id g1RMnO576629; Wed, 27 Feb 2002 17:49:24 -0500 (EST) (envelope-from jroberson@chesapeake.net) Date: Wed, 27 Feb 2002 17:49:24 -0500 (EST) From: Jeff Roberson To: Matthew Dillon Cc: arch@FreeBSD.ORG Subject: Re: Slab allocator In-Reply-To: <200202271926.g1RJQCm29905@apollo.backplane.com> Message-ID: <20020227172755.W59764-100000@mail.chesapeake.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG See comments below: On Wed, 27 Feb 2002, Matthew Dillon wrote: > > Well, one thing I've noticed right off the bat is that the code > is trying to take advantage of per-cpu queues but is still > having to obtain a per-cpu mutex to lock the per-cpu queue. 
> > Another thing I noticed is that the code appears to assume > that PCPU_GET(cpuid) is stable in certain places, and I don't > think that condition necessarily holds unless you explicitly > enter a critical section (critical_enter() and critical_exit()). > There are some cases where you obtain the per-cpu cache and lock > it, which would be safe even if the cpu changed out from under > you, and other cases such as in uma_zalloc_internal() where you > assume that the cpuid is stable when it isn't. Ok, I did make a PCPU_GET mistake. If uma_zalloc_internal is called from the fast path it needs to hand down a cache. The point of the locks is so that you don't have to have a critical section around the entire allocator. They really should be fast because they should only be cached in one cpu's cache. This also makes it easier to drain. I think that the preemption and migration case is going to be somewhat rare so it's ok to block another cpu for a little while if it happens. As long as I pass around a cpu # it shouldn't matter if I get preempted. > > I also noticed that cache_drain() appears to be the only > place where you iterate through the per-cpu mutexes. All > the other places appear to use the current-cpu's mutex. > > I would recommend the following: > > * That you review your code with special attention to > the lack of stability of PCPU_GET(cpuid) when you > are not in a critical section. Yes, thanks, I just noticed that bug. > > * That you consider getting rid of the per-cpu locks > and instead use critical_enter() and critical_exit() > to obtain a stable cpuid in order to allocate or > free from the current cpu's cache without having to > obtain any mutexes whatsoever. I'm not sure how I could properly and efficiently handle draining the per cpu queues in this scheme.
One alternative to the current strategy is to wrap the fast path in a critical section and only if you are out of per cpu buckets or your buckets are empty would you drop the critical section and enter the zone mutex. Then you could go about allocating buckets and re-enter the critical section to see if you still need it. If cpu migration happens very often you could potentially get several cpus following the 'fill the bucket' path. This would cause you to ask for much more memory than you really need. > > Theoretically this would allow most calls to allocate > and free small amounts of memory to run as fast as > a simple procedure call would run (akin to what > the kernel malloc() in -stable is able to accomplish). > > * That you consider an alternative method for draining > the per-cpu caches. For example, by having the > per-cpu code use a global, shared SX lock along > with the critical section to access their per-cpu > caches and then have the cache_drain code obtain > an exclusive SX lock in order to have full access > to all of the per-cpu caches. > A global SX lock would cause cache thrashing across all cpus. As it is I'd like some way to cache line size align specific things for maximum smp performance. I've been toying with the idea of adding a new MD api to get cache parameters. This would be necessary if I want to keep the UMA_ALIGN_CACHE flag. > * Documentation. i.e. comment the code more, especially > areas where you have to special-case things like for > example when you unlock a cpu cache in order to > call uma_zfree_internal(). Yes, I agree. 
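The alternative mentioned above (fast path inside a critical section, drop it and take the zone mutex only when the per cpu bucket is empty, then re-enter and check whether the new bucket is still needed) might look roughly like the following. This is a single-threaded userspace model written for illustration, not the uma code: model_critical_enter()/model_critical_exit() and current_cpu are stand-ins for critical_enter()/critical_exit() and PCPU_GET(cpuid), the zone mutex is a plain flag, and all structure and function names are hypothetical. A real allocator would also go to the VM for more memory where this model simply returns NULL.

```c
#include <assert.h>
#include <stddef.h>

#define NCPU       2
#define BUCKET_MAX 4

/* Stand-ins for critical_enter()/critical_exit() and PCPU_GET(cpuid). */
static int critical_depth;
static int current_cpu;

static void model_critical_enter(void) { critical_depth++; }
static void model_critical_exit(void)
{
	assert(critical_depth > 0);
	critical_depth--;
}

struct bucket {
	struct bucket *next;
	int count;
	void *items[BUCKET_MAX];
};

struct zone {
	struct bucket *cpu_bucket[NCPU]; /* used only in a critical section */
	struct bucket *full;             /* protected by the zone lock */
	struct bucket *empty;            /* protected by the zone lock */
	int locked;                      /* stand-in for the zone mutex */
};

static void zone_lock(struct zone *z)
{
	assert(critical_depth == 0);     /* never block in a critical section */
	assert(!z->locked);
	z->locked = 1;
}

static void zone_unlock(struct zone *z)
{
	assert(z->locked);
	z->locked = 0;
}

/* Hand a bucket back to the zone's full or empty list. */
static void zone_put_bucket(struct zone *z, struct bucket *b)
{
	struct bucket **list;

	zone_lock(z);
	list = (b->count > 0) ? &z->full : &z->empty;
	b->next = *list;
	*list = b;
	zone_unlock(z);
}

void *zone_alloc(struct zone *z)
{
	struct bucket *b, *nb;
	void *item;

	for (;;) {
		/* Fast path: the cpu id is stable only in here. */
		model_critical_enter();
		b = z->cpu_bucket[current_cpu];
		if (b != NULL && b->count > 0) {
			item = b->items[--b->count];
			model_critical_exit();
			return item;
		}
		model_critical_exit();

		/* Slow path: fetch a full bucket under the zone lock. */
		zone_lock(z);
		nb = z->full;
		if (nb != NULL)
			z->full = nb->next;
		zone_unlock(z);
		if (nb == NULL)
			return NULL;	/* a real zone would ask the VM here */

		/* Re-enter; we may be on another cpu or already refilled. */
		model_critical_enter();
		b = z->cpu_bucket[current_cpu];
		if (b == NULL || b->count == 0) {
			z->cpu_bucket[current_cpu] = nb;
			nb = b;		/* displaced empty bucket, maybe NULL */
		}
		model_critical_exit();
		if (nb != NULL)
			zone_put_bucket(z, nb);	/* surplus goes back */
	}
}
```

The re-check after re-entering the critical section is what addresses the "several cpus following the fill path" concern in this sketch: a bucket that turns out not to be needed is returned to the zone instead of being kept.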
>
> -Matt
> Matthew Dillon
>
>

Jeff

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message

From owner-freebsd-arch Wed Feb 27 14:53:45 2002 Delivered-To: freebsd-arch@freebsd.org Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by hub.freebsd.org (Postfix) with ESMTP id 37CA337B402; Wed, 27 Feb 2002 14:53:42 -0800 (PST) Received: by elvis.mu.org (Postfix, from userid 1192) id 13F2EAE27E; Wed, 27 Feb 2002 14:53:42 -0800 (PST) Date: Wed, 27 Feb 2002 14:53:41 -0800 From: Alfred Perlstein To: Brian Fundakowski Feldman Cc: bde@FreeBSD.org, arch@FreeBSD.org Subject: Re: Do we want the _SYS_SYSPROTO_H_ junk? Message-ID: <20020227225341.GX80761@elvis.mu.org> References: <200202272239.g1RMd8i46060@green.bikeshed.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <200202272239.g1RMd8i46060@green.bikeshed.org> User-Agent: Mutt/1.3.27i Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG

* Brian Fundakowski Feldman [020227 14:39] wrote:
> Since obviously, by nature, all the code for syscall declarations inside
> #ifdef _SYS_SYSPROTO_H_ is bogus, is it truly useful to use it on new system
> calls, or should we not? I think it's worth having an entry in style(9) for
> system calls, and want to know what should be there regarding this. It
> seems the struct foo_args /* structure members stuff */ *uap; stuff is at
> least also consistent with what is similarly done with vnode operation
> declarations.
>
> What do you think?

I think there's more important stuff to worry about than this.

I also find that SYSPROTO helps when making syscall modules but
I'm not sure what you're getting at so I'd appreciate it if you
held off whatever your plans are for a day.
-- -Alfred Perlstein [alfred@freebsd.org] 'Instead of asking why a piece of software is using "1970s technology," start asking why software is ignoring 30 years of accumulated wisdom.' Tax deductible donations for FreeBSD: http://www.freebsdfoundation.org/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 27 15: 4:51 2002 Delivered-To: freebsd-arch@freebsd.org Received: from avocet.prod.itd.earthlink.net (avocet.mail.pas.earthlink.net [207.217.120.50]) by hub.freebsd.org (Postfix) with ESMTP id C724A37B437 for ; Wed, 27 Feb 2002 15:03:49 -0800 (PST) Received: from pool0052.cvx40-bradley.dialup.earthlink.net ([216.244.42.52] helo=mindspring.com) by avocet.prod.itd.earthlink.net with esmtp (Exim 3.33 #1) id 16gD6q-0000pp-00; Wed, 27 Feb 2002 15:03:40 -0800 Message-ID: <3C7D6586.558B8EE1@mindspring.com> Date: Wed, 27 Feb 2002 15:02:30 -0800 From: Terry Lambert X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony} (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Bosko Milekic Cc: Matthew Dillon , Alfred Perlstein , Julian Elischer , Jeff Roberson , arch@FreeBSD.ORG Subject: Re: Slab allocator References: <200202271926.g1RJQCm29905@apollo.backplane.com> <20020227194256.GR80761@elvis.mu.org> <200202271955.g1RJtAj30178@apollo.backplane.com> <20020227151722.B42681@unixdaemons.com> <3C7D4958.D1B8CD3D@mindspring.com> <20020227171945.B46831@unixdaemons.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Bosko Milekic wrote: > OK, since you obviously know what you're talking about, how about you > sit down and produce some patches for Jeff? I think he would appreciate > it very much, instead of the generalizations and "you should not do > this, but do X" abstractions. 
Before you get up in arms... in case this wasn't clear... I think that his code is importable without any patches. I was pointing out how the complaints people have can be addressed without altering the theme or the majority of the code itself. I was not suggesting that the changes must be made before import. The comments on not renaming the files, and the prefix on the name are salient, as are the statistics comments (keep appropriate statistics, rather than trying to emulate previously appropriate statistics). I would like to see him address the issues he feels need to be addressed, but since the performance is not worse with the code, all of that can be handled later, after an import. It's certain that the current allocation code can't SMP scale the way Jeff's code can and it's a step in the right direction. Now is the time to get things into -current, so that they can be stabilized (if necessary) and improved (I think several people, myself included, believe the code can be improved, but it doesn't have to be before it can go in). Kirk likes the code; what else is required before simply importing it, and making the vm_zone code optional so that the rest can be converted? 
-- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 27 15: 7:12 2002 Delivered-To: freebsd-arch@freebsd.org Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2]) by hub.freebsd.org (Postfix) with ESMTP id 1ECD237B41A for ; Wed, 27 Feb 2002 15:07:07 -0800 (PST) Received: (from dillon@localhost) by apollo.backplane.com (8.11.6/8.9.1) id g1RN75T31581; Wed, 27 Feb 2002 15:07:05 -0800 (PST) (envelope-from dillon) Date: Wed, 27 Feb 2002 15:07:05 -0800 (PST) From: Matthew Dillon Message-Id: <200202272307.g1RN75T31581@apollo.backplane.com> To: Jeff Roberson Cc: arch@FreeBSD.ORG Subject: Re: Slab allocator References: <20020227172755.W59764-100000@mail.chesapeake.net> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG :See comments below: : :On Wed, 27 Feb 2002, Matthew Dillon wrote: : :> :> Well, one thing I've noticed right off the bat is that the code :> is trying to take advantage of per-cpu queues but is still :> having to obtain a per-cpu mutex to lock the per-cpu queue. :> :> Another thing I noticed is that the code appears to assume :> that PCPU_GET(cpuid) is stable in certain places, and I don't :> think that condition necessarily holds unless you explicitly :> enter a critical section (critical_enter() and critical_exit()). :> There are some cases where you obtain the per-cpu cache and lock :> it, which would be safe even if the cpu changed out from under :> you, and other case such as in uma_zalloc_internal() where you :> assume that the cpuid is stable when it isn't. : :Ok, I did make a PCPU_GET mistake.. If uma_zalloc_internal is called from :the fast path it needs to hand down a cache. The point of the locks is :so that you don't have to have a critical section around the entire :allocator. 
They really should be fast because they should only be cached
:in one cpu's cache. This also makes it easier to drain. I think that the
:preemption and migration case is going to be somewhat rare so it's ok to
:block another cpu for a little while if it happens. As long as I pass
:around a cpu # it shouldn't matter if I get preempted.

Well, of course it is always nice when using a critical section to minimize the cycles, which is why you would only use it for the common-case code. When used properly it can save a lot of cycles. For i386 critical_enter() will soon be optimized down to an inlined non-bus-locked ++td->td_critnest and critical_exit() will wind up being essentially --td->td_critnest. That is a huge savings over a mutex which at a minimum is going to do a locked bus cycle to memory. You want to be careful not to penalize the critical path (i.e. the common case code) for the benefit of procedures which are executed comparatively rarely.

That said, critical sections do not necessarily have to block interrupts. Bruce has demonstrated that certain FAST interrupts can in fact be allowed to operate even while in a critical section. The critical_*() code I will be committing as soon as possible gets us half way there and already allows certain interrupts (such as VM related IPIs) to execute while inside a critical section.

For SMPng I am confident that at the very least we will be able to schedule ithreads even while in a critical section, as long as sched_lock is not being held or is being held in a safe zone, and we will probably be able to execute certain FAST interrupts as well. So I would not worry too much about critical sections blocking interrupts. They should be thought of as a mechanism to prevent unexpected preemption or cpu migration and, insofar as FAST interrupts do not usually call into other subsystems, to prevent unexpected alterations of the per-cpu data. They should not be thought of as a mechanism that blocks interrupts.
-Matt To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 27 15:10:40 2002 Delivered-To: freebsd-arch@freebsd.org Received: from angelica.unixdaemons.com (angelica.unixdaemons.com [209.148.64.135]) by hub.freebsd.org (Postfix) with ESMTP id 102F237B42A for ; Wed, 27 Feb 2002 15:10:27 -0800 (PST) Received: from angelica.unixdaemons.com (bmilekic@localhost.unixdaemons.com [127.0.0.1]) by angelica.unixdaemons.com (8.12.2/8.12.1) with ESMTP id g1RNABh4063580; Wed, 27 Feb 2002 18:10:11 -0500 (EST) Received: (from bmilekic@localhost) by angelica.unixdaemons.com (8.12.2/8.12.1/Submit) id g1RNABkO063579; Wed, 27 Feb 2002 18:10:11 -0500 (EST) (envelope-from bmilekic) Date: Wed, 27 Feb 2002 18:10:11 -0500 From: Bosko Milekic To: Terry Lambert Cc: Matthew Dillon , Alfred Perlstein , Julian Elischer , Jeff Roberson , arch@FreeBSD.ORG Subject: Re: Slab allocator Message-ID: <20020227181011.A62898@unixdaemons.com> References: <200202271926.g1RJQCm29905@apollo.backplane.com> <20020227194256.GR80761@elvis.mu.org> <200202271955.g1RJtAj30178@apollo.backplane.com> <20020227151722.B42681@unixdaemons.com> <3C7D4958.D1B8CD3D@mindspring.com> <20020227171945.B46831@unixdaemons.com> <3C7D6586.558B8EE1@mindspring.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <3C7D6586.558B8EE1@mindspring.com>; from tlambert2@mindspring.com on Wed, Feb 27, 2002 at 03:02:30PM -0800 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Wed, Feb 27, 2002 at 03:02:30PM -0800, Terry Lambert wrote: > Bosko Milekic wrote: > > OK, since you obviously know what you're talking about, how about you > > sit down and produce some patches for Jeff? 
I think he would appreciate > > it very much, instead of the generalizations and "you should not do > > this, but do X" abstractions. > > Before you get up in arms... in case this wasn't clear... > > I think that his code is importable without any patches. > > I was pointing out how the complaints people have can be > addressed without altering the theme or the majority of > the code itself. I was not suggesting that the changes > must be made before import. > > The comments on not renaming the files, and the prefix > on the name are salient, as are the statistics comments > (keep appropriate statistics, rather than trying to > emulate previously appropriate statistics). > > I would like to see him address the issues he feels need > to be addressed, but since the performance is not worse > with the code, all of that can be handled later, after > an import. > > It's certain that the current allocation code can't SMP > scale the way Jeff's code can and it's a step in the > right direction. > > Now is the time to get things into -current, so that > they can be stabilized (if necessary) and improved (I > think several people, myself included, believe the code > can be improved, but it doesn't have to be before it > can go in). > > Kirk likes the code; what else is required before simply > importing it, and making the vm_zone code optional so > that the rest can be converted? > > -- Terry Thank you. I believe that we now understand each other and are tuned into the same frequency. 
:-) -- Bosko Milekic bmilekic@unixdaemons.com bmilekic@FreeBSD.org To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 27 15:31:38 2002 Delivered-To: freebsd-arch@freebsd.org Received: from angelica.unixdaemons.com (angelica.unixdaemons.com [209.148.64.135]) by hub.freebsd.org (Postfix) with ESMTP id 0198737B417 for ; Wed, 27 Feb 2002 15:31:34 -0800 (PST) Received: from angelica.unixdaemons.com (bmilekic@localhost.unixdaemons.com [127.0.0.1]) by angelica.unixdaemons.com (8.12.2/8.12.1) with ESMTP id g1RNVOh4065953; Wed, 27 Feb 2002 18:31:24 -0500 (EST) Received: (from bmilekic@localhost) by angelica.unixdaemons.com (8.12.2/8.12.1/Submit) id g1RNVOdL065952; Wed, 27 Feb 2002 18:31:24 -0500 (EST) (envelope-from bmilekic) Date: Wed, 27 Feb 2002 18:31:24 -0500 From: Bosko Milekic To: Matthew Dillon Cc: Jeff Roberson , arch@FreeBSD.ORG Subject: Re: Slab allocator Message-ID: <20020227183124.B62898@unixdaemons.com> References: <20020227172755.W59764-100000@mail.chesapeake.net> <200202272307.g1RN75T31581@apollo.backplane.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <200202272307.g1RN75T31581@apollo.backplane.com>; from dillon@apollo.backplane.com on Wed, Feb 27, 2002 at 03:07:05PM -0800 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Wed, Feb 27, 2002 at 03:07:05PM -0800, Matthew Dillon wrote: > For SMPng I am confident that at the very least we will be able to > schedule ithreads even while in a critical section, as long as > sched_lock is not being held or is being held in a safe zone, > and we will probably be able to execute certain FAST interrupts > as well. So I would not worry too much about critical sections > blocking interrupts. 
They should be thought of as a mechanism
> to prevent unexpected preemption or cpu migration and, insofar as
> FAST interrupts do not usually call into other subsystems, to
> prevent unexpected alterations of the per-cpu data. They should
> not be thought of as a mechanism that blocks interrupts.

Well, we're going to be doing context-stealing soon, don't forget
about that. That means that if we're in a critical section, we won't be
able to take advantage of stealing by taking the context and running the
handler immediately. So, to schedule an ithread we will need to get some
sort of sched_lock. Sure, perhaps it won't be _the_ sched_lock, but it
will have to be some sort of lock for the run queue we will be placing
our ithread on when not performing context stealing. Personally, I don't
believe in having the fast-path allocation be a critical section.
We're headed down the pre-emption path and disabling pre-emption every
time I want to allocate something is pretty unreasonable [*].

[*] Yes, I know that you don't like pre-emption.
> -Matt -- Bosko Milekic bmilekic@unixdaemons.com bmilekic@FreeBSD.org To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 27 15:49:21 2002 Delivered-To: freebsd-arch@freebsd.org Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2]) by hub.freebsd.org (Postfix) with ESMTP id 29FD337B402 for ; Wed, 27 Feb 2002 15:49:15 -0800 (PST) Received: (from dillon@localhost) by apollo.backplane.com (8.11.6/8.9.1) id g1RNnDF31841; Wed, 27 Feb 2002 15:49:13 -0800 (PST) (envelope-from dillon) Date: Wed, 27 Feb 2002 15:49:13 -0800 (PST) From: Matthew Dillon Message-Id: <200202272349.g1RNnDF31841@apollo.backplane.com> To: Bosko Milekic Cc: Jeff Roberson , arch@FreeBSD.ORG Subject: Re: Slab allocator References: <20020227172755.W59764-100000@mail.chesapeake.net> <200202272307.g1RN75T31581@apollo.backplane.com> <20020227183124.B62898@unixdaemons.com> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG :.. :> FAST interrupts do not usually call into other subsystems, to :> prevent unexpected alterations of the per-cpu data. They should :> not be thought of as a mechanism that blocks interrupts. : : Well, we're going to be doing context-stealing soon, don't forget :about that. That means that if we're in a critical section, we won't be :able to take advantage of stealing by taking the context and running the :handler immediately. So, to schedule an ithread we will need to get some :sort of sched_lock. Sure, perhaps it won't be _the_ sched_lock, but it :will have to be some sort of lock for the run queue we will be placing :our ithread on when not performing context stealing. Personally, I don't :believe that having the fast-path allocation be a critical section. 
:We're headed in the pre-emption path and disabling pre-emption every
:time I want to allocate something is pretty unreasonable [*].
:
:[*] Yes, I know that you don't like pre-emption.
:
:--
:Bosko Milekic

If an i-scheduled interrupt is blocked by a critical section which only lasts 1 uS, there will be no detrimental effect on system performance. If our codebase is pockmarked with thousands of little 1uS-long critical sections it will have no cumulative effect whatsoever on interrupt latencies. None. This may seem counter-intuitive but it's true, and anyone who has ever written embedded software will tell you the same thing. It isn't the number of places you disable preemption that kills you, it's the LONGEST place you disable preemption that kills you. It's true on everything from a little 8 bit microcomputer to an IBM mainframe.

Or, to put it another way, every time the system takes an interrupt it disrupts code flow and pollutes the L1 and L2 caches of the cpu if the cpu is otherwise doing something useful (like running a system call). If an interrupt does not require real time response, there is no point disrupting the system in order to give it real time response. The system will actually operate more efficiently if you allow it to continue whatever it was already doing, at least for a little while.

This doesn't mean that preemption is bad, just that it is bad if you go overboard trying to do it. Being rabid about critical sections classifies as "going overboard". Critical sections can save far, far more clock cycles than non-conflicting mutexes without any effect on interrupt performance if used properly.

Also keep in mind that a critical section != sched_lock. Just using a critical section in something like the SLAB allocator does not have the same detrimental effect that holding the sched_lock (which obtains a critical section) has. Bruce has demonstrated this and it should be an obvious truism to anyone in SMP land.
We need to work on reducing the time we hold sched_lock, but that is a very different issue than using a critical section to protect a per-process cache.

-Matt
Matthew Dillon

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message

From owner-freebsd-arch Wed Feb 27 15:57:56 2002 Delivered-To: freebsd-arch@freebsd.org Received: from green.bikeshed.org (freefall.FreeBSD.org [216.136.204.21]) by hub.freebsd.org (Postfix) with ESMTP id 583E837B417; Wed, 27 Feb 2002 15:57:46 -0800 (PST) Received: from localhost (green@localhost) by green.bikeshed.org (8.11.6/8.11.6) with ESMTP id g1RNvjY46402; Wed, 27 Feb 2002 18:57:45 -0500 (EST) (envelope-from green@green.bikeshed.org) Message-Id: <200202272357.g1RNvjY46402@green.bikeshed.org> X-Mailer: exmh version 2.5 07/13/2001 with nmh-1.0.4 To: Alfred Perlstein Cc: bde@FreeBSD.org, arch@FreeBSD.org Subject: Re: Do we want the _SYS_SYSPROTO_H_ junk? In-Reply-To: Message from Alfred Perlstein of "Wed, 27 Feb 2002 14:53:41 PST." <20020227225341.GX80761@elvis.mu.org> From: "Brian F. Feldman" Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Wed, 27 Feb 2002 18:57:45 -0500 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG

Alfred Perlstein wrote:
> * Brian Fundakowski Feldman [020227 14:39] wrote:
> > Since obviously, by nature, all the code for syscall declarations inside
> > #ifdef _SYS_SYSPROTO_H_ is bogus, is it truly useful to use it on new system
> > calls, or should we not? I think it's worth having an entry in style(9) for
> > system calls, and want to know what should be there regarding this. It
> > seems the struct foo_args /* structure members stuff */ *uap; stuff is at
> > least also consistent with what is similarly done with vnode operation
> > declarations.
> >
> > What do you think?
>
> I think there's more important stuff to worry about than this.
>
> I also find that SYSPROTO helps when making syscall modules but
> I'm not sure what you're getting at so I'd appreciate it if you
> held off whatever your plans are for a day.

I want to know if, on new code, we should put them. E.g.:

#ifndef _SYS_SYSPROTO_H_
struct open_args {
        char *path;
        int flags;
        int mode;
};
#endif
int
open(td, uap)
        struct thread *td;
        register struct open_args /* {
                syscallarg(char *) path;
                syscallarg(int) flags;
                syscallarg(int) mode;
        } */ *uap;
{

The first part, if ever actually called into existence by sysproto.h not being included, would be bogus. Do we want to keep introducing those?

--
Brian Fundakowski Feldman \'[ FreeBSD ]''''''''''\
<> green@FreeBSD.org <> bfeldman@tislabs.com \ The Power to Serve! \
Opinions expressed are my own. \,,,,,,,,,,,,,,,,,,,,,,\

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message

From owner-freebsd-arch Wed Feb 27 16:20:10 2002 Delivered-To: freebsd-arch@freebsd.org Received: from rwcrmhc53.attbi.com (rwcrmhc53.attbi.com [204.127.198.39]) by hub.freebsd.org (Postfix) with ESMTP id AC0EB37B417; Wed, 27 Feb 2002 16:20:07 -0800 (PST) Received: from InterJet.elischer.org ([12.232.206.8]) by rwcrmhc53.attbi.com (InterMail vM.4.01.03.27 201-229-121-127-20010626) with ESMTP id <20020228002007.ZXSG2951.rwcrmhc53.attbi.com@InterJet.elischer.org>; Thu, 28 Feb 2002 00:20:07 +0000 Received: from localhost (localhost.elischer.org [127.0.0.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id QAA02777; Wed, 27 Feb 2002 16:01:18 -0800 (PST) Date: Wed, 27 Feb 2002 16:01:16 -0800 (PST) From: Julian Elischer To: "Brian F. Feldman" Cc: Alfred Perlstein , bde@FreeBSD.org, arch@FreeBSD.org Subject: Re: Do we want the _SYS_SYSPROTO_H_ junk?
In-Reply-To: <200202272357.g1RNvjY46402@green.bikeshed.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG well theoretically you should be using prototypes On Wed, 27 Feb 2002, Brian F. Feldman wrote: > #ifndef _SYS_SYSPROTO_H_ > struct open_args { > char *path; > int flags; > int mode; > }; > #endif > int > open(td, uap) > struct thread *td; <---- > register struct open_args /* { <---- > syscallarg(char *) path; > syscallarg(int) flags; > syscallarg(int) mode; > } */ *uap; > { > > The first part, if ever actually called into existence by sysproto.h not > being included, would be bogus. Do we want to keep introducing those? > > -- > Brian Fundakowski Feldman \'[ FreeBSD ]''''''''''\ > <> green@FreeBSD.org <> bfeldman@tislabs.com \ The Power to Serve! \ > Opinions expressed are my own. \,,,,,,,,,,,,,,,,,,,,,,\ > > > > To Unsubscribe: send mail to majordomo@FreeBSD.org > with "unsubscribe freebsd-arch" in the body of the message > To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 27 20:59:45 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mail.chesapeake.net (chesapeake.net [205.130.220.14]) by hub.freebsd.org (Postfix) with ESMTP id 349E237B400 for ; Wed, 27 Feb 2002 20:59:34 -0800 (PST) Received: from localhost (jroberson@localhost) by mail.chesapeake.net (8.11.6/8.11.6) with ESMTP id g1S4xXP99330 for ; Wed, 27 Feb 2002 23:59:33 -0500 (EST) (envelope-from jroberson@chesapeake.net) Date: Wed, 27 Feb 2002 23:59:33 -0500 (EST) From: Jeff Roberson To: arch@freebsd.org Subject: Slab allocator update Message-ID: <20020227234433.Y59764-100000@mail.chesapeake.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG 
Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG

I have updated UMA. It's available at http://www.chesapeake.net/~jroberson/uma.tar

This fixes uma on x86, which was definitely broken. Thanks go to phk for helping me debug this. This also fixes a few lock order reversals, and my bad use of cpuid in zalloc_internal. I have one more lock order reversal to fix that occurs sometimes when zones are drained.

This patch has been tested on SMP alpha and a single proc x86. If you intend to test this patch on an SMP box with greater than 2 cpus please adjust maxcpu in uma_core.c. There is a big XXX next to this that explains why.

I'd like to summarize the input that I have received so far:

1) Some folks disagree with the per-cpu lock, but that can be worked on later.
2) malloc_type statistics should reflect what is feasible with this allocator design, and some stats don't always have to be completely accurate.
3) No real consensus on name space issues yet.
4) What I have so far is workable, incorporating other objects into uma is somewhat questionable. (mbufs and pbufs anyone?)

If anyone disagrees with the statements above please let me know.

My plans going forward are:

1) Fix the last lock order reversal.
2) Fixup the statistics for uma and malloc.
3) Convince people to test it.
4) Commit.
5) Work on converting everything to uma_* interfaces, and adding
initializers.
6) nap

Does anyone see any work items that I missed? I'd like the road to commit to be well defined so it actually happens. Otherwise I think I'll be maintaining patch sets forever.
Thanks,
Jeff

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message

From owner-freebsd-arch Wed Feb 27 23:20:49 2002 Delivered-To: freebsd-arch@freebsd.org Received: from rwcrmhc52.attbi.com (rwcrmhc52.attbi.com [216.148.227.88]) by hub.freebsd.org (Postfix) with ESMTP id 2B66537B405 for ; Wed, 27 Feb 2002 23:20:26 -0800 (PST) Received: from InterJet.elischer.org ([12.232.206.8]) by rwcrmhc52.attbi.com (InterMail vM.4.01.03.27 201-229-121-127-20010626) with ESMTP id <20020228072006.RLEX1147.rwcrmhc52.attbi.com@InterJet.elischer.org>; Thu, 28 Feb 2002 07:20:06 +0000 Received: from localhost (localhost.elischer.org [127.0.0.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id XAA04287; Wed, 27 Feb 2002 23:04:31 -0800 (PST) Date: Wed, 27 Feb 2002 23:04:30 -0800 (PST) From: Julian Elischer To: Jeff Roberson Cc: arch@freebsd.org Subject: Re: Slab allocator update In-Reply-To: <20020227234433.Y59764-100000@mail.chesapeake.net> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG

On Wed, 27 Feb 2002, Jeff Roberson wrote:

>
> 1) Fix the last lock order reversal.
> 2) Fixup the statistics for uma and malloc.
> 3) Convince people to test it.
> 4) Commit.

Please try to integrate it in such a form that both new and old can be
compiled in with a config option.
(for a while)

> 5) Work on converting everything to uma_* interfaces, and adding
> initializers.

Do lots of testing to prove that it's an improvement.
if "yes", remove old system.
if "no" figure out why.. keep old system or rewrite..

> 6) nap
>
> Does anyone see any work items that I missed? I'd like the road to commit
> to be well defined so it actually happens. Otherwise I think I'll be
> maintaining patch sets forever.
> > > Thanks, > Jeff > > > To Unsubscribe: send mail to majordomo@FreeBSD.org > with "unsubscribe freebsd-arch" in the body of the message > To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Feb 28 0:12: 0 2002 Delivered-To: freebsd-arch@freebsd.org Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by hub.freebsd.org (Postfix) with ESMTP id 64EBA37B41A for ; Thu, 28 Feb 2002 00:11:58 -0800 (PST) Received: by elvis.mu.org (Postfix, from userid 1192) id 43828AE2A2; Thu, 28 Feb 2002 00:11:58 -0800 (PST) Date: Thu, 28 Feb 2002 00:11:58 -0800 From: Alfred Perlstein To: Julian Elischer Cc: Jeff Roberson , arch@freebsd.org Subject: Re: Slab allocator update Message-ID: <20020228081158.GG80761@elvis.mu.org> References: <20020227234433.Y59764-100000@mail.chesapeake.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.3.27i Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG * Julian Elischer [020227 23:20] wrote: > > > On Wed, 27 Feb 2002, Jeff Roberson wrote: > > > > > 1) Fix the last lock order reversal. > > 2) Fixup the statistics for uma and malloc. > > 3) Convince people to test it. > > 4) Commit. > > Please try integrate it in such a form that both new and old can be > compiled in with a config option. > (for a while) With the default for the _new_ one switched over two days after the initial commit so that Jeff doesn't get flooded with mail about it as he works out the first couple of panics. :) > if "yes", remove old system. > if "no" figure out why.. keep old system or rewrite.. Sounds appropriate. 
-- -Alfred Perlstein [alfred@freebsd.org] 'Instead of asking why a piece of software is using "1970s technology," start asking why software is ignoring 30 years of accumulated wisdom.' Tax deductible donations for FreeBSD: http://www.freebsdfoundation.org/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Feb 28 1:52:18 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16]) by hub.freebsd.org (Postfix) with ESMTP id 74E3F37B41A; Thu, 28 Feb 2002 01:52:07 -0800 (PST) Received: from bde.zeta.org.au (bde.zeta.org.au [203.2.228.102]) by mailman.zeta.org.au (8.9.3/8.8.7) with ESMTP id UAA15829; Thu, 28 Feb 2002 20:52:03 +1100 Date: Thu, 28 Feb 2002 20:52:35 +1100 (EST) From: Bruce Evans X-X-Sender: To: "Brian F. Feldman" Cc: Alfred Perlstein , , Subject: Re: Do we want the _SYS_SYSPROTO_H_ junk? In-Reply-To: <200202272357.g1RNvjY46402@green.bikeshed.org> Message-ID: <20020228202851.X52134-100000@gamplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Wed, 27 Feb 2002, Brian F. Feldman wrote: > I want to know if, on new code, we should put them. E.g.: > > #ifndef _SYS_SYSPROTO_H_ > struct open_args { > char *path; > int flags; > int mode; > }; > #endif > int > open(td, uap) > struct thread *td; > register struct open_args /* { > syscallarg(char *) path; > syscallarg(int) flags; > syscallarg(int) mode; > } */ *uap; > { > > The first part, if ever actually called into existence by sysproto.h not > being included, would be bogus. Do we want to keep introducing those? 
The ifdef'ed version is better, if there is to be one at all, because it can at least in theory be checked automatically (e.g., by not including so that the struct is not declared, but somehow declare the function anyway; then compile and check if the compile worked and gave the same result). The second pseudo-declaration of the struct in the comment is bogus. I removed the comment globally when I implemented , but it came back in a few files in the Lite2 merge. Both versions are really just comments. It can be hard to remember what is in *uap without them. Automatic checking for the ifdefed version would just check the consistency of the comments. Wrong comments for simple things are worse than no comments. The same few files that have syscallarg() in comments also have SCARG() in code. We don't really use either syscallarg() or SCARG(). We just require the MD code to arrange the struct so that ordinary struct member references work right. I would prefer the MD code to push the struct members onto the stack so that no args structs or pseudo-declarations of them are required. I would keep introducing the ifdefed version of the struct while there is still an args struct. Bruce To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Feb 28 5:14:15 2002 Delivered-To: freebsd-arch@freebsd.org Received: from green.bikeshed.org (freefall.FreeBSD.org [216.136.204.21]) by hub.freebsd.org (Postfix) with ESMTP id DFE9337B402; Thu, 28 Feb 2002 05:14:06 -0800 (PST) Received: from localhost (green@localhost) by green.bikeshed.org (8.11.6/8.11.6) with ESMTP id g1SDE3t50441; Thu, 28 Feb 2002 08:14:04 -0500 (EST) (envelope-from green@green.bikeshed.org) Message-Id: <200202281314.g1SDE3t50441@green.bikeshed.org> X-Mailer: exmh version 2.5 07/13/2001 with nmh-1.0.4 To: Bruce Evans Cc: "Brian F. 
Feldman" , Alfred Perlstein , bde@FreeBSD.org, arch@FreeBSD.org Subject: Re: Do we want the _SYS_SYSPROTO_H_ junk? In-Reply-To: Your message of "Thu, 28 Feb 2002 20:52:35 +1100." <20020228202851.X52134-100000@gamplex.bde.org> From: "Brian F. Feldman" Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Thu, 28 Feb 2002 08:14:03 -0500 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Bruce Evans wrote: > On Wed, 27 Feb 2002, Brian F. Feldman wrote: > > > I want to know if, on new code, we should put them. E.g.: > > > > #ifndef _SYS_SYSPROTO_H_ > > struct open_args { > > char *path; > > int flags; > > int mode; > > }; > > #endif > > int > > open(td, uap) > > struct thread *td; > > register struct open_args /* { > > syscallarg(char *) path; > > syscallarg(int) flags; > > syscallarg(int) mode; > > } */ *uap; > > { > > > > The first part, if ever actually called into existence by sysproto.h not > > being included, would be bogus. Do we want to keep introducing those? > > The ifdef'ed version is better, if there is to be one at all, because it > can at least in theory be checked automatically (e.g., by not including > so that the struct is not declared, but somehow declare > the function anyway; then compile and check if the compile worked and > gave the same result). However, it doesn't _really_ match what's in sys/sysproto.h because it doesn't have explicit padding (so could potentially assemble to a different format for that object, rather than the one expected by the syscall handler). > The second pseudo-declaration of the struct in the comment is bogus. I > removed the comment globally when I implemented , but it > came back in a few files in the Lite2 merge. So for documentation's sake, since it really is useful to generally see the real "arguments" to a syscall (or vop, or ...) 
instead of the opaque one, we should not have the comment in the declaration itself but keep the one before it? I'd definitely prefer to have one. > Both versions are really just comments. It can be hard to remember what > is in *uap without them. Automatic checking for the ifdefed version would > just check the consistency of the comments. Wrong comments for simple > things are worse than no comments. > > The same few files that have syscallarg() in comments also have SCARG() > in code. We don't really use either syscallarg() or SCARG(). We just > require the MD code to arrange the struct so that ordinary struct member > references work right. I would prefer the MD code to push the struct > members onto the stack so that no args structs or pseudo-declarations > of them are required. Wouldn't this be not-too-hard to do by declaring an inline function with some assembly which would push the argument space onto the stack before struct proc * and then call the sy_call? The only trouble seems to be C's general unwillingness to let you dynamically push an arbitrary amount of data onto the stack, unless there are endianness/object layout concerns. > I would keep introducing the ifdefed version of the struct while there > is still an args struct. Why, though, do we declare it with the explicit padding in sys/sysproto.h if that's just what the machine's compiler will pad it to in the first place? -- Brian Fundakowski Feldman \'[ FreeBSD ]''''''''''\ <> green@FreeBSD.org <> bfeldman@tislabs.com \ The Power to Serve! \ Opinions expressed are my own.
\,,,,,,,,,,,,,,,,,,,,,,\ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Feb 28 7:12:25 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mail11.speakeasy.net (mail11.speakeasy.net [216.254.0.211]) by hub.freebsd.org (Postfix) with ESMTP id 43D7B37B402 for ; Thu, 28 Feb 2002 07:11:10 -0800 (PST) Received: (qmail 28950 invoked from network); 28 Feb 2002 15:11:01 -0000 Received: from unknown (HELO server.baldwin.cx) ([65.90.117.97]) (envelope-sender ) by mail11.speakeasy.net (qmail-ldap-1.03) with DES-CBC3-SHA encrypted SMTP for ; 28 Feb 2002 15:11:01 -0000 Received: from laptop.baldwin.cx (john@laptop.baldwin.cx [192.168.0.4]) by server.baldwin.cx (8.11.6/8.11.6) with ESMTP id g1SF9vG40915 for ; Thu, 28 Feb 2002 10:09:57 -0500 (EST) (envelope-from jhb@FreeBSD.org) Message-ID: X-Mailer: XFMail 1.4.0 on FreeBSD X-Priority: 3 (Normal) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 8bit MIME-Version: 1.0 Date: Thu, 28 Feb 2002 10:09:51 -0500 (EST) From: John Baldwin To: arch@FreeBSD.org Subject: SMPng Design (Well, some of it anyways) Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG First, let me prefix this by saying that the last two weeks have been very stressful. If you guys are trying to kick me out of FreeBSD just keep it up, you almost convinced me to leave this time around. It sure seems to be your goal. :( Secondly, the changes we are making to the kernel with SMPng can't all be done in piecemeal fashion. Not all changes are 5 line commits. For example, the kernel preemption patch is rather small, however, it exposes a number of really obscure bugs that aren't easy to track down. 
Rather, kernel preemption is a long term goal, and having developed it in a side branch has helped give future direction as to how the kernel should go. If you guys can't handle a roadmap that has future milestones farther away than 1 week, then I have to wonder why we are even doing SMPng. Also, we have most of the preemption stuff in current now. It's not optimal at the moment, but it's close enough that there is lower-hanging fruit (such as the proc locking stuff I've been doing instead) that gives us a bigger bang for the buck. Anyways, as I said at BSDCon (but was apparently ignored when I said it, just as people seem to not have noticed when I said I was working on changing all the process credential stuff) I have been working on a sort of design document. It's up to about 7 pages of actual text in PS and PDF. There are some things in it that I'm sure are going to upset people. Mostly one of my themes that a lot of folks don't seem to share is this: We should not optimize yet. The kernel architecture is still in a state of flux and we need to be able to change APIs when we find that the ones we have don't actually work. If we invest a lot of time optimizing the APIs we have now, then it will be a big pain if we need to change those APIs. Also, people will not want to lose all their work doing optimization and thus will try to stall the needed API changes. I don't want to fight those battles. To reach the goal I announced at BSDCon for 5.0 (no more than 10% worse than 4.x, or better), I want us to be working on pushing down Giant by locking subsystems, not cheating by trying to do very machine-specific optimizations. I don't want i386 to meet the goal but all other archs to be dog slow. I'd much rather have the effort spent on MI code. I know I probably just pissed a lot of you off, but I'm taking a long range view here, not a short range view.
Anyways, the document as it currently stands can be found at: http://www.FreeBSD.org/~jhb/smpng/design/article.{ps,pdf} I will continue to add new sections and flesh out the skeleton ones as I have time. The paper at the URL above will be updated as I do so. If you collectively decide as a group that I'm off my rocker and this is all crap, then I'll happily step down from SMPng and go do my work somewhere else as in that case I am obviously not the right person for the job. -- John Baldwin <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve!" - http://www.FreeBSD.org/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Feb 28 8: 8:28 2002 Delivered-To: freebsd-arch@freebsd.org Received: from Mail6.nc.rr.com (fe6.southeast.rr.com [24.93.67.53]) by hub.freebsd.org (Postfix) with ESMTP id 3275037B402 for ; Thu, 28 Feb 2002 08:08:26 -0800 (PST) Received: from beta ([66.57.78.89]) by Mail6.nc.rr.com with Microsoft SMTPSVC(5.5.1877.687.68); Thu, 28 Feb 2002 11:08:25 -0500 From: "Joe Magura" To: Subject: Strong Arm? Date: Thu, 28 Feb 2002 11:06:17 -0500 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2911.0) X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4522.1200 Importance: Normal Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Folks, Any work being done on Strong Arm platform or any other "PocketPC" platforms?
Thanks, Joe Magura To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Feb 28 8:20:48 2002 Delivered-To: freebsd-arch@freebsd.org Received: from networld.com (relay.networld.com [65.88.251.103]) by hub.freebsd.org (Postfix) with ESMTP id 5D07B37B402 for ; Thu, 28 Feb 2002 08:20:45 -0800 (PST) Received: from [65.88.244.127] (HELO softweyr.com) by networld.com (CommuniGate Pro SMTP 3.4.7) with ESMTP id 14023609; Thu, 28 Feb 2002 09:15:17 -0700 Message-ID: <3C7E598A.7040402@softweyr.com> Date: Thu, 28 Feb 2002 09:23:38 -0700 From: Wes Peters User-Agent: Mozilla/5.0 (X11; U; Linux i386; en-US; rv:0.9.4) Gecko/20011126 Netscape6/6.2.1 X-Accept-Language: en-us MIME-Version: 1.0 To: Joe Magura Cc: freebsd-arch@FreeBSD.org Subject: Re: Strong Arm? References: Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Joe Magura wrote: > Folks, > > Any work being done on Strong Arm platform or any other "PocketPC" > platforms? Yes, but it's being done over in the NetBSD project. (: -- "Where am I, and what am I doing in this handbasket?" 
Wes Peters Softweyr LLC wes@softweyr.com http://softweyr.com/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Feb 28 10:41:11 2002 Delivered-To: freebsd-arch@freebsd.org Received: from storm.FreeBSD.org.uk (storm.FreeBSD.org.uk [194.242.139.170]) by hub.freebsd.org (Postfix) with ESMTP id 7E45137B402 for ; Thu, 28 Feb 2002 10:40:37 -0800 (PST) Received: (from uucp@localhost) by storm.FreeBSD.org.uk (8.11.6/8.11.6) with UUCP id g1SIeaa54205 for arch@freebsd.org; Thu, 28 Feb 2002 18:40:36 GMT (envelope-from mark@grimreaper.grondar.za) Received: from grimreaper (localhost [127.0.0.1]) by grimreaper.grondar.org (8.12.2/8.12.2) with ESMTP id g1SIaog4051908 for ; Thu, 28 Feb 2002 18:36:50 GMT (envelope-from mark@grimreaper.grondar.za) Message-Id: <200202281836.g1SIaog4051908@grimreaper.grondar.org> To: arch@freebsd.org Subject: Warning and lint(1) fixes. Review please. Date: Thu, 28 Feb 2002 18:36:50 +0000 From: Mark Murray Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Hi Please review the enclosed fixes. I've been running most of them for more than a month, and they are heavily useful in fixing up lint moanings. (There are a smattering of __P() removals in there - they will be a separate commit). M -- o Mark Murray \_ O.\_ Warning: this .sig is umop ap!sdn Index: i386/include/atomic.h =================================================================== RCS file: /home/ncvs/src/sys/i386/include/atomic.h,v retrieving revision 1.26 diff -u -d -r1.26 atomic.h --- i386/include/atomic.h 28 Feb 2002 06:17:05 -0000 1.26 +++ i386/include/atomic.h 28 Feb 2002 09:43:25 -0000 @@ -89,6 +89,7 @@ * The assembly is volatilized to demark potential before-and-after side * effects if an interrupt or SMP collision were to occur. 
*/ +#ifdef __GNUC__ #define ATOMIC_ASM(NAME, TYPE, OP, CONS, V) \ static __inline void \ atomic_##NAME##_##TYPE(volatile u_##TYPE *p, u_##TYPE v)\ @@ -97,6 +98,9 @@ : "+m" (*p) \ : CONS (V)); \ } +#else +#define ATOMIC_ASM(NAME, TYPE, OP, CONS, V) +#endif /* * Atomic compare and set, used by the mutex functions @@ -112,6 +116,7 @@ { int res = exp; +#ifdef __GNUC__ __asm __volatile( " pushfl ; " " cli ; " @@ -127,6 +132,7 @@ : "r" (src), /* 1 */ "m" (*(dst)) /* 2 */ : "memory"); +#endif return (res); } @@ -136,6 +142,7 @@ { int res = exp; +#ifdef __GNUC__ __asm __volatile ( " " __XSTRING(MPLOCKED) " " " cmpxchgl %1,%2 ; " @@ -147,6 +154,7 @@ : "r" (src), /* 1 */ "m" (*(dst)) /* 2 */ : "memory"); +#endif return (res); } @@ -375,12 +383,14 @@ { u_int result; +#ifdef __GNUC__ __asm __volatile ( " xorl %0,%0 ; " " xchgl %1,%0 ; " "# atomic_readandclear_int" : "=&r" (result) /* 0 (result) */ : "m" (*addr)); /* 1 (addr) */ +#endif return (result); } @@ -390,12 +400,14 @@ { u_long result; +#ifdef __GNUC__ __asm __volatile ( " xorl %0,%0 ; " " xchgl %1,%0 ; " "# atomic_readandclear_int" : "=&r" (result) /* 0 (result) */ : "m" (*addr)); /* 1 (addr) */ +#endif return (result); } Index: i386/include/bus_at386.h =================================================================== RCS file: /home/ncvs/src/sys/i386/include/bus_at386.h,v retrieving revision 1.18 diff -u -d -r1.18 bus_at386.h --- i386/include/bus_at386.h 18 Feb 2002 13:43:19 -0000 1.18 +++ i386/include/bus_at386.h 24 Feb 2002 21:28:54 -0000 @@ -274,6 +274,7 @@ else #endif { +#ifdef __GNUC__ __asm __volatile(" \n\ cld \n\ 1: movb (%2),%%al \n\ @@ -282,6 +283,7 @@ "=D" (addr), "=c" (count) : "r" (bsh + offset), "0" (addr), "1" (count) : "%eax", "memory"); +#endif } #endif } @@ -301,6 +303,7 @@ else #endif { +#ifdef __GNUC__ __asm __volatile(" \n\ cld \n\ 1: movw (%2),%%ax \n\ @@ -309,6 +312,7 @@ "=D" (addr), "=c" (count) : "r" (bsh + offset), "0" (addr), "1" (count) : "%eax", "memory"); +#endif } #endif } @@ -328,6 
+332,7 @@ else #endif { +#ifdef __GNUC__ __asm __volatile(" \n\ cld \n\ 1: movl (%2),%%eax \n\ @@ -336,6 +341,7 @@ "=D" (addr), "=c" (count) : "r" (bsh + offset), "0" (addr), "1" (count) : "%eax", "memory"); +#endif } #endif } @@ -374,7 +380,8 @@ if (tag == I386_BUS_SPACE_IO) #endif { - int _port_ = bsh + offset; \ + int _port_ = bsh + offset; +#ifdef __GNUC__ __asm __volatile(" \n\ cld \n\ 1: inb %w2,%%al \n\ @@ -384,6 +391,7 @@ "=D" (addr), "=c" (count), "=d" (_port_) : "0" (addr), "1" (count), "2" (_port_) : "%eax", "memory", "cc"); +#endif } #endif #if defined(_I386_BUS_MEMIO_H_) @@ -391,7 +399,8 @@ else #endif { - int _port_ = bsh + offset; \ + int _port_ = bsh + offset; +#ifdef __GNUC__ __asm __volatile(" \n\ cld \n\ repne \n\ @@ -399,6 +408,7 @@ "=D" (addr), "=c" (count), "=S" (_port_) : "0" (addr), "1" (count), "2" (_port_) : "memory", "cc"); +#endif } #endif } @@ -412,7 +422,8 @@ if (tag == I386_BUS_SPACE_IO) #endif { - int _port_ = bsh + offset; \ + int _port_ = bsh + offset; +#ifdef __GNUC__ __asm __volatile(" \n\ cld \n\ 1: inw %w2,%%ax \n\ @@ -422,6 +433,7 @@ "=D" (addr), "=c" (count), "=d" (_port_) : "0" (addr), "1" (count), "2" (_port_) : "%eax", "memory", "cc"); +#endif } #endif #if defined(_I386_BUS_MEMIO_H_) @@ -429,7 +441,8 @@ else #endif { - int _port_ = bsh + offset; \ + int _port_ = bsh + offset; +#ifdef __GNUC__ __asm __volatile(" \n\ cld \n\ repne \n\ @@ -437,6 +450,7 @@ "=D" (addr), "=c" (count), "=S" (_port_) : "0" (addr), "1" (count), "2" (_port_) : "memory", "cc"); +#endif } #endif } @@ -450,7 +464,8 @@ if (tag == I386_BUS_SPACE_IO) #endif { - int _port_ = bsh + offset; \ + int _port_ = bsh + offset; +#ifdef __GNUC__ __asm __volatile(" \n\ cld \n\ 1: inl %w2,%%eax \n\ @@ -460,6 +475,7 @@ "=D" (addr), "=c" (count), "=d" (_port_) : "0" (addr), "1" (count), "2" (_port_) : "%eax", "memory", "cc"); +#endif } #endif #if defined(_I386_BUS_MEMIO_H_) @@ -467,7 +483,8 @@ else #endif { - int _port_ = bsh + offset; \ + int _port_ = bsh + offset; 
+#ifdef __GNUC__ __asm __volatile(" \n\ cld \n\ repne \n\ @@ -475,6 +492,7 @@ "=D" (addr), "=c" (count), "=S" (_port_) : "0" (addr), "1" (count), "2" (_port_) : "memory", "cc"); +#endif } #endif } @@ -595,6 +613,7 @@ else #endif { +#ifdef __GNUC__ __asm __volatile(" \n\ cld \n\ 1: lodsb \n\ @@ -603,6 +622,7 @@ "=S" (addr), "=c" (count) : "r" (bsh + offset), "0" (addr), "1" (count) : "%eax", "memory", "cc"); +#endif } #endif } @@ -622,6 +642,7 @@ else #endif { +#ifdef __GNUC__ __asm __volatile(" \n\ cld \n\ 1: lodsw \n\ @@ -630,6 +651,7 @@ "=S" (addr), "=c" (count) : "r" (bsh + offset), "0" (addr), "1" (count) : "%eax", "memory", "cc"); +#endif } #endif } @@ -649,6 +671,7 @@ else #endif { +#ifdef __GNUC__ __asm __volatile(" \n\ cld \n\ 1: lodsl \n\ @@ -657,6 +680,7 @@ "=S" (addr), "=c" (count) : "r" (bsh + offset), "0" (addr), "1" (count) : "%eax", "memory", "cc"); +#endif } #endif } @@ -696,7 +720,8 @@ if (tag == I386_BUS_SPACE_IO) #endif { - int _port_ = bsh + offset; \ + int _port_ = bsh + offset; +#ifdef __GNUC__ __asm __volatile(" \n\ cld \n\ 1: lodsb \n\ @@ -706,6 +731,7 @@ "=d" (_port_), "=S" (addr), "=c" (count) : "0" (_port_), "1" (addr), "2" (count) : "%eax", "memory", "cc"); +#endif } #endif #if defined(_I386_BUS_MEMIO_H_) @@ -713,7 +739,8 @@ else #endif { - int _port_ = bsh + offset; \ + int _port_ = bsh + offset; +#ifdef __GNUC__ __asm __volatile(" \n\ cld \n\ repne \n\ @@ -721,6 +748,7 @@ "=D" (_port_), "=S" (addr), "=c" (count) : "0" (_port_), "1" (addr), "2" (count) : "memory", "cc"); +#endif } #endif } @@ -734,7 +762,8 @@ if (tag == I386_BUS_SPACE_IO) #endif { - int _port_ = bsh + offset; \ + int _port_ = bsh + offset; +#ifdef __GNUC__ __asm __volatile(" \n\ cld \n\ 1: lodsw \n\ @@ -744,6 +773,7 @@ "=d" (_port_), "=S" (addr), "=c" (count) : "0" (_port_), "1" (addr), "2" (count) : "%eax", "memory", "cc"); +#endif } #endif #if defined(_I386_BUS_MEMIO_H_) @@ -751,7 +781,8 @@ else #endif { - int _port_ = bsh + offset; \ + int _port_ = bsh + offset; 
+#ifdef __GNUC__ __asm __volatile(" \n\ cld \n\ repne \n\ @@ -759,6 +790,7 @@ "=D" (_port_), "=S" (addr), "=c" (count) : "0" (_port_), "1" (addr), "2" (count) : "memory", "cc"); +#endif } #endif } @@ -772,7 +804,8 @@ if (tag == I386_BUS_SPACE_IO) #endif { - int _port_ = bsh + offset; \ + int _port_ = bsh + offset; +#ifdef __GNUC__ __asm __volatile(" \n\ cld \n\ 1: lodsl \n\ @@ -782,6 +815,7 @@ "=d" (_port_), "=S" (addr), "=c" (count) : "0" (_port_), "1" (addr), "2" (count) : "%eax", "memory", "cc"); +#endif } #endif #if defined(_I386_BUS_MEMIO_H_) @@ -789,7 +823,8 @@ else #endif { - int _port_ = bsh + offset; \ + int _port_ = bsh + offset; +#ifdef __GNUC__ __asm __volatile(" \n\ cld \n\ repne \n\ @@ -797,6 +832,7 @@ "=D" (_port_), "=S" (addr), "=c" (count) : "0" (_port_), "1" (addr), "2" (count) : "memory", "cc"); +#endif } #endif } @@ -1167,10 +1203,12 @@ bus_space_barrier(bus_space_tag_t tag, bus_space_handle_t bsh, bus_size_t offset, bus_size_t len, int flags) { +#ifdef __GNUC__ if (flags & BUS_SPACE_BARRIER_READ) __asm __volatile("lock; addl $0,0(%%esp)" : : : "memory"); else __asm __volatile("" : : : "memory"); +#endif } #endif /* _I386_BUS_AT386_H_ */ Index: i386/include/pcpu.h =================================================================== RCS file: /home/ncvs/src/sys/i386/include/pcpu.h,v retrieving revision 1.32 diff -u -d -r1.32 pcpu.h --- i386/include/pcpu.h 11 Dec 2001 23:33:40 -0000 1.32 +++ i386/include/pcpu.h 28 Feb 2002 10:44:43 -0000 @@ -32,8 +32,22 @@ #ifdef _KERNEL #ifndef __GNUC__ -#error gcc is required to use this file -#endif + +#ifndef lint +#error gcc or lint is required to use this file +#else /* lint */ +#define __PCPU_PTR(name) +#define __PCPU_GET(name) +#define __PCPU_SET(name, val) +#define PCPU_GET(member) __PCPU_GET(pc_ ## member) +#define PCPU_PTR(member) __PCPU_PTR(pc_ ## member) +#define PCPU_SET(member, val) __PCPU_SET(pc_ ## member, val) +#define PCPU_MD_FIELDS \ + int foo; \ + char bar +#endif /* lint */ + +#else /* 
__GNUC__ */ #include #include @@ -141,6 +155,8 @@ #define PCPU_GET(member) __PCPU_GET(pc_ ## member) #define PCPU_PTR(member) __PCPU_PTR(pc_ ## member) #define PCPU_SET(member, val) __PCPU_SET(pc_ ## member, val) + +#endif /* __GNUC__ */ #endif /* _KERNEL */ Index: i386/include/profile.h =================================================================== RCS file: /home/ncvs/src/sys/i386/include/profile.h,v retrieving revision 1.26 diff -u -d -r1.26 profile.h --- i386/include/profile.h 31 Jan 2002 13:49:55 -0000 1.26 +++ i386/include/profile.h 19 Feb 2002 16:56:31 -0000 @@ -82,28 +82,35 @@ #define _MCOUNT_DECL static __inline void _mcount -#define MCOUNT \ -void \ -mcount() \ -{ \ - uintfptr_t selfpc, frompc; \ - /* \ - * Find the return address for mcount, \ - * and the return address for mcount's caller. \ - * \ - * selfpc = pc pushed by call to mcount \ - */ \ - asm("movl 4(%%ebp),%0" : "=r" (selfpc)); \ - /* \ - * frompc = pc pushed by call to mcount's caller. \ - * The caller's stack frame has already been built, so %ebp is \ - * the caller's frame pointer. The caller's raddr is in the \ - * caller's frame following the caller's caller's frame pointer. \ - */ \ - asm("movl (%%ebp),%0" : "=r" (frompc)); \ - frompc = ((uintfptr_t *)frompc)[1]; \ - _mcount(frompc, selfpc); \ +#ifdef __GNUC__ +#define MCOUNT \ +void \ +mcount() \ +{ \ + uintfptr_t selfpc, frompc; \ + /* \ + * Find the return address for mcount, \ + * and the return address for mcount's caller. \ + * \ + * selfpc = pc pushed by call to mcount \ + */ \ + asm("movl 4(%%ebp),%0" : "=r" (selfpc)); \ + /* \ + * frompc = pc pushed by call to mcount's caller. \ + * The caller's stack frame has already been built, so %ebp is \ + * the caller's frame pointer. 
The caller's raddr is in the \ + * caller's frame following the caller's caller's frame pointer.\ + */ \ + asm("movl (%%ebp),%0" : "=r" (frompc)); \ + frompc = ((uintfptr_t *)frompc)[1]; \ + _mcount(frompc, selfpc); \ +} +#else /* __GNUC__ */ +void \ +mcount() \ +{ \ } +#endif /* __GNUC__ */ typedef unsigned int uintfptr_t; @@ -117,16 +124,16 @@ #ifdef _KERNEL -void mcount __P((uintfptr_t frompc, uintfptr_t selfpc)); -void kmupetext __P((uintfptr_t nhighpc)); +void mcount(uintfptr_t frompc, uintfptr_t selfpc); +void kmupetext(uintfptr_t nhighpc); #ifdef GUPROF struct gmonparam; -void nullfunc_loop_profiled __P((void)); -void nullfunc_profiled __P((void)); -void startguprof __P((struct gmonparam *p)); -void stopguprof __P((struct gmonparam *p)); +void nullfunc_loop_profiled(void); +void nullfunc_profiled(void); +void startguprof(struct gmonparam *p); +void stopguprof(struct gmonparam *p); #else #define startguprof(p) #define stopguprof(p) @@ -139,12 +146,12 @@ __BEGIN_DECLS #ifdef __GNUC__ #ifdef __ELF__ -void mcount __P((void)) __asm(".mcount"); +void mcount(void) __asm(".mcount"); #else -void mcount __P((void)) __asm("mcount"); +void mcount(void) __asm("mcount"); #endif #endif -static void _mcount __P((uintfptr_t frompc, uintfptr_t selfpc)); +static void _mcount(uintfptr_t frompc, uintfptr_t selfpc); __END_DECLS #endif /* _KERNEL */ @@ -154,11 +161,11 @@ extern int cputime_bias; __BEGIN_DECLS -int cputime __P((void)); -void empty_loop __P((void)); -void mexitcount __P((uintfptr_t selfpc)); -void nullfunc __P((void)); -void nullfunc_loop __P((void)); +int cputime(void); +void empty_loop(void); +void mexitcount(uintfptr_t selfpc); +void nullfunc(void); +void nullfunc_loop(void); __END_DECLS #endif Index: sys/cdefs.h =================================================================== RCS file: /home/ncvs/src/sys/sys/cdefs.h,v retrieving revision 1.49 diff -u -d -r1.49 cdefs.h --- sys/cdefs.h 4 Dec 2001 01:29:54 -0000 1.49 +++ sys/cdefs.h 19 Feb 2002 15:32:10 -0000 @@ 
-112,6 +112,7 @@ * properly (old versions of gcc-2 supported the dead and pure features * in a different (wrong) way). */ +#ifdef __GNUC__ #if __GNUC__ < 2 || __GNUC__ == 2 && __GNUC_MINOR__ < 5 #define __dead2 #define __pure2 @@ -176,7 +177,6 @@ #define __printf0like(fmtarg, firstvararg) #endif -#ifdef __GNUC__ #define __strong_reference(sym,aliassym) \ extern __typeof (sym) aliassym __attribute__ ((__alias__ (#sym))); #ifdef __ELF__ @@ -244,7 +244,7 @@ #if !defined(lint) && !defined(STRIP_FBSDID) #define __FBSDID(s) __IDSTRING(__CONCAT(__rcsid_,__LINE__),s) #else -#define __FBSDID(s) struct __hack +#define __FBSDID(s) #endif #endif Index: sys/eventhandler.h =================================================================== RCS file: /home/ncvs/src/sys/sys/eventhandler.h,v retrieving revision 1.17 diff -u -d -r1.17 eventhandler.h --- sys/eventhandler.h 12 Sep 2001 08:38:05 -0000 1.17 +++ sys/eventhandler.h 19 Feb 2002 22:17:52 -0000 @@ -75,31 +75,33 @@ struct eventhandler_entry ee; \ type eh_func; \ }; \ -struct __hack +struct __hack_ ## name #define EVENTHANDLER_FAST_DEFINE(name, type) \ struct eventhandler_list Xeventhandler_list_ ## name = { #name }; \ -struct __hack +struct __hack_ ## name -#define EVENTHANDLER_FAST_INVOKE(name, args...) \ -do { \ - struct eventhandler_list *_el = &Xeventhandler_list_ ## name ; \ - struct eventhandler_entry *_ep, *_en; \ - \ - if (_el->el_flags & EHE_INITTED) { \ - lockmgr(&_el->el_lock, LK_EXCLUSIVE, NULL, curthread); \ - _ep = TAILQ_FIRST(&(_el->el_entries)); \ - while (_ep != NULL) { \ - _en = TAILQ_NEXT(_ep, ee_link); \ - ((struct eventhandler_entry_ ## name *)_ep)->eh_func(_ep->ee_arg , \ - ## args); \ - _ep = _en; \ - } \ - lockmgr(&_el->el_lock, LK_RELEASE, NULL, curthread); \ - } \ +#define EVENTHANDLER_FAST_INVOKE(name, args...) 
\ +do { \ + struct eventhandler_list *_el = &Xeventhandler_list_ ## name ; \ + struct eventhandler_entry *_ep, *_en; \ + \ + if (_el->el_flags & EHE_INITTED) { \ + lockmgr(&_el->el_lock, LK_EXCLUSIVE, NULL, curthread); \ + _ep = TAILQ_FIRST(&(_el->el_entries)); \ + while (_ep != NULL) { \ + _en = TAILQ_NEXT(_ep, ee_link); \ + ((struct eventhandler_entry_ ## name *)_ep)->eh_func( \ + _ep->ee_arg , \ + ## args \ + ); \ + _ep = _en; \ + } \ + lockmgr(&_el->el_lock, LK_RELEASE, NULL, curthread); \ + } \ } while (0) -#define EVENTHANDLER_FAST_REGISTER(name, func, arg, priority) \ +#define EVENTHANDLER_FAST_REGISTER(name, func, arg, priority) \ eventhandler_register(&Xeventhandler_list_ ## name, #name, func, arg, priority) #define EVENTHANDLER_FAST_DEREGISTER(name, tag) \ @@ -118,25 +120,27 @@ struct eventhandler_entry ee; \ type eh_func; \ }; \ -struct __hack +struct __hack_ ## name -#define EVENTHANDLER_INVOKE(name, args...) \ -do { \ - struct eventhandler_list *_el; \ - struct eventhandler_entry *_ep, *_en; \ - \ - if (((_el = eventhandler_find_list(#name)) != NULL) && \ - (_el->el_flags & EHE_INITTED)) { \ - lockmgr(&_el->el_lock, LK_EXCLUSIVE, NULL, curthread); \ - _ep = TAILQ_FIRST(&(_el->el_entries)); \ - while (_ep != NULL) { \ - _en = TAILQ_NEXT(_ep, ee_link); \ - ((struct eventhandler_entry_ ## name *)_ep)->eh_func(_ep->ee_arg , \ - ## args); \ - _ep = _en; \ - } \ - lockmgr(&_el->el_lock, LK_RELEASE, NULL, curthread); \ - } \ +#define EVENTHANDLER_INVOKE(name, args...) 
\ +do { \ + struct eventhandler_list *_el; \ + struct eventhandler_entry *_ep, *_en; \ + \ + if (((_el = eventhandler_find_list(#name)) != NULL) && \ + (_el->el_flags & EHE_INITTED)) { \ + lockmgr(&_el->el_lock, LK_EXCLUSIVE, NULL, curthread); \ + _ep = TAILQ_FIRST(&(_el->el_entries)); \ + while (_ep != NULL) { \ + _en = TAILQ_NEXT(_ep, ee_link); \ + ((struct eventhandler_entry_ ## name *)_ep)->eh_func( \ + _ep->ee_arg , \ + ## args \ + ); \ + _ep = _en; \ + } \ + lockmgr(&_el->el_lock, LK_RELEASE, NULL, curthread); \ + } \ } while (0) #define EVENTHANDLER_REGISTER(name, func, arg, priority) \ @@ -165,7 +169,7 @@ */ /* Shutdown events */ -typedef void (*shutdown_fn) __P((void *, int)); +typedef void (*shutdown_fn)(void *, int); #define SHUTDOWN_PRI_FIRST 0 #define SHUTDOWN_PRI_DEFAULT 10000 @@ -176,7 +180,7 @@ EVENTHANDLER_DECLARE(shutdown_final, shutdown_fn); /* Idle process event */ -typedef void (*idle_eventhandler_t) __P((void *, int)); +typedef void (*idle_eventhandler_t)(void *, int); #define IDLE_PRI_FIRST 10000 #define IDLE_PRI_LAST 20000 Index: sys/linker_set.h =================================================================== RCS file: /home/ncvs/src/sys/sys/linker_set.h,v retrieving revision 1.9 diff -u -d -r1.9 linker_set.h --- sys/linker_set.h 13 Jun 2001 10:58:39 -0000 1.9 +++ sys/linker_set.h 24 Feb 2002 21:50:30 -0000 @@ -42,6 +42,7 @@ * Private macros, not to be used outside this header file. */ /* this bit of h0h0magic brought to you by cpp */ +#ifdef __GNUC__ #define __GLOBL(sym) __GLOBL2(sym) #define __GLOBL2(sym) __asm(".globl " #sym) @@ -50,6 +51,11 @@ __GLOBL(__CONCAT(__stop_set_,set)); \ static void const * const __set_##set##_sym_##sym \ __attribute__((__section__("set_" #set),__unused__)) = &sym +#else /* !__GNUC__ */ +#define __GLOBL(sym) +#define __GLOBL2(sym) +#define __MAKE_SET(set, sym) +#endif /* __GNUC__ */ /* * Public macros. 
Index: sys/lock.h =================================================================== RCS file: /home/ncvs/src/sys/sys/lock.h,v retrieving revision 1.42 diff -u -d -r1.42 lock.h --- sys/lock.h 5 Jan 2002 08:47:13 -0000 1.42 +++ sys/lock.h 24 Feb 2002 21:41:04 -0000 @@ -245,8 +245,8 @@ witness_restore((lock), __CONCAT(n, __wf), __CONCAT(n, __wl)) #else /* WITNESS */ -#define WITNESS_INIT(lock) (lock)->lo_flags |= LO_INITIALIZED -#define WITNESS_DESTROY(lock) (lock)->lo_flags &= ~LO_INITIALIZED +#define WITNESS_INIT(lock) ((lock)->lo_flags |= LO_INITIALIZED) +#define WITNESS_DESTROY(lock) ((lock)->lo_flags &= ~LO_INITIALIZED) #define WITNESS_LOCK(lock, flags, file, line) #define WITNESS_UPGRADE(lock, flags, file, line) #define WITNESS_DOWNGRADE(lock, flags, file, line) Index: sys/malloc.h =================================================================== RCS file: /home/ncvs/src/sys/sys/malloc.h,v retrieving revision 1.54 diff -u -d -r1.54 malloc.h --- sys/malloc.h 10 Aug 2001 06:37:04 -0000 1.54 +++ sys/malloc.h 28 Feb 2002 10:24:13 -0000 @@ -153,7 +153,7 @@ * Deprecated macro versions of not-quite-malloc() and free(). 
*/ #define MALLOC(space, cast, size, type, flags) \ - (space) = (cast)malloc((u_long)(size), (type), (flags)) + ((space) = (cast)malloc((u_long)(size), (type), (flags))) #define FREE(addr, type) free((addr), (type)) /* Index: sys/random.h =================================================================== RCS file: /home/ncvs/src/sys/sys/random.h,v retrieving revision 1.30 diff -u -d -r1.30 random.h --- sys/random.h 18 Feb 2001 17:40:47 -0000 1.30 +++ sys/random.h 24 Feb 2002 22:45:18 -0000 @@ -33,8 +33,15 @@ u_int read_random(void *, u_int); -enum esource { RANDOM_WRITE, RANDOM_KEYBOARD, RANDOM_MOUSE, RANDOM_NET, - RANDOM_INTERRUPT, ENTROPYSOURCE }; +enum esource { + RANDOM_START = 0, + RANDOM_WRITE = 0, + RANDOM_KEYBOARD, + RANDOM_MOUSE, + RANDOM_NET, + RANDOM_INTERRUPT, + ENTROPYSOURCE +}; void random_harvest(void *, u_int, u_int, u_int, enum esource); /* Allow the sysadmin to select the broad category of To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Feb 28 11:23:23 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mail.chesapeake.net (chesapeake.net [205.130.220.14]) by hub.freebsd.org (Postfix) with ESMTP id C402837B405 for ; Thu, 28 Feb 2002 11:23:21 -0800 (PST) Received: from localhost (jroberson@localhost) by mail.chesapeake.net (8.11.6/8.11.6) with ESMTP id g1SJNBt01025; Thu, 28 Feb 2002 14:23:11 -0500 (EST) (envelope-from jroberson@chesapeake.net) Date: Thu, 28 Feb 2002 14:23:11 -0500 (EST) From: Jeff Roberson To: Julian Elischer Cc: Matthew Dillon , Subject: Re: Slab allocator In-Reply-To: Message-ID: <20020228141318.J31751-100000@mail.chesapeake.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Ok, both of you have requested more comments. 
I have commented what I felt were the trickier situations. Could you give me some examples of what seems to be lacking documentation? I'm perfectly willing to add it, but since I wrote the code most things seem perfectly intuitive to me. :-) Thanks, Jeff To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Feb 28 11:25:33 2002 Delivered-To: freebsd-arch@freebsd.org Received: from dragon.nuxi.com (trang.nuxi.com [66.92.13.169]) by hub.freebsd.org (Postfix) with ESMTP id 4A50A37B41B for ; Thu, 28 Feb 2002 11:25:29 -0800 (PST) Received: (from obrien@localhost) by dragon.nuxi.com (8.11.6/8.11.1) id g1SJPG330633; Thu, 28 Feb 2002 11:25:16 -0800 (PST) (envelope-from obrien) Date: Thu, 28 Feb 2002 11:21:12 -0800 From: "David O'Brien" To: Mark Murray Cc: arch@freebsd.org Subject: Re: Warning and lint(1) fixes. Review please. Message-ID: <20020228112112.A30563@dragon.nuxi.com> Reply-To: obrien@freebsd.org Mail-Followup-To: arch@freebsd.org References: <200202281836.g1SIaog4051908@grimreaper.grondar.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <200202281836.g1SIaog4051908@grimreaper.grondar.org>; from mark@grondar.za on Thu, Feb 28, 2002 at 06:36:50PM +0000 X-Operating-System: FreeBSD 5.0-CURRENT Organization: The NUXI BSD group X-Pgp-Rsa-Fingerprint: B7 4D 3E E9 11 39 5F A3 90 76 5D 69 58 D9 98 7A X-Pgp-Rsa-Keyid: 1024/34F9F9D5 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Thu, Feb 28, 2002 at 06:36:50PM +0000, Mark Murray wrote: > +#ifdef __GNUC__ > #define ATOMIC_ASM(NAME, TYPE, OP, CONS, V) \ > static __inline void \ > atomic_##NAME##_##TYPE(volatile u_##TYPE *p, u_##TYPE v)\ > @@ -97,6 +98,9 @@ > : "+m" (*p) \ > : CONS (V)); \ > } > +#else > +#define ATOMIC_ASM(NAME, 
TYPE, OP, CONS, V) > +#endif > > /* > * Atomic compare and set, used by the mutex functions > @@ -112,6 +116,7 @@ > { > int res = exp; > > +#ifdef __GNUC__ > __asm __volatile( > " pushfl ; " > " cli ; " > @@ -127,6 +132,7 @@ > : "r" (src), /* 1 */ > "m" (*(dst)) /* 2 */ > : "memory"); > +#endif Because you are changing obvious syntax errors, if one uses a non-GCC compiler, into things that will silently fail; I would be more comfortable with this change if you kept the errors. Can you add #error in the #else cases? To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Feb 28 11:34:49 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mail.chesapeake.net (chesapeake.net [205.130.220.14]) by hub.freebsd.org (Postfix) with ESMTP id 4AF6C37B405 for ; Thu, 28 Feb 2002 11:34:46 -0800 (PST) Received: from localhost (jroberson@localhost) by mail.chesapeake.net (8.11.6/8.11.6) with ESMTP id g1SJYiD05037; Thu, 28 Feb 2002 14:34:44 -0500 (EST) (envelope-from jroberson@chesapeake.net) Date: Thu, 28 Feb 2002 14:34:44 -0500 (EST) From: Jeff Roberson To: Terry Lambert , Cc: arch@FreeBSD.ORG Subject: Re: Slab allocator In-Reply-To: <3C7D44AE.820215AB@mindspring.com> Message-ID: <20020228142424.R529-100000@mail.chesapeake.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG I'm very interested in potential optimizations to my current locking strategies. I understand why critical sections would be a good thing in the fast path. I actually started to implement it this way originally. I haven't been able to come up with a really clean way to implement the cache flushing that doesn't have some nasty side effect though. 
For instance, telling each cpu to flush its caches would be somewhat
possible, but then you'd need some way to effectively block and restart
each cpu once the entire flush operation was done. This could lead to the
page daemon thread being suspended until each cpu had entered malloc.
That could be well over a time slice given that a processor may want to
continue scheduling user threads and not actually do any allocations. As
it is, cpus are only blocked for the duration of the zone draining, so this
would increase the latency here.

Anyway, I'd appreciate more comments on the subject. Keep in mind the
bucket flushing operations, and potentially statistics gathering as well.

Thanks,
Jeff

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message

From owner-freebsd-arch Thu Feb 28 12:15:36 2002
Delivered-To: freebsd-arch@freebsd.org
Received: from storm.FreeBSD.org.uk (storm.FreeBSD.org.uk [194.242.139.170]) by hub.freebsd.org (Postfix) with ESMTP id 9127B37B41A; Thu, 28 Feb 2002 12:15:33 -0800 (PST)
Received: (from uucp@localhost) by storm.FreeBSD.org.uk (8.11.6/8.11.6) with UUCP id g1SKFWg55114; Thu, 28 Feb 2002 20:15:32 GMT (envelope-from mark@grimreaper.grondar.za)
Received: from grimreaper (localhost [127.0.0.1]) by grimreaper.grondar.org (8.12.2/8.12.2) with ESMTP id g1SKE1g4052891; Thu, 28 Feb 2002 20:14:01 GMT (envelope-from mark@grimreaper.grondar.za)
Message-Id: <200202282014.g1SKE1g4052891@grimreaper.grondar.org>
To: obrien@freebsd.org
Cc: arch@freebsd.org
Subject: Re: Warning and lint(1) fixes. Review please.
References: <20020228112112.A30563@dragon.nuxi.com>
In-Reply-To: <20020228112112.A30563@dragon.nuxi.com> ; from "David O'Brien" "Thu, 28 Feb 2002 11:21:12 PST."
Date: Thu, 28 Feb 2002 20:14:01 +0000 From: Mark Murray Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG > Because you are changing obvious syntax errors, if one uses a non-GCC > compiler, into things that will silently fail; I would be more > comfortable with this change if you kept the errors. > > Can you add #error in the #else cases? NP. I'll make that #error happen only #ifndef lint. Cool? M -- o Mark Murray \_ O.\_ Warning: this .sig is umop ap!sdn To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Feb 28 12:21:50 2002 Delivered-To: freebsd-arch@freebsd.org Received: from dragon.nuxi.com (trang.nuxi.com [66.92.13.169]) by hub.freebsd.org (Postfix) with ESMTP id 6D48437B405 for ; Thu, 28 Feb 2002 12:21:45 -0800 (PST) Received: (from obrien@localhost) by dragon.nuxi.com (8.11.6/8.11.1) id g1SKLhf33827; Thu, 28 Feb 2002 12:21:43 -0800 (PST) (envelope-from obrien) Date: Thu, 28 Feb 2002 12:17:39 -0800 From: "David O'Brien" To: Mark Murray Cc: arch@freebsd.org Subject: Re: Warning and lint(1) fixes. Review please. 
Message-ID: <20020228121739.A33808@dragon.nuxi.com> Reply-To: obrien@freebsd.org References: <20020228112112.A30563@dragon.nuxi.com> <200202282014.g1SKE1g4052891@grimreaper.grondar.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <200202282014.g1SKE1g4052891@grimreaper.grondar.org>; from mark@grondar.za on Thu, Feb 28, 2002 at 08:14:01PM +0000 X-Operating-System: FreeBSD 5.0-CURRENT Organization: The NUXI BSD group X-Pgp-Rsa-Fingerprint: B7 4D 3E E9 11 39 5F A3 90 76 5D 69 58 D9 98 7A X-Pgp-Rsa-Keyid: 1024/34F9F9D5 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Thu, Feb 28, 2002 at 08:14:01PM +0000, Mark Murray wrote: > > Because you are changing obvious syntax errors, if one uses a non-GCC > > compiler, into things that will silently fail; I would be more > > comfortable with this change if you kept the errors. > > > > Can you add #error in the #else cases? > > NP. I'll make that #error happen only #ifndef lint. Cool? Why can't lint be made to accept #error, but continue processing? 
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Feb 28 12:31:34 2002 Delivered-To: freebsd-arch@freebsd.org Received: from rwcrmhc51.attbi.com (rwcrmhc51.attbi.com [204.127.198.38]) by hub.freebsd.org (Postfix) with ESMTP id D8D4D37B417; Thu, 28 Feb 2002 12:31:28 -0800 (PST) Received: from gateway.posi.net ([12.236.90.177]) by rwcrmhc51.attbi.com (InterMail vM.4.01.03.27 201-229-121-127-20010626) with ESMTP id <20020228203128.CRSW2626.rwcrmhc51.attbi.com@gateway.posi.net>; Thu, 28 Feb 2002 20:31:28 +0000 Received: from localhost (kbyanc@localhost) by gateway.posi.net (8.11.6/8.11.6) with ESMTP id g1SKVRA11346; Thu, 28 Feb 2002 12:31:28 -0800 (PST) (envelope-from kbyanc@posi.net) X-Authentication-Warning: gateway.posi.net: kbyanc owned process doing -bs Date: Thu, 28 Feb 2002 12:31:27 -0800 (PST) From: Kelly Yancey To: John Baldwin Cc: arch@FreeBSD.ORG Subject: Re: SMPng Design (Well, some of it anyways) In-Reply-To: Message-ID: <20020228115007.R11198-100000@gateway.posi.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Thu, 28 Feb 2002, John Baldwin wrote: > [ snip ] > > Secondly, the changes we are making to the kernel with SMPng can't all be done > in piecemeal fashion. Not all changes are 5 line commits. For example, the > kernel preemption patch is rather small, however, it exposes a number of > really obscure bugs that aren't easy to track down. Rather, kernel preemption > is a long term goal, and having developed it in a side branch has helped give > future direction as to how the kernel should go. If you guys can't handle a > roadmap that has future milestones farther away than 1 week, then I have to > wonder why even are doing SMPng. 
>
> [ snip ]
>
> John Baldwin <>< http://www.FreeBSD.org/~jhb/

John, thank you for all of your hard work, both past and hopefully
continuing, on SMPng. Both as a friend and as a fellow engineer, I have
been disgusted by the abuse you have had to endure these past few weeks.

I think it is important for everyone to remember that SMPng has its
origins in a meeting almost two years ago amongst a select group of
individuals who took it upon themselves to lead the development. As I
understand it, not having been at that meeting myself, some design took
place and work was distributed amongst the group. It is now two years
later, and who is leading SMPng? John is. John was not even invited to
that meeting. I understand well how much events can detract from one's
ability to contribute to the project, and I am not pointing fingers or
accusing anyone of "dropping the ball." But we have to recognize that
John has stepped up to the challenge of leading the SMPng development
when others could not. As such, he deserves a modicum of respect from all
developers, and I would think especially those who were unable to perform
their self-appointed duties in SMPng development.

Times have changed and perhaps others are finding time to become involved
in SMPng. Perhaps they are finding John's direction to be different from
what theirs would have been. That is beside the point. At this juncture,
John is leading SMPng and has been for some time. The other developers
who have been working with John have not voiced any concerns over how
development has been progressing nor the direction it is progressing in.
I also don't doubt that John welcomes all the help he can get, but that
is the point: new work should be coordinated through John and within the
existing SMPng design. It is disrespectful to all of the hard work he and
the rest of the SMPng team have done, and frankly a bit late in the game,
to second-guess John now.

John, don't let the armchair generals distract you.
Keep up the good work, Kelly kbyanc@{posi.net,FreeBSD.org} To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Feb 28 12:35:33 2002 Delivered-To: freebsd-arch@freebsd.org Received: from storm.FreeBSD.org.uk (storm.FreeBSD.org.uk [194.242.139.170]) by hub.freebsd.org (Postfix) with ESMTP id EAD8637B402; Thu, 28 Feb 2002 12:35:30 -0800 (PST) Received: (from uucp@localhost) by storm.FreeBSD.org.uk (8.11.6/8.11.6) with UUCP id g1SKZUP55295; Thu, 28 Feb 2002 20:35:30 GMT (envelope-from mark@grimreaper.grondar.za) Received: from grimreaper (localhost [127.0.0.1]) by grimreaper.grondar.org (8.12.2/8.12.2) with ESMTP id g1SKV9g4053062; Thu, 28 Feb 2002 20:31:09 GMT (envelope-from mark@grimreaper.grondar.za) Message-Id: <200202282031.g1SKV9g4053062@grimreaper.grondar.org> To: obrien@freebsd.org Cc: arch@freebsd.org Subject: Re: Warning and lint(1) fixes. Review please. References: <20020228121739.A33808@dragon.nuxi.com> In-Reply-To: <20020228121739.A33808@dragon.nuxi.com> ; from "David O'Brien" "Thu, 28 Feb 2002 12:17:39 PST." Date: Thu, 28 Feb 2002 20:31:09 +0000 From: Mark Murray Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG > On Thu, Feb 28, 2002 at 08:14:01PM +0000, Mark Murray wrote: > > > Because you are changing obvious syntax errors, if one uses a non-GCC > > > compiler, into things that will silently fail; I would be more > > > comfortable with this change if you kept the errors. > > > > > > Can you add #error in the #else cases? > > > > NP. I'll make that #error happen only #ifndef lint. Cool? > > Why can't lint be made to accept #error, but continue processing? 
'Cos I'm trying to support multiple lints, and fixing them all seems a bit unreasonable :-) M -- o Mark Murray \_ O.\_ Warning: this .sig is umop ap!sdn To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Feb 28 13: 2:38 2002 Delivered-To: freebsd-arch@freebsd.org Received: from dragon.nuxi.com (trang.nuxi.com [66.92.13.169]) by hub.freebsd.org (Postfix) with ESMTP id EF22937B49E for ; Thu, 28 Feb 2002 13:01:56 -0800 (PST) Received: (from obrien@localhost) by dragon.nuxi.com (8.11.6/8.11.1) id g1SL1hb34184; Thu, 28 Feb 2002 13:01:43 -0800 (PST) (envelope-from obrien) Date: Thu, 28 Feb 2002 12:57:39 -0800 From: "David O'Brien" To: Mark Murray Cc: arch@freebsd.org Subject: Re: Warning and lint(1) fixes. Review please. Message-ID: <20020228125739.A34165@dragon.nuxi.com> Reply-To: obrien@freebsd.org References: <20020228121739.A33808@dragon.nuxi.com> <200202282031.g1SKV9g4053062@grimreaper.grondar.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <200202282031.g1SKV9g4053062@grimreaper.grondar.org>; from mark@grondar.za on Thu, Feb 28, 2002 at 08:31:09PM +0000 X-Operating-System: FreeBSD 5.0-CURRENT Organization: The NUXI BSD group X-Pgp-Rsa-Fingerprint: B7 4D 3E E9 11 39 5F A3 90 76 5D 69 58 D9 98 7A X-Pgp-Rsa-Keyid: 1024/34F9F9D5 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Thu, Feb 28, 2002 at 08:31:09PM +0000, Mark Murray wrote: > > On Thu, Feb 28, 2002 at 08:14:01PM +0000, Mark Murray wrote: > > > > Because you are changing obvious syntax errors, if one uses a non-GCC > > > > compiler, into things that will silently fail; I would be more > > > > comfortable with this change if you kept the errors. > > > > > > > > Can you add #error in the #else cases? > > > > > > NP. 
I'll make that #error happen only #ifndef lint. Cool? > > > > Why can't lint be made to accept #error, but continue processing? > > 'Cos I'm trying to support multiple lints, and fixing them all seems > a bit unreasonable :-) I think dirtying up the code with tons of #ifndef lint is unreasonable. We have a base lint so that we can change things (ie, modify it). What lint's are you trying to support? To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Feb 28 13:15: 6 2002 Delivered-To: freebsd-arch@freebsd.org Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2]) by hub.freebsd.org (Postfix) with ESMTP id 117CD37B42F for ; Thu, 28 Feb 2002 13:14:31 -0800 (PST) Received: (from dillon@localhost) by apollo.backplane.com (8.11.6/8.9.1) id g1SLEQW38736; Thu, 28 Feb 2002 13:14:26 -0800 (PST) (envelope-from dillon) Date: Thu, 28 Feb 2002 13:14:26 -0800 (PST) From: Matthew Dillon Message-Id: <200202282114.g1SLEQW38736@apollo.backplane.com> To: Jeff Roberson Cc: Julian Elischer , Subject: Re: Slab allocator References: <20020228141318.J31751-100000@mail.chesapeake.net> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG :Ok, both of you have requested more comments. I have commented what I :felt were the trickier situations. Could you give me some examples of :what seems to be lacking documentation? I'm perfectly willing to add it, :but since I wrote the code most things seem perfectly intuitive to me. :-) : :Thanks, :Jeff Could you explain the change made (crinit() and crzero() calls) to kern/init_main.c ? Does uma_zalloc() automatically zero the returned memory? 
You replace MALLOC(..M_ZERO) in two places, example:

-   MALLOC(fp, struct file *, sizeof(struct file), M_FILE, M_WAITOK | M_ZERO);
+   fp = uma_zalloc(file_zone, M_WAITOK);

I recommend KASSERTing that bucket_ub_ptr is in-bounds in vm/uma_core.c
line 1403 (i.e. us_freecount and ub_ptr are coherent).

I recommend KASSERTing that the uz_free_slab list is non-empty
on line 1381.

On line 82 of vm/uma_int.h you describe embedding a slab header
in a Page. I could not find where in the code this embedding
takes place (I looked for 'slab header' in the code but could
not find anything related to the comment on line 82 of vm/uma_int.h).

In any case, if you are embedding the slab header I recommend
adding a magic number at the base of the slab header structure
described in the comment and KASSERT()ing it as a sanity check.

-- --
documentation
-- --

uma_timeout line 185 uma_core.c. Could you describe what a
'working set calc.' is? What is a working set?

In general, the comments at the head of the routines do not include
context. For example, take hash_sfind(). 'Find a slab within a zone
that has a matching data field'. What I would like to see is a comment
like this: 'Find a slab within a zone that has a matching data field.
This function is mainly used by the deallocator to locate the slab
which owns an item data pointer.'

Other examples. Line 266, the comment for hash_expand(), says
what it does, 'Expands the hash table for OFFPAGE zones', but does
not say WHY it does it or HOW it does it.

In any case, this is general for all the routines. It would be
nice if you did a once-over and clarified the context of the comment
where it seems appropriate.
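A minimal userland sketch of the magic-number check recommended above,
assuming a much simplified slab header. The struct, its field names and
SLAB_MAGIC are hypothetical and are not taken from the UMA sources;
assert() stands in for the kernel's KASSERT().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical magic value ("SLAB" in ASCII); not from the UMA sources. */
#define SLAB_MAGIC 0x534c4142u

/* Hypothetical, simplified slab header. */
struct slab_hdr {
    uint32_t sh_magic;      /* first field: sanity marker */
    uint16_t sh_freecount;  /* free items left in this slab */
    uint16_t sh_nitems;     /* total items in this slab */
};

static void
slab_hdr_init(struct slab_hdr *sh, uint16_t nitems)
{
    sh->sh_magic = SLAB_MAGIC;
    sh->sh_freecount = nitems;
    sh->sh_nitems = nitems;
}

/* Returns nonzero if the header carries the magic and a coherent count. */
static int
slab_hdr_ok(const struct slab_hdr *sh)
{
    return (sh->sh_magic == SLAB_MAGIC &&
        sh->sh_freecount <= sh->sh_nitems);
}

/* Return one item to the slab, checking the header first. */
static void
slab_free_item(struct slab_hdr *sh)
{
    assert(slab_hdr_ok(sh));                    /* KASSERT() in the kernel */
    assert(sh->sh_freecount < sh->sh_nitems);   /* would otherwise overflow */
    sh->sh_freecount++;
}
```

The point of placing the magic first is that a stray pointer that does
not land on a real header is very unlikely to pass the check, so the
assertion fires close to the bug rather than after the free list has
been corrupted.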
-Matt Matthew Dillon To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Feb 28 13:25:38 2002 Delivered-To: freebsd-arch@freebsd.org Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2]) by hub.freebsd.org (Postfix) with ESMTP id D7D3637B43E for ; Thu, 28 Feb 2002 13:25:09 -0800 (PST) Received: (from dillon@localhost) by apollo.backplane.com (8.11.6/8.9.1) id g1SLP8738805; Thu, 28 Feb 2002 13:25:08 -0800 (PST) (envelope-from dillon) Date: Thu, 28 Feb 2002 13:25:08 -0800 (PST) From: Matthew Dillon Message-Id: <200202282125.g1SLP8738805@apollo.backplane.com> To: Jeff Roberson Cc: Terry Lambert , arch@FreeBSD.ORG Subject: Re: Slab allocator References: <20020228142424.R529-100000@mail.chesapeake.net> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG :I'm very interested in potential optimizations to my current locking :strategies. I understand why critical sections would be a good thing in :the fast path. I actually started to implement it this way originally. I :haven't been able to come up with a really clean way to implement the :cache flushing that doesn't have some nasty side effect though. For :instance, telling each cpu to flush it's caches would be somewhat :possible, but then you'd need some way to effectively block and restart :each cpu once the entire flush operation was done. This could lead to the :page daemon thread being suspended until each cpu had entered malloc. :That could be well over a time slice given that a processor may want to :continue scheduling user threads and not actually do any allocations. As :it is cpus are only blocked for the duration of the zone draining, so this :would increase the latency here. : :Anyway, I'd appreciate more comments on the subject. 
Keep in mind the
:bucket flushing operations, and potentially statistics gathering as well.
:
:Thanks,
:Jeff

In looking over the code, only cache_drain() seems to need to lock some
'other' cpu's cache. So the immediate question is: "When does
cache_drain() get called and is immediate action necessary?"

In looking at your code, the sequence is:

    vm_pageout_scan() -> uma_reclaim() -> zone_drain()

To me this indicates that timing is not absolutely critical, in which
case I would recommend that, rather than doing this directly in
vm_pageout_scan(), you flag it in the cpu's per-cpu area and do the
drain on a per-cpu basis. Where to put the test I am not entirely sure,
but it seems worthwhile considering that if you can accomplish this you
can get rid of the per-cpu mutex entirely and use a critical section
instead.

Another alternative is to be a little more pro-active about the
draining. For example, when allocating or freeing from a zone the code
would also check for 'excessive cached data' and clean it out for the
calling cpu.

I am not advocating that you do all of this before you commit. On the
contrary, I believe it is far more important to first get what you have
stabilized and into the tree and then work on tuning it as another stage,
similar to the two and three stage commit I intend to do for
critical_*(). Stage 1: get the thing working and safe, Stage 2: cleanup
and optimize, Stage 3: repeat, etc...
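The deferred per-cpu drain described above could be sketched roughly as
follows. This is a single-threaded userland illustration only: NCPU,
struct pcpu_cache and all function names are hypothetical, and a real
kernel version would run the owner side inside a critical section and
would need the flag store to be visible across cpus.

```c
#include <assert.h>
#include <stdbool.h>

#define NCPU 4  /* hypothetical cpu count for the sketch */

/* Hypothetical per-cpu cache state. */
struct pcpu_cache {
    int  pc_nitems;  /* items currently cached on this cpu */
    bool pc_drain;   /* set by the pageout path, cleared by the owner */
};

static struct pcpu_cache pcpu[NCPU];

/* Pageout side: request a drain everywhere without touching any cache. */
static void
cache_request_drain(void)
{
    for (int i = 0; i < NCPU; i++)
        pcpu[i].pc_drain = true;
}

/*
 * Owner side: called by 'cpu' itself from its alloc/free path.
 * Returns the number of items released (0 if no drain was pending).
 */
static int
cache_poll_drain(int cpu)
{
    struct pcpu_cache *pc;
    int n = 0;

    assert(cpu >= 0 && cpu < NCPU);
    pc = &pcpu[cpu];
    if (pc->pc_drain) {
        n = pc->pc_nitems;
        pc->pc_nitems = 0;
        pc->pc_drain = false;
    }
    return (n);
}

/* Free path: honour any pending drain, then cache the freed item. */
static void
cache_free_item(int cpu)
{
    (void)cache_poll_drain(cpu);
    pcpu[cpu].pc_nitems++;
}
```

Because only the owning cpu ever modifies its own cache contents, no
other cpu needs to lock it; the cost is that a cpu which never allocates
or frees never drains, which is the latency trade-off discussed in this
thread.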
-Matt
Matthew Dillon

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message

From owner-freebsd-arch Thu Feb 28 13:49: 2 2002
Delivered-To: freebsd-arch@freebsd.org
Received: from mail.chesapeake.net (chesapeake.net [205.130.220.14]) by hub.freebsd.org (Postfix) with ESMTP id 8E85137B400 for ; Thu, 28 Feb 2002 13:48:55 -0800 (PST)
Received: from localhost (jroberson@localhost) by mail.chesapeake.net (8.11.6/8.11.6) with ESMTP id g1SLmop54603; Thu, 28 Feb 2002 16:48:50 -0500 (EST) (envelope-from jroberson@chesapeake.net)
Date: Thu, 28 Feb 2002 16:48:50 -0500 (EST)
From: Jeff Roberson
To: Matthew Dillon
Cc: Julian Elischer ,
Subject: Re: Slab allocator
In-Reply-To: <200202282114.g1SLEQW38736@apollo.backplane.com>
Message-ID: <20020228161626.N529-100000@mail.chesapeake.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.ORG

>
> Could you explain the change made (crinit() and crzero() calls) to
> kern/init_main.c ?

Sure. I picked a few things to add initializers for to give examples of
how the object caching could be used. To do this I had to figure out what
state the objects were in when they were freed, and how they were
initialized. So any freed state that matches initial state can be cached.
When looking at creds I noticed that everyone calls crdup except for
init_main. So the extra zeroing was unnecessary. This is what crzero
does. crinit sets up the zone so that cred may be allocated. I'd rather
do this than "if (cred_zone == NULL) create" all the time. The initial
zeroing may be ok since it's going to be brought into cache anyway, so
maybe it isn't worth it.

>
> Does uma_zalloc() automatically zero the returned memory?
You replace
> MALLOC(..M_ZERO) in two places, example:
>
> -   MALLOC(fp, struct file *, sizeof(struct file), M_FILE, M_WAITOK | M_ZERO
> );
> +   fp = uma_zalloc(file_zone, M_WAITOK);

I don't automatically zero memory on return. I have a common initializer
that can be used for memory when it is first brought into the allocator.
This was the behavior of vm_zone. I could provide a common zero
constructor that zeros memory if that is desired. And again, as with the
cred zone, the memory was zeroed and subsequently filled in with real
values.

Files and creds turned out to be less of a win with the object
initializers than I was expecting. One of these originally had an
embedded mutex but was later switched to a pool mutex. So the savings are
pretty minimal. On some objects where there was a lot of dependable state
I actually got noticeable performance gains.

>
> I recommend KASSERTing bucket_ub_ptr is in-bounds in vm/uma_core.c
> line 1403 (i.e. us_freecount and ub_ptr are coherent).

Agreed. In my current working tree I have a few more bucket assertions.

>
> I recommend KASSERTing that the uz_free_slab list is non-empty
> on line 1381.

This is probably a good thing as well.

>
> On line 82 of vm/uma_int.h you describe embedding a slab header
> in a Page. I could not find where in the code this embedding
> takes place (I looked for 'slab header' in the code but could
> not find anything related to the comment on line 82 of vm/uma_int.h

If you look at the zone_ctor there is a rather convoluted process if the
OFFPAGE flag is not set. This calculates the offset into the slab for the
slab header. This is stored in the uz_pgoff field in the zone, and added
to the page address of freed memory. If OFFPAGE is set, or if malloc is
used, you always have to do a hash lookup to find the slab structure. If
you look in slab_zalloc it either allocates a slab structure from the
slabzone or it just points it into the recently allocated memory.
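The embedded (non-OFFPAGE) lookup described above amounts to rounding the
item address down to its page and adding the zone's header offset. A
minimal sketch, assuming a 4096-byte page and a single-page slab; the
function name and SKETCH_PAGE_SIZE are hypothetical, and only the idea of
a per-zone offset (uz_pgoff in the message) comes from the text. The
OFFPAGE and malloc cases, which need the hash lookup, are not covered.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size for the sketch. */
#define SKETCH_PAGE_SIZE ((uintptr_t)4096)

/*
 * Given the address of an item being freed and the zone's header
 * offset within a page, return the address of the embedded slab header.
 */
static uintptr_t
slab_hdr_addr(uintptr_t item, uintptr_t pgoff)
{
    /* Round down to the start of the page containing the item. */
    uintptr_t page = item & ~(SKETCH_PAGE_SIZE - 1);

    assert(pgoff < SKETCH_PAGE_SIZE);   /* header must lie inside the page */
    return (page + pgoff);
}
```

No lookup structure is needed in this case, which is why the embedded
layout makes the free path cheap compared with the hashed one.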
>
> In anycase, if you are embeddeding the slab header I recommend
> adding a magic number of the base of the slab header structure
> described in the comment and KASSERT()ing it as a sanity check.

I originally had one.. I don't remember why I took it out.

>
> -- --
> documentation
> -- --
>
> uma_timeout line 185 uma_core.c. Could you describe what a
> 'working set calc.' is? What is a working set?

It is described in zone_workingset. Basically this tells us how many free
items to keep in the zone if the vm asks us to return memory. The reason
is so that you don't free all of your memory and then have an immediate
need to reallocate. I will make things a little clearer.

>
> In general, the comments at the head of the routines do not include
> context. For example, take hash_sfind(). 'Find a slab within a zone
> that has a matching data field'. What I would like to see is a comment
> like this: 'Find a slab within a zone that has a matching data field.
> This function is mainly used by the deallocator to locate the slab
> which owns an item data pointer.'
>
> Other examples. Line 266, the comment for hash_expand(), says
> what it does 'Expands the hash table for OFFPAGE zones', but does
> not say WHY it does it or HOW it does it.

To reduce collisions. ;-)

>
> In anycase, this is general for all the routines. It would be
> nice if you did a once-over and clarified the context of the comment
> where it seems appropriate.
>
> -Matt
> Matthew Dillon
>
>

I definitely see where you are going. My comments lack a bit of the big
picture information. I will grab some information that I have included in
this email, and review my documentation in general. I think I geared it
more towards people who were already familiar with the Bonwick papers.

Thanks for the input!
Jeff To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Feb 28 14:15:57 2002 Delivered-To: freebsd-arch@freebsd.org Received: from storm.FreeBSD.org.uk (storm.FreeBSD.org.uk [194.242.139.170]) by hub.freebsd.org (Postfix) with ESMTP id 5EB6437B400; Thu, 28 Feb 2002 14:15:51 -0800 (PST) Received: (from uucp@localhost) by storm.FreeBSD.org.uk (8.11.6/8.11.6) with UUCP id g1SMFos60742; Thu, 28 Feb 2002 22:15:50 GMT (envelope-from mark@grimreaper.grondar.za) Received: from grimreaper (localhost [127.0.0.1]) by grimreaper.grondar.org (8.12.2/8.12.2) with ESMTP id g1SMAfg4054047; Thu, 28 Feb 2002 22:10:41 GMT (envelope-from mark@grimreaper.grondar.za) Message-Id: <200202282210.g1SMAfg4054047@grimreaper.grondar.org> To: obrien@freebsd.org Cc: arch@freebsd.org Subject: Re: Warning and lint(1) fixes. Review please. References: <20020228125739.A34165@dragon.nuxi.com> In-Reply-To: <20020228125739.A34165@dragon.nuxi.com> ; from "David O'Brien" "Thu, 28 Feb 2002 12:57:39 PST." Date: Thu, 28 Feb 2002 22:10:41 +0000 From: Mark Murray Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG > > 'Cos I'm trying to support multiple lints, and fixing them all seems > > a bit unreasonable :-) > > I think dirtying up the code with tons of #ifndef lint is unreasonable. > We have a base lint so that we can change things (ie, modify it). > What lint's are you trying to support? The one in base, lclint from ports and flexelint (a commercial lint). So far. I am not religious about "messing up the code base", but I would like to strike some kind of middle ground. Having lint not barf over #error's is one of the problems I want to avoid. 
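The compromise discussed in this thread (fail loudly on a non-GCC
compiler, but let lint keep going) could be sketched as below. The macro
SKETCH_BARRIER and the function are hypothetical stand-ins for the real
inline-assembly definitions in the patch; only the #ifdef __GNUC__ /
#ifndef lint / #error structure reflects what was proposed.

```c
#include <assert.h>
#include <stddef.h>

#ifdef __GNUC__
/* Stand-in for a real inline-assembly definition. */
#define SKETCH_BARRIER()    __asm__ __volatile__("" : : : "memory")
#else
#ifndef lint
/* A real compiler without GCC-style asm: refuse to build silently-wrong code. */
#error "SKETCH_BARRIER requires GCC-style inline assembly"
#endif
/* Only reached under lint: an empty body so parsing can continue. */
#define SKETCH_BARRIER()
#endif

/* Hypothetical user of the macro: bump a counter after a compiler barrier. */
static int
sketch_counter_bump(int *counter)
{
    assert(counter != NULL);
    SKETCH_BARRIER();
    return (++*counter);
}
```

The alternative raised in the thread, teaching the base lint to accept
#error and continue, would remove the inner #ifndef lint but would not
help the other lints mentioned.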
M -- o Mark Murray \_ O.\_ Warning: this .sig is umop ap!sdn To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Feb 28 16:17: 9 2002 Delivered-To: freebsd-arch@freebsd.org Received: from blount.mail.mindspring.net (blount.mail.mindspring.net [207.69.200.226]) by hub.freebsd.org (Postfix) with ESMTP id 9C4A537B400; Thu, 28 Feb 2002 16:17:07 -0800 (PST) Received: from user-1120bnj.dsl.mindspring.com ([66.32.46.243] helo=europa2) by blount.mail.mindspring.net with smtp (Exim 3.33 #1) id 16gajR-0002fa-00; Thu, 28 Feb 2002 19:17:05 -0500 Message-Id: <3.0.6.32.20020228191531.00da5868@imatowns.com> X-Sender: ggombert@imatowns.com X-Mailer: QUALCOMM Windows Eudora Light Version 3.0.6 (32) Date: Thu, 28 Feb 2002 19:15:31 -0500 To: John Baldwin , arch@FreeBSD.org From: Glenn Gombert Subject: Re: SMPng Design (Well, some of it anyways) In-Reply-To: Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG I would offer just a 'general observation' that there seems to be more kernel locks/locking structures/mutexes that are required for basic use. If some sort of simplification to make the code easier to work on and maintain if fewer locking structures were used and the ground rules for when and where they were applied we defined in your document. Yes this might require some 'rework' of locks in general on the front end, but I think it might make the code that is in -current easier to maintain and enhance in the future (and certainly easier for new comers to work on as well) .. 
Glenn Gombert ggombert@imatowns.com To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Feb 28 17:33:58 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mail5.speakeasy.net (mail5.speakeasy.net [216.254.0.205]) by hub.freebsd.org (Postfix) with ESMTP id 18E3C37B41B for ; Thu, 28 Feb 2002 17:33:49 -0800 (PST) Received: (qmail 30614 invoked from network); 1 Mar 2002 01:33:39 -0000 Received: from unknown (HELO server.baldwin.cx) ([65.91.152.157]) (envelope-sender ) by mail5.speakeasy.net (qmail-ldap-1.03) with DES-CBC3-SHA encrypted SMTP for ; 1 Mar 2002 01:33:39 -0000 Received: from laptop.baldwin.cx (john@laptop.baldwin.cx [192.168.0.4]) by server.baldwin.cx (8.11.6/8.11.6) with ESMTP id g211XLG42454; Thu, 28 Feb 2002 20:33:21 -0500 (EST) (envelope-from jhb@FreeBSD.org) Message-ID: X-Mailer: XFMail 1.4.0 on FreeBSD X-Priority: 3 (Normal) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 8bit MIME-Version: 1.0 In-Reply-To: <3.0.6.32.20020228191531.00da5868@imatowns.com> Date: Thu, 28 Feb 2002 20:33:14 -0500 (EST) From: John Baldwin To: Glenn Gombert Subject: Re: SMPng Design (Well, some of it anyways) Cc: arch@FreeBSD.org Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On 01-Mar-02 Glenn Gombert wrote: > > I would offer just a 'general observation' that there seems to be more > kernel locks/locking structures/mutexes that are required for basic use. If > some sort of simplification to make the code easier to work on and maintain > if fewer locking structures were used and the ground rules for when and > where they were applied we defined in your document. 
Yes this might require > some 'rework' of locks in general on the front end, but I think it might > make the code that is in -current easier to maintain and enhance in the > future (and certainly easier for new comers to work on as well) .. How do you mean exactly? More locks in that we now have more primitives than before or more locks in that we have more actual locks than before? > Glenn Gombert > ggombert@imatowns.com -- John Baldwin <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve!" - http://www.FreeBSD.org/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Feb 28 23:26:47 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mailgate.originative.co.uk (mailgate.originative.co.uk [62.232.68.68]) by hub.freebsd.org (Postfix) with ESMTP id 927CA37B402; Thu, 28 Feb 2002 23:26:45 -0800 (PST) Received: from lobster.originative.co.uk (lobster [62.232.68.81]) by mailgate.originative.co.uk (Postfix) with ESMTP id 733B91D169; Fri, 1 Mar 2002 07:26:43 +0000 (GMT) Subject: Re: cvs commit: src/sys/tools vnode_if.awk From: Paul Richards To: "David E. O'Brien" Cc: arch@freebsd.org In-Reply-To: <200203010120.g211KOT81981@freefall.freebsd.org> References: <200203010120.g211KOT81981@freefall.freebsd.org> Content-Type: text/plain Content-Transfer-Encoding: 7bit X-Mailer: Evolution/1.0 (Preview Release) Date: 01 Mar 2002 07:26:42 +0000 Message-Id: <1014967603.88498.0.camel@lobster.freebsd-services.com> Mime-Version: 1.0 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Re-directed to arch. On Fri, 2002-03-01 at 01:20, David E. O'Brien wrote: > obrien 2002/02/28 17:20:24 PST > > Modified files: > sys/tools vnode_if.awk > Log: > Return vnode_if back to its AWK roots. > It became a Perl script in rev 1.20. 
This removes one more dependence > on perl for the kernel build. Is there an "official" policy about what's happening with Perl in our tree? Personally. I think it would be a mistake for FreeBSD to move away from using Perl. It's now standard in a lot of operating systems and far from moving away we should probably be making more use of it. Paul. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Feb 28 23:35:28 2002 Delivered-To: freebsd-arch@freebsd.org Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by hub.freebsd.org (Postfix) with ESMTP id 2CCEB37B402; Thu, 28 Feb 2002 23:35:26 -0800 (PST) Received: by elvis.mu.org (Postfix, from userid 1098) id 05FF0AE279; Thu, 28 Feb 2002 23:35:26 -0800 (PST) Date: Thu, 28 Feb 2002 23:35:26 -0800 From: Bill Fumerola To: Paul Richards Cc: "David E. O'Brien" , arch@freebsd.org Subject: Re: cvs commit: src/sys/tools vnode_if.awk Message-ID: <20020301073525.GL803@elvis.mu.org> References: <200203010120.g211KOT81981@freefall.freebsd.org> <1014967603.88498.0.camel@lobster.freebsd-services.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1014967603.88498.0.camel@lobster.freebsd-services.com> User-Agent: Mutt/1.3.27i X-Operating-System: FreeBSD 4.5-MUORG-20020215 i386 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Fri, Mar 01, 2002 at 07:26:42AM +0000, Paul Richards wrote: > Is there an "official" policy about what's happening with Perl in our > tree? > > Personally. I think it would be a mistake for FreeBSD to move away from > using Perl. It's now standard in a lot of operating systems and far from > moving away we should probably be making more use of it. has it really been 3 months? 
time for the quarterly "perl's use in the freebsd build" bikeshed! looking forward to re-reading a bunch of points already in the archives, -- - bill fumerola / fumerola@yahoo-inc.com / billf@FreeBSD.org / billf@mu.org - my anger management counselor can beat up your self-affirmation therapist To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Feb 28 23:37:42 2002 Delivered-To: freebsd-arch@freebsd.org Received: from dragon.nuxi.com (trang.nuxi.com [66.92.13.169]) by hub.freebsd.org (Postfix) with ESMTP id BBC2337B400 for ; Thu, 28 Feb 2002 23:37:39 -0800 (PST) Received: (from obrien@localhost) by dragon.nuxi.com (8.11.6/8.11.1) id g217bJR03545; Thu, 28 Feb 2002 23:37:19 -0800 (PST) (envelope-from obrien) Date: Thu, 28 Feb 2002 23:37:19 -0800 From: "David O'Brien" To: Paul Richards Cc: arch@freebsd.org Subject: Re: cvs commit: src/sys/tools vnode_if.awk Message-ID: <20020228233719.A1473@dragon.nuxi.com> Reply-To: obrien@freebsd.org References: <200203010120.g211KOT81981@freefall.freebsd.org> <1014967603.88498.0.camel@lobster.freebsd-services.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <1014967603.88498.0.camel@lobster.freebsd-services.com>; from paul@freebsd-services.com on Fri, Mar 01, 2002 at 07:26:42AM +0000 X-Operating-System: FreeBSD 5.0-CURRENT Organization: The NUXI BSD group X-Pgp-Rsa-Fingerprint: B7 4D 3E E9 11 39 5F A3 90 76 5D 69 58 D9 98 7A X-Pgp-Rsa-Keyid: 1024/34F9F9D5 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Fri, Mar 01, 2002 at 07:26:42AM +0000, Paul Richards wrote: > Personally. I think it would be a mistake for FreeBSD to move away from > using Perl. 
It's now standard in a lot of operating systems and far from > moving away we should probably be making more use of it. Send me a sparc64 and PowerPC binary and we can talk. Until then Perl is on its way out of being part of the kernel build. It is an impediment to porting to a new platform. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri Mar 1 9:38:14 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mail.rpi.edu (mail.rpi.edu [128.113.22.40]) by hub.freebsd.org (Postfix) with ESMTP id 5BD4937B41B for ; Fri, 1 Mar 2002 09:38:04 -0800 (PST) Received: from [128.113.24.47] (gilead.acs.rpi.edu [128.113.24.47]) by mail.rpi.edu (8.12.1/8.12.1) with ESMTP id g21Hbtkk154152; Fri, 1 Mar 2002 12:38:00 -0500 Mime-Version: 1.0 X-Sender: drosih@mail.rpi.edu Message-Id: In-Reply-To: <1014967603.88498.0.camel@lobster.freebsd-services.com> References: <200203010120.g211KOT81981@freefall.freebsd.org> <1014967603.88498.0.camel@lobster.freebsd-services.com> Date: Fri, 1 Mar 2002 11:50:51 -0500 To: Paul Richards From: Garance A Drosihn Subject: Re: cvs commit: src/sys/tools vnode_if.awk Cc: arch@FreeBSD.ORG Content-Type: text/plain; charset="us-ascii" ; format="flowed" X-Scanned-By: MIMEDefang 2.3 (www dot roaringpenguin dot com slash mimedefang) Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG At 7:26 AM +0000 3/1/02, Paul Richards wrote: >On Fri, 2002-03-01 at 01:20, David E. O'Brien wrote: > > Log: > > Return vnode_if back to its AWK roots. > > It became a Perl script in rev 1.20. This removes one more > > dependence on perl for the kernel build. > >Is there an "official" policy about what's happening with Perl in our >tree? > >Personally. I think it would be a mistake for FreeBSD to move away >from using Perl. 
It's now standard in a lot of operating systems and >far from moving away we should probably be making more use of it. There have been several threads on why it is a problem for perl to be used as part of the system-building process (such as building kernels). It is particularly painful when bringing up freebsd on some new hardware platform. Removing perl from the system-build process is not the same as removing it from freebsd. -- Garance Alistair Drosehn = gad@eclipse.acs.rpi.edu Senior Systems Programmer or gad@freebsd.org Rensselaer Polytechnic Institute or drosih@rpi.edu To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri Mar 1 11:35:53 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16]) by hub.freebsd.org (Postfix) with ESMTP id 14F8D37B405 for ; Fri, 1 Mar 2002 11:35:45 -0800 (PST) Received: from bde.zeta.org.au (bde.zeta.org.au [203.2.228.102]) by mailman.zeta.org.au (8.9.3/8.8.7) with ESMTP id GAA02617; Sat, 2 Mar 2002 06:35:36 +1100 Date: Sat, 2 Mar 2002 06:36:13 +1100 (EST) From: Bruce Evans X-X-Sender: To: Mark Murray Cc: Subject: Re: Warning and lint(1) fixes. Review please. In-Reply-To: <200202281836.g1SIaog4051908@grimreaper.grondar.org> Message-ID: <20020302060943.U58081-100000@gamplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Thu, 28 Feb 2002, Mark Murray wrote: > Please review the enclosed fixes. I've been running most of them > for more than a month, and they are heavily useful in fixing up > lint moanings. 
> Index: i386/include/atomic.h > =================================================================== > RCS file: /home/ncvs/src/sys/i386/include/atomic.h,v > retrieving revision 1.26 > diff -u -d -r1.26 atomic.h > --- i386/include/atomic.h 28 Feb 2002 06:17:05 -0000 1.26 > +++ i386/include/atomic.h 28 Feb 2002 09:43:25 -0000 > @@ -89,6 +89,7 @@ > * The assembly is volatilized to demark potential before-and-after side > * effects if an interrupt or SMP collision were to occur. > */ > +#ifdef __GNUC__ > #define ATOMIC_ASM(NAME, TYPE, OP, CONS, V) \ > static __inline void \ > atomic_##NAME##_##TYPE(volatile u_##TYPE *p, u_##TYPE v)\ > @@ -97,6 +98,9 @@ > : "+m" (*p) \ > : CONS (V)); \ > } > +#else > +#define ATOMIC_ASM(NAME, TYPE, OP, CONS, V) > +#endif Should be an extern function for the non-gcc case. We already have some support for this (we build extern versions in atomic.c). > @@ -112,6 +116,7 @@ > { > int res = exp; > > +#ifdef __GNUC__ > __asm __volatile( > " pushfl ; " > " cli ; " > @@ -127,6 +132,7 @@ > : "r" (src), /* 1 */ > "m" (*(dst)) /* 2 */ > : "memory"); > +#endif > > return (res); > } This works (static function instead of extern), but it gives messier code. Keep the non-gcc declarations separate. > Index: i386/include/bus_at386.h > =================================================================== > RCS file: /home/ncvs/src/sys/i386/include/bus_at386.h,v > retrieving revision 1.18 > diff -u -d -r1.18 bus_at386.h > --- i386/include/bus_at386.h 18 Feb 2002 13:43:19 -0000 1.18 > +++ i386/include/bus_at386.h 24 Feb 2002 21:28:54 -0000 > @@ -274,6 +274,7 @@ > else > #endif > { > +#ifdef __GNUC__ > __asm __volatile(" \n\ > cld \n\ > 1: movb (%2),%%al \n\ As in atomic.h, but more so. There are zillions of interfaces in this file. I'm surprised that you didn't have to change more. > @@ -374,7 +380,8 @@ > if (tag == I386_BUS_SPACE_IO) > #endif > { > - int _port_ = bsh + offset; \ > + int _port_ = bsh + offset; OK to fix all of these :-). 
> Index: i386/include/pcpu.h
> ===================================================================
> RCS file: /home/ncvs/src/sys/i386/include/pcpu.h,v
> retrieving revision 1.32
> diff -u -d -r1.32 pcpu.h
> --- i386/include/pcpu.h 11 Dec 2001 23:33:40 -0000 1.32
> +++ i386/include/pcpu.h 28 Feb 2002 10:44:43 -0000
> @@ -32,8 +32,22 @@
>  #ifdef _KERNEL
>
>  #ifndef __GNUC__
> -#error gcc is required to use this file
> -#endif
> +
> +#ifndef lint
> +#error gcc or lint is required to use this file
> +#else /* lint */
> +#define __PCPU_PTR(name)
> +#define __PCPU_GET(name)
> +#define __PCPU_SET(name, val)

I can't think of any good way to handle this.

> Index: sys/cdefs.h
> ===================================================================
> RCS file: /home/ncvs/src/sys/sys/cdefs.h,v
> retrieving revision 1.49
> diff -u -d -r1.49 cdefs.h
> --- sys/cdefs.h 4 Dec 2001 01:29:54 -0000 1.49
> +++ sys/cdefs.h 19 Feb 2002 15:32:10 -0000
> @@ -112,6 +112,7 @@
>   * properly (old versions of gcc-2 supported the dead and pure features
>   * in a different (wrong) way).
>   */
> +#ifdef __GNUC__
>  #if __GNUC__ < 2 || __GNUC__ == 2 && __GNUC_MINOR__ < 5
>  #define __dead2
>  #define __pure2

Bogus. If __GNUC__ is not defined, then it is less than 2.

> @@ -176,7 +177,6 @@
>  #define __printf0like(fmtarg, firstvararg)
>  #endif
>
> -#ifdef __GNUC__
>  #define __strong_reference(sym,aliassym) \
>  extern __typeof (sym) aliassym __attribute__ ((__alias__ (#sym)));
>  #ifdef __ELF__

This gcc ifdef is unrelated to the one above. It should have its own
version checks. I think __alias__ is a syntax error except for relatively
recent versions of gcc.

> @@ -244,7 +244,7 @@
>  #if !defined(lint) && !defined(STRIP_FBSDID)
>  #define __FBSDID(s) __IDSTRING(__CONCAT(__rcsid_,__LINE__),s)
>  #else
> -#define __FBSDID(s) struct __hack
> +#define __FBSDID(s)
>  #endif
>  #endif

This breaks enforcement of a semicolon after __FBSDID().
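
The semicolon point can be illustrated with a small sketch. DEMO_ID_KEEP() and DEMO_ID_STRIP() are hypothetical stand-ins for the two __FBSDID() variants, not the real sys/cdefs.h definitions. The stripped form still expands to a declaration, so the trailing semicolon at the call site completes it and omitting the semicolon remains a syntax error. An empty expansion would instead leave a bare ";" at file scope, which gcc -pedantic warns about.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the "keep" and "strip" forms of an ID macro. */
#define DEMO_ID_KEEP(name, s)   static const char name[] = s
#define DEMO_ID_STRIP(name, s)  struct demo_hack

/* Expands to a real object definition; the ';' ends the declaration. */
DEMO_ID_KEEP(demo_id, "$Demo: example.c,v 1.1 $");

/* Expands to "struct demo_hack;", a harmless incomplete-type declaration. */
DEMO_ID_STRIP(demo_id2, "$Demo: other.c,v 1.1 $");

/* Declaring the same incomplete struct again is still valid C. */
DEMO_ID_STRIP(demo_id3, "$Demo: third.c,v 1.1 $");

/* Use the kept string so the example has observable behaviour. */
size_t
demo_id_length(void)
{
	assert(demo_id[0] == '$');
	return (strlen(demo_id));
}
```

Because every stripped use declares the same incomplete type, repeating the macro many times in one file is legal, even if some lint implementations complain about it.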
> Index: sys/eventhandler.h
> ===================================================================
> RCS file: /home/ncvs/src/sys/sys/eventhandler.h,v
> retrieving revision 1.17
> diff -u -d -r1.17 eventhandler.h
> --- sys/eventhandler.h 12 Sep 2001 08:38:05 -0000 1.17
> +++ sys/eventhandler.h 19 Feb 2002 22:17:52 -0000
> @@ -75,31 +75,33 @@
>  struct eventhandler_entry ee; \
>  type eh_func; \
>  }; \
> -struct __hack
> +struct __hack_ ## name

The same incomplete struct should be used for this everywhere. rm-rf any
lint that doesn't like this.

> +#define EVENTHANDLER_FAST_INVOKE(name, args...) \
> +do { \
> +    struct eventhandler_list *_el = &Xeventhandler_list_ ## name ; \
> +    struct eventhandler_entry *_ep, *_en; \
> + \
> +    if (_el->el_flags & EHE_INITTED) { \
> +        lockmgr(&_el->el_lock, LK_EXCLUSIVE, NULL, curthread); \
> +        _ep = TAILQ_FIRST(&(_el->el_entries)); \
> +        while (_ep != NULL) { \
> +            _en = TAILQ_NEXT(_ep, ee_link); \
> +            ((struct eventhandler_entry_ ## name *)_ep)->eh_func( \
> +                _ep->ee_arg , \
> +                ## args \
> +            ); \
> +            _ep = _en; \
> +        } \
> +        lockmgr(&_el->el_lock, LK_RELEASE, NULL, curthread); \
> +    } \
>  } while (0)

This is almost readable now. It still has 4-char indents.
Bruce To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri Mar 1 11:40:15 2002 Delivered-To: freebsd-arch@freebsd.org Received: from rwcrmhc54.attbi.com (rwcrmhc54.attbi.com [216.148.227.87]) by hub.freebsd.org (Postfix) with ESMTP id 4EC4C37B41B for ; Fri, 1 Mar 2002 11:40:13 -0800 (PST) Received: from InterJet.elischer.org ([12.232.206.8]) by rwcrmhc54.attbi.com (InterMail vM.4.01.03.27 201-229-121-127-20010626) with ESMTP id <20020301194013.LSZC1214.rwcrmhc54.attbi.com@InterJet.elischer.org>; Fri, 1 Mar 2002 19:40:13 +0000 Received: from localhost (localhost.elischer.org [127.0.0.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id LAA12545; Fri, 1 Mar 2002 11:24:18 -0800 (PST) Date: Fri, 1 Mar 2002 11:24:16 -0800 (PST) From: Julian Elischer To: Brooks Davis Cc: Maxime Henrion , freebsd-arch@freebsd.org Subject: Re: Patches to if_loop + the interface cloning framework In-Reply-To: <20020212155646.A26408@Odin.AC.HMC.Edu> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG I think loopback is not really 'optional' and should come as soon as you have any networking at all. i.e. I think it should be removed as an option/module and made dependent on having any networking. On Tue, 12 Feb 2002, Brooks Davis wrote: > On Tue, Feb 12, 2002 at 05:44:53PM -0600, Maxime Henrion wrote: > > > > I've updated the patch at the same location, adding a panic() in case > > the creation of lo0 fails. I'll be interested in removing the KLD > > stuff from this file since it's not working anyway, and adds some error > > checking, but that will be a bit later ;-) > > Looks good though error checking would be nice. ;-) I'm not convinced > that removing the module support is a good idea. 
I'd much rather move
> in the other direction in general.
>
> -- Brooks
>
> --
> Any statement of the form "X is the one, true Y" is FALSE.
> PGP fingerprint 655D 519C 26A7 82E7 2529 9BF0 5D8E 8BE9 F238 1AD4
>

To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message

From owner-freebsd-arch Fri Mar 1 12:35:56 2002
Delivered-To: freebsd-arch@freebsd.org
Received: from storm.FreeBSD.org.uk (storm.FreeBSD.org.uk [194.242.139.170])
    by hub.freebsd.org (Postfix) with ESMTP id 15AA537B402 for ;
    Fri, 1 Mar 2002 12:35:48 -0800 (PST)
Received: (from uucp@localhost)
    by storm.FreeBSD.org.uk (8.11.6/8.11.6) with UUCP id g21KZhL92457;
    Fri, 1 Mar 2002 20:35:44 GMT (envelope-from mark@grimreaper.grondar.za)
Received: from grimreaper (localhost [127.0.0.1])
    by grimreaper.grondar.org (8.12.2/8.12.2) with ESMTP id g21KYGg4074172;
    Fri, 1 Mar 2002 20:34:16 GMT (envelope-from mark@grimreaper.grondar.za)
Message-Id: <200203012034.g21KYGg4074172@grimreaper.grondar.org>
To: Bruce Evans
Cc: arch@FreeBSD.ORG
Subject: Re: Warning and lint(1) fixes. Review please.
References: <20020302060943.U58081-100000@gamplex.bde.org>
In-Reply-To: <20020302060943.U58081-100000@gamplex.bde.org> ; from Bruce Evans "Sat, 02 Mar 2002 06:36:13 +1100."
Date: Fri, 01 Mar 2002 20:34:16 +0000
From: Mark Murray
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.ORG

> > +#else
> > +#define ATOMIC_ASM(NAME, TYPE, OP, CONS, V)
> > +#endif
>
> Should be an extern function for the non-gcc case. We already have some
> support for this (we build extern versions in atomic.c).

Hmm. Nice. I like that. After looking, it is nasty, because there is a type
in there, so I've made it into a #define that also declares an extern
function.
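
A rough sketch of that shape, assuming nothing about the actual patch: DEMO_ATOMIC() is a hypothetical macro that defines an inline function for gcc-compatible compilers and only declares an extern function otherwise. The gcc branch uses the __sync builtins instead of the i386 inline assembly so that it is portable, and the out-of-line fallback bodies are plain C placeholders that are not actually atomic.

```c
#include <assert.h>

#ifdef __GNUC__
/* gcc-compatible compilers: define the operation inline. */
#define DEMO_ATOMIC(NAME, TYPE, BUILTIN)                                 \
static __inline void                                                     \
demo_atomic_##NAME##_##TYPE(volatile unsigned TYPE *p, unsigned TYPE v)  \
{                                                                        \
	(void)BUILTIN(p, v);                                             \
}
#else
/* Other tools (lint, non-gcc compilers): declare an extern function only. */
#define DEMO_ATOMIC(NAME, TYPE, BUILTIN)                                 \
extern void demo_atomic_##NAME##_##TYPE(volatile unsigned TYPE *p,       \
    unsigned TYPE v);
#endif

/* The type is a macro argument, so each use generates a typed function. */
DEMO_ATOMIC(add, int, __sync_fetch_and_add)
DEMO_ATOMIC(subtract, int, __sync_fetch_and_sub)

#ifndef __GNUC__
/*
 * Stand-in for the out-of-line copies that a file such as atomic.c would
 * provide. Plain C, NOT atomic; present only so that the sketch links.
 */
void
demo_atomic_add_int(volatile unsigned int *p, unsigned int v)
{
	*p += v;
}

void
demo_atomic_subtract_int(volatile unsigned int *p, unsigned int v)
{
	*p -= v;
}
#endif

/* Callers look the same whichever expansion was selected. */
unsigned int
demo_count(void)
{
	volatile unsigned int n = 0;

	demo_atomic_add_int(&n, 5);
	demo_atomic_subtract_int(&n, 2);
	assert(n == 3);
	return (n);
}
```

Keeping the declaration and the definition behind one macro name means callers and lint see a consistent prototype, at the cost of two expansions to maintain.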
> > @@ -112,6 +116,7 @@ > > { > > int res = exp; > > > > +#ifdef __GNUC__ > > __asm __volatile( > > " pushfl ; " > > " cli ; " > > @@ -127,6 +132,7 @@ > > : "r" (src), /* 1 */ > > "m" (*(dst)) /* 2 */ > > : "memory"); > > +#endif > > > > return (res); > > } > > This works (static function instead of extern), but it gives messier code. > Keep the non-gcc declarations separate. IMO, that creates divergent declarations, but other files do that too, so NP. > > Index: i386/include/bus_at386.h > > =================================================================== > > RCS file: /home/ncvs/src/sys/i386/include/bus_at386.h,v > > retrieving revision 1.18 > > diff -u -d -r1.18 bus_at386.h > > --- i386/include/bus_at386.h 18 Feb 2002 13:43:19 -0000 1.18 > > +++ i386/include/bus_at386.h 24 Feb 2002 21:28:54 -0000 > > @@ -274,6 +274,7 @@ > > else > > #endif > > { > > +#ifdef __GNUC__ > > __asm __volatile(" \n\ > > cld \n\ > > 1: movb (%2),%%al \n\ > > As in atomic.h, but more so. There are zillions of interfaces in this file. > I'm surprised that you didn't have to change more. :-) Me too. I'll fix. > > @@ -374,7 +380,8 @@ > > if (tag == I386_BUS_SPACE_IO) > > #endif > > { > > - int _port_ = bsh + offset; \ > > + int _port_ = bsh + offset; > > OK to fix all of these :-). Cool! > > Index: i386/include/pcpu.h > > =================================================================== > > RCS file: /home/ncvs/src/sys/i386/include/pcpu.h,v > > retrieving revision 1.32 > > diff -u -d -r1.32 pcpu.h > > --- i386/include/pcpu.h 11 Dec 2001 23:33:40 -0000 1.32 > > +++ i386/include/pcpu.h 28 Feb 2002 10:44:43 -0000 > > @@ -32,8 +32,22 @@ > > #ifdef _KERNEL > > > > #ifndef __GNUC__ > > -#error gcc is required to use this file > > -#endif > > + > > +#ifndef lint > > +#error gcc or lint is required to use this file > > +#else /* lint */ > > +#define __PCPU_PTR(name) > > +#define __PCPU_GET(name) > > +#define __PCPU_SET(name, val) > > I can't think of any good way to handle this. 
OK for the above as a stopgap? > > Index: sys/cdefs.h > > =================================================================== > > RCS file: /home/ncvs/src/sys/sys/cdefs.h,v > > retrieving revision 1.49 > > diff -u -d -r1.49 cdefs.h > > --- sys/cdefs.h 4 Dec 2001 01:29:54 -0000 1.49 > > +++ sys/cdefs.h 19 Feb 2002 15:32:10 -0000 > > @@ -112,6 +112,7 @@ > > * properly (old versions of gcc-2 supported the dead and pure features > > * in a different (wrong) way). > > */ > > +#ifdef __GNUC__ > > #if __GNUC__ < 2 || __GNUC__ == 2 && __GNUC_MINOR__ < 5 > > #define __dead2 > > #define __pure2 > > Bogus. If __GNUC__ is not defined, then it is less than 2. > > > @@ -176,7 +177,6 @@ > > #define __printf0like(fmtarg, firstvararg) > > #endif > > > > -#ifdef __GNUC__ > > #define __strong_reference(sym,aliassym) \ > > extern __typeof (sym) aliassym __attribute__ ((__alias__ (#sym))); > > #ifdef __ELF__ > > This gcc ifdef is unrelated to the one above. It should have its own > version checks. I think __alias__ is a syntax error except for relative > recent versions of gcc. The above is a move of the "#ifdef __GNUC__" to a few lines above where it was to enclose more GNU C specific code. > > @@ -244,7 +244,7 @@ > > #if !defined(lint) && !defined(STRIP_FBSDID) > > #define __FBSDID(s) __IDSTRING(__CONCAT(__rcsid_,__LINE__),s) > > #else > > -#define __FBSDID(s) struct __hack > > +#define __FBSDID(s) > > #endif > > #endif > > This breaks enforcement of a semicolon after __FBSDID(). But it (and friends) cause multiple whinings about multiple declarations of 'struct __hack'. I can work around that, though. 
> > Index: sys/eventhandler.h > > =================================================================== > > RCS file: /home/ncvs/src/sys/sys/eventhandler.h,v > > retrieving revision 1.17 > > diff -u -d -r1.17 eventhandler.h > > --- sys/eventhandler.h 12 Sep 2001 08:38:05 -0000 1.17 > > +++ sys/eventhandler.h 19 Feb 2002 22:17:52 -0000 > > @@ -75,31 +75,33 @@ > > struct eventhandler_entry ee; \ > > type eh_func; \ > > }; \ > > -struct __hack > > +struct __hack_ ## name > > The same incomplete struct should be used for this everywhere. rm-rf any > lint that doesn't like this. Point taken :-). Fixed. > > +#define EVENTHANDLER_FAST_INVOKE(name, args...) \ > > +do { \ > > + struct eventhandler_list *_el = &Xeventhandler_list_ ## name ; \ > > + struct eventhandler_entry *_ep, *_en; \ > > + \ > > + if (_el->el_flags & EHE_INITTED) { \ > > + lockmgr(&_el->el_lock, LK_EXCLUSIVE, NULL, curthread); \ > > + _ep = TAILQ_FIRST(&(_el->el_entries)); \ > > + while (_ep != NULL) { \ > > + _en = TAILQ_NEXT(_ep, ee_link); \ > > + ((struct eventhandler_entry_ ## name *)_ep)->eh_func( \ > > + _ep->ee_arg , \ > > + ## args \ > > + ); \ > > + _ep = _en; \ > > + } \ > > + lockmgr(&_el->el_lock, LK_RELEASE, NULL, curthread); \ > > + } \ > > } while (0) > > This is almost readable now. It still has 4-char indents. I've fixed that file using tabs (+ 4-space continuation indents). So OK so far? I'll post again a commit candidate before I commit this. 
M -- o Mark Murray \_ O.\_ Warning: this .sig is umop ap!sdn To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri Mar 1 12:36:53 2002 Delivered-To: freebsd-arch@freebsd.org Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2]) by hub.freebsd.org (Postfix) with ESMTP id C8BEF37B402 for ; Fri, 1 Mar 2002 12:36:49 -0800 (PST) Received: (from dillon@localhost) by apollo.backplane.com (8.11.6/8.9.1) id g21KaIs46295; Fri, 1 Mar 2002 12:36:18 -0800 (PST) (envelope-from dillon) Date: Fri, 1 Mar 2002 12:36:18 -0800 (PST) From: Matthew Dillon Message-Id: <200203012036.g21KaIs46295@apollo.backplane.com> To: Julian Elischer Cc: Jeff Roberson , arch@FreeBSD.ORG Subject: Re: Slab allocator update References: Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG :On Wed, 27 Feb 2002, Jeff Roberson wrote: : :> :> 1) Fix the last lock order reversal. :> 2) Fixup the statistics for uma and malloc. :> 3) Convince people to test it. :> 4) Commit. : :Please try integrate it in such a form that both new and old can be :compiled in with a config option. :(for a while) : :> 5) Work on converting everything to uma_* interfaces, and adding :> initializers. : :Do lots of testingto prove that it's an improvement. I think it only needs to have 'similar' performance to be an improvement, since the eventual goal is to collapse the kernel malloc and zalloc subsytems into one. Right now we have rather serious issues with KVM exhaustion. The fact that the existing kernel malloc uses kmem_map and zalloc uses kernel_map for expansion, and that none of the memory is ever returned, is one of the primary culprits. 
I would be happy if that mess were consolidated into one universal
allocation mechanism capable of returning memory to the system even if it
meant a slight loss in performance.

I'm not sure I agree with an integration that tries to keep the old
mechanisms alive. If it's easy to do, then sure. But otherwise we should
just grin and bear it.

-Matt
Matthew Dillon

To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message

From owner-freebsd-arch Fri Mar 1 12:48:57 2002
Delivered-To: freebsd-arch@freebsd.org
Received: from critter.freebsd.dk (critter.freebsd.dk [212.242.86.163])
    by hub.freebsd.org (Postfix) with ESMTP id 4539937B400 for ;
    Fri, 1 Mar 2002 12:48:51 -0800 (PST)
Received: from critter.freebsd.dk (localhost [127.0.0.1])
    by critter.freebsd.dk (8.12.2/8.12.2) with ESMTP id g21KmNLv016961;
    Fri, 1 Mar 2002 21:48:23 +0100 (CET) (envelope-from phk@critter.freebsd.dk)
To: Matthew Dillon
Cc: Julian Elischer, Jeff Roberson, arch@FreeBSD.ORG
Subject: Re: Slab allocator update
In-Reply-To: Your message of "Fri, 01 Mar 2002 12:36:18 PST." <200203012036.g21KaIs46295@apollo.backplane.com>
Date: Fri, 01 Mar 2002 21:48:23 +0100
Message-ID: <16960.1015015703@critter.freebsd.dk>
From: Poul-Henning Kamp
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.ORG

In message <200203012036.g21KaIs46295@apollo.backplane.com>, Matthew Dillon writes:
>
>:On Wed, 27 Feb 2002, Jeff Roberson wrote:
>:
>:>
>:> 1) Fix the last lock order reversal.
>:> 2) Fixup the statistics for uma and malloc.
>:> 3) Convince people to test it.
>:> 4) Commit.
>:
>:Please try integrate it in such a form that both new and old can be
>:compiled in with a config option.
>:(for a while)
>:
>:> 5) Work on converting everything to uma_* interfaces, and adding
>:> initializers.
>:
>:Do lots of testing to prove that it's an improvement.
> > I think it only needs to have 'similar' performance to be an > improvement, since the eventual goal is to collapse the > kernel malloc and zalloc subsytems into one. I ran an earlier version of JeffR's patches on my testbox for a couple of weeks and saw a pretty consistent speedup in the order of single digit percentages on most of what I timed. I don't know if the current patch set is significantly changed in any important aspect since then. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri Mar 1 13:20:53 2002 Delivered-To: freebsd-arch@freebsd.org Received: from rwcrmhc52.attbi.com (rwcrmhc52.attbi.com [216.148.227.88]) by hub.freebsd.org (Postfix) with ESMTP id D955537B4A1 for ; Fri, 1 Mar 2002 13:20:13 -0800 (PST) Received: from InterJet.elischer.org ([12.232.206.8]) by rwcrmhc52.attbi.com (InterMail vM.4.01.03.27 201-229-121-127-20010626) with ESMTP id <20020301212013.OZPJ1147.rwcrmhc52.attbi.com@InterJet.elischer.org>; Fri, 1 Mar 2002 21:20:13 +0000 Received: from localhost (localhost.elischer.org [127.0.0.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id NAA12950; Fri, 1 Mar 2002 13:15:31 -0800 (PST) Date: Fri, 1 Mar 2002 13:15:29 -0800 (PST) From: Julian Elischer To: Matthew Dillon Cc: Jeff Roberson , arch@FreeBSD.ORG Subject: Re: Slab allocator update In-Reply-To: <200203012036.g21KaIs46295@apollo.backplane.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Fri, 1 Mar 2002, Matthew Dillon wrote: > > :On Wed, 27 Feb 2002, Jeff Roberson wrote: > : > :> > 
:> 1) Fix the last lock order reversal. > :> 2) Fixup the statistics for uma and malloc. > :> 3) Convince people to test it. > :> 4) Commit. > : > :Please try to integrate it in such a form that both new and old can be > :compiled in with a config option. > :(for a while) > : > :> 5) Work on converting everything to uma_* interfaces, and adding > :> initializers. > : > :Do lots of testing to prove that it's an improvement. > > I think it only needs to have 'similar' performance to be an > improvement, since the eventual goal is to collapse the > kernel malloc and zalloc subsystems into one. > > Right now we have rather serious issues with KVM exhaustion. The > fact that the existing kernel malloc uses kmem_map and zalloc uses > kernel_map for expansion, and that none of the memory is ever returned, > is one of the primary culprits. I would be happy if that mess were > consolidated into one universal allocation mechanism capable of > returning memory to the system even if it meant a slight loss in > performance. > > I'm not sure I agree with an integration that tries to keep the > old mechanisms alive. If it's easy to do, then sure. But otherwise > we should just grin and bear it. it should be possible to make a MALLOC wrapper for uma "Universal memory allocator?" 
sounds a bit ambitious :-) > > -Matt > Matthew Dillon > > To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri Mar 1 13:25:47 2002 Delivered-To: freebsd-arch@freebsd.org Received: from critter.freebsd.dk (critter.freebsd.dk [212.242.86.163]) by hub.freebsd.org (Postfix) with ESMTP id AF80837B41B for ; Fri, 1 Mar 2002 13:25:42 -0800 (PST) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.12.2/8.12.2) with ESMTP id g21LPQLv017619; Fri, 1 Mar 2002 22:25:26 +0100 (CET) (envelope-from phk@critter.freebsd.dk) To: Julian Elischer Cc: Matthew Dillon , Jeff Roberson , arch@FreeBSD.ORG Subject: Re: Slab allocator update In-Reply-To: Your message of "Fri, 01 Mar 2002 13:15:29 PST." Date: Fri, 01 Mar 2002 22:25:26 +0100 Message-ID: <17618.1015017926@critter.freebsd.dk> From: Poul-Henning Kamp Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG In message , Julian Elischer writes: >it should be possible to make a MALLOC wrapper for uma >"Universal memory allocator?" sounds a bit ambitious :-) Jeff already did. He has a bunch of zones named "16", "32", "64", ... "2048" which do the obvious thing. Neat trick :-) -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. 
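The "obvious thing" those size-named zones do can be sketched as a lookup from request size to the smallest zone that fits. This is only an illustration, not code from Jeff's patch: it assumes the elided zone names are the intervening powers of two, and the names size_to_zone() and zone_item_size() are made up here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical size classes mirroring zones named "16" ... "2048". */
#define MIN_ZONE_SHIFT 4                    /* 1 << 4  == 16   */
#define MAX_ZONE_SHIFT 11                   /* 1 << 11 == 2048 */
#define NZONES (MAX_ZONE_SHIFT - MIN_ZONE_SHIFT + 1)

/*
 * Return the index of the smallest zone able to hold `size` bytes,
 * or -1 if the request is too large for any of the zones.
 */
static int
size_to_zone(size_t size)
{
    int shift;

    for (shift = MIN_ZONE_SHIFT; shift <= MAX_ZONE_SHIFT; shift++) {
        if (size <= ((size_t)1 << shift))
            return (shift - MIN_ZONE_SHIFT);
    }
    return (-1);
}

/* Item size served by zone `idx` (16 for zone 0, 2048 for zone 7). */
static size_t
zone_item_size(int idx)
{
    assert(idx >= 0 && idx < NZONES);
    return ((size_t)1 << (idx + MIN_ZONE_SHIFT));
}
```

A 100-byte request lands in the "128" zone under this scheme, so the worst-case internal waste is just under half of the item size.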
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri Mar 1 15:13:27 2002 Delivered-To: freebsd-arch@freebsd.org Received: from nimbus.nj.caldera.com (nimbus.nj.caldera.com [132.147.103.56]) by hub.freebsd.org (Postfix) with ESMTP id 5CD8937B417 for ; Fri, 1 Mar 2002 15:13:19 -0800 (PST) Received: from caldera.com (bird [132.147.135.198]) by nimbus.nj.caldera.com (8.10.1/UW7.1.1-NSCd) with ESMTP id g21NB8I22839; Fri, 1 Mar 2002 18:13:24 -0500 (EST) Message-ID: <3C800A80.96CEA9D2@caldera.com> Date: Fri, 01 Mar 2002 18:10:56 -0500 From: Sergey Babkin Organization: Caldera International, Inc. X-Mailer: Mozilla 4.7 [en] (X11; U; FreeBSD 4.0-19990626-CURRENT i386) X-Accept-Language: ru, en MIME-Version: 1.0 To: arch@freebsd.org, chawla@caldera.com Subject: proposition for new socket syscalls {send,recv}fromto Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Hi all, In case anyone wonders, it's still me but from my work e-mail address. The story is that we (Caldera) are considering a possibility of adding a couple more system calls to make the coexistence of applications with high-availability clusters a bit easier. If this happens, it would be good to make these syscalls not limited to OpenUnix (former UnixWare) and OpenLinux but portable among the Unix systems, BSD included. Personally I believe that BSD would benefit from these syscalls as well. The situation we are trying to solve is: In the high-availability clusters it's convenient and typical to assign an IP address to a logical server (or service). This logical server may be moved between the physical hosts as necessary (for example, if a physical host fails or needs to be shut down for maintenance). 
So this address gets added to an interface of the current physical host as an alias. Here comes the bad part: this alias happens to be on the same subnet as the primary address of this interface, and this may cause confusion about the source address of the packets coming out of this host. Yes, I know that this situation is from the area of "you are not supposed to do this" but the reason seems quite compelling. It's no big deal for the TCP connections coming to this host: when accept() is done, the local side gets whatever address was specified in the SYN packet, same as for the multi-homed hosts, and things work fine. But for the UDP servers (for example, tftp or BIND) there is an issue: The UDP server sockets normally have INADDR_ANY as their local address. As an outgoing packet with INADDR_ANY in its source address goes down through the IP layer, ip_output() notices that and fills in the source address with the address of the interface through which the packet is going to be sent. Obviously if the machine has two addresses from the same subnet then the address found first will always be used. And here comes the problem: this address may not be the same to which the client has sent its request. For example, let's suppose that the server has the addresses 192.168.1.1 (the physical host's address) and 192.168.1.3 (the cluster's logical host address) on the same interface. The client has the address 192.168.1.100 and sends a request to the server at 192.168.1.3. The server handles the request and sends back the reply, but since its source address is filled in as described above, to the client this reply appears as coming from 192.168.1.1, so the client happily discards it and continues waiting. The fix in short: the server should do a bind() to the right address before doing the reply. However in practice this code gets much more complicated and ugly, as will be discussed further. 
The other situations in which the same problem occurs: One is a service on a multihomed host. Suppose that a host has two interfaces, 192.168.1.1 and 172.16.2.2 with some UDP server running on it, bound to a socket with local address INADDR_ANY. Some client with address 192.168.1.100 sends a UDP request to the server at address 172.16.2.2. The server receives the request and sends a reply back. However it happens that the reply packet is routed through the interface 192.168.1.1 and has its address filled in as such, so again the client won't recognise the reply. Another one is the Netware emulator. About 6 years ago I tried to port the Netware emulator from Linux to FreeBSD. However this emulator sends all the IPX packets from the specific source address, so I tried to do bind() and such but did not get it quite right and failed. The Linux implementation of IPX works around it by sending the whole packet header with the source and destination addresses in the body of the packet, which is ugly. The details of doing the bind(): To reply to some UDP packet destined to some specific address, the destination address of this packet must be extracted and then used as the source address for sending the reply packet. This looks as follows: First, do setsockopt(sockfd, ..., IP_RECVDSTADDR, ....) to enable extraction of the destination address. Then receive the packets with recvmsg() and the control buffer pointed to by msg_control of struct msghdr will (possibly along with the other options) contain the destination address of the packet received. This option can be identified by its header (struct cmsghdr) by the fields cmsg_level==IPPROTO_IP, cmsg_type==IP_RECVDSTADDR. It should be noted that struct cmsghdr is not portable. OpenUnix calls the logically same structure "struct opthdr" and has slightly different field names. So the only portable way is to ignore the header structure and handle the options in raw byte format or define your own similar structure. 
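The receive-side steps just described can be sketched as follows, on the assumption that the system does provide struct cmsghdr and the CMSG_* macros. The helper names enable_dstaddr() and find_dstaddr() are hypothetical. Where IP_RECVDSTADDR is not defined the sketch falls back to IP_PKTINFO, whose payload is a struct in_pktinfo rather than a bare struct in_addr; that fallback is an addition of this sketch and not part of the description above.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes struct in_pktinfo on glibc */
#endif
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(IP_RECVDSTADDR)     /* BSD: payload is a bare struct in_addr */
#define DSTADDR_OPT IP_RECVDSTADDR
#define DSTADDR_OFF 0
#else                           /* assumed fallback: struct in_pktinfo */
#define DSTADDR_OPT IP_PKTINFO
#define DSTADDR_OFF offsetof(struct in_pktinfo, ipi_addr)
#endif

/* Step 1: ask the stack to attach the destination address to datagrams. */
static int
enable_dstaddr(int s)
{
    int on = 1;

    return (setsockopt(s, IPPROTO_IP, DSTADDR_OPT, &on, sizeof(on)));
}

/*
 * Step 2: after recvmsg(), walk the control buffer and copy out the
 * destination address. Returns 0 on success, -1 if the option is absent.
 */
static int
find_dstaddr(struct msghdr *msg, struct in_addr *dst)
{
    struct cmsghdr *cm;
    size_t need = DSTADDR_OFF + sizeof(struct in_addr);

    assert(msg != NULL && dst != NULL);
    for (cm = CMSG_FIRSTHDR(msg); cm != NULL; cm = CMSG_NXTHDR(msg, cm)) {
        if (cm->cmsg_level != IPPROTO_IP || cm->cmsg_type != DSTADDR_OPT)
            continue;
        if (cm->cmsg_len < CMSG_LEN(need))
            continue;           /* truncated option, ignore it */
        memcpy(dst, CMSG_DATA(cm) + DSTADDR_OFF, sizeof(*dst));
        return (0);
    }
    return (-1);
}
```

The caller would pass a control buffer in msg_control, call recvmsg(), and then hand the same msghdr to find_dstaddr().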
Then this address can be used to do bind() before sending the reply. However here we have a bad problem: you can't just do a bind() on the socket where you are listening for incoming datagrams. If you do so, the datagrams coming to this port but other addresses of this host will be thrown away. So what you need to do is to create a new socket, set the option SO_REUSEADDR on it, bind it to the specific address and then send the datagram from it. Obviously it's a lot of overhead, plus here comes another catch: after you do so you can't just close this other socket since by this time it may have gotten some incoming datagrams queued to it. So what you have to do is to keep a cache of sockets with various addresses bound to them, do select() on all of them before doing recvmsg(), and when sending an answer reuse the socket with the right address from the cache (or, if there is no socket with this address cached yet, create a new one and add it to the cache). All this is real, real ugly. How can we fix this situation? Everything would become a lot simpler if we had the calls: ssize_t recvfromto(int s, void *buf, size_t len, int flags, struct sockaddr *from, int *fromlen, struct sockaddr *to, int *tolen) This call would receive a datagram and fill both its source address (from) and its destination address (to) into the buffers. ssize_t sendfromto(int s, void *buf, size_t len, int flags, const struct sockaddr *from, int fromlen, const struct sockaddr *to, int tolen) This call would send a datagram from the specified address to the specified address without any need to do an extra bind(). Of course, just as when doing bind() this call should check that the "from" address actually belongs to some local interface. With these syscalls added the modifications to the servers become easy and obvious. -SB P.S. 
I'm going on a trip next week, and will be back only on about March 14th; I won't be reading or answering much e-mail in the meantime. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri Mar 1 18:45:11 2002 Delivered-To: freebsd-arch@freebsd.org Received: from salmon.maths.tcd.ie (salmon.maths.tcd.ie [134.226.81.11]) by hub.freebsd.org (Postfix) with SMTP id 18EBF37B402 for ; Fri, 1 Mar 2002 18:45:07 -0800 (PST) Received: from walton.maths.tcd.ie by salmon.maths.tcd.ie with SMTP id ; 2 Mar 2002 02:45:06 +0000 (GMT) To: Sergey Babkin Cc: arch@freebsd.org, chawla@caldera.com Subject: Re: proposition for new socket syscalls {send,recv}fromto In-Reply-To: Your message of "Fri, 01 Mar 2002 18:10:56 EST." <3C800A80.96CEA9D2@caldera.com> Date: Sat, 02 Mar 2002 02:45:05 +0000 From: Ian Dowse Message-ID: <200203020245.aa34899@salmon.maths.tcd.ie> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG In message <3C800A80.96CEA9D2@caldera.com>, Sergey Babkin writes: > >The fix in short: the server should do a bind() to the right address >before doing the reply. However in practice this code gets much more >complicated and ugly, as will be discussed further. Linux has an IP_PKTINFO socket option and IP_PKTINFO control message that (I think) allows you to record the destination IP on incoming datagrams and set the source address on outgoing ones. A quick, minimally tested sample program which uses it is at: http://www.maths.tcd.ie/~iedowse/FreeBSD/pktinfo.c In FreeBSD, we only seem to have this capability for IPv6, using the IPV6_PKTINFO option and control messages. Implementations of either an IP_PKTINFO or an IP_SENDSRCADDR control message have been discussed on freebsd-net a few times, but nothing has been committed yet. 
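On a system that does have IP_PKTINFO, the send half of the proposal might be wrapped around sendmsg() roughly like this. It is a sketch only and is not taken from the pktinfo.c sample linked above: the name sendfromto_pktinfo() is made up, it handles IPv4 only, and it assumes the Linux-style struct in_pktinfo with an ipi_spec_dst field. Where IP_PKTINFO is missing it simply fails with ENOSYS.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes struct in_pktinfo on glibc */
#endif
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Send one datagram to `to`, asking the stack to use `from` as the
 * source address, without an extra bind(). Hypothetical helper.
 */
static ssize_t
sendfromto_pktinfo(int s, const void *buf, size_t len, int flags,
    const struct sockaddr_in *from, const struct sockaddr_in *to)
{
    assert(from != NULL && to != NULL);
#ifdef IP_PKTINFO
    union {
        char buf[CMSG_SPACE(sizeof(struct in_pktinfo))];
        struct cmsghdr align;   /* forces suitable alignment */
    } ctl;
    struct in_pktinfo pi;
    struct iovec iov;
    struct msghdr msg;
    struct cmsghdr *cm;

    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    memset(&pi, 0, sizeof(pi));
    pi.ipi_spec_dst = from->sin_addr;   /* requested source address */

    iov.iov_base = (void *)buf;
    iov.iov_len = len;
    msg.msg_name = (void *)to;
    msg.msg_namelen = sizeof(*to);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = IPPROTO_IP;
    cm->cmsg_type = IP_PKTINFO;
    cm->cmsg_len = CMSG_LEN(sizeof(pi));
    memcpy(CMSG_DATA(cm), &pi, sizeof(pi));

    return (sendmsg(s, &msg, flags));
#else
    (void)s; (void)buf; (void)len; (void)flags;
    errno = ENOSYS;             /* no IP_PKTINFO on this system */
    return (-1);
#endif
}
```

Whether the kernel rejects a "from" address that is not local is up to the implementation; the sketch leaves that check to sendmsg().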
For systems that have such a mechanism, the proposed syscalls could just be implemented as library functions instead. Ian To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri Mar 1 19:18:56 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mail.chesapeake.net (chesapeake.net [205.130.220.14]) by hub.freebsd.org (Postfix) with ESMTP id 0705A37B405 for ; Fri, 1 Mar 2002 19:18:53 -0800 (PST) Received: from localhost (jroberson@localhost) by mail.chesapeake.net (8.11.6/8.11.6) with ESMTP id g223IfC98961; Fri, 1 Mar 2002 22:18:41 -0500 (EST) (envelope-from jroberson@chesapeake.net) Date: Fri, 1 Mar 2002 22:18:40 -0500 (EST) From: Jeff Roberson To: Poul-Henning Kamp Cc: Matthew Dillon , Julian Elischer , Subject: Re: Slab allocator update In-Reply-To: <16960.1015015703@critter.freebsd.dk> Message-ID: <20020301214243.B43446-100000@mail.chesapeake.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Fri, 1 Mar 2002, Poul-Henning Kamp wrote: > > I ran an earlier version of JeffR's patches on my testbox for > a couple of weeks and saw a pretty consistent speedup in the > order of single digit percentages on most of what I timed. > > I don't know if the current patch set is significantly changed > in any important aspect since then. I lost a lot of my performance gains when I replaced malloc. malloc(9) is extremely fast. I finally caught up to it again with the per cpu queues, but there is still extra overhead. There is an extra function call, and then in the free path I have to do a hash lookup on the page. This is because I don't have the freed size when the data is returned, and I need to find out what zone it came from. 
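The free-path lookup described here can be pictured as a small chained hash keyed by page address. This is a sketch under assumptions, not the actual mallochash code: the structure layout, the bucket count, and the names slab_hash_insert() and zone_of() are all invented for illustration, and each slab is taken to cover exactly one 4 KB page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PG_SHIFT     12                         /* assumed 4 KB pages */
#define PG_SIZE      ((uintptr_t)1 << PG_SHIFT)
#define HASH_BUCKETS 64                         /* must be a power of two */

struct slab {
    uintptr_t    base;      /* page-aligned address the slab covers */
    int          zone;      /* index of the owning zone */
    struct slab *hlink;     /* the single collision link */
};

static struct slab *slab_hash[HASH_BUCKETS];

static unsigned
page_hash(uintptr_t base)
{
    return ((unsigned)(base >> PG_SHIFT) & (HASH_BUCKETS - 1));
}

/* Called when a slab is created for a zone. */
static void
slab_hash_insert(struct slab *sl)
{
    unsigned h = page_hash(sl->base);

    assert((sl->base & (PG_SIZE - 1)) == 0);
    sl->hlink = slab_hash[h];
    slab_hash[h] = sl;
}

/*
 * Free path: no size is supplied, so round the pointer down to its
 * page and search the bucket chain to recover the zone. Returns -1
 * if the page is not known to the allocator.
 */
static int
zone_of(const void *ptr)
{
    uintptr_t base = (uintptr_t)ptr & ~(PG_SIZE - 1);
    struct slab *sl;

    for (sl = slab_hash[page_hash(base)]; sl != NULL; sl = sl->hlink) {
        if (sl->base == base)
            return (sl->zone);
    }
    return (-1);
}
```

The chain walk is the extra work on every free that a size argument would avoid.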
The hash table is slightly smaller than the original kmemusage array and should have minimum collisions. This organization also eliminates some potential space savings on large objects. The original Solaris allocator could force allocations to waste less than 8% of memory. They accomplish this by allocating several contiguous pages and then cutting them into pieces. Since I have to do a hash lookup on the page that a particular allocation came from it may not be the starting page for that slab. This would force me to insert a slab into the hash table for every page it owns, and I'd have to provide extra linkage information for this. The other alternative is to use resource tags for every malloc allocation. It would probably be a pointer to the slab that lived just before the allocated data. This would give much higher overhead for small allocations, but save space on large allocations. I intend to gather statistics on which sizes are used the most frequently so I can pick the most appropriately sized malloc zones, and when I gather this data I'll know whether it's a win to lose space on small allocations to save it on the really large ones. I have been considering further unionizing the slab structure to make it suitable for zone style allocations as well as malloc style allocations. Malloc style allocations don't need the type stable storage overhead. The links can be stored directly in the memory. This would yield further space optimizations, and speed optimizations. I could share the front end per cpu cache code and the zone configuration code, but I'd have to provide another slab_zalloc for malloc that set up linkage correctly. Anyway, there is still a lot of work that can be done to improve this. Jeff > > -- > Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 > phk@FreeBSD.ORG | TCP/IP since RFC 956 > FreeBSD committer | BSD since 4.3-tahoe > Never attribute to malice what can adequately be explained by incompetence. 
> To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri Mar 1 19:55:28 2002 Delivered-To: freebsd-arch@freebsd.org Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2]) by hub.freebsd.org (Postfix) with ESMTP id EC8BA37B416 for ; Fri, 1 Mar 2002 19:55:25 -0800 (PST) Received: (from dillon@localhost) by apollo.backplane.com (8.11.6/8.9.1) id g223tOn49150; Fri, 1 Mar 2002 19:55:24 -0800 (PST) (envelope-from dillon) Date: Fri, 1 Mar 2002 19:55:24 -0800 (PST) From: Matthew Dillon Message-Id: <200203020355.g223tOn49150@apollo.backplane.com> To: Jeff Roberson Cc: Poul-Henning Kamp , Julian Elischer , Subject: Re: Slab allocator update References: <20020301214243.B43446-100000@mail.chesapeake.net> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG : :I lost a lot of my performance gains when I replaced malloc. malloc(9) is :extremely fast. I finally caught up to it again with the per cpu queues, :but there is still extra overhead. There is an extra function call, and :then in the free path I have to do a hash lookup on the page. This is :because I don't have the freed size when the data is returned, and I need :to find out what zone it came from. The hash table is slightly smaller :than the original kmemusage array and should have minimum collisions. I've found that free(ptr) can usually be turned into blahblah_free(ptr, bytes) almost universally in programs, and the kernel MALLOC is no exception. The size is known at free time in virtually all cases, even for strings. But that would mean a change in the API. 
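A toy illustration of a free that takes the size: with the size in hand, the size class is computed directly and no lookup of any kind is needed. Everything below is hypothetical (sk_alloc(), sk_free_sized(), the eight power-of-two classes, the malloc() backing store), and it is a userland sketch rather than kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NCLASS 8                        /* classes of 16, 32, ... 2048 bytes */

/* Free blocks are chained through their own storage. */
struct freeblk {
    struct freeblk *next;
};

static struct freeblk *freelist[NCLASS];

/* Smallest class that holds `size` bytes, or -1 if none does. */
static int
size_to_class(size_t size)
{
    int c;

    for (c = 0; c < NCLASS; c++) {
        if (size <= ((size_t)16 << c))
            return (c);
    }
    return (-1);
}

static void *
sk_alloc(size_t size)
{
    int c = size_to_class(size);
    struct freeblk *b;

    if (c < 0)
        return (NULL);                  /* too large for this sketch */
    if ((b = freelist[c]) != NULL) {    /* reuse a cached block */
        freelist[c] = b->next;
        return (b);
    }
    return (malloc((size_t)16 << c));   /* stand-in for growing a slab */
}

/* The caller states the size, so the class follows with no lookup. */
static void
sk_free_sized(void *ptr, size_t size)
{
    int c = size_to_class(size);
    struct freeblk *b = ptr;

    assert(ptr != NULL && c >= 0);
    b->next = freelist[c];
    freelist[c] = b;
}
```

A typical call site would pair `p = sk_alloc(sizeof(*p))` with `sk_free_sized(p, sizeof(*p))`. The sketch cannot detect a wrong size by itself; that would need the kind of debug-only cross-check against allocator metadata that is not shown here.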
It might be beneficial to introduce two versions of your free function, one which does not require the size and another, faster version which does, and then slowly adjust the kernel to use the new function as well as add sanity checks for INVARIANTS to ensure we don't accidentally leak or corrupt memory by specifying the wrong size. -Matt To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri Mar 1 20:26:57 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mail.chesapeake.net (chesapeake.net [205.130.220.14]) by hub.freebsd.org (Postfix) with ESMTP id 0513737B429 for ; Fri, 1 Mar 2002 20:26:43 -0800 (PST) Received: from localhost (jroberson@localhost) by mail.chesapeake.net (8.11.6/8.11.6) with ESMTP id g224Qc814304; Fri, 1 Mar 2002 23:26:38 -0500 (EST) (envelope-from jroberson@chesapeake.net) Date: Fri, 1 Mar 2002 23:26:37 -0500 (EST) From: Jeff Roberson To: Matthew Dillon Cc: Poul-Henning Kamp , Julian Elischer , Subject: Re: Slab allocator update In-Reply-To: <200203020355.g223tOn49150@apollo.backplane.com> Message-ID: <20020301232155.N43446-100000@mail.chesapeake.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Fri, 1 Mar 2002, Matthew Dillon wrote: > > It might be beneficial to introduce two versions of your free function, > one which does not require the size and another, faster version which > does, and then slowly adjust the kernel to use the new function as well > as add sanity checks for INVARIANTS to ensure we don't accidentally leak > or corrupt memory by specifying the wrong size. > > -Matt > I would love to do this, but I thought I'd encounter too much friction. 
Implementing this would remove the special cases for malloc, and I could implement more memory efficient slab formats. Specifying the wrong size would cause you to try to free to the wrong zone, which uma can catch easily. Assuming everyone agrees with this, I'll do it after the initial version is checked in and tested. Thanks, Jeff To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sat Mar 2 0:17:35 2002 Delivered-To: freebsd-arch@freebsd.org Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by hub.freebsd.org (Postfix) with ESMTP id 886F537B405 for ; Sat, 2 Mar 2002 00:17:28 -0800 (PST) Received: by elvis.mu.org (Postfix, from userid 1192) id 641E4AE2C3; Sat, 2 Mar 2002 00:17:28 -0800 (PST) Date: Sat, 2 Mar 2002 00:17:28 -0800 From: Alfred Perlstein To: Jeff Roberson Cc: Matthew Dillon , Poul-Henning Kamp , Julian Elischer , arch@FreeBSD.ORG Subject: Re: Slab allocator update Message-ID: <20020302081728.GR77980@elvis.mu.org> References: <200203020355.g223tOn49150@apollo.backplane.com> <20020301232155.N43446-100000@mail.chesapeake.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20020301232155.N43446-100000@mail.chesapeake.net> User-Agent: Mutt/1.3.27i Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG * Jeff Roberson [020301 20:27] wrote: > On Fri, 1 Mar 2002, Matthew Dillon wrote: > > > > > It might be beneficial to introduce two versions of your free function, > > one which does not require the size and another, faster version which > > does, and then slowly adjust the kernel to use the new function as well > > as add sanity checks for INVARIANTS to ensure we don't accidentally leak > > or corrupt memory by specifying the wrong size. 
> > > > -Matt > > > I would love to do this, but I thought I'd encounter too much friction. > Implementing this would remove the special cases for malloc, and I could > implement more memory efficient slab formats. Specifying the wrong size > would cause you to try to free to the wrong zone, which uma can catch > easily. > > Assuming everyone agrees with this, I'll do it after the initial version > is checked in and tested. What the hell? Why isn't there a pointer per page or PAGE_SIZE*N that points to the bucket? This is a simple space/speed tradeoff that we want to make for speed. With that in place all you need to do is round down the Free'd location to the nearest page_size and do a lookup into the perfect hash for that page. -- -Alfred Perlstein [alfred@freebsd.org] 'Instead of asking why a piece of software is using "1970s technology," start asking why software is ignoring 30 years of accumulated wisdom.' Tax deductible donations for FreeBSD: http://www.freebsdfoundation.org/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sat Mar 2 3: 7:32 2002 Delivered-To: freebsd-arch@freebsd.org Received: from mail.chesapeake.net (chesapeake.net [205.130.220.14]) by hub.freebsd.org (Postfix) with ESMTP id 85CCB37B400 for ; Sat, 2 Mar 2002 03:07:28 -0800 (PST) Received: from localhost (jroberson@localhost) by mail.chesapeake.net (8.11.6/8.11.6) with ESMTP id g22B7KO79716; Sat, 2 Mar 2002 06:07:21 -0500 (EST) (envelope-from jroberson@chesapeake.net) Date: Sat, 2 Mar 2002 06:07:20 -0500 (EST) From: Jeff Roberson To: Alfred Perlstein Cc: Matthew Dillon , Poul-Henning Kamp , Julian Elischer , Subject: Re: Slab allocator update In-Reply-To: <20020302081728.GR77980@elvis.mu.org> Message-ID: <20020302055809.B43446-100000@mail.chesapeake.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) 
List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG > What the hell? > > Why isn't there a pointer per page or PAGE_SIZE*N that points to > the bucket? This is a simple space/speed tradeoff that we want > to make for speed. With that in place all you need to do is > round down the Free'd location to the nearest page_size and > do a lookup into the perfect hash for that page. > There is right now, that's the mallochash. It works as long as the base page for each item in the slab is the same. That is to say, if you have a slab with multiple pages it can only hold one item. This is because you may have collisions, in which case you need to have a link. There is only space set aside in each slab for one link. If you want to completely avoid collisions you define a fixed bucket for every possible page. That is how the current malloc implementation works. That's really not a problem when you have a fixed size space for the heap (ie kmem_map). But really the goal is to manage the kernel address space in one map. If you want to hold a pointer to every possible page in a 1 gigabyte address space, assuming 4kb pages, you're using 1MB for this map. I guess we could do it, and sparsely fill this array. That seems like a somewhat reasonable solution as well, although more costly in terms of memory usage than providing the size on free. Comments? Jeff > -- > -Alfred Perlstein [alfred@freebsd.org] > 'Instead of asking why a piece of software is using "1970s technology," > start asking why software is ignoring 30 years of accumulated wisdom.' 
> Tax deductible donations for FreeBSD: http://www.freebsdfoundation.org/ > To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sat Mar 2 3:35: 3 2002 Delivered-To: freebsd-arch@freebsd.org Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by hub.freebsd.org (Postfix) with ESMTP id B3BAE37B416 for ; Sat, 2 Mar 2002 03:34:59 -0800 (PST) Received: by elvis.mu.org (Postfix, from userid 1192) id 7C2E9AE2A2; Sat, 2 Mar 2002 03:34:59 -0800 (PST) Date: Sat, 2 Mar 2002 03:34:59 -0800 From: Alfred Perlstein To: Jeff Roberson Cc: Matthew Dillon , Poul-Henning Kamp , Julian Elischer , arch@FreeBSD.ORG Subject: Re: Slab allocator update Message-ID: <20020302113459.GU77980@elvis.mu.org> References: <20020302081728.GR77980@elvis.mu.org> <20020302055809.B43446-100000@mail.chesapeake.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20020302055809.B43446-100000@mail.chesapeake.net> User-Agent: Mutt/1.3.27i Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG * Jeff Roberson [020302 03:07] wrote: > > > What the hell? > > > > Why isn't there a pointer per page or PAGE_SIZE*N that points to > > the bucket? This is a simple space/speed tradeoff that we want > > to make for speed. With that in place all you need to do is > > round down the Free'd location to the nearest page_size and > > do a lookup into the perfect hash for that page. > > > > There is right now, that's the mallochash. It works as long as the base > page for each item in the slab is the same. That is to say, if you have a > slab with multiple pages it can only hold one item. This is because you > may have collisions, in which case you need to have a link. There is > only space set aside in each slab for one link. 
If you want to completely > avoid collisions you define a fixed bucket for every possible page. That > is how the current malloc implementation works. That's really not a > problem when you have a fixed size space for the heap (ie kmem_map). But > really the goal is to manage the kernel address space in one map. If you > want to hold a pointer to every possible page in a 1 gigabyte address > space, assuming 4kb pages, you're using 1MB for this map. I guess we > could do it, and sparsely fill this array. That seems like a somewhat > reasonable solution as well, although more costly in terms of memory usage > than providing the size on free. > > Comments? As I said this is the right thing to do, make the tradeoff for speed. Using .1% of the system memory in order to efficiently manage the pool seems like a worthwhile tradeoff. You could halve this requirement by doing roundoffs to 2xPAGE_SIZE and halve it again by making it a 16 bit integer pointing into an indirect array, but that's over optimizing for space imo. I think that the overhead and inconvenience to store the size of the allocations may be too much for us to deal with. Anyhow, you said you had some performance issues, using the simple hash will hopefully make the code smaller and simpler thereby speeding it up some. Why not try that and see if you get better numbers. All the enhanced features are nice, but it'd be a lot cooler if we could get some added performance out of this as well. As a side note, have you considered (or does it allow this already?) the ability to have a slab that is a multiple of PAGE_SIZE in order to more efficiently pack structures? Lastly it might make sense to have a double map, so you have an array of pointers to pages that contain pointers to your slab meta-data, then you only need to allocate another page for this when you grow the arena, this may cause too much complication though, but it may offer an improvement over hash chaining. 
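The double map suggested above might look something like the following two-level table, where leaf arrays are allocated only once a page in their range is first used. The sizes follow the figures quoted in the thread (1 GB of address space, 4 KB pages, 4-byte pointers, hence 262144 entries and 1 MB if fully populated); the names, the split into 256 leaves of 1024 slots, and the use of calloc() are assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define PG_SHIFT   12                               /* 4 KB pages */
#define ARENA_BITS 30                               /* 1 GB arena */
#define NPAGES     (1u << (ARENA_BITS - PG_SHIFT))  /* 262144 pages */
#define LEAF_SLOTS 1024u                            /* entries per leaf */
#define NLEAVES    (NPAGES / LEAF_SLOTS)            /* 256 leaves */

struct slab {
    int zone;               /* stand-in for real slab meta-data */
};

/* Top level: one pointer per leaf; leaves start out unallocated. */
static struct slab **leaf[NLEAVES];

/* Record that the page at arena offset `off` belongs to slab `sl`. */
static int
pagemap_set(uintptr_t off, struct slab *sl)
{
    size_t pg = off >> PG_SHIFT;
    size_t hi = pg / LEAF_SLOTS, lo = pg % LEAF_SLOTS;

    assert(NLEAVES * LEAF_SLOTS == NPAGES);
    if (pg >= NPAGES)
        return (-1);
    if (leaf[hi] == NULL) {
        /* Grow the map only when this part of the arena is first used. */
        leaf[hi] = calloc(LEAF_SLOTS, sizeof(*leaf[hi]));
        if (leaf[hi] == NULL)
            return (-1);
    }
    leaf[hi][lo] = sl;
    return (0);
}

/* Free path: two array indexings, no chain to walk. */
static struct slab *
pagemap_get(uintptr_t off)
{
    size_t pg = off >> PG_SHIFT;

    if (pg >= NPAGES || leaf[pg / LEAF_SLOTS] == NULL)
        return (NULL);
    return (leaf[pg / LEAF_SLOTS][pg % LEAF_SLOTS]);
}
```

With 4-byte pointers each leaf is itself one 4 KB page, which matches the idea of allocating another page for the map only when the arena grows.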
-- -Alfred Perlstein [alfred@freebsd.org] 'Instead of asking why a piece of software is using "1970s technology," start asking why software is ignoring 30 years of accumulated wisdom.' Tax deductible donations for FreeBSD: http://www.freebsdfoundation.org/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sat Mar 2 10:15:11 2002 Delivered-To: freebsd-arch@freebsd.org Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2]) by hub.freebsd.org (Postfix) with ESMTP id 6103E37B400 for ; Sat, 2 Mar 2002 10:15:04 -0800 (PST) Received: (from dillon@localhost) by apollo.backplane.com (8.11.6/8.9.1) id g22IF0e55311; Sat, 2 Mar 2002 10:15:00 -0800 (PST) (envelope-from dillon) Date: Sat, 2 Mar 2002 10:15:00 -0800 (PST) From: Matthew Dillon Message-Id: <200203021815.g22IF0e55311@apollo.backplane.com> To: Alfred Perlstein Cc: Jeff Roberson , Poul-Henning Kamp , Julian Elischer , arch@FreeBSD.ORG Subject: Re: Slab allocator update References: <20020302081728.GR77980@elvis.mu.org> <20020302055809.B43446-100000@mail.chesapeake.net> <20020302113459.GU77980@elvis.mu.org> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG :As I said this is the right thing to do, make the tradeoff for speed. :using .1% of the system memory in order to efficiently manage the pool :seems like a worthwhile tradeoff. : :You could halve this requirement by doing roundoffs to 2xPAGE_SIZE :and halve it again by making it a 16 bit integer pointing into an :indirect array, but that's over optimizing for space imo. : :I think that the overhead and inconvenience to store the size of the :allocations may be too much for us to deal with. I have to disagree here. 
I have a lot of experience converting malloc()/free() based systems to other types of memory allocators where the 'free' requires a size. It's utterly trivial. The size is known trivially in 99% of the cases. The vast majority of malloc()/free()'s in the kernel that could be said to require performance are malloc()'s and free()'s of structures, for which the size is known. :Anyhow, you said you had some performance issues, using the simple :hash will hopefully make the code smaller and more simple thereby :speeding it up some. Hash tables are reasonable solutions but they have downsides too. The biggest one is L1 cache pollution since you are essentially calculating a pseudo-random index. The other is storage. It is well worth it if one can avoid the storage requirement. :Lastly it might make sense to have a double map, so you have an array :of pointers to pages that contain pointers to your slab meta-data, :then you only need to allocate another page for this when you grow :the arena, this may cause too much complication though, but it may :offer an improvement over hash chaining. : :-- :-Alfred Perlstein [alfred@freebsd.org] Careful, all these features are going to increase the per-allocation overhead by a lot more than just a few bytes! Yuch! -Matt Matthew Dillon To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sat Mar 2 13: 5:24 2002 Delivered-To: freebsd-arch@freebsd.org Received: from dragon.nuxi.com (trang.nuxi.com [66.92.13.169]) by hub.freebsd.org (Postfix) with ESMTP id 4D41B37B400 for ; Sat, 2 Mar 2002 13:05:21 -0800 (PST) Received: (from obrien@localhost) by dragon.nuxi.com (8.11.6/8.11.1) id g22L43e58604; Sat, 2 Mar 2002 13:04:03 -0800 (PST) (envelope-from obrien) Date: Sat, 2 Mar 2002 12:59:58 -0800 From: "David O'Brien" To: Bruce Evans Cc: arch@FreeBSD.ORG Subject: Re: Warning and lint(1) fixes. Review please. 
Message-ID: <20020302125958.B58520@dragon.nuxi.com>
Reply-To: obrien@FreeBSD.ORG
References: <200202281836.g1SIaog4051908@grimreaper.grondar.org> <20020302060943.U58081-100000@gamplex.bde.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <20020302060943.U58081-100000@gamplex.bde.org>; from bde@zeta.org.au on Sat, Mar 02, 2002 at 06:36:13AM +1100
X-Operating-System: FreeBSD 5.0-CURRENT
Organization: The NUXI BSD group
X-Pgp-Rsa-Fingerprint: B7 4D 3E E9 11 39 5F A3 90 76 5D 69 58 D9 98 7A
X-Pgp-Rsa-Keyid: 1024/34F9F9D5
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.ORG

On Sat, Mar 02, 2002 at 06:36:13AM +1100, Bruce Evans wrote:
> > Index: i386/include/pcpu.h
> > ===================================================================
> > RCS file: /home/ncvs/src/sys/i386/include/pcpu.h,v
> > retrieving revision 1.32
> > diff -u -d -r1.32 pcpu.h
> > --- i386/include/pcpu.h 11 Dec 2001 23:33:40 -0000 1.32
> > +++ i386/include/pcpu.h 28 Feb 2002 10:44:43 -0000
> > @@ -32,8 +32,22 @@
> >  #ifdef _KERNEL
> >
> >  #ifndef __GNUC__
> > -#error gcc is required to use this file
> > -#endif
> > +
> > +#ifndef lint
> > +#error gcc or lint is required to use this file
> > +#else /* lint */
> > +#define __PCPU_PTR(name)
> > +#define __PCPU_GET(name)
> > +#define __PCPU_SET(name, val)
>
> I can't think of any good way to handle this.

Remove the #ifndef lint wrapping and teach lint to ignore #error.

From owner-freebsd-arch Sat Mar 2 23: 7:43 2002
Delivered-To: freebsd-arch@freebsd.org
Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16])
    by hub.freebsd.org (Postfix) with ESMTP id B4C7037B41A;
    Sat, 2 Mar 2002 23:07:39 -0800 (PST)
Received: from bde.zeta.org.au (bde.zeta.org.au [203.2.228.102])
    by mailman.zeta.org.au (8.9.3/8.8.7) with ESMTP id SAA04583;
    Sun, 3 Mar 2002 18:07:37 +1100
Date: Sun, 3 Mar 2002 18:08:19 +1100 (EST)
From: Bruce Evans
X-X-Sender:
To: "David O'Brien"
Cc:
Subject: Re: Warning and lint(1) fixes. Review please.
In-Reply-To: <20020302125958.B58520@dragon.nuxi.com>
Message-ID: <20020303180145.T64083-100000@gamplex.bde.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.ORG

On Sat, 2 Mar 2002, David O'Brien wrote:

> On Sat, Mar 02, 2002 at 06:36:13AM +1100, Bruce Evans wrote:
> > > Index: i386/include/pcpu.h
> > > ===================================================================
> > > RCS file: /home/ncvs/src/sys/i386/include/pcpu.h,v
> > > retrieving revision 1.32
> > > diff -u -d -r1.32 pcpu.h
> > > --- i386/include/pcpu.h 11 Dec 2001 23:33:40 -0000 1.32
> > > +++ i386/include/pcpu.h 28 Feb 2002 10:44:43 -0000
> > > @@ -32,8 +32,22 @@
> > >  #ifdef _KERNEL
> > >
> > >  #ifndef __GNUC__
> > > -#error gcc is required to use this file
> > > -#endif
> > > +
> > > +#ifndef lint
> > > +#error gcc or lint is required to use this file
> > > +#else /* lint */
> > > +#define __PCPU_PTR(name)
> > > +#define __PCPU_GET(name)
> > > +#define __PCPU_SET(name, val)
> >
> > I can't think of any good way to handle this.
>
> Remove the #ifndef lint wrapping and teach lint to ignore #error.

I mean for the whole file.
We need to mess it up by supplying lots of dummy macros for the lint
case, so that lint doesn't find errors in everything that uses the
macros. But this weakens lint's checking significantly (much more than
for dummy functions to replace inline ones).

Bruce
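
The trade-off described in the last message can be sketched in portable C. This is an illustration only, not the real pcpu.h: `struct pcpu_demo`, `DEMO_PCPU_GET`, `DEMO_PCPU_SET` and `demo_next_cpuid` are hypothetical names, and an ordinary static structure stands in for the gcc-specific per-CPU access. The point is simply that when `lint` is defined the dummy macros throw their arguments away, so a misspelled member name or a value of the wrong type would go unreported.

```c
/* Hypothetical stand-in for the per-CPU structure; field names are illustrative. */
struct pcpu_demo {
	int	pc_cpuid;
	void	*pc_curthread;
};

/* Assumption: one static instance stands in for "this CPU's" data. */
static struct pcpu_demo demo_pcpu0;

#ifdef lint
/*
 * Dummy versions for lint: the arguments never appear in the expansion,
 * so neither the member name nor the type of val is ever examined.
 */
#define	DEMO_PCPU_GET(member)		(0)
#define	DEMO_PCPU_SET(member, val)	((void)0)
#else
/*
 * Checked versions: the member name must exist in struct pcpu_demo and
 * val must be assignable to it, or the compiler complains.
 */
#define	DEMO_PCPU_GET(member)		(demo_pcpu0.pc_ ## member)
#define	DEMO_PCPU_SET(member, val)	(demo_pcpu0.pc_ ## member = (val))
#endif

/* Typical caller: increments the stored id and returns the new value. */
int
demo_next_cpuid(void)
{
	DEMO_PCPU_SET(cpuid, DEMO_PCPU_GET(cpuid) + 1);
	return (DEMO_PCPU_GET(cpuid));
}
```

With the checked macros, writing `DEMO_PCPU_GET(cpuidd)` by mistake is a compile error; with the lint dummies the same typo expands to `(0)` and is accepted silently, which is the loss of checking being discussed.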