From: Sten Daniel Sørsdal
Date: Sat, 29 Jul 2006 21:05:29 +0200
To: Peter Jeremy
Cc: freebsd-current@freebsd.org
Subject: Re: vmstat's entries type
Message-ID: <44CBB179.6070904@wm-access.no>
In-Reply-To: <20060728210154.GC748@turion.vk2pj.dyndns.org>

Peter Jeremy wrote:
> On Fri, 2006-Jul-28 14:47:01 +0100, Brian Candler wrote:
>> On Fri, Jul 28, 2006 at 09:28:36AM -0400, John Baldwin wrote:
>>>     lock incl counter
>>>     jnc 1f
>>>     lock incl counter+4
>>> 1:
>
> This approach still requires the reader to loop with something like
>     do {
>         a.lo = counter.lo;
>         a.hi = counter.hi;
>         b.lo = counter.lo;
>         b.hi = counter.hi;
>     } while (a.hi != b.hi || a.lo > b.lo);
> to ensure that the reader doesn't read the middle of an update.
>
>> The 'polling' argument says just do
>>     lock incl counter
>> and poll all counters every 5 minutes, looking for a wrap.  I think that's
>> almost certainly going to be cheaper, as long as you can keep track of where
>> all these counters are located.
>
> lock prefixes are always going to be extremely expensive on a MP
> system because they require physical bus cycles.  RISC architectures
> usually only have TAS lock primitives (because "inc mem" doesn't
> exist) and so require a spinlock to perform an atomic update.
>
> In a MP configuration where it doesn't particularly matter if a
> particular update gets counted this time or next time, I think the
> cheapest option is to have per-CPU 32-bit counters (so no locks are
> needed to update the counters) with a polling function to accumulate
> all the individual counters into a 64-bit total.  This pushes the cost
> from the update (very frequent) into the read (which is relatively
> infrequent), for a lower overall cost.
>
> This turns the update into something like:
>     PCPU_SET(counter, PCPU_GET(counter)+1);
> or
>     incl %fs:counter
> (no locks or atomic operations)
>
> Whilst the poll/read pseudo code looks something like
>     lock counter
>     foreach cpu {
>         uint32 a = cpu->counter;
>         uint32 b = cpu->last_counter;
>         uint32 c = counter.lo;
>         if (b > a)
>             counter.hi++;
>         counter.lo += a - b;
>         if (counter.lo < c)
>             counter.hi++;
>         cpu->last_counter = a;
>     }
>     unlock counter;
> (the lock prevents multiple readers updating counter simultaneously).
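
For reference, the poll step quoted above could be sketched in plain C roughly as follows. This is only an illustration under stated assumptions, not kernel code: `NCPU`, `struct pcpu_counter`, `struct counter64`, `counter_inc()` and `counter_poll()` are hypothetical names, the per-CPU slots are an ordinary array rather than real per-CPU data, and the caller is assumed to hold whatever lock serialises pollers. It keeps one 64-bit total instead of separate hi/lo words and relies on unsigned modular subtraction, so the delta already accounts for a single wrap and no separate `b > a` test is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NCPU 4  /* hypothetical fixed CPU count for this sketch */

/* Hypothetical per-CPU slot; in a kernel this would be per-CPU data. */
struct pcpu_counter {
    uint32_t counter;       /* bumped only by the owning CPU, no lock */
    uint32_t last_counter;  /* value observed by the previous poll */
};

struct counter64 {
    struct pcpu_counter cpu[NCPU];
    uint64_t total;         /* accumulated total, kept between polls */
};

/* Update path: a plain increment of this CPU's slot
 * (stands in for "incl %fs:counter"). */
static void counter_inc(struct counter64 *c, size_t cpu)
{
    assert(cpu < NCPU);
    c->cpu[cpu].counter++;
}

/* Poll path: fold each CPU's progress since the last poll into the
 * 64-bit total. The caller is assumed to hold the poll lock. */
static uint64_t counter_poll(struct counter64 *c)
{
    for (size_t i = 0; i < NCPU; i++) {
        uint32_t a = c->cpu[i].counter;
        uint32_t b = c->cpu[i].last_counter;

        /* Unsigned subtraction is modulo 2^32, so the delta is
         * correct even if the slot wrapped once since last poll. */
        c->total += (uint32_t)(a - b);
        c->cpu[i].last_counter = a;
    }
    return c->total;
}
```

In this sketch both `total` and each `last_counter` persist between calls to `counter_poll()`, because a 32-bit sample only yields a delta relative to the previous sample. Memory ordering and real per-CPU access are not modelled.
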
>
> You execute this whenever a reader wants the counter value (eg via
> SYSCTL_PROC), as well as a rate sufficient to prevent missing wraps
> (eg every 2 seconds for a 10g byte counter).  This rate is sufficiently
> lower than the update rate to make the whole exercise worthwhile.
>

Is caching necessary somewhere or can the function return the value
directly without storing the global accumulated counter?
(trying to get an understanding)

-- 
Sten Daniel Sørsdal