From: Julian Elischer
Date: Mon, 31 Jul 2006 12:12:52 -0700
To: John Baldwin
Cc: Peter Jeremy, freebsd-current@freebsd.org, Brian Candler
Subject: Re: vmstat's entries type
Message-ID: <44CE5634.9080205@elischer.org>
In-Reply-To: <200607311437.57273.jhb@freebsd.org>

John Baldwin wrote:

>On Friday 28 July 2006 17:01, Peter Jeremy wrote:
>
>>On Fri, 2006-Jul-28 14:47:01 +0100, Brian Candler wrote:
>>
>>>On Fri, Jul 28, 2006 at 09:28:36AM -0400, John Baldwin wrote:
>>>
>>>>    lock incl counter
>>>>    jnc 1f
>>>>    lock incl counter+4
>>>>1:
>>>>
>>This approach still requires the reader to loop
>>with something like
>>    do {
>>        a.lo = counter.lo;
>>        a.hi = counter.hi;
>>        b.lo = counter.lo;
>>        b.hi = counter.hi;
>>    } while (a.hi != b.hi || a.lo > b.lo);
>>to ensure that the reader doesn't read the middle of an update.
>>
>
>Yes, but the idea here is that these stats are written to far more often than
>read, so it's ok to pessimize the read case.
>
>>lock prefixes are always going to be extremely expensive on a MP
>>system because they require physical bus cycles.  RISC architectures
>>usually only have TAS lock primitives (because "inc mem" doesn't
>>exist) and so require a spinlock to perform an atomic update.
>>
>
>Nah, you can do it directly using different primitives w/o a full spin lock.
>If you had a 32-bit RISC arch that supported load-linked/store-conditional
>like Alpha/MIPS you could do something like:
>
>1:
>    ldl_l   t0, (a0)
>    addi    t0, 1
>    mov     t2, t0          # or add t2, t0, zero
>    stl_c   t0, (a0)
>    beq     t0, 1b          # retry if the conditional store failed
>    bne     t2, 3f          # skip next if we didn't wrap
>2:
>    ldl_l   t0, 4(a0)
>    addi    t0, 1
>    stl_c   t0, 4(a0)
>    beq     t0, 2b          # retry if the conditional store failed
>3:
>
>That doesn't require a full-blown spinlock which disables interrupts, etc.
>
>>In a MP configuration where it doesn't particularly matter if a
>>particular update gets counted this time or next time, I think the
>>cheapest option is to have per-CPU 32-bit counters (so no locks are
>>needed to update the counters) with a polling function to accumulate
>>all the individual counters into a 64-bit total.  This pushes the cost
>>from the update (very frequent) into the read (which is relatively
>>infrequent), for a lower overall cost.
>>
>>This turns the update into something like:
>>    PCPU_SET(counter, PCPU_GET(counter)+1);
>>or
>>    incl %fs:counter
>>(no locks or atomic operations)
>>
>
>Note that if you go with per-cpu counters then you don't need 'lock' prefixes
>for the 'inc; jnc 1f; inc; 1:' case either.
>

But you do want to disable interrupts so that you don't get switched to
another CPU in the middle of it.
That would result in the possibility of 2 threads writing to the same stats at the same time.
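
For illustration, the per-CPU scheme discussed above could be sketched in C roughly as follows. This is only a user-space sketch under stated assumptions, not FreeBSD's implementation: `NCPU`, `struct pcpu_counter`, `counter_inc()` and `counter_poll()` are hypothetical names, and the explicit `cpu` argument stands in for "the CPU this thread is pinned to". In a kernel, the increment would have to run with interrupts disabled (or as a single instruction such as `incl %fs:counter`) so that the thread cannot migrate mid-update.

```c
#include <assert.h>
#include <stdint.h>

#define NCPU 4  /* hypothetical CPU count for this sketch */

/* Hypothetical per-CPU state: each CPU increments only its own slot. */
struct pcpu_counter {
    uint32_t count;      /* bumped by the owning CPU, no lock prefix */
    uint32_t last_seen;  /* value the poller read on its previous pass */
};

static struct pcpu_counter pcpu[NCPU];
static uint64_t total;   /* 64-bit sum, written only by the poller */

/*
 * Update path (very frequent): a plain increment of one CPU's slot.
 * Assumes the caller cannot be moved to another CPU during the call.
 */
static void counter_inc(int cpu)
{
    assert(cpu >= 0 && cpu < NCPU);
    pcpu[cpu].count++;
}

/*
 * Read path (infrequent): fold each CPU's delta since the last poll
 * into the 64-bit total. Unsigned 32-bit subtraction gives the right
 * delta as long as a counter wraps at most once between two polls.
 */
static uint64_t counter_poll(void)
{
    for (int i = 0; i < NCPU; i++) {
        uint32_t now = pcpu[i].count;
        total += (uint32_t)(now - pcpu[i].last_seen);
        pcpu[i].last_seen = now;
    }
    return total;
}
```

The cost moves to the reader, as suggested in the thread: the poller has to run often enough that no 32-bit counter can wrap twice between two passes.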