From: Bruce Evans <bde@optusnet.com.au>
Date: Tue, 22 May 2007 00:38:49 +1000 (EST)
To: Attilio Rao
Cc: arch@freebsd.org
Subject: Re: sched_lock && thread_lock()
Message-ID: <20070521225032.C57233@delplex.bde.org>
In-Reply-To: <4651FCB5.7070604@FreeBSD.org>
List-Id: Discussion related to FreeBSD architecture

On Mon, 21 May 2007, Attilio Rao wrote:

> Bruce Evans wrote:
>> We already have inconsistent accesses via PCPU_LAZY_INC(), and I think
>> all cases where you don't really care about the values can be handled
>> better by PCPU_LAZY_INC().
>> (Though I don't like PCPU_LAZY_INC(), since it gives namespace
>> pollution, a larger race window while combining the values,
>> complications, and missing support for combining the values in dead
>> kernels.)  Cases that can be handled by PCPU_LAZY_INC() are
>> distinguished by the variables being monotonically increasing with
>> time (until wrapping at UINT_MAX breaks them) and nothing except
>> userland looking at them.  The userland accesses are full of races,
>> but userland doesn't really care.  If accessing the pcpu is
>> inefficient, then PCPU_LAZY_INC() can always use atomic ops if that
>> is less inefficient.
>
> Yes, but how do you manage to maintain per-cpu data while not using
> per-cpu fields?  Do you want it to fit in a table?  In that case you
> will probably need an extra lock for the table, which increases
> overhead.

Just change the PCPU_LAZY_INC() macro to access the non-per-cpu global,
using the same locking as you use for the non-pcpu access method.  I do
this in some (old) sys trees for the !SMP case.  This gives a small
unpessimization and fixes reading the values from dead kernels, but only
in the !SMP case.  I also remove pc_cnt from the pcpu data in the !SMP
case.  This is a smaller optimization and requires larger changes to
remove adding up the pcpu data in vm_meter.c.  The adding up can be left
unchanged, since it becomes a no-op if the data is left at all zeros.

In the old sys trees, there is no locking for individual accesses, so
PCPU_LAZY_INC(var) becomes (++(var)).  In the current tree,
PCPU_LAZY_INC(var) is (++(var)) written in asm for some arches, to
ensure that it is atomic with respect to interrupts on those arches.
Atomicity with respect to interrupts is necessary and sufficient for the
non-per-cpu global !SMP case too.

I now seem to remember some more history of this.  A few vmmeter
variables, e.g., v_intr, have no natural locking and were originally
locked by sched_lock or atomic accesses.
People didn't like the slowness from this, and removed some of the
sched_locking and/or switched to PCPU_LAZY_INC().  The remaining sched
locking seems to be mostly accidental.

Bugs found while grepping for sched_locking of VMCNT fields:

- v_trap is supposed to be updated by PCPU_LAZY_INC(), but subr_trap.c
  updates it by VMCNT_ADD().  The direct access used to be perfectly
  unlocked and non-atomic, since it occurs immediately after releasing
  sched_lock and just used "++".  Now it is atomic and a
  micro-pessimization.
- v_trap's name is now hidden by the macro, so grepping for it doesn't
  work.

Non(?)-bugs found while grepping for VMCNT:

- Most updates of v_free_count are via pq->lcnt++ and pq->lcnt-- in
  vm_page.c and vm_pageq.  pq->lcnt is VMCNT_PTR(&cnt.vm_free_count)
  for all pq.  The locking for this seems to be correct, except of
  course in sysctls, since it uses vm_page_queue_free_mtx and never
  needed VMCNT access methods.  The unneeded VMCNT_GET()s for this
  alone comprise about 1/3 of the VMCNT_GET()s in vm.

Bruce