From owner-freebsd-hackers@FreeBSD.ORG  Sun Jul 25 10:07:27 2010
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 320CE1065673
	for <freebsd-hackers@freebsd.org>; Sun, 25 Jul 2010 10:07:27 +0000 (UTC)
	(envelope-from avg@freebsd.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id 86A978FC21
	for <freebsd-hackers@freebsd.org>; Sun, 25 Jul 2010 10:07:26 +0000 (UTC)
Received: from porto.topspin.kiev.ua (porto-e.starpoint.kiev.ua
	[212.40.38.100])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id NAA03028;
	Sun, 25 Jul 2010 13:07:24 +0300 (EEST)
	(envelope-from avg@freebsd.org)
Received: from localhost.topspin.kiev.ua ([127.0.0.1])
	by porto.topspin.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
	id 1Ocy79-000CxD-Ou; Sun, 25 Jul 2010 13:07:23 +0300
Message-ID: <4C4C0CD9.6000002@freebsd.org>
Date: Sun, 25 Jul 2010 13:07:21 +0300
From: Andriy Gapon <avg@freebsd.org>
User-Agent: Thunderbird 2.0.0.24 (X11/20100603)
MIME-Version: 1.0
To: RW <rwmaillists@googlemail.com>
References: <4C4B4BAB.3000005@freebsd.org>
	<20100725003144.3cfead39@gumby.homeunix.com>
In-Reply-To: <20100725003144.3cfead39@gumby.homeunix.com>
X-Enigmail-Version: 0.96.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: freebsd-hackers@freebsd.org
Subject: Re: pageout question
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 25 Jul 2010 10:07:27 -0000

on 25/07/2010 02:31 RW said the following:
> On Sat, 24 Jul 2010 23:23:07 +0300
> Andriy Gapon <avg@freebsd.org> wrote:
> 
>> There is a good deal of comments in the vm_pageout.c code that imply
>> that we use a hysteresis approach to deal with low available pages
>> condition.
>>
>>
>> In general, the hysteresis, the comments and the code make sense.
>> My doubt, though, is about the block of code that is right below the
>> comment quoted above:
>> if (vm_pages_needed && !vm_page_count_min()) {
>>         if (!vm_paging_needed())
>>                 vm_pages_needed = 0;
>>         wakeup(&cnt.v_free_count);
>> }
> 
> As I understand it the hysteresis is done inside vm_pageout_scan, and
> the expectation is that one pass will typically satisfy this because the
> design aims to keep enough clean pages in the inactive queue.  

I have seen these lines in vm_pageout_scan:
/*
 * Calculate the number of pages we want to either free or move
 * to the cache.
 */
page_shortage = vm_paging_target() + addl_page_shortage_init;
...
/*
 * Compute the number of pages we want to try to move from the
 * active queue to the inactive queue.
 */
page_shortage = vm_paging_target() +
        cnt.v_inactive_target - cnt.v_inactive_count;
page_shortage += addl_page_shortage;

But I am not sure about the "clean pages in the inactive queue" part.
As far as I can see in the code, pagedaemon only tries to maintain a certain
number of pages on the inactive queue - I am speaking about vm_pageout_page_stats().
But I do not see any code ensuring a level of _clean_ inactive pages.
And, if I am not mistaken, there is not even a guarantee that those pages will
not be re-activated when pagedaemon actually scans them.

> I'm not sure if the vm_paging_needed() call is correct or not, but it
> may be that the intent is to avoid immediately going back to a
> depleted inactive queue when cache+free is within normal bounds,
> because it could result in avoidable paging to swap. 

Well, OTOH, if the current pass results in many pages being re-activated and
many pages still left on the inactive queue because they are dirty (see
maxlaunder in vm_pageout_scan), then it is premature to quit paging when we have
only reached the bare minimum of available pages (see pass and maxlaunder again).
IMHO, of course.


As a side discussion, I wonder if the current setting of v_inactive_target is
adequate.  It "feels" like it should be bigger.

-- 
Andriy Gapon

From owner-freebsd-hackers@FreeBSD.ORG  Sun Jul 25 10:20:24 2010
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 0C64C1065670
	for <hackers@freebsd.org>; Sun, 25 Jul 2010 10:20:24 +0000 (UTC)
	(envelope-from culot@0xd0.org)
Received: from 0xd0.org (ks28346.kimsufi.com [91.121.92.146])
	by mx1.freebsd.org (Postfix) with ESMTP id BA48C8FC14
	for <hackers@freebsd.org>; Sun, 25 Jul 2010 10:20:23 +0000 (UTC)
Received: from 0xd0.org (doudou.0xd0.org [172.16.0.254])
	by 0xd0.org (8.14.4/8.14.4) with ESMTP id o6P9davv015960;
	Sun, 25 Jul 2010 11:39:36 +0200 (CEST) (envelope-from culot@0xd0.org)
Received: (from culot@localhost)
	by 0xd0.org (8.14.4/8.14.4/Submit) id o6P9datc015959;
	Sun, 25 Jul 2010 11:39:36 +0200 (CEST) (envelope-from culot)
Date: Sun, 25 Jul 2010 11:39:35 +0200
From: Frederic Culot <frederic@culot.org>
To: hackers@freebsd.org
Message-ID: <20100725093935.GC1917@culot.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
X-PGP-Key: http://culot.org/public/pgp-key.txt
User-Agent: Mutt/1.5.20 (2009-06-14)
X-Mailman-Approved-At: Sun, 25 Jul 2010 11:49:29 +0000
Cc: 
Subject: lint(1) improvements from OpenBSD
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 25 Jul 2010 10:20:24 -0000

Hi,

I noticed on the FreeBSD list of projects and ideas an item related to
lint(1) and the port of improvements from the OpenBSD project:

http://www.freebsd.org/projects/ideas/ideas.html#p-lint

I would like to know more about this project but unfortunately no technical
contact was specified on the web page, hence I write to the hackers list.

Does someone have more information related to this project (what improvements
does the text refer to)?
Has someone started working on it?

Thanks,
Frederic

--
mail: frederic@culot.org
 web: http://culot.org

From owner-freebsd-hackers@FreeBSD.ORG  Sun Jul 25 13:41:49 2010
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 331B7106566B
	for <freebsd-hackers@freebsd.org>; Sun, 25 Jul 2010 13:41:49 +0000 (UTC)
	(envelope-from rwmaillists@googlemail.com)
Received: from mail-ww0-f50.google.com (mail-ww0-f50.google.com [74.125.82.50])
	by mx1.freebsd.org (Postfix) with ESMTP id BEA2E8FC17
	for <freebsd-hackers@freebsd.org>; Sun, 25 Jul 2010 13:41:46 +0000 (UTC)
Received: by wwe15 with SMTP id 15so5732688wwe.31
	for <freebsd-hackers@freebsd.org>; Sun, 25 Jul 2010 06:41:46 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=googlemail.com; s=gamma;
	h=domainkey-signature:received:received:date:from:to:subject
	:message-id:in-reply-to:references:x-mailer:mime-version
	:content-type:content-transfer-encoding;
	bh=+MN1Bej/vBcNquCWcQC9tvuSvpsR5gBKGfI67C4dp1o=;
	b=HxNUBxrR69lewU4aZokBlpvh48g0nerwzOzWM6OxcqaHgYV/CMRRR0ozOkdTVZAfgr
	px+8LtyQUIP6sekxM3EQGKgxaVibhf2B/0kSa9HbjH8eY33vATftNOLclFsrpYrRDSip
	LInGu6PTLcyjOsEdo8fR/kAXHhas3TCuXgfAs=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=googlemail.com; s=gamma;
	h=date:from:to:subject:message-id:in-reply-to:references:x-mailer
	:mime-version:content-type:content-transfer-encoding;
	b=uPxzYdcLnSXb17lQdgSa8FpqgRGjIXvLtjhYCGOGDzqN/bkq8RtfGmgyN97uNHWe/u
	K2NHvcB3GyNYu5dLW2O3T2BydQBCFTJhocrpRcJRN/XHejeeECIP81E6WVmrGmYbLq1O
	O2KT6h1vxA61DxkTv2sLWliL+/oXyXJFoElVw=
Received: by 10.227.140.154 with SMTP id i26mr5950534wbu.199.1280065305903;
	Sun, 25 Jul 2010 06:41:45 -0700 (PDT)
Received: from gumby.homeunix.com (bb-87-81-140-128.ukonline.co.uk
	[87.81.140.128])
	by mx.google.com with ESMTPS id e31sm2151183wbe.5.2010.07.25.06.41.44
	(version=SSLv3 cipher=RC4-MD5); Sun, 25 Jul 2010 06:41:45 -0700 (PDT)
Date: Sun, 25 Jul 2010 14:41:41 +0100
From: RW <rwmaillists@googlemail.com>
To: freebsd-hackers@freebsd.org
Message-ID: <20100725144141.6f1f33cc@gumby.homeunix.com>
In-Reply-To: <4C4C0CD9.6000002@freebsd.org>
References: <4C4B4BAB.3000005@freebsd.org>
	<20100725003144.3cfead39@gumby.homeunix.com>
	<4C4C0CD9.6000002@freebsd.org>
X-Mailer: Claws Mail 3.7.6 (GTK+ 2.20.1; i386-portbld-freebsd8.0)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Subject: Re: pageout question
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 25 Jul 2010 13:41:49 -0000

On Sun, 25 Jul 2010 13:07:21 +0300
Andriy Gapon <avg@freebsd.org> wrote:

> on 25/07/2010 02:31 RW said the following:

> > As I understand it the hysteresis is done inside vm_pageout_scan,
> > and the expectation is that one pass will typically satisfy this
> > because the design aims to keep enough clean pages in the inactive
> > queue.  
> 

> But I am not sure about "clean pages in the inactive queue" ... But I
> do not see any code ensuring level of _clean_ inactive pages. 

In FreeBSD the inactive queue contains disk cache pages which normally
provide most of the clean pages needed. In addition pages are dribbled
out to swap, and the resulting clean pages are placed at the back of
the inactive queue to make another pass. 

> 
> > I'm not sure if the vm_paging_needed() call is correct or not, but
> > it may be that the intent is to avoid immediately going back
> > to a depleted inactive queue when cache+free is within normal
> > bounds, because it could result in avoidable paging to swap. 
> 
> Well, OTOH, if the current pass results in many pages being
> re-activated and many pages still left on the inactive queue because
> they are dirty (see maxlaunder in vm_pageout_scan), 

Dirty pages make three passes through the inactive queue: twice dirty,
once clean. They are paged out at the end of the second pass, so it's
unlikely that they are reactivated except under very heavy thrashing. 

> then it is
> premature to quit paging when we only reached bare minimum of
> available pages (see pass and maxlaunder again).  IMHO, of course.

It's not the bare minimum, that's another level that vm_page_count_min()
tests for.

From owner-freebsd-hackers@FreeBSD.ORG  Sun Jul 25 14:19:46 2010
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E3F29106567C
	for <freebsd-hackers@freebsd.org>; Sun, 25 Jul 2010 14:19:46 +0000 (UTC)
	(envelope-from avg@freebsd.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id 5A1E38FC17
	for <freebsd-hackers@freebsd.org>; Sun, 25 Jul 2010 14:19:45 +0000 (UTC)
Received: from porto.topspin.kiev.ua (porto-e.starpoint.kiev.ua
	[212.40.38.100])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id RAA05221;
	Sun, 25 Jul 2010 17:19:43 +0300 (EEST)
	(envelope-from avg@freebsd.org)
Received: from localhost.topspin.kiev.ua ([127.0.0.1])
	by porto.topspin.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
	id 1Od23L-000DCN-8E; Sun, 25 Jul 2010 17:19:43 +0300
Message-ID: <4C4C47FD.6080802@freebsd.org>
Date: Sun, 25 Jul 2010 17:19:41 +0300
From: Andriy Gapon <avg@freebsd.org>
User-Agent: Thunderbird 2.0.0.24 (X11/20100603)
MIME-Version: 1.0
To: RW <rwmaillists@googlemail.com>
References: <4C4B4BAB.3000005@freebsd.org>	<20100725003144.3cfead39@gumby.homeunix.com>	<4C4C0CD9.6000002@freebsd.org>
	<20100725144141.6f1f33cc@gumby.homeunix.com>
In-Reply-To: <20100725144141.6f1f33cc@gumby.homeunix.com>
X-Enigmail-Version: 0.96.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: freebsd-hackers@freebsd.org
Subject: Re: pageout question
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 25 Jul 2010 14:19:47 -0000

on 25/07/2010 16:41 RW said the following:
> On Sun, 25 Jul 2010 13:07:21 +0300
> Andriy Gapon <avg@freebsd.org> wrote:
> 
>> on 25/07/2010 02:31 RW said the following:
> 
>>> As I understand it the hysteresis is done inside vm_pageout_scan,
>>> and the expectation is that one pass will typically satisfy this
>>> because the design aims to keep enough clean pages in the inactive
>>> queue.  
> 
>> But I am not sure about "clean pages in the inactive queue" ... But I
>> do not see any code ensuring level of _clean_ inactive pages. 
> 
> In FreeBSD the inactive queue contains disk cache pages which normally
> provide most of the clean pages needed. In addition pages are dribbled
> out to swap, and the resulting clean pages are placed at the back of
> the inactive queue to make another pass. 

Well, "normally" and "most" are not quite quantitative.
Personally, I do not see any guarantee that the inactive queue would contain
enough clean pages to reach the paging target in a single pass.

>>> I'm not sure if the vm_paging_needed() call is correct or not, but
>>> it may be that the intent is to avoid immediately going back
>>> to a depleted inactive queue when cache+free is within normal
>>> bounds, because it could result in avoidable paging to swap. 
>> Well, OTOH, if the current pass results in many pages being
>> re-activated and many pages still left on the inactive queue because
>> they are dirty (see maxlaunder in vm_pageout_scan), 
> 
> Dirty pages make three passes through the inactive queue: twice dirty,
> once clean. They are paged out at the end of the second pass, so it's
> unlikely that they are reactivated except under very heavy thrashing. 

I didn't mean to say that dirty pages would get re-activated.
Clean pages can perfectly well be re-activated if they have been referenced
since they were de-activated.

>> then it is
>> premature to quit paging when we only reached bare minimum of
>> available pages (see pass and maxlaunder again).  IMHO, of course.
> 
> It's not the bare minimum, that's another level that vm_page_count_min()
> tests for.

I meant the bare minimum to stop paging, that is, going above the lower
watermark of the paging hysteresis.

-- 
Andriy Gapon

From owner-freebsd-hackers@FreeBSD.ORG  Sun Jul 25 20:28:54 2010
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 839071065677
	for <freebsd-hackers@freebsd.org>; Sun, 25 Jul 2010 20:28:54 +0000 (UTC)
	(envelope-from rwmaillists@googlemail.com)
Received: from mail-ww0-f42.google.com (mail-ww0-f42.google.com [74.125.82.42])
	by mx1.freebsd.org (Postfix) with ESMTP id 450648FC13
	for <freebsd-hackers@freebsd.org>; Sun, 25 Jul 2010 20:28:53 +0000 (UTC)
Received: by wwf26 with SMTP id 26so2268976wwf.1
	for <freebsd-hackers@freebsd.org>; Sun, 25 Jul 2010 13:28:53 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=googlemail.com; s=gamma;
	h=domainkey-signature:received:received:date:from:to:subject
	:message-id:in-reply-to:references:x-mailer:mime-version
	:content-type:content-transfer-encoding;
	bh=xiKsSYWPKBjVqgiw15IxND+oktrVcFIGgbQe6f3FOgE=;
	b=ngAELpadb3V2LUS/r/D64X4MXF02oWR2EGOEhT5bGDQtC9tP6eVnZf9OVfTb6qC7CL
	+aUCZC9OjnT77vJGONK/g+hdkkF3Dchu+MZmjD3zVqnCSTuz9PNK++SsSO2b6NMlDeq7
	d9oWX4klKdQPOPcOhL4NdMLLKvsKz5daf14JE=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=googlemail.com; s=gamma;
	h=date:from:to:subject:message-id:in-reply-to:references:x-mailer
	:mime-version:content-type:content-transfer-encoding;
	b=cMlivFKKM3Xl9B98y2bkG+t0sRMTrJWhbPAKEFqV7ncgFiTAGW6vZWuXHzafxgb+3W
	nfaLFCWkhS1QKy48i1wkW7aCjy1EsOtMWlj3EnrGN2SQ/jDNXcMn6iNfS+Bvhr+IRDHQ
	W1N1my0v7jwzoWV8nOJ5fw2m4ZHBFQ5Nb6q3M=
Received: by 10.227.146.147 with SMTP id h19mr6344713wbv.222.1280089732914;
	Sun, 25 Jul 2010 13:28:52 -0700 (PDT)
Received: from gumby.homeunix.com (bb-87-81-140-128.ukonline.co.uk
	[87.81.140.128])
	by mx.google.com with ESMTPS id e31sm2397228wbe.23.2010.07.25.13.28.51
	(version=SSLv3 cipher=RC4-MD5); Sun, 25 Jul 2010 13:28:52 -0700 (PDT)
Date: Sun, 25 Jul 2010 21:28:49 +0100
From: RW <rwmaillists@googlemail.com>
To: freebsd-hackers@freebsd.org
Message-ID: <20100725212849.1e07f40c@gumby.homeunix.com>
In-Reply-To: <4C4C47FD.6080802@freebsd.org>
References: <4C4B4BAB.3000005@freebsd.org>
	<20100725003144.3cfead39@gumby.homeunix.com>
	<4C4C0CD9.6000002@freebsd.org>
	<20100725144141.6f1f33cc@gumby.homeunix.com>
	<4C4C47FD.6080802@freebsd.org>
X-Mailer: Claws Mail 3.7.6 (GTK+ 2.20.1; i386-portbld-freebsd8.0)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Subject: Re: pageout question
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 25 Jul 2010 20:28:54 -0000

On Sun, 25 Jul 2010 17:19:41 +0300
Andriy Gapon <avg@freebsd.org> wrote:

> on 25/07/2010 16:41 RW said the following:

> > In FreeBSD the inactive queue contains disk cache pages which
> > normally provide most of the clean pages needed. In addition pages
> > are dribbled out to swap, and the resulting clean pages are placed
> > at the back of the inactive queue to make another pass. 
> 
> Well, "normally" and "most" are not quite quantitative.
> Personally, I do not see any guarantees that inactive queue would
> contain enough clean pages to reach paging target on a single pass.

I didn't say it was guaranteed. I just think the scenario where
a first pass ends up between the watermarks is rare. And when it
happens I don't see a compelling reason to do extra paging to reach an
arbitrary target.

I think the comment about not clearing vm_pages_needed is referring to
clearing it below the low-watermark because the daemon would then get
woken-up almost immediately.

> I meant bare minimum to stop paging, that is, going above lower
> watermark of the paging hysteresis.
 

From owner-freebsd-hackers@FreeBSD.ORG  Sun Jul 25 20:43:13 2010
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id DE979106564A
	for <freebsd-hackers@freebsd.org>; Sun, 25 Jul 2010 20:43:13 +0000 (UTC)
	(envelope-from avg@freebsd.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id 4F1038FC0A
	for <freebsd-hackers@freebsd.org>; Sun, 25 Jul 2010 20:43:12 +0000 (UTC)
Received: from porto.topspin.kiev.ua (porto-e.starpoint.kiev.ua
	[212.40.38.100])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id XAA08883;
	Sun, 25 Jul 2010 23:43:10 +0300 (EEST)
	(envelope-from avg@freebsd.org)
Received: from localhost.topspin.kiev.ua ([127.0.0.1])
	by porto.topspin.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
	id 1Od82P-000Day-NR; Sun, 25 Jul 2010 23:43:09 +0300
Message-ID: <4C4CA1DC.2050902@freebsd.org>
Date: Sun, 25 Jul 2010 23:43:08 +0300
From: Andriy Gapon <avg@freebsd.org>
User-Agent: Thunderbird 2.0.0.24 (X11/20100603)
MIME-Version: 1.0
To: RW <rwmaillists@googlemail.com>
References: <4C4B4BAB.3000005@freebsd.org>	<20100725003144.3cfead39@gumby.homeunix.com>	<4C4C0CD9.6000002@freebsd.org>	<20100725144141.6f1f33cc@gumby.homeunix.com>	<4C4C47FD.6080802@freebsd.org>
	<20100725212849.1e07f40c@gumby.homeunix.com>
In-Reply-To: <20100725212849.1e07f40c@gumby.homeunix.com>
X-Enigmail-Version: 0.96.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: freebsd-hackers@freebsd.org
Subject: Re: pageout question
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 25 Jul 2010 20:43:13 -0000

on 25/07/2010 23:28 RW said the following:
> On Sun, 25 Jul 2010 17:19:41 +0300
> Andriy Gapon <avg@freebsd.org> wrote:
> 
>> on 25/07/2010 16:41 RW said the following:
> 
>>> In FreeBSD the inactive queue contains disk cache pages which
>>> normally provide most of the clean pages needed. In addition pages
>>> are dribbled out to swap, and the resulting clean pages are placed
>>> at the back of the inactive queue to make another pass. 
>> Well, "normally" and "most" are not quite quantitative.
>> Personally, I do not see any guarantees that inactive queue would
>> contain enough clean pages to reach paging target on a single pass.
> 
> I didn't say it was guaranteed. I just think the scenario where
> a first pass ends up between the watermarks is rare. And when it
> happens I don't see a compelling reason to do extra paging to reach an
> arbitrary target.

Well, it seems neither I nor you have data to show whether it's rare or not (and
it would greatly depend on workload too).
As to "arbitrary target" - well, that's the whole point of hysteresis-like
behavior.  We start paging also at an "arbitrary" point.
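
To make the two stopping rules we are arguing about concrete, here is a rough
sketch in C.  All names in it (struct hyst, hyst_update, early_stop_update) are
hypothetical and are not taken from vm_pageout.c; it models only a free-page
counter and two watermarks, not the real pagedaemon, and the early-stop variant
is merely how I read the block in question.

```c
#include <stdbool.h>

/*
 * Hypothetical two-watermark controller, for illustration only.
 * None of these names exist in vm_pageout.c.
 */
struct hyst {
	unsigned low;	/* start paging when free pages drop below this */
	unsigned high;	/* full hysteresis: stop once free pages reach this */
	bool paging;	/* are we currently paging? */
};

/* Full hysteresis: start below 'low', keep going until 'high' is reached. */
static bool
hyst_update(struct hyst *h, unsigned free_pages)
{
	if (!h->paging && free_pages < h->low)
		h->paging = true;
	else if (h->paging && free_pages >= h->high)
		h->paging = false;
	return (h->paging);
}

/* Early stop: start below 'low', stop as soon as we are back at 'low'. */
static bool
early_stop_update(struct hyst *h, unsigned free_pages)
{
	h->paging = (free_pages < h->low);
	return (h->paging);
}
```

With low = 100 and high = 200, a drop to 99 free pages followed by a recovery to
150 leaves hyst_update() still paging, while early_stop_update() has already
stopped.  That gap between the watermarks is exactly the region in dispute.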

-- 
Andriy Gapon

From owner-freebsd-hackers@FreeBSD.ORG  Mon Jul 26 12:19:37 2010
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 7088B1065675;
	Mon, 26 Jul 2010 12:19:37 +0000 (UTC)
	(envelope-from rwatson@FreeBSD.org)
Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42])
	by mx1.freebsd.org (Postfix) with ESMTP id 4FBFD8FC0C;
	Mon, 26 Jul 2010 12:19:37 +0000 (UTC)
Received: from fledge.watson.org (fledge.watson.org [65.122.17.41])
	by cyrus.watson.org (Postfix) with ESMTPS id CADC946B3B;
	Mon, 26 Jul 2010 08:19:36 -0400 (EDT)
Date: Mon, 26 Jul 2010 13:19:36 +0100 (BST)
From: Robert Watson <rwatson@FreeBSD.org>
X-X-Sender: robert@fledge.watson.org
To: Alexander Motin <mav@FreeBSD.org>
In-Reply-To: <4C4B720A.6020802@FreeBSD.org>
Message-ID: <alpine.BSF.2.00.1007261318380.10170@fledge.watson.org>
References: <4C4AF046.40507@FreeBSD.org>
	<A30A636F-E925-456E-8866-4E46B3BA367F@lavabit.com>
	<4C4B720A.6020802@FreeBSD.org>
User-Agent: Alpine 2.00 (BSF 1167 2008-08-23)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-hackers@freebsd.org, freebsd-performance@freebsd.org,
	Rui Paulo <rpaulo@lavabit.com>
Subject: Re: Intel TurboBoost in practice
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 26 Jul 2010 12:19:37 -0000

On Sun, 25 Jul 2010, Alexander Motin wrote:

>> The numbers that you are showing don't show much difference. Have you
>> tried buildworld?
>
> If you mean the relative difference -- as I said, it's mostly because of my
> CPU. Its maximal boost is 266MHz (8.3%), but 133MHz of that is enabled most
> of the time as long as the CPU is not overheated. It probably isn't, as it
> sits on a clear table under an air conditioner. So the maximal effect I can
> expect is 4.2%. In that situation 2.8% is probably not so bad as an
> illustration that the feature works and that there is room for further
> improvement. If I had a Core i5-750S I would expect a 33% boost.

Can I recommend the use of ministat(1) and sample sizes of at least 8 runs per 
configuration?

Robert

>
> If you mean the absolute difference, here are the results of four buildworld runs:
> hw.acpi.cpu.cx_lowest=C1: 4654.23 sec
> hw.acpi.cpu.cx_lowest=C2: 4556.37 sec
> hw.acpi.cpu.cx_lowest=C2: 4570.85 sec
> hw.acpi.cpu.cx_lowest=C1: 4679.83 sec
> Benefit is about 2.1%. Each time results were erased and sources
> pre-cached into RAM. Storage was SSD, so disk should not be an issue.
>
> -- 
> Alexander Motin

From owner-freebsd-hackers@FreeBSD.ORG  Mon Jul 26 14:12:26 2010
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id BB2C11065673;
	Mon, 26 Jul 2010 14:12:26 +0000 (UTC)
	(envelope-from mavbsd@gmail.com)
Received: from mail-fx0-f54.google.com (mail-fx0-f54.google.com
	[209.85.161.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 1BD528FC12;
	Mon, 26 Jul 2010 14:12:25 +0000 (UTC)
Received: by fxm13 with SMTP id 13so119931fxm.13
	for <multiple recipients>; Mon, 26 Jul 2010 07:12:24 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:received:received:sender:message-id:date:from
	:user-agent:mime-version:to:cc:subject:references:in-reply-to
	:x-enigmail-version:content-type:content-transfer-encoding;
	bh=G7oTVotFxTUAVpJGOpyCve/ZLseePrYacSwaTC6aNno=;
	b=QbPZspZFmIbc++j9Ib4gbVezU1T2mNbNj99c/AOMgOMOH0K5BjnWEnBYVBPJ7VxOJ+
	wLtBsL+xUntTjL0rTnDQKh3qRjIDkwGLs8MDru4PAGmPBrPm6K15ejm5lT1IMPkKnVAs
	K/mgn4DrCMuHg4SGSiLX8s0hn/V1ZqfEyUGhw=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=sender:message-id:date:from:user-agent:mime-version:to:cc:subject
	:references:in-reply-to:x-enigmail-version:content-type
	:content-transfer-encoding;
	b=Gzu6YNHAKGwISQKktLl7d04TmFpIszqfBmLdg+Y296sFpSZ0tbiMk6RHP+ZymPEhJL
	Jmrs/uPjH364K4syI3ULulkkGR57spgefllJNdIzPZaWDFZ0jJHd09UCEANPl3YLNmK/
	PdhOu3tRT3+711/SFFJhM0HlVxJrwam4Jgc7o=
Received: by 10.223.109.140 with SMTP id j12mr6505414fap.22.1280153544092;
	Mon, 26 Jul 2010 07:12:24 -0700 (PDT)
Received: from mavbook2.mavhome.dp.ua (pc.mavhome.dp.ua [212.86.226.226])
	by mx.google.com with ESMTPS id w11sm1401657fao.13.2010.07.26.07.12.21
	(version=SSLv3 cipher=RC4-MD5); Mon, 26 Jul 2010 07:12:22 -0700 (PDT)
Sender: Alexander Motin <mavbsd@gmail.com>
Message-ID: <4C4D9779.8080505@FreeBSD.org>
Date: Mon, 26 Jul 2010 17:11:05 +0300
From: Alexander Motin <mav@FreeBSD.org>
User-Agent: Thunderbird 2.0.0.23 (X11/20091212)
MIME-Version: 1.0
To: Robert Watson <rwatson@FreeBSD.org>
References: <4C4AF046.40507@FreeBSD.org>
	<A30A636F-E925-456E-8866-4E46B3BA367F@lavabit.com>
	<4C4B720A.6020802@FreeBSD.org>
	<alpine.BSF.2.00.1007261318380.10170@fledge.watson.org>
In-Reply-To: <alpine.BSF.2.00.1007261318380.10170@fledge.watson.org>
X-Enigmail-Version: 0.96.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: freebsd-hackers@freebsd.org, freebsd-performance@freebsd.org,
	Rui Paulo <rpaulo@lavabit.com>
Subject: Re: Intel TurboBoost in practice
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 26 Jul 2010 14:12:26 -0000

Robert Watson wrote:
> On Sun, 25 Jul 2010, Alexander Motin wrote:
>>> The numbers that you are showing don't show much difference. Have
>>> you tried buildworld?
>>
>> If you mean the relative difference -- as I said, it's mostly because
>> of my CPU. Its maximal boost is 266MHz (8.3%), but 133MHz of that is
>> enabled most of the time as long as the CPU is not overheated. It
>> probably isn't, as it sits on a clear table under an air conditioner.
>> So the maximal effect I can expect is 4.2%. In that situation 2.8% is
>> probably not so bad as an illustration that the feature works and that
>> there is room for further improvement. If I had a Core i5-750S I would
>> expect a 33% boost.
> 
> Can I recommend the use of ministat(1) and sample sizes of at least 8
> runs per configuration?

Thanks for pushing me to do it right. :) Here are 3*15 runs with a fresh
kernel with debugging disabled. The results are quite close to the original:
-2.73% and -2.19% of time.
x C1
+ C2
* C3
+-----------------------------------------------------------------+
|+        *                                  x                    |
|+        *                                  x                    |
|+        *                                  x                    |
|+        *                                  x                    |
|+        *                                  x                    |
|+        *                                  x                    |
|+        *                                  x                    |
|+       **                                  x                    |
|+ +     **                                 xx                    |
|+ +     ** **                              xx                   x|
|                                         |__M_A____|             |
|A|                                                               |
|        |A|                                                      |
+-----------------------------------------------------------------+
    N        Min        Max     Median           Avg        Stddev
x  15      12.68      12.84      12.69     12.698667   0.039254966
+  15      12.35      12.36      12.35     12.351333  0.0035186578
Difference at 95.0% confidence
        -0.347333 +/- 0.0208409
        -2.7352% +/- 0.164119%
        (Student's t, pooled s = 0.0278687)
*  15      12.41      12.44      12.42         12.42  0.0075592895
Difference at 95.0% confidence
        -0.278667 +/- 0.0211391
        -2.19446% +/- 0.166467%
        (Student's t, pooled s = 0.0282674)

I also checked one more aspect -- TurboBoost works only when the CPU runs
at the highest EIST frequency (P0 state). I reduced dev.cpu.0.freq from
3201 to 3067 and repeated the test:
x C1
+ C2
* C3
+-----------------------------------------------------------------+
| x                           +                              *    |
| x                           +                              *    |
| x                           +                              *    |
| x                           +                              *   *|
| x  x                        +                              *   *|
| x  x                        +  +                           *   *|
| x  x                        +  +                           *   *|
| x  x                        +  +                           *   *|
| x  x                    +   +  +   +                       *   *|
||MA|                                                             |
|                           |_MA_|                                |
|                                                            M_A_||
+-----------------------------------------------------------------+
    N        Min        Max     Median           Avg        Stddev
x  15      13.72      13.73      13.72     13.723333  0.0048795004
+  15      13.79      13.82       13.8     13.803333  0.0072374686
Difference at 95.0% confidence
        0.08 +/- 0.00461567
        0.582949% +/- 0.0336337%
        (Student's t, pooled s = 0.00617213)
*  15      13.89       13.9      13.89        13.894  0.0050709255
Difference at 95.0% confidence
        0.170667 +/- 0.00372127
        1.24362% +/- 0.0271164%
        (Student's t, pooled s = 0.00497613)

In that case using C2 or C3 predictably caused a small performance
reduction, since after falling asleep a CPU needs time to wake up. Even if
the tested CPU0 never sleeps during the test, its TLB shootdown IPIs to
other cores could probably still suffer from waiting for those cores to
wake up.

Obviously, in the first test these 0.58% and 1.24% were subtracted from
TurboBoost's maximal benefit of 4.3% on this CPU.

-- 
Alexander Motin

From owner-freebsd-hackers@FreeBSD.ORG  Mon Jul 26 17:53:56 2010
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 4B91A1065674
	for <freebsd-hackers@freebsd.org>; Mon, 26 Jul 2010 17:53:56 +0000 (UTC)
	(envelope-from rwmaillists@googlemail.com)
Received: from mail-wy0-f182.google.com (mail-wy0-f182.google.com
	[74.125.82.182])
	by mx1.freebsd.org (Postfix) with ESMTP id D09BF8FC17
	for <freebsd-hackers@freebsd.org>; Mon, 26 Jul 2010 17:53:55 +0000 (UTC)
Received: by wyj26 with SMTP id 26so2847735wyj.13
	for <freebsd-hackers@freebsd.org>; Mon, 26 Jul 2010 10:53:54 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=googlemail.com; s=gamma;
	h=domainkey-signature:received:received:date:from:to:subject
	:message-id:in-reply-to:references:x-mailer:mime-version
	:content-type:content-transfer-encoding;
	bh=GiYkcUGb5QlxxMiNTgWsntXXs7CQ8p5NUnU9HoLq+2o=;
	b=RHVgfEkrQsr3A4fGGFwt49QAgfYcX47e4LAPnlJcrPqJlC/zKpzoUfJfaaRZiaZamu
	Z6jJoPMzx5iVd7kErH/J9D9BOhkU27CMBdVKSObcUBOTLglvvv0WH2NDp6djEkkCn4hd
	9MpLs+eXKJTz37o3kaVWIQm3K0EKW5Z5QdVTU=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=googlemail.com; s=gamma;
	h=date:from:to:subject:message-id:in-reply-to:references:x-mailer
	:mime-version:content-type:content-transfer-encoding;
	b=e/bBCOjwvFxXJd1ehmvNCCm1Uwzu30OjldZhi8wILA2QgpZTW7RpzfvegyPKqZlIgf
	wWNVvUc0UbuA+M6Z2ckk4wnR6im14MGGRTxWNSpMoALfTv+pLeVI7o6JC88f4Dmpg0aJ
	68Yp2tINY30cVh62vFnSF2uvvYQDmbwITyy3k=
Received: by 10.227.144.129 with SMTP id z1mr7691304wbu.85.1280166834752;
	Mon, 26 Jul 2010 10:53:54 -0700 (PDT)
Received: from gumby.homeunix.com (bb-87-81-140-128.ukonline.co.uk
	[87.81.140.128])
	by mx.google.com with ESMTPS id l6sm2105804wed.25.2010.07.26.10.53.49
	(version=SSLv3 cipher=RC4-MD5); Mon, 26 Jul 2010 10:53:51 -0700 (PDT)
Date: Mon, 26 Jul 2010 18:53:48 +0100
From: RW <rwmaillists@googlemail.com>
To: freebsd-hackers@freebsd.org
Message-ID: <20100726185348.63ebf916@gumby.homeunix.com>
In-Reply-To: <4C4CA1DC.2050902@freebsd.org>
References: <4C4B4BAB.3000005@freebsd.org>
	<20100725003144.3cfead39@gumby.homeunix.com>
	<4C4C0CD9.6000002@freebsd.org>
	<20100725144141.6f1f33cc@gumby.homeunix.com>
	<4C4C47FD.6080802@freebsd.org>
	<20100725212849.1e07f40c@gumby.homeunix.com>
	<4C4CA1DC.2050902@freebsd.org>
X-Mailer: Claws Mail 3.7.6 (GTK+ 2.20.1; i386-portbld-freebsd8.0)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Subject: Re: pageout question
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 26 Jul 2010 17:53:56 -0000

On Sun, 25 Jul 2010 23:43:08 +0300
Andriy Gapon <avg@freebsd.org> wrote:

> on 25/07/2010 23:28 RW said the following:

> > I didn't say it say it was guaranteed. I just think the scenario
> > where a first pass ends up between the watermarks is rare. And when
> > it happens I don't see a compelling reason to do extra paging to
> > reach an arbitrary target.
> 
> Well, it seems neither I nor you have data to show whether it's rare
> or not (and it would greatly depend on workload too).
> As to "arbitrary target" - well, that's the whole point of
> hysteresis-like behavior.  We start paging also at an "arbitrary"
> point.


If after the first pass with light-paging the high watermark isn't
reached then the choices are

1) loop and immediately do a heavy-paging pass.

2) wait and let the daemon get woken-up for another light-paging pass -
only go to heavy-paging when this strategy isn't keeping up with demand.

To me (2) is doing the right thing. It's trying to satisfy demand from
existing clean pages, and only paging heavily as a last resort.

From owner-freebsd-hackers@FreeBSD.ORG  Mon Jul 26 19:00:33 2010
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id C32171065676
	for <freebsd-hackers@freebsd.org>; Mon, 26 Jul 2010 19:00:33 +0000 (UTC)
	(envelope-from avg@freebsd.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id 0F7CF8FC19
	for <freebsd-hackers@freebsd.org>; Mon, 26 Jul 2010 19:00:32 +0000 (UTC)
Received: from porto.topspin.kiev.ua (porto-e.starpoint.kiev.ua
	[212.40.38.100])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id WAA29272;
	Mon, 26 Jul 2010 22:00:29 +0300 (EEST)
	(envelope-from avg@freebsd.org)
Received: from localhost.topspin.kiev.ua ([127.0.0.1])
	by porto.topspin.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
	id 1OdSua-000H6x-NS; Mon, 26 Jul 2010 22:00:29 +0300
Message-ID: <4C4DDB4B.9000307@freebsd.org>
Date: Mon, 26 Jul 2010 22:00:27 +0300
From: Andriy Gapon <avg@freebsd.org>
User-Agent: Thunderbird 2.0.0.24 (X11/20100603)
MIME-Version: 1.0
To: RW <rwmaillists@googlemail.com>, freebsd-hackers@freebsd.org
References: <4C4B4BAB.3000005@freebsd.org>	<20100725003144.3cfead39@gumby.homeunix.com>	<4C4C0CD9.6000002@freebsd.org>	<20100725144141.6f1f33cc@gumby.homeunix.com>	<4C4C47FD.6080802@freebsd.org>
	<20100725212849.1e07f40c@gumby.homeunix.com>
	<4C4CA1DC.2050902@freebsd.org>
In-Reply-To: <4C4CA1DC.2050902@freebsd.org>
X-Enigmail-Version: 0.96.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: 
Subject: Re: pageout question
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 26 Jul 2010 19:00:33 -0000

on 25/07/2010 23:43 Andriy Gapon said the following:
> on 25/07/2010 23:28 RW said the following:
>> I didn't say it say it was guaranteed. I just think the scenario where
>> a first pass ends up between the watermarks is rare. And when it
>> happens I don't see a compelling reason to do extra paging to reach an
>> arbitrary target.
> 
> Well, it seems neither I nor you have data to show whether it's rare or not (and
> it would greatly depend on workload too).
> As to "arbitrary target" - well, that's the whole point of hysteresis-like
> behavior.  We start paging also at an "arbitrary" point.

Well, it seems that you are right (at least to a certain degree) - with
"moderately high" memory load (starting lots of memory hungry "real"
applications and not letting them sit idle) a single pass was always sufficient.
Even with my suggested change! :-)  I.e. that single pass was always able to
shoot to or over the high watermark.
So, in fact, there is not much (any?) difference between current code and
patched code in this case.

But not quite so with stress2 swap test.
In that case more than one pass was needed in almost all the cases.  Again, this
is with patched vm_pageout().

Which brings up another interesting point that was overlooked initially.
The vm_pageout() loop can make at most two passes back-to-back; after that it
slows down to make an additional pass every 1/2 second:
if (vm_pages_needed) {
        /*
         * Still not done, take a second pass without waiting
         * (unlimited dirty cleaning), otherwise sleep a bit
         * and try again.
         */
        ++pass;
        if (pass > 1)
                msleep(&vm_pages_needed,
                    &vm_page_queue_free_mtx, PVM, "psleep",
                    hz / 2);
} else {

With the patched code and stress2 I indeed observed pagedaemon spending time in
this sleep.

On the other hand, current unpatched code is more optimistic about calling it
done.  So even if only a handful of pages is freed and available memory goes
just above low watermark, pagedaemon would decide that it had a successful pass
and would reset pass count to zero.  Those freed pages would, of course, get
consumed immediately and a new pass would be requested.  Since the history is
lost at this point, there would be no rate limit for the new pass.

So my _theory_ is that in very harsh conditions doing true hysteresis would
result in many _accounted_ passes and thus throttled down pagedaemon.  On the
other hand, the current code would still do many passes because of the constant
memory pressure, but they will be (mostly) unaccounted and thus pagedaemon would
be scanning pages 'like crazy'.

In other words: with current code available page count would rapidly oscillate
around low watermark, while with patched code available page count would mostly
stay low.

Not sure which one is better.  But for me, in such extreme conditions,  slowing
things down sounds better than spinning pagedaemon.

P.S.
Just in case, I would like to point out that the patch doesn't change the
condition under which the waiters are notified about available memory - it is
still !vm_page_count_min().  The patch only changes when vm_pages_needed is
reset.  This is kind of obvious, but I decided to make it explicit.
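
To make the accounting argument concrete, here is a toy model in C of what
happens after each pagedaemon pass.  It is only a sketch and not the actual
vm_pageout() code: the names pd_state, pd_policy and pd_after_pass are
invented for illustration, and the "current" and "patched" behaviours are
reduced to comparing a free page count against the low or the high
watermark, respectively.

```c
#include <stdbool.h>

/* Toy model of the pass accounting discussed above; not kernel code. */
struct pd_state {
	int	pass;		/* consecutive passes that missed the target */
	bool	pages_needed;	/* stands in for vm_pages_needed */
};

enum pd_policy {
	PD_CURRENT,		/* done once above the low watermark */
	PD_PATCHED		/* done only at the high watermark */
};

/*
 * Account for one finished pass.  Returns true if the daemon would
 * sleep (hz / 2 in the excerpt above) before starting the next pass.
 */
bool
pd_after_pass(struct pd_state *st, enum pd_policy policy,
    long free_pages, long low_wm, long high_wm)
{
	long target = (policy == PD_PATCHED) ? high_wm : low_wm;

	if (free_pages >= target) {
		/* Pass counted as successful: the history is lost. */
		st->pages_needed = false;
		st->pass = 0;
		return (false);
	}
	/* Still short of the target: the pass is accounted. */
	st->pages_needed = true;
	++st->pass;
	return (st->pass > 1);
}
```

With the free page count sitting between the two watermarks, PD_CURRENT
resets the pass count on every call and never reports a sleep, while with
PD_PATCHED the second consecutive call does.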

-- 
Andriy Gapon

From owner-freebsd-hackers@FreeBSD.ORG  Mon Jul 26 19:59:35 2010
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id D62DF106564A
	for <freebsd-hackers@freebsd.org>; Mon, 26 Jul 2010 19:59:35 +0000 (UTC)
	(envelope-from avg@freebsd.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id 282C08FC15
	for <freebsd-hackers@freebsd.org>; Mon, 26 Jul 2010 19:59:34 +0000 (UTC)
Received: from porto.topspin.kiev.ua (porto-e.starpoint.kiev.ua
	[212.40.38.100])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id WAA29983;
	Mon, 26 Jul 2010 22:59:32 +0300 (EEST)
	(envelope-from avg@freebsd.org)
Received: from localhost.topspin.kiev.ua ([127.0.0.1])
	by porto.topspin.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
	id 1OdTpk-000HAo-HB; Mon, 26 Jul 2010 22:59:32 +0300
Message-ID: <4C4DE923.5030307@freebsd.org>
Date: Mon, 26 Jul 2010 22:59:31 +0300
From: Andriy Gapon <avg@freebsd.org>
User-Agent: Thunderbird 2.0.0.24 (X11/20100603)
MIME-Version: 1.0
To: RW <rwmaillists@googlemail.com>
References: <4C4B4BAB.3000005@freebsd.org>	<20100725003144.3cfead39@gumby.homeunix.com>	<4C4C0CD9.6000002@freebsd.org>	<20100725144141.6f1f33cc@gumby.homeunix.com>	<4C4C47FD.6080802@freebsd.org>	<20100725212849.1e07f40c@gumby.homeunix.com>	<4C4CA1DC.2050902@freebsd.org>
	<20100726185348.63ebf916@gumby.homeunix.com>
In-Reply-To: <20100726185348.63ebf916@gumby.homeunix.com>
X-Enigmail-Version: 0.96.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: freebsd-hackers@freebsd.org
Subject: Re: pageout question
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 26 Jul 2010 19:59:35 -0000

on 26/07/2010 20:53 RW said the following:
> If after the first pass with light-paging the high watermark isn't
> reached then the choices are
> 
> 1) loop and immediately do a heavy-paging pass.
> 
> 2) wait and let the daemon get woken-up for another light-paging pass -
> only go to heavy-paging when this strategy isn't keeping up with demand.
> 
> To me (2) is doing the right thing. It's trying to satisfy  demand from
> existing clean pages, and only paging heavily as a last resort. 

Well, based on my observations, if the first pass doesn't reach the high
watermark, then we are in a high pressure situation and so we would have to do
some heavy-lifting anyways.  In my opinion, it's better to start doing more work
at once than to pretend that the situation would somehow resolve itself.

-- 
Andriy Gapon

From owner-freebsd-hackers@FreeBSD.ORG  Tue Jul 27 00:53:34 2010
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id BB3821065674
	for <hackers@freebsd.org>; Tue, 27 Jul 2010 00:53:34 +0000 (UTC)
	(envelope-from delphij@delphij.net)
Received: from tarsier.geekcn.org (tarsier.geekcn.org [IPv6:2001:470:a803::1])
	by mx1.freebsd.org (Postfix) with ESMTP id 646938FC14
	for <hackers@freebsd.org>; Tue, 27 Jul 2010 00:53:34 +0000 (UTC)
Received: from mail.geekcn.org (tarsier.geekcn.org [211.166.10.233])
	by tarsier.geekcn.org (Postfix) with ESMTP id 673FBA5FA72;
	Tue, 27 Jul 2010 08:53:33 +0800 (CST)
X-Virus-Scanned: amavisd-new at geekcn.org
Received: from tarsier.geekcn.org ([211.166.10.233])
	by mail.geekcn.org (mail.geekcn.org [211.166.10.233]) (amavisd-new,
	port 10024)
	with LMTP id kbCuxVA79dsK; Tue, 27 Jul 2010 08:53:26 +0800 (CST)
Received: from delta.delphij.net (drawbridge.ixsystems.com [206.40.55.65])
	(using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
	(No client certificate requested)
	by tarsier.geekcn.org (Postfix) with ESMTPSA id 61CFDA5FB1C;
	Tue, 27 Jul 2010 08:53:24 +0800 (CST)
DomainKey-Signature: a=rsa-sha1; s=default; d=delphij.net; c=nofws; q=dns;
	h=message-id:date:from:reply-to:organization:user-agent:
	mime-version:to:cc:subject:references:in-reply-to:
	x-enigmail-version:openpgp:content-type:content-transfer-encoding;
	b=r3V9044kgeX9ZIBwGF8yBD5x32PcXJ218wCKskt3adQsJ6GIjGlV6RG57ovEUN+xY
	WWzN+DvOzzNmbtKCdFp2A==
Message-ID: <4C4E2DFF.1010203@delphij.net>
Date: Mon, 26 Jul 2010 17:53:19 -0700
From: Xin LI <delphij@delphij.net>
Organization: The Geek China Organization
User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US;
	rv:1.9.1.11) Gecko/20100721 Thunderbird/3.0.6 ThunderBrowse/3.3.1
MIME-Version: 1.0
To: Frederic Culot <frederic@culot.org>
References: <20100725093935.GC1917@culot.org>
In-Reply-To: <20100725093935.GC1917@culot.org>
X-Enigmail-Version: 1.0.1
OpenPGP: id=3FCA37C1;
	url=http://www.delphij.net/delphij.asc
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: hackers@freebsd.org
Subject: Re: lint(1) improvements from OpenBSD
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: d@delphij.net
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 27 Jul 2010 00:53:34 -0000

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On 2010/07/25 02:39, Frederic Culot wrote:
> Hi,
> 
> I noticed on the the FreeBSD list of projects and ideas an item related to
> lint(1) and the port of improvements from the OpenBSD project:
> 
> http://www.freebsd.org/projects/ideas/ideas.html#p-lint
> 
> I would like to know more about this project but unfortunately no technical
> contact was specified on the web page, hence I write to the hackers list.
> 
> Does someone have more information related to this project (what improvements
> does the text refer to)?
> Has someone started working on it?

I think it's talking about OpenBSD's xlint (src/usr.bin/xlint).

No I am not aware of anyone working on this.

Cheers,
- -- 
Xin LI <delphij@delphij.net>	http://www.delphij.net/
FreeBSD - The Power to Serve!	       Live free or die
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.16 (FreeBSD)

iQEcBAEBCAAGBQJMTi3/AAoJEATO+BI/yjfBxMAIAK5Hz21ipEJFMao1U0BXUEun
WGofq+cokgXYA94JsfOrl/KmwwaEetZVp21Gc1yyL+Kp4ZYvzpv+eEzdm98TH5rv
wHJp298j/hs0gxkrDP2XqnIrjd+YCuJg19CbZ7rEC6SeuAJ4mEJR1DW6dpmM7TSa
lZnGgTnZp6SMUY2knU2GQfQjd+f0IXP370ksjSF3CPMwaKHzKoCLLWHR9uBacGjb
QLPU4AvmExxfTa6icsfCVNNcIeFdq6653Hq9HJdsvGbkX623PMxzcG/BfeIETDUo
/zwOnx1Pp27cpvVNf7K6tqt2aNZlr2Fjxq9mz4hy6yAnVmJiqX2vz1Z2jAN6lrw=
=YWXj
-----END PGP SIGNATURE-----

From owner-freebsd-hackers@FreeBSD.ORG  Tue Jul 27 15:54:07 2010
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 38DD41065674
	for <freebsd-hackers@freebsd.org>; Tue, 27 Jul 2010 15:54:07 +0000 (UTC)
	(envelope-from kraduk@googlemail.com)
Received: from mail-fx0-f54.google.com (mail-fx0-f54.google.com
	[209.85.161.54])
	by mx1.freebsd.org (Postfix) with ESMTP id C0C7C8FC0C
	for <freebsd-hackers@freebsd.org>; Tue, 27 Jul 2010 15:54:06 +0000 (UTC)
Received: by fxm13 with SMTP id 13so732719fxm.13
	for <freebsd-hackers@freebsd.org>; Tue, 27 Jul 2010 08:54:05 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=googlemail.com; s=gamma;
	h=domainkey-signature:mime-version:received:received:date:message-id
	:subject:from:to:content-type;
	bh=wrObfz+kTJ4eg0jG+yJnvmK9b+X3/OR9JUotbo3rDrA=;
	b=hi5IeyZjJah0s4lsVaneNUh3L3+KgYV/mOsJl0ErlKPlBMmYHKDo4KQTFxd9imiNMv
	p58y/MXEeAZFQvbExhDBtIkQ5O/+lvTf+h1EfTTOEVz3qRXDUCpTJMDls599Q+2wNI5f
	09wTSNSkIy916IMEKt72/lOBcyaZF3lMjF3ks=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=googlemail.com; s=gamma;
	h=mime-version:date:message-id:subject:from:to:content-type;
	b=nH5441msTxi91qqN03ocfPATABxRJpJkDkwKP2cWvU8jsM2AUKMB3JdGI1Mv7uxYF1
	uSPw+BTgRXbmy9TcCS50DPqzpWZcySXVoIOxU8gL9EQxhJtzYqxJ5199Bmj5OdmfraIQ
	Zc8gDhMsVLwyXp2MTZuNKevVA4qGVgkqNvid4=
MIME-Version: 1.0
Received: by 10.239.188.19 with SMTP id n19mr553838hbh.154.1280244560784; Tue, 
	27 Jul 2010 08:29:20 -0700 (PDT)
Received: by 10.239.160.201 with HTTP; Tue, 27 Jul 2010 08:29:20 -0700 (PDT)
Date: Tue, 27 Jul 2010 16:29:20 +0100
Message-ID: <AANLkTinUVKByfTX+f9DOQ97jh43VPVSug_=BDpJ9PB0z@mail.gmail.com>
From: krad <kraduk@googlemail.com>
To: freebsd-hackers@freebsd.org, 
	FreeBSD Questions <freebsd-questions@freebsd.org>
X-Mailman-Approved-At: Tue, 27 Jul 2010 16:13:31 +0000
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: 
Subject: possible NFS lockups
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 27 Jul 2010 15:54:07 -0000

I have a production mail system with an NFS backend. Every now and again we
see NFS die on a particular head end. However, it doesn't die across all
the nodes. This suggests to me there isn't an issue with the filer itself, and
the stats from the filer concur with that.

The symptoms are lines like this appearing in dmesg:

nfs server 10.44.17.138:/vol/vol1/mail: not responding
nfs server 10.44.17.138:/vol/vol1/mail: is alive again

Trussing df, it seems to hang on getfsstat; this is presumably when it tries
the NFS mounts, e.g.:

__sysctl(0xbfbfe224,0x2,0xbfbfe22c,0xbfbfe230,0x0,0x0) = 0 (0x0)
mmap(0x0,1048576,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 1746583552 (0x681ac000)
mmap(0x682ac000,344064,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 1747632128 (0x682ac000)
munmap(0x681ac000,344064)                        = 0 (0x0)
getfsstat(0x68201000,0x1270,0x2,0xbfbfe960,0xbfbfe95c,0x1) = 9 (0x9)


I have played with mount options a fair bit but they don't make much
difference. This is what they are set to at present:

10.44.17.138:/vol/vol1/mail     /mail/0 nfs rw,noatime,tcp,acdirmax=320,acdirmin=180,acregmax=320,acregmin=180 0       0

When this locking is occurring I find that if I run showmount, or mount
10.44.17.138:/vol/vol1/mail again under another mount point, I can access it
fine.

One thing I have just noticed is that lockd and statd always seem to have
died when this happens. Restarting does not help.


I find all this a bit perplexing. Can anyone offer any help as to why this
might be happening? I have DTrace compiled into the kernel if that could
help with debugging.

From owner-freebsd-hackers@FreeBSD.ORG  Tue Jul 27 17:17:47 2010
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 7D97C1065677;
	Tue, 27 Jul 2010 17:17:47 +0000 (UTC)
	(envelope-from alan.l.cox@gmail.com)
Received: from mail-pw0-f54.google.com (mail-pw0-f54.google.com
	[209.85.160.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 38DDB8FC16;
	Tue, 27 Jul 2010 17:17:46 +0000 (UTC)
Received: by pwj9 with SMTP id 9so662965pwj.13
	for <multiple recipients>; Tue, 27 Jul 2010 10:17:46 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:received:received:reply-to
	:in-reply-to:references:date:message-id:subject:from:to:cc
	:content-type; bh=m7CoIsuMyLOGGjrWguBYVDJhpHS83sS73rWb1ddug7s=;
	b=uBqLvIs9m4IAldxXS69QRiaPxKZBoDe9s8cZwr3w84kWlqTGR/rrwJf40O9YP8hDHw
	6uU/PIQY4xkbbIWZmnDFtLPjD3N51ZUyjvxEDWJA0o+d5zXMzBVXMhuaoHrkIYiJt0/x
	JLmDMAF9PUdC8/H7VTDvxHk1rK7vdqrer5RNc=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:reply-to:in-reply-to:references:date:message-id
	:subject:from:to:cc:content-type;
	b=u4m9oDheAKwSwbolgN7j8/BkPGuymmHM4m3KcCBqrbs3fcH0Wo4gkt77jQmqH/hBvU
	9Qfpw9t5mAlImVtz6Evu3IsoGv+H0zY9CLYcaBArXdkyAJFHVyIcu86eQwl1x6l8EV34
	UbF1xNJAixWVbwZmzLCHVq5LrMY5GnHKz5qaw=
MIME-Version: 1.0
Received: by 10.114.59.10 with SMTP id h10mr13520595waa.194.1280251066633; 
	Tue, 27 Jul 2010 10:17:46 -0700 (PDT)
Received: by 10.114.173.9 with HTTP; Tue, 27 Jul 2010 10:17:46 -0700 (PDT)
In-Reply-To: <4C4D9779.8080505@FreeBSD.org>
References: <4C4AF046.40507@FreeBSD.org>
	<A30A636F-E925-456E-8866-4E46B3BA367F@lavabit.com>
	<4C4B720A.6020802@FreeBSD.org>
	<alpine.BSF.2.00.1007261318380.10170@fledge.watson.org>
	<4C4D9779.8080505@FreeBSD.org>
Date: Tue, 27 Jul 2010 12:17:46 -0500
Message-ID: <AANLkTi=G6wGSnsX-61ipcQqgsHpdWLR-1N+RybbpoR7Z@mail.gmail.com>
From: Alan Cox <alan.l.cox@gmail.com>
To: Alexander Motin <mav@freebsd.org>
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: freebsd-hackers@freebsd.org, freebsd-performance@freebsd.org,
	Robert Watson <rwatson@freebsd.org>, Rui Paulo <rpaulo@lavabit.com>
Subject: Re: Intel TurboBoost in practice
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: alc@freebsd.org
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 27 Jul 2010 17:17:47 -0000

On Mon, Jul 26, 2010 at 9:11 AM, Alexander Motin <mav@freebsd.org> wrote:

> Robert Watson wrote:
> > On Sun, 25 Jul 2010, Alexander Motin wrote:
> >>> The numbers that you are showing doesn't show much difference. Have
> >>> you tried buildworld?
> >>
> >> If you mean relative difference -- as I have told, it's mostly because
> >> of my CPU. It's maximal boost is 266MHz (8.3%), but 133MHz of them is
> >> enabled most of time if CPU is not overheated. It probably doesn't, as
> >> it works on clear table under air conditioner. So maximal effect I can
> >> expect on is 4.2%. In such situation 2.8% probably not so bad to
> >> illustrate that feature works and there is space for further
> >> improvements. If I had Core i5-750S I would expect 33% boost.
> >
> > Can I recommend the use of ministat(1) and sample sizes of at least 8
> > runs per configuration?
>
> Thanks for pushing me to do it right. :) Here are 3*15 runs with a fresh
> kernel with debugging disabled. The results are quite close to the
> original ones: -2.73% and -2.19% of run time.
> x C1
> + C2
> * C3
> +-----------------------------------------------------------------+
> |+        *                                  x                    |
> |+        *                                  x                    |
> |+        *                                  x                    |
> |+        *                                  x                    |
> |+        *                                  x                    |
> |+        *                                  x                    |
> |+        *                                  x                    |
> |+       **                                  x                    |
> |+ +     **                                 xx                    |
> |+ +     ** **                              xx                   x|
> |                                         |__M_A____|             |
> |A|                                                               |
> |        |A|                                                      |
> +-----------------------------------------------------------------+
>    N        Min        Max     Median           Avg        Stddev
> x  15      12.68      12.84      12.69     12.698667   0.039254966
> +  15      12.35      12.36      12.35     12.351333  0.0035186578
> Difference at 95.0% confidence
>        -0.347333 +/- 0.0208409
>        -2.7352% +/- 0.164119%
>        (Student's t, pooled s = 0.0278687)
> *  15      12.41      12.44      12.42         12.42  0.0075592895
> Difference at 95.0% confidence
>        -0.278667 +/- 0.0211391
>        -2.19446% +/- 0.166467%
>        (Student's t, pooled s = 0.0282674)
>
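
The "Difference at 95.0% confidence" lines for C1 vs C2 above can be
reproduced by hand. What follows is only a sketch of the usual pooled
Student's t arithmetic, not ministat's actual source; the function names
are made up for illustration, and the critical value 2.048 (95%, 28
degrees of freedom) is taken from a standard t table.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Newton's iteration for a square root, so the sketch builds without -lm;
 * sqrt() from <math.h> would do equally well.  Adequate for the small
 * values used here.
 */
static double
sqrt_newton(double x)
{
	double r;
	int i;

	assert(x >= 0.0);
	if (x == 0.0)
		return (0.0);
	r = x > 1.0 ? x : 1.0;
	for (i = 0; i < 100; i++)
		r = 0.5 * (r + x / r);
	return (r);
}

/* Pooled standard deviation of two samples (sizes n1, n2; stddevs s1, s2). */
static double
pooled_stddev(int n1, double s1, int n2, double s2)
{
	return (sqrt_newton(((n1 - 1) * s1 * s1 + (n2 - 1) * s2 * s2) /
	    (n1 + n2 - 2)));
}

/* Half-width of the confidence interval for the difference of two means. */
static double
diff_halfwidth(int n1, int n2, double spool, double tcrit)
{
	return (tcrit * spool * sqrt_newton(1.0 / n1 + 1.0 / n2));
}

/* Recompute the C1 vs C2 comparison from the N/Avg/Stddev columns. */
void
print_c1_vs_c2(void)
{
	const int n = 15;
	const double avg1 = 12.698667, sd1 = 0.039254966;	/* C1 */
	const double avg2 = 12.351333, sd2 = 0.0035186578;	/* C2 */
	const double tcrit = 2.048;	/* Student's t, 95%, 28 d.f. */
	double sp = pooled_stddev(n, sd1, n, sd2);
	double hw = diff_halfwidth(n, n, sp, tcrit);
	double d = avg2 - avg1;

	printf("difference %g +/- %g\n", d, hw);
	printf("relative   %g%% +/- %g%%\n", 100.0 * d / avg1,
	    100.0 * hw / avg1);
	printf("pooled s   %g\n", sp);
}
```

Calling print_c1_vs_c2() from main() should give figures close to the
ones ministat printed, up to rounding of the inputs.
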
> I also checked one more aspect -- TurboBoost works only when the CPU runs
> at the highest EIST frequency (P0 state). I've reduced dev.cpu.0.freq from
> 3201 to 3067 and repeated the test:
> x C1
> + C2
> * C3
> +-----------------------------------------------------------------+
> | x                           +                              *    |
> | x                           +                              *    |
> | x                           +                              *    |
> | x                           +                              *   *|
> | x  x                        +                              *   *|
> | x  x                        +  +                           *   *|
> | x  x                        +  +                           *   *|
> | x  x                        +  +                           *   *|
> | x  x                    +   +  +   +                       *   *|
> ||MA|                                                             |
> |                           |_MA_|                                |
> |                                                            M_A_||
> +-----------------------------------------------------------------+
>    N        Min        Max     Median           Avg        Stddev
> x  15      13.72      13.73      13.72     13.723333  0.0048795004
> +  15      13.79      13.82       13.8     13.803333  0.0072374686
> Difference at 95.0% confidence
>        0.08 +/- 0.00461567
>        0.582949% +/- 0.0336337%
>        (Student's t, pooled s = 0.00617213)
> *  15      13.89       13.9      13.89        13.894  0.0050709255
> Difference at 95.0% confidence
>        0.170667 +/- 0.00372127
>        1.24362% +/- 0.0271164%
>        (Student's t, pooled s = 0.00497613)
>
> In that case using C2 or C3 predictably caused a small performance
> reduction, as after falling asleep the CPU needs time to wake up. Even if
> the tested CPU0 never sleeps during the test, its TLB shootdown IPIs to
> other cores could still suffer from waiting for the other cores to wake up.
>
>
In the deeper sleep states, are the TLB contents actually maintained while
the processor sleeps?  (I notice that in some configurations, we actually
flush dirty data from the cache before sleeping.)

Alan

From owner-freebsd-hackers@FreeBSD.ORG  Tue Jul 27 17:59:26 2010
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 05BE6106575F
	for <freebsd-hackers@freebsd.org>; Tue, 27 Jul 2010 17:59:26 +0000 (UTC)
	(envelope-from ambrisko@ambrisko.com)
Received: from mail.ambrisko.com (mail.ambrisko.com [64.174.51.43])
	by mx1.freebsd.org (Postfix) with ESMTP id B87028FC38
	for <freebsd-hackers@freebsd.org>; Tue, 27 Jul 2010 17:59:24 +0000 (UTC)
X-Ambrisko-Me: Yes
Received: from server2.ambrisko.com (HELO www.ambrisko.com) ([192.168.1.2])
	by ironport.ambrisko.com with ESMTP; 27 Jul 2010 10:31:19 -0700
Received: from ambrisko.com (localhost [127.0.0.1])
	by www.ambrisko.com (8.14.3/8.14.3) with ESMTP id o6RHgCau070018;
	Tue, 27 Jul 2010 10:42:12 -0700 (PDT)
	(envelope-from ambrisko@ambrisko.com)
Received: (from ambrisko@localhost)
	by ambrisko.com (8.14.3/8.14.3/Submit) id o6RHgB6M070017;
	Tue, 27 Jul 2010 10:42:11 -0700 (PDT) (envelope-from ambrisko)
From: Doug Ambrisko <ambrisko@ambrisko.com>
Message-Id: <201007271742.o6RHgB6M070017@ambrisko.com>
In-Reply-To: <AANLkTikXM7hWW6lXSzOwKuwT58a_5dFB6Dl9HFE0NDKw@mail.gmail.com>
To: Garrett Cooper <yanefbsd@gmail.com>
Date: Tue, 27 Jul 2010 10:42:11 -0700 (PDT)
X-Mailer: ELM [version 2.4ME+ PL94b (25)]
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=US-ASCII
Cc: FreeBSD-Hackers <freebsd-hackers@FreeBSD.org>
Subject: Re: Set default pxeboot vfs.root.mountfrom to nfs?
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 27 Jul 2010 17:59:26 -0000

Garrett Cooper writes:
| Hi Hackers,
|     I realize this is a trivial patch, but it's a minor item that I
| found kind of fascinating (and not thoroughly documented elsewhere
| because many examples are booting mfsroots instead of directly booting
| off nfs roots), but I'm proposing that pxeboot default to
| vfs.root.mountfrom="nfs" to reduce the need for special case
| loader.conf files just for pxe booting (and thus, enable
| out-of-the-box netbooting ^o^!!!).
|     Thoughts?
| 
| Index: boot/i386/libi386/pxe.c
| ===================================================================
| --- boot/i386/libi386/pxe.c	(revision 209563)
| +++ boot/i386/libi386/pxe.c	(working copy)
| @@ -308,6 +308,7 @@
|  		}
|  		setenv("boot.nfsroot.server", inet_ntoa(rootip), 1);
|  		setenv("boot.nfsroot.path", rootpath, 1);
| +		setenv("vfs.root.mountfrom", "nfs", 0);
|  		setenv("dhcp.host-name", hostname, 1);
|  	}
|      }

Interesting, are you looking at my patch from work, or did you come up
with the same thing?  We have had this patch here for years.

I haven't checked it in because I wanted to track down why it wasn't done
in the first place, so that I wouldn't break any assumptions.  FWIW, I have
seen no issues with the patch in either NFS boots or MFS roots.
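
The last argument of 0 in the added setenv() call is what keeps the patch
from breaking existing setups.  The sketch below uses userland POSIX
setenv(3) as a stand-in, on the assumption that the loader's setenv()
follows the same overwrite convention; set_default() is a hypothetical
helper, not something in the tree.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: install a default value that any earlier, explicit
 * setting is allowed to win over (overwrite == 0).
 */
int
set_default(const char *name, const char *value)
{
	assert(name != NULL && value != NULL);
	return (setenv(name, value, 0));
}

/*
 * Returns 1 if an explicit setting survives a later set_default() call,
 * 0 otherwise.
 */
int
explicit_setting_wins(void)
{
	const char *name = "vfs.root.mountfrom";
	const char *cur;

	/* Something like a loader.conf entry sets the value first... */
	if (setenv(name, "ufs:/dev/md0", 1) != 0)
		return (0);
	/* ...and the pxe default must then leave it alone. */
	if (set_default(name, "nfs") != 0)
		return (0);
	cur = getenv(name);
	return (cur != NULL && strcmp(cur, "ufs:/dev/md0") == 0);
}
```

With the variable unset, set_default() installs "nfs"; with it already
set, the existing value stays in place.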

Doug A.

From owner-freebsd-hackers@FreeBSD.ORG  Tue Jul 27 18:44:55 2010
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A1B311065670;
	Tue, 27 Jul 2010 18:44:55 +0000 (UTC)
	(envelope-from mavbsd@gmail.com)
Received: from mail-fx0-f54.google.com (mail-fx0-f54.google.com
	[209.85.161.54])
	by mx1.freebsd.org (Postfix) with ESMTP id CD8508FC14;
	Tue, 27 Jul 2010 18:44:54 +0000 (UTC)
Received: by fxm13 with SMTP id 13so861548fxm.13
	for <multiple recipients>; Tue, 27 Jul 2010 11:44:53 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:received:received:sender:message-id:date:from
	:user-agent:mime-version:to:cc:subject:references:in-reply-to
	:x-enigmail-version:content-type:content-transfer-encoding;
	bh=vwM3oOzazoVICoeBt30W6FdEX3ICVIxGQZyUCLrVI0s=;
	b=sJjGMhgZ2zl2L2B9nJke024C3vz8RdA/UNuw/0cSYa6DriYn8dta2NbRAX7HbVqBSh
	vH3B+ssWSVnJqS2879EkFJAQRcl6k4WAUWDZllHAuED4K+QItYATR/2jliLHnQpIEAZB
	FzG3/qUglHSZDf0Cvo8S8Ctd2WB43gAD+HcAQ=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=sender:message-id:date:from:user-agent:mime-version:to:cc:subject
	:references:in-reply-to:x-enigmail-version:content-type
	:content-transfer-encoding;
	b=nvvJGTqd0X9Ud9x+5Dk85V33PEoy6uH+ZKrflOrBCZIVXXyUHv4kxktIIsxpKdnMD/
	71Ef19verD6Kt3PCYIoQyoOLZ8ELA9PzkmN1BAQ1A/ZeBQ5XTqfGEkrHFrC/Gyjt5hld
	u+m7kxyZqrmimtW6nc2dcVUevpDYJiF2EA+Oc=
Received: by 10.223.119.131 with SMTP id z3mr8524223faq.61.1280256293689;
	Tue, 27 Jul 2010 11:44:53 -0700 (PDT)
Received: from mavbook.mavhome.dp.ua (pc.mavhome.dp.ua [212.86.226.226])
	by mx.google.com with ESMTPS id e22sm1418129faa.0.2010.07.27.11.44.52
	(version=SSLv3 cipher=RC4-MD5); Tue, 27 Jul 2010 11:44:52 -0700 (PDT)
Sender: Alexander Motin <mavbsd@gmail.com>
Message-ID: <4C4F2921.5030604@FreeBSD.org>
Date: Tue, 27 Jul 2010 21:44:49 +0300
From: Alexander Motin <mav@FreeBSD.org>
User-Agent: Thunderbird 2.0.0.24 (X11/20100402)
MIME-Version: 1.0
To: alc@freebsd.org
References: <4C4AF046.40507@FreeBSD.org>	<A30A636F-E925-456E-8866-4E46B3BA367F@lavabit.com>	<4C4B720A.6020802@FreeBSD.org>	<alpine.BSF.2.00.1007261318380.10170@fledge.watson.org>	<4C4D9779.8080505@FreeBSD.org>
	<AANLkTi=G6wGSnsX-61ipcQqgsHpdWLR-1N+RybbpoR7Z@mail.gmail.com>
In-Reply-To: <AANLkTi=G6wGSnsX-61ipcQqgsHpdWLR-1N+RybbpoR7Z@mail.gmail.com>
X-Enigmail-Version: 0.96.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: freebsd-hackers@freebsd.org, freebsd-performance@freebsd.org
Subject: Re: Intel TurboBoost in practice
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 27 Jul 2010 18:44:55 -0000

Alan Cox wrote:
> On Mon, Jul 26, 2010 at 9:11 AM, Alexander Motin <mav@freebsd.org
> <mailto:mav@freebsd.org>> wrote:
> 
>     In that case using C2 or C3 predictably caused a small performance
>     reduction, as after falling asleep the CPU needs time to wake up. Even
>     if the tested CPU0 never sleeps during the test, its TLB shootdown IPIs
>     to other cores could still suffer from waiting for the other cores to
>     wake up.
> 
> In the deeper sleep states, are the TLB contents actually maintained
> while the processor sleeps?  (I notice that in some configurations, we
> actually flush dirty data from the cache before sleeping.)

As I understand it, we flush caches only as a last resort, if the platform
does not support special techniques such as disabling arbitration or making
the CPU wake up on bus mastering. But the same ACPI C-states could map onto
different CPU C-states. Some of these CPU states (like C6) could imply
cache invalidation, though I am not sure it can be seen from outside.

The ACPI 3.0 specification says nothing about TLBs, so I am not sure we can
count on their invalidation unless we do it ourselves, as is done
for caches when the CPU can't keep them coherent while sleeping.

-- 
Alexander Motin

From owner-freebsd-hackers@FreeBSD.ORG  Tue Jul 27 19:55:44 2010
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 1EFEC1065679;
	Tue, 27 Jul 2010 19:55:44 +0000 (UTC)
	(envelope-from kraduk@googlemail.com)
Received: from mail-fx0-f54.google.com (mail-fx0-f54.google.com
	[209.85.161.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 7B6CC8FC0C;
	Tue, 27 Jul 2010 19:55:43 +0000 (UTC)
Received: by fxm13 with SMTP id 13so902905fxm.13
	for <multiple recipients>; Tue, 27 Jul 2010 12:55:42 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=googlemail.com; s=gamma;
	h=domainkey-signature:mime-version:received:received:in-reply-to
	:references:date:message-id:subject:from:to:content-type;
	bh=rWOz8nNTQrOgYW0+uBnP1vVr1kQ+FGW9soiIlD9YH1Y=;
	b=jWpt2IsMr8ji1teqnvJMuAVOqaBF7/d+bJD7f2HzYLpofzScxxytBTAvRG2JZSzLrZ
	jUZk4im/Y8knMz0DQ4l4rQYVMHgw+p4fSf2etlXhF0y5Xii8g3cupBcOUz5SIk5xfvAq
	D7yph8fjDKhHXPHfJ5oC426HAks6OtqcS7Qlw=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=googlemail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:content-type;
	b=o2GC10U3LVUpaZERgwNIdmsH+Af35E+SBF3pqnLamLfTK0Qm/mJAiZbG+PCzsu9YKX
	XPGs+S3i2us5fG3GIGrm5KzaSBHrjCM+TTptYU63MpRLdY8DyDxV1FerGAs8J04pbF8m
	pVhQB3TjaBhzapkqgWA5QvULQWHgPKwaDI91I=
MIME-Version: 1.0
Received: by 10.239.154.204 with SMTP id f12mr585988hbc.143.1280260542150; 
	Tue, 27 Jul 2010 12:55:42 -0700 (PDT)
Received: by 10.239.160.201 with HTTP; Tue, 27 Jul 2010 12:55:42 -0700 (PDT)
In-Reply-To: <AANLkTinUVKByfTX+f9DOQ97jh43VPVSug_=BDpJ9PB0z@mail.gmail.com>
References: <AANLkTinUVKByfTX+f9DOQ97jh43VPVSug_=BDpJ9PB0z@mail.gmail.com>
Date: Tue, 27 Jul 2010 20:55:42 +0100
Message-ID: <AANLkTi=3LKv4DkaX_yHo5WfXK33YGYSAaOvqh5mjSVTV@mail.gmail.com>
From: krad <kraduk@googlemail.com>
To: freebsd-hackers@freebsd.org, 
	FreeBSD Questions <freebsd-questions@freebsd.org>
X-Mailman-Approved-At: Tue, 27 Jul 2010 21:01:04 +0000
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: 
Subject: Re: possible NFS lockups
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 27 Jul 2010 19:55:44 -0000

On 27 July 2010 16:29, krad <kraduk@googlemail.com> wrote:

> I have a production mail system with an nfs backend. Every now and again we
> see the nfs die on a particular head end. However it doesn't die across all
> the nodes. This suggests to me there isnt an issue with the filer itself and
> the stats from the filer concur with that.
>
> The symptoms are lines like this appearing in dmesg
>
> nfs server 10.44.17.138:/vol/vol1/mail: not responding
> nfs server 10.44.17.138:/vol/vol1/mail: is alive again
>
> trussing df it seems to hang on getfsstat, this is presumably when it tries
> the nfs mounts
>
> eg
>
> __sysctl(0xbfbfe224,0x2,0xbfbfe22c,0xbfbfe230,0x0,0x0) = 0 (0x0)
> mmap(0x0,1048576,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) =
> 1746583552 (0x681ac000)
> mmap(0x682ac000,344064,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) =
> 1747632128 (0x682ac000)
> munmap(0x681ac000,344064)                        = 0 (0x0)
> getfsstat(0x68201000,0x1270,0x2,0xbfbfe960,0xbfbfe95c,0x1) = 9 (0x9)
>
>
> I have played with mount options a fair bit but they dont make much
> difference. This is what they are set to at present
>
> 10.44.17.138:/vol/vol1/mail     /mail/0 nfs
> rw,noatime,tcp,acdirmax=320,acdirmin=180,acregmax=320,acregmin=180 0       0
>
> When this locking is occuring I find that if I do a show mount or mount
> 10.44.17.138:/vol/vol1/mail again under another mount point I can access
> it fine.
>
> One thing I have just noticed is that lockd and statd always seem to have
> died when this happens. Restarting does not help
>
>
> I find all this a bit perplexing. Can anyone offer any help into why this
> might be happening. I have dtrace compliled into the kernel if that could
> help with debugging
>

Sorry, I missed a bit of critical info:

# uname -a
FreeBSD X 8.1-STABLE FreeBSD 8.1-STABLE #2: Mon Jul 26 16:10:19 BST 2010
root@mk-pimap-7.b2b.uk.tiscali.com:/usr/obj/usr/src/sys/DTRACE  i

From owner-freebsd-hackers@FreeBSD.ORG  Wed Jul 28 07:18:44 2010
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 8D7031065672
	for <freebsd-hackers@freebsd.org>; Wed, 28 Jul 2010 07:18:44 +0000 (UTC)
	(envelope-from asimex@gmx.net)
Received: from mail.gmx.net (mailout-de.gmx.net [213.165.64.22])
	by mx1.freebsd.org (Postfix) with SMTP id EB3478FC14
	for <freebsd-hackers@freebsd.org>; Wed, 28 Jul 2010 07:18:43 +0000 (UTC)
Received: (qmail 2007 invoked by uid 0); 28 Jul 2010 06:52:02 -0000
Received: from 212.118.142.74 by www022.gmx.net with HTTP;
	Wed, 28 Jul 2010 08:52:02 +0200 (CEST)
Content-Type: text/plain; charset="utf-8"
Date: Wed, 28 Jul 2010 08:52:01 +0200
From: "Andreas Feid" <asimex@gmx.net>
In-Reply-To: <AANLkTi=3LKv4DkaX_yHo5WfXK33YGYSAaOvqh5mjSVTV@mail.gmail.com>
Message-ID: <20100728065201.234030@gmx.net>
MIME-Version: 1.0
References: <AANLkTinUVKByfTX+f9DOQ97jh43VPVSug_=BDpJ9PB0z@mail.gmail.com>
	<AANLkTi=3LKv4DkaX_yHo5WfXK33YGYSAaOvqh5mjSVTV@mail.gmail.com>
To: krad <kraduk@googlemail.com>, freebsd-questions@freebsd.org,
	freebsd-hackers@freebsd.org
X-Authenticated: #138425
X-Flags: 0001
X-Mailer: WWW-Mail 6100 (Global Message Exchange)
X-Priority: 3
X-Provags-ID: V01U2FsdGVkX18dDfFVGfYdia+o8sWzdtp/V94DshEznloD4fKIZ3
	fUPI8aJEZZlEwjwBUvkc9dkDi35aL7ebKYmg== 
Content-Transfer-Encoding: 8bit
X-GMX-UID: put6eCARRkkNbs1mcWRqSIdudWkvKNM6
X-FuHaFi: 0.51000000000000001
Cc: 
Subject: Re: possible NFS lockups
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 28 Jul 2010 07:18:44 -0000

I have a few remarks and questions. What happens when the system is in this
state? Does access to the mount fail and then recover after a while, or do
you need to remount? Under normal conditions access should be restored
automatically. The error message per se indicates a busy server and should
clear up after a while, as you have seen.

How frequently do you see the error: once per hour, once per day? Since you
say "filer", I assume you are talking about a NetApp filer. It might be
worth taking a perfstat when the error happens and while the condition
persists. I think dtrace will not really help, since this looks like a
server issue to me.

As the filer is used to store mail, I assume we are talking about qmail or
a similar environment with a huge number of small files, so I would like to
know what the directory structure looks like on the filer.

If possible, get a perfstat and send it, together with the directory
structure, to me off-list and I will have a look.

-Andreas 
-------- Original Message --------
> Date: Tue, 27 Jul 2010 20:55:42 +0100
> From: krad <kraduk@googlemail.com>
> To: freebsd-hackers@freebsd.org, FreeBSD Questions <freebsd-questions@freebsd.org>
> Subject: Re: possible NFS lockups

> On 27 July 2010 16:29, krad <kraduk@googlemail.com> wrote:
> 
> > I have a production mail system with an nfs backend. Every now and again we
> > see the nfs die on a particular head end. However it doesn't die across all
> > the nodes. This suggests to me there isnt an issue with the filer itself and
> > the stats from the filer concur with that.
> >
> > The symptoms are lines like this appearing in dmesg
> >
> > nfs server 10.44.17.138:/vol/vol1/mail: not responding
> > nfs server 10.44.17.138:/vol/vol1/mail: is alive again
> >
> > trussing df it seems to hang on getfsstat, this is presumably when it tries
> > the nfs mounts
> >
> > eg
> >
> > __sysctl(0xbfbfe224,0x2,0xbfbfe22c,0xbfbfe230,0x0,0x0) = 0 (0x0)
> > mmap(0x0,1048576,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) =
> > 1746583552 (0x681ac000)
> > mmap(0x682ac000,344064,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) =
> > 1747632128 (0x682ac000)
> > munmap(0x681ac000,344064)                        = 0 (0x0)
> > getfsstat(0x68201000,0x1270,0x2,0xbfbfe960,0xbfbfe95c,0x1) = 9 (0x9)
> >
> >
> > I have played with mount options a fair bit but they dont make much
> > difference. This is what they are set to at present
> >
> > 10.44.17.138:/vol/vol1/mail     /mail/0 nfs
> > rw,noatime,tcp,acdirmax=320,acdirmin=180,acregmax=320,acregmin=180 0       0
> >
> > When this locking is occuring I find that if I do a show mount or mount
> > 10.44.17.138:/vol/vol1/mail again under another mount point I can access
> > it fine.
> >
> > One thing I have just noticed is that lockd and statd always seem to have
> > died when this happens. Restarting does not help
> >
> >
> > I find all this a bit perplexing. Can anyone offer any help into why this
> > might be happening. I have dtrace compliled into the kernel if that could
> > help with debugging
> >
> 
> sorry i missed a bit of critical info
> 
> # uname -a
> FreeBSD X 8.1-STABLE FreeBSD 8.1-STABLE #2: Mon Jul 26 16:10:19 BST 2010
> root@mk-pimap-7.b2b.uk.tiscali.com:/usr/obj/usr/src/sys/DTRACE  i

From owner-freebsd-hackers@FreeBSD.ORG  Thu Jul 29 06:01:11 2010
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A34C6106564A
	for <hackers@freebsd.org>; Thu, 29 Jul 2010 06:01:11 +0000 (UTC)
	(envelope-from yanegomi@gmail.com)
Received: from mail-iw0-f182.google.com (mail-iw0-f182.google.com
	[209.85.214.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 700E38FC08
	for <hackers@freebsd.org>; Thu, 29 Jul 2010 06:01:11 +0000 (UTC)
Received: by iwn35 with SMTP id 35so300960iwn.13
	for <hackers@freebsd.org>; Wed, 28 Jul 2010 23:01:10 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:received:received:date:message-id
	:subject:from:to:content-type;
	bh=ylIhYcZE9IGSSLxwskrRAXRMkjbc9szOSqHUOoj3veE=;
	b=haUdb5OR5IQY/16R9uFEF6Nhxq8Qw1CulRZJYPA0n/Va2dgZHtS/EuKYSbLrEfIDZl
	X4nXClzqsI/edHxdrIaFVhZcx24JuXW+h12j35UE1s5zdsmsmFealQXUjOJvfXmDnO6n
	aVvCoL6XngSmVwIgmWTzChZBeMGETarKWjmyk=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:date:message-id:subject:from:to:content-type;
	b=PSKZGLdLIs6OOs0C/PHXPRRhRs7FUmOaLPSBYLeF1S0Nku8HT7GUgX9FywtpO9Ezuf
	aUD3dqt0kJz3GuM4Zr/hqtTi+gydkhDOStDX2bKY+e38MTmHD3+1VbzcC6OAcUKqaHmo
	qhtPcAoanQEc6s8GVVIy0pK+MKphBGPy49Xww=
MIME-Version: 1.0
Received: by 10.231.59.13 with SMTP id j13mr13391175ibh.77.1280383270838; Wed, 
	28 Jul 2010 23:01:10 -0700 (PDT)
Received: by 10.231.169.18 with HTTP; Wed, 28 Jul 2010 23:01:10 -0700 (PDT)
Date: Wed, 28 Jul 2010 23:01:10 -0700
Message-ID: <AANLkTinkwvK2tHZ0okZE48bvW8-N4WBOm284fVK2Xi3F@mail.gmail.com>
From: Garrett Cooper <yanegomi@gmail.com>
To: hackers@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1
Cc: 
Subject: nanosleep - does it make sense with tv_sec < 0?
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 29 Jul 2010 06:01:11 -0000

Hi Hackers,
    I ran into an oddity with the POSIX spec that seems a bit unrealistic:

[EINVAL]
    The rqtp argument specified a nanosecond value less than zero or
greater than or equal to 1000 million.

    It seems like this should also apply to seconds < 0. We currently
accept such a value silently in kern/kern_time.c:kern_nanosleep:

int
kern_nanosleep(struct thread *td, struct timespec *rqt, struct timespec *rmt)
{
        struct timespec ts, ts2, ts3;
        struct timeval tv;
        int error;

        if (rqt->tv_nsec < 0 || rqt->tv_nsec >= 1000000000)
                return (EINVAL);
        /* Note the first clause, rqt->tv_sec < 0, below. */
        if (rqt->tv_sec < 0 || (rqt->tv_sec == 0 && rqt->tv_nsec == 0))
                return (0);

    but I'm wondering whether or not it makes logical sense for us to
do this (sleep for a negative amount of time?)...
    FWIW Linux returns -1 and sets EINVAL in this case, which makes
more sense to me.
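
    For illustration only, here is a small userland sketch.
check_rqt_strict() is a hypothetical helper that spells out the stricter
validation I have in mind (it is not what kern_nanosleep does today), and
probe_negative_sleep() simply asks the running system what it returns for
tv_sec = -1:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Hypothetical stricter validation: reject negative seconds as well. */
int
check_rqt_strict(const struct timespec *rqt)
{
	assert(rqt != NULL);
	if (rqt->tv_sec < 0 || rqt->tv_nsec < 0 ||
	    rqt->tv_nsec >= 1000000000L)
		return (EINVAL);
	return (0);
}

/*
 * Call nanosleep() with a negative tv_sec.  Returns 0 if the call
 * succeeded, otherwise the errno value it failed with.
 */
int
probe_negative_sleep(void)
{
	struct timespec rqt = { .tv_sec = -1, .tv_nsec = 0 };

	if (nanosleep(&rqt, NULL) == -1)
		return (errno);
	return (0);
}
```

    A result of 0 from probe_negative_sleep() corresponds to the silent
return in the excerpt above, and EINVAL to the Linux behaviour.
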
Thanks,
-Garrett

From owner-freebsd-hackers@FreeBSD.ORG  Thu Jul 29 06:26:54 2010
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 2CADC1065672;
	Thu, 29 Jul 2010 06:26:54 +0000 (UTC)
	(envelope-from yanegomi@gmail.com)
Received: from mail-iw0-f182.google.com (mail-iw0-f182.google.com
	[209.85.214.182])
	by mx1.freebsd.org (Postfix) with ESMTP id CF8308FC14;
	Thu, 29 Jul 2010 06:26:53 +0000 (UTC)
Received: by iwn35 with SMTP id 35so326475iwn.13
	for <multiple recipients>; Wed, 28 Jul 2010 23:26:53 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:received:received:date:message-id
	:subject:from:to:cc:content-type;
	bh=PEU5dOx2jKlgYHOSv1Mr0Zmf6a772ZRXY75on/0Ca/k=;
	b=XrIsrEoBzk5X2AmErw87EpDuTe40lxcEb0XpFl5eotJ/SQdpPD2LmnvZS/S3SvxUhu
	quyRr7QbR0zVx16WySH0Vkadnxj6mGNrKKYuwW1AA07lHtxYwEBVAvJZAbfI8jqDdbNT
	Kweja7rSC2jFwnP1IDBg6UebD1NDq+NQE2xuw=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:date:message-id:subject:from:to:cc:content-type;
	b=P1Ymru1q0qwHX0rTZbN3hNpqrf3q52txZwW/jxHJ+cKhSx4aTJGN+W4bcChlkYNHWj
	nuPZyCsfa0UAXIHjxlapurBH1DwnelSLobRMkJF6dzdhthJUB3LHo575gT07CBv6q+dp
	XaT+o9UHrkm4N5gpn3wlgO9i/Z3C5IzQ3BFd8=
MIME-Version: 1.0
Received: by 10.231.184.68 with SMTP id cj4mr13562211ibb.93.1280384812889; 
	Wed, 28 Jul 2010 23:26:52 -0700 (PDT)
Received: by 10.231.169.18 with HTTP; Wed, 28 Jul 2010 23:26:52 -0700 (PDT)
Date: Wed, 28 Jul 2010 23:26:52 -0700
Message-ID: <AANLkTimy666B=r7_T_kqz-CPck43yMbJtw=1OeG3r-N2@mail.gmail.com>
From: Garrett Cooper <yanegomi@gmail.com>
To: hackers@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1
Cc: standards@freebsd.org
Subject: Deterministic failure to meet sysconf(_SC_TIMER_MAX) for 
	CLOCK_REALTIME
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 29 Jul 2010 06:26:54 -0000

Hi,
    Running the test noted below [1], I always hit EAGAIN from
timer_create() on the 29th iteration:

$ conformance/behavior/timers/1-1.run-test
timer_create() did not return success for iteration 29: Resource temporarily unavailable
$ conformance/behavior/timers/1-1.run-test
timer_create() did not return success for iteration 29: Resource temporarily unavailable
$ conformance/behavior/timers/1-1.run-test
timer_create() did not return success for iteration 29: Resource temporarily unavailable
$ conformance/behavior/timers/1-1.run-test
timer_create() did not return success for iteration 29: Resource temporarily unavailable

    Interestingly enough, sysconf(_SC_TIMER_MAX) returns 54; this is
the requirement that the test is attempting to validate (that at least
_SC_TIMER_MAX timers can be created via timer_create).
    The timers kernel code is capped to 25 by default, by a
preprocessor define in .../sys/sysctl.h:

/sys/sys/sysctl.h:#define CTL_P1003_1B_TIMER_MAX			25	/* int */

    It doesn't make sense that an additional 4 timers could be created.
    Oh, and sysctl reports something else entirely:

p1003_1b.timers: 200112
p1003_1b.delaytimer_max: 2147483647
p1003_1b.timer_max: 32

    So, what number is the source of truth and why don't they all match?
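
    For anyone who wants to compare the advertised and the real limit
without pulling in the whole test suite, here is a rough sketch (not the
LTP test itself).  The function names and the MAX_PROBE bound are mine,
and it uses SIGEV_NONE so that no signals are involved:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define MAX_PROBE	256	/* arbitrary upper bound for this sketch */

/*
 * Create up to `limit` CLOCK_REALTIME timers, then delete them again.
 * Returns how many timer_create() calls succeeded.
 */
int
count_creatable_timers(int limit)
{
	timer_t ids[MAX_PROBE];
	struct sigevent sev;
	int i, n;

	assert(limit >= 0);
	if (limit > MAX_PROBE)
		limit = MAX_PROBE;
	memset(&sev, 0, sizeof(sev));
	sev.sigev_notify = SIGEV_NONE;
	for (n = 0; n < limit; n++)
		if (timer_create(CLOCK_REALTIME, &sev, &ids[n]) != 0)
			break;
	for (i = 0; i < n; i++)
		(void)timer_delete(ids[i]);
	return (n);
}

/* Print the advertised limit next to the number actually obtained. */
void
report_timer_limits(void)
{
	long advertised = sysconf(_SC_TIMER_MAX);

	printf("sysconf(_SC_TIMER_MAX)  = %ld\n", advertised);
	printf("timers actually created = %d\n",
	    count_creatable_timers(MAX_PROBE));
}
```

    On some systems this may need to be linked with -lrt.
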
Thanks!
-Garrett

PS I'm still running a CURRENT kernel based off of r206173...

[1] http://ltp.git.sourceforge.net/git/gitweb.cgi?p=ltp/ltp-dev.git;a=blob;f=testcases/open_posix_testsuite/conformance/behavior/timers/1-1.c;h=ac043b0913e93f8db93cc74e249316f5ff82bdc8;hb=HEAD

From owner-freebsd-hackers@FreeBSD.ORG  Thu Jul 29 09:57:21 2010
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id BFE021065670
	for <freebsd-hackers@freebsd.org>; Thu, 29 Jul 2010 09:57:21 +0000 (UTC)
	(envelope-from rhfb@akira.stdio.com)
Received: from akira.stdio.com (akira.stdio.com [204.152.114.29])
	by mx1.freebsd.org (Postfix) with SMTP id 8C5848FC18
	for <freebsd-hackers@freebsd.org>; Thu, 29 Jul 2010 09:57:11 +0000 (UTC)
Received: from akira (localhost [127.0.0.1])
	by akira.stdio.com (Postfix) with SMTP id AD3F3C2
	for <freebsd-hackers@freebsd.org>; Thu, 29 Jul 2010 05:39:58 -0400 (EDT)
Date: Thu Jul 29 05:41:06 EDT 2010
In-Reply-To: <AANLkTi=3LKv4DkaX_yHo5WfXK33YGYSAaOvqh5mjSVTV@mail.gmail.com>
From: <rhfb@akira.stdio.com> 
To: <freebsd-hackers@freebsd.org>
References: <AANLkTinUVKByfTX+f9DOQ97jh43VPVSug_=BDpJ9PB0z@mail.gmail.com>
	<AANLkTi=3LKv4DkaX_yHo5WfXK33YGYSAaOvqh5mjSVTV@mail.gmail.com>
Message-Id: <20100729094046.AD3F3C2@akira.stdio.com>
Subject: (no subject)
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 29 Jul 2010 09:57:21 -0000

I have a similar problem.

I have an NFS server (8.0, upgraded a couple of times since Feb 2010) that locks up
and requires a reboot.

The clients are busy VMs on VMware ESXi that use the NFS server for vmdk virtual
disk storage.

ESXi reports the NFS server as inactive, and all the VMs report disk write errors when
trying to write to their disks.

/etc/rc.d/nfsd restart fails to work (it cannot kill the nfsd process).

The nfsd process runs at 100% CPU in the rc_lo state in top.

A reboot is the only fix.

It has only happened under two circumstances:
1) Installation of a VM running Windows 2008.
2) Migrating 16 million mail messages from a physical server to a FreeBSD VM
that uses ZFS; the VM runs on the ESXi box, which stores the VM's ZFS disk over NFS.

The NFS server also uses ZFS.

From owner-freebsd-hackers@FreeBSD.ORG  Thu Jul 29 14:43:30 2010
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id ABC0A106566B
	for <freebsd-hackers@freebsd.org>; Thu, 29 Jul 2010 14:43:29 +0000 (UTC)
	(envelope-from pebu3op@googlemail.com)
Received: from mail.net.t-labs.tu-berlin.de (mail.net.t-labs.tu-berlin.de
	[130.149.220.252])
	by mx1.freebsd.org (Postfix) with ESMTP id 253028FC17
	for <freebsd-hackers@freebsd.org>; Thu, 29 Jul 2010 14:43:28 +0000 (UTC)
Received: from raven.net.t-labs.tu-berlin.de (raven.net.t-labs.tu-berlin.de
	[130.149.220.18])
	by mail.net.t-labs.tu-berlin.de (Postfix) with ESMTP id B680570015BA
	for <freebsd-hackers@freebsd.org>;
	Thu, 29 Jul 2010 16:13:16 +0200 (CEST)
From: Alexander Fiveg <pebu3op@googlemail.com>
Organization: Google
To: freebsd-hackers@freebsd.org
Date: Thu, 29 Jul 2010 16:13:15 +0200
User-Agent: KMail/1.9.10
MIME-Version: 1.0
Content-Type: text/plain;
  charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <201007291613.15719.pebu3op@googlemail.com>
Subject: coherence-problem on the mapped memory buffer
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: pebu3op@googlemail.com
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 29 Jul 2010 14:43:30 -0000

Hello hackers, 

while working on the "ringmap" project I've run into a problem: there is
no coherency in the memory regions mapped from the kernel into
user space.

Details: 
While integrating ringmap with the ixgbe driver, I've made some 
changes to ixgbe:

1. The mbufs for received packets are allocated only once.

2. The allocated mbufs are reused one after the other, as in a ring buffer (no
new mbufs are allocated afterwards).

3. The packet buffers (mbuf->m_data) are mapped into user space, so the
user-space process has access to the packets after they have been DMA-transferred
from the network adapter into RAM.

Problem: 
Sometimes the user-space process sees not the newly DMAed data in the mapped
packet buffer, but the OLD data that was previously stored in the same packet
buffer. If I monitor the received data in the kernel, the kernel sees
the data correctly. But sometimes it is the other way around: the user-space
process sees the correct new data and the kernel sees the old data in the buffer.

It seems that the packet memory buffer is not synchronized with all
CPUs' caches. Probably the [user|kernel] thread sometimes reads old (stale)
data from the cache of the CPU it is running on, while at the same time
the other thread sees the new data in the same mapped buffer.
 
Can you please provide me with some information that would help me
avoid this unexpected coherence problem?

Alex

P.S. Details about hardware and used software:
1. /var/run/dmesg.boot :
...
CPU: Dual Core AMD Opteron(tm) Processor 865 (1800.01-MHz 686-class CPU)
  Origin = "AuthenticAMD"  Id = 0x20f10  Family = f  Model = 21  Stepping = 0
  
Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x1<SSE3>
  AMD Features=0xe2500800<SYSCALL,NX,MMX+,FFXSR,LM,3DNow!+,3DNow!>
  AMD Features2=0x3<LAHF,CMP>
real memory  = 3758030848 (3583 MB)
avail memory = 3677495296 (3507 MB)
ACPI APIC Table: <A M I  OEMAPIC >
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
FreeBSD/SMP: 4 package(s) x 2 core(s)
...

2. uname -v
FreeBSD 9.0-CURRENT #3

3. sysctl kern.osreldate
kern.osreldate: 900014

4. //depot/projects/soc2010/ringmap/

From owner-freebsd-hackers@FreeBSD.ORG  Thu Jul 29 16:13:27 2010
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 5B71B1065675
	for <freebsd-hackers@freebsd.org>; Thu, 29 Jul 2010 16:13:27 +0000 (UTC)
	(envelope-from avg@icyb.net.ua)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id A95B58FC1D
	for <freebsd-hackers@freebsd.org>; Thu, 29 Jul 2010 16:13:26 +0000 (UTC)
Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua
	[212.40.38.101])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id TAA26736;
	Thu, 29 Jul 2010 19:13:24 +0300 (EEST)
	(envelope-from avg@icyb.net.ua)
Message-ID: <4C51A8A3.7080808@icyb.net.ua>
Date: Thu, 29 Jul 2010 19:13:23 +0300
From: Andriy Gapon <avg@icyb.net.ua>
User-Agent: Thunderbird 2.0.0.24 (X11/20100517)
MIME-Version: 1.0
To: pebu3op@googlemail.com
References: <201007291613.15719.pebu3op@googlemail.com>
In-Reply-To: <201007291613.15719.pebu3op@googlemail.com>
X-Enigmail-Version: 0.95.7
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: freebsd-hackers@freebsd.org
Subject: Re: coherence-problem on the mapped memory buffer
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 29 Jul 2010 16:13:27 -0000

on 29/07/2010 17:13 Alexander Fiveg said the following:
> P.S. Details about hardware and used software:
> 1. /var/run/dmesg.boot :
> ...
> CPU: Dual Core AMD Opteron(tm) Processor 865 (1800.01-MHz 686-class CPU)
>   Origin = "AuthenticAMD"  Id = 0x20f10  Family = f  Model = 21  Stepping = 0
>   
> Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
>   Features2=0x1<SSE3>
>   AMD Features=0xe2500800<SYSCALL,NX,MMX+,FFXSR,LM,3DNow!+,3DNow!>
>   AMD Features2=0x3<LAHF,CMP>
> real memory  = 3758030848 (3583 MB)
> avail memory = 3677495296 (3507 MB)
> ACPI APIC Table: <A M I  OEMAPIC >
> FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
> FreeBSD/SMP: 4 package(s) x 2 core(s)
> ...
> 
> 2. uname -v
> FreeBSD 9.0-CURRENT #3
> 
> 3. sysctl kern.osreldate
> kern.osreldate: 900014
> 
> 4. //depot/projects/soc2010/ringmap/

No help, but just curious - do you use the amd64 variant?
If yes, can you reproduce the problem with i386?

-- 
Andriy Gapon

From owner-freebsd-hackers@FreeBSD.ORG  Thu Jul 29 16:45:48 2010
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id DDC101065746
	for <freebsd-hackers@freebsd.org>; Thu, 29 Jul 2010 16:45:48 +0000 (UTC)
	(envelope-from pebu3op@googlemail.com)
Received: from mail.net.t-labs.tu-berlin.de (mail.net.t-labs.tu-berlin.de
	[130.149.220.252])
	by mx1.freebsd.org (Postfix) with ESMTP id A08E08FC08
	for <freebsd-hackers@freebsd.org>; Thu, 29 Jul 2010 16:45:48 +0000 (UTC)
Received: from raven.net.t-labs.tu-berlin.de (raven.net.t-labs.tu-berlin.de
	[130.149.220.18])
	by mail.net.t-labs.tu-berlin.de (Postfix) with ESMTP id 3A92870015BA;
	Thu, 29 Jul 2010 18:45:47 +0200 (CEST)
From: Alexander Fiveg <pebu3op@googlemail.com>
Organization: Google
To: Andriy Gapon <avg@icyb.net.ua>
Date: Thu, 29 Jul 2010 18:45:45 +0200
User-Agent: KMail/1.9.10
References: <201007291613.15719.pebu3op@googlemail.com>
	<4C51A8A3.7080808@icyb.net.ua>
In-Reply-To: <4C51A8A3.7080808@icyb.net.ua>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <201007291845.46015.pebu3op@googlemail.com>
Cc: freebsd-hackers@freebsd.org
Subject: Re: coherence-problem on the mapped memory buffer
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: pebu3op@googlemail.com
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 29 Jul 2010 16:45:49 -0000

On Thursday 29 July 2010 18:13:23 Andriy Gapon wrote:
> on 29/07/2010 17:13 Alexander Fiveg said the following:
> > P.S. Details about hardware and used software:
> > 1. /var/run/dmesg.boot :
> > ...
> > CPU: Dual Core AMD Opteron(tm) Processor 865 (1800.01-MHz 686-class CPU)
> >   Origin = "AuthenticAMD"  Id = 0x20f10  Family = f  Model = 21  Stepping = 0
> >
> > Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
> >   Features2=0x1<SSE3>
> >   AMD Features=0xe2500800<SYSCALL,NX,MMX+,FFXSR,LM,3DNow!+,3DNow!>
> >   AMD Features2=0x3<LAHF,CMP>
> > real memory  = 3758030848 (3583 MB)
> > avail memory = 3677495296 (3507 MB)
> > ACPI APIC Table: <A M I  OEMAPIC >
> > FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
> > FreeBSD/SMP: 4 package(s) x 2 core(s)
> > ...
> >
> > 2. uname -v
> > FreeBSD 9.0-CURRENT #3
> >
> > 3. sysctl kern.osreldate
> > kern.osreldate: 900014
> >
> > 4. //depot/projects/soc2010/ringmap/
>
> No help, but just curious - do use amd64 variant?
> If yes, can you reproduce the problem with i386?

No, my kernel is i386, but I will try testing it with amd64.

Thanks 
Alex

From owner-freebsd-hackers@FreeBSD.ORG  Thu Jul 29 16:57:36 2010
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id CD0521065675
	for <freebsd-hackers@freebsd.org>; Thu, 29 Jul 2010 16:57:36 +0000 (UTC)
	(envelope-from avg@icyb.net.ua)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id 26D548FC1B
	for <freebsd-hackers@freebsd.org>; Thu, 29 Jul 2010 16:57:35 +0000 (UTC)
Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua
	[212.40.38.101])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id TAA27313;
	Thu, 29 Jul 2010 19:57:33 +0300 (EEST)
	(envelope-from avg@icyb.net.ua)
Message-ID: <4C51B2FD.6070702@icyb.net.ua>
Date: Thu, 29 Jul 2010 19:57:33 +0300
From: Andriy Gapon <avg@icyb.net.ua>
User-Agent: Thunderbird 2.0.0.24 (X11/20100517)
MIME-Version: 1.0
To: pebu3op@googlemail.com
References: <201007291613.15719.pebu3op@googlemail.com>
	<4C51A8A3.7080808@icyb.net.ua>
In-Reply-To: <4C51A8A3.7080808@icyb.net.ua>
X-Enigmail-Version: 0.95.7
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Cc: freebsd-hackers@freebsd.org
Subject: Re: coherence-problem on the mapped memory buffer
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 29 Jul 2010 16:57:36 -0000

on 29/07/2010 19:13 Andriy Gapon said the following:
> on 29/07/2010 17:13 Alexander Fiveg said the following:
>> P.S. Details about hardware and used software:
>> 1. /var/run/dmesg.boot :
>> ...
>> CPU: Dual Core AMD Opteron(tm) Processor 865 (1800.01-MHz 686-class CPU)
>>   Origin = "AuthenticAMD"  Id = 0x20f10  Family = f  Model = 21  Stepping = 0
>>   
>> Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
>>   Features2=0x1<SSE3>
>>   AMD Features=0xe2500800<SYSCALL,NX,MMX+,FFXSR,LM,3DNow!+,3DNow!>
>>   AMD Features2=0x3<LAHF,CMP>
>> real memory  = 3758030848 (3583 MB)
>> avail memory = 3677495296 (3507 MB)
>> ACPI APIC Table: <A M I  OEMAPIC >
>> FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
>> FreeBSD/SMP: 4 package(s) x 2 core(s)
>> ...
>>
>> 2. uname -v
>> FreeBSD 9.0-CURRENT #3
>>
>> 3. sysctl kern.osreldate
>> kern.osreldate: 900014
>>
>> 4. //depot/projects/soc2010/ringmap/

In fact I have a suspicion that the problem might have to do with multiple
mappings of the shared pages, but I am far from sure...
Take a look at Intel® 64 and IA-32 Architectures Software Developer’s Manual
Volume 3A - System Programming Guide, Part 1; Chapter 11.12.4 Programming the PAT;
starting at the following words:
«The PAT allows any memory type to be specified in the page tables, and therefore
it is possible to have a single physical page mapped to two or more different
linear addresses, each with different memory types. Intel does not support this
practice...»


-- 
Andriy Gapon

From owner-freebsd-hackers@FreeBSD.ORG  Thu Jul 29 17:02:41 2010
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 8CCE01065680
	for <freebsd-hackers@freebsd.org>; Thu, 29 Jul 2010 17:02:41 +0000 (UTC)
	(envelope-from avg@icyb.net.ua)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id D9A4C8FC14
	for <freebsd-hackers@freebsd.org>; Thu, 29 Jul 2010 17:02:40 +0000 (UTC)
Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua
	[212.40.38.101])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id UAA27387;
	Thu, 29 Jul 2010 20:02:38 +0300 (EEST)
	(envelope-from avg@icyb.net.ua)
Message-ID: <4C51B42D.1060402@icyb.net.ua>
Date: Thu, 29 Jul 2010 20:02:37 +0300
From: Andriy Gapon <avg@icyb.net.ua>
User-Agent: Thunderbird 2.0.0.24 (X11/20100517)
MIME-Version: 1.0
To: pebu3op@googlemail.com
References: <201007291613.15719.pebu3op@googlemail.com>
	<4C51A8A3.7080808@icyb.net.ua>
	<201007291845.46015.pebu3op@googlemail.com>
In-Reply-To: <201007291845.46015.pebu3op@googlemail.com>
X-Enigmail-Version: 0.95.7
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: freebsd-hackers@freebsd.org
Subject: Re: coherence-problem on the mapped memory buffer
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 29 Jul 2010 17:02:41 -0000

on 29/07/2010 19:45 Alexander Fiveg said the following:
> On Thursday 29 July 2010 18:13:23 Andriy Gapon wrote:
>> on 29/07/2010 17:13 Alexander Fiveg said the following:
>>> P.S. Details about hardware and used software:
>>> 1. /var/run/dmesg.boot :
>>> ...
>>> CPU: Dual Core AMD Opteron(tm) Processor 865 (1800.01-MHz 686-class CPU)
>>>   Origin = "AuthenticAMD"  Id = 0x20f10  Family = f  Model = 21  Stepping = 0
>>>
>>> Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
>>>   Features2=0x1<SSE3>
>>>   AMD Features=0xe2500800<SYSCALL,NX,MMX+,FFXSR,LM,3DNow!+,3DNow!>
>>>   AMD Features2=0x3<LAHF,CMP>
>>> real memory  = 3758030848 (3583 MB)
>>> avail memory = 3677495296 (3507 MB)
>>> ACPI APIC Table: <A M I  OEMAPIC >
>>> FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
>>> FreeBSD/SMP: 4 package(s) x 2 core(s)
>>> ...
>>>
>>> 2. uname -v
>>> FreeBSD 9.0-CURRENT #3
>>>
>>> 3. sysctl kern.osreldate
>>> kern.osreldate: 900014
>>>
>>> 4. //depot/projects/soc2010/ringmap/
>> No help, but just curious - do use amd64 variant?
>> If yes, can you reproduce the problem with i386?
> 
> No, my kernel is i386, but I will try test it with amd64.

Oh, never mind actually.

-- 
Andriy Gapon

From owner-freebsd-hackers@FreeBSD.ORG  Thu Jul 29 20:03:03 2010
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 64BA71065789
	for <freebsd-hackers@freebsd.org>; Thu, 29 Jul 2010 20:03:03 +0000 (UTC)
	(envelope-from babkin@verizon.net)
Received: from vms173019pub.verizon.net (vms173019pub.verizon.net
	[206.46.173.19])
	by mx1.freebsd.org (Postfix) with ESMTP id 3B4DD8FC08
	for <freebsd-hackers@freebsd.org>; Thu, 29 Jul 2010 20:03:03 +0000 (UTC)
Received: from vms170009.mailsrvcs.net ([unknown] [172.18.12.132])
	by vms173019.mailsrvcs.net
	(Sun Java(tm) System Messaging Server 7u2-7.02 32bit (built Apr 16
	2009)) with ESMTPA id <0L6C00J6E50W3RW1@vms173019.mailsrvcs.net> for
	freebsd-hackers@freebsd.org; Thu, 29 Jul 2010 15:02:58 -0500 (CDT)
Received: from 130.214.17.1 ([130.214.17.1]) by vms170009.mailsrvcs.net
	(Verizon Webmail) with HTTP; Thu, 29 Jul 2010 15:02:56 -0500 (CDT)
Date: Thu, 29 Jul 2010 15:02:56 -0500 (CDT)
From: Sergey Babkin <babkin@verizon.net>
To: avg@icyb.net.ua
Message-id: <382607918.1356296.1280433776963.JavaMail.root@vms170009.mailsrvcs.net>
MIME-version: 1.0
Content-type: text/plain; charset=UTF-8
Content-transfer-encoding: quoted-printable
X-Originating-IP: [130.214.17.1]
X-Mailman-Approved-At: Thu, 29 Jul 2010 20:13:59 +0000
Cc: freebsd-hackers@freebsd.org, pebu3op@googlemail.com
Subject: Re: Re: coherence-problem on the mapped memory buffer
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 29 Jul 2010 20:03:03 -0000

Jul 29, 2010 12:58:07 PM, avg@icyb.net.ua wrote:

>on 29/07/2010 19:13 Andriy Gapon said the following:
>> on 29/07/2010 17:13 Alexander Fiveg said the following:
>In fact I have a suspicion that the problem might have to do with multiple
>mappings of the shared pages, but far from sure...
>Take a look at Intel® 64 and IA-32 Architectures Software Developer’s Manual
>Volume 3A - System Programming Guide, Part 1; Chapter 11.12.4 Programming the PAT;
>starting at the following words:
>«The PAT allows any memory type to be specified in the page tables, and therefore
>it is possible to have a single physical page mapped to two or more different
>linear addresses, each with different memory types. Intel does not support this
>practice...»

My guess would be that the memory type is not marked as DMA-capable. AFAIK the Intel CPUs
do the hardware snooping on the physical addresses, so they have no coherency issues between
themselves. However, if a DMA writer changes the memory, I think this does not normally get
propagated to the front-side bus, and the CPUs would not see it. You may need to either
explicitly flush the CPU cache before accessing these pages or mark them as non-cacheable.

-SB

From owner-freebsd-hackers@FreeBSD.ORG  Thu Jul 29 20:16:42 2010
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id EC9491065672
	for <freebsd-hackers@freebsd.org>; Thu, 29 Jul 2010 20:16:42 +0000 (UTC)
	(envelope-from avg@icyb.net.ua)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id 4254A8FC08
	for <freebsd-hackers@freebsd.org>; Thu, 29 Jul 2010 20:16:41 +0000 (UTC)
Received: from porto.topspin.kiev.ua (porto-e.starpoint.kiev.ua
	[212.40.38.100])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id XAA29684;
	Thu, 29 Jul 2010 23:16:32 +0300 (EEST)
	(envelope-from avg@icyb.net.ua)
Received: from localhost.topspin.kiev.ua ([127.0.0.1])
	by porto.topspin.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
	id 1OeZWq-0002Kr-4u; Thu, 29 Jul 2010 23:16:32 +0300
Message-ID: <4C51E198.8060800@icyb.net.ua>
Date: Thu, 29 Jul 2010 23:16:24 +0300
From: Andriy Gapon <avg@icyb.net.ua>
User-Agent: Thunderbird 2.0.0.24 (X11/20100603)
MIME-Version: 1.0
To: Sergey Babkin <babkin@verizon.net>
References: <382607918.1356296.1280433776963.JavaMail.root@vms170009.mailsrvcs.net>
In-Reply-To: <382607918.1356296.1280433776963.JavaMail.root@vms170009.mailsrvcs.net>
X-Enigmail-Version: 0.96.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Cc: freebsd-hackers@freebsd.org, pebu3op@googlemail.com
Subject: Re: coherence-problem on the mapped memory buffer
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 29 Jul 2010 20:16:43 -0000

on 29/07/2010 23:02 Sergey Babkin said the following:
> Jul 29, 2010 12:58:07 PM, avg@icyb.net.ua wrote:
> 
>> on 29/07/2010 19:13 Andriy Gapon said the following:
>>> on 29/07/2010 17:13 Alexander Fiveg said the following:
>> In fact I have a suspicion that the problem might have to do with multiple
>> mappings of the shared pages, but far from sure...
>> Take a look at Intel® 64 and IA-32 Architectures Software Developer’s Manual
>> Volume 3A - System Programming Guide, Part 1; Chapter 11.12.4 Programming the PAT;
>> starting at the following words:
>> «The PAT allows any memory type to be specified in the page tables, and therefore
>> it is possible to have a single physical page mapped to two or more different
>> linear addresses, each with different memory types. Intel does not support this
>> practice...»
> 
> My guess would be that the memory type is not marked as DMA-capable. AFAIK the Intel CPUs
> do the hardware snooping on the physical addresses, so they have no coherency issues benween 
> themselves. However if a DMA writer changes the memory, this I think does not get normally 
> propagated to the front-side bus, and the CPUs would not see it. You may need to either
> explicitly flush the CPU cache before accessing these pages or mark them as non-cacheable.

My guess was approximately the same - if one mapping is done in kernel for DMA
purposes, then the memory type is, most likely, set to uncached.  But the
userland mapping of the same pages most likely marks the same pages (via
different virtual addresses) as cached.  Depending on the hardware and on what
mappings were used on a particular CPU (core) to access that memory, there could
be differences in interaction with DMA.

-- 
Andriy Gapon

From owner-freebsd-hackers@FreeBSD.ORG  Thu Jul 29 20:29:31 2010
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E9CB11065677
	for <freebsd-hackers@freebsd.org>; Thu, 29 Jul 2010 20:29:31 +0000 (UTC)
	(envelope-from ligregni@unixmexico.org)
Received: from mail-iw0-f182.google.com (mail-iw0-f182.google.com
	[209.85.214.182])
	by mx1.freebsd.org (Postfix) with ESMTP id BC7D08FC12
	for <freebsd-hackers@freebsd.org>; Thu, 29 Jul 2010 20:29:31 +0000 (UTC)
Received: by iwn35 with SMTP id 35so733647iwn.13
	for <freebsd-hackers@freebsd.org>; Thu, 29 Jul 2010 13:29:31 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.231.146.135 with SMTP id h7mr401546ibv.149.1280433620838; Thu, 
	29 Jul 2010 13:00:20 -0700 (PDT)
Received: by 10.231.192.65 with HTTP; Thu, 29 Jul 2010 13:00:20 -0700 (PDT)
Date: Thu, 29 Jul 2010 15:00:20 -0500
Message-ID: <AANLkTi=ntPn67hcR8Sa9bT2cu64u-Gr5LMZMbKjy9EFH@mail.gmail.com>
From: Sergio Ligregni <ligregni@unixmexico.org>
To: freebsd-hackers@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Subject: Improvement for Distributed Audit Project
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 29 Jul 2010 20:29:32 -0000

I am Sergio Ligregni, from Mexico. I am currently working on the Distributed
Audit Project for GSoC 2010, and I would like to ask for your help with the
following:

HELP NEEDED:

/*++++++++++++++++++++++*/

- Which code should I base my development on for reading parameters from a
file? (I've searched audit.c, auditd_fbsd.c and auditd.c but did not find a
function that does that; maybe I missed something.) Currently I have files like:
/var/audit
/var2/audit
1000
yes
53686

and I read the parameters with sscanf, but the right way (the one for which I
want to know which code to take as a baseline) would be:

dir:/var/audit /var2/audit
time: 1000
slave_dir: yes
port: 53686

without using sscanf (avoiding that function is a security concern raised by
my mentor). I think I can write an algorithm to implement that, but maybe
there is a better/safer way to do it that keeps to the standard.
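
For the "key: value" layout described above (which resembles the dir: lines of
audit_control(5)), a parser can be built from strchr() and strtol() alone,
with no sscanf. The following is a minimal sketch under that assumption; the
names parse_config_line() and parse_long() are hypothetical and do not come
from the audit sources.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Split a "key: value" line in place. On success *key and *value point
 * into `line` (NUL-terminated, surrounding whitespace trimmed) and 0 is
 * returned; -1 means the line has no ':' or an empty key.
 */
static int
parse_config_line(char *line, char **key, char **value)
{
	char *colon, *end;

	colon = strchr(line, ':');
	if (colon == NULL)
		return (-1);
	*colon = '\0';

	/* Trim whitespace around the key. */
	while (isspace((unsigned char)*line))
		line++;
	end = line + strlen(line);
	while (end > line && isspace((unsigned char)end[-1]))
		*--end = '\0';
	if (*line == '\0')
		return (-1);

	/* Trim the value; this also drops a trailing newline. */
	colon++;
	while (isspace((unsigned char)*colon))
		colon++;
	end = colon + strlen(colon);
	while (end > colon && isspace((unsigned char)end[-1]))
		*--end = '\0';

	*key = line;
	*value = colon;
	return (0);
}

/* Convert a decimal string to long, rejecting trailing junk and overflow. */
static int
parse_long(const char *s, long *out)
{
	char *end;
	long v;

	errno = 0;
	v = strtol(s, &end, 10);
	if (end == s || *end != '\0' || errno == ERANGE)
		return (-1);
	*out = v;
	return (0);
}
```

Each line read with fgets() would be passed to parse_config_line(), the key
compared with strcmp() against "dir", "time", "slave_dir" and "port", and the
numeric values validated with parse_long() plus a range check (for example
1 to 65535 for the port).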

/*++++++++++++++++++++++*/
Currently I have this function to verify whether a file is a trail, given its
name. It is very poor and needs to be improved; any ideas?

/*
 * When exploring /var/audit/ (or the directory where the trails are), not
 * all files are trails, so we must ensure we only deal with the ones
 * that are trails.
 */
static int
is_audit_trail(char *path)
{
  /*
   * We have these possibilities; only the first one is allowed:
   * 20100619223115.20100619223131 20100619223131.not_terminated
   * current
   */
  if (strlen(path) == 29 && path[14] == '.' &&
      isdigit((unsigned char)path[15])) {
    /* XXX To improve this checking later */
    return 1;
  }
  return 0;
}
/*++++++++++++++++++++++*/
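
A somewhat stricter variant of the check above could verify that both halves
of the name consist entirely of digits, instead of testing only the length,
the dot and one character. This is a sketch only: it assumes the name is the
base file name (not a full path) in the form YYYYMMDDHHMMSS.YYYYMMDDHHMMSS
shown in the comment, and the names is_terminated_trail_name(), all_digits()
and TS_LEN are made up for illustration.

```c
#include <ctype.h>
#include <string.h>

#define	TS_LEN	14	/* length of one YYYYMMDDHHMMSS timestamp */

/* Return 1 if the first n characters of s are all decimal digits. */
static int
all_digits(const char *s, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (!isdigit((unsigned char)s[i]))
			return (0);
	return (1);
}

/*
 * Return 1 only for names of the form <14 digits>.<14 digits>, i.e. a
 * terminated trail; "current" and "*.not_terminated" are rejected.
 */
static int
is_terminated_trail_name(const char *name)
{
	if (strlen(name) != 2 * TS_LEN + 1)
		return (0);
	if (name[TS_LEN] != '.')
		return (0);
	return (all_digits(name, TS_LEN) &&
	    all_digits(name + TS_LEN + 1, TS_LEN));
}
```

The length is checked first, so the later index accesses never read past the
end of the string. Validating that the digits form real dates (month 01 to 12
and so on) is deliberately left out of this sketch.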

By the way the Wiki and the Perforce Repository for this project are:

http://wiki.freebsd.org/SOC2010SergioLigregni
http://p4db.freebsd.org/depotTreeBrowser.cgi?FSPC=//depot/projects/soc2010/disaudit&HIDEDEL=NO

Thanks!
-- 
-----------------------------------------------------------
Sergio Andrés Ligregni Arredondo

Estudiante Ingeniería en Sistemas Computacionales, ITQ.
Is UNIX Hot Enough for You? | FreeBSD

From owner-freebsd-hackers@FreeBSD.ORG  Thu Jul 29 21:17:13 2010
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id CAC111065672
	for <freebsd-hackers@freebsd.org>; Thu, 29 Jul 2010 21:17:13 +0000 (UTC)
	(envelope-from emaste@freebsd.org)
Received: from mail1.sandvine.com (Mail1.sandvine.com [64.7.137.162])
	by mx1.freebsd.org (Postfix) with ESMTP id 51F878FC18
	for <freebsd-hackers@freebsd.org>; Thu, 29 Jul 2010 21:17:13 +0000 (UTC)
Received: from labgw2.phaedrus.sandvine.com (192.168.222.22) by
	WTL-EXCH-1.sandvine.com (192.168.196.31) with Microsoft SMTP Server id
	14.0.694.0; Thu, 29 Jul 2010 17:06:10 -0400
Received: by labgw2.phaedrus.sandvine.com (Postfix, from userid 10332)	id
	D6E0A33C00; Thu, 29 Jul 2010 17:06:22 -0400 (EDT)
Date: Thu, 29 Jul 2010 17:06:22 -0400
From: Ed Maste <emaste@freebsd.org>
To: <freebsd-hackers@freebsd.org>
Message-ID: <20100729210622.GA84094@sandvine.com>
References: <201007281510.o6SFAV5J052045@svn.freebsd.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To: <201007281510.o6SFAV5J052045@svn.freebsd.org>
User-Agent: Mutt/1.4.2.1i
Subject: Re: svn commit: r210561 - projects/sv/sys/net
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 29 Jul 2010 21:17:13 -0000

On Wed, Jul 28, 2010 at 03:10:31PM +0000, Attilio Rao wrote:

> Log:
>   Initial import of the netdump files.
>   They still need a lot of polishing and cleanup so they might not be
>   considered definitive at all.

This code is a port to recent FreeBSD of Darrell Anderson's network
crashdump support, which was done in the 4.x days.  I can't find a
current website with the original versions but archive.org has a cache
of course:

http://web.archive.org/web/20041204223729/http://www.cs.duke.edu/~anderson/freebsd/netdump/

Quoting from the old readme:

  Netdump provides FreeBSD kernel crash dumping over the network.
  Netdump is a FreeBSD kernel module client and user-level server.

  A normal kernel crash writes a raw dump of memory to a dedicated
  partition (usually the swap partition) using a low-level disk routine,
  and then copies that raw dump into a file (via savecore) during the
  following boot process.

  Netdump replaces the standard dump routine. During a crash, a netdump
  client broadcasts to locate a netdump server, then sends the dump as
  UDP/IP packets (with retransmission after loss). The netdump server
  creates a dump file suitable for gdb. If netdump fails (for example,
  no netdump server is located), a normal disk dump is performed. 

There is cleanup work to be done still, but we plan to have this in
shape for 9.0.

-Ed

From owner-freebsd-hackers@FreeBSD.ORG  Thu Jul 29 21:41:23 2010
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 7A391106567B
	for <freebsd-hackers@freebsd.org>; Thu, 29 Jul 2010 21:41:23 +0000 (UTC)
	(envelope-from pebu3op@googlemail.com)
Received: from mail.net.t-labs.tu-berlin.de (mail.net.t-labs.tu-berlin.de
	[130.149.220.252])
	by mx1.freebsd.org (Postfix) with ESMTP id 0F9AF8FC0C
	for <freebsd-hackers@freebsd.org>; Thu, 29 Jul 2010 21:41:22 +0000 (UTC)
Received: from raven.net.t-labs.tu-berlin.de (raven.net.t-labs.tu-berlin.de
	[130.149.220.18])
	by mail.net.t-labs.tu-berlin.de (Postfix) with ESMTP id 0DEF5700D29E;
	Thu, 29 Jul 2010 23:41:22 +0200 (CEST)
From: Alexander Fiveg <pebu3op@googlemail.com>
Organization: Google
To: Andriy Gapon <avg@icyb.net.ua>
Date: Thu, 29 Jul 2010 23:41:20 +0200
User-Agent: KMail/1.9.10
References: <382607918.1356296.1280433776963.JavaMail.root@vms170009.mailsrvcs.net>
	<4C51E198.8060800@icyb.net.ua>
In-Reply-To: <4C51E198.8060800@icyb.net.ua>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline
Message-Id: <201007292341.21123.pebu3op@googlemail.com>
Cc: freebsd-hackers@freebsd.org, Sergey Babkin <babkin@verizon.net>
Subject: Re: coherence-problem on the mapped memory buffer
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: pebu3op@googlemail.com
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 29 Jul 2010 21:41:23 -0000

On Thursday 29 July 2010 22:16:24 Andriy Gapon wrote:
> on 29/07/2010 23:02 Sergey Babkin said the following:
> > Jul 29, 2010 12:58:07 PM, avg@icyb.net.ua wrote:
> >> on 29/07/2010 19:13 Andriy Gapon said the following:
> >>> on 29/07/2010 17:13 Alexander Fiveg said the following:
> >>
> >> In fact I have a suspicion that the problem might have to do with
> >> multiple mappings of the shared pages, but far from sure...
> >> Take a look at Intel® 64 and IA-32 Architectures Software Developer’s
> >> Manual Volume 3A - System Programming Guide, Part 1; Chapter 11.12.4
> >> Programming the PAT; starting at the following words:
> >> «The PAT allows any memory type to be specified in the page tables, and
> >> therefore it is possible to have a single physical page mapped to two or
> >> more different linear addresses, each with different memory types. Intel
> >> does not support this practice...»
> >
> > My guess would be that the memory type is not marked as DMA-capable.
> > AFAIK the Intel CPUs do the hardware snooping on the physical addresses,
> > so they have no coherency issues between themselves. However if a DMA
> > writer changes the memory, this I think does not get normally propagated
> > to the front-side bus, and the CPUs would not see it. You may need to
> > either explicitly flush the CPU cache before accessing these pages or
> > mark them as non-cacheable.
>
> My guess was approximately the same - if one mapping is done in kernel for
> DMA purposes, then the memory type is, most likely, set to uncached.  But
> the userland mapping of the same pages most likely marks the same pages
> (via different virtual addresses) as cached.  Depending on the hardware and
> on what mappings were used on a particular CPU (core) to access that
> memory, there could be differences in interaction with DMA.

Thanks a lot for your answers, but I am afraid I do not have enough
experience to solve these tasks. Could you please tell me how to:
- get access to the pages associated with a certain memory buffer?
I mean, I want to get the structures that describe the page properties I
should change (for instance, in order to make the page non-cacheable).

If you are aware of any good papers or examples in the system code where
these topics are covered, I would appreciate it if you gave me the
references.

Alex

From owner-freebsd-hackers@FreeBSD.ORG  Thu Jul 29 22:09:25 2010
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 2074A1065693
	for <freebsd-hackers@freebsd.org>; Thu, 29 Jul 2010 22:09:25 +0000 (UTC)
	(envelope-from avg@icyb.net.ua)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id 6A0F88FC0C
	for <freebsd-hackers@freebsd.org>; Thu, 29 Jul 2010 22:09:24 +0000 (UTC)
Received: from porto.topspin.kiev.ua (porto-e.starpoint.kiev.ua
	[212.40.38.100])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id BAA01097;
	Fri, 30 Jul 2010 01:09:19 +0300 (EEST)
	(envelope-from avg@icyb.net.ua)
Received: from localhost.topspin.kiev.ua ([127.0.0.1])
	by porto.topspin.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
	id 1OebHz-0002Sb-GZ; Fri, 30 Jul 2010 01:09:19 +0300
Message-ID: <4C51FC0E.9050204@icyb.net.ua>
Date: Fri, 30 Jul 2010 01:09:18 +0300
From: Andriy Gapon <avg@icyb.net.ua>
User-Agent: Thunderbird 2.0.0.24 (X11/20100603)
MIME-Version: 1.0
To: pebu3op@googlemail.com
References: <382607918.1356296.1280433776963.JavaMail.root@vms170009.mailsrvcs.net>
	<4C51E198.8060800@icyb.net.ua>
	<201007292341.21123.pebu3op@googlemail.com>
In-Reply-To: <201007292341.21123.pebu3op@googlemail.com>
X-Enigmail-Version: 0.96.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Cc: freebsd-hackers@freebsd.org, Sergey Babkin <babkin@verizon.net>
Subject: Re: coherence-problem on the mapped memory buffer
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 29 Jul 2010 22:09:25 -0000

on 30/07/2010 00:41 Alexander Fiveg said the following:
> Thanks a lot for your answers, but I am afraid I do not have enough
> experience to solve these tasks. Could you please tell me how to:
> - get access to the pages associated with a certain memory buffer?
> I mean, I want to get the structures that describe the page properties I
> should change (for instance, in order to make the page non-cacheable).
>
> If you are aware of any good papers or examples in the system code where
> these topics are covered, I would appreciate it if you gave me the
> references.

I don't have a recipe, but here are some pointers to get you started:
1. investigate BUS_DMA_NOCACHE, see bus_dma(9)
2. check sys/dev/sound/pci/hda/hdac.c for HDAC_F_DMA_NOCACHE and the comment
about PCIe snoop - this might be relevant
3. see pmap_change_attr for a way to change the caching type of a memory
mapping
4. hope that more knowledgeable people (experts) provide their advice, and
keep nudging them via the mailing list(s) :-)

-- 
Andriy Gapon

From owner-freebsd-hackers@FreeBSD.ORG  Thu Jul 29 23:39:03 2010
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 4C90D106566B
	for <freebsd-hackers@freebsd.org>; Thu, 29 Jul 2010 23:39:03 +0000 (UTC)
	(envelope-from mdf356@gmail.com)
Received: from mail-iw0-f182.google.com (mail-iw0-f182.google.com
	[209.85.214.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 167308FC17
	for <freebsd-hackers@freebsd.org>; Thu, 29 Jul 2010 23:39:02 +0000 (UTC)
Received: by iwn35 with SMTP id 35so939082iwn.13
	for <freebsd-hackers@freebsd.org>; Thu, 29 Jul 2010 16:39:02 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:received:sender:received:date
	:x-google-sender-auth:message-id:subject:from:to:content-type;
	bh=srbXnpSkwnZyWReN2hEzf3ACyGvYQNjoyKDQ2bOvOmM=;
	b=Lw8PqhJT7jzmhrrw0mb4DelDo6R7ni5ImEh+khweawC/vLa0g+0k5HjLBnDalhAaYe
	w0FC3vfL+HVMtlcuNOLWLRHWk10WZQj/OQGm4zVNOAGaLhCRulckdh28TKgrz0mm3Q+1
	i8o92pn23drjfNBcUWXpGNQcf82H+tO9O2aUY=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:sender:date:x-google-sender-auth:message-id:subject
	:from:to:content-type;
	b=GuBzI/LH20/o4wBarwm2qn/9hv8ddPKnIgsvQc8mTr8oDYqJIStJxBA7p183x82J0q
	vcE1Euz0vS5vJ8Hv4yQz5G9bbumHQfktJUvASXGLGD47sxFDeDESJ7A8qELfvfFgmWdo
	ZbGwwFum7dRihK/0FN58fddWLCLiy6p+9pvXk=
MIME-Version: 1.0
Received: by 10.42.9.69 with SMTP id l5mr186837icl.80.1280446742146; Thu, 29 
	Jul 2010 16:39:02 -0700 (PDT)
Sender: mdf356@gmail.com
Received: by 10.42.6.85 with HTTP; Thu, 29 Jul 2010 16:39:02 -0700 (PDT)
Date: Thu, 29 Jul 2010 16:39:02 -0700
X-Google-Sender-Auth: 4ouVY9hWjuzZ2dhMKYwYcWwNxKs
Message-ID: <AANLkTikY20TxyeyqO5zP3zC-azb748kV-MdevPfm+8cq@mail.gmail.com>
From: mdf@FreeBSD.org
To: freebsd-hackers@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1
Subject: sched_pin() versus PCPU_GET
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 29 Jul 2010 23:39:03 -0000

We've seen a few instances at work where witness_warn() in ast()
indicates the sched lock is still held, but the place where it claims the
lock was acquired is sometimes one where the lock cannot in fact remain
held, like:

	thread_lock(td);
	td->td_flags &= ~TDF_SELECT;
	thread_unlock(td);

What I was wondering is, even though the assembly I see in objdump -S
for witness_warn has the increment of td_pinned before the PCPU_GET:

ffffffff802db210:	65 48 8b 1c 25 00 00 	mov    %gs:0x0,%rbx
ffffffff802db217:	00 00
ffffffff802db219:	ff 83 04 01 00 00    	incl   0x104(%rbx)
	 * Pin the thread in order to avoid problems with thread migration.
	 * Once that all verifies are passed about spinlocks ownership,
	 * the thread is in a safe path and it can be unpinned.
	 */
	sched_pin();
	lock_list = PCPU_GET(spinlocks);
ffffffff802db21f:	65 48 8b 04 25 48 00 	mov    %gs:0x48,%rax
ffffffff802db226:	00 00
	if (lock_list != NULL && lock_list->ll_count != 0) {
ffffffff802db228:	48 85 c0             	test   %rax,%rax
	 * Pin the thread in order to avoid problems with thread migration.
	 * Once that all verifies are passed about spinlocks ownership,
	 * the thread is in a safe path and it can be unpinned.
	 */
	sched_pin();
	lock_list = PCPU_GET(spinlocks);
ffffffff802db22b:	48 89 85 f0 fe ff ff 	mov    %rax,-0x110(%rbp)
ffffffff802db232:	48 89 85 f8 fe ff ff 	mov    %rax,-0x108(%rbp)
	if (lock_list != NULL && lock_list->ll_count != 0) {
ffffffff802db239:	0f 84 ff 00 00 00    	je     ffffffff802db33e <witness_warn+0x30e>
ffffffff802db23f:	44 8b 60 50          	mov    0x50(%rax),%r12d

is it possible for the hardware to do any re-ordering here?

The reason I'm suspicious is not just that the code doesn't have a
lock leak at the indicated point, but in one instance I can see in the
dump that the lock_list local from witness_warn is from the pcpu
structure for CPU 0 (and I was warned about sched lock 0), but the
thread id in panic_cpu is 2.  So clearly the thread was being migrated
right around panic time.

This is the amd64 kernel on stable/7.  I'm not sure exactly what kind
of hardware; it's a 4-way Intel chip from about 3 or 4 years ago IIRC.

So... do we need some kind of barrier in the code for sched_pin() for
it to really do what it claims?  Could the hardware have re-ordered
the "mov    %gs:0x48,%rax" PCPU_GET to before the sched_pin()
increment?

Thanks,
matthew

From owner-freebsd-hackers@FreeBSD.ORG  Thu Jul 29 23:57:26 2010
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 9D5671065673
	for <freebsd-hackers@freebsd.org>; Thu, 29 Jul 2010 23:57:26 +0000 (UTC)
	(envelope-from mdf356@gmail.com)
Received: from mail-iw0-f182.google.com (mail-iw0-f182.google.com
	[209.85.214.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 60E658FC20
	for <freebsd-hackers@freebsd.org>; Thu, 29 Jul 2010 23:57:26 +0000 (UTC)
Received: by iwn35 with SMTP id 35so960151iwn.13
	for <freebsd-hackers@freebsd.org>; Thu, 29 Jul 2010 16:57:25 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:received:sender:received
	:in-reply-to:references:date:x-google-sender-auth:message-id:subject
	:from:to:content-type:content-transfer-encoding;
	bh=JofS1w1C5pEvTz/bYg2iJafvnkpVzgN2GwEf13Wku84=;
	b=MVX6SDjnXVf1N0n/VKO2BCCcHcoN1toO2ZPkgJj40gbHUqVrhCFLEHQsMSQ02VdJDH
	e0F4DUnZpc3/uebcojSTxlUCHA0I2a23SiufSZKyTCBMZf+984aDHpU/qq7QnrE4a8Ps
	25RGtXOAEAHba0vLNX9Wq3i4Wk091uQVw0Jb4=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:sender:in-reply-to:references:date
	:x-google-sender-auth:message-id:subject:from:to:content-type
	:content-transfer-encoding;
	b=rKfRVw8WxuSy1fzHecX4/IQI7LLO1CRL//1b8O/Jnhp9EbPp1WwI8uSk/DB6rcjxn8
	b/umbfzEVKdqYXsKW8x+AZykjxNfy47hwNlZJ5M5kQI9g1VOsqOWFN4xvrvsZl1uab8F
	4mP3qgSB2i0bNAtGqlF83QGUDGaSvJgWxNNxA=
MIME-Version: 1.0
Received: by 10.42.9.4 with SMTP id k4mr194785ick.72.1280447845206; Thu, 29 
	Jul 2010 16:57:25 -0700 (PDT)
Sender: mdf356@gmail.com
Received: by 10.42.6.85 with HTTP; Thu, 29 Jul 2010 16:57:25 -0700 (PDT)
In-Reply-To: <AANLkTikY20TxyeyqO5zP3zC-azb748kV-MdevPfm+8cq@mail.gmail.com>
References: <AANLkTikY20TxyeyqO5zP3zC-azb748kV-MdevPfm+8cq@mail.gmail.com>
Date: Thu, 29 Jul 2010 16:57:25 -0700
X-Google-Sender-Auth: E1Ba7yahsoiKgm4UgP7rPIrjA_w
Message-ID: <AANLkTimGjNATWmuGqTDMFQ0r3gHnsv0Bc69pBb6QYO9L@mail.gmail.com>
From: mdf@FreeBSD.org
To: freebsd-hackers@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Subject: Re: sched_pin() versus PCPU_GET
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 29 Jul 2010 23:57:26 -0000

On Thu, Jul 29, 2010 at 4:39 PM,  <mdf@freebsd.org> wrote:
> We've seen a few instances at work where witness_warn() in ast()
> indicates the sched lock is still held, but the place it claims it was
> held by is in fact sometimes not possible to keep the lock, like:
>
>         thread_lock(td);
>         td->td_flags &= ~TDF_SELECT;
>         thread_unlock(td);
>
> What I was wondering is, even though the assembly I see in objdump -S
> for witness_warn has the increment of td_pinned before the PCPU_GET:
>
> ffffffff802db210:       65 48 8b 1c 25 00 00    mov    %gs:0x0,%rbx
> ffffffff802db217:       00 00
> ffffffff802db219:       ff 83 04 01 00 00       incl   0x104(%rbx)
>          * Pin the thread in order to avoid problems with thread migration.
>          * Once that all verifies are passed about spinlocks ownership,
>          * the thread is in a safe path and it can be unpinned.
>          */
>         sched_pin();
>         lock_list = PCPU_GET(spinlocks);
> ffffffff802db21f:       65 48 8b 04 25 48 00    mov    %gs:0x48,%rax
> ffffffff802db226:       00 00
>         if (lock_list != NULL && lock_list->ll_count != 0) {
> ffffffff802db228:       48 85 c0                test   %rax,%rax
>          * Pin the thread in order to avoid problems with thread migration.
>          * Once that all verifies are passed about spinlocks ownership,
>          * the thread is in a safe path and it can be unpinned.
>          */
>         sched_pin();
>         lock_list = PCPU_GET(spinlocks);
> ffffffff802db22b:       48 89 85 f0 fe ff ff    mov    %rax,-0x110(%rbp)
> ffffffff802db232:       48 89 85 f8 fe ff ff    mov    %rax,-0x108(%rbp)
>         if (lock_list != NULL && lock_list->ll_count != 0) {
> ffffffff802db239:       0f 84 ff 00 00 00       je     ffffffff802db33e <witness_warn+0x30e>
> ffffffff802db23f:       44 8b 60 50             mov    0x50(%rax),%r12d
>
> is it possible for the hardware to do any re-ordering here?
>
> The reason I'm suspicious is not just that the code doesn't have a
> lock leak at the indicated point, but in one instance I can see in the
> dump that the lock_list local from witness_warn is from the pcpu
> structure for CPU 0 (and I was warned about sched lock 0), but the
> thread id in panic_cpu is 2.  So clearly the thread was being migrated
> right around panic time.
>
> This is the amd64 kernel on stable/7.  I'm not sure exactly what kind
> of hardware; it's a 4-way Intel chip from about 3 or 4 years ago IIRC.
>
> So... do we need some kind of barrier in the code for sched_pin() for
> it to really do what it claims?  Could the hardware have re-ordered
> the "mov    %gs:0x48,%rax" PCPU_GET to before the sched_pin()
> increment?

So after some research, the answer I'm getting is "maybe".  What I'm
concerned about is whether the h/w reordered the read of PCPU_GET in
front of the previous store that increments td_pinned.  While not an
ultimate authority,
http://en.wikipedia.org/wiki/Memory_ordering#In_SMP_microprocessor_systems
implies that stores can be reordered after loads for both Intel and
amd64 chips, which I believe would account for the behavior seen here.

Thanks,
matthew

From owner-freebsd-hackers@FreeBSD.ORG  Fri Jul 30 09:44:18 2010
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E61C01065675;
	Fri, 30 Jul 2010 09:44:18 +0000 (UTC)
	(envelope-from kostikbel@gmail.com)
Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200])
	by mx1.freebsd.org (Postfix) with ESMTP id 7EC948FC20;
	Fri, 30 Jul 2010 09:44:17 +0000 (UTC)
Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua
	[10.1.1.148])
	by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id o6U9iDmO001589
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Fri, 30 Jul 2010 12:44:13 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1])
	by deviant.kiev.zoral.com.ua (8.14.4/8.14.4) with ESMTP id
	o6U9iD0M029019; Fri, 30 Jul 2010 12:44:13 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: (from kostik@localhost)
	by deviant.kiev.zoral.com.ua (8.14.4/8.14.4/Submit) id o6U9iD0r029018; 
	Fri, 30 Jul 2010 12:44:13 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to
	kostikbel@gmail.com using -f
Date: Fri, 30 Jul 2010 12:44:13 +0300
From: Kostik Belousov <kostikbel@gmail.com>
To: mdf@freebsd.org
Message-ID: <20100730094413.GJ22295@deviant.kiev.zoral.com.ua>
References: <AANLkTikY20TxyeyqO5zP3zC-azb748kV-MdevPfm+8cq@mail.gmail.com>
	<AANLkTimGjNATWmuGqTDMFQ0r3gHnsv0Bc69pBb6QYO9L@mail.gmail.com>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="OdQvBiqfLsaeimeB"
Content-Disposition: inline
In-Reply-To: <AANLkTimGjNATWmuGqTDMFQ0r3gHnsv0Bc69pBb6QYO9L@mail.gmail.com>
User-Agent: Mutt/1.4.2.3i
X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua
X-Virus-Status: Clean
X-Spam-Status: No, score=-2.2 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_50,
	DNS_FROM_OPENWHOIS autolearn=no version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
	skuns.kiev.zoral.com.ua
Cc: freebsd-hackers@freebsd.org
Subject: Re: sched_pin() versus PCPU_GET
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 30 Jul 2010 09:44:19 -0000


--OdQvBiqfLsaeimeB
Content-Type: text/plain; charset=koi8-r
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Thu, Jul 29, 2010 at 04:57:25PM -0700, mdf@freebsd.org wrote:
> On Thu, Jul 29, 2010 at 4:39 PM,  <mdf@freebsd.org> wrote:
> > We've seen a few instances at work where witness_warn() in ast()
> > indicates the sched lock is still held, but the place it claims it was
> > held by is in fact sometimes not possible to keep the lock, like:
> >
> >         thread_lock(td);
> >         td->td_flags &= ~TDF_SELECT;
> >         thread_unlock(td);
> >
> > What I was wondering is, even though the assembly I see in objdump -S
> > for witness_warn has the increment of td_pinned before the PCPU_GET:
> >
> > ffffffff802db210:       65 48 8b 1c 25 00 00    mov    %gs:0x0,%rbx
> > ffffffff802db217:       00 00
> > ffffffff802db219:       ff 83 04 01 00 00       incl   0x104(%rbx)
> >          * Pin the thread in order to avoid problems with thread migration.
> >          * Once that all verifies are passed about spinlocks ownership,
> >          * the thread is in a safe path and it can be unpinned.
> >          */
> >         sched_pin();
> >         lock_list = PCPU_GET(spinlocks);
> > ffffffff802db21f:       65 48 8b 04 25 48 00    mov    %gs:0x48,%rax
> > ffffffff802db226:       00 00
> >         if (lock_list != NULL && lock_list->ll_count != 0) {
> > ffffffff802db228:       48 85 c0                test   %rax,%rax
> >          * Pin the thread in order to avoid problems with thread migration.
> >          * Once that all verifies are passed about spinlocks ownership,
> >          * the thread is in a safe path and it can be unpinned.
> >          */
> >         sched_pin();
> >         lock_list = PCPU_GET(spinlocks);
> > ffffffff802db22b:       48 89 85 f0 fe ff ff    mov    %rax,-0x110(%rbp)
> > ffffffff802db232:       48 89 85 f8 fe ff ff    mov    %rax,-0x108(%rbp)
> >         if (lock_list != NULL && lock_list->ll_count != 0) {
> > ffffffff802db239:       0f 84 ff 00 00 00       je     ffffffff802db33e <witness_warn+0x30e>
> > ffffffff802db23f:       44 8b 60 50             mov    0x50(%rax),%r12d
> >
> > is it possible for the hardware to do any re-ordering here?
> >
> > The reason I'm suspicious is not just that the code doesn't have a
> > lock leak at the indicated point, but in one instance I can see in the
> > dump that the lock_list local from witness_warn is from the pcpu
> > structure for CPU 0 (and I was warned about sched lock 0), but the
> > thread id in panic_cpu is 2.  So clearly the thread was being migrated
> > right around panic time.
> >
> > This is the amd64 kernel on stable/7.  I'm not sure exactly what kind
> > of hardware; it's a 4-way Intel chip from about 3 or 4 years ago IIRC.
> >
> > So... do we need some kind of barrier in the code for sched_pin() for
> > it to really do what it claims?  Could the hardware have re-ordered
> > the "mov    %gs:0x48,%rax" PCPU_GET to before the sched_pin()
> > increment?
>
> So after some research, the answer I'm getting is "maybe".  What I'm
> concerned about is whether the h/w reordered the read of PCPU_GET in
> front of the previous store to increment td_pinned.  While not an
> ultimate authority,
> http://en.wikipedia.org/wiki/Memory_ordering#In_SMP_microprocessor_systems
> implies that stores can be reordered after loads for both Intel and
> amd64 chips, which would I believe account for the behavior seen here.
>

Am I right that you are suggesting that, in the sequence
	mov	%gs:0x0,%rbx      [1]
	incl	0x104(%rbx)       [2]
	mov	%gs:0x48,%rax     [3]
an interrupt and preemption happen between points [2] and [3]?
And that the %rax value, after the thread was put back onto the (different)
new CPU and executed [3], was still from the old CPU's pcpu area?

I do not believe this is possible. A CPU is always self-consistent. A
context switch away from the thread can only occur on return from an
interrupt handler, in critical_exit(), or similar. That code executes on
the same processor, and thus should already see the effect of [2], which
would prevent the context switch.

If the interrupt happens between [1] and [2], then the context-saving code
should still see a consistent view of the register file state,
regardless of the processor issuing a speculative read of
*%gs:0x48. The return from the interrupt is a serialization point due to
iret, causing the read in [3] to be reissued.
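
To illustrate the self-consistency argument, here is a deliberately
simplified toy model in C. It is not how any real processor is built, and
every name in it (toy_mem, struct toy_cpu, toy_store, toy_load, toy_drain)
is made up for the illustration. It models only one thing: a store may sit
in a per-CPU store buffer where other CPUs cannot see it yet, but a load
issued by the same CPU is satisfied from that buffer, so a CPU never misses
its own earlier store. In terms of the sequence above, the increment in [2]
is visible to any code that later runs on the same CPU, including the
code that decides whether to switch the thread out.

```c
#include <stdint.h>

#define MEM_WORDS 4	/* size of the toy shared memory */
#define SB_SLOTS  4	/* capacity of each toy store buffer */

/* Toy memory shared by all modelled CPUs. */
static uint32_t toy_mem[MEM_WORDS];

/* One modelled CPU with a FIFO store buffer (hypothetical model). */
struct toy_cpu {
	int      n;			/* number of pending stores */
	int      addr[SB_SLOTS];	/* word index of each pending store */
	uint32_t val[SB_SLOTS];		/* value of each pending store */
};

/* Drain pending stores to shared memory in program order. */
static void
toy_drain(struct toy_cpu *c)
{
	int i;

	for (i = 0; i < c->n; i++)
		toy_mem[c->addr[i]] = c->val[i];
	c->n = 0;
}

/* Queue a store; other CPUs cannot see it until it drains. */
static void
toy_store(struct toy_cpu *c, int addr, uint32_t v)
{
	if (c->n == SB_SLOTS)
		toy_drain(c);
	c->addr[c->n] = addr;
	c->val[c->n] = v;
	c->n++;
}

/* Load: the newest matching pending store of this CPU wins. */
static uint32_t
toy_load(const struct toy_cpu *c, int addr)
{
	int i;

	for (i = c->n - 1; i >= 0; i--) {
		if (c->addr[i] == addr)
			return (c->val[i]);
	}
	return (toy_mem[addr]);
}
```

In this model, after CPU A does toy_store(&a, 0, 1), toy_load(&a, 0) yields
1 right away while toy_load(&b, 0) on another CPU still yields 0 until
toy_drain(&a) runs. That is the kind of store/load reordering the Wikipedia
table describes: it is observable from other processors, not from the one
that issued the store.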


--OdQvBiqfLsaeimeB
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (FreeBSD)

iEYEARECAAYFAkxSnuwACgkQC3+MBN1Mb4hAcwCgpwr8EgJm76cM3HJSlDyM9MaF
8UcAn2570On4CnWqPKpIDR70UoY+AVg9
=EFO7
-----END PGP SIGNATURE-----

--OdQvBiqfLsaeimeB--

From owner-freebsd-hackers@FreeBSD.ORG  Fri Jul 30 13:44:01 2010
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 723BE1065676
	for <freebsd-hackers@freebsd.org>; Fri, 30 Jul 2010 13:44:01 +0000 (UTC)
	(envelope-from mdf356@gmail.com)
Received: from mail-gy0-f182.google.com (mail-gy0-f182.google.com
	[209.85.160.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 293DB8FC13
	for <freebsd-hackers@freebsd.org>; Fri, 30 Jul 2010 13:44:00 +0000 (UTC)
Received: by gyg4 with SMTP id 4so774573gyg.13
	for <freebsd-hackers@freebsd.org>; Fri, 30 Jul 2010 06:44:00 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:received:sender:received
	:in-reply-to:references:date:x-google-sender-auth:message-id:subject
	:from:to:cc:content-type:content-transfer-encoding;
	bh=QAbQg1mb5WxeZymaeMAqZd+gWN0GbJv2/gXq1MqK65o=;
	b=Gy4VgkG1lW/041bGAE/bjFmi1QiopcKcGSDjO73OA1ZPRRr2IOzQ6c4WlddhcY+IRV
	X5Wzsmks5FceYKmzMHKsgi6wvhVXsBXxHAAjPLFVX/dEoF8WbDuwJ2mymvzB2tk/naJ5
	vjeWidWNZusYWXF1kgQ7xMy9xGobtTRU9E13I=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:sender:in-reply-to:references:date
	:x-google-sender-auth:message-id:subject:from:to:cc:content-type
	:content-transfer-encoding;
	b=HNtKjcFCzj0tB0m/6Q7I52/KA/vKT63Y41ZTZCYqLmPgi4JEyejYGZBAe+dQfPA77n
	cwh7k8OgefknLYkTpXtvj7ORWTxkiPbX9hqKKpSs9+qptylCbwwUHx1X5f9OE58iwaZC
	RDq4lbAyUDCPgz2aWjOnklA4YxfmFLKc0B87c=
MIME-Version: 1.0
Received: by 10.151.63.18 with SMTP id q18mr3310765ybk.100.1280497440267; Fri, 
	30 Jul 2010 06:44:00 -0700 (PDT)
Sender: mdf356@gmail.com
Received: by 10.42.6.85 with HTTP; Fri, 30 Jul 2010 06:44:00 -0700 (PDT)
In-Reply-To: <20100730094413.GJ22295@deviant.kiev.zoral.com.ua>
References: <AANLkTikY20TxyeyqO5zP3zC-azb748kV-MdevPfm+8cq@mail.gmail.com>
	<AANLkTimGjNATWmuGqTDMFQ0r3gHnsv0Bc69pBb6QYO9L@mail.gmail.com>
	<20100730094413.GJ22295@deviant.kiev.zoral.com.ua>
Date: Fri, 30 Jul 2010 06:44:00 -0700
X-Google-Sender-Auth: VCKwH3Z8JSv6YA9bSy2UlTXLAfE
Message-ID: <AANLkTi=PFxARt8Jw0fq09gWEzZgAeeQxRyrBHKYa2PXq@mail.gmail.com>
From: mdf@FreeBSD.org
To: Kostik Belousov <kostikbel@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Cc: freebsd-hackers@freebsd.org
Subject: Re: sched_pin() versus PCPU_GET
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 30 Jul 2010 13:44:01 -0000

2010/7/30 Kostik Belousov <kostikbel@gmail.com>:
> On Thu, Jul 29, 2010 at 04:57:25PM -0700, mdf@freebsd.org wrote:
>> On Thu, Jul 29, 2010 at 4:39 PM, <mdf@freebsd.org> wrote:
>> > We've seen a few instances at work where witness_warn() in ast()
>> > indicates the sched lock is still held, but the place it claims it was
>> > held by is in fact sometimes not possible to keep the lock, like:
>> >
>> >         thread_lock(td);
>> >         td->td_flags &= ~TDF_SELECT;
>> >         thread_unlock(td);
>> >
>> > What I was wondering is, even though the assembly I see in objdump -S
>> > for witness_warn has the increment of td_pinned before the PCPU_GET:
>> >
>> > ffffffff802db210:       65 48 8b 1c 25 00 00    mov    %gs:0x0,%rbx
>> > ffffffff802db217:       00 00
>> > ffffffff802db219:       ff 83 04 01 00 00       incl   0x104(%rbx)
>> >          * Pin the thread in order to avoid problems with thread migration.
>> >          * Once that all verifies are passed about spinlocks ownership,
>> >          * the thread is in a safe path and it can be unpinned.
>> >          */
>> >         sched_pin();
>> >         lock_list = PCPU_GET(spinlocks);
>> > ffffffff802db21f:       65 48 8b 04 25 48 00    mov    %gs:0x48,%rax
>> > ffffffff802db226:       00 00
>> >         if (lock_list != NULL && lock_list->ll_count != 0) {
>> > ffffffff802db228:       48 85 c0                test   %rax,%rax
>> >          * Pin the thread in order to avoid problems with thread migration.
>> >          * Once that all verifies are passed about spinlocks ownership,
>> >          * the thread is in a safe path and it can be unpinned.
>> >          */
>> >         sched_pin();
>> >         lock_list = PCPU_GET(spinlocks);
>> > ffffffff802db22b:       48 89 85 f0 fe ff ff    mov    %rax,-0x110(%rbp)
>> > ffffffff802db232:       48 89 85 f8 fe ff ff    mov    %rax,-0x108(%rbp)
>> >         if (lock_list != NULL && lock_list->ll_count != 0) {
>> > ffffffff802db239:       0f 84 ff 00 00 00       je     ffffffff802db33e
>> > <witness_warn+0x30e>
>> > ffffffff802db23f:       44 8b 60 50             mov    0x50(%rax),%r12d
>> >
>> > is it possible for the hardware to do any re-ordering here?
>> >
>> > The reason I'm suspicious is not just that the code doesn't have a
>> > lock leak at the indicated point, but in one instance I can see in the
>> > dump that the lock_list local from witness_warn is from the pcpu
>> > structure for CPU 0 (and I was warned about sched lock 0), but the
>> > thread id in panic_cpu is 2.  So clearly the thread was being migrated
>> > right around panic time.
>> >
>> > This is the amd64 kernel on stable/7.  I'm not sure exactly what kind
>> > of hardware; it's a 4-way Intel chip from about 3 or 4 years ago IIRC.
>> >
>> > So... do we need some kind of barrier in the code for sched_pin() for
>> > it to really do what it claims?  Could the hardware have re-ordered
>> > the "mov    %gs:0x48,%rax" PCPU_GET to before the sched_pin()
>> > increment?
>>
>> So after some research, the answer I'm getting is "maybe".  What I'm
>> concerned about is whether the h/w reordered the read of PCPU_GET in
>> front of the previous store to increment td_pinned.  While not an
>> ultimate authority,
>> http://en.wikipedia.org/wiki/Memory_ordering#In_SMP_microprocessor_systems
>> implies that stores can be reordered after loads for both Intel and
>> amd64 chips, which would I believe account for the behavior seen here.
>
> Am I right that you suggest that in the sequence
>         mov     %gs:0x0,%rbx      [1]
>         incl    0x104(%rbx)       [2]
>         mov     %gs:0x48,%rax     [3]
> interrupt and preemption happen between points [2] and [3] ?
> And the %rax value after the thread was put back onto the (different) new
> CPU and executed [3] was still from the old CPU's pcpu area ?

Right, but I'm also asking if it's possible the hardware executed the
instructions as:

        mov     %gs:0x0,%rbx      [1]
        mov     %gs:0x48,%rax     [3]
        incl    0x104(%rbx)       [2]

On PowerPC this is definitely possible and I'd use an isync to prevent
the re-ordering.  I haven't been able to confirm that Intel/AMD
present such a strict ordering that no barrier is needed.
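
For illustration only, the ordering in question can be modeled in userland
with C11 atomics.  This is a sketch under stated assumptions, not FreeBSD
kernel code: struct thread_model, model_pcpu_spinlocks and
model_pin_then_read() are hypothetical names, and the sequentially
consistent fence merely marks where a store-to-load barrier would sit
between the td_pinned increment and the per-CPU read, if one were needed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the td_pinned field of struct thread. */
struct thread_model {
	atomic_int td_pinned;
};

/* Hypothetical stand-in for the per-CPU spinlock list pointer. */
static _Atomic(void *) model_pcpu_spinlocks;

/*
 * Increment the pin count, then issue a full fence before loading the
 * per-CPU pointer, so the load is not ordered ahead of the store.
 */
static void *
model_pin_then_read(struct thread_model *td)
{
	assert(td != NULL);
	atomic_fetch_add_explicit(&td->td_pinned, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return (atomic_load_explicit(&model_pcpu_spinlocks,
	    memory_order_relaxed));
}
```

Whether the kernel needs such a fence here is exactly the open question;
the sketch only shows the shape of the instruction sequence with a barrier.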

It's admittedly a very tight window, and we've only seen it twice, but
I have no other way to explain the symptom.  Unfortunately in the dump
gdb shows both %rax and %gs as 0, so I can't confirm that they had a
value I'd expect from another CPU.  The only thing I do have is
panic_cpu being different than the CPU at the time of
PCPU_GET(spinlocks), but of course there's definitely a window there.

> I do not believe this is possible. CPU is always self-consistent. Context
> switch from the thread can only occur on the return from interrupt
> handler, in critical_exit() or such. This code is executing on the
> same processor, and thus should already see the effect of [2], that
> would prevent context switch.

Right, but if the hardware allowed reads to pass writes, then %rax
would have an incorrect value which would be saved at interrupt time,
and restored on another processor.

I can add a few sanity asserts to try to prove this one way or another
and hope they don't mess with the timing; this has only shown up when
testing with a hugely multi-threaded CIFS server.

The only reason I'm hammering at OOO execution being the explanation
is that it seems like the only way to explain the symptoms... unless I
prefer to believe that PCPU_GET is completely busted, which seems less
likely.

Thanks,
matthew

From owner-freebsd-hackers@FreeBSD.ORG  Fri Jul 30 14:10:04 2010
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 974D71065672;
	Fri, 30 Jul 2010 14:10:04 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42])
	by mx1.freebsd.org (Postfix) with ESMTP id 4F9E48FC0C;
	Fri, 30 Jul 2010 14:10:04 +0000 (UTC)
Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net
	[66.111.2.69])
	by cyrus.watson.org (Postfix) with ESMTPSA id BCBEA46B2C;
	Fri, 30 Jul 2010 10:10:03 -0400 (EDT)
Received: from jhbbsd.localnet (smtp.hudson-trading.com [209.249.190.9])
	by bigwig.baldwin.cx (Postfix) with ESMTPSA id BC5438A03C;
	Fri, 30 Jul 2010 10:10:02 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: freebsd-hackers@freebsd.org
Date: Fri, 30 Jul 2010 10:08:22 -0400
User-Agent: KMail/1.13.5 (FreeBSD/7.3-CBSD-20100217; KDE/4.4.5; amd64; ; )
References: <AANLkTikY20TxyeyqO5zP3zC-azb748kV-MdevPfm+8cq@mail.gmail.com>
In-Reply-To: <AANLkTikY20TxyeyqO5zP3zC-azb748kV-MdevPfm+8cq@mail.gmail.com>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Message-Id: <201007301008.22501.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1
	(bigwig.baldwin.cx); Fri, 30 Jul 2010 10:10:02 -0400 (EDT)
X-Virus-Scanned: clamav-milter 0.95.1 at bigwig.baldwin.cx
X-Virus-Status: Clean
X-Spam-Status: No, score=-2.6 required=4.2 tests=AWL,BAYES_00 autolearn=ham
	version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on bigwig.baldwin.cx
Cc: mdf@freebsd.org
Subject: Re: sched_pin() versus PCPU_GET
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 30 Jul 2010 14:10:04 -0000

On Thursday, July 29, 2010 7:39:02 pm mdf@freebsd.org wrote:
> We've seen a few instances at work where witness_warn() in ast()
> indicates the sched lock is still held, but the place it claims it was
> held by is in fact sometimes not possible to keep the lock, like:
>
> 	thread_lock(td);
> 	td->td_flags &= ~TDF_SELECT;
> 	thread_unlock(td);
>
> What I was wondering is, even though the assembly I see in objdump -S
> for witness_warn has the increment of td_pinned before the PCPU_GET:
>
> ffffffff802db210:	65 48 8b 1c 25 00 00 	mov    %gs:0x0,%rbx
> ffffffff802db217:	00 00
> ffffffff802db219:	ff 83 04 01 00 00    	incl   0x104(%rbx)
> 	 * Pin the thread in order to avoid problems with thread migration.
> 	 * Once that all verifies are passed about spinlocks ownership,
> 	 * the thread is in a safe path and it can be unpinned.
> 	 */
> 	sched_pin();
> 	lock_list = PCPU_GET(spinlocks);
> ffffffff802db21f:	65 48 8b 04 25 48 00 	mov    %gs:0x48,%rax
> ffffffff802db226:	00 00
> 	if (lock_list != NULL && lock_list->ll_count != 0) {
> ffffffff802db228:	48 85 c0             	test   %rax,%rax
> 	 * Pin the thread in order to avoid problems with thread migration.
> 	 * Once that all verifies are passed about spinlocks ownership,
> 	 * the thread is in a safe path and it can be unpinned.
> 	 */
> 	sched_pin();
> 	lock_list = PCPU_GET(spinlocks);
> ffffffff802db22b:	48 89 85 f0 fe ff ff 	mov    %rax,-0x110(%rbp)
> ffffffff802db232:	48 89 85 f8 fe ff ff 	mov    %rax,-0x108(%rbp)
> 	if (lock_list != NULL && lock_list->ll_count != 0) {
> ffffffff802db239:	0f 84 ff 00 00 00    	je     ffffffff802db33e
> <witness_warn+0x30e>
> ffffffff802db23f:	44 8b 60 50          	mov    0x50(%rax),%r12d
>
> is it possible for the hardware to do any re-ordering here?
>
> The reason I'm suspicious is not just that the code doesn't have a
> lock leak at the indicated point, but in one instance I can see in the
> dump that the lock_list local from witness_warn is from the pcpu
> structure for CPU 0 (and I was warned about sched lock 0), but the
> thread id in panic_cpu is 2.  So clearly the thread was being migrated
> right around panic time.
>
> This is the amd64 kernel on stable/7.  I'm not sure exactly what kind
> of hardware; it's a 4-way Intel chip from about 3 or 4 years ago IIRC.
>
> So... do we need some kind of barrier in the code for sched_pin() for
> it to really do what it claims?  Could the hardware have re-ordered
> the "mov    %gs:0x48,%rax" PCPU_GET to before the sched_pin()
> increment?

Hmmm, I think it might be able to because they refer to different locations.

Note this rule in section 8.2.2 of Volume 3A of the Intel Software
Developer's Manual:

  • Reads may be reordered with older writes to different locations but not
    with older writes to the same location.

It is certainly true that sparc64 could reorder with RMO.  I believe ia64
could reorder as well.  Since sched_pin/unpin are frequently used to provide
this sort of synchronization, we could use memory barriers in pin/unpin
like so:

sched_pin()
{
	td->td_pinned = atomic_load_acq_int(&td->td_pinned) + 1;
}

sched_unpin()
{
	atomic_store_rel_int(&td->td_pinned, td->td_pinned - 1);
}

We could also just use atomic_add_acq_int() and atomic_subtract_rel_int(), but
they are slightly more heavyweight, though it would be more clear what is
happening I think.
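
As a rough userland illustration of the second option, the pin/unpin pair
can be modeled with C11 acquire and release read-modify-write operations.
This is only a sketch: struct thread_model, model_sched_pin() and
model_sched_unpin() are hypothetical names, the C11 calls stand in for the
atomic(9) primitives, and nothing here is the actual kernel implementation.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userland stand-in for the td_pinned field of struct thread. */
struct thread_model {
	atomic_int td_pinned;
};

/* Pin: acquire RMW, so later accesses are not moved ahead of it. */
static void
model_sched_pin(struct thread_model *td)
{
	atomic_fetch_add_explicit(&td->td_pinned, 1, memory_order_acquire);
}

/* Unpin: release RMW, so earlier accesses are not moved after it. */
static void
model_sched_unpin(struct thread_model *td)
{
	int old;

	old = atomic_fetch_sub_explicit(&td->td_pinned, 1,
	    memory_order_release);
	/* Unpinning a thread that was never pinned is a caller bug. */
	assert(old > 0);
	(void)old;
}
```

A caller would bracket its per-CPU accesses with model_sched_pin() and
model_sched_unpin(), in the same way kernel code brackets them with
sched_pin() and sched_unpin().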

-- 
John Baldwin

From owner-freebsd-hackers@FreeBSD.ORG  Fri Jul 30 14:33:52 2010
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 90A301065740;
	Fri, 30 Jul 2010 14:33:52 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42])
	by mx1.freebsd.org (Postfix) with ESMTP id 620228FC18;
	Fri, 30 Jul 2010 14:33:52 +0000 (UTC)
Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net
	[66.111.2.69])
	by cyrus.watson.org (Postfix) with ESMTPSA id ED26346B38;
	Fri, 30 Jul 2010 10:33:51 -0400 (EDT)
Received: from jhbbsd.localnet (smtp.hudson-trading.com [209.249.190.9])
	by bigwig.baldwin.cx (Postfix) with ESMTPSA id 1981B8A03C;
	Fri, 30 Jul 2010 10:33:51 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: freebsd-hackers@freebsd.org
Date: Fri, 30 Jul 2010 10:31:34 -0400
User-Agent: KMail/1.13.5 (FreeBSD/7.3-CBSD-20100217; KDE/4.4.5; amd64; ; )
References: <AANLkTikY20TxyeyqO5zP3zC-azb748kV-MdevPfm+8cq@mail.gmail.com>
	<201007301008.22501.jhb@freebsd.org>
In-Reply-To: <201007301008.22501.jhb@freebsd.org>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Message-Id: <201007301031.34266.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1
	(bigwig.baldwin.cx); Fri, 30 Jul 2010 10:33:51 -0400 (EDT)
X-Virus-Scanned: clamav-milter 0.95.1 at bigwig.baldwin.cx
X-Virus-Status: Clean
X-Spam-Status: No, score=-2.6 required=4.2 tests=AWL,BAYES_00 autolearn=ham
	version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on bigwig.baldwin.cx
Cc: mdf@freebsd.org
Subject: Re: sched_pin() versus PCPU_GET
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 30 Jul 2010 14:33:52 -0000

On Friday, July 30, 2010 10:08:22 am John Baldwin wrote:
> On Thursday, July 29, 2010 7:39:02 pm mdf@freebsd.org wrote:
> > We've seen a few instances at work where witness_warn() in ast()
> > indicates the sched lock is still held, but the place it claims it was
> > held by is in fact sometimes not possible to keep the lock, like:
> >
> > 	thread_lock(td);
> > 	td->td_flags &= ~TDF_SELECT;
> > 	thread_unlock(td);
> >
> > What I was wondering is, even though the assembly I see in objdump -S
> > for witness_warn has the increment of td_pinned before the PCPU_GET:
> >
> > ffffffff802db210:	65 48 8b 1c 25 00 00 	mov    %gs:0x0,%rbx
> > ffffffff802db217:	00 00
> > ffffffff802db219:	ff 83 04 01 00 00    	incl   0x104(%rbx)
> > 	 * Pin the thread in order to avoid problems with thread migration.
> > 	 * Once that all verifies are passed about spinlocks ownership,
> > 	 * the thread is in a safe path and it can be unpinned.
> > 	 */
> > 	sched_pin();
> > 	lock_list = PCPU_GET(spinlocks);
> > ffffffff802db21f:	65 48 8b 04 25 48 00 	mov    %gs:0x48,%rax
> > ffffffff802db226:	00 00
> > 	if (lock_list != NULL && lock_list->ll_count != 0) {
> > ffffffff802db228:	48 85 c0             	test   %rax,%rax
> > 	 * Pin the thread in order to avoid problems with thread migration.
> > 	 * Once that all verifies are passed about spinlocks ownership,
> > 	 * the thread is in a safe path and it can be unpinned.
> > 	 */
> > 	sched_pin();
> > 	lock_list = PCPU_GET(spinlocks);
> > ffffffff802db22b:	48 89 85 f0 fe ff ff 	mov    %rax,-0x110(%rbp)
> > ffffffff802db232:	48 89 85 f8 fe ff ff 	mov    %rax,-0x108(%rbp)
> > 	if (lock_list != NULL && lock_list->ll_count != 0) {
> > ffffffff802db239:	0f 84 ff 00 00 00    	je     ffffffff802db33e
> > <witness_warn+0x30e>
> > ffffffff802db23f:	44 8b 60 50          	mov    0x50(%rax),%r12d
> >
> > is it possible for the hardware to do any re-ordering here?
> >
> > The reason I'm suspicious is not just that the code doesn't have a
> > lock leak at the indicated point, but in one instance I can see in the
> > dump that the lock_list local from witness_warn is from the pcpu
> > structure for CPU 0 (and I was warned about sched lock 0), but the
> > thread id in panic_cpu is 2.  So clearly the thread was being migrated
> > right around panic time.
> >
> > This is the amd64 kernel on stable/7.  I'm not sure exactly what kind
> > of hardware; it's a 4-way Intel chip from about 3 or 4 years ago IIRC.
> >
> > So... do we need some kind of barrier in the code for sched_pin() for
> > it to really do what it claims?  Could the hardware have re-ordered
> > the "mov    %gs:0x48,%rax" PCPU_GET to before the sched_pin()
> > increment?
>
> Hmmm, I think it might be able to because they refer to different locations.
>
> Note this rule in section 8.2.2 of Volume 3A of the Intel Software
> Developer's Manual:
>
>   • Reads may be reordered with older writes to different locations but not
>     with older writes to the same location.
>
> It is certainly true that sparc64 could reorder with RMO.  I believe ia64
> could reorder as well.  Since sched_pin/unpin are frequently used to provide
> this sort of synchronization, we could use memory barriers in pin/unpin
> like so:
>
> sched_pin()
> {
> 	td->td_pinned = atomic_load_acq_int(&td->td_pinned) + 1;
> }
>
> sched_unpin()
> {
> 	atomic_store_rel_int(&td->td_pinned, td->td_pinned - 1);
> }
>
> We could also just use atomic_add_acq_int() and atomic_subtract_rel_int(), but
> they are slightly more heavyweight, though it would be more clear what is
> happening I think.

However, to actually get a race you'd have to have an interrupt fire and
migrate you so that the speculative read was from the other CPU.  I don't
think the speculative read would be preserved in that case, though.  The CPU
has to return to a specific PC when it returns from the interrupt and it has
no way of storing the state for what speculative reordering it might be
doing, so presumably it is thrown away?  I suppose it is possible that it
actually retires both instructions (but reordered) and then returns to the
PC value after the read of spinlocks after the interrupt.  However, in that
case the scheduler would not migrate as it would see td_pinned != 0.  To get
the race you have to have the interrupt take effect prior to modifying
td_pinned, so I think the processor would have to discard the reordered read
of spinlocks so it could safely resume execution at the 'incl' instruction.

The other nit there on x86 at least is that the incl instruction is doing
both a read and a write and another rule in section 8.2.2 is this:

  • Reads are not reordered with other reads.

That would seem to prevent the read of spinlocks from passing the read of
td_pinned in the incl instruction on x86.

-- 
John Baldwin

From owner-freebsd-hackers@FreeBSD.ORG  Fri Jul 30 15:00:26 2010
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id EC8D6106564A
	for <freebsd-hackers@freebsd.org>; Fri, 30 Jul 2010 15:00:26 +0000 (UTC)
	(envelope-from rank1seeker@gmail.com)
Received: from mail-ww0-f50.google.com (mail-ww0-f50.google.com [74.125.82.50])
	by mx1.freebsd.org (Postfix) with ESMTP id 86CCD8FC22
	for <freebsd-hackers@freebsd.org>; Fri, 30 Jul 2010 15:00:26 +0000 (UTC)
Received: by wwc33 with SMTP id 33so1423720wwc.31
	for <freebsd-hackers@freebsd.org>; Fri, 30 Jul 2010 08:00:25 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:received:received:date:message-id
	:subject:from:to:content-type;
	bh=rkfvYqgwLmAiQ7cGdZZlemSdDrTdBemOqWhoRD2Pido=;
	b=ImMxtW8J+9+6IHLiurEkxMFr2KIwpaPTby4A+pe0NCECuSD1k+JZVu1uRdnMbT+wRp
	Ay58ULqevM9IhU9j2hEPsSkizic0h1uepunj3m0Ed0biJMjo5t625Ad3y53WjO/tsTHo
	w4q7lXiFi4S1J3+8Os6y3g/L1IQ5Z0gxz5lJM=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:date:message-id:subject:from:to:content-type;
	b=iGYneZboAw6gKnbncEk+WUq47hYJ3+o24KXZ3uVBl/F+hjodjDnHiyEER7EXmrgvLs
	LnsE2e5yVkJGN1hagfjeZJrAdlzLTaJoUfQF0e1uYE3KCh67PvW4pWlVQW8p+FEXVqp1
	rdD3AmWB4GMu7Tn73l+WcjMqenMoAem3j/c/w=
MIME-Version: 1.0
Received: by 10.216.90.3 with SMTP id d3mr1716348wef.99.1280500170993; Fri, 30 
	Jul 2010 07:29:30 -0700 (PDT)
Received: by 10.216.181.13 with HTTP; Fri, 30 Jul 2010 07:29:30 -0700 (PDT)
Date: Fri, 30 Jul 2010 16:29:30 +0200
Message-ID: <AANLkTikVaN2b2p68jSVYh0OUwBxzJ5wOpeGmfctiz2Dq@mail.gmail.com>
From: "Domagoj S." <rank1seeker@gmail.com>
To: freebsd-hackers@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1
Subject: ls, mount point aware
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 30 Jul 2010 15:00:27 -0000

As far as I can see, more and more base utilities are becoming aware of
mount points.  For example, in 8.1, chgrp(1), chown(8) and cp(1) now have
an -x flag.

And what about human users?

The ls command should, in its long listing of directories, show something
like: "Hey, this directory is also a mount point."

A one-letter flag?

From owner-freebsd-hackers@FreeBSD.ORG  Fri Jul 30 16:41:31 2010
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 1E56F1065675;
	Fri, 30 Jul 2010 16:41:31 +0000 (UTC)
	(envelope-from sfourman@gmail.com)
Received: from mail-yx0-f182.google.com (mail-yx0-f182.google.com
	[209.85.213.182])
	by mx1.freebsd.org (Postfix) with ESMTP id ACAA78FC23;
	Fri, 30 Jul 2010 16:41:30 +0000 (UTC)
Received: by yxe42 with SMTP id 42so863530yxe.13
	for <multiple recipients>; Fri, 30 Jul 2010 09:41:30 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:received:received:in-reply-to
	:references:date:message-id:subject:from:to:cc:content-type;
	bh=g5889z5QNz8+5jVBzBgrGBGu2dUAHQAXbw3EtWvYYOs=;
	b=SuIYxp/hGWcIkXj/y+PgaBM5nQHLeKky6lBbxmRlPd/OH+Xv+vk4bPvWKenAJgVsXK
	5BNOfLeN/kMaxJbfQjxkIRTkX6rfL2oNIJ/kRRWUJX9Wce7uIMeKz4qvZ14Dx0UM8cIb
	TMuA6H/RDfiAHtJNIEh+zYpH1VoU1WUjL+Bo8=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type;
	b=Z98yd0BNkpgzKimBDixv1Va2Y7O6wPXU4rWHXr+yUcC17SntWkWw+XJ332/VFvIdRL
	Dp+sy52qWS/n2EPonpZ+FhH28eDCQOZR/DG96skDNl1OdHS22TliigUmI++k5rVfIro4
	nII0yn6ImVpLA8QMAlLpGus7grORLVwMOC5OM=
MIME-Version: 1.0
Received: by 10.150.11.12 with SMTP id 12mr3566707ybk.309.1280508089855; Fri, 
	30 Jul 2010 09:41:29 -0700 (PDT)
Received: by 10.231.28.130 with HTTP; Fri, 30 Jul 2010 09:41:29 -0700 (PDT)
In-Reply-To: <AANLkTinUVKByfTX+f9DOQ97jh43VPVSug_=BDpJ9PB0z@mail.gmail.com>
References: <AANLkTinUVKByfTX+f9DOQ97jh43VPVSug_=BDpJ9PB0z@mail.gmail.com>
Date: Fri, 30 Jul 2010 11:41:29 -0500
Message-ID: <AANLkTi=POmkbAv8du+ekE-aZhqGQfyifyYaOJ_NPXvqj@mail.gmail.com>
From: "Sam Fourman Jr." <sfourman@gmail.com>
To: krad <kraduk@googlemail.com>
Content-Type: text/plain; charset=ISO-8859-1
Cc: freebsd-hackers@freebsd.org,
	FreeBSD Questions <freebsd-questions@freebsd.org>
Subject: Re: possible NFS lockups
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 30 Jul 2010 16:41:31 -0000

On Tue, Jul 27, 2010 at 10:29 AM, krad <kraduk@googlemail.com> wrote:
> I have a production mail system with an nfs backend. Every now and again we
> see the nfs die on a particular head end. However it doesn't die across all
> the nodes. This suggests to me there isn't an issue with the filer itself and
> the stats from the filer concur with that.
>
> The symptoms are lines like this appearing in dmesg
>
> nfs server 10.44.17.138:/vol/vol1/mail: not responding
> nfs server 10.44.17.138:/vol/vol1/mail: is alive again
>
> trussing df it seems to hang on getfsstat, this is presumably when it tries
> the nfs mounts
>

I also have this problem, where NFS locks up with a FreeBSD 9 server
and a FreeBSD RELENG_8 client.


-- 

Sam Fourman Jr.
Fourman Networks
http://www.fourmannetworks.com

From owner-freebsd-hackers@FreeBSD.ORG  Fri Jul 30 17:41:40 2010
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 2795F106568A
	for <hackers@freebsd.org>; Fri, 30 Jul 2010 17:41:40 +0000 (UTC)
	(envelope-from fabiokaminski@gmail.com)
Received: from mail-gw0-f54.google.com (mail-gw0-f54.google.com [74.125.83.54])
	by mx1.freebsd.org (Postfix) with ESMTP id CEE698FC1F
	for <hackers@freebsd.org>; Fri, 30 Jul 2010 17:41:39 +0000 (UTC)
Received: by gwj23 with SMTP id 23so900694gwj.13
	for <hackers@freebsd.org>; Fri, 30 Jul 2010 10:41:39 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:received:received:date:message-id
	:subject:from:to:content-type;
	bh=w6F5fb9I2Yyazc5w7EH0ymhlc3UzzSd3Z9W3RmIViQQ=;
	b=fVStMXhgG/iQ/pZY52A4wQeTvP74q9XTsVCsJ+j++232EGEFmg5sK33xkHlcSwkkR0
	IjYoDRo22UJk26hF0n6i60I8xqmd6KTzos3s1tYXa10D04mdQk+jVr0H0wPyGENLSqQB
	EuLhe0nlx4eNOi18MMmCXNX4vhzuIAyZqnHS0=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:date:message-id:subject:from:to:content-type;
	b=BTuvpAUkGdjvCZXC3yj83zcavudb6csw9OZ1tcxM71GkcISgPv4q1L9ljQR+4NSSSs
	zEw4j3QkwWZH4nlfzlzvBIsl6TZZrcDrpwz82x773xlySvv6erWOKV71dN/3A4w7pXT7
	5q2IU5jfLgFsfPdekK2RS6vA3fXM6HXZMj2+g=
MIME-Version: 1.0
Received: by 10.90.119.18 with SMTP id r18mr2251414agc.92.1280510167727; Fri, 
	30 Jul 2010 10:16:07 -0700 (PDT)
Received: by 10.231.207.15 with HTTP; Fri, 30 Jul 2010 10:16:07 -0700 (PDT)
Date: Fri, 30 Jul 2010 14:16:07 -0300
Message-ID: <AANLkTiktaZSdfG5ZF0O=DMr0efmjeA1wnpVFRJARBnOm@mail.gmail.com>
From: Fabio Kaminski <fabiokaminski@gmail.com>
To: hackers@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: 
Subject: freebsd exokernel
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 30 Jul 2010 17:41:40 -0000

Hi folks,

I know this is kind of off topic, but I think this is the perfect list for
it.

Does anyone here think a little bit like me, and like the exokernel idea?

The primary idea is to keep only things like scheduling and drivers in the
kernel ring, and to move things like VFS and MM down to the userland rings
as libraries, so an application could optionally use those libs as generic
abstractions (like a bicycle with wheels).  And when you really need or
want to, you can go down to the bare metal and create your own application
abstraction.

Imagine what it could mean for performance, since the layers get
optimized and are not on top of other layers, and there is no context
switch between the user and kernel rings.

Think of an application virtual machine like Java (with its own scheduling,
MM and FS layers), or a database (FS and memory layers), or virtualization
software.

If we write a database, for instance, and want to outperform the disks, the
current situation is this: either you invade the kernel of the OS and
implement your abstraction there (you have to know all of its source) while
keeping part of your code in userland :s , or you don't mess with the kernel
at all (it's too impractical) and stay in userland.  Everybody has to be
ruled by one homogeneous way to "see" things.  Your application may be
lucky and this kernel abstraction is good for you, but it may not be, and
even if you can see the gold, you can't advance any further.

The MIT guys created one based on a 1998 (I think) OpenBSD, and they created
a web server in which the (now optional) TCP protocol was persisted on disk,
so it is protocol agnostic and can change its communication wall at runtime.

Sometimes I look at where everything is going in technology, and we are
kind of stepping back: putting more layers on top of other layers and
slowing everything down, instead of making it as fast as it can be.

I would like to share experiences and hear what you think about this.

Would it be a feasible project to borrow things from FreeBSD and start a
project like this?  Does anyone like this idea?

Anyway, just some thoughts for now.

From owner-freebsd-hackers@FreeBSD.ORG  Sat Jul 31 12:07:15 2010
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id C7091106564A
	for <hackers@freebsd.org>; Sat, 31 Jul 2010 12:07:15 +0000 (UTC)
	(envelope-from jhs@berklix.com)
Received: from tower.berklix.org (tower.berklix.org [83.236.223.114])
	by mx1.freebsd.org (Postfix) with ESMTP id 535E58FC13
	for <hackers@freebsd.org>; Sat, 31 Jul 2010 12:07:14 +0000 (UTC)
Received: from park.js.berklix.net (p549A7B73.dip.t-dialin.net
	[84.154.123.115]) (authenticated bits=0)
	by tower.berklix.org (8.14.2/8.14.2) with ESMTP id o6VC7Buw046647;
	Sat, 31 Jul 2010 12:07:13 GMT (envelope-from jhs@berklix.com)
Received: from fire.js.berklix.net (fire.js.berklix.net [192.168.91.41])
	by park.js.berklix.net (8.13.8/8.13.8) with ESMTP id o6VC71sW065852;
	Sat, 31 Jul 2010 14:07:01 +0200 (CEST)
	(envelope-from jhs@berklix.com)
Received: from fire.js.berklix.net (localhost [127.0.0.1])
	by fire.js.berklix.net (8.14.3/8.14.3) with ESMTP id o6VC6rdn023424;
	Sat, 31 Jul 2010 14:06:58 +0200 (CEST)
	(envelope-from jhs@fire.js.berklix.net)
Message-Id: <201007311206.o6VC6rdn023424@fire.js.berklix.net>
To: Fabio Kaminski <fabiokaminski@gmail.com>
From: "Julian H. Stacey" <jhs@berklix.com>
Organization: http://www.berklix.com BSD Unix Linux Consultancy, Munich Germany
User-agent: EXMH on FreeBSD http://www.berklix.com/free/
X-URL: http://www.berklix.com
In-reply-to: Your message "Fri, 30 Jul 2010 14:16:07 -0300."
	<AANLkTiktaZSdfG5ZF0O=DMr0efmjeA1wnpVFRJARBnOm@mail.gmail.com> 
Date: Sat, 31 Jul 2010 14:06:53 +0200
Sender: jhs@berklix.com
Cc: hackers@freebsd.org
Subject: Re: freebsd exokernel 
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 31 Jul 2010 12:07:15 -0000

> would it be a feasible project to borrow things from freebsd, and start a
> project like this? anyone like this idea ??

The code is free to use :-)

> anyway, just some thoughts for now..

See also eg Mach.
	http://en.wikipedia.org/wiki/Mach
	http://en.wikipedia.org/wiki/Mach_%28kernel%29

Cheers,
Julian
-- 
Julian Stacey: BSD Unix Linux C Sys Eng Consultants Munich http://berklix.com
	Mail plain text.  Not HTML, Not quoted-printable, Not Base64.

From owner-freebsd-hackers@FreeBSD.ORG  Sat Jul 31 12:48:40 2010
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A8CB61065673
	for <hackers@freebsd.org>; Sat, 31 Jul 2010 12:48:40 +0000 (UTC)
	(envelope-from dr.clau@gmail.com)
Received: from mail-fx0-f54.google.com (mail-fx0-f54.google.com
	[209.85.161.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 343998FC1A
	for <hackers@freebsd.org>; Sat, 31 Jul 2010 12:48:39 +0000 (UTC)
Received: by fxm13 with SMTP id 13so1342200fxm.13
	for <hackers@freebsd.org>; Sat, 31 Jul 2010 05:48:39 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:received:received:message-id:date:from
	:user-agent:mime-version:to:cc:subject:references:in-reply-to
	:content-type:content-transfer-encoding;
	bh=AicojR9GZY5luGwPmOikwm7xttxPz36iIRAJ9Qxz8UM=;
	b=UrOaw25OKpL966o+xq3iQfQlST4T8r0HKJ9AncGGAtlJUfV+kwW63avo+0X46t5K2Z
	+DN3crFnEOFeuEpycdkbSSBILUV/6bSNyD/dNj6+ZYMzSA+vjfthNUJsUYagOBtH0+EC
	j5sPGgRO2/R4BpplgTGfhDQcWtsGO5Xu6ewBk=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=message-id:date:from:user-agent:mime-version:to:cc:subject
	:references:in-reply-to:content-type:content-transfer-encoding;
	b=aRpZ0E/lpeml7Pae4Xj2Q4D1D2aoLMQ2zsrxKCyPT7pgiekJ/Zk5+SD0VfGDl2htPf
	QArY7OWSRdmRz3WeUc57v/ehUY3L4qecDhvymq6wjpubAtX7mtAh2hwnanibjEH4/bZj
	D5cPcvxacx/9hQeNGJXVQMZb+dbQmgBL4RuKs=
Received: by 10.223.112.10 with SMTP id u10mr3370312fap.50.1280578889058;
	Sat, 31 Jul 2010 05:21:29 -0700 (PDT)
Received: from [127.0.0.103] ([89.47.225.20])
	by mx.google.com with ESMTPS id r27sm1197488faa.0.2010.07.31.05.21.26
	(version=TLSv1/SSLv3 cipher=RC4-MD5);
	Sat, 31 Jul 2010 05:21:27 -0700 (PDT)
Message-ID: <4C54154A.9040306@gmail.com>
Date: Sat, 31 Jul 2010 15:21:30 +0300
From: CDP <dr.clau@gmail.com>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US;
	rv:1.9.1.11) Gecko/20100722 Thunderbird/3.0.6
MIME-Version: 1.0
To: "Julian H. Stacey" <jhs@berklix.com>
References: <201007311206.o6VC6rdn023424@fire.js.berklix.net>
In-Reply-To: <201007311206.o6VC6rdn023424@fire.js.berklix.net>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Fabio Kaminski <fabiokaminski@gmail.com>, hackers@freebsd.org
Subject: Re: freebsd exokernel
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 31 Jul 2010 12:48:40 -0000

On 07/31/10 15:06, Julian H. Stacey wrote:
>> would it be a feasible project to borrow things from freebsd, and start a
>> project like this? anyone like this idea ??
>
> The code is free to use :-)
>
>> anyway, just some thoughts for now..
>
> See also eg Mach.
> 	http://en.wikipedia.org/wiki/Mach
> 	http://en.wikipedia.org/wiki/Mach_%28kernel%29

Add this to the list (have a look at the external links too):
http://en.wikipedia.org/wiki/L4_microkernel_family

You might also want to look at this:
http://os.inf.tu-dresden.de/L4/LinuxOnL4/overview.shtml

Regards,
	Claudiu.

From owner-freebsd-hackers@FreeBSD.ORG  Sat Jul 31 18:32:44 2010
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id DA54D1065674
	for <hackers@freebsd.org>; Sat, 31 Jul 2010 18:32:44 +0000 (UTC)
	(envelope-from fabiokaminski@gmail.com)
Received: from mail-iw0-f182.google.com (mail-iw0-f182.google.com
	[209.85.214.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 996258FC14
	for <hackers@freebsd.org>; Sat, 31 Jul 2010 18:32:44 +0000 (UTC)
Received: by iwn35 with SMTP id 35so3278092iwn.13
	for <hackers@freebsd.org>; Sat, 31 Jul 2010 11:32:44 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:received:received:in-reply-to
	:references:date:message-id:subject:from:to:cc:content-type;
	bh=MinJBvy6g6YN3JY6JI1tQ2Ianso9weBxOHBKZwAmiYs=;
	b=JuiVkj1LiMOnmaKmcflweJRgvkdwGsh5/CgL7Jdvwc5eAcG64gSzYMiHDfhUoiKmSL
	TXiZuGyDykJJPIacHY70wwQ/ggZTR83EC8I1UpvL1nZrun7G2pwPtPGMBBZmNtu9X2Bw
	kg2neLfzEHQovMOPnAIUTRRONyCuJ6j/uEJJk=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type;
	b=ixoyFaPNLR0JQOPNy4MvjmCqSLHsHuZzeG+w5/wcMC4TbZdg2E/wx9wpUVuQkmKqBy
	RckF99hZI/9O12PBa/iBKXDfypnM1YDY4K5PuW9v4dmenUKoTYGRA3nf7ZxMwU470Qjl
	OQUbNLv9F1Ye8wl2fUJoK7uaaATxBQSOrGkAM=
MIME-Version: 1.0
Received: by 10.231.193.135 with SMTP id du7mr3703714ibb.176.1280601163772; 
	Sat, 31 Jul 2010 11:32:43 -0700 (PDT)
Received: by 10.231.207.15 with HTTP; Sat, 31 Jul 2010 11:32:43 -0700 (PDT)
In-Reply-To: <4C54154A.9040306@gmail.com>
References: <201007311206.o6VC6rdn023424@fire.js.berklix.net>
	<4C54154A.9040306@gmail.com>
Date: Sat, 31 Jul 2010 15:32:43 -0300
Message-ID: <AANLkTikTg5zU5qXHU+Gw0kQoi8vqPwGaAE9v=2CB+Ldk@mail.gmail.com>
From: Fabio Kaminski <fabiokaminski@gmail.com>
To: CDP <dr.clau@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: "Julian H. Stacey" <jhs@berklix.com>, hackers@freebsd.org
Subject: Re: freebsd exokernel
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 31 Jul 2010 18:32:44 -0000

yes, i have had a sniff at Mach.. but i don't like the message passing idea..
it's from the microkernel species,
and there's even a nouveau reincarnation called Barrelfish
http://www.barrelfish.org .. which is a sort of microkernel, but it runs one
kernel nucleus on each core, and the cores pass messages to each other.. (this
is very promising for virtualization.. but monolithic will still be the fastest)

it's more like this L4 kernel.. good link indeed.... but with security
included.. in fact the original MIT exokernel is more like a resource
policy system... http://en.wikipedia.org/wiki/Exokernel

and i think they solve the problem that L4 has, that you are left alone
and the applications are obliged to implement the tough parts by
themselves.. by putting the abstractions in userland as libraries.. so if
you want to use ZFS, the BSD VMM, Btrfs, or create your own abstraction or mix
some, you just link with the proper .so file.. without needing to create a
half kernel/half app application..
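
A minimal sketch of the "abstractions as userland libraries" idea, in C.
Everything in it is hypothetical and only for illustration: the names
(raw_blocks, store_ops, plain_store, counted_store, store_roundtrip) are
not taken from the MIT exokernel, L4 or FreeBSD, and a static array stands
in for a raw resource that the kernel would merely protect and hand out.
Two "libraries" lay out the same raw blocks differently, and the application
is written once against a table of operations:

```c
/* Hypothetical sketch: these names are illustrative and are not taken
 * from the MIT exokernel, L4, or FreeBSD. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 64
#define NUM_BLOCKS 8

/* Stand-in for a raw resource that the kernel only protects and hands out. */
static unsigned char raw_blocks[NUM_BLOCKS][BLOCK_SIZE];

/* What an application links against: one table of operations per library. */
struct store_ops {
    const char *name;
    int (*put)(size_t slot, const char *text);
    int (*get)(size_t slot, char *out, size_t outlen);
};

/* Library A: NUL-terminated text stored as-is. */
static int plain_put(size_t slot, const char *text)
{
    size_t len = strlen(text);

    if (slot >= NUM_BLOCKS || len >= BLOCK_SIZE)
        return -1;
    memcpy(raw_blocks[slot], text, len + 1);
    return 0;
}

static int plain_get(size_t slot, char *out, size_t outlen)
{
    const unsigned char *end;
    size_t len;

    if (slot >= NUM_BLOCKS)
        return -1;
    /* Bounded search, so a block written by another library cannot overrun. */
    end = memchr(raw_blocks[slot], '\0', BLOCK_SIZE);
    if (end == NULL)
        return -1;
    len = (size_t)(end - raw_blocks[slot]);
    if (len >= outlen)
        return -1;
    memcpy(out, raw_blocks[slot], len + 1);
    return 0;
}

/* Library B: one length byte followed by the text, no terminator stored. */
static int counted_put(size_t slot, const char *text)
{
    size_t len = strlen(text);

    if (slot >= NUM_BLOCKS || len > BLOCK_SIZE - 1)
        return -1;
    raw_blocks[slot][0] = (unsigned char)len;
    memcpy(&raw_blocks[slot][1], text, len);
    return 0;
}

static int counted_get(size_t slot, char *out, size_t outlen)
{
    size_t len;

    if (slot >= NUM_BLOCKS)
        return -1;
    len = raw_blocks[slot][0];
    if (len > BLOCK_SIZE - 1 || len >= outlen)
        return -1;
    memcpy(out, &raw_blocks[slot][1], len);
    out[len] = '\0';
    return 0;
}

const struct store_ops plain_store = { "plain", plain_put, plain_get };
const struct store_ops counted_store = { "counted", counted_put, counted_get };

/* Application code is written once against the table, not a fixed kernel API. */
int store_roundtrip(const struct store_ops *ops, size_t slot,
                    const char *text, char *out, size_t outlen)
{
    assert(ops != NULL && text != NULL && out != NULL);
    if (ops->put(slot, text) != 0)
        return -1;
    return ops->get(slot, out, outlen);
}
```

Calling store_roundtrip(&plain_store, ...) or store_roundtrip(&counted_store,
...) picks the abstraction without changing the application code. In a real
system each table would presumably live in its own .so, and the part that
protects the raw blocks would be the only thing left in the kernel.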

thanks for the links

On Sat, Jul 31, 2010 at 9:21 AM, CDP <dr.clau@gmail.com> wrote:

> On 07/31/10 15:06, Julian H. Stacey wrote:
>
>> would it be a feasible project to borrow things from freebsd, and start a
>>> project like this? anyone like this idea ??
>>>
>>
>> The code is free to use :-)
>>
>>  anyway, just some thoughts for now..
>>>
>>
>> See also eg Mach.
>>        http://en.wikipedia.org/wiki/Mach
>>        http://en.wikipedia.org/wiki/Mach_%28kernel%29
>>
>
> Add this to the list (have a look at the external links too):
> http://en.wikipedia.org/wiki/L4_microkernel_family
>
> You might also want to look at this:
> http://os.inf.tu-dresden.de/L4/LinuxOnL4/overview.shtml
>
> Regards,
>        Claudiu.
>

From owner-freebsd-hackers@FreeBSD.ORG  Sat Jul 31 21:21:53 2010
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 7664C1065673
	for <hackers@freebsd.org>; Sat, 31 Jul 2010 21:21:53 +0000 (UTC)
	(envelope-from mashtizadeh@gmail.com)
Received: from mail-iw0-f182.google.com (mail-iw0-f182.google.com
	[209.85.214.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 3AB998FC15
	for <hackers@freebsd.org>; Sat, 31 Jul 2010 21:21:52 +0000 (UTC)
Received: by iwn35 with SMTP id 35so3414270iwn.13
	for <hackers@freebsd.org>; Sat, 31 Jul 2010 14:21:52 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:received:received:in-reply-to
	:references:date:message-id:subject:from:to:cc:content-type
	:content-transfer-encoding;
	bh=PeQU3hWstFS/5yyb3zr+OfYjTSqBFv20hwS0CP+NzqQ=;
	b=VxzmbHRd4k3Iu7f8n62Nzu5TtqnfZpS+BRpFfp/tAnqgClolCZSZaNlp7gsiDDydGA
	/ZroIQSZuINo64ClI4kBKbvy4upfW4ztWWdco4RPiFfXoEo0MGIzPbKy+e2t1KfCCgLf
	wost9UrNFk/EfmEUtk/uNfafxQr0tLa2UGym4=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type:content-transfer-encoding;
	b=VXo+foLpEGC8y1Yexqw6czaJQWTwlkVq/sCBrQlUS7Ik2Ej4+QT729L2mR7A0HcBlr
	UI34fKEoEgcYWz7gsv8J5gPxvYINZdOaUDgxoASvkUI1dsiCEf0PRxWTgU1A0diw5tt1
	ujWZI23nZ0KST2T/SiBJK4AxHi/Mei5QT8N74=
MIME-Version: 1.0
Received: by 10.231.174.84 with SMTP id s20mr4303990ibz.94.1280609419350; Sat, 
	31 Jul 2010 13:50:19 -0700 (PDT)
Received: by 10.231.205.201 with HTTP; Sat, 31 Jul 2010 13:50:19 -0700 (PDT)
In-Reply-To: <AANLkTikTg5zU5qXHU+Gw0kQoi8vqPwGaAE9v=2CB+Ldk@mail.gmail.com>
References: <201007311206.o6VC6rdn023424@fire.js.berklix.net>
	<4C54154A.9040306@gmail.com>
	<AANLkTikTg5zU5qXHU+Gw0kQoi8vqPwGaAE9v=2CB+Ldk@mail.gmail.com>
Date: Sat, 31 Jul 2010 13:50:19 -0700
Message-ID: <AANLkTimbzPVQQQfkoQgph-oCpEWd41f_rfG20yAHLe0h@mail.gmail.com>
From: Ali Mashtizadeh <mashtizadeh@gmail.com>
To: Fabio Kaminski <fabiokaminski@gmail.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Cc: "Julian H. Stacey" <jhs@berklix.com>, hackers@freebsd.org
Subject: Re: freebsd exokernel
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 31 Jul 2010 21:21:53 -0000

Hi Fabio,

Exokernels are great operating systems for prototyping or learning.
You obviously incur a lot more performance hits when you implement
such an architecture. I haven't looked into the details of
DragonFly BSD too much, but they have enough infrastructure to run a
user-level kernel that is sort of paravirtualized. From what I've read,
it seems to have enough infrastructure for you to use the platform as
an exokernel without too much modification. It might be a good starting
point for you.

In addition to the original exokernel work from MIT, you might want to
check out Corey, which has some interesting work on multicore
scalability.
http://pdos.csail.mit.edu/corey/

Thanks,
~ Ali

On Sat, Jul 31, 2010 at 11:32 AM, Fabio Kaminski
<fabiokaminski@gmail.com> wrote:
> yes , i have snifed the mach.. but i dont like the message passing idea..
> its from the microkernel species
> and theres even a nouveau reincarnation called barrelfish
> http://www.barrelfish.org .. wich is a sort of microkernel but running one
> kernel core nucleus for each core and message passing each other.. (this is
> very promissing for virtualization.. but monolitic still be the fastest)
>
> its more like this L4 kernel.. good link indeed.... but with security
> included.. in fact the original mit exokernel its more like a resource
> policy system... http://en.wikipedia.org/wiki/Exokernel
>
> and i think they solve the problem that L4 has, that you are left alone..
> and the applications are obligated to implement  thought parts by
> themselfs.. putting the abstractions in the userland as libraries.. so if
> you want user ZFS ,Bsd VMM, Btrfs or create your own abstraction or mix
> some, its just link with the proper .so file.. without needing to create a
> half kernel/half app application..
>
> thanks for the links
>
> On Sat, Jul 31, 2010 at 9:21 AM, CDP <dr.clau@gmail.com> wrote:
>
>> On 07/31/10 15:06, Julian H. Stacey wrote:
>>
>>> would it be a feasible project to borrow things from freebsd, and start a
>>>> project like this? anyone like this idea ??
>>>>
>>>
>>> The code is free to use :-)
>>>
>>>  anyway, just some thoughts for now..
>>>>
>>>
>>> See also eg Mach.
>>>        http://en.wikipedia.org/wiki/Mach
>>>        http://en.wikipedia.org/wiki/Mach_%28kernel%29
>>>
>>
>> Add this to the list (have a look at the external links too):
>> http://en.wikipedia.org/wiki/L4_microkernel_family
>>
>> You might also want to look at this:
>> http://os.inf.tu-dresden.de/L4/LinuxOnL4/overview.shtml
>>
>> Regards,
>>        Claudiu.
>>


-- 
Ali Mashtizadeh
علی مشتی زاده