From owner-freebsd-hackers@FreeBSD.ORG  Sun Oct 14 14:42:29 2012
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id 3B9793D2
 for <freebsd-hackers@freebsd.org>; Sun, 14 Oct 2012 14:42:29 +0000 (UTC)
 (envelope-from jilles@stack.nl)
Received: from mx1.stack.nl (unknown [IPv6:2001:610:1108:5012::107])
 by mx1.freebsd.org (Postfix) with ESMTP id C964C8FC12
 for <freebsd-hackers@freebsd.org>; Sun, 14 Oct 2012 14:42:27 +0000 (UTC)
Received: from snail.stack.nl (snail.stack.nl [IPv6:2001:610:1108:5010::131])
 by mx1.stack.nl (Postfix) with ESMTP id 5C9351203CC;
 Sun, 14 Oct 2012 16:42:23 +0200 (CEST)
Received: by snail.stack.nl (Postfix, from userid 1677)
 id 299192848C; Sun, 14 Oct 2012 16:42:23 +0200 (CEST)
Date: Sun, 14 Oct 2012 16:42:23 +0200
From: Jilles Tjoelker <jilles@stack.nl>
To: Eitan Adler <lists@eitanadler.com>
Subject: Re: -lpthread vs -pthread: does -D_REENTRANT matter?
Message-ID: <20121014144222.GA14503@stack.nl>
References: <CAF6rxg=PnQvtXydhx8+oRZJ2ERBoGwedXPcGi_9icYxAtPuxVw@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CAF6rxg=PnQvtXydhx8+oRZJ2ERBoGwedXPcGi_9icYxAtPuxVw@mail.gmail.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: FreeBSD Hackers <freebsd-hackers@freebsd.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 14 Oct 2012 14:42:29 -0000

On Mon, Oct 08, 2012 at 12:17:08PM -0400, Eitan Adler wrote:
> The only difference between -lpthread and -pthread that I could see is
> that the latter also sets -D_REENTRANT.
> However, I can't find any uses of _REENTRANT anywhere outside of a few
> utilities that seem to define it manually.

> Testing with various manually written pthread programs resulted in
> identical binaries, let alone identical results.

> Is there an actual difference between -pthread and -lpthread or is
> this just a historical artifact?

In some cases, -pthread also affects the compiler's code generation. On
some RISC architectures, compilers may try to avoid loads and stores of
less than 32 bits.

For example (untested):
  struct { int n; char a, b, c, d; } *p;
  p->a = p->b = p->c = 0;

The compiler might load p->d and then store the 32 bits containing a, b,
c and d at once. This causes a race condition if p->d is written
concurrently.

Because C99 does not specify threading, it allows these transformations.
In C11, they are forbidden. Passing -pthread disables them as well.
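
As a rough illustration of the race described above, the widened
read-modify-write can be replayed by hand in a single thread. This is a
simulation, not compiler output: the struct tag and function names are
made up for the sketch, and it assumes the four char members occupy
consecutive bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same layout as the example above; the tag "flags" is hypothetical. */
struct flags { int n; char a, b, c, d; };

/* Assumption: a, b, c and d occupy four consecutive bytes. */
static_assert(offsetof(struct flags, d) == offsetof(struct flags, a) + 3,
    "a..d are expected to be contiguous");

/*
 * Replay "p->a = p->b = p->c = 0" as one 4-byte load and one 4-byte
 * store, with another thread's write to d landing in between.
 * Returns the final value of d.
 */
static char
widened_store_final_d(void)
{
	struct flags s = { 0, 1, 1, 1, 0 };
	unsigned char *base = (unsigned char *)&s + offsetof(struct flags, a);
	unsigned char word[4];

	memcpy(word, base, sizeof(word));	/* thread 1: load a, b, c, d */
	s.d = 42;				/* thread 2: store to d */
	word[0] = word[1] = word[2] = 0;	/* thread 1: clear a, b, c in the copy */
	memcpy(base, word, sizeof(word));	/* thread 1: store all four bytes */

	return (s.d);
}

/* The same interleaving with byte-sized stores, as C11 requires. */
static char
byte_store_final_d(void)
{
	struct flags s = { 0, 1, 1, 1, 0 };

	s.d = 42;		/* thread 2: store to d */
	s.a = s.b = s.c = 0;	/* thread 1 touches only a, b and c */

	return (s.d);
}
```

In the widened variant the stale copy of d is written back and the other
thread's 42 is lost; with byte-sized stores it survives.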

-- 
Jilles Tjoelker

From owner-freebsd-hackers@FreeBSD.ORG  Sun Oct 14 16:19:49 2012
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id 3E9265C8
 for <freebsd-hackers@freebsd.org>; Sun, 14 Oct 2012 16:19:49 +0000 (UTC)
 (envelope-from dcherednik@roshianokatachi.com)
Received: from smtp.nanocore.sportcomitet.org (unknown
 [IPv6:2a01:4f8:d13:2941::1:3])
 by mx1.freebsd.org (Postfix) with ESMTP id AE8748FC0A
 for <freebsd-hackers@freebsd.org>; Sun, 14 Oct 2012 16:19:48 +0000 (UTC)
Received: from localhost (localhost [127.0.0.1])
 by smtp.nanocore.sportcomitet.org (Postfix) with SMTP id A666AC03A1
 for <freebsd-hackers@freebsd.org>; Sun, 14 Oct 2012 20:19:47 +0400 (MSK)
Received: from [192.168.11.92] (ppp91-76-136-49.pppoe.mtu-net.ru
 [91.76.136.49])
 (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
 (No client certificate requested)
 (Authenticated sender: dcherednik@roshianokatachi.com)
 by smtp.nanocore.sportcomitet.org (Postfix) with ESMTPSA id 574D3C01B5;
 Sun, 14 Oct 2012 20:19:46 +0400 (MSK)
Message-ID: <507AE61D.7030709@roshianokatachi.com>
Date: Sun, 14 Oct 2012 20:19:41 +0400
From: Daniil Cherednik <dcherednik@roshianokatachi.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
 rv:16.0) Gecko/20121011 Thunderbird/16.0.1
MIME-Version: 1.0
To: freebsd-hackers@freebsd.org
Subject: Re: Fast syscalls via sysenter
References: <201206182256.30535.dcherednik@roshianokatachi.com>
 <201206210811.20427.jhb@freebsd.org> <4FE55F91.5070303@gmail.com>
 <20120623165823.GX2337@deviant.kiev.zoral.com.ua>
In-Reply-To: <20120623165823.GX2337@deviant.kiev.zoral.com.ua>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Konstantin Belousov <kostikbel@gmail.com>, davidxu@freebsd.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 14 Oct 2012 16:19:49 -0000

On 06/23/2012 08:58 PM, Konstantin Belousov wrote:
> On Sat, Jun 23, 2012 at 02:17:53PM +0800, David Xu wrote:
>> On 2012/06/21 20:11, John Baldwin wrote:
>>> On Monday, June 18, 2012 2:56:30 pm Daniil Cherednik wrote:
>>>> Hi!
>>>>
>>>> I am trying to continue the work started by DavidXu on implementation of
>>>> fast syscalls via sysenter/sysexit.
>>>> http://people.freebsd.org/~davidxu/sysenter/kernel/
>>>> I have ported it on FreeBSD9. It looks like it works. Unfortunately I am a
>>>> beginner in kernel so I have some questions:
>>>>
>>>> 1. see http://people.freebsd.org/~davidxu/sysenter/kernel/kernel.patch
>>>> /*
>>>> * If %edx was changed, we can not use sysexit, because it
>>>> * needs %edx to restore userland %eip.
>>>> */
>>>> if (orig_edx != frame.tf_edx)
>>>> 	td->td_pcb->pcb_flags |= PCB_FULLCTX;
>>>>
>>>> What is the reason why we have to do this additional check? In
>>>> http://people.freebsd.org/~davidxu/sysenter/kernel/sysenter.s
>>>> we store %edx to the stack in
>>>> pushl %edx		/* ring 3 next %eip */
>>>> and we restore the register in
>>>> popl	%edx		/* ring 3 %eip */
>>> Some system calls return two return values (pipe(2)) or return a 64-bit
>>> off_t (lseek(2)).  Those system calls change %edx's value and need that
>>> changed value to make it out to userland.
>>>
>>>> 2. see http://people.freebsd.org/~davidxu/sysenter/kernel/sysenter.s
>>>> movl	PCPU(CURPCB),%esi
>>>> call	syscall
>>>>
>>>> Why do we movl PCPU(CURPCB),%esi before calling syscall? syscall is just
>>>> a C function.
>>> No clue on this one, looks like it is not needed.
>>>
>> [kib@ is cc'ed]
>> I implemented the sysenter syscall long time ago, it indeed can reduce
>> system call overhead on i386. I think it might be the time to implement
>> linux like vdso syscall now based on the work kib@ recently has done,
>> though I don't know how to hook it into kib's code.
>> I quick googled it, and found they put some data into aux vector:
>> http://www.trilithium.com/johan/2005/08/linux-gate/
>> http://www.takatan.net/lxr/source/arch/um/os-Linux/elf_aux.c?a=x86_64#L40
> Yes, intent is to eventually switch to VDSO from current situation where
> libc is aware of shared page content. This was extensively discussed in
> flame that resulted in me writing the current gettimeofday(2) patch.
> It was arch@ several weeks ago, AFAIR.
>
> Committed gettimeofday() code structure allows for VDSO interposing without
> breaking normal symbol visibility rules.
>
> I do not see a sense in implementing syscall or sysenter support for
> i386 kernel. On the other hand, using syscall for 32bit binaries on amd64
> looks reasonable.
Sorry, I was not able to write for some time.
So, what about implementing a VDSO now? I know there was a patch and a
feature request:
http://lists.freebsd.org/pipermail/freebsd-bugs/2010-April/039597.html

About sysenter: I have ported the sysenter patch to 9.0-RELEASE-p4, and it
looks fine. I made some fixes in SYS.h. The reason, if I understand it
correctly, is that ld-elf.so has to end up as an ELF object without
DT_TEXTREL.
You can find the patch here:
https://redmine.sportcomitet.org/projects/dev-freebsd/repository/revisions/master/raw/sysenter.patch
https://redmine.sportcomitet.org/projects/dev-freebsd/repository/revisions/master/raw/sys/i386/i386/sysenter.s

However, this patch currently breaks compatibility with the i386 Xen PV
kernel. I wanted to fix that, but without a VDSO it would be a limited
solution. That is one of the reasons why I am interested in the VDSO status.
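
For reference, the Linux aux-vector mechanism mentioned in the quoted text
can be queried from userland roughly as follows. This is a Linux/glibc-only
sketch, not FreeBSD code; it relies on getauxval(3) and AT_SYSINFO_EHDR as
provided by glibc, and the two function names are made up for illustration.

```c
#include <elf.h>
#include <string.h>
#include <sys/auxv.h>

/* Linux/glibc only: address of the vDSO's ELF header, or 0 if none. */
static unsigned long
vdso_base(void)
{
	return (getauxval(AT_SYSINFO_EHDR));
}

/*
 * Returns 1 if a vDSO is mapped and starts with the ELF magic,
 * 0 if the aux vector carries no vDSO entry, -1 otherwise.
 */
static int
vdso_looks_like_elf(void)
{
	unsigned long base = vdso_base();

	if (base == 0)
		return (0);
	return (memcmp((const void *)base, ELFMAG, SELFMAG) == 0 ? 1 : -1);
}
```

The kernel publishes the address through the aux vector, so libc can find
the shared object without hard-coding where it is mapped.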

As for using syscall for 32-bit binaries on amd64: that is reasonable. But
if we do that, I think we also have to implement VDSO support in the i386
kernel for compatibility, and then it is better to implement sysenter there
as well.
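
To make the earlier point about %edx concrete: on i386 a 64-bit result such
as the off_t from lseek(2) comes back as a register pair, low half in %eax
and high half in %edx. A minimal sketch of that split follows; the struct
and function names are hypothetical and only model the two registers.

```c
#include <stdint.h>

/* Hypothetical model of the i386 return register pair. */
struct i386_ret {
	uint32_t eax;	/* low 32 bits of the result */
	uint32_t edx;	/* high 32 bits of the result */
};

/* Split a 64-bit offset the way it is returned in %edx:%eax. */
static struct i386_ret
split_off_t(int64_t off)
{
	struct i386_ret r;

	r.eax = (uint32_t)((uint64_t)off & 0xffffffffu);
	r.edx = (uint32_t)((uint64_t)off >> 32);
	return (r);
}

/* Reassemble the 64-bit value as the userland stub would. */
static int64_t
join_off_t(struct i386_ret r)
{
	return ((int64_t)(((uint64_t)r.edx << 32) | r.eax));
}
```

If the return path overwrote %edx with the saved user %eip, the high half
would be lost, which is why the patch falls back to the full-context return
when %edx has changed.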


From owner-freebsd-hackers@FreeBSD.ORG  Sun Oct 14 16:53:08 2012
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id 307E9F32
 for <freebsd-hackers@freebsd.org>; Sun, 14 Oct 2012 16:53:08 +0000 (UTC)
 (envelope-from freebsd-hackers@m.gmane.org)
Received: from plane.gmane.org (plane.gmane.org [80.91.229.3])
 by mx1.freebsd.org (Postfix) with ESMTP id DBF7D8FC12
 for <freebsd-hackers@freebsd.org>; Sun, 14 Oct 2012 16:53:07 +0000 (UTC)
Received: from list by plane.gmane.org with local (Exim 4.69)
 (envelope-from <freebsd-hackers@m.gmane.org>) id 1TNRR1-00058J-Vg
 for freebsd-hackers@freebsd.org; Sun, 14 Oct 2012 18:53:03 +0200
Received: from dsl-hkibrasgw3-ffd6c300-228.dhcp.inet.fi ([88.195.214.228])
 by main.gmane.org with esmtp (Gmexim 0.1 (Debian))
 id 1AlnuQ-0007hv-00
 for <freebsd-hackers@freebsd.org>; Sun, 14 Oct 2012 18:53:03 +0200
Received: from rakuco by dsl-hkibrasgw3-ffd6c300-228.dhcp.inet.fi with local
 (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00
 for <freebsd-hackers@freebsd.org>; Sun, 14 Oct 2012 18:53:03 +0200
X-Injected-Via-Gmane: http://gmane.org/
To: freebsd-hackers@freebsd.org
From: Raphael Kubo da Costa <rakuco@FreeBSD.org>
Subject: Re: -lpthread vs -pthread: does -D_REENTRANT matter?
Date: Sun, 14 Oct 2012 19:52:59 +0300
Lines: 24
Message-ID: <87zk3pnggk.fsf@FreeBSD.org>
References: <CAF6rxg=PnQvtXydhx8+oRZJ2ERBoGwedXPcGi_9icYxAtPuxVw@mail.gmail.com>
 <20121014144222.GA14503@stack.nl>
Mime-Version: 1.0
Content-Type: text/plain
X-Complaints-To: usenet@ger.gmane.org
X-Gmane-NNTP-Posting-Host: dsl-hkibrasgw3-ffd6c300-228.dhcp.inet.fi
User-Agent: Gnus/5.130006 (Ma Gnus v0.6) Emacs/24.2 (berkeley-unix)
Cancel-Lock: sha1:bHLg9z9aPYiFLmKemVP/DHaRnN8=
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 14 Oct 2012 16:53:08 -0000

Jilles Tjoelker <jilles@stack.nl> writes:

> On Mon, Oct 08, 2012 at 12:17:08PM -0400, Eitan Adler wrote:
>> The only difference between -lpthread and -pthread that I could see is
>> that the latter also sets -D_REENTRANT.
>> However, I can't find any uses of _REENTRANT anywhere outside of a few
>> utilities that seem to define it manually.
>
>> Testing with various manually written pthread programs resulted in
>> identical binaries, let alone identical results.
>
>> Is there an actual difference between -pthread and -lpthread or is
>> this just a historical artifact?
>
> In some cases, -pthread also affects the compiler's code generation. On
> some RISC architectures, compilers may try to avoid loads and stores of
> less than 32 bits.

[...]

> Because C99 does not specify threading, it allows these transformations.
> In C11, they are forbidden. Passing -pthread disables them as well.

And does this not happen at all if one uses -lpthread instead?


From owner-freebsd-hackers@FreeBSD.ORG  Sun Oct 14 20:55:38 2012
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id 4A61CF8
 for <freebsd-hackers@freebsd.org>; Sun, 14 Oct 2012 20:55:38 +0000 (UTC)
 (envelope-from lists@eitanadler.com)
Received: from mail-pb0-f54.google.com (mail-pb0-f54.google.com
 [209.85.160.54])
 by mx1.freebsd.org (Postfix) with ESMTP id 10FE48FC0A
 for <freebsd-hackers@freebsd.org>; Sun, 14 Oct 2012 20:55:37 +0000 (UTC)
Received: by mail-pb0-f54.google.com with SMTP id rp8so4597923pbb.13
 for <freebsd-hackers@freebsd.org>; Sun, 14 Oct 2012 13:55:37 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=eitanadler.com; s=0xdeadbeef;
 h=mime-version:in-reply-to:references:from:date:message-id:subject:to
 :cc:content-type;
 bh=m5rzsz88BPSza7kF+Hk4yrimrNG5v3EejbNq2dBx7es=;
 b=UlPxpIUQD0EggJ3kfIa3eIPiOB+3qojUq7OM+/VOT98BLpoeI1rS+VUHd3WywML9Or
 ON/X/pKMJQePHrW3pqarsY6UR2YnBlNmqnRE7ehzvaPy3pnBChqEfTqxeIpJIgQ12M8y
 21tUIDELLoh28kQL3j8VwVJuWLm/3s9qltIcs=
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=google.com; s=20120113;
 h=mime-version:in-reply-to:references:from:date:message-id:subject:to
 :cc:content-type:x-gm-message-state;
 bh=m5rzsz88BPSza7kF+Hk4yrimrNG5v3EejbNq2dBx7es=;
 b=JgJ9zEnsx89t5ONVEpjiaRpCUu1tljb/jBSzeoa3/o5KPtDGt8f6V6yrOw4EIeZaf9
 SY3i2V1BXej3hvVJiwlHNXjj6iOgWDwsMB6bYT4kNThhmeRtLdfQp/OqP3Fp16JEhVb7
 WBrcFf+yEmq6FuLfGKuzhD9CzymsGpOZgiBh35r3UWwLrc8Ij0Xpb8x7nm3saaPZj1Bw
 UEM8f0QZ1wH/wL5ARgZmnXO8SkkCTtmBJMuWERV8gXos0u3fqU03lTOX+wWfjXTjzMxp
 Z6BXlHKpOLurXQH03vbJhquKzRv/zeA6nKSOHrhHkWanNPePkQtm7f/igeMInu1LK3Fu
 p55Q==
Received: by 10.66.79.65 with SMTP id h1mr3521393pax.71.1350248137633; Sun, 14
 Oct 2012 13:55:37 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.66.161.163 with HTTP; Sun, 14 Oct 2012 13:55:07 -0700 (PDT)
In-Reply-To: <20121014144222.GA14503@stack.nl>
References: <CAF6rxg=PnQvtXydhx8+oRZJ2ERBoGwedXPcGi_9icYxAtPuxVw@mail.gmail.com>
 <20121014144222.GA14503@stack.nl>
From: Eitan Adler <lists@eitanadler.com>
Date: Sun, 14 Oct 2012 16:55:07 -0400
Message-ID: <CAF6rxg=Pq+=60h3OCaMY+wtviuhJ6w_erdx5Q7CSdQURG5+F=A@mail.gmail.com>
Subject: Re: -lpthread vs -pthread: does -D_REENTRANT matter?
To: Jilles Tjoelker <jilles@stack.nl>
Content-Type: text/plain; charset=UTF-8
X-Gm-Message-State: ALoCoQmCW4h+CBQAn5jRCVajiVFxci0rTxP/zLIaKNTRozenTyZw6/pL9KCDgSIJ36OHZitDahdF
Cc: FreeBSD Hackers <freebsd-hackers@freebsd.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 14 Oct 2012 20:55:38 -0000

On 14 October 2012 10:42, Jilles Tjoelker <jilles@stack.nl> wrote:

> Because C99 does not specify threading, it allows these transformations.
> In C11, they are forbidden. Passing -pthread disables them as well.

Is the man page wrong or do I misunderstand?

           This option sets flags for both the preprocessor and linker.  It
           does not affect the thread safety of object code produced by the
           compiler or that of libraries supplied with it.
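
One way to observe the preprocessor side of the flag is a translation unit
that reports whether _REENTRANT was defined when it was compiled. This is
only a sketch; the function name is made up, and whether -pthread defines
the macro depends on the compiler and platform.

```c
/* Returns 1 if this file was compiled with _REENTRANT defined, else 0. */
static int
reentrant_defined(void)
{
#ifdef _REENTRANT
	return (1);
#else
	return (0);
#endif
}
```

Calling this from a small main() and building the program once with
-pthread and once with only -lpthread should show whether the macro
differs between the two builds on a given system.
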
-- 
Eitan Adler

From owner-freebsd-hackers@FreeBSD.ORG  Mon Oct 15 11:52:32 2012
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id 988C2F62
 for <freebsd-hackers@freebsd.org>; Mon, 15 Oct 2012 11:52:32 +0000 (UTC)
 (envelope-from freebsd-hackers@m.gmane.org)
Received: from plane.gmane.org (plane.gmane.org [80.91.229.3])
 by mx1.freebsd.org (Postfix) with ESMTP id 496E88FC0A
 for <freebsd-hackers@freebsd.org>; Mon, 15 Oct 2012 11:52:30 +0000 (UTC)
Received: from list by plane.gmane.org with local (Exim 4.69)
 (envelope-from <freebsd-hackers@m.gmane.org>) id 1TNjDn-0005Xi-Q1
 for freebsd-hackers@freebsd.org; Mon, 15 Oct 2012 13:52:35 +0200
Received: from lara.cc.fer.hr ([161.53.72.113])
 by main.gmane.org with esmtp (Gmexim 0.1 (Debian))
 id 1AlnuQ-0007hv-00
 for <freebsd-hackers@freebsd.org>; Mon, 15 Oct 2012 13:52:35 +0200
Received: from ivoras by lara.cc.fer.hr with local (Gmexim 0.1 (Debian))
 id 1AlnuQ-0007hv-00
 for <freebsd-hackers@freebsd.org>; Mon, 15 Oct 2012 13:52:35 +0200
X-Injected-Via-Gmane: http://gmane.org/
To: freebsd-hackers@freebsd.org
From: Ivan Voras <ivoras@freebsd.org>
Subject: Re: NFS server bottlenecks
Date: Mon, 15 Oct 2012 13:52:14 +0200
Lines: 40
Message-ID: <k5gtdh$nc0$1@ger.gmane.org>
References: <937460294.2185822.1350093954059.JavaMail.root@erie.cs.uoguelph.ca>
 <302BF685-4B9D-49C8-8000-8D0F6540C8F7@gmail.com>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
 protocol="application/pgp-signature";
 boundary="------------enig01F90BCB0CE5715825E41507"
X-Complaints-To: usenet@ger.gmane.org
X-Gmane-NNTP-Posting-Host: lara.cc.fer.hr
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:14.0) Gecko/20120812 Thunderbird/14.0
In-Reply-To: <302BF685-4B9D-49C8-8000-8D0F6540C8F7@gmail.com>
X-Enigmail-Version: 1.4.3
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 15 Oct 2012 11:52:32 -0000

This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
--------------enig01F90BCB0CE5715825E41507
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

On 13/10/2012 17:22, Nikolay Denev wrote:

> drc3.patch applied and build cleanly and shows nice improvement!
>
> I've done a quick benchmark using iozone over the NFS mount from the
> Linux host.
>

Hi,

If you are already testing, could you please also test this patch:

http://people.freebsd.org/~ivoras/diffs/nfscache_lock.patch

It should apply to HEAD without Rick's patches.

It's a bit different approach than Rick's, breaking down locks even more.

--------------enig01F90BCB0CE5715825E41507
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (FreeBSD)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAlB7+O8ACgkQ/QjVBj3/HSx2ngCdE8Ab3oSlQY4uF+hzaMG2dOqK
3PwAn2FLAC/FsS36u4/5UljuM8qsTHym
=FhY4
-----END PGP SIGNATURE-----

--------------enig01F90BCB0CE5715825E41507--


From owner-freebsd-hackers@FreeBSD.ORG  Mon Oct 15 13:41:43 2012
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id 7D7AF39E;
 Mon, 15 Oct 2012 13:41:43 +0000 (UTC)
 (envelope-from ndenev@gmail.com)
Received: from mail-wg0-f50.google.com (mail-wg0-f50.google.com [74.125.82.50])
 by mx1.freebsd.org (Postfix) with ESMTP id DA9598FC16;
 Mon, 15 Oct 2012 13:41:42 +0000 (UTC)
Received: by mail-wg0-f50.google.com with SMTP id 16so4160000wgi.31
 for <multiple recipients>; Mon, 15 Oct 2012 06:41:42 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=subject:mime-version:content-type:from:in-reply-to:date:cc
 :content-transfer-encoding:message-id:references:to:x-mailer;
 bh=yOMhe4W/6hVDQgGMmzza+LSqsVDzldd2LqJR8geimAg=;
 b=QJVtMXJtLKPRK0cTOdh+e2N/sviqmzMaxHYyhALEfKyUQ/t0JKWa2gbW8nJuB50xyC
 oFVylLtfjj+p48zlyz45NVWPnsCn1T37jLLRi9AvhG7lec32xKKp9qh4L+XzSkakN1+c
 cR0uT4SULFlNsuCBufVKtSXvLemSRLstggSA93zDTvNdLaleaONwkYNCq3Rt9oSmlhan
 H2ZbjUMj70Nm0hlI6kAZQt1Qn7oFuWP54mhj1f/lR+cUmHX0wzNeuJ3JwYYy1bxb6UAk
 XHknq+soArTye9a5sLskIXePEf5I/0ctoqPFqPdvD+mJ8N42tdq3dBIvTnLMf9UvZEYv
 WoIw==
Received: by 10.180.102.131 with SMTP id fo3mr23997987wib.1.1350308501549;
 Mon, 15 Oct 2012 06:41:41 -0700 (PDT)
Received: from ndenevsa.sf.moneybookers.net (g1.moneybookers.com.
 [217.18.249.148])
 by mx.google.com with ESMTPS id m14sm14025191wie.8.2012.10.15.06.41.39
 (version=TLSv1/SSLv3 cipher=OTHER);
 Mon, 15 Oct 2012 06:41:40 -0700 (PDT)
Subject: Re: NFS server bottlenecks
Mime-Version: 1.0 (Mac OS X Mail 6.1 \(1498\))
Content-Type: text/plain; charset=us-ascii
From: Nikolay Denev <ndenev@gmail.com>
In-Reply-To: <k5gtdh$nc0$1@ger.gmane.org>
Date: Mon, 15 Oct 2012 16:41:38 +0300
Content-Transfer-Encoding: quoted-printable
Message-Id: <752224AF-F6B6-413C-8597-61829800E0BC@gmail.com>
References: <937460294.2185822.1350093954059.JavaMail.root@erie.cs.uoguelph.ca>
 <302BF685-4B9D-49C8-8000-8D0F6540C8F7@gmail.com> <k5gtdh$nc0$1@ger.gmane.org>
To: Ivan Voras <ivoras@freebsd.org>
X-Mailer: Apple Mail (2.1498)
Cc: freebsd-hackers@freebsd.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 15 Oct 2012 13:41:43 -0000


On Oct 15, 2012, at 2:52 PM, Ivan Voras <ivoras@freebsd.org> wrote:

> On 13/10/2012 17:22, Nikolay Denev wrote:
>
>> drc3.patch applied and build cleanly and shows nice improvement!
>>
>> I've done a quick benchmark using iozone over the NFS mount from the
>> Linux host.
>>
>
> Hi,
>
> If you are already testing, could you please also test this patch:
>
> http://people.freebsd.org/~ivoras/diffs/nfscache_lock.patch
>
> It should apply to HEAD without Rick's patches.
>
> It's a bit different approach than Rick's, breaking down locks even more.
>

I will try to apply it to RELENG_9 as that's what I'm running and
compare the results.


From owner-freebsd-hackers@FreeBSD.ORG  Mon Oct 15 14:31:55 2012
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id 1A321381;
 Mon, 15 Oct 2012 14:31:55 +0000 (UTC)
 (envelope-from ndenev@gmail.com)
Received: from mail-wg0-f42.google.com (mail-wg0-f42.google.com [74.125.82.42])
 by mx1.freebsd.org (Postfix) with ESMTP id 771528FC16;
 Mon, 15 Oct 2012 14:31:54 +0000 (UTC)
Received: by mail-wg0-f42.google.com with SMTP id fm10so227346wgb.1
 for <multiple recipients>; Mon, 15 Oct 2012 07:31:53 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=subject:mime-version:content-type:from:in-reply-to:date:cc
 :content-transfer-encoding:message-id:references:to:x-mailer;
 bh=2XGEsJ0S/5NHKzrTI5cvpDNx5j0GmK+mAdfGoRlgRTs=;
 b=0mKNrVPvn7S6O4tHoMc4B1/7LNgJTD/4g1AyuYl7KqDDfEURRTuNXZe5uu+2iTcQAz
 6XDA/ycETS1iWm8AOtDTx3VRr3UJuYHTjDYfD/9dxj2oSZi+kkdzSGg7qCKAuK7itSqx
 H0gp8kkDekqxhesgJnTqNT++NPcpXdVmsSCtCp8tz5vz3DuEoM0ZfSeXfTptNqZzUhsF
 waQ8vqADgSZlhuX+n0BVLUrZZWq1OyLqnkKCcwx3JJJ8luLr4Z6IaY6sPtG8YAC//5qy
 tsgpAKEnPQiL4xhO4WcHRsT+JMlaA7qVGWWdlk4ds2XSq1dgURFERADPqX0PGRqG94b3
 G7mw==
Received: by 10.180.87.34 with SMTP id u2mr24301324wiz.4.1350311513236;
 Mon, 15 Oct 2012 07:31:53 -0700 (PDT)
Received: from ndenevsa.sf.moneybookers.net (g1.moneybookers.com.
 [217.18.249.148])
 by mx.google.com with ESMTPS id bn7sm16148172wib.8.2012.10.15.07.31.51
 (version=TLSv1/SSLv3 cipher=OTHER);
 Mon, 15 Oct 2012 07:31:51 -0700 (PDT)
Subject: Re: NFS server bottlenecks
Mime-Version: 1.0 (Mac OS X Mail 6.1 \(1498\))
Content-Type: text/plain; charset=us-ascii
From: Nikolay Denev <ndenev@gmail.com>
In-Reply-To: <k5gtdh$nc0$1@ger.gmane.org>
Date: Mon, 15 Oct 2012 17:31:50 +0300
Content-Transfer-Encoding: quoted-printable
Message-Id: <0857D79A-6276-433F-9603-D52125CF190F@gmail.com>
References: <937460294.2185822.1350093954059.JavaMail.root@erie.cs.uoguelph.ca>
 <302BF685-4B9D-49C8-8000-8D0F6540C8F7@gmail.com> <k5gtdh$nc0$1@ger.gmane.org>
To: Ivan Voras <ivoras@freebsd.org>
X-Mailer: Apple Mail (2.1498)
Cc: freebsd-hackers@freebsd.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 15 Oct 2012 14:31:55 -0000


On Oct 15, 2012, at 2:52 PM, Ivan Voras <ivoras@freebsd.org> wrote:

> On 13/10/2012 17:22, Nikolay Denev wrote:
>
>> drc3.patch applied and build cleanly and shows nice improvement!
>>
>> I've done a quick benchmark using iozone over the NFS mount from the
>> Linux host.
>>
>
> Hi,
>
> If you are already testing, could you please also test this patch:
>
> http://people.freebsd.org/~ivoras/diffs/nfscache_lock.patch
>
> It should apply to HEAD without Rick's patches.
>
> It's a bit different approach than Rick's, breaking down locks even more.
>

Applied and compiled OK, I will be able to test it tomorrow.


From owner-freebsd-hackers@FreeBSD.ORG  Mon Oct 15 14:34:54 2012
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id 00CB14BF
 for <freebsd-hackers@freebsd.org>; Mon, 15 Oct 2012 14:34:53 +0000 (UTC)
 (envelope-from ivoras@gmail.com)
Received: from mail-qc0-f182.google.com (mail-qc0-f182.google.com
 [209.85.216.182])
 by mx1.freebsd.org (Postfix) with ESMTP id A827D8FC08
 for <freebsd-hackers@freebsd.org>; Mon, 15 Oct 2012 14:34:53 +0000 (UTC)
Received: by mail-qc0-f182.google.com with SMTP id l39so5089168qcs.13
 for <freebsd-hackers@freebsd.org>; Mon, 15 Oct 2012 07:34:47 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:sender:in-reply-to:references:from:date
 :x-google-sender-auth:message-id:subject:to:cc:content-type;
 bh=GtQaUSCGUYwr9Q823OR3frvUr5WBCW4SXWpCkkve7Gg=;
 b=jkoIapGXbbN3JTQRxuvUiP/ddvEhadGGOuZV4s3QbGuLNcGAv5oDuY1QDB37davr2O
 gQ8gkW8VVOVYxaZZLLvujNVCkgGMKFFCuELJHJ30Pz4gHCjU3XYL0jwXXRCSSpxvRFmw
 VMpmycuibXpekSoMQVX2lYbRd7cmARqM4lsKbpY/LtS6ewBBfynkR4WDsje8Hy3YAK4A
 PNs2ZnnQcm4m2l84WeBvXNbrWgdGDtiFfcE79/dRUxUeeGK8W32zSomJXPkUq7X39iNa
 /0XDh8v8aJMwmk1/POseKDsmFynDB6K2jNKNhcyhcvD8B+fJp6IOgzQp4rT0hs6igB/Y
 7n/g==
Received: by 10.224.188.200 with SMTP id db8mr20814938qab.86.1350311687553;
 Mon, 15 Oct 2012 07:34:47 -0700 (PDT)
MIME-Version: 1.0
Sender: ivoras@gmail.com
Received: by 10.49.82.231 with HTTP; Mon, 15 Oct 2012 07:34:07 -0700 (PDT)
In-Reply-To: <0857D79A-6276-433F-9603-D52125CF190F@gmail.com>
References: <937460294.2185822.1350093954059.JavaMail.root@erie.cs.uoguelph.ca>
 <302BF685-4B9D-49C8-8000-8D0F6540C8F7@gmail.com> <k5gtdh$nc0$1@ger.gmane.org>
 <0857D79A-6276-433F-9603-D52125CF190F@gmail.com>
From: Ivan Voras <ivoras@freebsd.org>
Date: Mon, 15 Oct 2012 16:34:07 +0200
X-Google-Sender-Auth: HkkKm6ZDX-JowGb2TX1H2Dk6O5I
Message-ID: <CAF-QHFUU0hhtRNK1_p9zks2w+e22bfWOtv+XaqgFqTiURcJBbQ@mail.gmail.com>
Subject: Re: NFS server bottlenecks
To: Nikolay Denev <ndenev@gmail.com>
Content-Type: text/plain; charset=UTF-8
Cc: freebsd-hackers@freebsd.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 15 Oct 2012 14:34:54 -0000

On 15 October 2012 16:31, Nikolay Denev <ndenev@gmail.com> wrote:
>
> On Oct 15, 2012, at 2:52 PM, Ivan Voras <ivoras@freebsd.org> wrote:

>> http://people.freebsd.org/~ivoras/diffs/nfscache_lock.patch
>>
>> It should apply to HEAD without Rick's patches.
>>
>> It's a bit different approach than Rick's, breaking down locks even more.
>
> Applied and compiled OK, I will be able to test it tomorrow.

Ok, thanks!

The differences should be most visible in edge cases with a larger
number of nfsd processes (16+) and many CPU cores.

From owner-freebsd-hackers@FreeBSD.ORG  Mon Oct 15 15:17:28 2012
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id 1D50881C;
 Mon, 15 Oct 2012 15:17:28 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net
 [IPv6:2001:470:1f10:75::2])
 by mx1.freebsd.org (Postfix) with ESMTP id E4DE18FC12;
 Mon, 15 Oct 2012 15:17:27 +0000 (UTC)
Received: from jhbbsd.localnet (unknown [209.249.190.124])
 by bigwig.baldwin.cx (Postfix) with ESMTPSA id 3DB1BB924;
 Mon, 15 Oct 2012 11:17:27 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: Rick Macklem <rmacklem@uoguelph.ca>
Subject: Re: NFS server bottlenecks
Date: Mon, 15 Oct 2012 11:08:09 -0400
User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p20; KDE/4.5.5; amd64; ; )
References: <611092759.2189637.1350133402953.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <611092759.2189637.1350133402953.JavaMail.root@erie.cs.uoguelph.ca>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="utf-8"
Content-Transfer-Encoding: 7bit
Message-Id: <201210151108.09113.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7
 (bigwig.baldwin.cx); Mon, 15 Oct 2012 11:17:27 -0400 (EDT)
Cc: Nikolay Denev <ndenev@gmail.com>, Garrett Wollman <wollman@freebsd.org>,
 FreeBSD Hackers <freebsd-hackers@freebsd.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 15 Oct 2012 15:17:28 -0000

On Saturday, October 13, 2012 9:03:22 am Rick Macklem wrote:
> rick
> ps: I hope John doesn't mind being added to the cc list yet again. It's
>     just that I suspect he knows a fair bit about mutex implementation
>     and possible hardware cache line effects.

Currently mtx_pool just uses a simple array (I have patches to force the
array members to be cache-aligned, but they haven't been shown to help in
any benchmarks to date).  I do think though that I would prefer embedding
the mutexes in the hash table entries directly.  This is what we do for the
turnstile and sleep queue hash tables.
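
A minimal userland sketch of the embedded-lock layout, for illustration only.
It assumes POSIX mutexes in place of the kernel's struct mtx, and the names
table, bucket_for, NBUCKETS and CACHE_LINE are hypothetical, not taken from the
FreeBSD sources:

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS	64	/* hypothetical table size, power of two */
#define CACHE_LINE	64	/* assumed cache line size in bytes */

struct entry {
	struct entry	*next;
	uint32_t	 key;
};

/*
 * The lock lives inside the bucket it protects.  Aligning the first
 * member (C11 _Alignas) starts every bucket on a cache-line boundary.
 */
struct bucket {
	_Alignas(CACHE_LINE) pthread_mutex_t lock;
	struct entry	*head;
};

static struct bucket table[NBUCKETS];

static void
table_init(void)
{
	for (size_t i = 0; i < NBUCKETS; i++) {
		pthread_mutex_init(&table[i].lock, NULL);
		table[i].head = NULL;
	}
}

/* Map a key to its bucket; the lock to take is found in the same place. */
static struct bucket *
bucket_for(uint32_t key)
{
	return (&table[key & (NBUCKETS - 1)]);
}

static void
table_insert(struct entry *e)
{
	struct bucket *b = bucket_for(e->key);

	pthread_mutex_lock(&b->lock);
	e->next = b->head;
	b->head = e;
	pthread_mutex_unlock(&b->lock);
}

static struct entry *
table_lookup(uint32_t key)
{
	struct bucket *b = bucket_for(key);
	struct entry *e;

	pthread_mutex_lock(&b->lock);
	for (e = b->head; e != NULL; e = e->next)
		if (e->key == key)
			break;
	pthread_mutex_unlock(&b->lock);
	return (e);
}
```

With a separate pool, the lock for a bucket is found through a second array
and may share a cache line with unrelated locks; here the bucket and its lock
are fetched together.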

-- 
John Baldwin

From owner-freebsd-hackers@FreeBSD.ORG  Mon Oct 15 20:58:18 2012
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id 8FEF1350;
 Mon, 15 Oct 2012 20:58:18 +0000 (UTC)
 (envelope-from rmacklem@uoguelph.ca)
Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca
 [131.104.91.36])
 by mx1.freebsd.org (Postfix) with ESMTP id 34CA68FC08;
 Mon, 15 Oct 2012 20:58:17 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: Ap8EABp4fFCDaFvO/2dsb2JhbABFFoV8un6CIAEBBAEjVgUWDgoCAg0ZAlkGiBEGC6oFkwmBIYo4hSuBEgOVbIEVjxuDCYF7
X-IronPort-AV: E=Sophos;i="4.80,590,1344225600"; d="scan'208";a="186505520"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
 ([131.104.91.206])
 by esa-annu-pri.mail.uoguelph.ca with ESMTP; 15 Oct 2012 16:58:16 -0400
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
 by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 4A5B1B4039;
 Mon, 15 Oct 2012 16:58:16 -0400 (EDT)
Date: Mon, 15 Oct 2012 16:58:16 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Ivan Voras <ivoras@freebsd.org>
Message-ID: <1516511249.2287339.1350334696127.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <k5gtdh$nc0$1@ger.gmane.org>
Subject: Re: NFS server bottlenecks
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [172.17.91.201]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692)
Cc: freebsd-hackers@freebsd.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 15 Oct 2012 20:58:18 -0000

Ivan Voras wrote:
> On 13/10/2012 17:22, Nikolay Denev wrote:
> 
> > drc3.patch applied and build cleanly and shows nice improvement!
> >
> > I've done a quick benchmark using iozone over the NFS mount from the
> > Linux host.
> >
> 
> Hi,
> 
> If you are already testing, could you please also test this patch:
> 
> http://people.freebsd.org/~ivoras/diffs/nfscache_lock.patch
> 
I don't think (it is hard to test this) your trim cache algorithm
will choose the correct entries to delete.

The problem is that UDP entries very seldom time out (unless the
NFS server is seeing hardly any load) and are mostly trimmed
because the size exceeds the high water mark.

With your code, it will clear out all of the entries in the first
hash buckets that aren't currently busy, until the total count
drops below the high water mark. (If you monitor a busy server
with "nfsstat -e -s", you'll see the cache never goes below the
high water mark, which is 500 by default.) This would delete
entries of fairly recent requests.

If you are going to replace the global LRU list with ones for
each hash bucket, then you'll have to compare the time stamps
on the least recently used entries of all the hash buckets and
then delete those. If you keep the timestamp of the least recently
used entry for each hash bucket in the hash bucket head, you could at
least use that to select which bucket to delete from next, but you'll
still need to:
  - lock that hash bucket
  - delete a few entries from that bucket's LRU list
  - unlock the hash bucket
  - repeat for various buckets until the count is below the high
    water mark
Or something like that. I think you'll find it a lot more work than
one LRU list and one mutex. Remember that the mutex isn't held for long.
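
A rough userland sketch of the per-bucket trim loop described above, assuming
POSIX mutexes and <sys/queue.h> lists. All names (cbucket, cache_trim, oldest)
and the tiny NBUCKETS/HIGHWATER values are hypothetical, and it removes one
entry per pass rather than "a few":

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/queue.h>

#define NBUCKETS	4	/* hypothetical, kept tiny for the example */
#define HIGHWATER	3	/* hypothetical; the thread mentions 500 */

struct centry {
	TAILQ_ENTRY(centry) lru;
	uint64_t	stamp;		/* time of last use */
};

struct cbucket {
	pthread_mutex_t	lock;
	TAILQ_HEAD(, centry) lruq;	/* head = least recently used */
	_Atomic uint64_t oldest;	/* stamp at head, UINT64_MAX if empty */
};

static struct cbucket cache[NBUCKETS];
static atomic_int cache_count;

static void
cache_init(void)
{
	for (int i = 0; i < NBUCKETS; i++) {
		pthread_mutex_init(&cache[i].lock, NULL);
		TAILQ_INIT(&cache[i].lruq);
		atomic_store(&cache[i].oldest, UINT64_MAX);
	}
	atomic_store(&cache_count, 0);
}

/* Append a freshly used entry; stamps are assumed to be non-decreasing. */
static int
cache_add(int b, uint64_t stamp)
{
	struct centry *e = malloc(sizeof(*e));

	if (e == NULL)
		return (-1);
	e->stamp = stamp;
	pthread_mutex_lock(&cache[b].lock);
	TAILQ_INSERT_TAIL(&cache[b].lruq, e, lru);
	atomic_store(&cache[b].oldest, TAILQ_FIRST(&cache[b].lruq)->stamp);
	pthread_mutex_unlock(&cache[b].lock);
	atomic_fetch_add(&cache_count, 1);
	return (0);
}

/* Trim down to HIGHWATER, always taking from the bucket with the oldest head. */
static int
cache_trim(void)
{
	int freed = 0;

	while (atomic_load(&cache_count) > HIGHWATER) {
		struct cbucket *victim = NULL;
		uint64_t best = UINT64_MAX;

		/* Scan the per-bucket hints without taking any bucket lock. */
		for (int i = 0; i < NBUCKETS; i++) {
			uint64_t o = atomic_load(&cache[i].oldest);

			if (o < best) {
				best = o;
				victim = &cache[i];
			}
		}
		if (victim == NULL)
			break;		/* every bucket looks empty */

		pthread_mutex_lock(&victim->lock);
		struct centry *e = TAILQ_FIRST(&victim->lruq);
		if (e != NULL) {
			TAILQ_REMOVE(&victim->lruq, e, lru);
			free(e);
			atomic_fetch_sub(&cache_count, 1);
			freed++;
		}
		/* Refresh the hint from whatever is now at the head. */
		e = TAILQ_FIRST(&victim->lruq);
		atomic_store(&victim->oldest,
		    e != NULL ? e->stamp : UINT64_MAX);
		pthread_mutex_unlock(&victim->lock);
	}
	return (freed);
}
```

Even in this reduced form, each deletion costs a scan over all bucket heads
plus a lock/unlock pair, which is the extra work compared with a single LRU
list under one mutex.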

Btw, the code looks very nice. (If I were being a style(9) zealot,
I'd remind you that it likes "return (X);" and not "return X;".)

rick

> It should apply to HEAD without Rick's patches.
> 
> It's a bit different approach than Rick's, breaking down locks even
> more.

From owner-freebsd-hackers@FreeBSD.ORG  Mon Oct 15 21:58:28 2012
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id 49512758
 for <freebsd-hackers@freebsd.org>; Mon, 15 Oct 2012 21:58:28 +0000 (UTC)
 (envelope-from ivoras@gmail.com)
Received: from mail-vb0-f54.google.com (mail-vb0-f54.google.com
 [209.85.212.54])
 by mx1.freebsd.org (Postfix) with ESMTP id F106A8FC0C
 for <freebsd-hackers@freebsd.org>; Mon, 15 Oct 2012 21:58:27 +0000 (UTC)
Received: by mail-vb0-f54.google.com with SMTP id v11so7416671vbm.13
 for <freebsd-hackers@freebsd.org>; Mon, 15 Oct 2012 14:58:27 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:sender:in-reply-to:references:from:date
 :x-google-sender-auth:message-id:subject:to:cc:content-type;
 bh=tJnSc+/HlFMB8Izt5qAntTCXsSyQVlSF+S8w2v+54Us=;
 b=oOMtU+wBsxXV1y5zjhyl5oH0K8Y1tEcStzcs6ToVXxGU01IUaWpy4hcnXQW4/3HMoz
 TpjLt0noi6WeqIzJ2PUgeKKKXtACetPIFyIVokwf7xukLLCo5TN8wpHeDS9p6nP2uWEL
 udmcsh+8SvT/65tVtRiwZ4vXfHSlLcs+VqEl4V7fy/WlvCAqihdPq47alBpghz2pWEjD
 /R8Y7Own8x8bTPrDWSHKHuezPwVl8ME1ftO4jeYKFoqp2uhaGsnLZdxzpGpghkaERcxp
 oq0sv0MC3UCESxVmph3/5x8955BWA4TRXCdFu2sniN6k5MHrdQbvTksXp9JTO6QJiHWA
 bWKQ==
Received: by 10.58.32.234 with SMTP id m10mr4658629vei.60.1350338306884; Mon,
 15 Oct 2012 14:58:26 -0700 (PDT)
MIME-Version: 1.0
Sender: ivoras@gmail.com
Received: by 10.59.0.37 with HTTP; Mon, 15 Oct 2012 14:57:46 -0700 (PDT)
In-Reply-To: <1516511249.2287339.1350334696127.JavaMail.root@erie.cs.uoguelph.ca>
References: <k5gtdh$nc0$1@ger.gmane.org>
 <1516511249.2287339.1350334696127.JavaMail.root@erie.cs.uoguelph.ca>
From: Ivan Voras <ivoras@freebsd.org>
Date: Mon, 15 Oct 2012 23:57:46 +0200
X-Google-Sender-Auth: l_I3vdWWsVANF5pElddtbH_WqPE
Message-ID: <CAF-QHFXHq0+Ld-Diu0jVhztwz68m+oH_AhzxNQF8S3QjQJ8uVw@mail.gmail.com>
Subject: Re: NFS server bottlenecks
To: Rick Macklem <rmacklem@uoguelph.ca>
Content-Type: text/plain; charset=UTF-8
Cc: freebsd-hackers@freebsd.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 15 Oct 2012 21:58:28 -0000

On 15 October 2012 22:58, Rick Macklem <rmacklem@uoguelph.ca> wrote:

> The problem is that UDP entries very seldom time out (unless the
> NFS server isn't seeing hardly any load) and are mostly trimmed
> because the size exceeds the highwater mark.
>
> With your code, it will clear out all of the entries in the first
> hash buckets that aren't currently busy, until the total count
> drops below the high water mark. (If you monitor a busy server
> with "nfsstat -e -s", you'll see the cache never goes below the
> high water mark, which is 500 by default.) This would delete
> entries of fairly recent requests.

You are right about that, if testing by Nikolay goes reasonably well,
I'll work on that.

> If you are going to replace the global LRU list with ones for
> each hash bucket, then you'll have to compare the time stamps
> on the least recently used entries of all the hash buckets and
> then delete those. If you keep the timestamp of the least recent
> one for that hash bucket in the hash bucket head, you could at least
> use that to select which bucket to delete from next, but you'll still
> need to:
>   - lock that hash bucket
>     - delete a few entries from that bucket's lru list
>   - unlock hash bucket
> - repeat for various buckets until the count is beloew the high
>   water mark

Ah, I think I get it: is the reliance on the high water mark as a
criterion for cache expiry the reason the list is an LRU instead of an
ordinary unordered list?

> Or something like that. I think you'll find it a lot more work that
> one LRU list and one mutex. Remember that mutex isn't held for long.

It could be, but the current state of my code is just groundwork for
the next things I have planned:

1) Move the expiry code (the trim function) into a separate thread,
run periodically (or as a callout, I'll need to talk with someone
about which one is cheaper)

2) Replace the mutex with a rwlock. The only thing which is preventing
me from doing this right away is the LRU list, since each read access
modifies it (and requires a write lock). This is why I was asking you
if we can do away with the LRU algorithm.
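
To illustrate point 2, here is a small userland sketch assuming POSIX rwlocks
and <sys/queue.h>; the names rcache, lookup_unordered and lookup_lru are
hypothetical and not from the patch. It shows why an LRU hit needs the
exclusive lock while a lookup on an unordered list does not:

```c
#define _POSIX_C_SOURCE 200809L	/* exposes pthread_rwlock_t under strict -std */
#include <pthread.h>
#include <stddef.h>
#include <sys/queue.h>

struct rentry {
	TAILQ_ENTRY(rentry) link;
	int	key;
};
TAILQ_HEAD(rlist, rentry);

struct rcache {
	pthread_rwlock_t lock;
	struct rlist	list;	/* for the LRU variant: tail = most recent */
};

static void
rcache_init(struct rcache *c)
{
	pthread_rwlock_init(&c->lock, NULL);
	TAILQ_INIT(&c->list);
}

static void
rcache_insert(struct rcache *c, struct rentry *e)
{
	pthread_rwlock_wrlock(&c->lock);
	TAILQ_INSERT_TAIL(&c->list, e, link);
	pthread_rwlock_unlock(&c->lock);
}

/* Unordered list: a lookup only reads, so a shared lock is enough. */
static struct rentry *
lookup_unordered(struct rcache *c, int key)
{
	struct rentry *e;

	pthread_rwlock_rdlock(&c->lock);
	TAILQ_FOREACH(e, &c->list, link)
		if (e->key == key)
			break;
	pthread_rwlock_unlock(&c->lock);
	return (e);	/* sketch only: no reference is held on the entry */
}

/* LRU list: every hit relinks the entry, so the lock must be exclusive. */
static struct rentry *
lookup_lru(struct rcache *c, int key)
{
	struct rentry *e;

	pthread_rwlock_wrlock(&c->lock);
	TAILQ_FOREACH(e, &c->list, link)
		if (e->key == key)
			break;
	if (e != NULL) {
		TAILQ_REMOVE(&c->list, e, link);
		TAILQ_INSERT_TAIL(&c->list, e, link);
	}
	pthread_rwlock_unlock(&c->lock);
	return (e);
}
```

With lookup_lru, readers serialize exactly as they would on a plain mutex, so
the rwlock buys nothing until the list reordering goes away.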

> Btw, the code looks very nice. (If I was being a style(9) zealot,
> I'd remind you that it likes "return (X);" and not "return X;".

Thanks, I'll make it more style(9) compliant as I go along.

From owner-freebsd-hackers@FreeBSD.ORG  Tue Oct 16 00:45:16 2012
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id 4FE9C6C6;
 Tue, 16 Oct 2012 00:45:16 +0000 (UTC)
 (envelope-from rmacklem@uoguelph.ca)
Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca
 [131.104.91.36])
 by mx1.freebsd.org (Postfix) with ESMTP id DD0A08FC08;
 Tue, 16 Oct 2012 00:45:15 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: Ap8EAMisfFCDaFvO/2dsb2JhbABFhhK7C4IgAQEBAwEBAQEgKyALGw4KAgINGQIpAQkmBggHBAEcBIddBguoWJMQgSGKOBqFEYESA5M/gi2BFY8bgwmBRzQ
X-IronPort-AV: E=Sophos;i="4.80,592,1344225600"; d="scan'208";a="186531105"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
 ([131.104.91.206])
 by esa-annu-pri.mail.uoguelph.ca with ESMTP; 15 Oct 2012 20:45:13 -0400
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
 by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id E3BCEB410E;
 Mon, 15 Oct 2012 20:45:13 -0400 (EDT)
Date: Mon, 15 Oct 2012 20:45:13 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Ivan Voras <ivoras@freebsd.org>
Message-ID: <230083937.2296102.1350348313903.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <CAF-QHFXHq0+Ld-Diu0jVhztwz68m+oH_AhzxNQF8S3QjQJ8uVw@mail.gmail.com>
Subject: Re: NFS server bottlenecks
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [172.17.91.201]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692)
Cc: freebsd-hackers@freebsd.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 16 Oct 2012 00:45:16 -0000

Ivan Voras wrote:
> On 15 October 2012 22:58, Rick Macklem <rmacklem@uoguelph.ca> wrote:
> 
> > The problem is that UDP entries very seldom time out (unless the
> > NFS server isn't seeing hardly any load) and are mostly trimmed
> > because the size exceeds the highwater mark.
> >
> > With your code, it will clear out all of the entries in the first
> > hash buckets that aren't currently busy, until the total count
> > drops below the high water mark. (If you monitor a busy server
> > with "nfsstat -e -s", you'll see the cache never goes below the
> > high water mark, which is 500 by default.) This would delete
> > entries of fairly recent requests.
> 
> You are right about that, if testing by Nikolay goes reasonably well,
> I'll work on that.
> 
> > If you are going to replace the global LRU list with ones for
> > each hash bucket, then you'll have to compare the time stamps
> > on the least recently used entries of all the hash buckets and
> > then delete those. If you keep the timestamp of the least recent
> > one for that hash bucket in the hash bucket head, you could at least
> > use that to select which bucket to delete from next, but you'll
> > still
> > need to:
> >   - lock that hash bucket
> >     - delete a few entries from that bucket's lru list
> >   - unlock hash bucket
> > - repeat for various buckets until the count is beloew the high
> >   water mark
> 
> Ah, I think I get it: is the reliance on the high watermark as a
> criteria for cache expiry the reason the list is a LRU instead of an
> ordinary unordered list?
> 
Yes, I think you've got it ;-)

Have fun with it, rick

> > Or something like that. I think you'll find it a lot more work that
> > one LRU list and one mutex. Remember that mutex isn't held for long.
> 
> It could be, but the current state of my code is just groundwork for
> the next things I have in plan:
> 
> 1) Move the expiry code (the trim function) into a separate thread,
> run periodically (or as a callout, I'll need to talk with someone
> about which one is cheaper)
> 
> 2) Replace the mutex with a rwlock. The only thing which is preventing
> me from doing this right away is the LRU list, since each read access
> modifies it (and requires a write lock). This is why I was asking you
> if we can do away with the LRU algorithm.
> 
> > Btw, the code looks very nice. (If I was being a style(9) zealot,
> > I'd remind you that it likes "return (X);" and not "return X;".
> 
> Thanks, I'll make it more style(9) compliant as I go along.

From owner-freebsd-hackers@FreeBSD.ORG  Tue Oct 16 10:29:48 2012
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id ACAC0A61;
 Tue, 16 Oct 2012 10:29:48 +0000 (UTC)
 (envelope-from wkoszek@freebsd.czest.pl)
Received: from freebsd.czest.pl (freebsd.czest.pl [212.87.224.105])
 by mx1.freebsd.org (Postfix) with ESMTP id 321578FC1B;
 Tue, 16 Oct 2012 10:29:47 +0000 (UTC)
Received: from freebsd.czest.pl (freebsd.czest.pl [212.87.224.105])
 by freebsd.czest.pl (8.14.5/8.14.5) with ESMTP id q9GAJvKR007057;
 Tue, 16 Oct 2012 10:19:57 GMT
 (envelope-from wkoszek@freebsd.czest.pl)
Received: (from wkoszek@localhost)
 by freebsd.czest.pl (8.14.5/8.14.5/Submit) id q9GAJv5B007056;
 Tue, 16 Oct 2012 10:19:57 GMT (envelope-from wkoszek)
Date: Tue, 16 Oct 2012 10:19:57 +0000
From: "Wojciech A. Koszek" <wkoszek@freebsd.org>
To: freebsd-current@freebsd.org, freebsd-stable@freebsd.org,
 freebsd-hackers@freebsd.org
Subject: FreeBSD in Google Code-In 2012?  You can help too!
Message-ID: <20121016101957.GB53800@FreeBSD.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-2
Content-Disposition: inline
User-Agent: Mutt/1.5.21 (2010-09-15)
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.7
 (freebsd.czest.pl [212.87.224.105]); Tue, 16 Oct 2012 10:19:58 +0000 (UTC)
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 16 Oct 2012 10:29:48 -0000

(cross-posted message; please keep discussion on freebsd-hackers@)

Hello,

Last year FreeBSD qualified for the Google Code-In 2011 event, a contest
for the youngest open-source hackers, in the 13-17 year age range:

	http://www.google-melange.com/gci/homepage/google/gci2012

It was successful. We gained one more FreeBSD developer thanks to that
(Isabell Long). We're pondering participating in the contest this year as
well.

For now we only have 25 ideas. We need at least 100.

I felt all members of the FreeBSD community should help, so please submit
your own Google Code-In 2012 ideas here:

	http://www.emailmeform.com/builder/form/4aU93Obxo4NYdVAgb1

Examples of previously completed tasks:

	http://wiki.freebsd.org/GoogleCodeIn/2011Tasks

Those of you who have Wiki access, please spend 2 more minutes and submit
straight to the Wiki:

	http://wiki.freebsd.org/GoogleCodeIn/2012Tasks

I plan to send out another e-mail if there's any progress on this project.

Help will be appreciated.

Thanks,

-- 
Wojciech A. Koszek
wkoszek@FreeBSD.czest.pl
http://FreeBSD.czest.pl/~wkoszek/

From owner-freebsd-hackers@FreeBSD.ORG  Tue Oct 16 17:00:43 2012
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id 1D9E966E;
 Tue, 16 Oct 2012 17:00:43 +0000 (UTC)
 (envelope-from wkoszek@freebsd.czest.pl)
Received: from freebsd.czest.pl (freebsd.czest.pl [212.87.224.105])
 by mx1.freebsd.org (Postfix) with ESMTP id 938848FC16;
 Tue, 16 Oct 2012 17:00:42 +0000 (UTC)
Received: from freebsd.czest.pl (freebsd.czest.pl [212.87.224.105])
 by freebsd.czest.pl (8.14.5/8.14.5) with ESMTP id q9GGopOU009154;
 Tue, 16 Oct 2012 16:50:51 GMT
 (envelope-from wkoszek@freebsd.czest.pl)
Received: (from wkoszek@localhost)
 by freebsd.czest.pl (8.14.5/8.14.5/Submit) id q9GGopps009153;
 Tue, 16 Oct 2012 16:50:51 GMT (envelope-from wkoszek)
Date: Tue, 16 Oct 2012 16:50:51 +0000
From: "Wojciech A. Koszek" <wkoszek@freebsd.org>
To: freebsd-current@freebsd.org, freebsd-stable@freebsd.org,
 freebsd-hackers@freebsd.org
Subject: Re: FreeBSD in Google Code-In 2012?  You can help too!
Message-ID: <20121016165051.GC53800@FreeBSD.org>
References: <20121016101957.GB53800@FreeBSD.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-2
Content-Disposition: inline
In-Reply-To: <20121016101957.GB53800@FreeBSD.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.7
 (freebsd.czest.pl [212.87.224.105]); Tue, 16 Oct 2012 16:50:52 +0000 (UTC)
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 16 Oct 2012 17:00:43 -0000

On Tue, Oct 16, 2012 at 10:19:57AM +0000, Wojciech A. Koszek wrote:
> (cross-posted message; please keep discussion on freebsd-hackers@)
> 
> Hello,
> 
> Last year FreeBSD qualified for Google Code-In 2011 event--contest for
> youngest open-source hackers in 13-17yr age range:
> 
> 	http://www.google-melange.com/gci/homepage/google/gci2012
> 
> It was successful. We gained one more FreeBSD developer thanks to that
> (Isabell Long) We're pondering participating in the contest this year as
> well.
> 
> For now we only have 25 ideas. We need at least 100.
> 
> I felt all members of the FreeBSD community should help, so please submit
> your own Google Code-In 2012 ideas here:
> 
> 	http://www.emailmeform.com/builder/form/4aU93Obxo4NYdVAgb1
> 
> Examples of previously completed tasks:
> 
> 	http://wiki.freebsd.org/GoogleCodeIn/2011Tasks
> 
> Those of you who have Wiki access, please spent 2 more minutes and submit
> straight to Wiki:
> 
> 	http://wiki.freebsd.org/GoogleCodeIn/2012Tasks
> 
> I plan to send out next e-mail if there's any progress on this project.
> 
> Help will be appreciated.

Hi,

(cross-posted message; please keep discussion on freebsd-hackers@)

I made a mistake -- the web form didn't have a "Contributor's name" field,
so I don't know which of you contributed the first 9 ideas; please e-mail me
which ideas are yours, so that your name can be mentioned on the Wiki:

	http://wiki.freebsd.org/GoogleCodeIn/2012Tasks

I made slight adjustments to the form to make some fields more precise:

	http://www.emailmeform.com/builder/form/4aU93Obxo4NYdVAgb1

Sorry and thanks,

-- 
Wojciech A. Koszek
wkoszek@FreeBSD.czest.pl
http://FreeBSD.czest.pl/~wkoszek/

From owner-freebsd-hackers@FreeBSD.ORG  Wed Oct 17 11:21:02 2012
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id F1BF092A;
 Wed, 17 Oct 2012 11:21:01 +0000 (UTC) (envelope-from avg@FreeBSD.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
 by mx1.freebsd.org (Postfix) with ESMTP id 122A08FC0C;
 Wed, 17 Oct 2012 11:21:00 +0000 (UTC)
Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua
 [212.40.38.100])
 by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id OAA04290;
 Wed, 17 Oct 2012 14:20:58 +0300 (EEST)
 (envelope-from avg@FreeBSD.org)
Received: from localhost ([127.0.0.1])
 by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
 id 1TORgI-000Pc7-Do; Wed, 17 Oct 2012 14:20:58 +0300
Message-ID: <507E9498.10905@FreeBSD.org>
Date: Wed, 17 Oct 2012 14:20:56 +0300
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:16.0) Gecko/20121013 Thunderbird/16.0.1
MIME-Version: 1.0
To: freebsd-hackers <freebsd-hackers@FreeBSD.org>
Subject: _mtx_lock_spin: obsolete historic handling of kdb_active and panicstr?
X-Enigmail-Version: 1.4.5
Content-Type: text/plain; charset=X-VIET-VPS
Content-Transfer-Encoding: 7bit
Cc: Bruce Evans <brde@optusnet.com.au>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 17 Oct 2012 11:21:02 -0000


_mtx_lock_spin has the following check in its retry loop:
if (i < 60000000 || kdb_active || panicstr != NULL)
        DELAY(1);
else
        _mtx_lock_spin_failed(m);

Which means that in the (!kdb_active && panicstr == NULL) case we will make at
most 60000000 iterations and then call _mtx_lock_spin_failed (which proceeds to
panic).  When either kdb_active or panicstr is set, then we are going to loop
forever.

I've done some digging through the lengthy history and many evolutions of the
code (unfortunately I didn't keep records during the research), and my
conclusion is that the kdb_active and panicstr checks were added in the fairly
early era of FreeBSD SMP, when we didn't have a mechanism to stop/block other
CPUs when kdb or panic was entered.  We didn't even prevent parallel execution
of panic.
So the above code was a sort of defense where we hoped that "other" CPUs would
eventually stumble upon some held spinlock and would be captured there.  Maybe
there was a specific set of spinlocks that were supposed to help.

Nowadays, we do try to stop other CPUs during panic and kdb activation, and
there is a good chance that they are indeed stopped.  In these circumstances,
should the main CPU be so unlucky as to run into a held spinlock, the above
check would do more harm than good - the main CPU would just spin there forever,
because the lock owner is also spinning in the stop loop and so can't release
the lock.
Actually, this is only true for the kdb case.  In the panic case we make a check
earlier and simply ignore/skip/bust all the locks.  That makes the panicstr
check in the code in question both harmless and useless.

So I'd like to propose removing those checks altogether.  Or perhaps
"reversing" them and immediately inducing a (possibly secondary) panic if we
ever get to that wait loop with kdb_active || panicstr != NULL.
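
To make the options concrete, here is a small userland model of just the
decision logic. It is a sketch, not kernel code: the enum and function names
are hypothetical, SPIN_FAIL stands in for calling _mtx_lock_spin_failed(), and
kdb_active and panicstr are passed as parameters instead of being globals:

```c
#include <stdbool.h>
#include <stddef.h>

#define SPIN_LIMIT	60000000L	/* iteration limit from the quoted check */

enum spin_action {
	SPIN_DELAY,	/* models DELAY(1) and another retry */
	SPIN_FAIL	/* models _mtx_lock_spin_failed(m), i.e. panic */
};

/* The current check: never give up while kdb or a panic is in progress. */
static enum spin_action
spin_check_current(long i, bool kdb_active, const char *panicstr)
{
	if (i < SPIN_LIMIT || kdb_active || panicstr != NULL)
		return (SPIN_DELAY);
	return (SPIN_FAIL);
}

/* First proposal: drop both checks, only the iteration limit remains. */
static enum spin_action
spin_check_simple(long i)
{
	return (i < SPIN_LIMIT ? SPIN_DELAY : SPIN_FAIL);
}

/* Second proposal: "reverse" the checks and fail immediately. */
static enum spin_action
spin_check_reversed(long i, bool kdb_active, const char *panicstr)
{
	if (kdb_active || panicstr != NULL)
		return (SPIN_FAIL);
	return (i < SPIN_LIMIT ? SPIN_DELAY : SPIN_FAIL);
}
```

The only inputs on which the three variants disagree are those with kdb_active
set or panicstr non-NULL, which is exactly the case under discussion.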

What do you think?
-- 
Andriy Gapon

From owner-freebsd-hackers@FreeBSD.ORG  Wed Oct 17 12:07:02 2012
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id DCC384CF;
 Wed, 17 Oct 2012 12:07:02 +0000 (UTC)
 (envelope-from mdf356@gmail.com)
Received: from mail-pb0-f54.google.com (mail-pb0-f54.google.com
 [209.85.160.54])
 by mx1.freebsd.org (Postfix) with ESMTP id A69C48FC0C;
 Wed, 17 Oct 2012 12:07:02 +0000 (UTC)
Received: by mail-pb0-f54.google.com with SMTP id rp8so7642436pbb.13
 for <multiple recipients>; Wed, 17 Oct 2012 05:07:01 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:sender:in-reply-to:references:date
 :x-google-sender-auth:message-id:subject:from:to:cc:content-type;
 bh=r/OnoF1seo7XedSc2dDPTWZG6lgDn9TtYN3JbG9Sg9w=;
 b=VEaD5QxkQcBwLvBPgOCSm54ayNtAE1CFeGEQJcROoRExJdrGnsJlpzD6uxhpGKCZk2
 i4RdX/e6yChyyIouSrJn3/7ltzq5LkxelM7Q0mNEXwVAbQtROwj0/bMMVTfDD8+gi6By
 eZ5O8vSlZ4RbHivYRKciQ5H2OpzkxxzOBNeuBvawdk3reOWwOigERVobiCkC0pP3g6Y/
 a8CQNfgBKuLhhfKM09FD2NH07qcY8Xl9R57VLx1AqNfotWsAD/tvPMEgYBtSq8EuprF/
 C+xOHNrdx2su6E54UUxwKBRIxzvTXR5RQANc2wifuh4EVwNT13nKff877aEIkqzIa6az
 GAYA==
MIME-Version: 1.0
Received: by 10.68.189.65 with SMTP id gg1mr55471329pbc.106.1350475621804;
 Wed, 17 Oct 2012 05:07:01 -0700 (PDT)
Sender: mdf356@gmail.com
Received: by 10.68.223.105 with HTTP; Wed, 17 Oct 2012 05:07:01 -0700 (PDT)
In-Reply-To: <507E9498.10905@FreeBSD.org>
References: <507E9498.10905@FreeBSD.org>
Date: Wed, 17 Oct 2012 05:07:01 -0700
X-Google-Sender-Auth: OcNcUhB4GUhqKdPTqH-GQKIBKKg
Message-ID: <CAMBSHm_P0Af3CemFo0X-_HgJNdndKDRD9Fav1yQwh=8T35rWdg@mail.gmail.com>
Subject: Re: _mtx_lock_spin: obsolete historic handling of kdb_active and
 panicstr?
From: mdf@FreeBSD.org
To: Andriy Gapon <avg@freebsd.org>
Content-Type: text/plain; charset=ISO-8859-1
Cc: freebsd-hackers <freebsd-hackers@freebsd.org>,
 Bruce Evans <brde@optusnet.com.au>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 17 Oct 2012 12:07:03 -0000

On Wed, Oct 17, 2012 at 4:20 AM, Andriy Gapon <avg@freebsd.org> wrote:
>
> _mtx_lock_spin has the following check in its retry loop:
> if (i < 60000000 || kdb_active || panicstr != NULL)
>         DELAY(1);
> else
>         _mtx_lock_spin_failed(m);
>
[snip analysis]
>
> So I'd like to propose to remove those checks altogether.  Or perhaps to
> "reverse" them and immediately induce a (possibly secondary) panic if we ever
> get to that wait loop and kdb_active || panicstr != NULL.

The panicstr check can clearly be removed.  I think there can be race
conditions between entering kdb and taking a spinlock, because the
spinlock acquire will block interrupts.  I don't remember if we always
NMI for kdb enter or if that's configurable.  The old code was clearer
(or maybe I'm just remembering an Isilon hack); looking at
stop_cpus_hard() I don't see that it uses an NMI.  So a CPU can block
interrupts, then if it sees kdb_active it will spin until we leave
kdb, rather than panic.  Of course this would only be relevant if the
lock it's trying to acquire is already held; otherwise it should find
the lock unowned and none of this applies.  And if the lock is owned by
the thread entering kdb, that would be a real panic, not a recoverable
kdb entry.

So I think maybe the kdb_active check is also not helpful after all.
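
For reference, with both extra checks dropped the retry loop would reduce to
a plain bounded spin.  Below is a rough userland model of that shape, not the
kernel code: spin_model_lock(), spin_model_unlock() and SPIN_LIMIT are names
made up for illustration, a C11 atomic_flag stands in for the mutex word, and
the comments mark where the kernel would call DELAY(1) and
_mtx_lock_spin_failed(m).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SPIN_LIMIT 60000000L	/* same bound as in the quoted loop */

/*
 * Hypothetical userland model of the retry loop once the kdb_active and
 * panicstr checks are gone.  Returns true when the lock was taken and
 * false when the retry limit was reached, which is where the kernel
 * would call _mtx_lock_spin_failed(m) and panic.
 */
bool
spin_model_lock(atomic_flag *lock, long limit)
{
	long i;

	assert(limit >= 0);
	/* test-and-set returns the previous value: true means "already held" */
	for (i = 0; atomic_flag_test_and_set_explicit(lock,
	    memory_order_acquire); i++) {
		if (i >= limit)
			return (false);	/* kernel: _mtx_lock_spin_failed(m) */
		/* kernel: DELAY(1) */
	}
	return (true);
}

void
spin_model_unlock(atomic_flag *lock)
{
	atomic_flag_clear_explicit(lock, memory_order_release);
}
```

A caller would pass SPIN_LIMIT as the limit.  The point of the model is only
that the spin always terminates, whether or not a debugger or panic is in
progress.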

Cheers,
matthew

From owner-freebsd-hackers@FreeBSD.ORG  Wed Oct 17 14:27:59 2012
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id 46515B04;
 Wed, 17 Oct 2012 14:27:59 +0000 (UTC) (envelope-from avg@FreeBSD.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
 by mx1.freebsd.org (Postfix) with ESMTP id 5AC578FC08;
 Wed, 17 Oct 2012 14:27:57 +0000 (UTC)
Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua
 [212.40.38.101])
 by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id RAA06640;
 Wed, 17 Oct 2012 17:25:58 +0300 (EEST)
 (envelope-from avg@FreeBSD.org)
Message-ID: <507EBFF6.5080904@FreeBSD.org>
Date: Wed, 17 Oct 2012 17:25:58 +0300
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:16.0) Gecko/20121012 Thunderbird/16.0.1
MIME-Version: 1.0
To: mdf@FreeBSD.org
Subject: Re: _mtx_lock_spin: obsolete historic handling of kdb_active and
 panicstr?
References: <507E9498.10905@FreeBSD.org>
 <CAMBSHm_P0Af3CemFo0X-_HgJNdndKDRD9Fav1yQwh=8T35rWdg@mail.gmail.com>
In-Reply-To: <CAMBSHm_P0Af3CemFo0X-_HgJNdndKDRD9Fav1yQwh=8T35rWdg@mail.gmail.com>
X-Enigmail-Version: 1.4.5
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Cc: freebsd-hackers <freebsd-hackers@FreeBSD.org>,
 Bruce Evans <brde@optusnet.com.au>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 17 Oct 2012 14:27:59 -0000

on 17/10/2012 15:07 mdf@FreeBSD.org said the following:
> On Wed, Oct 17, 2012 at 4:20 AM, Andriy Gapon <avg@freebsd.org> wrote:
>>
>> _mtx_lock_spin has the following check in its retry loop:
>> if (i < 60000000 || kdb_active || panicstr != NULL)
>>         DELAY(1);
>> else
>>         _mtx_lock_spin_failed(m);
>>
> [snip analysis]
>>
>> So I'd like to propose to remove those checks altogether.  Or perhaps to
>> "reverse" them and immediately induce a (possibly secondary) panic if we ever
>> get to that wait loop and kdb_active || panicstr != NULL.
> 
> The panicstr can clearly be removed.  I think there can be race
> conditions with entering kdb and taking a spinlock, because the
> spinlock acquire will block interrupts.  I don't remember if we always
> NMI for kdb enter or if that's configurable.  The old code was clearer
> (or maybe I'm just remembering an Isilon hack); looking at
> stop_cpus_hard() I don't see that it uses an NMI.

kdb always uses stop_cpus_hard and stop_cpus_hard always uses NMI on x86.
From sys/x86/x86/local_apic.c:
if (vector == IPI_STOP_HARD)
        icrlo |= APIC_DELMODE_NMI | APIC_LEVEL_ASSERT;

>  So a CPU can block
> interrupts, then if it sees kdb_active it will spin until we leave
> kdb, rather than panic.  Of course this would only be relevant if the
> lock it's trying to acquire is already held; otherwise it should find
> the lock unowned and this isn't relevant.  And if the lock is owned by
> the thread entering kdb, that would be a real panic, not a recoverable
> kdb entry.
> 
> So I think maybe the kdb_active check is also not helpful after all.
> 
> Cheers,
> matthew
> 


-- 
Andriy Gapon

From owner-freebsd-hackers@FreeBSD.ORG  Wed Oct 17 18:46:11 2012
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id 8ED34BD;
 Wed, 17 Oct 2012 18:46:11 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net
 [IPv6:2001:470:1f10:75::2])
 by mx1.freebsd.org (Postfix) with ESMTP id 60E9F8FC12;
 Wed, 17 Oct 2012 18:46:11 +0000 (UTC)
Received: from jhbbsd.localnet (unknown [209.249.190.124])
 by bigwig.baldwin.cx (Postfix) with ESMTPSA id BB801B94F;
 Wed, 17 Oct 2012 14:46:10 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: Andriy Gapon <avg@freebsd.org>
Subject: Re: _mtx_lock_spin: obsolete historic handling of kdb_active and
 panicstr?
Date: Wed, 17 Oct 2012 10:12:05 -0400
User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p20; KDE/4.5.5; amd64; ; )
References: <507E9498.10905@FreeBSD.org>
In-Reply-To: <507E9498.10905@FreeBSD.org>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <201210171012.05392.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7
 (bigwig.baldwin.cx); Wed, 17 Oct 2012 14:46:10 -0400 (EDT)
Cc: freebsd-hackers <freebsd-hackers@freebsd.org>,
 Bruce Evans <brde@optusnet.com.au>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 17 Oct 2012 18:46:11 -0000

On Wednesday, October 17, 2012 7:20:56 am Andriy Gapon wrote:
> 
> _mtx_lock_spin has the following check in its retry loop:
> if (i < 60000000 || kdb_active || panicstr != NULL)
>         DELAY(1);
> else
>         _mtx_lock_spin_failed(m);
> 
> Which means that in the (!kdb_active && panicstr == NULL) case we will make at
> most 60000000 iterations and then call _mtx_lock_spin_failed (which proceeds to
> panic).  When either kdb_active or panicstr is set, then we are going to loop
> forever.
> 
> I've done some digging through the lengthy history and many evolutions of the
> code (unfortunately I haven't kept records during the research), and my
> conclusion is that the kdb_active and panicstr checks were added at the quite
> early era of FreeBSD SMP, where we didn't have a mechanism to stop/block other
> CPUs when kdb or panic were entered.  We didn't even prevent parallel execution
> of panic.
> So the above code was a sort of defense where we hoped that "other" CPUs would
> eventually stumble upon some held spinlock and would be captured there.  Maybe
> there was a specific set of spinlocks, which were supposed to help.

It wasn't so much a way of hoping CPUs would stop as a way to prevent other
CPUs from panicking while another CPU had already panicked or was already in
DDB, which would make debugging harder.

> Nowadays, we do try to stop other CPUs during panic and kdb activation and there
> are good chances that they are indeed stopped.  In this circumstances, should
> the main CPU be so unlucky as to run into the held spinlock, the above check
> would do more harm than good - the main CPU would just spin there forever,
> because a lock owner is also spinning in the stop loop and so can't release the
> lock.
> Actually, this is only true for the kdb case.  In the panic case we make a check
> earlier and simply ignore/skip/bust all the locks.  That makes the panicstr
> check in the code in question both harmless and useless.
> 
> So I'd like to propose to remove those checks altogether.  Or perhaps to
> "reverse" them and immediately induce a (possibly secondary) panic if we ever
> get to that wait loop and kdb_active || panicstr != NULL.
> 
> What do you think?

I think this sounds fine.  I do think there are two behaviors to consider,
though.  If for some reason you are not able to stop the other CPUs, you would
rather have them spin than trigger another panic while you are in DDB or
writing out a crashdump.  However, the CPU that is currently in the debugger or
writing out a crashdump should probably bust all locks.  (Code executed in
debugger backends should generally avoid locking altogether, and if it must use
locking it should depend on things like try locks, where it fails gracefully.
That would make the kdb_active case here irrelevant, and the panic case is
already handled as you noted.)
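
To illustrate the try-lock pattern, here is a minimal userland sketch.  It is
an analogy only: backend_lock and backend_op() are hypothetical names, and a
C11 atomic_flag stands in for whatever lock a real debugger backend would
share with the rest of the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a lock shared with a debugger backend. */
static atomic_flag backend_lock = ATOMIC_FLAG_INIT;

/*
 * Try the lock exactly once.  If somebody else holds it (possibly a CPU
 * that has been stopped and can never release it), report EBUSY instead
 * of spinning.
 */
int
backend_op(int *counter)
{
	assert(counter != NULL);
	if (atomic_flag_test_and_set_explicit(&backend_lock,
	    memory_order_acquire))
		return (EBUSY);		/* lock held: fail gracefully */
	(*counter)++;			/* the protected work */
	atomic_flag_clear_explicit(&backend_lock, memory_order_release);
	return (0);
}
```

The caller decides what to do on EBUSY (skip the operation, print a notice),
so the debugger itself never blocks on the lock.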

-- 
John Baldwin

From owner-freebsd-hackers@FreeBSD.ORG  Thu Oct 18 00:09:29 2012
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id D11F13EE
 for <freebsd-hackers@freebsd.org>; Thu, 18 Oct 2012 00:09:29 +0000 (UTC)
 (envelope-from tris_vern@hotmail.com)
Received: from snt0-omc2-s14.snt0.hotmail.com (snt0-omc2-s14.snt0.hotmail.com
 [65.55.90.89]) by mx1.freebsd.org (Postfix) with ESMTP id 9EFCA8FC08
 for <freebsd-hackers@freebsd.org>; Thu, 18 Oct 2012 00:09:29 +0000 (UTC)
Received: from SNT124-W20 ([65.55.90.73]) by snt0-omc2-s14.snt0.hotmail.com
 with Microsoft SMTPSVC(6.0.3790.4675); 
 Wed, 17 Oct 2012 17:08:23 -0700
Message-ID: <SNT124-W20F26CF7B468F7F09B9B4983760@phx.gbl>
X-Originating-IP: [165.228.7.150]
From: Tristan Verniquet <tris_vern@hotmail.com>
To: <freebsd-hackers@freebsd.org>
Subject: syncing large mmaped files
Date: Thu, 18 Oct 2012 10:08:22 +1000
Importance: Normal
MIME-Version: 1.0
X-OriginalArrivalTime: 18 Oct 2012 00:08:23.0629 (UTC)
 FILETIME=[B0606FD0:01CDACC4]
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.14
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 18 Oct 2012 00:09:29 -0000


I want to work with large (1-10G) files in memory but eventually sync them
back out to disk. The problem is that the sync process appears to lock the
file in the kernel for the duration of the sync, which can run into minutes.
This prevents other processes from reading from the file (unless they
already have it mapped) for this whole time. Is there any way to prevent
this? I think I read in a post somewhere about OpenBSD implementing partial
writes when it hits a file with lots of dirty pages in order to prevent
this. Is there anything available for FreeBSD, or is there another way
around it?

Sorry if this is the wrong mailing list.

From owner-freebsd-hackers@FreeBSD.ORG  Thu Oct 18 07:55:55 2012
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id 50A01930
 for <freebsd-hackers@freebsd.org>; Thu, 18 Oct 2012 07:55:55 +0000 (UTC)
 (envelope-from ndenev@gmail.com)
Received: from mail-wi0-f170.google.com (mail-wi0-f170.google.com
 [209.85.212.170])
 by mx1.freebsd.org (Postfix) with ESMTP id CED778FC0A
 for <freebsd-hackers@freebsd.org>; Thu, 18 Oct 2012 07:55:54 +0000 (UTC)
Received: by mail-wi0-f170.google.com with SMTP id hm2so1503703wib.1
 for <freebsd-hackers@freebsd.org>; Thu, 18 Oct 2012 00:55:47 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=subject:mime-version:content-type:from:in-reply-to:date:cc
 :content-transfer-encoding:message-id:references:to:x-mailer;
 bh=nvTwhd1fvO7rapkIRWslItyX6YLp02h2Recf23Z4aHg=;
 b=u7vsaNSI9XVPWpzgGuAj5tU7Ve5nAOkOQ6v5Wp2TqwEv6t5bKLWouKWvyWFpY6+dmr
 LIev+PsUqc+EvDam7fVG2xhWZHcORPkMq0wB/+ASGVFdkYuoCTd63ddRgfzBJ4YtBHlU
 GKHxzKbk2g9ng6HxscabaBImRGGJrqq6voXQmSNaDasFJnyRh6xikGt0PGS8znOvCzK5
 3xYpoR4dBid5pj1BM10P8u3s8mFfJB2oNnAXTHnYstJ78aYvkjbnzCRtDYu3bG6dnWfL
 rBRVhCGcUulT/6TN8JbyGu+TbZmlIg1t+6smnW8ysXgPY9CAsHDOG3QFm6DGXTkorHYD
 97cQ==
Received: by 10.216.217.194 with SMTP id i44mr12368145wep.60.1350546947720;
 Thu, 18 Oct 2012 00:55:47 -0700 (PDT)
Received: from ndenevsa.sf.moneybookers.net (g1.moneybookers.com.
 [217.18.249.148])
 by mx.google.com with ESMTPS id j8sm28361581wiy.9.2012.10.18.00.55.45
 (version=TLSv1/SSLv3 cipher=OTHER);
 Thu, 18 Oct 2012 00:55:46 -0700 (PDT)
Subject: Re: syncing large mmaped files
Mime-Version: 1.0 (Mac OS X Mail 6.1 \(1498\))
Content-Type: text/plain; charset=us-ascii
From: Nikolay Denev <ndenev@gmail.com>
In-Reply-To: <SNT124-W20F26CF7B468F7F09B9B4983760@phx.gbl>
Date: Thu, 18 Oct 2012 10:55:46 +0300
Content-Transfer-Encoding: quoted-printable
Message-Id: <B7855BF6-B717-4D34-AE5D-760FFA7462A5@gmail.com>
References: <SNT124-W20F26CF7B468F7F09B9B4983760@phx.gbl>
To: Tristan Verniquet <tris_vern@hotmail.com>
X-Mailer: Apple Mail (2.1498)
Cc: freebsd-hackers@freebsd.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 18 Oct 2012 07:55:55 -0000


On Oct 18, 2012, at 3:08 AM, Tristan Verniquet <tris_vern@hotmail.com> wrote:

>
> I want to work with large (1-10G) files in memory but eventually sync
> them back out to disk. The problem is that the sync process appears to
> lock the file in kernel for the duration of the sync, which can run into
> minutes. This prevents other processes from reading from the file
> (unless they already have it mapped) for this whole time. Is there any
> way to prevent this? I think I read in a post somewhere about openbsd
> implementing partial-writes when it hits a file with lots of dirty pages
> in order to prevent this. Is there anything available for FreeBSD or is
> there another way around it?
>
> Sorry if this is the wrong mailing list.

Isn't msync(2) what you are looking for?

From owner-freebsd-hackers@FreeBSD.ORG  Thu Oct 18 08:35:45 2012
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id D123940F
 for <freebsd-hackers@freebsd.org>; Thu, 18 Oct 2012 08:35:45 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200])
 by mx1.freebsd.org (Postfix) with ESMTP id 4AEC28FC08
 for <freebsd-hackers@freebsd.org>; Thu, 18 Oct 2012 08:35:44 +0000 (UTC)
Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1])
 by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q9I8Zn6c015975;
 Thu, 18 Oct 2012 11:35:49 +0300 (EEST)
 (envelope-from kostikbel@gmail.com)
Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1])
 by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id q9I8Zbbc004754;
 Thu, 18 Oct 2012 11:35:37 +0300 (EEST)
 (envelope-from kostikbel@gmail.com)
Received: (from kostik@localhost)
 by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q9I8Zbul004753;
 Thu, 18 Oct 2012 11:35:37 +0300 (EEST)
 (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to
 kostikbel@gmail.com using -f
Date: Thu, 18 Oct 2012 11:35:37 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: Tristan Verniquet <tris_vern@hotmail.com>
Subject: Re: syncing large mmaped files
Message-ID: <20121018083537.GQ35915@deviant.kiev.zoral.com.ua>
References: <SNT124-W20F26CF7B468F7F09B9B4983760@phx.gbl>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
 protocol="application/pgp-signature"; boundary="XA7quakUSnawneuz"
Content-Disposition: inline
In-Reply-To: <SNT124-W20F26CF7B468F7F09B9B4983760@phx.gbl>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua
X-Virus-Status: Clean
X-Spam-Status: No, score=-4.0 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00
 autolearn=ham version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
 skuns.kiev.zoral.com.ua
Cc: freebsd-hackers@freebsd.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 18 Oct 2012 08:35:45 -0000


--XA7quakUSnawneuz
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Thu, Oct 18, 2012 at 10:08:22AM +1000, Tristan Verniquet wrote:
>
> I want to work with large (1-10G) files in memory but eventually sync
> them back out to disk. The problem is that the sync process appears to
> lock the file in kernel for the duration of the sync, which can run
> into minutes. This prevents other processes from reading from the file
> (unless they already have it mapped) for this whole time. Is there
> any way to prevent this? I think I read in a post somewhere about
> openbsd implementing partial-writes when it hits a file with lots of
> dirty pages in order to prevent this. Is there anything available for
> FreeBSD or is there another way around it?
>
No, currently the vnode lock is held exclusive for the whole duration
of the msync(2) syscall or its analog from the syncer.

Changing vm_object_page_clean() to periodically drop the vnode lock
might be possible, but it would need benchmarking to make sure that we
do not pessimize the common case. It also opens up the possibility of
the vnode being reclaimed in the meantime.

Anyway, note that you cannot 'work with large files in memory', even if
you have enough RAM and there is no memory pressure, so that all the
file pages could stay resident. The syncer does a writeback periodically,
at an interval of approximately 30 seconds, regardless of whether the
application calls msync(2).

--XA7quakUSnawneuz
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (FreeBSD)

iEYEARECAAYFAlB/v1kACgkQC3+MBN1Mb4joagCgj1oYiDMQjM9s9kK7HniP4JiL
RVEAn1294Rq3lIUMnPdt2G2ue1z3Jppa
=Z1TH
-----END PGP SIGNATURE-----

--XA7quakUSnawneuz--

From owner-freebsd-hackers@FreeBSD.ORG  Thu Oct 18 13:47:36 2012
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id 5AFDD7B5
 for <freebsd-hackers@freebsd.org>; Thu, 18 Oct 2012 13:47:36 +0000 (UTC)
 (envelope-from jhb@freebsd.org)
Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net
 [IPv6:2001:470:1f10:75::2])
 by mx1.freebsd.org (Postfix) with ESMTP id 2CA8E8FC17
 for <freebsd-hackers@freebsd.org>; Thu, 18 Oct 2012 13:47:36 +0000 (UTC)
Received: from jhbbsd.localnet (unknown [209.249.190.124])
 by bigwig.baldwin.cx (Postfix) with ESMTPSA id 7F97AB98E;
 Thu, 18 Oct 2012 09:47:35 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: freebsd-hackers@freebsd.org
Subject: Re: syncing large mmaped files
Date: Thu, 18 Oct 2012 09:39:34 -0400
User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p20; KDE/4.5.5; amd64; ; )
References: <SNT124-W20F26CF7B468F7F09B9B4983760@phx.gbl>
 <20121018083537.GQ35915@deviant.kiev.zoral.com.ua>
In-Reply-To: <20121018083537.GQ35915@deviant.kiev.zoral.com.ua>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-15"
Content-Transfer-Encoding: 7bit
Message-Id: <201210180939.34861.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7
 (bigwig.baldwin.cx); Thu, 18 Oct 2012 09:47:35 -0400 (EDT)
Cc: Konstantin Belousov <kostikbel@gmail.com>,
 Tristan Verniquet <tris_vern@hotmail.com>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 18 Oct 2012 13:47:36 -0000

On Thursday, October 18, 2012 4:35:37 am Konstantin Belousov wrote:
> On Thu, Oct 18, 2012 at 10:08:22AM +1000, Tristan Verniquet wrote:
> > 
> > I want to work with large (1-10G) files in memory but eventually sync
> > them back out to disk. The problem is that the sync process appears to
> > lock the file in kernel for the duration of the sync, which can run
> > into minutes. This prevents other processes from reading from the file
> > (unless they already have it mapped) for this whole time. Is there
> > any way to prevent this? I think I read in a post somewhere about
> > openbsd implementing partial-writes when it hits a file with lots of
> > dirty pages in order to prevent this. Is there anything available for
> > FreeBSD or is there another way around it?
> >
> No, currently the vnode lock is held exclusive for the whole duration
> of the msync(2) syscall or its analog from the syncer.
> 
> Making a change to periodically drop the vnode lock in
> vm_object_page_clean() might be possible, but requires the benchmarking
> to make sure that we do not pessimize the common case. Also, this opens
> a possibility for the vnode reclamation meantime.

You can simulate this in userland by breaking up your msync() into multiple
msync() calls where each call just syncs a portion of the file.

> Anyway, note that you cannot 'work with large files in memory', even if
> you have enough RAM and no pressure to hold all the file pages resident.
> The syncer will do a writeback periodically regardless of the application
> calling msync(2) or not, with the interval of approximately 30 seconds.

You can mmap with MAP_NOSYNC to prevent the syncer from writing the file out
every 30 seconds.

-- 
John Baldwin

From owner-freebsd-hackers@FreeBSD.ORG  Thu Oct 18 15:11:58 2012
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id B5C3272C;
 Thu, 18 Oct 2012 15:11:58 +0000 (UTC)
 (envelope-from ndenev@gmail.com)
Received: from mail-wi0-f170.google.com (mail-wi0-f170.google.com
 [209.85.212.170])
 by mx1.freebsd.org (Postfix) with ESMTP id 166468FC1B;
 Thu, 18 Oct 2012 15:11:56 +0000 (UTC)
Received: by mail-wi0-f170.google.com with SMTP id hm2so1945909wib.1
 for <multiple recipients>; Thu, 18 Oct 2012 08:11:55 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=subject:mime-version:content-type:from:in-reply-to:date:cc
 :content-transfer-encoding:message-id:references:to:x-mailer;
 bh=9EAGJ3tBEIhFVcRVS8g2aCLO95xdlOnsmAkaMcXLB7o=;
 b=ACHnkuttJboOB8/p1FV9vKYYayRuAfHVQxJFcZ0pdrZwgYqjmFa07Nqx/YWOCyz/gM
 SNGAYxviLoWBobbThjqQo1VaW2I1oCR2XJk9jaLu4xrObYBmjagMpL+NcevSRbJBU8uR
 Ry2smtbX1Xg/O6C+ShAIvQyQ4b/ftXHSZEhpd0kOV5Txhj+hY5SoXgbj1W396FOYti4h
 TbBqWMuOkqlcZrNN9Fpt15lbRqvHHZxZytm38jS6T+vSskWHuG6b+oWaOCs+XTozgP+I
 pXAbJQk+maT76grejsFWUDeorEuXJKir3RSdWwNJTb7OOKW1kBiQRaDc04Pt81KAqtyE
 9r2Q==
Received: by 10.180.85.99 with SMTP id g3mr12044511wiz.5.1350573115711;
 Thu, 18 Oct 2012 08:11:55 -0700 (PDT)
Received: from ndenevsa.sf.moneybookers.net (g1.moneybookers.com.
 [217.18.249.148])
 by mx.google.com with ESMTPS id ay10sm34034836wib.2.2012.10.18.08.11.51
 (version=TLSv1/SSLv3 cipher=OTHER);
 Thu, 18 Oct 2012 08:11:52 -0700 (PDT)
Subject: Re: NFS server bottlenecks
Mime-Version: 1.0 (Mac OS X Mail 6.1 \(1498\))
Content-Type: text/plain; charset=us-ascii
From: Nikolay Denev <ndenev@gmail.com>
In-Reply-To: <CAF-QHFUU0hhtRNK1_p9zks2w+e22bfWOtv+XaqgFqTiURcJBbQ@mail.gmail.com>
Date: Thu, 18 Oct 2012 18:11:51 +0300
Content-Transfer-Encoding: quoted-printable
Message-Id: <6DAAB1E6-4AC7-4B08-8CAD-0D8584D039DE@gmail.com>
References: <937460294.2185822.1350093954059.JavaMail.root@erie.cs.uoguelph.ca>
 <302BF685-4B9D-49C8-8000-8D0F6540C8F7@gmail.com> <k5gtdh$nc0$1@ger.gmane.org>
 <0857D79A-6276-433F-9603-D52125CF190F@gmail.com>
 <CAF-QHFUU0hhtRNK1_p9zks2w+e22bfWOtv+XaqgFqTiURcJBbQ@mail.gmail.com>
To: Ivan Voras <ivoras@freebsd.org>
X-Mailer: Apple Mail (2.1498)
Cc: freebsd-hackers@freebsd.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 18 Oct 2012 15:11:58 -0000


On Oct 15, 2012, at 5:34 PM, Ivan Voras <ivoras@freebsd.org> wrote:

> On 15 October 2012 16:31, Nikolay Denev <ndenev@gmail.com> wrote:
>>
>> On Oct 15, 2012, at 2:52 PM, Ivan Voras <ivoras@freebsd.org> wrote:
>
>>> http://people.freebsd.org/~ivoras/diffs/nfscache_lock.patch
>>>
>>> It should apply to HEAD without Rick's patches.
>>>
>>> It's a bit different approach than Rick's, breaking down locks even more.
>>
>> Applied and compiled OK, I will be able to test it tomorrow.
>
> Ok, thanks!
>
> The differences should be most visible in edge cases with a larger
> number of nfsd processes (16+) and many CPU cores.

I'm now rebooting with your patch, and hopefully will have some results
tomorrow.


From owner-freebsd-hackers@FreeBSD.ORG  Thu Oct 18 15:17:11 2012
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id E9EA0A54;
 Thu, 18 Oct 2012 15:17:11 +0000 (UTC)
 (envelope-from tris_vern@hotmail.com)
Received: from snt0-omc3-s36.snt0.hotmail.com (snt0-omc3-s36.snt0.hotmail.com
 [65.55.90.175])
 by mx1.freebsd.org (Postfix) with ESMTP id B638B8FC08;
 Thu, 18 Oct 2012 15:17:11 +0000 (UTC)
Received: from SNT124-W29 ([65.55.90.135]) by snt0-omc3-s36.snt0.hotmail.com
 with Microsoft SMTPSVC(6.0.3790.4675); 
 Thu, 18 Oct 2012 08:16:04 -0700
Message-ID: <SNT124-W29475579115970B1FBCDB683760@phx.gbl>
X-Originating-IP: [165.228.7.150]
From: Tristan Verniquet <tris_vern@hotmail.com>
To: <jhb@freebsd.org>, freebsd hackers <freebsd-hackers@freebsd.org>
Subject: RE: syncing large mmaped files
Date: Fri, 19 Oct 2012 01:16:04 +1000
Importance: Normal
In-Reply-To: <201210180939.34861.jhb@freebsd.org>
References: <SNT124-W20F26CF7B468F7F09B9B4983760@phx.gbl>,
 <20121018083537.GQ35915@deviant.kiev.zoral.com.ua>,
 <201210180939.34861.jhb@freebsd.org>
MIME-Version: 1.0
X-OriginalArrivalTime: 18 Oct 2012 15:16:04.0711 (UTC)
 FILETIME=[7DB60F70:01CDAD43]
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.14
Cc: kostikbel@gmail.com
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 18 Oct 2012 15:17:12 -0000


> From: jhb@freebsd.org
> To: freebsd-hackers@freebsd.org
> Subject: Re: syncing large mmaped files
> Date: Thu, 18 Oct 2012 09:39:34 -0400
> CC: kostikbel@gmail.com; tris_vern@hotmail.com
>
> On Thursday, October 18, 2012 4:35:37 am Konstantin Belousov wrote:
> > On Thu, Oct 18, 2012 at 10:08:22AM +1000, Tristan Verniquet wrote:
> > >
> > > I want to work with large (1-10G) files in memory but eventually sync
> > > them back out to disk. The problem is that the sync process appears to
> > > lock the file in kernel for the duration of the sync, which can run
> > > into minutes. This prevents other processes from reading from the file
> > > (unless they already have it mapped) for this whole time. Is there
> > > any way to prevent this? I think I read in a post somewhere about
> > > openbsd implementing partial-writes when it hits a file with lots of
> > > dirty pages in order to prevent this. Is there anything available for
> > > FreeBSD or is there another way around it?
> > >
> > No, currently the vnode lock is held exclusive for the whole duration
> > of the msync(2) syscall or its analog from the syncer.
> >
> > Making a change to periodically drop the vnode lock in
> > vm_object_page_clean() might be possible, but requires the benchmarking
> > to make sure that we do not pessimize the common case. Also, this opens
> > a possibility for the vnode reclamation meantime.
>
> You can simulate this in userland by breaking up your msync() into multiple
> msync() calls where each call just syncs a portion of the file.

Thanks, I was doing this, and I thought I was getting much worse
performance from msync than from fsync. However, I am trying it again
now and the difference doesn't seem as large as I first imagined. It
still takes about 4x as long when all the pages are dirty, but catches
up when the file is more sparsely written. I guess that is probably
acceptable.

When all pages are dirty, iostat shows that the fsync will write
128KB/Transaction, whereas msync always does 16KB/Transaction and a
lower MB/s. It will continue to do this if I only dirty every 2nd, 3rd
or 4th page. When I only dirty every 5th page the fsync seems to kick
into another mode and starts doing 16KB/Transaction and the time starts
becoming comparable to msync.

Is there any way to get that fsync 128KB/Transaction performance
increase with msync when all pages are dirty?


> > Anyway, note that you cannot 'work with large files in memory', even if
> > you have enough RAM and no pressure to hold all the file pages resident.
> > The syncer will do a writeback periodically regardless of the application
> > calling msync(2) or not, with the interval of approximately 30 seconds.
>
> You can mmap with MAP_NOSYNC to prevent the syncer from writing the file out
> every 30 seconds.

Yes, I was mapping MAP_NOSYNC.

> --
> John Baldwin
> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"

From owner-freebsd-hackers@FreeBSD.ORG  Thu Oct 18 16:32:23 2012
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id D28FAD1A
 for <freebsd-hackers@freebsd.org>; Thu, 18 Oct 2012 16:32:23 +0000 (UTC)
 (envelope-from vasanth.raonaik@gmail.com)
Received: from mail-da0-f54.google.com (mail-da0-f54.google.com
 [209.85.210.54])
 by mx1.freebsd.org (Postfix) with ESMTP id A40DA8FC14
 for <freebsd-hackers@freebsd.org>; Thu, 18 Oct 2012 16:32:23 +0000 (UTC)
Received: by mail-da0-f54.google.com with SMTP id z9so3797989dad.13
 for <freebsd-hackers@freebsd.org>; Thu, 18 Oct 2012 09:32:23 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:date:message-id:subject:from:to:content-type;
 bh=p6bawuOax3Ag2hFn+tB4fwWSlIqpVdscztw+fEJVsdA=;
 b=POTCzF2caHPqyDYhUqNZ4tW4xi44k0tE5I0EKeLSeNvuTH8FlFukMPjvHf40Ha+azR
 Kn6vb6FbBbXtszLtkQb9yf8pOBBW+D/xgGpwy7Iz9lN5M6S3srvBUtaSX+u0In7JxiwM
 l+Q20HxQ8AvwuP74iEh4bvfHdjOnnNcn5AN4cHarFnnh7f/2jRVV5Lh91GrG9MWxORxx
 3Bq5ErUt9wZMX5G20TgQ5OVhgWkyt4cy4YbQj0Czfyi8XGCrmNUEcWD/Xsio7I2CWByS
 BBymzqagF+skvy+1ZcEV6z37PD/M+uUSIhB6nMOiZT9yXzEwf0eC4GcxHFlSOYs7OfhH
 tBKw==
MIME-Version: 1.0
Received: by 10.66.85.233 with SMTP id k9mr60443452paz.73.1350577943165; Thu,
 18 Oct 2012 09:32:23 -0700 (PDT)
Received: by 10.66.217.138 with HTTP; Thu, 18 Oct 2012 09:32:23 -0700 (PDT)
Date: Thu, 18 Oct 2012 12:32:23 -0400
Message-ID: <CAAuizBgYZYri0MzjfhYm=qpg7zS+8NsoXGhYRb4eaBSSxxXQQA@mail.gmail.com>
Subject: dtrace failed to resolve struct thread
From: vasanth rao naik sabavat <vasanth.raonaik@gmail.com>
To: freebsd-hackers@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.14
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 18 Oct 2012 16:32:23 -0000

Hi,

I have an issue with latest FreeBSD when enabling dtrace

dtrace -s schedgraph.d
./hotkernel
both return the following error.

: "/usr/lib/dtrace/psinfo.d", line 88: failed to resolve type kernel`struct
thread * for identifier curthread: Unknown type name

10.0-CURRENT FreeBSD 10.0-CURRENT #0: Wed Oct 17 12:04:00 PDT 2012

I see that there was a problem report on FreeBSD which got closed as fixed.
What is the fix for this issue?

http://www.freebsd.org/cgi/query-pr.cgi?pr=130998

-- 
Thanks,
Vasanth

From owner-freebsd-hackers@FreeBSD.ORG  Thu Oct 18 16:42:27 2012
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id 154702AF;
 Thu, 18 Oct 2012 16:42:27 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200])
 by mx1.freebsd.org (Postfix) with ESMTP id 877128FC14;
 Thu, 18 Oct 2012 16:42:26 +0000 (UTC)
Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1])
 by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q9IGgUJM079973;
 Thu, 18 Oct 2012 19:42:31 +0300 (EEST)
 (envelope-from kostikbel@gmail.com)
Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1])
 by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id q9IGgIYL086939;
 Thu, 18 Oct 2012 19:42:18 +0300 (EEST)
 (envelope-from kostikbel@gmail.com)
Received: (from kostik@localhost)
 by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q9IGgI11086938;
 Thu, 18 Oct 2012 19:42:18 +0300 (EEST)
 (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to
 kostikbel@gmail.com using -f
Date: Thu, 18 Oct 2012 19:42:18 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: John Baldwin <jhb@freebsd.org>
Subject: Re: syncing large mmaped files
Message-ID: <20121018164218.GR35915@deviant.kiev.zoral.com.ua>
References: <SNT124-W20F26CF7B468F7F09B9B4983760@phx.gbl>
 <20121018083537.GQ35915@deviant.kiev.zoral.com.ua>
 <201210180939.34861.jhb@freebsd.org>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
 protocol="application/pgp-signature"; boundary="DI3e56nQDAJ1LWZd"
Content-Disposition: inline
In-Reply-To: <201210180939.34861.jhb@freebsd.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua
X-Virus-Status: Clean
X-Spam-Status: No, score=-4.0 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00
 autolearn=ham version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
 skuns.kiev.zoral.com.ua
Cc: freebsd-hackers@freebsd.org, Tristan Verniquet <tris_vern@hotmail.com>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 18 Oct 2012 16:42:27 -0000


--DI3e56nQDAJ1LWZd
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Thu, Oct 18, 2012 at 09:39:34AM -0400, John Baldwin wrote:
> On Thursday, October 18, 2012 4:35:37 am Konstantin Belousov wrote:
> > On Thu, Oct 18, 2012 at 10:08:22AM +1000, Tristan Verniquet wrote:
> > >
> > > I want to work with large (1-10G) files in memory but eventually sync
> > > them back out to disk. The problem is that the sync process appears to
> > > lock the file in kernel for the duration of the sync, which can run
> > > into minutes. This prevents other processes from reading from the file
> > > (unless they already have it mapped) for this whole time. Is there
> > > any way to prevent this? I think I read in a post somewhere about
> > > openbsd implementing partial-writes when it hits a file with lots of
> > > dirty pages in order to prevent this. Is there anything available for
> > > FreeBSD or is there another way around it?
> > >
> > No, currently the vnode lock is held exclusive for the whole duration
> > of the msync(2) syscall or its analog from the syncer.
> >
> > Making a change to periodically drop the vnode lock in
> > vm_object_page_clean() might be possible, but requires the benchmarking
> > to make sure that we do not pessimize the common case. Also, this opens
> > a possibility for the vnode reclamation meantime.
>
> You can simulate this in userland by breaking up your msync() into multiple
> msync() calls where each call just syncs a portion of the file.
Be aware that this is much-much slower than msyncing the whole file, even
if file is very large. The reason is that pager initiates asynchronous
_immediate_ clustered write for such situations. Async writes (AKA
bdwrite()) are only specified for full range msyncing.

>
> > Anyway, note that you cannot 'work with large files in memory', even if
> > you have enough RAM and no pressure to hold all the file pages resident.
> > The syncer will do a writeback periodically regardless of the application
> > calling msync(2) or not, with the interval of approximately 30 seconds.
>
> You can mmap with MAP_NOSYNC to prevent the syncer from writing the file out
> every 30 seconds.

This also prevents msync(2) from syncing the region. The flag is fine
for throw-away data, but not for the scenario that was described, I
think.

--DI3e56nQDAJ1LWZd
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (FreeBSD)

iEYEARECAAYFAlCAMWoACgkQC3+MBN1Mb4iV7ACfeO+DqO2Onc8uMS29tjTbykJF
Ek0An1i+6oS2OaxLly9sI5pAGmKlXw8F
=UG69
-----END PGP SIGNATURE-----

--DI3e56nQDAJ1LWZd--

From owner-freebsd-hackers@FreeBSD.ORG  Thu Oct 18 17:49:03 2012
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id EBE46CAC
 for <freebsd-hackers@freebsd.org>; Thu, 18 Oct 2012 17:49:03 +0000 (UTC)
 (envelope-from rysto32@gmail.com)
Received: from mail-qc0-f182.google.com (mail-qc0-f182.google.com
 [209.85.216.182])
 by mx1.freebsd.org (Postfix) with ESMTP id 9DDB48FC16
 for <freebsd-hackers@freebsd.org>; Thu, 18 Oct 2012 17:49:03 +0000 (UTC)
Received: by mail-qc0-f182.google.com with SMTP id l39so8741935qcs.13
 for <freebsd-hackers@freebsd.org>; Thu, 18 Oct 2012 10:49:03 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:in-reply-to:references:date:message-id:subject:from:to
 :cc:content-type;
 bh=6/uNn96bjJavH1dHQDLd7ndwQ6V1nZcAPMbQ9SXyqMQ=;
 b=WJ5Ij3FoNCBBTLj6+NaDCNcA9GUtb/YJk0km2/TEnMn0FLrOqy7QDF3UlpB20kU9IU
 aG1OQ0Ljt2e6pwmpV2BVdVvKHWIiGrYcIOnkRp5u+eu1LWLW5w2ZUVQb4noqRS3ddegj
 4CXH7DETF45h7N7GRxve51toQcFuHIAxftgRe9N0v+7nG8UEzo5dfZy41hxhpqCH6mUP
 04GLbH6gqdePPhf/LkXC3YZdHli0DHdDgCjxrcm8eSbjjricEyclDL34nBoQvamlxJA0
 hTPYBY+VE8ZW9yC9omJIm1XxPqrX8DjVwzDQdo2z2DE/VVtC6FfF35UlgsXB+X+VfeUf
 TqTQ==
MIME-Version: 1.0
Received: by 10.224.33.205 with SMTP id i13mr16598608qad.35.1350582543116;
 Thu, 18 Oct 2012 10:49:03 -0700 (PDT)
Received: by 10.49.81.234 with HTTP; Thu, 18 Oct 2012 10:49:03 -0700 (PDT)
In-Reply-To: <CAAuizBgYZYri0MzjfhYm=qpg7zS+8NsoXGhYRb4eaBSSxxXQQA@mail.gmail.com>
References: <CAAuizBgYZYri0MzjfhYm=qpg7zS+8NsoXGhYRb4eaBSSxxXQQA@mail.gmail.com>
Date: Thu, 18 Oct 2012 13:49:03 -0400
Message-ID: <CAFMmRNywPFta2_rPyqiNYQT9dyxwU0xN--=f-eaJQPPs7A0NvA@mail.gmail.com>
Subject: Re: dtrace failed to resolve struct thread
From: Ryan Stone <rysto32@gmail.com>
To: vasanth rao naik sabavat <vasanth.raonaik@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Cc: freebsd-hackers@freebsd.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 18 Oct 2012 17:49:04 -0000

On Thu, Oct 18, 2012 at 12:32 PM, vasanth rao naik sabavat
<vasanth.raonaik@gmail.com> wrote:
> Hi,
>
> I have an issue with latest FreeBSD when enabling dtrace
>
> dtrace -s schedgraph.d
> ./hotkernel
> both return the following error.
>
> : "/usr/lib/dtrace/psinfo.d", line 88: failed to resolve type kernel`struct
> thread * for identifier curthread: Unknown type name
>
> 10.0-CURRENT FreeBSD 10.0-CURRENT #0: Wed Oct 17 12:04:00 PDT 2012
>
> I see that there was a problem report on FreeBSD which got closed as fixed.
> What is the fix for this issue?
>
> http://www.freebsd.org/cgi/query-pr.cgi?pr=130998

Did you buildkernel with WITH_CTF=1?  You can check this by running
ctfdump on /boot/kernel/kernel and seeing if that produces output.  It
will print the following error if you forgot to build with WITH_CTF=1:

/boot/kernel/kernel does not contain .SUNW_ctf data

From owner-freebsd-hackers@FreeBSD.ORG  Thu Oct 18 19:43:28 2012
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id EC9F7CDE
 for <freebsd-hackers@freebsd.org>; Thu, 18 Oct 2012 19:43:28 +0000 (UTC)
 (envelope-from jhb@freebsd.org)
Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net
 [IPv6:2001:470:1f10:75::2])
 by mx1.freebsd.org (Postfix) with ESMTP id BD6838FC08
 for <freebsd-hackers@freebsd.org>; Thu, 18 Oct 2012 19:43:28 +0000 (UTC)
Received: from jhbbsd.localnet (unknown [209.249.190.124])
 by bigwig.baldwin.cx (Postfix) with ESMTPSA id 28EA3B986;
 Thu, 18 Oct 2012 15:43:28 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: Konstantin Belousov <kostikbel@gmail.com>
Subject: Re: syncing large mmaped files
Date: Thu, 18 Oct 2012 15:43:25 -0400
User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p20; KDE/4.5.5; amd64; ; )
References: <SNT124-W20F26CF7B468F7F09B9B4983760@phx.gbl>
 <201210180939.34861.jhb@freebsd.org>
 <20121018164218.GR35915@deviant.kiev.zoral.com.ua>
In-Reply-To: <20121018164218.GR35915@deviant.kiev.zoral.com.ua>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-15"
Content-Transfer-Encoding: 7bit
Message-Id: <201210181543.25191.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7
 (bigwig.baldwin.cx); Thu, 18 Oct 2012 15:43:28 -0400 (EDT)
Cc: freebsd-hackers@freebsd.org, Tristan Verniquet <tris_vern@hotmail.com>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 18 Oct 2012 19:43:29 -0000

On Thursday, October 18, 2012 12:42:18 pm Konstantin Belousov wrote:
> On Thu, Oct 18, 2012 at 09:39:34AM -0400, John Baldwin wrote:
> > On Thursday, October 18, 2012 4:35:37 am Konstantin Belousov wrote:
> > > On Thu, Oct 18, 2012 at 10:08:22AM +1000, Tristan Verniquet wrote:
> > > > 
> > > > I want to work with large (1-10G) files in memory but eventually sync
> > > > them back out to disk. The problem is that the sync process appears to
> > > > lock the file in kernel for the duration of the sync, which can run
> > > > into minutes. This prevents other processes from reading from the file
> > > > (unless they already have it mapped) for this whole time. Is there
> > > > any way to prevent this? I think I read in a post somewhere about
> > > > openbsd implementing partial-writes when it hits a file with lots of
> > > > dirty pages in order to prevent this. Is there anything available for
> > > > FreeBSD or is there another way around it?
> > > >
> > > No, currently the vnode lock is held exclusive for the whole duration
> > > of the msync(2) syscall or its analog from the syncer.
> > > 
> > > Making a change to periodically drop the vnode lock in
> > > vm_object_page_clean() might be possible, but requires the benchmarking
> > > to make sure that we do not pessimize the common case. Also, this opens
> > > a possibility for the vnode reclamation meantime.
> > 
> > You can simulate this in userland by breaking up your msync() into multiple
> > msync() calls where each call just syncs a portion of the file.
> Be aware that this is much-much slower than msyncing the whole file, even
> if file is very large. The reason is that pager initiates asynchronous
> _immediate_ clustered write for such situations. Async writes (AKA
> bdwrite()) are only specified for full range msyncing.

Ugh.  It would seem to me that msync(MS_ASYNC) should be doing delayed
writes.
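
A minimal sketch of the userland chunking discussed above, assuming a
shared mapping obtained from mmap() and a caller-chosen chunk size. The
helper name sync_in_chunks and its parameters are made up for
illustration. Whether MS_SYNC or MS_ASYNC per chunk performs acceptably
is exactly what is in question in this thread, so treat it as something
to benchmark rather than a recommendation:

```c
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Flush a mapped region in fixed-size pieces instead of one big msync().
 * `base` must be the page-aligned address returned by mmap(); `chunk` is
 * rounded up to a whole number of pages; `flags` is MS_SYNC or MS_ASYNC.
 * Returns the number of msync() calls made, or -1 on the first failure
 * (errno is left as set by msync()).
 */
long sync_in_chunks(void *base, size_t len, size_t chunk, int flags)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    long calls = 0;

    if (chunk == 0)
        chunk = page;
    chunk = (chunk + page - 1) / page * page;   /* round up to page size */

    for (size_t off = 0; off < len; off += chunk) {
        size_t n = (len - off < chunk) ? len - off : chunk;

        if (msync((char *)base + off, n, flags) == -1)
            return -1;
        calls++;
    }
    return calls;
}
```

Each call covers only part of the file, so other work can be done
between calls; calling sync_in_chunks(p, len, len, MS_ASYNC) degenerates
to a single full-range msync().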

> > > Anyway, note that you cannot 'work with large files in memory', even if
> > > you have enough RAM and no pressure to hold all the file pages resident.
> > > The syncer will do a writeback periodically regardless of the application
> > > calling msync(2) or not, with the interval of approximately 30 seconds.
> > 
> > You can mmap with MAP_NOSYNC to prevent the syncer from writing the file out
> > every 30 seconds.
> 
> This also prevents msync(2) from syncing the region. The flag is fine
> for throw-away data, but not for the scenario that was described, I
> think.

Oof.  I could see that in certain situations you might want to control this
behavior from an application (similar to how I now make use of fadvise() at
work).  Having a way to disable syncer but having msync(MS_ASYNC) do
something useful would be good.

-- 
John Baldwin

From owner-freebsd-hackers@FreeBSD.ORG  Thu Oct 18 20:24:35 2012
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id C747F6F9
 for <freebsd-hackers@freebsd.org>; Thu, 18 Oct 2012 20:24:35 +0000 (UTC)
 (envelope-from vasanth.raonaik@gmail.com)
Received: from mail-da0-f54.google.com (mail-da0-f54.google.com
 [209.85.210.54])
 by mx1.freebsd.org (Postfix) with ESMTP id 94D7D8FC0C
 for <freebsd-hackers@freebsd.org>; Thu, 18 Oct 2012 20:24:35 +0000 (UTC)
Received: by mail-da0-f54.google.com with SMTP id z9so3908098dad.13
 for <freebsd-hackers@freebsd.org>; Thu, 18 Oct 2012 13:24:35 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:in-reply-to:references:date:message-id:subject:from:to
 :cc:content-type;
 bh=RjQK6agvOK0VR5Df8kVISrfrDm45wHDimFNnxqeoStQ=;
 b=CBiA8d3ykIvoeLFMRMebXW1wniZ7bSf2HQMk2Tq+Uklg8CqEdhV+z41XD5SLlDvxFi
 UZoh4smHMSHd8BvvWM6qDkhv7Vw+LA4u+0abpBEc+IC3qyc7npAWsy4IO40/KApfgouk
 ZT/d0EpL9jmfIhPU0Fpv4JmIuPacqq0Mzb6yXuYj/vufMUZhHvReesnTP1PaCMzWl802
 8q0/zKkbf91EHfIMDyeBlOiSLd5N+I3MJOXOknHaddYDO33VCQpu63strqHiafJTFi4P
 jNuyKoOjl1Ew5AHWeWYjnvyQ1JZ074xlQzj25K0HbANs3VgxxFMuC6tOyr3X6nVDSK1s
 H0pQ==
MIME-Version: 1.0
Received: by 10.68.232.163 with SMTP id tp3mr69957905pbc.44.1350591874853;
 Thu, 18 Oct 2012 13:24:34 -0700 (PDT)
Received: by 10.66.217.138 with HTTP; Thu, 18 Oct 2012 13:24:34 -0700 (PDT)
In-Reply-To: <CAFMmRNywPFta2_rPyqiNYQT9dyxwU0xN--=f-eaJQPPs7A0NvA@mail.gmail.com>
References: <CAAuizBgYZYri0MzjfhYm=qpg7zS+8NsoXGhYRb4eaBSSxxXQQA@mail.gmail.com>
 <CAFMmRNywPFta2_rPyqiNYQT9dyxwU0xN--=f-eaJQPPs7A0NvA@mail.gmail.com>
Date: Thu, 18 Oct 2012 16:24:34 -0400
Message-ID: <CAAuizBgejVu-pSb3T--iZ0S-ZpZW1kLZHw-p72PoS8RfWiioqw@mail.gmail.com>
Subject: Re: dtrace failed to resolve struct thread
From: vasanth rao naik sabavat <vasanth.raonaik@gmail.com>
To: Ryan Stone <rysto32@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.14
Cc: freebsd-hackers@freebsd.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 18 Oct 2012 20:24:35 -0000

Thanks Ryan,

I checked out latest source code and compiled the kernel which has the
option you have mentioned.

I can now run dtrace -s schedgraph.d without any issues.

Thanks,
Vasanth

On Thu, Oct 18, 2012 at 1:49 PM, Ryan Stone <rysto32@gmail.com> wrote:

> On Thu, Oct 18, 2012 at 12:32 PM, vasanth rao naik sabavat
> <vasanth.raonaik@gmail.com> wrote:
> > Hi,
> >
> > I have an issue with latest FreeBSD when enabling dtrace
> >
> > dtrace -s schedgraph.d
> > ./hotkernel
> > both return the following error.
> >
> > : "/usr/lib/dtrace/psinfo.d", line 88: failed to resolve type kernel`struct
> > thread * for identifier curthread: Unknown type name
> >
> > 10.0-CURRENT FreeBSD 10.0-CURRENT #0: Wed Oct 17 12:04:00 PDT 2012
> >
> > I see that there was a problem report on FreeBSD which got closed as fixed.
> > What is the fix for this issue?
> >
> > http://www.freebsd.org/cgi/query-pr.cgi?pr=130998
>
> Did you buildkernel with WITH_CTF=1?  You can check this by running
> ctfdump on /boot/kernel/kernel and seeing if that produces output.  It
> will print the following error if you forgot to build with WITH_CTF=1:
>
> /boot/kernel/kernel does not contain .SUNW_ctf data
>


-- 
Thanks,
Vasanth

From owner-freebsd-hackers@FreeBSD.ORG  Fri Oct 19 00:11:42 2012
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id C1AABB91;
 Fri, 19 Oct 2012 00:11:42 +0000 (UTC)
 (envelope-from tris_vern@hotmail.com)
Received: from snt0-omc3-s48.snt0.hotmail.com (snt0-omc3-s48.snt0.hotmail.com
 [65.54.51.85]) by mx1.freebsd.org (Postfix) with ESMTP id 8A9518FC1B;
 Fri, 19 Oct 2012 00:11:42 +0000 (UTC)
Received: from SNT124-W23 ([65.55.90.137]) by snt0-omc3-s48.snt0.hotmail.com
 with Microsoft SMTPSVC(6.0.3790.4675); 
 Thu, 18 Oct 2012 17:11:36 -0700
Message-ID: <SNT124-W23A8A38DF1467ECDA41DD883750@phx.gbl>
X-Originating-IP: [165.228.7.150]
From: Tristan Verniquet <tris_vern@hotmail.com>
To: <jhb@freebsd.org>, <kostikbel@gmail.com>
Subject: RE: syncing large mmaped files
Date: Fri, 19 Oct 2012 10:11:35 +1000
Importance: Normal
In-Reply-To: <201210181543.25191.jhb@freebsd.org>
References: <SNT124-W20F26CF7B468F7F09B9B4983760@phx.gbl>,
 <201210180939.34861.jhb@freebsd.org>,
 <20121018164218.GR35915@deviant.kiev.zoral.com.ua>,
 <201210181543.25191.jhb@freebsd.org>
MIME-Version: 1.0
X-OriginalArrivalTime: 19 Oct 2012 00:11:36.0301 (UTC)
 FILETIME=[4DA199D0:01CDAD8E]
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.14
Cc: freebsd hackers <freebsd-hackers@freebsd.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 19 Oct 2012 00:11:43 -0000


> From: jhb@freebsd.org
> To: kostikbel@gmail.com
> Subject: Re: syncing large mmaped files
> Date: Thu, 18 Oct 2012 15:43:25 -0400
> CC: freebsd-hackers@freebsd.org; tris_vern@hotmail.com
>
> On Thursday, October 18, 2012 12:42:18 pm Konstantin Belousov wrote:
> > On Thu, Oct 18, 2012 at 09:39:34AM -0400, John Baldwin wrote:
> > > On Thursday, October 18, 2012 4:35:37 am Konstantin Belousov wrote:
> > > > On Thu, Oct 18, 2012 at 10:08:22AM +1000, Tristan Verniquet wrote:
> > > > >
> > > > > I want to work with large (1-10G) files in memory but eventually sync
> > > > > them back out to disk. The problem is that the sync process appears to
> > > > > lock the file in kernel for the duration of the sync, which can run
> > > > > into minutes. This prevents other processes from reading from the file
> > > > > (unless they already have it mapped) for this whole time. Is there
> > > > > any way to prevent this? I think I read in a post somewhere about
> > > > > openbsd implementing partial-writes when it hits a file with lots of
> > > > > dirty pages in order to prevent this. Is there anything available for
> > > > > FreeBSD or is there another way around it?
> > > > >
> > > > No, currently the vnode lock is held exclusive for the whole duration
> > > > of the msync(2) syscall or its analog from the syncer.
> > > >
> > > > Making a change to periodically drop the vnode lock in
> > > > vm_object_page_clean() might be possible, but requires the benchmarking
> > > > to make sure that we do not pessimize the common case. Also, this opens
> > > > a possibility for the vnode reclamation meantime.
> > >
> > > You can simulate this in userland by breaking up your msync() into multiple
> > > msync() calls where each call just syncs a portion of the file.
> > Be aware that this is much-much slower than msyncing the whole file, even
> > if file is very large. The reason is that pager initiates asynchronous
> > _immediate_ clustered write for such situations. Async writes (AKA
> > bdwrite()) are only specified for full range msyncing.
>
> Ugh.  It would seem to me that msync(MS_ASYNC) should be doing delayed
> writes.

Ahh, using MS_ASYNC seems to get me the behaviour I was looking for. It
is just as fast as fsync for cases when all the pages are dirtied, but it
releases the lock, allowing other programs to open and read the file. So
it seems to be doing what I would expect.

> > > > Anyway, note that you cannot 'work with large files in memory', even if
> > > > you have enough RAM and no pressure to hold all the file pages resident.
> > > > The syncer will do a writeback periodically regardless of the application
> > > > calling msync(2) or not, with the interval of approximately 30 seconds.
> > >
> > > You can mmap with MAP_NOSYNC to prevent the syncer from writing the file out
> > > every 30 seconds.
> >
> > This also prevents msync(2) from syncing the region. The flag is fine
> > for throw-away data, but not for the scenario that was described, I
> > think.
>
> Oof.  I could see that in certain situations you might want to control this
> behavior from an application (similar to how I now make use of fadvise() at
> work).  Having a way to disable syncer but having msync(MS_ASYNC) do
> something useful would be good.
>

When I map using MAP_NOSYNC I still seem to be able to msync(2) the
regions? I see memory move from Wired/Active to Invalid and the disk is
busy.

The madvise man page has a MADV_AUTOSYNC section which says that pages
that are already dirtied can be guaranteed to be reverted using msync(2)
or fsync(2).  This is FreeBSD 8.3. So even if there is something wrong
with sync'ing MAP_NOSYNC pages, I guess I could always madvise
MADV_AUTOSYNC them first.

> --
> John Baldwin
> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"

From owner-freebsd-hackers@FreeBSD.ORG  Fri Oct 19 11:45:49 2012
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id E8A641BA;
 Fri, 19 Oct 2012 11:45:49 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200])
 by mx1.freebsd.org (Postfix) with ESMTP id 61E5D8FC0C;
 Fri, 19 Oct 2012 11:45:48 +0000 (UTC)
Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1])
 by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q9JBjuAl018276;
 Fri, 19 Oct 2012 14:45:56 +0300 (EEST)
 (envelope-from kostikbel@gmail.com)
Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1])
 by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id q9JBjiCD093779;
 Fri, 19 Oct 2012 14:45:44 +0300 (EEST)
 (envelope-from kostikbel@gmail.com)
Received: (from kostik@localhost)
 by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q9JBjiPp093778;
 Fri, 19 Oct 2012 14:45:44 +0300 (EEST)
 (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to
 kostikbel@gmail.com using -f
Date: Fri, 19 Oct 2012 14:45:44 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: John Baldwin <jhb@freebsd.org>
Subject: Re: syncing large mmaped files
Message-ID: <20121019114544.GX35915@deviant.kiev.zoral.com.ua>
References: <SNT124-W20F26CF7B468F7F09B9B4983760@phx.gbl>
 <201210180939.34861.jhb@freebsd.org>
 <20121018164218.GR35915@deviant.kiev.zoral.com.ua>
 <201210181543.25191.jhb@freebsd.org>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
 protocol="application/pgp-signature"; boundary="/1aF9qoWKhphZS4n"
Content-Disposition: inline
In-Reply-To: <201210181543.25191.jhb@freebsd.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua
X-Virus-Status: Clean
X-Spam-Status: No, score=-4.0 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00
 autolearn=ham version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
 skuns.kiev.zoral.com.ua
Cc: freebsd-hackers@freebsd.org, Tristan Verniquet <tris_vern@hotmail.com>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 19 Oct 2012 11:45:50 -0000


--/1aF9qoWKhphZS4n
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Thu, Oct 18, 2012 at 03:43:25PM -0400, John Baldwin wrote:
> On Thursday, October 18, 2012 12:42:18 pm Konstantin Belousov wrote:
> > On Thu, Oct 18, 2012 at 09:39:34AM -0400, John Baldwin wrote:
> > > On Thursday, October 18, 2012 4:35:37 am Konstantin Belousov wrote:
> > > > On Thu, Oct 18, 2012 at 10:08:22AM +1000, Tristan Verniquet wrote:
> > > > >=20
> > > > > I want to work with large (1-10G) files in memory but eventually =
sync
> > > > > them back out to disk. The problem is that the sync process appea=
rs to
> > > > > lock the file in kernel for the duration of the sync, which can r=
un
> > > > > into minutes. This prevents other processes from reading from the=
 file
> > > > > (unless they already have it mapped) for this whole time. Is there
> > > > > any way to prevent this? I think I read in a post somewhere about
> > > > > openbsd implementing partial-writes when it hits a file with lots=
 of
> > > > > dirty pages in order to prevent this. Is there anything available=
 for
> > > > > FreeBSD or is there another way around it?
> > > > >
> > > > No, currently the vnode lock is held exclusive for the whole durati=
on
> > > > of the msync(2) syscall or its analog from the syncer.
> > > >=20
> > > > Making a change to periodically drop the vnode lock in
> > > > vm_object_page_clean() might be possible, but requires the benchmar=
king
> > > > to make sure that we do not pessimize the common case. Also, this o=
pens
> > > > a possibility for the vnode reclamation meantime.
> > >=20
> > > You can simulate this in userland by breaking up your msync() into mu=
ltiple
> > > msync() calls where each call just syncs a portion of the file.
> > Be aware that this is much-much slower than msyncing the whole file, ev=
en
> > if file is very large. The reason is that pager initiates asynchronous
> > _immediate_ clustered write for such situations. Async writes (AKA
> > bdwrite()) are only specified for full range msyncing.
>=20
> Ugh.  It would seem to me that msync(MS_ASYNC) should be doing delayed
> writes.
The vm_pager_putpages() is called with the VM_PAGER_CLUSTER_OK flag
for MS_ASYNC, according to my reading of the code. This results
in neither IO_SYNC nor IO_ASYNC flags passed to VOP_WRITE() from
vnode_pager_generic_putpages().

Since the mapped regions are typically large enough to cover whole
fs blocks, the code in ffs_vnops.c:ffs_write() ends up in cluster_write().
Usually, a fully populated cluster is written asynchronously.

>=20
> > > > Anyway, note that you cannot 'work with large files in memory', eve=
n if
> > > > you have enough RAM and no pressure to hold all the file pages resi=
dent.
> > > > The syncer will do a writeback periodically regardless of the appli=
cation
> > > > calling msync(2) or not, with the interval of approximately 30 seco=
nds.
> > >=20
> > > You can mmap with MAP_NOSYNC to prevent the syncer from writing the f=
ile out
> > > every 30 seconds.
> >=20
> > This also prevents msync(2) from syncing the region. The flag is fine
> > for throw-away data, but not for the scenario that was described, I
> > think.
>=20
> Oof.  I could see that in certain situations you might want to control th=
is
> behavior from an application (similar to how I now make use of fadvise() =
at
> work).  Having a way to disable syncer but having msync(MS_ASYNC) do
> something useful would be good.

I was wrong there, sorry. Only syncer and fsync(2) would ignore
VPO_NOSYNC pages.
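
The userland workaround suggested earlier in this thread (several msync()
calls, each covering part of the file) might look like the sketch below.
It is only an illustration: it uses Python's mmap.flush(offset, size),
which on Unix is implemented with msync(2), and the helper name
flush_in_chunks and the chunk size are made up for the example. As noted
above, expect this to be slower overall than one full-range sync; the only
point is that each call covers a limited range, so each call should hold
the lock for less time.

```python
import mmap
import tempfile


def flush_in_chunks(mapping, chunk_bytes):
    """Flush a shared file mapping piecewise; return the number of flush calls."""
    # flush() offsets must be aligned, so round the chunk size up to a
    # whole number of allocation units (the page size on Unix).
    gran = mmap.ALLOCATIONGRANULARITY
    chunk = max(gran, (chunk_bytes + gran - 1) // gran * gran)
    calls = 0
    for offset in range(0, len(mapping), chunk):
        size = min(chunk, len(mapping) - offset)
        mapping.flush(offset, size)  # syncs only this range
        calls += 1
    return calls


if __name__ == "__main__":
    # Hypothetical usage on a small scratch file.
    with tempfile.TemporaryFile() as f:
        length = 8 * mmap.ALLOCATIONGRANULARITY
        f.truncate(length)
        m = mmap.mmap(f.fileno(), length)
        m[0:5] = b"dirty"
        print(flush_in_chunks(m, 2 * mmap.ALLOCATIONGRANULARITY), "flush calls")
        m.close()
```

Other processes get a chance to take the lock between two calls, which is
the whole benefit; the total amount of I/O is not reduced.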

--/1aF9qoWKhphZS4n
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (FreeBSD)

iEUEARECAAYFAlCBPWcACgkQC3+MBN1Mb4jXnACeJAiNxO9S+ZVcJnKBzcxgwDT0
MfAAl1QgedvFLssA2kWLONoF7QJgX4o=
=cxYS
-----END PGP SIGNATURE-----

--/1aF9qoWKhphZS4n--

From owner-freebsd-hackers@FreeBSD.ORG  Fri Oct 19 23:07:40 2012
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: hackers@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id BD71B9CB
 for <hackers@FreeBSD.org>; Fri, 19 Oct 2012 23:07:40 +0000 (UTC)
 (envelope-from ryao@gentoo.org)
Received: from smtp.gentoo.org (smtp.gentoo.org [140.211.166.183])
 by mx1.freebsd.org (Postfix) with ESMTP id 9A0BD8FC14
 for <hackers@FreeBSD.org>; Fri, 19 Oct 2012 23:07:40 +0000 (UTC)
Received: from [192.168.1.2] (pool-72-89-250-138.nycmny.fios.verizon.net
 [72.89.250.138])
 (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits))
 (No client certificate requested) (Authenticated sender: ryao)
 by smtp.gentoo.org (Postfix) with ESMTPSA id 68D6B33DAC7
 for <hackers@FreeBSD.org>; Fri, 19 Oct 2012 23:07:34 +0000 (UTC)
Message-ID: <5081DCA1.80906@gentoo.org>
Date: Fri, 19 Oct 2012 19:05:05 -0400
From: Richard Yao <ryao@gentoo.org>
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
 rv:10.0.7) Gecko/20120917 Thunderbird/10.0.7
MIME-Version: 1.0
To: "hackers@FreeBSD.org" <hackers@FreeBSD.org>
Subject: Loader-kernel interaction
X-Enigmail-Version: 1.3.5
Content-Type: multipart/signed; micalg=pgp-sha1;
 protocol="application/pgp-signature";
 boundary="------------enig42DD07A214635D3CBFCBB299"
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 19 Oct 2012 23:07:40 -0000

This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
--------------enig42DD07A214635D3CBFCBB299
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Dear Everyone,

I know that the kernel is a BTX client, but I do not understand the
protocol used by loader to pass sysctl settings and loadable modules to
the kernel. Is there documentation on this?

Yours truly,
Richard Yao


--------------enig42DD07A214635D3CBFCBB299
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQIcBAEBAgAGBQJQgdy2AAoJECDuEZm+6Exk/P4P/0YC3Kq+HK3EeMf8fhmM+V7N
gUh4DUbnVGpjBhyRq713/s809XIAw0dNxlKZIkZKl7aMLp/mGSOTgMqjUf7iW6d7
ovkp/xLgn+ycEQxizBBXa5HMVTsaC2aO5XcrJdoiiAhan9J3irleNA3lX7IET6XC
NMsrxfrsXkdS+2EGv11S4ifw3RTGe1435uhFd5+3zoy3Zlvw7Zh74RNiZQr/HMR3
+OjMLQ65aLzhtvaFh7in8z2ra+fzC2AdHAkB/ApbQfIDXZG5xzHDZYMf26Cl8+AI
OTNj13y5Vp/qqvhZfofqQ6Z2fT0fdQt4JmcJQn7u3SO2eYFU3dD/q0WOtqDh0A4T
D+QwmxXENw5vTrirAHhvHs6s8sBLazfvpRFuB/+05TmgJGL2gDDSdm18l8Cmm3Qw
wMKciIbaeT8TS8utVDpE8b4oDxYzi47qfxHpmzRb1YT3KhRBUHL95oLbWlp+rELZ
/FIKu8niWRNNqhB5itNno1NMepZHKs8krKFPePLHuJA8tbWu9ROvfKKUXc34yE3W
WC7Dz3ETp9I9zymbKS6/xI5pmmd6fdyK2EktsvWjXQbxbK5uOwASv8suSM5GdjS/
aboxIp/vN+Dl4/RBqaHq/KnVtLvb4PHF4kkOWfvWRt8rWIjOu4UF2usSogWtXYSY
srJGKN4t8h/TV8u1ufD9
=dQSi
-----END PGP SIGNATURE-----

--------------enig42DD07A214635D3CBFCBB299--

From owner-freebsd-hackers@FreeBSD.ORG  Sat Oct 20 11:43:07 2012
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id 521DCDE7;
 Sat, 20 Oct 2012 11:43:07 +0000 (UTC)
 (envelope-from ndenev@gmail.com)
Received: from mail-wi0-f172.google.com (mail-wi0-f172.google.com
 [209.85.212.172])
 by mx1.freebsd.org (Postfix) with ESMTP id A71B68FC12;
 Sat, 20 Oct 2012 11:43:06 +0000 (UTC)
Received: by mail-wi0-f172.google.com with SMTP id hq12so886566wib.13
 for <multiple recipients>; Sat, 20 Oct 2012 04:43:00 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=subject:mime-version:content-type:from:in-reply-to:date:cc
 :content-transfer-encoding:message-id:references:to:x-mailer;
 bh=RXy+Wea31o1pgJWOtbZ2ETzHeP87Yg9MjjYy7y21xJE=;
 b=mgDjQfuTLMxX8KrXeKbpnsRtEM+RWmhKcfL2UtQOSAFZkfFF5b1ruhBPoiKLeOcCze
 NNkpmAkr7p3pA0UsHA4gKVz6k3/etUmRPYmokzxpWGE8S7EYYUhmOK1GSefsAKz7GR3C
 TlOpW/n8Bs0fzSTvuvfT7jggG+6TEBzQikOVDLpXgLtzc/vQR1aLocipBpF3QCsFto0c
 T5Bzw3TLqbLhZ/pV5iy6BuuV2f3tUW9Qez9ev8haTS0OECwa+Mdzx/7pPV8PJ48sgZTQ
 kOqLh+5Ytd+ucZM3smeil4SWXI+XWX40DtGthXb73eQS02ZyU8Dnb0D8FTodaXUXuRUz
 SgwA==
Received: by 10.216.195.144 with SMTP id p16mr2186313wen.174.1350733379998;
 Sat, 20 Oct 2012 04:42:59 -0700 (PDT)
Received: from [10.0.0.86] ([93.152.184.10])
 by mx.google.com with ESMTPS id v3sm9085964wiy.5.2012.10.20.04.42.58
 (version=TLSv1/SSLv3 cipher=OTHER);
 Sat, 20 Oct 2012 04:42:58 -0700 (PDT)
Subject: Re: NFS server bottlenecks
Mime-Version: 1.0 (Mac OS X Mail 6.1 \(1498\))
Content-Type: text/plain; charset=us-ascii
From: Nikolay Denev <ndenev@gmail.com>
In-Reply-To: <6DAAB1E6-4AC7-4B08-8CAD-0D8584D039DE@gmail.com>
Date: Sat, 20 Oct 2012 14:42:56 +0300
Content-Transfer-Encoding: quoted-printable
Message-Id: <23D7CB3A-BD66-427E-A7F5-6C9D3890EE1B@gmail.com>
References: <937460294.2185822.1350093954059.JavaMail.root@erie.cs.uoguelph.ca>
 <302BF685-4B9D-49C8-8000-8D0F6540C8F7@gmail.com> <k5gtdh$nc0$1@ger.gmane.org>
 <0857D79A-6276-433F-9603-D52125CF190F@gmail.com>
 <CAF-QHFUU0hhtRNK1_p9zks2w+e22bfWOtv+XaqgFqTiURcJBbQ@mail.gmail.com>
 <6DAAB1E6-4AC7-4B08-8CAD-0D8584D039DE@gmail.com>
To: "freebsd-hackers@freebsd.org Hackers" <freebsd-hackers@freebsd.org>
X-Mailer: Apple Mail (2.1498)
Cc: Rick Macklem <rmacklem@uoguelph.ca>, Ivan Voras <ivoras@freebsd.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Oct 2012 11:43:07 -0000


On Oct 18, 2012, at 6:11 PM, Nikolay Denev <ndenev@gmail.com> wrote:

>=20
> On Oct 15, 2012, at 5:34 PM, Ivan Voras <ivoras@freebsd.org> wrote:
>=20
>> On 15 October 2012 16:31, Nikolay Denev <ndenev@gmail.com> wrote:
>>>=20
>>> On Oct 15, 2012, at 2:52 PM, Ivan Voras <ivoras@freebsd.org> wrote:
>>=20
>>>> http://people.freebsd.org/~ivoras/diffs/nfscache_lock.patch
>>>>=20
>>>> It should apply to HEAD without Rick's patches.
>>>>=20
>>>> It's a bit different approach than Rick's, breaking down locks even =
more.
>>>=20
>>> Applied and compiled OK, I will be able to test it tomorrow.
>>=20
>> Ok, thanks!
>>=20
>> The differences should be most visible in edge cases with a larger
>> number of nfsd processes (16+) and many CPU cores.
>=20
> I'm now rebooting with your patch, and hopefully will have some =
results tomorrow.
>=20

Here are the results from testing both patches : =
http://home.totalterror.net/freebsd/nfstest/results.html
Both tests ran for about 14 hours ( a bit too much, but I wanted to =
compare different zfs recordsize settings ),
and were done first after a fresh reboot.
The only noticeable difference seems to be many more context switches =
with Ivan's patch.


From owner-freebsd-hackers@FreeBSD.ORG  Sat Oct 20 12:11:57 2012
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id 906176FF
 for <freebsd-hackers@freebsd.org>; Sat, 20 Oct 2012 12:11:57 +0000 (UTC)
 (envelope-from ivoras@gmail.com)
Received: from mail-qa0-f54.google.com (mail-qa0-f54.google.com
 [209.85.216.54])
 by mx1.freebsd.org (Postfix) with ESMTP id 3BB248FC12
 for <freebsd-hackers@freebsd.org>; Sat, 20 Oct 2012 12:11:57 +0000 (UTC)
Received: by mail-qa0-f54.google.com with SMTP id p27so648224qat.13
 for <freebsd-hackers@freebsd.org>; Sat, 20 Oct 2012 05:11:50 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:sender:in-reply-to:references:from:date
 :x-google-sender-auth:message-id:subject:to:cc:content-type;
 bh=xImakIC7JV2zly/Y0OKoJJDqcQE9Q+/z4zufdJBv3uw=;
 b=GESJwgsCwPgkl8atHyIDbENcpbKKqfayG2jHSiuYK95Wfy7+uyuejr3G6W8dfL1OkZ
 Vz7tiIF/fMcHqtZiLjXqAQgesFctaWEhSI9DhaNGz+Oubj7cZqB+AT4pXh/uyIt8vHvJ
 Kx+VdiHFQ40N6qnsuZgk17bQIxgA/HP52b6cI+tewv8o4MeI/ZORpLmV8u8vZxk7mS6n
 79y/kEfxMRhK6jipij1IHkRzJlDc2HQw5XSy2NYJ8XVRlDVJC4K0J1q3qIqatL4ir7Cr
 2D/EfOEhlZFnE6ykipFQ7fs1QhyYNNRFNTfxeAZOOzNFMe0L/Nf4rSshzb4RFhcQCoLs
 u7Dw==
Received: by 10.224.178.4 with SMTP id bk4mr1928123qab.38.1350735110677; Sat,
 20 Oct 2012 05:11:50 -0700 (PDT)
MIME-Version: 1.0
Sender: ivoras@gmail.com
Received: by 10.49.82.231 with HTTP; Sat, 20 Oct 2012 05:11:10 -0700 (PDT)
In-Reply-To: <23D7CB3A-BD66-427E-A7F5-6C9D3890EE1B@gmail.com>
References: <937460294.2185822.1350093954059.JavaMail.root@erie.cs.uoguelph.ca>
 <302BF685-4B9D-49C8-8000-8D0F6540C8F7@gmail.com> <k5gtdh$nc0$1@ger.gmane.org>
 <0857D79A-6276-433F-9603-D52125CF190F@gmail.com>
 <CAF-QHFUU0hhtRNK1_p9zks2w+e22bfWOtv+XaqgFqTiURcJBbQ@mail.gmail.com>
 <6DAAB1E6-4AC7-4B08-8CAD-0D8584D039DE@gmail.com>
 <23D7CB3A-BD66-427E-A7F5-6C9D3890EE1B@gmail.com>
From: Ivan Voras <ivoras@freebsd.org>
Date: Sat, 20 Oct 2012 14:11:10 +0200
X-Google-Sender-Auth: PsZqbdk8UhFAMP1_lmPtiJzhIQk
Message-ID: <CAF-QHFWY0drcrUpo7GGD1zQNSDWsEeB_LHAjEbUKrX2ovQHNxw@mail.gmail.com>
Subject: Re: NFS server bottlenecks
To: Nikolay Denev <ndenev@gmail.com>
Content-Type: text/plain; charset=UTF-8
Cc: "freebsd-hackers@freebsd.org Hackers" <freebsd-hackers@freebsd.org>,
 Rick Macklem <rmacklem@uoguelph.ca>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Oct 2012 12:11:57 -0000

On 20 October 2012 13:42, Nikolay Denev <ndenev@gmail.com> wrote:

> Here are the results from testing both patches : http://home.totalterror.net/freebsd/nfstest/results.html
> Both tests ran for about 14 hours ( a bit too much, but I wanted to compare different zfs recordsize settings ),
> and were done first after a fresh reboot.
> The only noticeable difference seems to be much more context switches with Ivan's patch.

Thank you very much for your extensive testing!

I don't know how to interpret the rise in context switches; as this is
kernel code, I'd expect no context switches. I hope someone else can
explain.

But, you have also shown that my patch doesn't do any better than
Rick's even on a fairly large configuration, so I don't think there's
value in adding the extra complexity, and Rick knows NFS much better
than I do.

But there are a few other things I'm interested in: for example, why
does your load average spike almost into the 20s, and why, with 24
drives in RAID-10, do you only push 600 MBit/s through the 10 GBit/s
Ethernet? Have you tested your drive setup locally (AESNI shouldn't be
a bottleneck, you should be able to encrypt well into the Gbyte/s
range) and the network?

If you have the time, could you repeat the tests but with a recent
Samba server and a CIFS mount on the client side? This is probably not
important, but I'm just curious how it would perform on your
machine.

From owner-freebsd-hackers@FreeBSD.ORG  Sat Oct 20 12:46:42 2012
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id E8803C31;
 Sat, 20 Oct 2012 12:46:42 +0000 (UTC)
 (envelope-from rmacklem@uoguelph.ca)
Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca
 [131.104.91.44])
 by mx1.freebsd.org (Postfix) with ESMTP id 820768FC16;
 Sat, 20 Oct 2012 12:46:42 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: AqAEANebglCDaFvO/2dsb2JhbAA8CIYUvACCIAEBAQMBAQEBICsfAQsFFg4KAgINGQIjBgEJJgYIBwQBHASHUQMJBguoaohRDYlUgSCJUmgWBIVDgRIDk0FYgVWBF4oRhRCDC4FHNQ
X-IronPort-AV: E=Sophos;i="4.80,621,1344225600"; d="scan'208";a="184523521"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
 ([131.104.91.206])
 by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 20 Oct 2012 08:45:32 -0400
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
 by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 6AFCAB405E;
 Sat, 20 Oct 2012 08:45:32 -0400 (EDT)
Date: Sat, 20 Oct 2012 08:45:32 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Ivan Voras <ivoras@freebsd.org>
Message-ID: <191784842.2570110.1350737132305.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <CAF-QHFWY0drcrUpo7GGD1zQNSDWsEeB_LHAjEbUKrX2ovQHNxw@mail.gmail.com>
Subject: Re: NFS server bottlenecks
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [172.17.91.202]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692)
Cc: "freebsd-hackers@freebsd.org Hackers" <freebsd-hackers@freebsd.org>,
 Nikolay Denev <ndenev@gmail.com>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Oct 2012 12:46:43 -0000

Ivan Voras wrote:
> On 20 October 2012 13:42, Nikolay Denev <ndenev@gmail.com> wrote:
> 
> > Here are the results from testing both patches :
> > http://home.totalterror.net/freebsd/nfstest/results.html
> > Both tests ran for about 14 hours ( a bit too much, but I wanted to
> > compare different zfs recordsize settings ),
> > and were done first after a fresh reboot.
> > The only noticeable difference seems to be much more context
> > switches with Ivan's patch.
> 
> Thank you very much for your extensive testing!
> 
> I don't know how to interpret the rise in context switches; as this is
> kernel code, I'd expect no context switches. I hope someone else can
> explain.
> 
Don't the mtx_lock() calls spin for a little while and then context
switch if another thread still has it locked?
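
The behaviour asked about here (spin briefly, then sleep) can be sketched
in user space as below. This is only an analogy built on Python's
threading.Lock, not the kernel's mtx_lock() code; the function name
spin_then_block and the max_spins parameter are invented for the
illustration, and the sketch does not model how the kernel decides when to
stop spinning.

```python
import threading
import time


def spin_then_block(lock, max_spins=1000):
    """Acquire lock; return True if the caller had to fall back to blocking."""
    for _ in range(max_spins):
        if lock.acquire(blocking=False):
            return False  # acquired while spinning, no sleep needed
    # Spinning did not help: block until the holder releases the lock.
    # Going to sleep here is the point at which a context switch occurs.
    lock.acquire()
    return True


if __name__ == "__main__":
    lock = threading.Lock()
    started = threading.Event()

    def holder():
        with lock:
            started.set()
            time.sleep(0.1)  # hold the lock long enough to exhaust the spins

    t = threading.Thread(target=holder)
    t.start()
    started.wait()
    print("had to block:", spin_then_block(lock, max_spins=10))
    lock.release()
    t.join()
```

With more threads contending for the same lock, more acquisitions reach
the blocking path, which is one way a locking change could raise the
context switch count.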

> But, you have also shown that my patch doesn't do any better than
> Rick's even on a fairly large configuration, so I don't think there's
> value in adding the extra complexity, and Rick knows NFS much better
> than I do.
> 
Hmm, I didn't look, but were there any tests using UDP mounts?
(I would have thought that your patch would mainly affect UDP mounts,
 since that is when my version still has the single LRU queue/mutex.
 As I think you know, my concern with your patch would be correctness
 for UDP, not performance.)
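
For readers following along, the general trade-off between one mutex for
the whole cache and one mutex per hash bucket can be sketched as below.
This is a generic illustration only: it is neither patch, the class name
StripedCache and its methods are invented, and it deliberately has no LRU
queue, which is the part that is hard to split up.

```python
import threading


class StripedCache:
    """Hash table where each bucket is protected by its own lock."""

    def __init__(self, nbuckets=16):
        self._buckets = [dict() for _ in range(nbuckets)]
        self._locks = [threading.Lock() for _ in range(nbuckets)]

    def _index(self, key):
        return hash(key) % len(self._buckets)

    def put(self, key, value):
        i = self._index(key)
        with self._locks[i]:  # only this bucket is serialized
            self._buckets[i][key] = value

    def get(self, key, default=None):
        i = self._index(key)
        with self._locks[i]:
            return self._buckets[i].get(key, default)


if __name__ == "__main__":
    cache = StripedCache(nbuckets=4)
    cache.put(42, "reply for xid 42")  # hypothetical key and value
    print(cache.get(42))
```

Threads that touch different buckets no longer contend with each other,
but anything that needs a global view (such as a single LRU ordering)
still needs some form of shared lock.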

Anyhow, sounds like you guys are having fun with it and learning
some useful things.

Keep up the good work, rick
> But there are a few things other than that I'm interested in: like why
> does your load average spike almost to 20-ties, and how come that with
> 24 drives in RAID-10 you only push through 600 MBit/s through the 10
> GBit/s Ethernet. Have you tested your drive setup locally (AESNI
> shouldn't be a bottleneck, you should be able to encrypt well into
> Gbyte/s range) and the network?
> 
> If you have the time, could you repeat the tests but with a recent
> Samba server and a CIFS mount on the client side? This is probably not
> important, but I'm just curious of how would it perform on your
> machine.

From owner-freebsd-hackers@FreeBSD.ORG  Sat Oct 20 12:52:13 2012
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id 831A6DB6;
 Sat, 20 Oct 2012 12:52:13 +0000 (UTC)
 (envelope-from ndenev@gmail.com)
Received: from mail-we0-f182.google.com (mail-we0-f182.google.com
 [74.125.82.182])
 by mx1.freebsd.org (Postfix) with ESMTP id D21198FC16;
 Sat, 20 Oct 2012 12:52:12 +0000 (UTC)
Received: by mail-we0-f182.google.com with SMTP id x43so869989wey.13
 for <multiple recipients>; Sat, 20 Oct 2012 05:52:06 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=subject:mime-version:content-type:from:in-reply-to:date:cc
 :content-transfer-encoding:message-id:references:to:x-mailer;
 bh=KX31H4vAEQx/m8YgRZMoSUYzLTD6QAl62NO0N3TWDjA=;
 b=YYVIrXg0YBzHT5gCRanTroqgspOLz/h+aXR+CUzwyRqVU4sa/gdUfLnVibOOCS+gLp
 Xy8SIdbiq10qfQgWvixOmsEBnnmaQlPFO7L6+/6AODlPFEk3MKXwTDqmPKztQlLe6TqM
 4bGgvspLBIoa81WRgewcqn8rcFkxQp/73X6EaEBjZb4brzXhMCik7nAnhiaZQH7sw44C
 kyuvXGatHqNeDW6pi2FO2+VMk5TXcoOzzDT/KL1t2Nj6jmV6hxBl0Ugdi6Q2nsrNSVCk
 +sFDPUAo93DFcWTbyZpmMwL+50Ksa8wgk7Op5lJn6nkZN+uUXFFuxupQPnRwnb8mmbBR
 Q5bg==
Received: by 10.180.87.74 with SMTP id v10mr9385146wiz.21.1350737526192;
 Sat, 20 Oct 2012 05:52:06 -0700 (PDT)
Received: from [10.0.0.86] ([93.152.184.10])
 by mx.google.com with ESMTPS id v3sm9416314wiy.5.2012.10.20.05.52.04
 (version=TLSv1/SSLv3 cipher=OTHER);
 Sat, 20 Oct 2012 05:52:05 -0700 (PDT)
Subject: Re: NFS server bottlenecks
Mime-Version: 1.0 (Mac OS X Mail 6.1 \(1498\))
Content-Type: text/plain; charset=windows-1252
From: Nikolay Denev <ndenev@gmail.com>
In-Reply-To: <CAF-QHFWY0drcrUpo7GGD1zQNSDWsEeB_LHAjEbUKrX2ovQHNxw@mail.gmail.com>
Date: Sat, 20 Oct 2012 15:52:03 +0300
Content-Transfer-Encoding: quoted-printable
Message-Id: <942B9B96-7F2B-4833-865F-33DDCCA3500A@gmail.com>
References: <937460294.2185822.1350093954059.JavaMail.root@erie.cs.uoguelph.ca>
 <302BF685-4B9D-49C8-8000-8D0F6540C8F7@gmail.com> <k5gtdh$nc0$1@ger.gmane.org>
 <0857D79A-6276-433F-9603-D52125CF190F@gmail.com>
 <CAF-QHFUU0hhtRNK1_p9zks2w+e22bfWOtv+XaqgFqTiURcJBbQ@mail.gmail.com>
 <6DAAB1E6-4AC7-4B08-8CAD-0D8584D039DE@gmail.com>
 <23D7CB3A-BD66-427E-A7F5-6C9D3890EE1B@gmail.com>
 <CAF-QHFWY0drcrUpo7GGD1zQNSDWsEeB_LHAjEbUKrX2ovQHNxw@mail.gmail.com>
To: Ivan Voras <ivoras@freebsd.org>
X-Mailer: Apple Mail (2.1498)
Cc: "freebsd-hackers@freebsd.org Hackers" <freebsd-hackers@freebsd.org>,
 Rick Macklem <rmacklem@uoguelph.ca>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Oct 2012 12:52:13 -0000


On Oct 20, 2012, at 3:11 PM, Ivan Voras <ivoras@freebsd.org> wrote:

> On 20 October 2012 13:42, Nikolay Denev <ndenev@gmail.com> wrote:
>=20
>> Here are the results from testing both patches : =
http://home.totalterror.net/freebsd/nfstest/results.html
>> Both tests ran for about 14 hours ( a bit too much, but I wanted to =
compare different zfs recordsize settings ),
>> and were done first after a fresh reboot.
>> The only noticeable difference seems to be much more context switches =
with Ivan's patch.
>=20
> Thank you very much for your extensive testing!
>=20
> I don't know how to interpret the rise in context switches; as this is
> kernel code, I'd expect no context switches. I hope someone else can
> explain.
>=20
> But, you have also shown that my patch doesn't do any better than
> Rick's even on a fairly large configuration, so I don't think there's
> value in adding the extra complexity, and Rick knows NFS much better
> than I do.
>=20
> But there are a few things other than that I'm interested in: like why
> does your load average spike almost to 20-ties, and how come that with
> 24 drives in RAID-10 you only push through 600 MBit/s through the 10
> GBit/s Ethernet. Have you tested your drive setup locally (AESNI
> shouldn't be a bottleneck, you should be able to encrypt well into
> Gbyte/s range) and the network?
>=20
> If you have the time, could you repeat the tests but with a recent
> Samba server and a CIFS mount on the client side? This is probably not
> important, but I'm just curious of how would it perform on your
> machine.

I've now started this test locally.
But from previous different iozone runs, I remember locally the speed =
was much better,
but I will wait for this test to finish, as the comparison will be =
better.

But I think there is still something fishy... I have cases where I have =
reached 1000MB/s over NFS
(from network stats, not local machine stats), but sometimes it is very =
slow even for a
file completely in ARC. Rick mentioned that this could be due to RPC =
overhead and network round trip time, but
earlier in this thread I did a test only on the server by mounting =
the NFS exported ZFS dataset locally and running some tests with "dd":

> To take the network out of the equation I redid the test by mounting =
the same filesystem over NFS on the server:
>=20
> [18:23]root@goliath:~#  mount -t nfs -o =
rw,hard,intr,tcp,nfsv3,rsize=3D1048576,wsize=3D1048576 =
localhost:/tank/spa_db/undo /mnt
> [18:24]root@goliath:~# dd if=3D/mnt/data.dbf of=3D/dev/null bs=3D1M=20
> 30720+1 records in
> 30720+1 records out
> 32212262912 bytes transferred in 79.793343 secs (403696120 bytes/sec)
> [18:25]root@goliath:~# dd if=3D/mnt/data.dbf of=3D/dev/null bs=3D1M
> 30720+1 records in
> 30720+1 records out
> 32212262912 bytes transferred in 12.033420 secs (2676900110 bytes/sec)
>=20
> During the first run I saw several nfsd threads in top, along with dd =
and again zero disk I/O.
> There was an increase in memory usage because of the double buffering =
ARC->buffercache.
> The second run was with all of the nfsd threads totally idle, and read =
directly from the buffercache.
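
To put the two dd runs quoted above next to the 10 GBit/s link speed, the
byte counts and times can be converted as in the small sketch below. The
numbers are copied from the dd output; the helper name throughput is just
for illustration.

```python
def throughput(nbytes, seconds):
    """Return (bytes per second, Gbit per second) for one transfer."""
    bps = nbytes / seconds
    return bps, bps * 8 / 1e9


NBYTES = 32212262912  # size reported by dd for both runs
runs = {"first run": 79.793343, "second run": 12.033420}

for label, secs in runs.items():
    bps, gbit = throughput(NBYTES, secs)
    print(f"{label}: {bps:,.0f} bytes/s, about {gbit:.1f} Gbit/s")
```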


From owner-freebsd-hackers@FreeBSD.ORG  Sat Oct 20 13:00:11 2012
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id 377D7937;
 Sat, 20 Oct 2012 13:00:11 +0000 (UTC)
 (envelope-from ndenev@gmail.com)
Received: from mail-wg0-f42.google.com (mail-wg0-f42.google.com [74.125.82.42])
 by mx1.freebsd.org (Postfix) with ESMTP id 8D14D8FC17;
 Sat, 20 Oct 2012 13:00:10 +0000 (UTC)
Received: by mail-wg0-f42.google.com with SMTP id fm10so652080wgb.1
 for <multiple recipients>; Sat, 20 Oct 2012 06:00:04 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=subject:mime-version:content-type:from:in-reply-to:date:cc
 :content-transfer-encoding:message-id:references:to:x-mailer;
 bh=iCf7zOwb41trqzi1oKnj/9ho1Rc051rOTPPSVQNGpsA=;
 b=SvfoV3DishYCLiATdfSC6iMxwLWV2YTRdQibwtVpKlkYIKSJGrM2P3sXPIZZogoO+n
 Rzab3c4axJAyeHH+EWVmB8aIPQCaJpNEEAuoNfZRnK/7gAzxHcTrXgnyr/1DjsgpkZxt
 iRSvorX6nINTW4kZRTC2jH0zE9sPAAqcBXEAuQUV4v387UHnNG01a6xIUfEe6Ev6IhTy
 VF/9yWjXuCLn1bJSbk3NNb+L15vLjRg1386rHXRiBxVlYHilRg84zqjDm9h57NPQilmJ
 zSzcXr3jzitgbULLmGZ+ROE0SC0B2eDCPho6NVWIxKQhcTdebLWX6aoY8H//Zh2qAJ6x
 nDEA==
Received: by 10.216.203.1 with SMTP id e1mr2538238weo.103.1350738004180;
 Sat, 20 Oct 2012 06:00:04 -0700 (PDT)
Received: from [10.0.0.86] ([93.152.184.10])
 by mx.google.com with ESMTPS id ay10sm9461879wib.2.2012.10.20.06.00.02
 (version=TLSv1/SSLv3 cipher=OTHER);
 Sat, 20 Oct 2012 06:00:03 -0700 (PDT)
Subject: Re: NFS server bottlenecks
Mime-Version: 1.0 (Mac OS X Mail 6.1 \(1498\))
Content-Type: text/plain; charset=us-ascii
From: Nikolay Denev <ndenev@gmail.com>
In-Reply-To: <CAF-QHFWY0drcrUpo7GGD1zQNSDWsEeB_LHAjEbUKrX2ovQHNxw@mail.gmail.com>
Date: Sat, 20 Oct 2012 16:00:01 +0300
Content-Transfer-Encoding: quoted-printable
Message-Id: <C10B14C4-943E-47CC-B6A7-4596A2D11D73@gmail.com>
References: <937460294.2185822.1350093954059.JavaMail.root@erie.cs.uoguelph.ca>
 <302BF685-4B9D-49C8-8000-8D0F6540C8F7@gmail.com> <k5gtdh$nc0$1@ger.gmane.org>
 <0857D79A-6276-433F-9603-D52125CF190F@gmail.com>
 <CAF-QHFUU0hhtRNK1_p9zks2w+e22bfWOtv+XaqgFqTiURcJBbQ@mail.gmail.com>
 <6DAAB1E6-4AC7-4B08-8CAD-0D8584D039DE@gmail.com>
 <23D7CB3A-BD66-427E-A7F5-6C9D3890EE1B@gmail.com>
 <CAF-QHFWY0drcrUpo7GGD1zQNSDWsEeB_LHAjEbUKrX2ovQHNxw@mail.gmail.com>
To: Ivan Voras <ivoras@freebsd.org>
X-Mailer: Apple Mail (2.1498)
Cc: "freebsd-hackers@freebsd.org Hackers" <freebsd-hackers@freebsd.org>,
 Rick Macklem <rmacklem@uoguelph.ca>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Oct 2012 13:00:11 -0000


On Oct 20, 2012, at 3:11 PM, Ivan Voras <ivoras@freebsd.org> wrote:

> On 20 October 2012 13:42, Nikolay Denev <ndenev@gmail.com> wrote:
>=20
>> Here are the results from testing both patches : =
http://home.totalterror.net/freebsd/nfstest/results.html
>> Both tests ran for about 14 hours ( a bit too much, but I wanted to =
compare different zfs recordsize settings ),
>> and were done first after a fresh reboot.
>> The only noticeable difference seems to be much more context switches =
with Ivan's patch.
>=20
> Thank you very much for your extensive testing!
>=20
> I don't know how to interpret the rise in context switches; as this is
> kernel code, I'd expect no context switches. I hope someone else can
> explain.
>=20
> But, you have also shown that my patch doesn't do any better than
> Rick's even on a fairly large configuration, so I don't think there's
> value in adding the extra complexity, and Rick knows NFS much better
> than I do.
>=20
> But there are a few things other than that I'm interested in: like why
> does your load average spike almost to 20-ties, and how come that with
> 24 drives in RAID-10 you only push through 600 MBit/s through the 10
> GBit/s Ethernet. Have you tested your drive setup locally (AESNI
> shouldn't be a bottleneck, you should be able to encrypt well into
> Gbyte/s range) and the network?
>=20
> If you have the time, could you repeat the tests but with a recent
> Samba server and a CIFS mount on the client side? This is probably not
> important, but I'm just curious of how would it perform on your
> machine.

The first iozone local run finished. I'll paste just the result here,
and also the same test over NFS for comparison:
(This is iozone doing 8k-sized I/O ops, on a ZFS dataset with recordsize=8k)

NFS:
                                                     random  random    bkwd   record  stride
        KB  reclen   write  rewrite    read  reread    read   write    read  rewrite    read
  33554432       8    4973     5522    2930    2906    2908    3886

Local:
                                                     random  random    bkwd   record  stride
        KB  reclen   write  rewrite    read  reread    read   write    read  rewrite    read
  33554432       8   34740    41390  135442  142534   24992   12493
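
To put the two result rows side by side, here is a minimal Python sketch that converts the NFS figures to Mbit/s and computes the local/NFS ratio per test. It assumes the iozone numbers are in its default unit of KB/s with 1 KB = 1024 bytes; the function and variable names are made up for this illustration.

```python
# iozone figures copied from the NFS and Local rows above.
# Assumption: values are KB/s (iozone's default), with 1 KB = 1024 bytes.
TESTS = ["write", "rewrite", "read", "reread", "random read", "random write"]
NFS = [4973, 5522, 2930, 2906, 2908, 3886]
LOCAL = [34740, 41390, 135442, 142534, 24992, 12493]


def kb_per_s_to_mbit_per_s(kb_per_s):
    """Convert KB/s (1024-byte units) to Mbit/s (10**6 bits per second)."""
    return kb_per_s * 1024 * 8 / 1e6


def compare(tests, nfs, local):
    """Return (test name, NFS throughput in Mbit/s, local/NFS ratio) tuples."""
    return [
        (name, kb_per_s_to_mbit_per_s(n), l / n)
        for name, n, l in zip(tests, nfs, local)
    ]


if __name__ == "__main__":
    for name, mbit, ratio in compare(TESTS, NFS, LOCAL):
        print(f"{name:>12}: NFS {mbit:6.1f} Mbit/s, local/NFS ratio {ratio:5.1f}")
```

Under that unit assumption, the 8k NFS write figure works out to roughly 41 Mbit/s, and the local sequential read is about 46 times the NFS one.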


P.S.: I forgot to mention that the network uses a 9K MTU.

From owner-freebsd-hackers@FreeBSD.ORG  Sat Oct 20 18:58:10 2012
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id 6E1DECDC;
 Sat, 20 Oct 2012 18:58:10 +0000 (UTC)
 (envelope-from ndenev@gmail.com)
Received: from mail-wg0-f50.google.com (mail-wg0-f50.google.com [74.125.82.50])
 by mx1.freebsd.org (Postfix) with ESMTP id C5CC28FC0A;
 Sat, 20 Oct 2012 18:58:09 +0000 (UTC)
Received: by mail-wg0-f50.google.com with SMTP id 16so1196254wgi.31
 for <multiple recipients>; Sat, 20 Oct 2012 11:58:08 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=subject:mime-version:content-type:from:in-reply-to:date:cc
 :content-transfer-encoding:message-id:references:to:x-mailer;
 bh=Hx/e2v4Q3S5kLxIH99nIJbATi8H1lEf/1RUNEZ222qQ=;
 b=NEjvDy2+6+IO0RzvPcpzx03w7NGsTuI9ph0+wtlA8tevitJ6aCDNy0fZRurIldtxHk
 y2TKZWFgAjxUjxk2LN7rTob0EB0+provIFCrtJkLijGjEm9gLnJqy0gwa3wVeoun3+g4
 rDYWF4EPVXeGPbOgcEY9QO7CdtpPonp3hmFrCdaG+wXHU8djJ1XqUXxL6b5Uw4UOpdQZ
 C5J58f+Q0XY4sABI+8zqE1E7/mfK8ESvNzdjvk2jmw//7jmMhi4V2jpU95MZZU1DHwBt
 ZVh6AfOnKLOgYcVl2ycCaM2zJURWZvV7Q4tlcxe4qGyEY5IjKcLcHAMCbEOMQ2b56hCu
 7uBw==
Received: by 10.180.106.9 with SMTP id gq9mr10810119wib.12.1350759488552;
 Sat, 20 Oct 2012 11:58:08 -0700 (PDT)
Received: from [10.0.0.86] ([93.152.184.10])
 by mx.google.com with ESMTPS id w8sm41666214wif.4.2012.10.20.11.58.05
 (version=TLSv1/SSLv3 cipher=OTHER);
 Sat, 20 Oct 2012 11:58:07 -0700 (PDT)
Subject: Re: NFS server bottlenecks
Mime-Version: 1.0 (Mac OS X Mail 6.1 \(1498\))
Content-Type: text/plain; charset=us-ascii
From: Nikolay Denev <ndenev@gmail.com>
In-Reply-To: <C10B14C4-943E-47CC-B6A7-4596A2D11D73@gmail.com>
Date: Sat, 20 Oct 2012 21:58:03 +0300
Content-Transfer-Encoding: quoted-printable
Message-Id: <BD7A192B-6208-4560-A955-A4B3C3563E5F@gmail.com>
References: <937460294.2185822.1350093954059.JavaMail.root@erie.cs.uoguelph.ca>
 <302BF685-4B9D-49C8-8000-8D0F6540C8F7@gmail.com> <k5gtdh$nc0$1@ger.gmane.org>
 <0857D79A-6276-433F-9603-D52125CF190F@gmail.com>
 <CAF-QHFUU0hhtRNK1_p9zks2w+e22bfWOtv+XaqgFqTiURcJBbQ@mail.gmail.com>
 <6DAAB1E6-4AC7-4B08-8CAD-0D8584D039DE@gmail.com>
 <23D7CB3A-BD66-427E-A7F5-6C9D3890EE1B@gmail.com>
 <CAF-QHFWY0drcrUpo7GGD1zQNSDWsEeB_LHAjEbUKrX2ovQHNxw@mail.gmail.com>
 <C10B14C4-943E-47CC-B6A7-4596A2D11D73@gmail.com>
To: "freebsd-hackers@freebsd.org Hackers" <freebsd-hackers@freebsd.org>
X-Mailer: Apple Mail (2.1498)
Cc: Rick Macklem <rmacklem@uoguelph.ca>, Ivan Voras <ivoras@freebsd.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Oct 2012 18:58:10 -0000


On Oct 20, 2012, at 4:00 PM, Nikolay Denev <ndenev@gmail.com> wrote:

>=20
> On Oct 20, 2012, at 3:11 PM, Ivan Voras <ivoras@freebsd.org> wrote:
>=20
>> On 20 October 2012 13:42, Nikolay Denev <ndenev@gmail.com> wrote:
>>=20
>>> Here are the results from testing both patches : =
http://home.totalterror.net/freebsd/nfstest/results.html
>>> Both tests ran for about 14 hours ( a bit too much, but I wanted to =
compare different zfs recordsize settings ),
>>> and were done first after a fresh reboot.
>>> The only noticeable difference seems to be much more context =
switches with Ivan's patch.
>>=20
>> Thank you very much for your extensive testing!
>>=20
>> I don't know how to interpret the rise in context switches; as this =
is
>> kernel code, I'd expect no context switches. I hope someone else can
>> explain.
>>=20
>> But, you have also shown that my patch doesn't do any better than
>> Rick's even on a fairly large configuration, so I don't think there's
>> value in adding the extra complexity, and Rick knows NFS much better
>> than I do.
>>=20
>> But there are a few things other than that I'm interested in: like =
why
>> does your load average spike almost to 20-ties, and how come that =
with
>> 24 drives in RAID-10 you only push through 600 MBit/s through the 10
>> GBit/s Ethernet. Have you tested your drive setup locally (AESNI
>> shouldn't be a bottleneck, you should be able to encrypt well into
>> Gbyte/s range) and the network?
>>=20
>> If you have the time, could you repeat the tests but with a recent
>> Samba server and a CIFS mount on the client side? This is probably =
not
>> important, but I'm just curious of how would it perform on your
>> machine.
>=20
> The first iozone local run finished, I'll paste just the result here, =
and also the same test over NFS for comparison:
> (This is iozone doing 8k sized IO ops, on ZFS dataset with =
recordsize=3D8k)
>=20
> NFS:
>                                                            random  =
random    bkwd   record   stride                                  =20
>              KB  reclen   write rewrite    read    reread    read   =
write    read  rewrite     read                                  =20
>        33554432       8    4973    5522     2930     2906    2908    =
3886                                         =20
>=20
> Local:
>                                                            random  =
random    bkwd   record   stride                                  =20
>              KB  reclen   write rewrite    read    reread    read   =
write    read  rewrite     read                                  =20
>        33554432       8   34740   41390   135442   142534   24992   =
12493                                         =20
>=20
>=20
> P.S.: I forgot to mention that the network is with 9K mtu.


Here are the full results of the test on the local fs:

http://home.totalterror.net/freebsd/nfstest/local_fs/

I'm now running the same test on an NFS mount over the loopback
interface on the NFS server machine.


From owner-freebsd-hackers@FreeBSD.ORG  Sat Oct 20 19:29:08 2012
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id 16EA27D3
 for <freebsd-hackers@freebsd.org>; Sat, 20 Oct 2012 19:29:08 +0000 (UTC)
 (envelope-from ivoras@gmail.com)
Received: from mail-vb0-f54.google.com (mail-vb0-f54.google.com
 [209.85.212.54])
 by mx1.freebsd.org (Postfix) with ESMTP id B6B238FC08
 for <freebsd-hackers@freebsd.org>; Sat, 20 Oct 2012 19:29:07 +0000 (UTC)
Received: by mail-vb0-f54.google.com with SMTP id v11so2112709vbm.13
 for <freebsd-hackers@freebsd.org>; Sat, 20 Oct 2012 12:29:07 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:sender:in-reply-to:references:from:date
 :x-google-sender-auth:message-id:subject:to:cc:content-type;
 bh=zKMs9CrwkpCz0yGkAB3OOu/AB2aht1AGy/PjsjjeaF8=;
 b=Y8kzFmDttr353vWofQKNUaa4zNA7vsVjHTKXZPS1VIg2JJUo4dbmmLe9wfXQbaf8Eu
 U+efeeeI1f287OK8Q9AC93Zgk8sywuBkvDuTEqMmU2qkfTmu1eT3WQbw0TBOou+5N1Cp
 cQkq0n2hUGLPGrT01+s0LRAoDZUz9iXfGyJBJ6aTh0ngsL0V+k5LnjC3C3+CHrcZLJSI
 nIvxEq2khvUBZl6NJibuEBOA3k6DguLhdMha9CM2VhcmqsLhmFIA1a+Pau//qcsm9JWX
 Uzxy8C2Lt2gQjoLI1xrieVbOJVifRsA79BbABPbGZkk/77oz2mia81u2Is6p2s+6UafD
 5qxA==
Received: by 10.220.208.141 with SMTP id gc13mr7125636vcb.55.1350761346877;
 Sat, 20 Oct 2012 12:29:06 -0700 (PDT)
MIME-Version: 1.0
Sender: ivoras@gmail.com
Received: by 10.59.0.37 with HTTP; Sat, 20 Oct 2012 12:28:25 -0700 (PDT)
In-Reply-To: <191784842.2570110.1350737132305.JavaMail.root@erie.cs.uoguelph.ca>
References: <CAF-QHFWY0drcrUpo7GGD1zQNSDWsEeB_LHAjEbUKrX2ovQHNxw@mail.gmail.com>
 <191784842.2570110.1350737132305.JavaMail.root@erie.cs.uoguelph.ca>
From: Ivan Voras <ivoras@freebsd.org>
Date: Sat, 20 Oct 2012 21:28:25 +0200
X-Google-Sender-Auth: sdtoGD1C3xtDti_eUN65Oyy525w
Message-ID: <CAF-QHFXB=yfD2EPoQf4C8YyX=0BA0Awndg0QNsWO8_rq=StHhQ@mail.gmail.com>
Subject: Re: NFS server bottlenecks
To: Rick Macklem <rmacklem@uoguelph.ca>
Content-Type: text/plain; charset=UTF-8
Cc: "freebsd-hackers@freebsd.org Hackers" <freebsd-hackers@freebsd.org>,
 Nikolay Denev <ndenev@gmail.com>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Oct 2012 19:29:08 -0000

On 20 October 2012 14:45, Rick Macklem <rmacklem@uoguelph.ca> wrote:
> Ivan Voras wrote:

>> I don't know how to interpret the rise in context switches; as this is
>> kernel code, I'd expect no context switches. I hope someone else can
>> explain.
>>
> Don't the mtx_lock() calls spin for a little while and then context
> switch if another thread still has it locked?

Yes, but are in-kernel context switches also counted? I was assuming
they are light-weight enough not to count.
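
For readers unfamiliar with the "spin for a little while, then context switch" behaviour mentioned above, here is a toy user-space sketch in Python. It is only an analogy and not the FreeBSD mtx_lock() implementation: the class name, the spin_limit knob, and the counters are hypothetical, and the kernel's real policy for when to spin and when to block differs.

```python
import threading


class SpinThenBlockLock:
    """Toy lock: try a bounded number of non-blocking acquires, then block."""

    def __init__(self, spin_limit=100):
        self._lock = threading.Lock()
        self.spin_limit = spin_limit   # hypothetical tuning knob
        self.spin_acquires = 0         # acquired during the spin phase
        self.blocked_acquires = 0      # had to sleep until the holder released

    def acquire(self):
        # Spin phase: poll the lock without giving up the CPU voluntarily.
        for _ in range(self.spin_limit):
            if self._lock.acquire(blocking=False):
                self.spin_acquires += 1   # safe: we hold the lock here
                return
        # Still contended: block, which lets the scheduler run another thread.
        self._lock.acquire()
        self.blocked_acquires += 1        # safe: we hold the lock here

    def release(self):
        self._lock.release()
```

In this sketch, every increment of blocked_acquires corresponds to a thread that gave up the CPU while waiting, which is the kind of event that would show up as extra context switches under contention.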

> Hmm, I didn't look, but were there any tests using UDP mounts?
> (I would have thought that your patch would mainly affect UDP mounts,
>  since that is when my version still has the single LRU queue/mutex.

Another assumption - I thought UDP was the default.

>  As I think you know, my concern with your patch would be correctness
>  for UDP, not performance.)

Yes.

From owner-freebsd-hackers@FreeBSD.ORG  Sat Oct 20 19:45:06 2012
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id 7DFB6104;
 Sat, 20 Oct 2012 19:45:06 +0000 (UTC)
 (envelope-from outbackdingo@gmail.com)
Received: from mail-ia0-f182.google.com (mail-ia0-f182.google.com
 [209.85.210.182])
 by mx1.freebsd.org (Postfix) with ESMTP id 3203D8FC0A;
 Sat, 20 Oct 2012 19:45:05 +0000 (UTC)
Received: by mail-ia0-f182.google.com with SMTP id k10so1515452iag.13
 for <multiple recipients>; Sat, 20 Oct 2012 12:45:05 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:in-reply-to:references:date:message-id:subject:from:to
 :cc:content-type;
 bh=8EPTniMXRD0pdaiNVYFGJU3u0zv3LD/ySgEjTeuarng=;
 b=O0PDbcluqJ1TkFSx3vixPNI+BIYBGc6/AX8KJ9e0pPp8K1mZx+QMgu1J+Yvk4d9wcL
 rPTRVKLVGLvzD+LmvS0k9WYpGrfMf5G1vINQPPaGODJobmdph450P6mWw1wHvLnHrBgv
 k6uQfagrvpFnkITXKuYF5l497raTryWdZTtNFXY0Db53zEVEWT1S9epU4FoWPHFhTw2c
 JycUhfFV10kd1TQWk0Av5QEeZ0k3JoYJn4+W/wvSQIPv9FTvg4QKR2VdHzxw9KbGWbYr
 1HZVFzs1JI+JGDO6SEt5qzmgkzLESfhQf2Y4VBW+TN86cWlMMMWrdEtE1BsbKCzKUxtP
 /E7w==
MIME-Version: 1.0
Received: by 10.50.40.225 with SMTP id a1mr5939442igl.7.1350762305625; Sat, 20
 Oct 2012 12:45:05 -0700 (PDT)
Received: by 10.64.72.135 with HTTP; Sat, 20 Oct 2012 12:45:05 -0700 (PDT)
In-Reply-To: <CAF-QHFXB=yfD2EPoQf4C8YyX=0BA0Awndg0QNsWO8_rq=StHhQ@mail.gmail.com>
References: <CAF-QHFWY0drcrUpo7GGD1zQNSDWsEeB_LHAjEbUKrX2ovQHNxw@mail.gmail.com>
 <191784842.2570110.1350737132305.JavaMail.root@erie.cs.uoguelph.ca>
 <CAF-QHFXB=yfD2EPoQf4C8YyX=0BA0Awndg0QNsWO8_rq=StHhQ@mail.gmail.com>
Date: Sat, 20 Oct 2012 15:45:05 -0400
Message-ID: <CAKYr3zzvb+iJzX8zmxUeH1ZjNEnc1FuuE5SdmYUAgQH84O64Mg@mail.gmail.com>
Subject: Re: NFS server bottlenecks
From: Outback Dingo <outbackdingo@gmail.com>
To: Ivan Voras <ivoras@freebsd.org>
Content-Type: text/plain; charset=ISO-8859-1
Cc: "freebsd-hackers@freebsd.org Hackers" <freebsd-hackers@freebsd.org>,
 Rick Macklem <rmacklem@uoguelph.ca>, Nikolay Denev <ndenev@gmail.com>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Oct 2012 19:45:06 -0000

On Sat, Oct 20, 2012 at 3:28 PM, Ivan Voras <ivoras@freebsd.org> wrote:
> On 20 October 2012 14:45, Rick Macklem <rmacklem@uoguelph.ca> wrote:
>> Ivan Voras wrote:
>
>>> I don't know how to interpret the rise in context switches; as this is
>>> kernel code, I'd expect no context switches. I hope someone else can
>>> explain.
>>>
>> Don't the mtx_lock() calls spin for a little while and then context
>> switch if another thread still has it locked?
>
> Yes, but are in-kernel context switches also counted? I was assuming
> they are light-weight enough not to count.
>
>> Hmm, I didn't look, but were there any tests using UDP mounts?
>> (I would have thought that your patch would mainly affect UDP mounts,
>>  since that is when my version still has the single LRU queue/mutex.
>
> Another assumption - I thought UDP was the default.
>
>>  As I think you know, my concern with your patch would be correctness
>>  for UDP, not performance.)
>
> Yes.

I've got a similar box config here, with 2x 10Gb Intel NICs and 24 2TB
drives on an LSI controller.
I'm watching the thread patiently; I'm kind of looking for results and
answers, though I'm also tempted to run benchmarks on my system to see
if I get similar results. I also considered that netmap might be one,
but I'm not quite sure it would help NFS, since it's too hard to tell
whether it's a network bottleneck, though it appears to be network
related.

> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"

From owner-freebsd-hackers@FreeBSD.ORG  Sat Oct 20 19:53:36 2012
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id 94974295;
 Sat, 20 Oct 2012 19:53:36 +0000 (UTC)
 (envelope-from ndenev@gmail.com)
Received: from mail-wg0-f50.google.com (mail-wg0-f50.google.com [74.125.82.50])
 by mx1.freebsd.org (Postfix) with ESMTP id E1B638FC08;
 Sat, 20 Oct 2012 19:53:35 +0000 (UTC)
Received: by mail-wg0-f50.google.com with SMTP id 16so1215971wgi.31
 for <multiple recipients>; Sat, 20 Oct 2012 12:53:34 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=subject:mime-version:content-type:from:in-reply-to:date:cc
 :content-transfer-encoding:message-id:references:to:x-mailer;
 bh=8OJs8Z6barj7ClQvDt1qnWXKmAlqaQhZpsG86iyrlYY=;
 b=o0kEJq7puhWUgm0D7pR0Ip1ghaczPi22n4Z43E4NvvZBRBxDn8I1XUmh+iq3DrDhCt
 V/rl+4qicmG4rtUC+W4m31Lv7JDaDiJqZoddMD7oP0KDHfXBkjouHSKOHwuRHazWc9J7
 xPDOn4rMBksRxQxeCkkwK1/HFUCGHx3E4AiDaARFInS2fPUf2KVPvnvJt9cidHYW54Wo
 UgNpBU9hMOdr/KXweVgyT2u92nh9aCxwr0yS5j8Z9CwlT986sqoEXuuHLAe4Hcv1loXT
 h/QBVut176+6+s59dkab6SEwB5CC5RRS3rPXQFymTmBHknrxgWt9qHD8Tf6k+7giZQSj
 JrZQ==
Received: by 10.180.95.130 with SMTP id dk2mr10928240wib.18.1350762814815;
 Sat, 20 Oct 2012 12:53:34 -0700 (PDT)
Received: from [10.0.0.86] ([93.152.184.10])
 by mx.google.com with ESMTPS id eq2sm11472426wib.1.2012.10.20.12.53.33
 (version=TLSv1/SSLv3 cipher=OTHER);
 Sat, 20 Oct 2012 12:53:34 -0700 (PDT)
Subject: Re: NFS server bottlenecks
Mime-Version: 1.0 (Mac OS X Mail 6.1 \(1498\))
Content-Type: text/plain; charset=iso-8859-1
From: Nikolay Denev <ndenev@gmail.com>
In-Reply-To: <CAKYr3zzvb+iJzX8zmxUeH1ZjNEnc1FuuE5SdmYUAgQH84O64Mg@mail.gmail.com>
Date: Sat, 20 Oct 2012 22:53:31 +0300
Content-Transfer-Encoding: quoted-printable
Message-Id: <A16B8A34-D365-4A4B-9EB4-F2419B6BC430@gmail.com>
References: <CAF-QHFWY0drcrUpo7GGD1zQNSDWsEeB_LHAjEbUKrX2ovQHNxw@mail.gmail.com>
 <191784842.2570110.1350737132305.JavaMail.root@erie.cs.uoguelph.ca>
 <CAF-QHFXB=yfD2EPoQf4C8YyX=0BA0Awndg0QNsWO8_rq=StHhQ@mail.gmail.com>
 <CAKYr3zzvb+iJzX8zmxUeH1ZjNEnc1FuuE5SdmYUAgQH84O64Mg@mail.gmail.com>
To: Outback Dingo <outbackdingo@gmail.com>
X-Mailer: Apple Mail (2.1498)
Cc: "freebsd-hackers@freebsd.org Hackers" <freebsd-hackers@freebsd.org>,
 Rick Macklem <rmacklem@uoguelph.ca>, Ivan Voras <ivoras@freebsd.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Oct 2012 19:53:36 -0000


On Oct 20, 2012, at 10:45 PM, Outback Dingo <outbackdingo@gmail.com> wrote:

> On Sat, Oct 20, 2012 at 3:28 PM, Ivan Voras <ivoras@freebsd.org> =
wrote:
>> On 20 October 2012 14:45, Rick Macklem <rmacklem@uoguelph.ca> wrote:
>>> Ivan Voras wrote:
>>=20
>>>> I don't know how to interpret the rise in context switches; as this =
is
>>>> kernel code, I'd expect no context switches. I hope someone else =
can
>>>> explain.
>>>>=20
>>> Don't the mtx_lock() calls spin for a little while and then context
>>> switch if another thread still has it locked?
>>=20
>> Yes, but are in-kernel context switches also counted? I was assuming
>> they are light-weight enough not to count.
>>=20
>>> Hmm, I didn't look, but were there any tests using UDP mounts?
>>> (I would have thought that your patch would mainly affect UDP =
mounts,
>>> since that is when my version still has the single LRU queue/mutex.
>>=20
>> Another assumption - I thought UDP was the default.
>>=20
>>> As I think you know, my concern with your patch would be correctness
>>> for UDP, not performance.)
>>=20
>> Yes.
>=20
> Ive got a similar box config here, with 2x 10GB intel nics, and 24 2TB
> drives on an LSI controller.
> Im watching the thread patiently, im kinda looking for results, and
> answers, Though Im also tempted to
> run benchmarks on my system also see if i get similar results I also
> considered that netmap might be one
> but not quite sure if it would help NFS, since its to hard to tell if
> its a network bottle neck, though it appears
> to be network related.
>=20

Doesn't look like a network issue to me. From my observations it's more
like some overhead in NFS and ARC.
The boxes easily push 10G with a simple iperf test.
Running two iperf tests over each port of the dual-ported 10G NICs gives
960MB/sec regardless of which machine is the server.
Also, I've seen over 960Gb/sec over NFS with this setup, but I can't
understand what type of workload was able to do this.
At some point I was able to do this with a simple dd, then after a reboot
I was no longer able to push this traffic.
I'm thinking something like ARC/kmem fragmentation might be the issue?


From owner-freebsd-hackers@FreeBSD.ORG  Sat Oct 20 21:03:11 2012
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id D8D84E2D;
 Sat, 20 Oct 2012 21:03:11 +0000 (UTC)
 (envelope-from rmacklem@uoguelph.ca)
Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca
 [131.104.91.44])
 by mx1.freebsd.org (Postfix) with ESMTP id 79C038FC0A;
 Sat, 20 Oct 2012 21:03:10 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: Ap8EAPUQg1CDaFvO/2dsb2JhbAA7CIYUvAKCIAEBAQMBAQEBICsgCxsYAgINGQIpAQkmBggHBAEcBIddBguoYJITgSCKPxYEhUOBEgOTRIItgRePIoMLgUc1
X-IronPort-AV: E=Sophos;i="4.80,622,1344225600"; d="scan'208";a="184554476"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
 ([131.104.91.206])
 by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 20 Oct 2012 17:03:09 -0400
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
 by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id BBB63B402D;
 Sat, 20 Oct 2012 17:03:09 -0400 (EDT)
Date: Sat, 20 Oct 2012 17:03:09 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Outback Dingo <outbackdingo@gmail.com>
Message-ID: <1800695432.2577499.1350766989710.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <CAKYr3zzvb+iJzX8zmxUeH1ZjNEnc1FuuE5SdmYUAgQH84O64Mg@mail.gmail.com>
Subject: Re: NFS server bottlenecks
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [172.17.91.203]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692)
Cc: "freebsd-hackers@freebsd.org Hackers" <freebsd-hackers@freebsd.org>,
 Ivan Voras <ivoras@freebsd.org>, Nikolay Denev <ndenev@gmail.com>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Oct 2012 21:03:12 -0000

Outback Dingo wrote:
> On Sat, Oct 20, 2012 at 3:28 PM, Ivan Voras <ivoras@freebsd.org>
> wrote:
> > On 20 October 2012 14:45, Rick Macklem <rmacklem@uoguelph.ca> wrote:
> >> Ivan Voras wrote:
> >
> >>> I don't know how to interpret the rise in context switches; as
> >>> this is
> >>> kernel code, I'd expect no context switches. I hope someone else
> >>> can
> >>> explain.
> >>>
> >> Don't the mtx_lock() calls spin for a little while and then context
> >> switch if another thread still has it locked?
> >
> > Yes, but are in-kernel context switches also counted? I was assuming
> > they are light-weight enough not to count.
> >
> >> Hmm, I didn't look, but were there any tests using UDP mounts?
> >> (I would have thought that your patch would mainly affect UDP
> >> mounts,
> >>  since that is when my version still has the single LRU
> >>  queue/mutex.
> >
> > Another assumption - I thought UDP was the default.
> >
TCP has been the default for a FreeBSD client for a long time. It was
changed for the old NFS client before I became a committer. (You can
explicitly set one or the other as mount options or check via wireshark/tcpdump)

> >>  As I think you know, my concern with your patch would be
> >>  correctness
> >>  for UDP, not performance.)
> >
> > Yes.
> 
> Ive got a similar box config here, with 2x 10GB intel nics, and 24 2TB
> drives on an LSI controller.
> Im watching the thread patiently, im kinda looking for results, and
> answers, Though Im also tempted to
> run benchmarks on my system also see if i get similar results I also
> considered that netmap might be one
> but not quite sure if it would help NFS, since its to hard to tell if
> its a network bottle neck, though it appears
> to be network related.
> 
NFS network traffic looks very different than a TCP stream (a la BitTorrent
or ...). I've seen this cause issues before. You can look at a packet trace
in wireshark and see if TCP is retransmitting segments.
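
As a rough illustration of what "retransmitting segments" means in a trace, here is a small Python sketch. It assumes you have already extracted (sequence number, payload length) pairs for one direction of one TCP connection, in capture order; the function name and input format are made up for this example. It is a crude heuristic that ignores sequence wraparound and SACK and cannot tell reordering from retransmission, so Wireshark's own analysis (for instance the tcp.analysis.retransmission display filter) is the better tool for real traces.

```python
def find_retransmissions(segments):
    """Return segments whose data lies entirely below the highest byte seen.

    segments: iterable of (seq, length) tuples for one direction of a single
    TCP connection, in capture order (hypothetical input format).
    """
    highest_end = None      # one past the highest sequence number seen so far
    retransmitted = []
    for seq, length in segments:
        if length == 0:
            continue        # pure ACKs carry no data
        end = seq + length
        if highest_end is not None and end <= highest_end:
            # All of this payload was already sent earlier in the capture.
            retransmitted.append((seq, length))
        if highest_end is None or end > highest_end:
            highest_end = end
    return retransmitted


if __name__ == "__main__":
    trace = [(1000, 100), (1100, 100), (1000, 100), (1200, 100)]
    print(find_retransmissions(trace))
```

A trace with many flagged segments would suggest loss somewhere on the path rather than overhead in the NFS server itself.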

rick
