From owner-freebsd-hackers@FreeBSD.ORG Sun Oct 14 14:42:29 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 3B9793D2 for ; Sun, 14 Oct 2012 14:42:29 +0000 (UTC) (envelope-from jilles@stack.nl) Received: from mx1.stack.nl (unknown [IPv6:2001:610:1108:5012::107]) by mx1.freebsd.org (Postfix) with ESMTP id C964C8FC12 for ; Sun, 14 Oct 2012 14:42:27 +0000 (UTC) Received: from snail.stack.nl (snail.stack.nl [IPv6:2001:610:1108:5010::131]) by mx1.stack.nl (Postfix) with ESMTP id 5C9351203CC; Sun, 14 Oct 2012 16:42:23 +0200 (CEST) Received: by snail.stack.nl (Postfix, from userid 1677) id 299192848C; Sun, 14 Oct 2012 16:42:23 +0200 (CEST) Date: Sun, 14 Oct 2012 16:42:23 +0200 From: Jilles Tjoelker To: Eitan Adler Subject: Re: -lpthread vs -pthread: does -D_REENTRANT matter? Message-ID: <20121014144222.GA14503@stack.nl> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Cc: FreeBSD Hackers X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 14 Oct 2012 14:42:29 -0000 On Mon, Oct 08, 2012 at 12:17:08PM -0400, Eitan Adler wrote: > The only difference between -lpthread and -pthread that I could see is > that the latter also sets -D_REENTRANT. > However, I can't find any uses of _REENTRANT anywhere outside of a few > utilities that seem to define it manually. > Testing with various manually written pthread programs resulted in > identical binaries, let alone identical results. > Is there an actual difference between -pthread and -lpthread or is > this just a historical artifact? In some cases, -pthread also affects the compiler's code generation. 
On some RISC architectures, compilers may try to avoid loads and stores of less than 32 bits. For example (untested):

    struct { int n; char a, b, c, d; } *p;
    p->a = p->b = p->c = 0;

The compiler might load p->d and then store the 32 bits containing a, b, c and d at once. This causes a race condition if p->d is written concurrently.

Because C99 does not specify threading, it allows these transformations. In C11, they are forbidden. Passing -pthread disables them as well.

-- 
Jilles Tjoelker

From owner-freebsd-hackers@FreeBSD.ORG Sun Oct 14 16:19:49 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 3E9265C8 for ; Sun, 14 Oct 2012 16:19:49 +0000 (UTC) (envelope-from dcherednik@roshianokatachi.com) Received: from smtp.nanocore.sportcomitet.org (unknown [IPv6:2a01:4f8:d13:2941::1:3]) by mx1.freebsd.org (Postfix) with ESMTP id AE8748FC0A for ; Sun, 14 Oct 2012 16:19:48 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp.nanocore.sportcomitet.org (Postfix) with SMTP id A666AC03A1 for ; Sun, 14 Oct 2012 20:19:47 +0400 (MSK) Received: from [192.168.11.92] (ppp91-76-136-49.pppoe.mtu-net.ru [91.76.136.49]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) (Authenticated sender: dcherednik@roshianokatachi.com) by smtp.nanocore.sportcomitet.org (Postfix) with ESMTPSA id 574D3C01B5; Sun, 14 Oct 2012 20:19:46 +0400 (MSK) Message-ID: <507AE61D.7030709@roshianokatachi.com> Date: Sun, 14 Oct 2012 20:19:41 +0400 From: Daniil Cherednik User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:16.0) Gecko/20121011 Thunderbird/16.0.1 MIME-Version: 1.0 To: freebsd-hackers@freebsd.org Subject: Re: Fast syscalls via sysenter References: <201206182256.30535.dcherednik@roshianokatachi.com> <201206210811.20427.jhb@freebsd.org> <4FE55F91.5070303@gmail.com> <20120623165823.GX2337@deviant.kiev.zoral.com.ua> In-Reply-To: 
<20120623165823.GX2337@deviant.kiev.zoral.com.ua> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-DSPAM-Result: Innocent X-DSPAM-Processed: Sun Oct 14 20:19:47 2012 X-DSPAM-Confidence: 1.0000 X-DSPAM-Improbability: 1 in 98689409 chance of being spam X-DSPAM-Probability: 0.0023 X-DSPAM-Signature: 24,507ae62312612005967964 X-DSPAM-Factors: 27, amd64+#+reasonable, 0.40000, in+#+#+#+current, 0.40000, References*gmail.com+#+deviant.kiev.zoral.com.ua, 0.40000, situation+#+#+is, 0.40000, shared+#+content, 0.40000, Message-ID*507AE61D.7030709+roshianokatachi.com, 0.40000, done+though, 0.40000, 9+#+#+p4, 0.40000, was+#+several, 0.40000, Baldwin+#+#+Monday, 0.40000, would+#+#+solution, 0.40000, function+No, 0.40000, time+#+#+#+like, 0.40000, c+#+No, 0.40000, On+Monday, 0.40000, David+#+#+#+2012, 0.40000, to+#+#+#+to, 0.40000, On+#+#+#+using, 0.40000, rules+#+#+#+see, 0.40000, know+#+#+#+it, 0.40000, vdso+syscall, 0.40000, beginner+#+kernel, 0.40000, Subject*Re+#+#+via, 0.40000, Received*Postfix+with, 0.40000, pushl+#+#+3, 0.40000, looks+#+#+#+some, 0.40000, calls+#+#+#+values, 0.40000 Cc: Konstantin Belousov , davidxu@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 14 Oct 2012 16:19:49 -0000 On 06/23/2012 08:58 PM, Konstantin Belousov wrote: > On Sat, Jun 23, 2012 at 02:17:53PM +0800, David Xu wrote: >> On 2012/06/21 20:11, John Baldwin wrote: >>> On Monday, June 18, 2012 2:56:30 pm Daniil Cherednik wrote: >>>> Hi! >>>> >>>> I am trying to continue the work started by DavidXu on implemention of >>>> fast >>>> syscalls via sysenter/sysexit. >>>> http://people.freebsd.org/~davidxu/sysenter/kernel/ >>>> I have ported it on FreeBSD9. It looks like it works. 
Unfortunately I am a >>>> beginner in kernel so I have some questions: >>>> >>>> 1. see http://people.freebsd.org/~davidxu/sysenter/kernel/kernel.patch >>>> /* >>>> * If %edx was changed, we can not use sysexit, because it >>>> * needs %edx to restore userland %eip. >>>> */ >>>> if (orig_edx != frame.tf_edx) >>>> td->td_pcb->pcb_flags |= PCB_FULLCTX; >>>> >>>> What is the reason why we have to do this additional check? In >>>> http://people.freebsd.org/~davidxu/sysenter/kernel/sysenter.s >>>> we store %edx to the stack in >>>> pushl %edx /* ring 3 next %eip */ >>>> and we restore the register in >>>> popl %edx /* ring 3 %eip */ >>> Some system calls return two return values (pipe(2)) or return a 64-bit >>> off_t (lseek(2)). Those system calls change %edx's value and need that >>> changed value to make it out to userland. >>> >>>> 2. see http://people.freebsd.org/~davidxu/sysenter/kernel/sysenter.s >>>> movl PCPU(CURPCB),%esi >>>> call syscall >>>> >>>> Why do we movl PCPU(CURPCB),%esi before calling syscall? syscall is just >>>> c- >>>> function. >>> No clue on this one, looks like it is not needed. >>> >> [kib@ is cc'ed] >> I implemented the sysenter syscall long time ago, it indeed can reduce >> system call overhead on i386. I think it might be the time to implement >> linux like vdso syscall now based on the work kib@ recently has done, >> though I don''t know how to hook it into kib's code. >> I quick googled it, and found they put some data into aux vector: >> http://www.trilithium.com/johan/2005/08/linux-gate/ >> http://www.takatan.net/lxr/source/arch/um/os-Linux/elf_aux.c?a=x86_64#L40 > Yes, intent is to eventually switch to VDSO from current situation were > libc is aware of shared page content. This was extensively discussed in > flame that resulted in me writing the current gettimeofday(2) patch. > It was arch@ several weeks ago, AFAIR. 
>
> Committed gettimeofday() code structure allows for VDSO interposing without
> breaking normal symbol visibility rules.
>
> I do not see a sense in implementing syscall or sysenter support for
> i386 kernel. On the other hand, using syscall for 32bit binaries on amd64
> looks reasonable.

Sorry, I was not able to write for some time.

So, what about implementing vdso now? I know there was a patch and a feature request:
http://lists.freebsd.org/pipermail/freebsd-bugs/2010-April/039597.html

About sysenter: I have ported the sysenter patch to 9.0-RELEASE-p4, and it looks fine. I made some fixes in SYS.h. The reason is (if I understand it right) that we have to get an ELF without DT_TEXTREL in ld-elf.so.

You can find the patch here:
https://redmine.sportcomitet.org/projects/dev-freebsd/repository/revisions/master/raw/sysenter.patch
https://redmine.sportcomitet.org/projects/dev-freebsd/repository/revisions/master/raw/sys/i386/i386/sysenter.s

However, this patch currently breaks compatibility with the i386 XEN PV kernel. I wanted to fix it, but without VDSO it would be a limited solution. That is one of the reasons why I am interested in the vdso status.

As for using 32bit binaries on amd64: it is reasonable. But if we do that, I think we have to implement vdso support in the i386 kernel too, for compatibility, and it would be better to implement sysenter as well.
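
As an aside on question 1 quoted earlier in this thread: on i386 a 64-bit result such as the off_t from lseek(2) comes back split across %edx (high half) and %eax (low half), so the kernel can legitimately change %edx, and the sysexit path (which needs %edx for the userland %eip) can then no longer be used. The following is only a rough sketch in plain C; both helper names are invented for illustration and do not appear in the patch.

```c
#include <assert.h>
#include <stdint.h>

static_assert(sizeof(uint32_t) * 2 == sizeof(int64_t),
    "two 32-bit registers make up one 64-bit result");

/* Hypothetical helper: rebuild a 64-bit result from %edx (high) and %eax (low). */
static int64_t
join_edx_eax(uint32_t edx, uint32_t eax)
{
    return ((int64_t)(((uint64_t)edx << 32) | (uint64_t)eax));
}

/*
 * Hypothetical helper mirroring the check in the quoted patch: if the
 * system call changed %edx, the fast sysexit return cannot be used and
 * the full context has to be restored instead.
 */
static int
needs_full_context(uint32_t orig_edx, uint32_t frame_edx)
{
    return (orig_edx != frame_edx);
}
```

For example, join_edx_eax(1, 0) yields 4294967296, an offset that does not fit in %eax alone.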
From owner-freebsd-hackers@FreeBSD.ORG Sun Oct 14 16:53:08 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 307E9F32 for ; Sun, 14 Oct 2012 16:53:08 +0000 (UTC) (envelope-from freebsd-hackers@m.gmane.org) Received: from plane.gmane.org (plane.gmane.org [80.91.229.3]) by mx1.freebsd.org (Postfix) with ESMTP id DBF7D8FC12 for ; Sun, 14 Oct 2012 16:53:07 +0000 (UTC) Received: from list by plane.gmane.org with local (Exim 4.69) (envelope-from ) id 1TNRR1-00058J-Vg for freebsd-hackers@freebsd.org; Sun, 14 Oct 2012 18:53:03 +0200 Received: from dsl-hkibrasgw3-ffd6c300-228.dhcp.inet.fi ([88.195.214.228]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Sun, 14 Oct 2012 18:53:03 +0200 Received: from rakuco by dsl-hkibrasgw3-ffd6c300-228.dhcp.inet.fi with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Sun, 14 Oct 2012 18:53:03 +0200 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-hackers@freebsd.org From: Raphael Kubo da Costa Subject: Re: -lpthread vs -pthread: does -D_REENTRANT matter? 
Date: Sun, 14 Oct 2012 19:52:59 +0300 Lines: 24 Message-ID: <87zk3pnggk.fsf@FreeBSD.org> References: <20121014144222.GA14503@stack.nl> Mime-Version: 1.0 Content-Type: text/plain X-Complaints-To: usenet@ger.gmane.org X-Gmane-NNTP-Posting-Host: dsl-hkibrasgw3-ffd6c300-228.dhcp.inet.fi User-Agent: Gnus/5.130006 (Ma Gnus v0.6) Emacs/24.2 (berkeley-unix) Cancel-Lock: sha1:bHLg9z9aPYiFLmKemVP/DHaRnN8= X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 14 Oct 2012 16:53:08 -0000 Jilles Tjoelker writes: > On Mon, Oct 08, 2012 at 12:17:08PM -0400, Eitan Adler wrote: >> The only difference between -lpthread and -pthread that I could see is >> that the latter also sets -D_REENTRANT. >> However, I can't find any uses of _REENTRANT anywhere outside of a few >> utilities that seem to define it manually. > >> Testing with various manually written pthread programs resulted in >> identical binaries, let alone identical results. > >> Is there an actual difference between -pthread and -lpthread or is >> this just a historical artifact? > > In some cases, -pthread also affects the compiler's code generation. On > some RISC architectures, compilers may try to avoid loads and stores of > less than 32 bits. [...] > Because C99 does not specify threading, it allows these transformations. > In C11, they are forbidden. Passing -pthread disables them as well. And does this not happen at all if one uses -lpthread instead? 
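
To make the transformation Jilles describes more concrete, here is a single-threaded simulation of the problematic interleaving. It is only a sketch: the struct tag rec, both function names and the concurrent_d parameter are invented for illustration, and it assumes that a, b, c and d occupy four adjacent bytes. Whether a given compiler really emits the word-wide sequence depends on the target and flags, which this code does not test.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct rec {
    int n;
    char a, b, c, d;
};

/* Assumption: a..d are laid out as four adjacent bytes (true on common ABIs). */
static_assert(offsetof(struct rec, d) - offsetof(struct rec, a) == 3,
    "a, b, c, d are expected to be adjacent");

/*
 * Emulates word-wide code for p->a = p->b = p->c = 0: load the four
 * bytes holding a..d, clear three of them, store all four back.
 * concurrent_d, if non-NULL, stands in for another thread writing p->d
 * between the load and the store.
 */
static void
clear_abc_wordwise(struct rec *p, const char *concurrent_d)
{
    char *bytes = (char *)p + offsetof(struct rec, a);
    char word[4];

    memcpy(word, bytes, sizeof(word));  /* load a, b, c, d */
    if (concurrent_d != NULL)
        p->d = *concurrent_d;           /* "other thread" updates d */
    word[0] = word[1] = word[2] = 0;    /* clear a, b, c in the copy */
    memcpy(bytes, word, sizeof(word));  /* store back, stale d included */
}

/* The same assignments done with byte stores; d is never touched. */
static void
clear_abc_bytewise(struct rec *p, const char *concurrent_d)
{
    p->a = 0;
    if (concurrent_d != NULL)
        p->d = *concurrent_d;           /* "other thread" updates d */
    p->b = 0;
    p->c = 0;
}
```

Starting from { 0, 1, 2, 3, 4 } and a simulated concurrent write of 9 to d, the word-wise variant leaves d at 4 (the update is lost), while the byte-wise variant leaves d at 9.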
From owner-freebsd-hackers@FreeBSD.ORG Sun Oct 14 20:55:38 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 4A61CF8 for ; Sun, 14 Oct 2012 20:55:38 +0000 (UTC) (envelope-from lists@eitanadler.com) Received: from mail-pb0-f54.google.com (mail-pb0-f54.google.com [209.85.160.54]) by mx1.freebsd.org (Postfix) with ESMTP id 10FE48FC0A for ; Sun, 14 Oct 2012 20:55:37 +0000 (UTC) Received: by mail-pb0-f54.google.com with SMTP id rp8so4597923pbb.13 for ; Sun, 14 Oct 2012 13:55:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=eitanadler.com; s=0xdeadbeef; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type; bh=m5rzsz88BPSza7kF+Hk4yrimrNG5v3EejbNq2dBx7es=; b=UlPxpIUQD0EggJ3kfIa3eIPiOB+3qojUq7OM+/VOT98BLpoeI1rS+VUHd3WywML9Or ON/X/pKMJQePHrW3pqarsY6UR2YnBlNmqnRE7ehzvaPy3pnBChqEfTqxeIpJIgQ12M8y 21tUIDELLoh28kQL3j8VwVJuWLm/3s9qltIcs= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type:x-gm-message-state; bh=m5rzsz88BPSza7kF+Hk4yrimrNG5v3EejbNq2dBx7es=; b=JgJ9zEnsx89t5ONVEpjiaRpCUu1tljb/jBSzeoa3/o5KPtDGt8f6V6yrOw4EIeZaf9 SY3i2V1BXej3hvVJiwlHNXjj6iOgWDwsMB6bYT4kNThhmeRtLdfQp/OqP3Fp16JEhVb7 WBrcFf+yEmq6FuLfGKuzhD9CzymsGpOZgiBh35r3UWwLrc8Ij0Xpb8x7nm3saaPZj1Bw UEM8f0QZ1wH/wL5ARgZmnXO8SkkCTtmBJMuWERV8gXos0u3fqU03lTOX+wWfjXTjzMxp Z6BXlHKpOLurXQH03vbJhquKzRv/zeA6nKSOHrhHkWanNPePkQtm7f/igeMInu1LK3Fu p55Q== Received: by 10.66.79.65 with SMTP id h1mr3521393pax.71.1350248137633; Sun, 14 Oct 2012 13:55:37 -0700 (PDT) MIME-Version: 1.0 Received: by 10.66.161.163 with HTTP; Sun, 14 Oct 2012 13:55:07 -0700 (PDT) In-Reply-To: <20121014144222.GA14503@stack.nl> References: <20121014144222.GA14503@stack.nl> From: Eitan Adler Date: Sun, 14 Oct 2012 16:55:07 -0400 Message-ID: Subject: Re: -lpthread vs 
-pthread: does -D_REENTRANT matter? To: Jilles Tjoelker Content-Type: text/plain; charset=UTF-8 X-Gm-Message-State: ALoCoQmCW4h+CBQAn5jRCVajiVFxci0rTxP/zLIaKNTRozenTyZw6/pL9KCDgSIJ36OHZitDahdF Cc: FreeBSD Hackers X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 14 Oct 2012 20:55:38 -0000 On 14 October 2012 10:42, Jilles Tjoelker wrote: > Because C99 does not specify threading, it allows these transformations. > In C11, they are forbidden. Passing -pthread disables them as well. Is the man page wrong or do I misunderstand? This option sets flags for both the preprocessor and linker. It does not affect the thread safety of object code produced by the compiler or that of libraries supplied with it. -- Eitan Adler From owner-freebsd-hackers@FreeBSD.ORG Mon Oct 15 11:52:32 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 988C2F62 for ; Mon, 15 Oct 2012 11:52:32 +0000 (UTC) (envelope-from freebsd-hackers@m.gmane.org) Received: from plane.gmane.org (plane.gmane.org [80.91.229.3]) by mx1.freebsd.org (Postfix) with ESMTP id 496E88FC0A for ; Mon, 15 Oct 2012 11:52:30 +0000 (UTC) Received: from list by plane.gmane.org with local (Exim 4.69) (envelope-from ) id 1TNjDn-0005Xi-Q1 for freebsd-hackers@freebsd.org; Mon, 15 Oct 2012 13:52:35 +0200 Received: from lara.cc.fer.hr ([161.53.72.113]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Mon, 15 Oct 2012 13:52:35 +0200 Received: from ivoras by lara.cc.fer.hr with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Mon, 15 Oct 2012 13:52:35 +0200 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-hackers@freebsd.org From: Ivan Voras Subject: Re: NFS server bottlenecks Date: Mon, 15 Oct 2012 13:52:14 
+0200 Lines: 40 Message-ID: References: <937460294.2185822.1350093954059.JavaMail.root@erie.cs.uoguelph.ca> <302BF685-4B9D-49C8-8000-8D0F6540C8F7@gmail.com> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="------------enig01F90BCB0CE5715825E41507" X-Complaints-To: usenet@ger.gmane.org X-Gmane-NNTP-Posting-Host: lara.cc.fer.hr User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:14.0) Gecko/20120812 Thunderbird/14.0 In-Reply-To: <302BF685-4B9D-49C8-8000-8D0F6540C8F7@gmail.com> X-Enigmail-Version: 1.4.3 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Oct 2012 11:52:32 -0000

This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
--------------enig01F90BCB0CE5715825E41507
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

On 13/10/2012 17:22, Nikolay Denev wrote:

> drc3.patch applied and build cleanly and shows nice improvement!
>
> I've done a quick benchmark using iozone over the NFS mount from the Linux host.
>

Hi,

If you are already testing, could you please also test this patch:

http://people.freebsd.org/~ivoras/diffs/nfscache_lock.patch

It should apply to HEAD without Rick's patches.
It's a bit different approach than Rick's, breaking down locks even more.= --------------enig01F90BCB0CE5715825E41507 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (FreeBSD) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAlB7+O8ACgkQ/QjVBj3/HSx2ngCdE8Ab3oSlQY4uF+hzaMG2dOqK 3PwAn2FLAC/FsS36u4/5UljuM8qsTHym =FhY4 -----END PGP SIGNATURE----- --------------enig01F90BCB0CE5715825E41507-- From owner-freebsd-hackers@FreeBSD.ORG Mon Oct 15 13:41:43 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 7D7AF39E; Mon, 15 Oct 2012 13:41:43 +0000 (UTC) (envelope-from ndenev@gmail.com) Received: from mail-wg0-f50.google.com (mail-wg0-f50.google.com [74.125.82.50]) by mx1.freebsd.org (Postfix) with ESMTP id DA9598FC16; Mon, 15 Oct 2012 13:41:42 +0000 (UTC) Received: by mail-wg0-f50.google.com with SMTP id 16so4160000wgi.31 for ; Mon, 15 Oct 2012 06:41:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=subject:mime-version:content-type:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to:x-mailer; bh=yOMhe4W/6hVDQgGMmzza+LSqsVDzldd2LqJR8geimAg=; b=QJVtMXJtLKPRK0cTOdh+e2N/sviqmzMaxHYyhALEfKyUQ/t0JKWa2gbW8nJuB50xyC oFVylLtfjj+p48zlyz45NVWPnsCn1T37jLLRi9AvhG7lec32xKKp9qh4L+XzSkakN1+c cR0uT4SULFlNsuCBufVKtSXvLemSRLstggSA93zDTvNdLaleaONwkYNCq3Rt9oSmlhan H2ZbjUMj70Nm0hlI6kAZQt1Qn7oFuWP54mhj1f/lR+cUmHX0wzNeuJ3JwYYy1bxb6UAk XHknq+soArTye9a5sLskIXePEf5I/0ctoqPFqPdvD+mJ8N42tdq3dBIvTnLMf9UvZEYv WoIw== Received: by 10.180.102.131 with SMTP id fo3mr23997987wib.1.1350308501549; Mon, 15 Oct 2012 06:41:41 -0700 (PDT) Received: from ndenevsa.sf.moneybookers.net (g1.moneybookers.com. 
[217.18.249.148]) by mx.google.com with ESMTPS id m14sm14025191wie.8.2012.10.15.06.41.39 (version=TLSv1/SSLv3 cipher=OTHER); Mon, 15 Oct 2012 06:41:40 -0700 (PDT) Subject: Re: NFS server bottlenecks Mime-Version: 1.0 (Mac OS X Mail 6.1 \(1498\)) Content-Type: text/plain; charset=us-ascii From: Nikolay Denev In-Reply-To: Date: Mon, 15 Oct 2012 16:41:38 +0300 Content-Transfer-Encoding: quoted-printable Message-Id: <752224AF-F6B6-413C-8597-61829800E0BC@gmail.com> References: <937460294.2185822.1350093954059.JavaMail.root@erie.cs.uoguelph.ca> <302BF685-4B9D-49C8-8000-8D0F6540C8F7@gmail.com> To: Ivan Voras X-Mailer: Apple Mail (2.1498) Cc: freebsd-hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Oct 2012 13:41:43 -0000

On Oct 15, 2012, at 2:52 PM, Ivan Voras wrote:

> On 13/10/2012 17:22, Nikolay Denev wrote:
>
>> drc3.patch applied and build cleanly and shows nice improvement!
>>
>> I've done a quick benchmark using iozone over the NFS mount from the Linux host.
>>
>
> Hi,
>
> If you are already testing, could you please also test this patch:
>
> http://people.freebsd.org/~ivoras/diffs/nfscache_lock.patch
>
> It should apply to HEAD without Rick's patches.
>
> It's a bit different approach than Rick's, breaking down locks even more.
>

I will try to apply it to RELENG_9 as that's what I'm running and compare the results.
From owner-freebsd-hackers@FreeBSD.ORG Mon Oct 15 14:31:55 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 1A321381; Mon, 15 Oct 2012 14:31:55 +0000 (UTC) (envelope-from ndenev@gmail.com) Received: from mail-wg0-f42.google.com (mail-wg0-f42.google.com [74.125.82.42]) by mx1.freebsd.org (Postfix) with ESMTP id 771528FC16; Mon, 15 Oct 2012 14:31:54 +0000 (UTC) Received: by mail-wg0-f42.google.com with SMTP id fm10so227346wgb.1 for ; Mon, 15 Oct 2012 07:31:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=subject:mime-version:content-type:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to:x-mailer; bh=2XGEsJ0S/5NHKzrTI5cvpDNx5j0GmK+mAdfGoRlgRTs=; b=0mKNrVPvn7S6O4tHoMc4B1/7LNgJTD/4g1AyuYl7KqDDfEURRTuNXZe5uu+2iTcQAz 6XDA/ycETS1iWm8AOtDTx3VRr3UJuYHTjDYfD/9dxj2oSZi+kkdzSGg7qCKAuK7itSqx H0gp8kkDekqxhesgJnTqNT++NPcpXdVmsSCtCp8tz5vz3DuEoM0ZfSeXfTptNqZzUhsF waQ8vqADgSZlhuX+n0BVLUrZZWq1OyLqnkKCcwx3JJJ8luLr4Z6IaY6sPtG8YAC//5qy tsgpAKEnPQiL4xhO4WcHRsT+JMlaA7qVGWWdlk4ds2XSq1dgURFERADPqX0PGRqG94b3 G7mw== Received: by 10.180.87.34 with SMTP id u2mr24301324wiz.4.1350311513236; Mon, 15 Oct 2012 07:31:53 -0700 (PDT) Received: from ndenevsa.sf.moneybookers.net (g1.moneybookers.com. 
[217.18.249.148]) by mx.google.com with ESMTPS id bn7sm16148172wib.8.2012.10.15.07.31.51 (version=TLSv1/SSLv3 cipher=OTHER); Mon, 15 Oct 2012 07:31:51 -0700 (PDT) Subject: Re: NFS server bottlenecks Mime-Version: 1.0 (Mac OS X Mail 6.1 \(1498\)) Content-Type: text/plain; charset=us-ascii From: Nikolay Denev In-Reply-To: Date: Mon, 15 Oct 2012 17:31:50 +0300 Content-Transfer-Encoding: quoted-printable Message-Id: <0857D79A-6276-433F-9603-D52125CF190F@gmail.com> References: <937460294.2185822.1350093954059.JavaMail.root@erie.cs.uoguelph.ca> <302BF685-4B9D-49C8-8000-8D0F6540C8F7@gmail.com> To: Ivan Voras X-Mailer: Apple Mail (2.1498) Cc: freebsd-hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Oct 2012 14:31:55 -0000

On Oct 15, 2012, at 2:52 PM, Ivan Voras wrote:

> On 13/10/2012 17:22, Nikolay Denev wrote:
>
>> drc3.patch applied and build cleanly and shows nice improvement!
>>
>> I've done a quick benchmark using iozone over the NFS mount from the Linux host.
>>
>
> Hi,
>
> If you are already testing, could you please also test this patch:
>
> http://people.freebsd.org/~ivoras/diffs/nfscache_lock.patch
>
> It should apply to HEAD without Rick's patches.
>
> It's a bit different approach than Rick's, breaking down locks even more.
>

Applied and compiled OK, I will be able to test it tomorrow.
From owner-freebsd-hackers@FreeBSD.ORG Mon Oct 15 14:34:54 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 00CB14BF for ; Mon, 15 Oct 2012 14:34:53 +0000 (UTC) (envelope-from ivoras@gmail.com) Received: from mail-qc0-f182.google.com (mail-qc0-f182.google.com [209.85.216.182]) by mx1.freebsd.org (Postfix) with ESMTP id A827D8FC08 for ; Mon, 15 Oct 2012 14:34:53 +0000 (UTC) Received: by mail-qc0-f182.google.com with SMTP id l39so5089168qcs.13 for ; Mon, 15 Oct 2012 07:34:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date :x-google-sender-auth:message-id:subject:to:cc:content-type; bh=GtQaUSCGUYwr9Q823OR3frvUr5WBCW4SXWpCkkve7Gg=; b=jkoIapGXbbN3JTQRxuvUiP/ddvEhadGGOuZV4s3QbGuLNcGAv5oDuY1QDB37davr2O gQ8gkW8VVOVYxaZZLLvujNVCkgGMKFFCuELJHJ30Pz4gHCjU3XYL0jwXXRCSSpxvRFmw VMpmycuibXpekSoMQVX2lYbRd7cmARqM4lsKbpY/LtS6ewBBfynkR4WDsje8Hy3YAK4A PNs2ZnnQcm4m2l84WeBvXNbrWgdGDtiFfcE79/dRUxUeeGK8W32zSomJXPkUq7X39iNa /0XDh8v8aJMwmk1/POseKDsmFynDB6K2jNKNhcyhcvD8B+fJp6IOgzQp4rT0hs6igB/Y 7n/g== Received: by 10.224.188.200 with SMTP id db8mr20814938qab.86.1350311687553; Mon, 15 Oct 2012 07:34:47 -0700 (PDT) MIME-Version: 1.0 Sender: ivoras@gmail.com Received: by 10.49.82.231 with HTTP; Mon, 15 Oct 2012 07:34:07 -0700 (PDT) In-Reply-To: <0857D79A-6276-433F-9603-D52125CF190F@gmail.com> References: <937460294.2185822.1350093954059.JavaMail.root@erie.cs.uoguelph.ca> <302BF685-4B9D-49C8-8000-8D0F6540C8F7@gmail.com> <0857D79A-6276-433F-9603-D52125CF190F@gmail.com> From: Ivan Voras Date: Mon, 15 Oct 2012 16:34:07 +0200 X-Google-Sender-Auth: HkkKm6ZDX-JowGb2TX1H2Dk6O5I Message-ID: Subject: Re: NFS server bottlenecks To: Nikolay Denev Content-Type: text/plain; charset=UTF-8 Cc: freebsd-hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: 
list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Oct 2012 14:34:54 -0000 On 15 October 2012 16:31, Nikolay Denev wrote: > > On Oct 15, 2012, at 2:52 PM, Ivan Voras wrote: >> http://people.freebsd.org/~ivoras/diffs/nfscache_lock.patch >> >> It should apply to HEAD without Rick's patches. >> >> It's a bit different approach than Rick's, breaking down locks even more. > > Applied and compiled OK, I will be able to test it tomorrow. Ok, thanks! The differences should be most visible in edge cases with a larger number of nfsd processes (16+) and many CPU cores. From owner-freebsd-hackers@FreeBSD.ORG Mon Oct 15 15:17:28 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 1D50881C; Mon, 15 Oct 2012 15:17:28 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net [IPv6:2001:470:1f10:75::2]) by mx1.freebsd.org (Postfix) with ESMTP id E4DE18FC12; Mon, 15 Oct 2012 15:17:27 +0000 (UTC) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 3DB1BB924; Mon, 15 Oct 2012 11:17:27 -0400 (EDT) From: John Baldwin To: Rick Macklem Subject: Re: NFS server bottlenecks Date: Mon, 15 Oct 2012 11:08:09 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p20; KDE/4.5.5; amd64; ; ) References: <611092759.2189637.1350133402953.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <611092759.2189637.1350133402953.JavaMail.root@erie.cs.uoguelph.ca> MIME-Version: 1.0 Content-Type: Text/Plain; charset="utf-8" Content-Transfer-Encoding: 7bit Message-Id: <201210151108.09113.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Mon, 15 Oct 2012 11:17:27 -0400 (EDT) Cc: Nikolay Denev , Garrett Wollman , 
FreeBSD Hackers X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Oct 2012 15:17:28 -0000 On Saturday, October 13, 2012 9:03:22 am Rick Macklem wrote: > rick > ps: I hope John doesn't mind being added to the cc list yet again. It's > just that I suspect he knows a fair bit about mutex implementation > and possible hardware cache line effects. Currently mtx_pool just uses a simple array (I have patches to force the array members to be cache-aligned, but they haven't been shown to help in any benchmarks to date). I do think though that I would prefer embedding the mutexes in the hash table entries directly. This is what we do for the turnstile and sleep queue hash tables. -- John Baldwin From owner-freebsd-hackers@FreeBSD.ORG Mon Oct 15 20:58:18 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 8FEF1350; Mon, 15 Oct 2012 20:58:18 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 34CA68FC08; Mon, 15 Oct 2012 20:58:17 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ap8EABp4fFCDaFvO/2dsb2JhbABFFoV8un6CIAEBBAEjVgUWDgoCAg0ZAlkGiBEGC6oFkwmBIYo4hSuBEgOVbIEVjxuDCYF7 X-IronPort-AV: E=Sophos;i="4.80,590,1344225600"; d="scan'208";a="186505520" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu-pri.mail.uoguelph.ca with ESMTP; 15 Oct 2012 16:58:16 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 4A5B1B4039; Mon, 15 Oct 2012 16:58:16 -0400 (EDT) Date: Mon, 15 Oct 2012 16:58:16 -0400 (EDT) From: Rick 
Macklem To: Ivan Voras Message-ID: <1516511249.2287339.1350334696127.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: Subject: Re: NFS server bottlenecks MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.201] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692) Cc: freebsd-hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Oct 2012 20:58:18 -0000 Ivan Voras wrote: > On 13/10/2012 17:22, Nikolay Denev wrote: > > > drc3.patch applied and build cleanly and shows nice improvement! > > > > I've done a quick benchmark using iozone over the NFS mount from the > > Linux host. > > > > Hi, > > If you are already testing, could you please also test this patch: > > http://people.freebsd.org/~ivoras/diffs/nfscache_lock.patch > I don't think (it is hard to test this) your trim cache algorithm will choose the correct entries to delete. The problem is that UDP entries very seldom time out (unless the NFS server isn't seeing hardly any load) and are mostly trimmed because the size exceeds the highwater mark. With your code, it will clear out all of the entries in the first hash buckets that aren't currently busy, until the total count drops below the high water mark. (If you monitor a busy server with "nfsstat -e -s", you'll see the cache never goes below the high water mark, which is 500 by default.) This would delete entries of fairly recent requests. If you are going to replace the global LRU list with ones for each hash bucket, then you'll have to compare the time stamps on the least recently used entries of all the hash buckets and then delete those. 
If you keep the timestamp of the least recent one for that hash bucket in the hash bucket head, you could at least use that to select which bucket to delete from next, but you'll still need to: - lock that hash bucket - delete a few entries from that bucket's lru list - unlock hash bucket - repeat for various buckets until the count is beloew the high water mark Or something like that. I think you'll find it a lot more work that one LRU list and one mutex. Remember that mutex isn't held for long. Btw, the code looks very nice. (If I was being a style(9) zealot, I'd remind you that it likes "return (X);" and not "return X;". rick > It should apply to HEAD without Rick's patches. > > It's a bit different approach than Rick's, breaking down locks even > more. From owner-freebsd-hackers@FreeBSD.ORG Mon Oct 15 21:58:28 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 49512758 for ; Mon, 15 Oct 2012 21:58:28 +0000 (UTC) (envelope-from ivoras@gmail.com) Received: from mail-vb0-f54.google.com (mail-vb0-f54.google.com [209.85.212.54]) by mx1.freebsd.org (Postfix) with ESMTP id F106A8FC0C for ; Mon, 15 Oct 2012 21:58:27 +0000 (UTC) Received: by mail-vb0-f54.google.com with SMTP id v11so7416671vbm.13 for ; Mon, 15 Oct 2012 14:58:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date :x-google-sender-auth:message-id:subject:to:cc:content-type; bh=tJnSc+/HlFMB8Izt5qAntTCXsSyQVlSF+S8w2v+54Us=; b=oOMtU+wBsxXV1y5zjhyl5oH0K8Y1tEcStzcs6ToVXxGU01IUaWpy4hcnXQW4/3HMoz TpjLt0noi6WeqIzJ2PUgeKKKXtACetPIFyIVokwf7xukLLCo5TN8wpHeDS9p6nP2uWEL udmcsh+8SvT/65tVtRiwZ4vXfHSlLcs+VqEl4V7fy/WlvCAqihdPq47alBpghz2pWEjD /R8Y7Own8x8bTPrDWSHKHuezPwVl8ME1ftO4jeYKFoqp2uhaGsnLZdxzpGpghkaERcxp oq0sv0MC3UCESxVmph3/5x8955BWA4TRXCdFu2sniN6k5MHrdQbvTksXp9JTO6QJiHWA bWKQ== Received: by 10.58.32.234 with 
SMTP id m10mr4658629vei.60.1350338306884; Mon, 15 Oct 2012 14:58:26 -0700 (PDT) MIME-Version: 1.0 Sender: ivoras@gmail.com Received: by 10.59.0.37 with HTTP; Mon, 15 Oct 2012 14:57:46 -0700 (PDT) In-Reply-To: <1516511249.2287339.1350334696127.JavaMail.root@erie.cs.uoguelph.ca> References: <1516511249.2287339.1350334696127.JavaMail.root@erie.cs.uoguelph.ca> From: Ivan Voras Date: Mon, 15 Oct 2012 23:57:46 +0200 X-Google-Sender-Auth: l_I3vdWWsVANF5pElddtbH_WqPE Message-ID: Subject: Re: NFS server bottlenecks To: Rick Macklem Content-Type: text/plain; charset=UTF-8 Cc: freebsd-hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Oct 2012 21:58:28 -0000 On 15 October 2012 22:58, Rick Macklem wrote: > The problem is that UDP entries very seldom time out (unless the > NFS server isn't seeing hardly any load) and are mostly trimmed > because the size exceeds the highwater mark. > > With your code, it will clear out all of the entries in the first > hash buckets that aren't currently busy, until the total count > drops below the high water mark. (If you monitor a busy server > with "nfsstat -e -s", you'll see the cache never goes below the > high water mark, which is 500 by default.) This would delete > entries of fairly recent requests. You are right about that, if testing by Nikolay goes reasonably well, I'll work on that. > If you are going to replace the global LRU list with ones for > each hash bucket, then you'll have to compare the time stamps > on the least recently used entries of all the hash buckets and > then delete those. 
If you keep the timestamp of the least recent > one for that hash bucket in the hash bucket head, you could at least > use that to select which bucket to delete from next, but you'll still > need to: > - lock that hash bucket > - delete a few entries from that bucket's lru list > - unlock hash bucket > - repeat for various buckets until the count is beloew the high > water mark Ah, I think I get it: is the reliance on the high watermark as a criteria for cache expiry the reason the list is a LRU instead of an ordinary unordered list? > Or something like that. I think you'll find it a lot more work that > one LRU list and one mutex. Remember that mutex isn't held for long. It could be, but the current state of my code is just groundwork for the next things I have in plan: 1) Move the expiry code (the trim function) into a separate thread, run periodically (or as a callout, I'll need to talk with someone about which one is cheaper) 2) Replace the mutex with a rwlock. The only thing which is preventing me from doing this right away is the LRU list, since each read access modifies it (and requires a write lock). This is why I was asking you if we can do away with the LRU algorithm. > Btw, the code looks very nice. (If I was being a style(9) zealot, > I'd remind you that it likes "return (X);" and not "return X;". Thanks, I'll make it more style(9) compliant as I go along. 
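The per-bucket trimming Rick outlines above can be sketched in user-space C. This is only an illustration of the idea, not the nfsrvcache code: the names (struct bucket, cache_insert, oldest_bucket, cache_trim), the table size and the batch parameter are all hypothetical, and bucket_lock()/bucket_unlock() are empty placeholders standing in for a per-bucket mutex. Each bucket head caches the stamp of its least recently used entry so the trimmer can choose which bucket to visit next.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NBUCKETS 4              /* hypothetical hash table size */

struct entry {
    struct entry *next;         /* next more recently used entry */
    long stamp;                 /* time of last use */
};

struct bucket {
    struct entry *lru_head;     /* least recently used entry */
    struct entry *lru_tail;     /* most recently used entry */
    long oldest;                /* stamp of lru_head; valid if non-empty */
};

struct bucket table[NBUCKETS];
int cache_count;

/* Placeholders: a real version would take a per-bucket mutex here. */
void bucket_lock(struct bucket *b) { (void)b; }
void bucket_unlock(struct bucket *b) { (void)b; }

/* Append a new entry as the most recently used one in bucket h. */
void
cache_insert(int h, long stamp)
{
    struct bucket *b = &table[h];
    struct entry *e = malloc(sizeof(*e));

    assert(e != NULL);
    e->next = NULL;
    e->stamp = stamp;
    bucket_lock(b);
    if (b->lru_tail != NULL)
        b->lru_tail->next = e;
    else {
        b->lru_head = e;
        b->oldest = stamp;
    }
    b->lru_tail = e;
    bucket_unlock(b);
    cache_count++;
}

/* Pick the non-empty bucket whose least recently used entry is oldest. */
struct bucket *
oldest_bucket(void)
{
    struct bucket *best = NULL;
    int i;

    for (i = 0; i < NBUCKETS; i++) {
        if (table[i].lru_head == NULL)
            continue;
        if (best == NULL || table[i].oldest < best->oldest)
            best = &table[i];
    }
    return (best);
}

/* Trim until the count is below highwater; returns the entries freed. */
int
cache_trim(int highwater, int batch)
{
    struct bucket *b;
    struct entry *e;
    int i, removed = 0;

    assert(batch > 0);
    while (cache_count >= highwater && (b = oldest_bucket()) != NULL) {
        bucket_lock(b);
        for (i = 0; i < batch && b->lru_head != NULL &&
            cache_count >= highwater; i++) {
            e = b->lru_head;
            b->lru_head = e->next;
            if (b->lru_head == NULL)
                b->lru_tail = NULL;
            else
                b->oldest = b->lru_head->stamp;
            free(e);
            cache_count--;
            removed++;
        }
        bucket_unlock(b);
    }
    return (removed);
}
```

With batch set to 1 the result matches a single global LRU list exactly; a larger batch means fewer lock round trips but may discard an entry that is newer than the oldest entry of some other bucket, which is the trade-off behind "delete a few entries". The sketch also reads the bucket heads and updates cache_count without any locking, which a real implementation would have to address.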
From owner-freebsd-hackers@FreeBSD.ORG Tue Oct 16 00:45:16 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 4FE9C6C6; Tue, 16 Oct 2012 00:45:16 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id DD0A08FC08; Tue, 16 Oct 2012 00:45:15 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ap8EAMisfFCDaFvO/2dsb2JhbABFhhK7C4IgAQEBAwEBAQEgKyALGw4KAgINGQIpAQkmBggHBAEcBIddBguoWJMQgSGKOBqFEYESA5M/gi2BFY8bgwmBRzQ X-IronPort-AV: E=Sophos;i="4.80,592,1344225600"; d="scan'208";a="186531105" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu-pri.mail.uoguelph.ca with ESMTP; 15 Oct 2012 20:45:13 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id E3BCEB410E; Mon, 15 Oct 2012 20:45:13 -0400 (EDT) Date: Mon, 15 Oct 2012 20:45:13 -0400 (EDT) From: Rick Macklem To: Ivan Voras Message-ID: <230083937.2296102.1350348313903.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: Subject: Re: NFS server bottlenecks MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.201] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692) Cc: freebsd-hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Oct 2012 00:45:16 -0000 Ivan Voras wrote: > On 15 October 2012 22:58, Rick Macklem wrote: > > > The problem is that UDP entries very seldom time out (unless the > > NFS server isn't seeing hardly any load) and are mostly trimmed > > because 
the size exceeds the highwater mark. > > > > With your code, it will clear out all of the entries in the first > > hash buckets that aren't currently busy, until the total count > > drops below the high water mark. (If you monitor a busy server > > with "nfsstat -e -s", you'll see the cache never goes below the > > high water mark, which is 500 by default.) This would delete > > entries of fairly recent requests. > > You are right about that, if testing by Nikolay goes reasonably well, > I'll work on that. > > > If you are going to replace the global LRU list with ones for > > each hash bucket, then you'll have to compare the time stamps > > on the least recently used entries of all the hash buckets and > > then delete those. If you keep the timestamp of the least recent > > one for that hash bucket in the hash bucket head, you could at least > > use that to select which bucket to delete from next, but you'll > > still > > need to: > > - lock that hash bucket > > - delete a few entries from that bucket's lru list > > - unlock hash bucket > > - repeat for various buckets until the count is beloew the high > > water mark > > Ah, I think I get it: is the reliance on the high watermark as a > criteria for cache expiry the reason the list is a LRU instead of an > ordinary unordered list? > Yes, I think you've got it ;-) Have fun with it, rick > > Or something like that. I think you'll find it a lot more work that > > one LRU list and one mutex. Remember that mutex isn't held for long. > > It could be, but the current state of my code is just groundwork for > the next things I have in plan: > > 1) Move the expiry code (the trim function) into a separate thread, > run periodically (or as a callout, I'll need to talk with someone > about which one is cheaper) > > 2) Replace the mutex with a rwlock. The only thing which is preventing > me from doing this right away is the LRU list, since each read access > modifies it (and requires a write lock).
This is why I was asking you > if we can do away with the LRU algorithm. > > > Btw, the code looks very nice. (If I was being a style(9) zealot, > > I'd remind you that it likes "return (X);" and not "return X;". > > Thanks, I'll make it more style(9) compliant as I go along. > _______________________________________________ > freebsd-hackers@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to > "freebsd-hackers-unsubscribe@freebsd.org" From owner-freebsd-hackers@FreeBSD.ORG Tue Oct 16 10:29:48 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id ACAC0A61; Tue, 16 Oct 2012 10:29:48 +0000 (UTC) (envelope-from wkoszek@freebsd.czest.pl) Received: from freebsd.czest.pl (freebsd.czest.pl [212.87.224.105]) by mx1.freebsd.org (Postfix) with ESMTP id 321578FC1B; Tue, 16 Oct 2012 10:29:47 +0000 (UTC) Received: from freebsd.czest.pl (freebsd.czest.pl [212.87.224.105]) by freebsd.czest.pl (8.14.5/8.14.5) with ESMTP id q9GAJvKR007057; Tue, 16 Oct 2012 10:19:57 GMT (envelope-from wkoszek@freebsd.czest.pl) Received: (from wkoszek@localhost) by freebsd.czest.pl (8.14.5/8.14.5/Submit) id q9GAJv5B007056; Tue, 16 Oct 2012 10:19:57 GMT (envelope-from wkoszek) Date: Tue, 16 Oct 2012 10:19:57 +0000 From: "Wojciech A. Koszek" To: freebsd-current@freebsd.org, freebsd-stable@freebsd.org, freebsd-hackers@freebsd.org Subject: FreeBSD in Google Code-In 2012? You can help too! 
Message-ID: <20121016101957.GB53800@FreeBSD.org> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-2 Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.7 (freebsd.czest.pl [212.87.224.105]); Tue, 16 Oct 2012 10:19:58 +0000 (UTC) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Oct 2012 10:29:48 -0000 (cross-posted message; please keep discussion on freebsd-hackers@) Hello, Last year FreeBSD qualified for Google Code-In 2011 event--contest for youngest open-source hackers in 13-17yr age range: http://www.google-melange.com/gci/homepage/google/gci2012 It was successful. We gained one more FreeBSD developer thanks to that (Isabell Long) We're pondering participating in the contest this year as well. For now we only have 25 ideas. We need at least 100. I felt all members of the FreeBSD community should help, so please submit your own Google Code-In 2012 ideas here: http://www.emailmeform.com/builder/form/4aU93Obxo4NYdVAgb1 Examples of previously completed tasks: http://wiki.freebsd.org/GoogleCodeIn/2011Tasks Those of you who have Wiki access, please spent 2 more minutes and submit straight to Wiki: http://wiki.freebsd.org/GoogleCodeIn/2012Tasks I plan to send out next e-mail if there's any progress on this project. Help will be appreciated. Thanks, -- Wojciech A. 
Koszek wkoszek@FreeBSD.czest.pl http://FreeBSD.czest.pl/~wkoszek/ From owner-freebsd-hackers@FreeBSD.ORG Tue Oct 16 17:00:43 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 1D9E966E; Tue, 16 Oct 2012 17:00:43 +0000 (UTC) (envelope-from wkoszek@freebsd.czest.pl) Received: from freebsd.czest.pl (freebsd.czest.pl [212.87.224.105]) by mx1.freebsd.org (Postfix) with ESMTP id 938848FC16; Tue, 16 Oct 2012 17:00:42 +0000 (UTC) Received: from freebsd.czest.pl (freebsd.czest.pl [212.87.224.105]) by freebsd.czest.pl (8.14.5/8.14.5) with ESMTP id q9GGopOU009154; Tue, 16 Oct 2012 16:50:51 GMT (envelope-from wkoszek@freebsd.czest.pl) Received: (from wkoszek@localhost) by freebsd.czest.pl (8.14.5/8.14.5/Submit) id q9GGopps009153; Tue, 16 Oct 2012 16:50:51 GMT (envelope-from wkoszek) Date: Tue, 16 Oct 2012 16:50:51 +0000 From: "Wojciech A. Koszek" To: freebsd-current@freebsd.org, freebsd-stable@freebsd.org, freebsd-hackers@freebsd.org Subject: Re: FreeBSD in Google Code-In 2012? You can help too! Message-ID: <20121016165051.GC53800@FreeBSD.org> References: <20121016101957.GB53800@FreeBSD.org> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-2 Content-Disposition: inline In-Reply-To: <20121016101957.GB53800@FreeBSD.org> User-Agent: Mutt/1.5.21 (2010-09-15) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.7 (freebsd.czest.pl [212.87.224.105]); Tue, 16 Oct 2012 16:50:52 +0000 (UTC) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Oct 2012 17:00:43 -0000 On Tue, Oct 16, 2012 at 10:19:57AM +0000, Wojciech A. 
Koszek wrote: > (cross-posted message; please keep discussion on freebsd-hackers@) > > Hello, > > Last year FreeBSD qualified for Google Code-In 2011 event--contest for > youngest open-source hackers in 13-17yr age range: > > http://www.google-melange.com/gci/homepage/google/gci2012 > > It was successful. We gained one more FreeBSD developer thanks to that > (Isabell Long) We're pondering participating in the contest this year as > well. > > For now we only have 25 ideas. We need at least 100. > > I felt all members of the FreeBSD community should help, so please submit > your own Google Code-In 2012 ideas here: > > http://www.emailmeform.com/builder/form/4aU93Obxo4NYdVAgb1 > > Examples of previously completed tasks: > > http://wiki.freebsd.org/GoogleCodeIn/2011Tasks > > Those of you who have Wiki access, please spent 2 more minutes and submit > straight to Wiki: > > http://wiki.freebsd.org/GoogleCodeIn/2012Tasks > > I plan to send out next e-mail if there's any progress on this project. > > Help will be appreciated. Hi, (cross-posted message; please keep discussion on freebsd-hackers@) I made a mistake -- the web form didn't have "Contributor's name", thus I don't know who of you guys contributed first 9 ideas; e-mail me which ideas are yours, so that your name can be mentioned on Wiki: http://wiki.freebsd.org/GoogleCodeIn/2012Tasks I made slight adjustments to the form to make some fields more precise: http://www.emailmeform.com/builder/form/4aU93Obxo4NYdVAgb1 Sorry and thanks, -- Wojciech A. 
Koszek wkoszek@FreeBSD.czest.pl http://FreeBSD.czest.pl/~wkoszek/ From owner-freebsd-hackers@FreeBSD.ORG Wed Oct 17 11:21:02 2012 Return-Path: Delivered-To: freebsd-hackers@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id F1BF092A; Wed, 17 Oct 2012 11:21:01 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 122A08FC0C; Wed, 17 Oct 2012 11:21:00 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id OAA04290; Wed, 17 Oct 2012 14:20:58 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1TORgI-000Pc7-Do; Wed, 17 Oct 2012 14:20:58 +0300 Message-ID: <507E9498.10905@FreeBSD.org> Date: Wed, 17 Oct 2012 14:20:56 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:16.0) Gecko/20121013 Thunderbird/16.0.1 MIME-Version: 1.0 To: freebsd-hackers Subject: _mtx_lock_spin: obsolete historic handling of kdb_active and panicstr? X-Enigmail-Version: 1.4.5 Content-Type: text/plain; charset=X-VIET-VPS Content-Transfer-Encoding: 7bit Cc: Bruce Evans X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Oct 2012 11:21:02 -0000 _mtx_lock_spin has the following check in its retry loop: if (i < 60000000 || kdb_active || panicstr != NULL) DELAY(1); else _mtx_lock_spin_failed(m); Which means that in the (!kdb_active && panicstr == NULL) case we will make at most 60000000 iterations and then call _mtx_lock_spin_failed (which proceeds to panic). When either kdb_active or panicstr is set, then we are going to loop forever. 
I've done some digging through the lengthy history and many evolutions of the code (unfortunately I didn't keep records during the research), and my conclusion is that the kdb_active and panicstr checks were added in the quite early era of FreeBSD SMP, when we didn't have a mechanism to stop/block other CPUs when kdb or panic were entered. We didn't even prevent parallel execution of panic.
So the above code was a sort of defense where we hoped that "other" CPUs would eventually stumble upon some held spinlock and would be captured there. Maybe there was a specific set of spinlocks which were supposed to help.
Nowadays, we do try to stop other CPUs during panic and kdb activation and there are good chances that they are indeed stopped. In these circumstances, should the main CPU be so unlucky as to run into a held spinlock, the above check would do more harm than good - the main CPU would just spin there forever, because the lock owner is also spinning in the stop loop and so can't release the lock.
Actually, this is only true for the kdb case. In the panic case we make a check earlier and simply ignore/skip/bust all the locks. That makes the panicstr check in the code in question both harmless and useless.
So I'd like to propose removing those checks altogether. Or perhaps to "reverse" them and immediately induce a (possibly secondary) panic if we ever get to that wait loop and kdb_active || panicstr != NULL.
What do you think?
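To make the difference concrete, the decision taken on each pass of the retry loop can be isolated as a pure function. The following is a user-space sketch only: spin_check_current() mirrors the condition quoted from _mtx_lock_spin, spin_check_proposed() mirrors the first proposal (dropping both extra conditions), and the function names, the enum and the SPIN_LIMIT macro are invented for illustration and do not appear in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SPIN_LIMIT 60000000     /* iteration bound quoted in the message */

enum spin_action { SPIN_DELAY, SPIN_FAIL };

/*
 * The check as quoted: once kdb_active or panicstr is set, the loop
 * never gives up, however many iterations have passed.
 */
enum spin_action
spin_check_current(int i, bool kdb_active, const char *panicstr)
{
    assert(i >= 0);
    if (i < SPIN_LIMIT || kdb_active || panicstr != NULL)
        return (SPIN_DELAY);
    return (SPIN_FAIL);
}

/*
 * The first proposal: only the iteration count matters, so the failure
 * path is always reachable.
 */
enum spin_action
spin_check_proposed(int i)
{
    assert(i >= 0);
    return (i < SPIN_LIMIT ? SPIN_DELAY : SPIN_FAIL);
}
```

In the kernel loop SPIN_DELAY corresponds to DELAY(1) and SPIN_FAIL to _mtx_lock_spin_failed(m). The "reversed" variant mentioned as an alternative would instead fail immediately whenever kdb_active or panicstr is set; it is not shown here.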
-- Andriy Gapon From owner-freebsd-hackers@FreeBSD.ORG Wed Oct 17 12:07:02 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id DCC384CF; Wed, 17 Oct 2012 12:07:02 +0000 (UTC) (envelope-from mdf356@gmail.com) Received: from mail-pb0-f54.google.com (mail-pb0-f54.google.com [209.85.160.54]) by mx1.freebsd.org (Postfix) with ESMTP id A69C48FC0C; Wed, 17 Oct 2012 12:07:02 +0000 (UTC) Received: by mail-pb0-f54.google.com with SMTP id rp8so7642436pbb.13 for ; Wed, 17 Oct 2012 05:07:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; bh=r/OnoF1seo7XedSc2dDPTWZG6lgDn9TtYN3JbG9Sg9w=; b=VEaD5QxkQcBwLvBPgOCSm54ayNtAE1CFeGEQJcROoRExJdrGnsJlpzD6uxhpGKCZk2 i4RdX/e6yChyyIouSrJn3/7ltzq5LkxelM7Q0mNEXwVAbQtROwj0/bMMVTfDD8+gi6By eZ5O8vSlZ4RbHivYRKciQ5H2OpzkxxzOBNeuBvawdk3reOWwOigERVobiCkC0pP3g6Y/ a8CQNfgBKuLhhfKM09FD2NH07qcY8Xl9R57VLx1AqNfotWsAD/tvPMEgYBtSq8EuprF/ C+xOHNrdx2su6E54UUxwKBRIxzvTXR5RQANc2wifuh4EVwNT13nKff877aEIkqzIa6az GAYA== MIME-Version: 1.0 Received: by 10.68.189.65 with SMTP id gg1mr55471329pbc.106.1350475621804; Wed, 17 Oct 2012 05:07:01 -0700 (PDT) Sender: mdf356@gmail.com Received: by 10.68.223.105 with HTTP; Wed, 17 Oct 2012 05:07:01 -0700 (PDT) In-Reply-To: <507E9498.10905@FreeBSD.org> References: <507E9498.10905@FreeBSD.org> Date: Wed, 17 Oct 2012 05:07:01 -0700 X-Google-Sender-Auth: OcNcUhB4GUhqKdPTqH-GQKIBKKg Message-ID: Subject: Re: _mtx_lock_spin: obsolete historic handling of kdb_active and panicstr? 
From: mdf@FreeBSD.org To: Andriy Gapon Content-Type: text/plain; charset=ISO-8859-1 Cc: freebsd-hackers , Bruce Evans X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Oct 2012 12:07:03 -0000 On Wed, Oct 17, 2012 at 4:20 AM, Andriy Gapon wrote: > > _mtx_lock_spin has the following check in its retry loop: > if (i < 60000000 || kdb_active || panicstr != NULL) > DELAY(1); > else > _mtx_lock_spin_failed(m); > [snip analysis] > > So I'd like to propose to remove those checks altogether. Or perhaps to > "reverse" them and immediately induce a (possibly secondary) panic if we ever > get to that wait loop and kdb_active || panicstr != NULL. The panicstr check can clearly be removed. I think there can be race conditions with entering kdb and taking a spinlock, because the spinlock acquire will block interrupts. I don't remember if we always NMI for kdb enter or if that's configurable. The old code was clearer (or maybe I'm just remembering an Isilon hack); looking at stop_cpus_hard() I don't see that it uses an NMI. So a CPU can block interrupts, then if it sees kdb_active it will spin until we leave kdb, rather than panic. Of course this would only be relevant if the lock it's trying to acquire is already held; otherwise it should find the lock unowned and this isn't relevant. And if the lock is owned by the thread entering kdb, that would be a real panic, not a recoverable kdb entry. So I think maybe the kdb_active check is also not helpful after all.
Cheers, matthew From owner-freebsd-hackers@FreeBSD.ORG Wed Oct 17 14:27:59 2012 Return-Path: Delivered-To: freebsd-hackers@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 46515B04; Wed, 17 Oct 2012 14:27:59 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 5AC578FC08; Wed, 17 Oct 2012 14:27:57 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id RAA06640; Wed, 17 Oct 2012 17:25:58 +0300 (EEST) (envelope-from avg@FreeBSD.org) Message-ID: <507EBFF6.5080904@FreeBSD.org> Date: Wed, 17 Oct 2012 17:25:58 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:16.0) Gecko/20121012 Thunderbird/16.0.1 MIME-Version: 1.0 To: mdf@FreeBSD.org Subject: Re: _mtx_lock_spin: obsolete historic handling of kdb_active and panicstr? References: <507E9498.10905@FreeBSD.org> In-Reply-To: X-Enigmail-Version: 1.4.5 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: freebsd-hackers , Bruce Evans X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Oct 2012 14:27:59 -0000 on 17/10/2012 15:07 mdf@FreeBSD.org said the following: > On Wed, Oct 17, 2012 at 4:20 AM, Andriy Gapon wrote: >> >> _mtx_lock_spin has the following check in its retry loop: >> if (i < 60000000 || kdb_active || panicstr != NULL) >> DELAY(1); >> else >> _mtx_lock_spin_failed(m); >> > [snip analysis] >> >> So I'd like to propose to remove those checks altogether. Or perhaps to >> "reverse" them and immediately induce a (possibly secondary) panic if we ever >> get to that wait loop and kdb_active || panicstr != NULL. 
> > The panicstr can clearly be removed. I think there can be race > conditions with entering kdb and taking a spinlock, because the > spinlock acquire will block interrupts. I don't remember if we always > NMI for kdb enter or if that's configurable. The old code was clearer > (or maybe I'm just remembering an Isilon hack); looking at > stop_cpus_hard() I don't see that it uses an NMI.
kdb always uses stop_cpus_hard and stop_cpus_hard always uses NMI on x86.
From sys/x86/x86/local_apic.c:
        if (vector == IPI_STOP_HARD)
                icrlo |= APIC_DELMODE_NMI | APIC_LEVEL_ASSERT;
> So a CPU can block > interrupts, then if it sees kdb_active it will spin until we leave > kdb, rather than panic. Of course this would only be relevant if the > CPU it's trying to acquire is already held; otherwise it should find > the lock unowned and this isn't relevant. And if the lock is owned by > the thread entering kdb, that would be a real panic, not a recoverable > kdb entry. > > So I think maybe the kdb_active check is also not helpful after all. > > Cheers, > matthew >
-- Andriy Gapon
From owner-freebsd-hackers@FreeBSD.ORG Wed Oct 17 18:46:11 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 8ED34BD; Wed, 17 Oct 2012 18:46:11 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net [IPv6:2001:470:1f10:75::2]) by mx1.freebsd.org (Postfix) with ESMTP id 60E9F8FC12; Wed, 17 Oct 2012 18:46:11 +0000 (UTC) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id BB801B94F; Wed, 17 Oct 2012 14:46:10 -0400 (EDT) From: John Baldwin To: Andriy Gapon Subject: Re: _mtx_lock_spin: obsolete historic handling of kdb_active and panicstr?
Date: Wed, 17 Oct 2012 10:12:05 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p20; KDE/4.5.5; amd64; ; ) References: <507E9498.10905@FreeBSD.org> In-Reply-To: <507E9498.10905@FreeBSD.org> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201210171012.05392.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Wed, 17 Oct 2012 14:46:10 -0400 (EDT) Cc: freebsd-hackers , Bruce Evans X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Oct 2012 18:46:11 -0000 On Wednesday, October 17, 2012 7:20:56 am Andriy Gapon wrote: > > _mtx_lock_spin has the following check in its retry loop: > if (i < 60000000 || kdb_active || panicstr != NULL) > DELAY(1); > else > _mtx_lock_spin_failed(m); > > Which means that in the (!kdb_active && panicstr == NULL) case we will make at > most 60000000 iterations and then call _mtx_lock_spin_failed (which proceeds to > panic). When either kdb_active or panicstr is set, then we are going to loop > forever. > > I've done some digging through the lengthy history and many evolutions of the > code (unfortunately I haven't kept records during the research), and my > conclusion is that the kdb_active and panicstr checks were added at the quite > early era of FreeBSD SMP, where we didn't have a mechanism to stop/block other > CPUs when kdb or panic were entered. We didn't even prevent parallel execution > of panic. > So the above code was a sort of defense where we hoped that "other" CPUs would > eventually stumble upon some held spinlock and would be captured there. Maybe > there was a specific set of spinlocks, which were supposed to help. 
It wasn't so much a way of hoping CPUs would stop as a way to prevent other CPUs from panic'ing while another CPU had already panic'd or was already in DDB, which would make debugging harder. > Nowadays, we do try to stop other CPUs during panic and kdb activation and there > are good chances that they are indeed stopped. In this circumstances, should > the main CPU be so unlucky as to run into the held spinlock, the above check > would do more harm than good - the main CPU would just spin there forever, > because a lock owner is also spinning in the stop loop and so can't release the > lock. > Actually, this is only true for the kdb case. In the panic case we make a check > earlier and simply ignore/skip/bust all the locks. That makes the panicstr > check in the code in question both harmless and useless. > > So I'd like to propose to remove those checks altogether. Or perhaps to > "reverse" them and immediately induce a (possibly secondary) panic if we ever > get to that wait loop and kdb_active || panicstr != NULL. > > What do you think? I think this sounds fine. I do think though that there are two behaviors. If for some reason you are not able to stop the other CPUs, you would rather have them spin than trigger another panic while you are in DDB or writing out a crashdump. However, the CPU that is currently in the debugger or writing out a crashdump should probably bust all locks (code executed in debugger backends should generally avoid locking altogether, and if it must use locking, depend on things like try locks so that it can fail gracefully. That would make the kdb_active case here irrelevant, and the panic case is already handled as you noted.)
-- John Baldwin From owner-freebsd-hackers@FreeBSD.ORG Thu Oct 18 00:09:29 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id D11F13EE for ; Thu, 18 Oct 2012 00:09:29 +0000 (UTC) (envelope-from tris_vern@hotmail.com) Received: from snt0-omc2-s14.snt0.hotmail.com (snt0-omc2-s14.snt0.hotmail.com [65.55.90.89]) by mx1.freebsd.org (Postfix) with ESMTP id 9EFCA8FC08 for ; Thu, 18 Oct 2012 00:09:29 +0000 (UTC) Received: from SNT124-W20 ([65.55.90.73]) by snt0-omc2-s14.snt0.hotmail.com with Microsoft SMTPSVC(6.0.3790.4675); Wed, 17 Oct 2012 17:08:23 -0700 Message-ID: X-Originating-IP: [165.228.7.150] From: Tristan Verniquet To: Subject: syncing large mmaped files Date: Thu, 18 Oct 2012 10:08:22 +1000 Importance: Normal MIME-Version: 1.0 X-OriginalArrivalTime: 18 Oct 2012 00:08:23.0629 (UTC) FILETIME=[B0606FD0:01CDACC4] Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.14 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Oct 2012 00:09:29 -0000 I want to work with large (1-10G) files in memory but eventually sync them back out to disk. The problem is that the sync process appears to lock the file in kernel for the duration of the sync, which can run into minutes. This prevents other processes from reading from the file (unless they already have it mapped) for this whole time. Is there any way to prevent this? I think I read in a post somewhere about OpenBSD implementing partial-writes when it hits a file with lots of dirty pages in order to prevent this. Is there anything available for FreeBSD or is there another way around it? Sorry if this is the wrong mailing list.
= From owner-freebsd-hackers@FreeBSD.ORG Thu Oct 18 07:55:55 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 50A01930 for ; Thu, 18 Oct 2012 07:55:55 +0000 (UTC) (envelope-from ndenev@gmail.com) Received: from mail-wi0-f170.google.com (mail-wi0-f170.google.com [209.85.212.170]) by mx1.freebsd.org (Postfix) with ESMTP id CED778FC0A for ; Thu, 18 Oct 2012 07:55:54 +0000 (UTC) Received: by mail-wi0-f170.google.com with SMTP id hm2so1503703wib.1 for ; Thu, 18 Oct 2012 00:55:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=subject:mime-version:content-type:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to:x-mailer; bh=nvTwhd1fvO7rapkIRWslItyX6YLp02h2Recf23Z4aHg=; b=u7vsaNSI9XVPWpzgGuAj5tU7Ve5nAOkOQ6v5Wp2TqwEv6t5bKLWouKWvyWFpY6+dmr LIev+PsUqc+EvDam7fVG2xhWZHcORPkMq0wB/+ASGVFdkYuoCTd63ddRgfzBJ4YtBHlU GKHxzKbk2g9ng6HxscabaBImRGGJrqq6voXQmSNaDasFJnyRh6xikGt0PGS8znOvCzK5 3xYpoR4dBid5pj1BM10P8u3s8mFfJB2oNnAXTHnYstJ78aYvkjbnzCRtDYu3bG6dnWfL rBRVhCGcUulT/6TN8JbyGu+TbZmlIg1t+6smnW8ysXgPY9CAsHDOG3QFm6DGXTkorHYD 97cQ== Received: by 10.216.217.194 with SMTP id i44mr12368145wep.60.1350546947720; Thu, 18 Oct 2012 00:55:47 -0700 (PDT) Received: from ndenevsa.sf.moneybookers.net (g1.moneybookers.com. 
[217.18.249.148]) by mx.google.com with ESMTPS id j8sm28361581wiy.9.2012.10.18.00.55.45 (version=TLSv1/SSLv3 cipher=OTHER); Thu, 18 Oct 2012 00:55:46 -0700 (PDT) Subject: Re: syncing large mmaped files Mime-Version: 1.0 (Mac OS X Mail 6.1 \(1498\)) Content-Type: text/plain; charset=us-ascii From: Nikolay Denev In-Reply-To: Date: Thu, 18 Oct 2012 10:55:46 +0300 Content-Transfer-Encoding: quoted-printable Message-Id: References: To: Tristan Verniquet X-Mailer: Apple Mail (2.1498) Cc: freebsd-hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Oct 2012 07:55:55 -0000 On Oct 18, 2012, at 3:08 AM, Tristan Verniquet wrote: > > I want to work with large (1-10G) files in memory but eventually sync them back out to disk. The problem is that the sync process appears to lock the file in kernel for the duration of the sync, which can run into minutes. This prevents other processes from reading from the file (unless they already have it mapped) for this whole time. Is there any way to prevent this? I think I read in a post somewhere about openbsd implementing partial-writes when it hits a file with lots of dirty pages in order to prevent this. Is there anything available for FreeBSD or is there another way around it? > > Sorry if this is the wrong mailing list.
> > _______________________________________________ > freebsd-hackers@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org" Isn't msync(2) what you are looking for? From owner-freebsd-hackers@FreeBSD.ORG Thu Oct 18 08:35:45 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id D123940F for ; Thu, 18 Oct 2012 08:35:45 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 4AEC28FC08 for ; Thu, 18 Oct 2012 08:35:44 +0000 (UTC) Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q9I8Zn6c015975; Thu, 18 Oct 2012 11:35:49 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id q9I8Zbbc004754; Thu, 18 Oct 2012 11:35:37 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q9I8Zbul004753; Thu, 18 Oct 2012 11:35:37 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Thu, 18 Oct 2012 11:35:37 +0300 From: Konstantin Belousov To: Tristan Verniquet Subject: Re: syncing large mmaped files Message-ID: <20121018083537.GQ35915@deviant.kiev.zoral.com.ua> References: MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="XA7quakUSnawneuz" Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No,
score=-4.0 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: freebsd-hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Oct 2012 08:35:45 -0000 --XA7quakUSnawneuz Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Oct 18, 2012 at 10:08:22AM +1000, Tristan Verniquet wrote: > > I want to work with large (1-10G) files in memory but eventually sync > them back out to disk. The problem is that the sync process appears to > lock the file in kernel for the duration of the sync, which can run > into minutes. This prevents other processes from reading from the file > (unless they already have it mapped) for this whole time. Is there > any way to prevent this? I think I read in a post somewhere about > openbsd implementing partial-writes when it hits a file with lots of > dirty pages in order to prevent this. Is there anything available for > FreeBSD or is there another way around it? > No, currently the vnode lock is held exclusive for the whole duration of the msync(2) syscall or its analog from the syncer. Making a change to periodically drop the vnode lock in vm_object_page_clean() might be possible, but requires the benchmarking to make sure that we do not pessimize the common case. Also, this opens a possibility for the vnode reclamation meantime. Anyway, note that you cannot 'work with large files in memory', even if you have enough RAM and no pressure to hold all the file pages resident. The syncer will do a writeback periodically regardless of the application calling msync(2) or not, with the interval of approximately 30 seconds.
--XA7quakUSnawneuz Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (FreeBSD) iEYEARECAAYFAlB/v1kACgkQC3+MBN1Mb4joagCgj1oYiDMQjM9s9kK7HniP4JiL RVEAn1294Rq3lIUMnPdt2G2ue1z3Jppa =Z1TH -----END PGP SIGNATURE----- --XA7quakUSnawneuz-- From owner-freebsd-hackers@FreeBSD.ORG Thu Oct 18 13:47:36 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 5AFDD7B5 for ; Thu, 18 Oct 2012 13:47:36 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net [IPv6:2001:470:1f10:75::2]) by mx1.freebsd.org (Postfix) with ESMTP id 2CA8E8FC17 for ; Thu, 18 Oct 2012 13:47:36 +0000 (UTC) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 7F97AB98E; Thu, 18 Oct 2012 09:47:35 -0400 (EDT) From: John Baldwin To: freebsd-hackers@freebsd.org Subject: Re: syncing large mmaped files Date: Thu, 18 Oct 2012 09:39:34 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p20; KDE/4.5.5; amd64; ; ) References: <20121018083537.GQ35915@deviant.kiev.zoral.com.ua> In-Reply-To: <20121018083537.GQ35915@deviant.kiev.zoral.com.ua> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-15" Content-Transfer-Encoding: 7bit Message-Id: <201210180939.34861.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Thu, 18 Oct 2012 09:47:35 -0400 (EDT) Cc: Konstantin Belousov , Tristan Verniquet X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Oct 2012 13:47:36 -0000 On Thursday, October 18, 2012 4:35:37 am Konstantin Belousov wrote: > On Thu, Oct 18, 2012 at 10:08:22AM +1000, Tristan Verniquet 
wrote: > > > > I want to work with large (1-10G) files in memory but eventually sync > > them back out to disk. The problem is that the sync process appears to > > lock the file in kernel for the duration of the sync, which can run > > into minutes. This prevents other processes from reading from the file > > (unless they already have it mapped) for this whole time. Is there > > any way to prevent this? I think I read in a post somewhere about > > openbsd implementing partial-writes when it hits a file with lots of > > dirty pages in order to prevent this. Is there anything available for > > FreeBSD or is there another way around it? > > > No, currently the vnode lock is held exclusive for the whole duration > of the msync(2) syscall or its analog from the syncer. > > Making a change to periodically drop the vnode lock in > vm_object_page_clean() might be possible, but requires the benchmarking > to make sure that we do not pessimize the common case. Also, this opens > a possibility for the vnode reclamation meantime. You can simulate this in userland by breaking up your msync() into multiple msync() calls where each call just syncs a portion of the file. > Anyway, note that you cannot 'work with large files in memory', even if > you have enough RAM and no pressure to hold all the file pages resident. > The syncer will do a writeback periodically regardless of the application > calling msync(2) or not, with the interval of approximately 30 seconds. You can mmap with MAP_NOSYNC to prevent the syncer from writing the file out every 30 seconds. 
-- John Baldwin From owner-freebsd-hackers@FreeBSD.ORG Thu Oct 18 15:11:58 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id B5C3272C; Thu, 18 Oct 2012 15:11:58 +0000 (UTC) (envelope-from ndenev@gmail.com) Received: from mail-wi0-f170.google.com (mail-wi0-f170.google.com [209.85.212.170]) by mx1.freebsd.org (Postfix) with ESMTP id 166468FC1B; Thu, 18 Oct 2012 15:11:56 +0000 (UTC) Received: by mail-wi0-f170.google.com with SMTP id hm2so1945909wib.1 for ; Thu, 18 Oct 2012 08:11:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=subject:mime-version:content-type:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to:x-mailer; bh=9EAGJ3tBEIhFVcRVS8g2aCLO95xdlOnsmAkaMcXLB7o=; b=ACHnkuttJboOB8/p1FV9vKYYayRuAfHVQxJFcZ0pdrZwgYqjmFa07Nqx/YWOCyz/gM SNGAYxviLoWBobbThjqQo1VaW2I1oCR2XJk9jaLu4xrObYBmjagMpL+NcevSRbJBU8uR Ry2smtbX1Xg/O6C+ShAIvQyQ4b/ftXHSZEhpd0kOV5Txhj+hY5SoXgbj1W396FOYti4h TbBqWMuOkqlcZrNN9Fpt15lbRqvHHZxZytm38jS6T+vSskWHuG6b+oWaOCs+XTozgP+I pXAbJQk+maT76grejsFWUDeorEuXJKir3RSdWwNJTb7OOKW1kBiQRaDc04Pt81KAqtyE 9r2Q== Received: by 10.180.85.99 with SMTP id g3mr12044511wiz.5.1350573115711; Thu, 18 Oct 2012 08:11:55 -0700 (PDT) Received: from ndenevsa.sf.moneybookers.net (g1.moneybookers.com. 
[217.18.249.148]) by mx.google.com with ESMTPS id ay10sm34034836wib.2.2012.10.18.08.11.51 (version=TLSv1/SSLv3 cipher=OTHER); Thu, 18 Oct 2012 08:11:52 -0700 (PDT) Subject: Re: NFS server bottlenecks Mime-Version: 1.0 (Mac OS X Mail 6.1 \(1498\)) Content-Type: text/plain; charset=us-ascii From: Nikolay Denev In-Reply-To: Date: Thu, 18 Oct 2012 18:11:51 +0300 Content-Transfer-Encoding: quoted-printable Message-Id: <6DAAB1E6-4AC7-4B08-8CAD-0D8584D039DE@gmail.com> References: <937460294.2185822.1350093954059.JavaMail.root@erie.cs.uoguelph.ca> <302BF685-4B9D-49C8-8000-8D0F6540C8F7@gmail.com> <0857D79A-6276-433F-9603-D52125CF190F@gmail.com> To: Ivan Voras X-Mailer: Apple Mail (2.1498) Cc: freebsd-hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Oct 2012 15:11:58 -0000 On Oct 15, 2012, at 5:34 PM, Ivan Voras wrote: > On 15 October 2012 16:31, Nikolay Denev wrote: >> >> On Oct 15, 2012, at 2:52 PM, Ivan Voras wrote: > >>> http://people.freebsd.org/~ivoras/diffs/nfscache_lock.patch >>> >>> It should apply to HEAD without Rick's patches. >>> >>> It's a bit different approach than Rick's, breaking down locks even more. >> >> Applied and compiled OK, I will be able to test it tomorrow. > > Ok, thanks! > > The differences should be most visible in edge cases with a larger > number of nfsd processes (16+) and many CPU cores. I'm now rebooting with your patch, and hopefully will have some results tomorrow.
From owner-freebsd-hackers@FreeBSD.ORG Thu Oct 18 15:17:11 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id E9EA0A54; Thu, 18 Oct 2012 15:17:11 +0000 (UTC) (envelope-from tris_vern@hotmail.com) Received: from snt0-omc3-s36.snt0.hotmail.com (snt0-omc3-s36.snt0.hotmail.com [65.55.90.175]) by mx1.freebsd.org (Postfix) with ESMTP id B638B8FC08; Thu, 18 Oct 2012 15:17:11 +0000 (UTC) Received: from SNT124-W29 ([65.55.90.135]) by snt0-omc3-s36.snt0.hotmail.com with Microsoft SMTPSVC(6.0.3790.4675); Thu, 18 Oct 2012 08:16:04 -0700 Message-ID: X-Originating-IP: [165.228.7.150] From: Tristan Verniquet To: , freebsd hackers Subject: RE: syncing large mmaped files Date: Fri, 19 Oct 2012 01:16:04 +1000 Importance: Normal In-Reply-To: <201210180939.34861.jhb@freebsd.org> References: , <20121018083537.GQ35915@deviant.kiev.zoral.com.ua>, <201210180939.34861.jhb@freebsd.org> MIME-Version: 1.0 X-OriginalArrivalTime: 18 Oct 2012 15:16:04.0711 (UTC) FILETIME=[7DB60F70:01CDAD43] Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: kostikbel@gmail.com X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Oct 2012 15:17:12 -0000 > From: jhb@freebsd.org > To: freebsd-hackers@freebsd.org > Subject: Re: syncing large mmaped files > Date: Thu, 18 Oct 2012 09:39:34 -0400 > CC: kostikbel@gmail.com; tris_vern@hotmail.com > > On Thursday, October 18, 2012 4:35:37 am Konstantin Belousov wrote: > > On Thu, Oct 18, 2012 at 10:08:22AM +1000, Tristan Verniquet wrote: > > > > > > I want to work with large (1-10G) files in memory but eventually sync > > > them back out to disk.
The problem is that the sync process appears to > > > lock the file in kernel for the duration of the sync, which can run > > > into minutes. This prevents other processes from reading from the file > > > (unless they already have it mapped) for this whole time. Is there > > > any way to prevent this? I think I read in a post somewhere about > > > openbsd implementing partial-writes when it hits a file with lots of > > > dirty pages in order to prevent this. Is there anything available for > > > FreeBSD or is there another way around it? > > > > > No, currently the vnode lock is held exclusive for the whole duration > > of the msync(2) syscall or its analog from the syncer. > > > > Making a change to periodically drop the vnode lock in > > vm_object_page_clean() might be possible, but requires the benchmarking > > to make sure that we do not pessimize the common case. Also, this opens > > a possibility for the vnode reclamation meantime. > > You can simulate this in userland by breaking up your msync() into multiple > msync() calls where each call just syncs a portion of the file. Thanks, I was doing this and I thought I was getting much worse performance from the msync over the fsync, however I am trying it again now and the difference doesn't seem as large as I first imagined. It is still taking about 4x as long for the case where all the pages are dirty but catches up when the file is more sparsely written. I guess that is probably acceptable. When all pages are dirty, iostat shows that the fsync will write 128KB/Transaction, whereas msync always does 16 KB/Transaction and a lower MB/s. It will continue to do this if I only dirty every 2nd, 3rd or 4th page. When I only dirty every 5th page the fsync seems to kick into another mode and starts doing 16KB/Transaction and the time starts becoming comparable to msync.
Is there any way to get that fsync 128K/Transaction performance increase when all pages are dirty with msync? > > Anyway, note that you cannot 'work with large files in memory', even if > > you have enough RAM and no pressure to hold all the file pages resident. > > The syncer will do a writeback periodically regardless of the application > > calling msync(2) or not, with the interval of approximately 30 seconds. > > You can mmap with MAP_NOSYNC to prevent the syncer from writing the file out > every 30 seconds. Yes, I was mapping with MAP_NOSYNC. > -- > John Baldwin > _______________________________________________ > freebsd-hackers@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org" From owner-freebsd-hackers@FreeBSD.ORG Thu Oct 18 16:32:23 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id D28FAD1A for ; Thu, 18 Oct 2012 16:32:23 +0000 (UTC) (envelope-from vasanth.raonaik@gmail.com) Received: from mail-da0-f54.google.com (mail-da0-f54.google.com [209.85.210.54]) by mx1.freebsd.org (Postfix) with ESMTP id A40DA8FC14 for ; Thu, 18 Oct 2012 16:32:23 +0000 (UTC) Received: by mail-da0-f54.google.com with SMTP id z9so3797989dad.13 for ; Thu, 18 Oct 2012 09:32:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type; bh=p6bawuOax3Ag2hFn+tB4fwWSlIqpVdscztw+fEJVsdA=; b=POTCzF2caHPqyDYhUqNZ4tW4xi44k0tE5I0EKeLSeNvuTH8FlFukMPjvHf40Ha+azR Kn6vb6FbBbXtszLtkQb9yf8pOBBW+D/xgGpwy7Iz9lN5M6S3srvBUtaSX+u0In7JxiwM l+Q20HxQ8AvwuP74iEh4bvfHdjOnnNcn5AN4cHarFnnh7f/2jRVV5Lh91GrG9MWxORxx 3Bq5ErUt9wZMX5G20TgQ5OVhgWkyt4cy4YbQj0Czfyi8XGCrmNUEcWD/Xsio7I2CWByS BBymzqagF+skvy+1ZcEV6z37PD/M+uUSIhB6nMOiZT9yXzEwf0eC4GcxHFlSOYs7OfhH tBKw==
MIME-Version: 1.0 Received: by 10.66.85.233 with SMTP id k9mr60443452paz.73.1350577943165; Thu, 18 Oct 2012 09:32:23 -0700 (PDT) Received: by 10.66.217.138 with HTTP; Thu, 18 Oct 2012 09:32:23 -0700 (PDT) Date: Thu, 18 Oct 2012 12:32:23 -0400 Message-ID: Subject: dtrace failed to resolve struct thread From: vasanth rao naik sabavat To: freebsd-hackers@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Oct 2012 16:32:23 -0000 Hi, I have an issue with latest FreeBSD when enabling dtrace dtrace -s schedgraph.d ./hotkernel both return the following error. : "/usr/lib/dtrace/psinfo.d", line 88: failed to resolve type kernel`struct thread * for identifier curthread: Unknown type name 10.0-CURRENT FreeBSD 10.0-CURRENT #0: Wed Oct 17 12:04:00 PDT 2012 I see that there was a problem report on FreeBSD which got closed as fixed. What is the fix for this issue? 
http://www.freebsd.org/cgi/query-pr.cgi?pr=130998 -- Thanks, Vasanth From owner-freebsd-hackers@FreeBSD.ORG Thu Oct 18 16:42:27 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 154702AF; Thu, 18 Oct 2012 16:42:27 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 877128FC14; Thu, 18 Oct 2012 16:42:26 +0000 (UTC) Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q9IGgUJM079973; Thu, 18 Oct 2012 19:42:31 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id q9IGgIYL086939; Thu, 18 Oct 2012 19:42:18 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q9IGgI11086938; Thu, 18 Oct 2012 19:42:18 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Thu, 18 Oct 2012 19:42:18 +0300 From: Konstantin Belousov To: John Baldwin Subject: Re: syncing large mmaped files Message-ID: <20121018164218.GR35915@deviant.kiev.zoral.com.ua> References: <20121018083537.GQ35915@deviant.kiev.zoral.com.ua> <201210180939.34861.jhb@freebsd.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="DI3e56nQDAJ1LWZd" Content-Disposition: inline In-Reply-To: <201210180939.34861.jhb@freebsd.org> User-Agent: Mutt/1.5.21 (2010-09-15) X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-4.0 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: 
SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: freebsd-hackers@freebsd.org, Tristan Verniquet X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Oct 2012 16:42:27 -0000 --DI3e56nQDAJ1LWZd Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Oct 18, 2012 at 09:39:34AM -0400, John Baldwin wrote: > On Thursday, October 18, 2012 4:35:37 am Konstantin Belousov wrote: > > On Thu, Oct 18, 2012 at 10:08:22AM +1000, Tristan Verniquet wrote: > > > > > > I want to work with large (1-10G) files in memory but eventually sync > > > them back out to disk. The problem is that the sync process appears to > > > lock the file in kernel for the duration of the sync, which can run > > > into minutes. This prevents other processes from reading from the file > > > (unless they already have it mapped) for this whole time. Is there > > > any way to prevent this? I think I read in a post somewhere about > > > openbsd implementing partial-writes when it hits a file with lots of > > > dirty pages in order to prevent this. Is there anything available for > > > FreeBSD or is there another way around it? > > > > > No, currently the vnode lock is held exclusive for the whole duration > > of the msync(2) syscall or its analog from the syncer. > > > > Making a change to periodically drop the vnode lock in > > vm_object_page_clean() might be possible, but requires the benchmarking > > to make sure that we do not pessimize the common case. Also, this opens > > a possibility for the vnode reclamation meantime. > > You can simulate this in userland by breaking up your msync() into multiple > msync() calls where each call just syncs a portion of the file.
Be aware that this is much-much slower than msyncing the whole file, even if file is very large. The reason is that pager initiates asynchronous _immediate_ clustered write for such situations. Async writes (AKA bdwrite()) are only specified for full range msyncing. > > > Anyway, note that you cannot 'work with large files in memory', even if > > you have enough RAM and no pressure to hold all the file pages resident. > > The syncer will do a writeback periodically regardless of the application > > calling msync(2) or not, with the interval of approximately 30 seconds. > > You can mmap with MAP_NOSYNC to prevent the syncer from writing the file out > every 30 seconds. This also prevents msync(2) from syncing the region. The flag is fine for throw-away data, but not for the scenario that was described, I think. --DI3e56nQDAJ1LWZd Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (FreeBSD) iEYEARECAAYFAlCAMWoACgkQC3+MBN1Mb4iV7ACfeO+DqO2Onc8uMS29tjTbykJF Ek0An1i+6oS2OaxLly9sI5pAGmKlXw8F =UG69 -----END PGP SIGNATURE----- --DI3e56nQDAJ1LWZd-- From owner-freebsd-hackers@FreeBSD.ORG Thu Oct 18 17:49:03 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id EBE46CAC for ; Thu, 18 Oct 2012 17:49:03 +0000 (UTC) (envelope-from rysto32@gmail.com) Received: from mail-qc0-f182.google.com (mail-qc0-f182.google.com [209.85.216.182]) by mx1.freebsd.org (Postfix) with ESMTP id 9DDB48FC16 for ; Thu, 18 Oct 2012 17:49:03 +0000 (UTC) Received: by mail-qc0-f182.google.com with SMTP id l39so8741935qcs.13 for ; Thu, 18 Oct 2012 10:49:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=6/uNn96bjJavH1dHQDLd7ndwQ6V1nZcAPMbQ9SXyqMQ=;
b=WJ5Ij3FoNCBBTLj6+NaDCNcA9GUtb/YJk0km2/TEnMn0FLrOqy7QDF3UlpB20kU9IU aG1OQ0Ljt2e6pwmpV2BVdVvKHWIiGrYcIOnkRp5u+eu1LWLW5w2ZUVQb4noqRS3ddegj 4CXH7DETF45h7N7GRxve51toQcFuHIAxftgRe9N0v+7nG8UEzo5dfZy41hxhpqCH6mUP 04GLbH6gqdePPhf/LkXC3YZdHli0DHdDgCjxrcm8eSbjjricEyclDL34nBoQvamlxJA0 hTPYBY+VE8ZW9yC9omJIm1XxPqrX8DjVwzDQdo2z2DE/VVtC6FfF35UlgsXB+X+VfeUf TqTQ== MIME-Version: 1.0 Received: by 10.224.33.205 with SMTP id i13mr16598608qad.35.1350582543116; Thu, 18 Oct 2012 10:49:03 -0700 (PDT) Received: by 10.49.81.234 with HTTP; Thu, 18 Oct 2012 10:49:03 -0700 (PDT) In-Reply-To: References: Date: Thu, 18 Oct 2012 13:49:03 -0400 Message-ID: Subject: Re: dtrace failed to resolve struct thread From: Ryan Stone To: vasanth rao naik sabavat Content-Type: text/plain; charset=ISO-8859-1 Cc: freebsd-hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Oct 2012 17:49:04 -0000 On Thu, Oct 18, 2012 at 12:32 PM, vasanth rao naik sabavat wrote: > Hi, > > I have an issue with latest FreeBSD when enabling dtrace > > dtrace -s schedgraph.d > ./hotkernel > both return the following error. > > : "/usr/lib/dtrace/psinfo.d", line 88: failed to resolve type kernel`struct > thread * for identifier curthread: Unknown type name > > 10.0-CURRENT FreeBSD 10.0-CURRENT #0: Wed Oct 17 12:04:00 PDT 2012 > > I see that there was a problem report on FreeBSD which got closed as fixed. > What is the fix for this issue? > > http://www.freebsd.org/cgi/query-pr.cgi?pr=130998 Did you buildkernel with WITH_CTF=1? You can check this by running ctfdump on /boot/kernel/kernel and seeing if that produces output. 
It will print the following error if you forgot to build with WITH_CTF=1: /boot/kernel/kernel does not contain .SUNW_ctf data From owner-freebsd-hackers@FreeBSD.ORG Thu Oct 18 19:43:28 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id EC9F7CDE for ; Thu, 18 Oct 2012 19:43:28 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net [IPv6:2001:470:1f10:75::2]) by mx1.freebsd.org (Postfix) with ESMTP id BD6838FC08 for ; Thu, 18 Oct 2012 19:43:28 +0000 (UTC) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 28EA3B986; Thu, 18 Oct 2012 15:43:28 -0400 (EDT) From: John Baldwin To: Konstantin Belousov Subject: Re: syncing large mmaped files Date: Thu, 18 Oct 2012 15:43:25 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p20; KDE/4.5.5; amd64; ; ) References: <201210180939.34861.jhb@freebsd.org> <20121018164218.GR35915@deviant.kiev.zoral.com.ua> In-Reply-To: <20121018164218.GR35915@deviant.kiev.zoral.com.ua> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-15" Content-Transfer-Encoding: 7bit Message-Id: <201210181543.25191.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Thu, 18 Oct 2012 15:43:28 -0400 (EDT) Cc: freebsd-hackers@freebsd.org, Tristan Verniquet X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Oct 2012 19:43:29 -0000 On Thursday, October 18, 2012 12:42:18 pm Konstantin Belousov wrote: > On Thu, Oct 18, 2012 at 09:39:34AM -0400, John Baldwin wrote: > > On Thursday, October 18, 2012 4:35:37 am Konstantin Belousov wrote: > > > On Thu, Oct 18, 2012 at 
10:08:22AM +1000, Tristan Verniquet wrote: > > > > > > > > I want to work with large (1-10G) files in memory but eventually sync > > > > them back out to disk. The problem is that the sync process appears to > > > > lock the file in kernel for the duration of the sync, which can run > > > > into minutes. This prevents other processes from reading from the file > > > > (unless they already have it mapped) for this whole time. Is there > > > > any way to prevent this? I think I read in a post somewhere about > > > > openbsd implementing partial-writes when it hits a file with lots of > > > > dirty pages in order to prevent this. Is there anything available for > > > > FreeBSD or is there another way around it? > > > > > > > No, currently the vnode lock is held exclusive for the whole duration > > > of the msync(2) syscall or its analog from the syncer. > > > > > > Making a change to periodically drop the vnode lock in > > > vm_object_page_clean() might be possible, but requires the benchmarking > > > to make sure that we do not pessimize the common case. Also, this opens > > > a possibility for the vnode reclamation meantime. > > > > You can simulate this in userland by breaking up your msync() into multiple > > msync() calls where each call just syncs a portion of the file. > Be aware that this is much-much slower than msyncing the whole file, even > if file is very large. The reason is that pager initiates asynchronous > _immediate_ clustered write for such situations. Async writes (AKA > bdwrite()) are only specified for full range msyncing. Ugh. It would seem to me that msync(MS_ASYNC) should be doing delayed writes. > > > Anyway, note that you cannot 'work with large files in memory', even if > > > you have enough RAM and no pressure to hold all the file pages resident. > > > The syncer will do a writeback periodically regardless of the application > > > calling msync(2) or not, with the interval of approximately 30 seconds. 
> > > > You can mmap with MAP_NOSYNC to prevent the syncer from writing the file out > > every 30 seconds. > > This also prevents msync(2) from syncing the region. The flag is fine > for throw-away data, but not for the scenario that was described, I > think. Oof. I could see that in certain situations you might want to control this behavior from an application (similar to how I now make use of fadvise() at work). Having a way to disable syncer but having msync(MS_ASYNC) do something useful would be good. -- John Baldwin From owner-freebsd-hackers@FreeBSD.ORG Thu Oct 18 20:24:35 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id C747F6F9 for ; Thu, 18 Oct 2012 20:24:35 +0000 (UTC) (envelope-from vasanth.raonaik@gmail.com) Received: from mail-da0-f54.google.com (mail-da0-f54.google.com [209.85.210.54]) by mx1.freebsd.org (Postfix) with ESMTP id 94D7D8FC0C for ; Thu, 18 Oct 2012 20:24:35 +0000 (UTC) Received: by mail-da0-f54.google.com with SMTP id z9so3908098dad.13 for ; Thu, 18 Oct 2012 13:24:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=RjQK6agvOK0VR5Df8kVISrfrDm45wHDimFNnxqeoStQ=; b=CBiA8d3ykIvoeLFMRMebXW1wniZ7bSf2HQMk2Tq+Uklg8CqEdhV+z41XD5SLlDvxFi UZoh4smHMSHd8BvvWM6qDkhv7Vw+LA4u+0abpBEc+IC3qyc7npAWsy4IO40/KApfgouk ZT/d0EpL9jmfIhPU0Fpv4JmIuPacqq0Mzb6yXuYj/vufMUZhHvReesnTP1PaCMzWl802 8q0/zKkbf91EHfIMDyeBlOiSLd5N+I3MJOXOknHaddYDO33VCQpu63strqHiafJTFi4P jNuyKoOjl1Ew5AHWeWYjnvyQ1JZ074xlQzj25K0HbANs3VgxxFMuC6tOyr3X6nVDSK1s H0pQ== MIME-Version: 1.0 Received: by 10.68.232.163 with SMTP id tp3mr69957905pbc.44.1350591874853; Thu, 18 Oct 2012 13:24:34 -0700 (PDT) Received: by 10.66.217.138 with HTTP; Thu, 18 Oct 2012 13:24:34 -0700 (PDT) In-Reply-To: References: Date: Thu, 18 Oct 2012 16:24:34 -0400 Message-ID: Subject: 
Re: dtrace failed to resolve struct thread From: vasanth rao naik sabavat To: Ryan Stone Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: freebsd-hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Oct 2012 20:24:35 -0000 Thanks Ryan, I checked out latest source code and compiled the kernel which has the option you have mentioned. I can now run dtrace -s schedgraph.d without any issues. Thanks, Vasanth On Thu, Oct 18, 2012 at 1:49 PM, Ryan Stone wrote: > On Thu, Oct 18, 2012 at 12:32 PM, vasanth rao naik sabavat > wrote: > > Hi, > > > > I have an issue with latest FreeBSD when enabling dtrace > > > > dtrace -s schedgraph.d > > ./hotkernel > > both return the following error. > > > > : "/usr/lib/dtrace/psinfo.d", line 88: failed to resolve type > kernel`struct > > thread * for identifier curthread: Unknown type name > > > > 10.0-CURRENT FreeBSD 10.0-CURRENT #0: Wed Oct 17 12:04:00 PDT 2012 > > > > I see that there was a problem report on FreeBSD which got closed as > fixed. > > What is the fix for this issue? > > > > http://www.freebsd.org/cgi/query-pr.cgi?pr=130998 > > Did you buildkernel with WITH_CTF=1? You can check this by running > ctfdump on /boot/kernel/kernel and seeing if that produces output. 
It > will print the following error if you forgot to build with WITH_CTF=1: > > /boot/kernel/kernel does not contain .SUNW_ctf data > -- Thanks, Vasanth From owner-freebsd-hackers@FreeBSD.ORG Fri Oct 19 00:11:42 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id C1AABB91; Fri, 19 Oct 2012 00:11:42 +0000 (UTC) (envelope-from tris_vern@hotmail.com) Received: from snt0-omc3-s48.snt0.hotmail.com (snt0-omc3-s48.snt0.hotmail.com [65.54.51.85]) by mx1.freebsd.org (Postfix) with ESMTP id 8A9518FC1B; Fri, 19 Oct 2012 00:11:42 +0000 (UTC) Received: from SNT124-W23 ([65.55.90.137]) by snt0-omc3-s48.snt0.hotmail.com with Microsoft SMTPSVC(6.0.3790.4675); Thu, 18 Oct 2012 17:11:36 -0700 Message-ID: X-Originating-IP: [165.228.7.150] From: Tristan Verniquet To: , Subject: RE: syncing large mmaped files Date: Fri, 19 Oct 2012 10:11:35 +1000 Importance: Normal In-Reply-To: <201210181543.25191.jhb@freebsd.org> References: , <201210180939.34861.jhb@freebsd.org>, <20121018164218.GR35915@deviant.kiev.zoral.com.ua>, <201210181543.25191.jhb@freebsd.org> MIME-Version: 1.0 X-OriginalArrivalTime: 19 Oct 2012 00:11:36.0301 (UTC) FILETIME=[4DA199D0:01CDAD8E] Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: freebsd hackers X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Oct 2012 00:11:43 -0000 > From: jhb@freebsd.org > To: kostikbel@gmail.com > Subject: Re: syncing large mmaped files > Date: Thu=2C 18 Oct 2012 15:43:25 -0400 > CC: freebsd-hackers@freebsd.org=3B tris_vern@hotmail.com >=20 > On Thursday=2C October 18=2C 2012 12:42:18 pm Konstantin Belousov wrote: > > On Thu=2C Oct 18=2C 2012 at 
09:39:34AM -0400, John Baldwin wrote:
> > > On Thursday, October 18, 2012 4:35:37 am Konstantin Belousov wrote:
> > > > On Thu, Oct 18, 2012 at 10:08:22AM +1000, Tristan Verniquet wrote:
> > > > >
> > > > > I want to work with large (1-10G) files in memory but eventually sync
> > > > > them back out to disk. The problem is that the sync process appears to
> > > > > lock the file in kernel for the duration of the sync, which can run
> > > > > into minutes. This prevents other processes from reading from the file
> > > > > (unless they already have it mapped) for this whole time. Is there
> > > > > any way to prevent this? I think I read in a post somewhere about
> > > > > openbsd implementing partial-writes when it hits a file with lots of
> > > > > dirty pages in order to prevent this. Is there anything available for
> > > > > FreeBSD or is there another way around it?
> > > > >
> > > > No, currently the vnode lock is held exclusive for the whole duration
> > > > of the msync(2) syscall or its analog from the syncer.
> > > >
> > > > Making a change to periodically drop the vnode lock in
> > > > vm_object_page_clean() might be possible, but requires the benchmarking
> > > > to make sure that we do not pessimize the common case. Also, this opens
> > > > a possibility for the vnode reclamation meantime.
> > >
> > > You can simulate this in userland by breaking up your msync() into multiple
> > > msync() calls where each call just syncs a portion of the file.
> > Be aware that this is much-much slower than msyncing the whole file, even
> > if file is very large. The reason is that pager initiates asynchronous
> > _immediate_ clustered write for such situations. Async writes (AKA
> > bdwrite()) are only specified for full range msyncing.
>
> Ugh. It would seem to me that msync(MS_ASYNC) should be doing delayed
> writes.

Ahh, using MS_ASYNC seems to get me the behaviour I was looking for.
It is just as fast as fsync for cases when all the pages are dirtied but it releases the lock allowing other programs to open and read the file. So it seems to be doing what I would expect.

> > > > Anyway, note that you cannot 'work with large files in memory', even if
> > > > you have enough RAM and no pressure to hold all the file pages resident.
> > > > The syncer will do a writeback periodically regardless of the application
> > > > calling msync(2) or not, with the interval of approximately 30 seconds.
> > >
> > > You can mmap with MAP_NOSYNC to prevent the syncer from writing the file out
> > > every 30 seconds.
> >
> > This also prevents msync(2) from syncing the region. The flag is fine
> > for throw-away data, but not for the scenario that was described, I
> > think.
>
> Oof. I could see that in certain situations you might want to control this
> behavior from an application (similar to how I now make use of fadvise() at
> work). Having a way to disable syncer but having msync(MS_ASYNC) do
> something useful would be good.
>

When I map using MAP_NOSYNC I still seem to be able to msync(2) the regions? I see memory move from Wired/Active to Invalid and the disk is busy.

The madvise man page has a MADV_AUTOSYNC section which says that pages that are already dirtied can be guaranteed to be reverted using msync(2) or fsync(2). This is FreeBSD 8.3. So even if there is something wrong with sync'ing MAP_NOSYNC pages, I guess I could always madvise MADV_AUTOSYNC them first.
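
For readers following along, the pattern described in this message (dirty a shared mapping, then schedule writeback for the whole range with msync(MS_ASYNC) rather than blocking in fsync) might look roughly like the sketch below. It is only an illustration, not code from the thread: the helper name dirty_and_flush_async and the fill pattern are made up, MAP_NOSYNC is used only where the system headers define it, and error handling is kept minimal.

```c
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#ifndef MAP_NOSYNC
#define MAP_NOSYNC 0	/* FreeBSD-specific flag; treated as a no-op elsewhere */
#endif

/*
 * Hypothetical helper (not from the thread): create or extend `path` to
 * `len` bytes, dirty every page through a shared mapping, then ask the
 * kernel to schedule writeback with msync(MS_ASYNC) over the full range.
 * Returns 0 on success, -1 on any failure.
 */
static int
dirty_and_flush_async(const char *path, size_t len)
{
	int fd, rc;
	void *p;

	assert(path != NULL && len > 0);
	fd = open(path, O_RDWR | O_CREAT, 0600);
	if (fd == -1)
		return (-1);
	if (ftruncate(fd, (off_t)len) == -1) {
		close(fd);
		return (-1);
	}
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
	    MAP_SHARED | MAP_NOSYNC, fd, 0);
	if (p == MAP_FAILED) {
		close(fd);
		return (-1);
	}
	memset(p, 0xa5, len);		/* dirty all pages of the mapping */
	rc = msync(p, len, MS_ASYNC);	/* schedule writeback, do not wait */
	munmap(p, len);
	close(fd);
	return (rc);
}
```

Note that MS_ASYNC only schedules the writes; a caller that needs the data to be on disk before continuing would still have to follow up with msync(MS_SYNC).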
> --=20 > John Baldwin > _______________________________________________ > freebsd-hackers@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe=2C send any mail to "freebsd-hackers-unsubscribe@freebsd.o= rg" = From owner-freebsd-hackers@FreeBSD.ORG Fri Oct 19 11:45:49 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id E8A641BA; Fri, 19 Oct 2012 11:45:49 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 61E5D8FC0C; Fri, 19 Oct 2012 11:45:48 +0000 (UTC) Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q9JBjuAl018276; Fri, 19 Oct 2012 14:45:56 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id q9JBjiCD093779; Fri, 19 Oct 2012 14:45:44 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q9JBjiPp093778; Fri, 19 Oct 2012 14:45:44 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Fri, 19 Oct 2012 14:45:44 +0300 From: Konstantin Belousov To: John Baldwin Subject: Re: syncing large mmaped files Message-ID: <20121019114544.GX35915@deviant.kiev.zoral.com.ua> References: <201210180939.34861.jhb@freebsd.org> <20121018164218.GR35915@deviant.kiev.zoral.com.ua> <201210181543.25191.jhb@freebsd.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="/1aF9qoWKhphZS4n" Content-Disposition: inline In-Reply-To: <201210181543.25191.jhb@freebsd.org> User-Agent: Mutt/1.5.21 
(2010-09-15) X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-4.0 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: freebsd-hackers@freebsd.org, Tristan Verniquet X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Oct 2012 11:45:50 -0000 --/1aF9qoWKhphZS4n Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Oct 18, 2012 at 03:43:25PM -0400, John Baldwin wrote: > On Thursday, October 18, 2012 12:42:18 pm Konstantin Belousov wrote: > > On Thu, Oct 18, 2012 at 09:39:34AM -0400, John Baldwin wrote: > > > On Thursday, October 18, 2012 4:35:37 am Konstantin Belousov wrote: > > > > On Thu, Oct 18, 2012 at 10:08:22AM +1000, Tristan Verniquet wrote: > > > > >=20 > > > > > I want to work with large (1-10G) files in memory but eventually = sync > > > > > them back out to disk. The problem is that the sync process appea= rs to > > > > > lock the file in kernel for the duration of the sync, which can r= un > > > > > into minutes. This prevents other processes from reading from the= file > > > > > (unless they already have it mapped) for this whole time. Is there > > > > > any way to prevent this? I think I read in a post somewhere about > > > > > openbsd implementing partial-writes when it hits a file with lots= of > > > > > dirty pages in order to prevent this. Is there anything available= for > > > > > FreeBSD or is there another way around it? > > > > > > > > > No, currently the vnode lock is held exclusive for the whole durati= on > > > > of the msync(2) syscall or its analog from the syncer. 
> > > >
> > > > Making a change to periodically drop the vnode lock in
> > > > vm_object_page_clean() might be possible, but requires the benchmarking
> > > > to make sure that we do not pessimize the common case. Also, this opens
> > > > a possibility for the vnode reclamation meantime.
> > >
> > > You can simulate this in userland by breaking up your msync() into multiple
> > > msync() calls where each call just syncs a portion of the file.
> > Be aware that this is much-much slower than msyncing the whole file, even
> > if file is very large. The reason is that pager initiates asynchronous
> > _immediate_ clustered write for such situations. Async writes (AKA
> > bdwrite()) are only specified for full range msyncing.
>
> Ugh. It would seem to me that msync(MS_ASYNC) should be doing delayed
> writes.
The vm_pager_putpages() is called with the VM_PAGER_CLUSTER_OK flag for
MS_ASYNC, according to my reading of the code. This results in neither
IO_SYNC nor IO_ASYNC flags passed to VOP_WRITE() from
vnode_pager_generic_putpages(). Since the mapped regions are typically
large enough to mmap the whole fs blocks, the code in
ffs_vnops.c:ffs_write() ends up in the cluster_write().
Usually, fully populated cluster is written asynchronously.

>
> > > > Anyway, note that you cannot 'work with large files in memory', even if
> > > > you have enough RAM and no pressure to hold all the file pages resident.
> > > > The syncer will do a writeback periodically regardless of the application
> > > > calling msync(2) or not, with the interval of approximately 30 seconds.
> > >
> > > You can mmap with MAP_NOSYNC to prevent the syncer from writing the file out
> > > every 30 seconds.
> >
> > This also prevents msync(2) from syncing the region. The flag is fine
> > for throw-away data, but not for the scenario that was described, I
> > think.
>
> Oof.
I could see that in certain situations you might want to control this
> behavior from an application (similar to how I now make use of fadvise() at
> work). Having a way to disable syncer but having msync(MS_ASYNC) do
> something useful would be good.
I was wrong there, sorry. Only syncer and fsync(2) would ignore
VPO_NOSYNC pages.

--/1aF9qoWKhphZS4n
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (FreeBSD)

iEUEARECAAYFAlCBPWcACgkQC3+MBN1Mb4jXnACeJAiNxO9S+ZVcJnKBzcxgwDT0
MfAAl1QgedvFLssA2kWLONoF7QJgX4o=
=cxYS
-----END PGP SIGNATURE-----

--/1aF9qoWKhphZS4n--

From owner-freebsd-hackers@FreeBSD.ORG Fri Oct 19 23:07:40 2012 Return-Path: Delivered-To: hackers@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id BD71B9CB for ; Fri, 19 Oct 2012 23:07:40 +0000 (UTC) (envelope-from ryao@gentoo.org) Received: from smtp.gentoo.org (smtp.gentoo.org [140.211.166.183]) by mx1.freebsd.org (Postfix) with ESMTP id 9A0BD8FC14 for ; Fri, 19 Oct 2012 23:07:40 +0000 (UTC) Received: from [192.168.1.2] (pool-72-89-250-138.nycmny.fios.verizon.net [72.89.250.138]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) (Authenticated sender: ryao) by smtp.gentoo.org (Postfix) with ESMTPSA id 68D6B33DAC7 for ; Fri, 19 Oct 2012 23:07:34 +0000 (UTC) Message-ID: <5081DCA1.80906@gentoo.org> Date: Fri, 19 Oct 2012 19:05:05 -0400 From: Richard Yao User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.7) Gecko/20120917 Thunderbird/10.0.7 MIME-Version: 1.0 To: "hackers@FreeBSD.org" Subject: Loader-kernel interaction X-Enigmail-Version: 1.3.5 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="------------enig42DD07A214635D3CBFCBB299" X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive:
List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Oct 2012 23:07:40 -0000 This is an OpenPGP/MIME signed message (RFC 2440 and 3156) --------------enig42DD07A214635D3CBFCBB299 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Dear Everyone, I know that the kernel is a BTX client, but I do not understand the protocol used by loader to pass sysctl settings and loadable modules to the kernel. Is there documentation on this? Yours truly, Richard Yao --------------enig42DD07A214635D3CBFCBB299 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQIcBAEBAgAGBQJQgdy2AAoJECDuEZm+6Exk/P4P/0YC3Kq+HK3EeMf8fhmM+V7N gUh4DUbnVGpjBhyRq713/s809XIAw0dNxlKZIkZKl7aMLp/mGSOTgMqjUf7iW6d7 ovkp/xLgn+ycEQxizBBXa5HMVTsaC2aO5XcrJdoiiAhan9J3irleNA3lX7IET6XC NMsrxfrsXkdS+2EGv11S4ifw3RTGe1435uhFd5+3zoy3Zlvw7Zh74RNiZQr/HMR3 +OjMLQ65aLzhtvaFh7in8z2ra+fzC2AdHAkB/ApbQfIDXZG5xzHDZYMf26Cl8+AI OTNj13y5Vp/qqvhZfofqQ6Z2fT0fdQt4JmcJQn7u3SO2eYFU3dD/q0WOtqDh0A4T D+QwmxXENw5vTrirAHhvHs6s8sBLazfvpRFuB/+05TmgJGL2gDDSdm18l8Cmm3Qw wMKciIbaeT8TS8utVDpE8b4oDxYzi47qfxHpmzRb1YT3KhRBUHL95oLbWlp+rELZ /FIKu8niWRNNqhB5itNno1NMepZHKs8krKFPePLHuJA8tbWu9ROvfKKUXc34yE3W WC7Dz3ETp9I9zymbKS6/xI5pmmd6fdyK2EktsvWjXQbxbK5uOwASv8suSM5GdjS/ aboxIp/vN+Dl4/RBqaHq/KnVtLvb4PHF4kkOWfvWRt8rWIjOu4UF2usSogWtXYSY srJGKN4t8h/TV8u1ufD9 =dQSi -----END PGP SIGNATURE----- --------------enig42DD07A214635D3CBFCBB299-- From owner-freebsd-hackers@FreeBSD.ORG Sat Oct 20 11:43:07 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 521DCDE7; Sat, 20 Oct 2012 11:43:07 +0000 (UTC) (envelope-from ndenev@gmail.com) Received: from 
mail-wi0-f172.google.com (mail-wi0-f172.google.com [209.85.212.172]) by mx1.freebsd.org (Postfix) with ESMTP id A71B68FC12; Sat, 20 Oct 2012 11:43:06 +0000 (UTC) Received: by mail-wi0-f172.google.com with SMTP id hq12so886566wib.13 for ; Sat, 20 Oct 2012 04:43:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=subject:mime-version:content-type:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to:x-mailer; bh=RXy+Wea31o1pgJWOtbZ2ETzHeP87Yg9MjjYy7y21xJE=; b=mgDjQfuTLMxX8KrXeKbpnsRtEM+RWmhKcfL2UtQOSAFZkfFF5b1ruhBPoiKLeOcCze NNkpmAkr7p3pA0UsHA4gKVz6k3/etUmRPYmokzxpWGE8S7EYYUhmOK1GSefsAKz7GR3C TlOpW/n8Bs0fzSTvuvfT7jggG+6TEBzQikOVDLpXgLtzc/vQR1aLocipBpF3QCsFto0c T5Bzw3TLqbLhZ/pV5iy6BuuV2f3tUW9Qez9ev8haTS0OECwa+Mdzx/7pPV8PJ48sgZTQ kOqLh+5Ytd+ucZM3smeil4SWXI+XWX40DtGthXb73eQS02ZyU8Dnb0D8FTodaXUXuRUz SgwA== Received: by 10.216.195.144 with SMTP id p16mr2186313wen.174.1350733379998; Sat, 20 Oct 2012 04:42:59 -0700 (PDT) Received: from [10.0.0.86] ([93.152.184.10]) by mx.google.com with ESMTPS id v3sm9085964wiy.5.2012.10.20.04.42.58 (version=TLSv1/SSLv3 cipher=OTHER); Sat, 20 Oct 2012 04:42:58 -0700 (PDT) Subject: Re: NFS server bottlenecks Mime-Version: 1.0 (Mac OS X Mail 6.1 \(1498\)) Content-Type: text/plain; charset=us-ascii From: Nikolay Denev In-Reply-To: <6DAAB1E6-4AC7-4B08-8CAD-0D8584D039DE@gmail.com> Date: Sat, 20 Oct 2012 14:42:56 +0300 Content-Transfer-Encoding: quoted-printable Message-Id: <23D7CB3A-BD66-427E-A7F5-6C9D3890EE1B@gmail.com> References: <937460294.2185822.1350093954059.JavaMail.root@erie.cs.uoguelph.ca> <302BF685-4B9D-49C8-8000-8D0F6540C8F7@gmail.com> <0857D79A-6276-433F-9603-D52125CF190F@gmail.com> <6DAAB1E6-4AC7-4B08-8CAD-0D8584D039DE@gmail.com> To: "freebsd-hackers@freebsd.org Hackers" X-Mailer: Apple Mail (2.1498) Cc: Rick Macklem , Ivan Voras X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Oct 2012 11:43:07 -0000

On Oct 18, 2012, at 6:11 PM, Nikolay Denev wrote:
>
> On Oct 15, 2012, at 5:34 PM, Ivan Voras wrote:
>
>> On 15 October 2012 16:31, Nikolay Denev wrote:
>>>
>>> On Oct 15, 2012, at 2:52 PM, Ivan Voras wrote:
>>
>>>> http://people.freebsd.org/~ivoras/diffs/nfscache_lock.patch
>>>>
>>>> It should apply to HEAD without Rick's patches.
>>>>
>>>> It's a bit different approach than Rick's, breaking down locks even more.
>>>
>>> Applied and compiled OK, I will be able to test it tomorrow.
>>
>> Ok, thanks!
>>
>> The differences should be most visible in edge cases with a larger
>> number of nfsd processes (16+) and many CPU cores.
>
> I'm now rebooting with your patch, and hopefully will have some results tomorrow.
>

Here are the results from testing both patches : http://home.totalterror.net/freebsd/nfstest/results.html
Both tests ran for about 14 hours ( a bit too much, but I wanted to compare different zfs recordsize settings ),
and were done first after a fresh reboot.
The only noticeable difference seems to be much more context switches with Ivan's patch.
From owner-freebsd-hackers@FreeBSD.ORG Sat Oct 20 12:11:57 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 906176FF for ; Sat, 20 Oct 2012 12:11:57 +0000 (UTC) (envelope-from ivoras@gmail.com) Received: from mail-qa0-f54.google.com (mail-qa0-f54.google.com [209.85.216.54]) by mx1.freebsd.org (Postfix) with ESMTP id 3BB248FC12 for ; Sat, 20 Oct 2012 12:11:57 +0000 (UTC) Received: by mail-qa0-f54.google.com with SMTP id p27so648224qat.13 for ; Sat, 20 Oct 2012 05:11:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date :x-google-sender-auth:message-id:subject:to:cc:content-type; bh=xImakIC7JV2zly/Y0OKoJJDqcQE9Q+/z4zufdJBv3uw=; b=GESJwgsCwPgkl8atHyIDbENcpbKKqfayG2jHSiuYK95Wfy7+uyuejr3G6W8dfL1OkZ Vz7tiIF/fMcHqtZiLjXqAQgesFctaWEhSI9DhaNGz+Oubj7cZqB+AT4pXh/uyIt8vHvJ Kx+VdiHFQ40N6qnsuZgk17bQIxgA/HP52b6cI+tewv8o4MeI/ZORpLmV8u8vZxk7mS6n 79y/kEfxMRhK6jipij1IHkRzJlDc2HQw5XSy2NYJ8XVRlDVJC4K0J1q3qIqatL4ir7Cr 2D/EfOEhlZFnE6ykipFQ7fs1QhyYNNRFNTfxeAZOOzNFMe0L/Nf4rSshzb4RFhcQCoLs u7Dw== Received: by 10.224.178.4 with SMTP id bk4mr1928123qab.38.1350735110677; Sat, 20 Oct 2012 05:11:50 -0700 (PDT) MIME-Version: 1.0 Sender: ivoras@gmail.com Received: by 10.49.82.231 with HTTP; Sat, 20 Oct 2012 05:11:10 -0700 (PDT) In-Reply-To: <23D7CB3A-BD66-427E-A7F5-6C9D3890EE1B@gmail.com> References: <937460294.2185822.1350093954059.JavaMail.root@erie.cs.uoguelph.ca> <302BF685-4B9D-49C8-8000-8D0F6540C8F7@gmail.com> <0857D79A-6276-433F-9603-D52125CF190F@gmail.com> <6DAAB1E6-4AC7-4B08-8CAD-0D8584D039DE@gmail.com> <23D7CB3A-BD66-427E-A7F5-6C9D3890EE1B@gmail.com> From: Ivan Voras Date: Sat, 20 Oct 2012 14:11:10 +0200 X-Google-Sender-Auth: PsZqbdk8UhFAMP1_lmPtiJzhIQk Message-ID: Subject: Re: NFS server bottlenecks To: Nikolay Denev Content-Type: text/plain; charset=UTF-8 Cc: 
"freebsd-hackers@freebsd.org Hackers" , Rick Macklem X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Oct 2012 12:11:57 -0000 On 20 October 2012 13:42, Nikolay Denev wrote: > Here are the results from testing both patches : http://home.totalterror.net/freebsd/nfstest/results.html > Both tests ran for about 14 hours ( a bit too much, but I wanted to compare different zfs recordsize settings ), > and were done first after a fresh reboot. > The only noticeable difference seems to be much more context switches with Ivan's patch. Thank you very much for your extensive testing! I don't know how to interpret the rise in context switches; as this is kernel code, I'd expect no context switches. I hope someone else can explain. But, you have also shown that my patch doesn't do any better than Rick's even on a fairly large configuration, so I don't think there's value in adding the extra complexity, and Rick knows NFS much better than I do. But there are a few things other than that I'm interested in: like why does your load average spike almost to 20-ties, and how come that with 24 drives in RAID-10 you only push through 600 MBit/s through the 10 GBit/s Ethernet. Have you tested your drive setup locally (AESNI shouldn't be a bottleneck, you should be able to encrypt well into Gbyte/s range) and the network? If you have the time, could you repeat the tests but with a recent Samba server and a CIFS mount on the client side? This is probably not important, but I'm just curious of how would it perform on your machine. 
From owner-freebsd-hackers@FreeBSD.ORG Sat Oct 20 12:46:42 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id E8803C31; Sat, 20 Oct 2012 12:46:42 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 820768FC16; Sat, 20 Oct 2012 12:46:42 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqAEANebglCDaFvO/2dsb2JhbAA8CIYUvACCIAEBAQMBAQEBICsfAQsFFg4KAgINGQIjBgEJJgYIBwQBHASHUQMJBguoaohRDYlUgSCJUmgWBIVDgRIDk0FYgVWBF4oRhRCDC4FHNQ X-IronPort-AV: E=Sophos;i="4.80,621,1344225600"; d="scan'208";a="184523521" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 20 Oct 2012 08:45:32 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 6AFCAB405E; Sat, 20 Oct 2012 08:45:32 -0400 (EDT) Date: Sat, 20 Oct 2012 08:45:32 -0400 (EDT) From: Rick Macklem To: Ivan Voras Message-ID: <191784842.2570110.1350737132305.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: Subject: Re: NFS server bottlenecks MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692) Cc: "freebsd-hackers@freebsd.org Hackers" , Nikolay Denev X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Oct 2012 12:46:43 -0000 Ivan Voras wrote: > On 20 October 2012 13:42, Nikolay Denev wrote: > > > Here are the results from testing both patches : > > 
http://home.totalterror.net/freebsd/nfstest/results.html > > Both tests ran for about 14 hours ( a bit too much, but I wanted to > > compare different zfs recordsize settings ), > > and were done first after a fresh reboot. > > The only noticeable difference seems to be much more context > > switches with Ivan's patch. > > Thank you very much for your extensive testing! > > I don't know how to interpret the rise in context switches; as this is > kernel code, I'd expect no context switches. I hope someone else can > explain. > Don't the mtx_lock() calls spin for a little while and then context switch if another thread still has it locked? > But, you have also shown that my patch doesn't do any better than > Rick's even on a fairly large configuration, so I don't think there's > value in adding the extra complexity, and Rick knows NFS much better > than I do. > Hmm, I didn't look, but were there any tests using UDP mounts? (I would have thought that your patch would mainly affect UDP mounts, since that is when my version still has the single LRU queue/mutex. As I think you know, my concern with your patch would be correctness for UDP, not performance.) Anyhow, sounds like you guys are having fun with it and learning some useful things. Keep up the good work, rick > But there are a few things other than that I'm interested in: like why > does your load average spike almost to 20-ties, and how come that with > 24 drives in RAID-10 you only push through 600 MBit/s through the 10 > GBit/s Ethernet. Have you tested your drive setup locally (AESNI > shouldn't be a bottleneck, you should be able to encrypt well into > Gbyte/s range) and the network? > > If you have the time, could you repeat the tests but with a recent > Samba server and a CIFS mount on the client side? This is probably not > important, but I'm just curious of how would it perform on your > machine. 
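
If the reading of mtx_lock() above is right (spin briefly, then give up the CPU when the lock is still held), that would be the usual adaptive-lock pattern, and it is one plausible source of extra context switches under contention. The following is a userland analogy only, not the kernel's implementation: the names adaptive_lock_acquire and adaptive_lock_release and the spin limit are invented for illustration, and sched_yield() merely stands in for the kernel putting the thread to sleep.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical userland lock; this is not FreeBSD's struct mtx. */
struct adaptive_lock {
	atomic_flag held;
};

/*
 * Try to take the lock, making spin_limit + 1 attempts per round.
 * When a whole round fails, yield the CPU (a voluntary context switch)
 * and start another round.  Returns how many times the caller yielded.
 */
static unsigned
adaptive_lock_acquire(struct adaptive_lock *l, unsigned spin_limit)
{
	unsigned yields = 0, i;

	assert(l != NULL);
	for (;;) {
		for (i = 0; i <= spin_limit; i++) {
			/* test_and_set returns the previous value: false means we got it */
			if (!atomic_flag_test_and_set_explicit(&l->held,
			    memory_order_acquire))
				return (yields);
		}
		sched_yield();
		yields++;
	}
}

static void
adaptive_lock_release(struct adaptive_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

With finer-grained locks, more threads can end up in the yield path on different locks at once, so a higher context-switch count would not by itself be surprising.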
> _______________________________________________ > freebsd-hackers@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to > "freebsd-hackers-unsubscribe@freebsd.org" From owner-freebsd-hackers@FreeBSD.ORG Sat Oct 20 12:52:13 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 831A6DB6; Sat, 20 Oct 2012 12:52:13 +0000 (UTC) (envelope-from ndenev@gmail.com) Received: from mail-we0-f182.google.com (mail-we0-f182.google.com [74.125.82.182]) by mx1.freebsd.org (Postfix) with ESMTP id D21198FC16; Sat, 20 Oct 2012 12:52:12 +0000 (UTC) Received: by mail-we0-f182.google.com with SMTP id x43so869989wey.13 for ; Sat, 20 Oct 2012 05:52:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=subject:mime-version:content-type:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to:x-mailer; bh=KX31H4vAEQx/m8YgRZMoSUYzLTD6QAl62NO0N3TWDjA=; b=YYVIrXg0YBzHT5gCRanTroqgspOLz/h+aXR+CUzwyRqVU4sa/gdUfLnVibOOCS+gLp Xy8SIdbiq10qfQgWvixOmsEBnnmaQlPFO7L6+/6AODlPFEk3MKXwTDqmPKztQlLe6TqM 4bGgvspLBIoa81WRgewcqn8rcFkxQp/73X6EaEBjZb4brzXhMCik7nAnhiaZQH7sw44C kyuvXGatHqNeDW6pi2FO2+VMk5TXcoOzzDT/KL1t2Nj6jmV6hxBl0Ugdi6Q2nsrNSVCk +sFDPUAo93DFcWTbyZpmMwL+50Ksa8wgk7Op5lJn6nkZN+uUXFFuxupQPnRwnb8mmbBR Q5bg== Received: by 10.180.87.74 with SMTP id v10mr9385146wiz.21.1350737526192; Sat, 20 Oct 2012 05:52:06 -0700 (PDT) Received: from [10.0.0.86] ([93.152.184.10]) by mx.google.com with ESMTPS id v3sm9416314wiy.5.2012.10.20.05.52.04 (version=TLSv1/SSLv3 cipher=OTHER); Sat, 20 Oct 2012 05:52:05 -0700 (PDT) Subject: Re: NFS server bottlenecks Mime-Version: 1.0 (Mac OS X Mail 6.1 \(1498\)) Content-Type: text/plain; charset=windows-1252 From: Nikolay Denev In-Reply-To: Date: Sat, 20 Oct 2012 15:52:03 +0300 Content-Transfer-Encoding: quoted-printable Message-Id: 
<942B9B96-7F2B-4833-865F-33DDCCA3500A@gmail.com> References: <937460294.2185822.1350093954059.JavaMail.root@erie.cs.uoguelph.ca> <302BF685-4B9D-49C8-8000-8D0F6540C8F7@gmail.com> <0857D79A-6276-433F-9603-D52125CF190F@gmail.com> <6DAAB1E6-4AC7-4B08-8CAD-0D8584D039DE@gmail.com> <23D7CB3A-BD66-427E-A7F5-6C9D3890EE1B@gmail.com> To: Ivan Voras X-Mailer: Apple Mail (2.1498) Cc: "freebsd-hackers@freebsd.org Hackers" , Rick Macklem X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Oct 2012 12:52:13 -0000 On Oct 20, 2012, at 3:11 PM, Ivan Voras wrote: > On 20 October 2012 13:42, Nikolay Denev wrote: >=20 >> Here are the results from testing both patches : = http://home.totalterror.net/freebsd/nfstest/results.html >> Both tests ran for about 14 hours ( a bit too much, but I wanted to = compare different zfs recordsize settings ), >> and were done first after a fresh reboot. >> The only noticeable difference seems to be much more context switches = with Ivan's patch. >=20 > Thank you very much for your extensive testing! >=20 > I don't know how to interpret the rise in context switches; as this is > kernel code, I'd expect no context switches. I hope someone else can > explain. >=20 > But, you have also shown that my patch doesn't do any better than > Rick's even on a fairly large configuration, so I don't think there's > value in adding the extra complexity, and Rick knows NFS much better > than I do. >=20 > But there are a few things other than that I'm interested in: like why > does your load average spike almost to 20-ties, and how come that with > 24 drives in RAID-10 you only push through 600 MBit/s through the 10 > GBit/s Ethernet. Have you tested your drive setup locally (AESNI > shouldn't be a bottleneck, you should be able to encrypt well into > Gbyte/s range) and the network? 
>
> If you have the time, could you repeat the tests but with a recent
> Samba server and a CIFS mount on the client side? This is probably not
> important, but I'm just curious of how would it perform on your
> machine.

I've now started this test locally.
But from previous different iozone runs, I remember locally the speed was much better,
but I will wait for this test to finish, as the comparison will be better.

But I think there is still something fishy... I have cases where I have reached 1000MB/s over NFS
(from network stats, not local machine stats), but sometimes it is very slow even for
file completely in ARC. Rick mentioned that this could be due to RPC overhead and network round trip time, but
earlier in this thread I've done a test only on the server by mounting the NFS exported ZFS dataset locally and did some tests with "dd":

> To take the network out of the equation I redid the test by mounting the same filesystem over NFS on the server:
>
> [18:23]root@goliath:~# mount -t nfs -o rw,hard,intr,tcp,nfsv3,rsize=1048576,wsize=1048576 localhost:/tank/spa_db/undo /mnt
> [18:24]root@goliath:~# dd if=/mnt/data.dbf of=/dev/null bs=1M
> 30720+1 records in
> 30720+1 records out
> 32212262912 bytes transferred in 79.793343 secs (403696120 bytes/sec)
> [18:25]root@goliath:~# dd if=/mnt/data.dbf of=/dev/null bs=1M
> 30720+1 records in
> 30720+1 records out
> 32212262912 bytes transferred in 12.033420 secs (2676900110 bytes/sec)
>
> During the first run I saw several nfsd threads in top, along with dd and again zero disk I/O.
> There was increase in memory usage because of the double buffering ARC->buffercache.
> The second run was with all of the nfsd threads totally idle, and read directly from the buffercache.
From owner-freebsd-hackers@FreeBSD.ORG Sat Oct 20 13:00:11 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 377D7937; Sat, 20 Oct 2012 13:00:11 +0000 (UTC) (envelope-from ndenev@gmail.com) Received: from mail-wg0-f42.google.com (mail-wg0-f42.google.com [74.125.82.42]) by mx1.freebsd.org (Postfix) with ESMTP id 8D14D8FC17; Sat, 20 Oct 2012 13:00:10 +0000 (UTC) Received: by mail-wg0-f42.google.com with SMTP id fm10so652080wgb.1 for ; Sat, 20 Oct 2012 06:00:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=subject:mime-version:content-type:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to:x-mailer; bh=iCf7zOwb41trqzi1oKnj/9ho1Rc051rOTPPSVQNGpsA=; b=SvfoV3DishYCLiATdfSC6iMxwLWV2YTRdQibwtVpKlkYIKSJGrM2P3sXPIZZogoO+n Rzab3c4axJAyeHH+EWVmB8aIPQCaJpNEEAuoNfZRnK/7gAzxHcTrXgnyr/1DjsgpkZxt iRSvorX6nINTW4kZRTC2jH0zE9sPAAqcBXEAuQUV4v387UHnNG01a6xIUfEe6Ev6IhTy VF/9yWjXuCLn1bJSbk3NNb+L15vLjRg1386rHXRiBxVlYHilRg84zqjDm9h57NPQilmJ zSzcXr3jzitgbULLmGZ+ROE0SC0B2eDCPho6NVWIxKQhcTdebLWX6aoY8H//Zh2qAJ6x nDEA== Received: by 10.216.203.1 with SMTP id e1mr2538238weo.103.1350738004180; Sat, 20 Oct 2012 06:00:04 -0700 (PDT) Received: from [10.0.0.86] ([93.152.184.10]) by mx.google.com with ESMTPS id ay10sm9461879wib.2.2012.10.20.06.00.02 (version=TLSv1/SSLv3 cipher=OTHER); Sat, 20 Oct 2012 06:00:03 -0700 (PDT) Subject: Re: NFS server bottlenecks Mime-Version: 1.0 (Mac OS X Mail 6.1 \(1498\)) Content-Type: text/plain; charset=us-ascii From: Nikolay Denev In-Reply-To: Date: Sat, 20 Oct 2012 16:00:01 +0300 Content-Transfer-Encoding: quoted-printable Message-Id: References: <937460294.2185822.1350093954059.JavaMail.root@erie.cs.uoguelph.ca> <302BF685-4B9D-49C8-8000-8D0F6540C8F7@gmail.com> <0857D79A-6276-433F-9603-D52125CF190F@gmail.com> <6DAAB1E6-4AC7-4B08-8CAD-0D8584D039DE@gmail.com> 
<23D7CB3A-BD66-427E-A7F5-6C9D3890EE1B@gmail.com> To: Ivan Voras X-Mailer: Apple Mail (2.1498) Cc: "freebsd-hackers@freebsd.org Hackers" , Rick Macklem X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Oct 2012 13:00:11 -0000

On Oct 20, 2012, at 3:11 PM, Ivan Voras wrote:

> On 20 October 2012 13:42, Nikolay Denev wrote:
>
>> Here are the results from testing both patches : http://home.totalterror.net/freebsd/nfstest/results.html
>> Both tests ran for about 14 hours ( a bit too much, but I wanted to compare different zfs recordsize settings ),
>> and were done first after a fresh reboot.
>> The only noticeable difference seems to be much more context switches with Ivan's patch.
>
> Thank you very much for your extensive testing!
>
> I don't know how to interpret the rise in context switches; as this is
> kernel code, I'd expect no context switches. I hope someone else can
> explain.
>
> But, you have also shown that my patch doesn't do any better than
> Rick's even on a fairly large configuration, so I don't think there's
> value in adding the extra complexity, and Rick knows NFS much better
> than I do.
>
> But there are a few things other than that I'm interested in: like why
> does your load average spike almost to 20-ties, and how come that with
> 24 drives in RAID-10 you only push through 600 MBit/s through the 10
> GBit/s Ethernet. Have you tested your drive setup locally (AESNI
> shouldn't be a bottleneck, you should be able to encrypt well into
> Gbyte/s range) and the network?
>
> If you have the time, could you repeat the tests but with a recent
> Samba server and a CIFS mount on the client side? This is probably not
> important, but I'm just curious of how would it perform on your
> machine.

The first iozone local run finished, I'll paste just the result here, and also the same test over NFS for comparison:
(This is iozone doing 8k sized IO ops, on ZFS dataset with recordsize=8k)

NFS:
                                            random  random    bkwd  record  stride
        KB  reclen   write rewrite    read  reread    read   write    read rewrite    read
  33554432       8    4973    5522    2930    2906    2908    3886

Local:
                                            random  random    bkwd  record  stride
        KB  reclen   write rewrite    read  reread    read   write    read rewrite    read
  33554432       8   34740   41390  135442  142534   24992   12493

P.S.: I forgot to mention that the network is with 9K mtu.

From owner-freebsd-hackers@FreeBSD.ORG Sat Oct 20 18:58:10 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 6E1DECDC; Sat, 20 Oct 2012 18:58:10 +0000 (UTC) (envelope-from ndenev@gmail.com) Received: from mail-wg0-f50.google.com (mail-wg0-f50.google.com [74.125.82.50]) by mx1.freebsd.org (Postfix) with ESMTP id C5CC28FC0A; Sat, 20 Oct 2012 18:58:09 +0000 (UTC) Received: by mail-wg0-f50.google.com with SMTP id 16so1196254wgi.31 for ; Sat, 20 Oct 2012 11:58:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=subject:mime-version:content-type:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to:x-mailer; bh=Hx/e2v4Q3S5kLxIH99nIJbATi8H1lEf/1RUNEZ222qQ=; b=NEjvDy2+6+IO0RzvPcpzx03w7NGsTuI9ph0+wtlA8tevitJ6aCDNy0fZRurIldtxHk y2TKZWFgAjxUjxk2LN7rTob0EB0+provIFCrtJkLijGjEm9gLnJqy0gwa3wVeoun3+g4 rDYWF4EPVXeGPbOgcEY9QO7CdtpPonp3hmFrCdaG+wXHU8djJ1XqUXxL6b5Uw4UOpdQZ C5J58f+Q0XY4sABI+8zqE1E7/mfK8ESvNzdjvk2jmw//7jmMhi4V2jpU95MZZU1DHwBt ZVh6AfOnKLOgYcVl2ycCaM2zJURWZvV7Q4tlcxe4qGyEY5IjKcLcHAMCbEOMQ2b56hCu 7uBw== Received: by 10.180.106.9 with SMTP id gq9mr10810119wib.12.1350759488552; Sat, 20 Oct 2012 11:58:08 -0700 (PDT) Received: from [10.0.0.86] ([93.152.184.10]) by mx.google.com with ESMTPS id
w8sm41666214wif.4.2012.10.20.11.58.05 (version=TLSv1/SSLv3 cipher=OTHER); Sat, 20 Oct 2012 11:58:07 -0700 (PDT) Subject: Re: NFS server bottlenecks Mime-Version: 1.0 (Mac OS X Mail 6.1 \(1498\)) Content-Type: text/plain; charset=us-ascii From: Nikolay Denev In-Reply-To: Date: Sat, 20 Oct 2012 21:58:03 +0300 Content-Transfer-Encoding: quoted-printable Message-Id: References: <937460294.2185822.1350093954059.JavaMail.root@erie.cs.uoguelph.ca> <302BF685-4B9D-49C8-8000-8D0F6540C8F7@gmail.com> <0857D79A-6276-433F-9603-D52125CF190F@gmail.com> <6DAAB1E6-4AC7-4B08-8CAD-0D8584D039DE@gmail.com> <23D7CB3A-BD66-427E-A7F5-6C9D3890EE1B@gmail.com> To: "freebsd-hackers@freebsd.org Hackers" X-Mailer: Apple Mail (2.1498) Cc: Rick Macklem , Ivan Voras X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Oct 2012 18:58:10 -0000

On Oct 20, 2012, at 4:00 PM, Nikolay Denev wrote:

>
> On Oct 20, 2012, at 3:11 PM, Ivan Voras wrote:
>
>> On 20 October 2012 13:42, Nikolay Denev wrote:
>>
>>> Here are the results from testing both patches : http://home.totalterror.net/freebsd/nfstest/results.html
>>> Both tests ran for about 14 hours ( a bit too much, but I wanted to compare different zfs recordsize settings ),
>>> and were done first after a fresh reboot.
>>> The only noticeable difference seems to be much more context switches with Ivan's patch.
>>
>> Thank you very much for your extensive testing!
>>
>> I don't know how to interpret the rise in context switches; as this is
>> kernel code, I'd expect no context switches. I hope someone else can
>> explain.
>>
>> But, you have also shown that my patch doesn't do any better than
>> Rick's even on a fairly large configuration, so I don't think there's
>> value in adding the extra complexity, and Rick knows NFS much better
>> than I do.
>>
>> But there are a few things other than that I'm interested in: like why
>> does your load average spike almost to 20-ties, and how come that with
>> 24 drives in RAID-10 you only push through 600 MBit/s through the 10
>> GBit/s Ethernet. Have you tested your drive setup locally (AESNI
>> shouldn't be a bottleneck, you should be able to encrypt well into
>> Gbyte/s range) and the network?
>>
>> If you have the time, could you repeat the tests but with a recent
>> Samba server and a CIFS mount on the client side? This is probably not
>> important, but I'm just curious of how would it perform on your
>> machine.
>
> The first iozone local run finished, I'll paste just the result here, and also the same test over NFS for comparison:
> (This is iozone doing 8k sized IO ops, on ZFS dataset with recordsize=8k)
>
> NFS:
>                                             random  random    bkwd  record  stride
>         KB  reclen   write rewrite    read  reread    read   write    read rewrite    read
>   33554432       8    4973    5522    2930    2906    2908    3886
>
> Local:
>                                             random  random    bkwd  record  stride
>         KB  reclen   write rewrite    read  reread    read   write    read rewrite    read
>   33554432       8   34740   41390  135442  142534   24992   12493
>
>
> P.S.: I forgot to mention that the network is with 9K mtu.

Here are the full results of the test on the local fs:
http://home.totalterror.net/freebsd/nfstest/local_fs/

I'm now running the same test on an NFS mount over the loopback interface on the NFS server machine.
From owner-freebsd-hackers@FreeBSD.ORG Sat Oct 20 19:29:08 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 16EA27D3 for ; Sat, 20 Oct 2012 19:29:08 +0000 (UTC) (envelope-from ivoras@gmail.com) Received: from mail-vb0-f54.google.com (mail-vb0-f54.google.com [209.85.212.54]) by mx1.freebsd.org (Postfix) with ESMTP id B6B238FC08 for ; Sat, 20 Oct 2012 19:29:07 +0000 (UTC) Received: by mail-vb0-f54.google.com with SMTP id v11so2112709vbm.13 for ; Sat, 20 Oct 2012 12:29:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date :x-google-sender-auth:message-id:subject:to:cc:content-type; bh=zKMs9CrwkpCz0yGkAB3OOu/AB2aht1AGy/PjsjjeaF8=; b=Y8kzFmDttr353vWofQKNUaa4zNA7vsVjHTKXZPS1VIg2JJUo4dbmmLe9wfXQbaf8Eu U+efeeeI1f287OK8Q9AC93Zgk8sywuBkvDuTEqMmU2qkfTmu1eT3WQbw0TBOou+5N1Cp cQkq0n2hUGLPGrT01+s0LRAoDZUz9iXfGyJBJ6aTh0ngsL0V+k5LnjC3C3+CHrcZLJSI nIvxEq2khvUBZl6NJibuEBOA3k6DguLhdMha9CM2VhcmqsLhmFIA1a+Pau//qcsm9JWX Uzxy8C2Lt2gQjoLI1xrieVbOJVifRsA79BbABPbGZkk/77oz2mia81u2Is6p2s+6UafD 5qxA== Received: by 10.220.208.141 with SMTP id gc13mr7125636vcb.55.1350761346877; Sat, 20 Oct 2012 12:29:06 -0700 (PDT) MIME-Version: 1.0 Sender: ivoras@gmail.com Received: by 10.59.0.37 with HTTP; Sat, 20 Oct 2012 12:28:25 -0700 (PDT) In-Reply-To: <191784842.2570110.1350737132305.JavaMail.root@erie.cs.uoguelph.ca> References: <191784842.2570110.1350737132305.JavaMail.root@erie.cs.uoguelph.ca> From: Ivan Voras Date: Sat, 20 Oct 2012 21:28:25 +0200 X-Google-Sender-Auth: sdtoGD1C3xtDti_eUN65Oyy525w Message-ID: Subject: Re: NFS server bottlenecks To: Rick Macklem Content-Type: text/plain; charset=UTF-8 Cc: "freebsd-hackers@freebsd.org Hackers" , Nikolay Denev X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Oct 2012 19:29:08 -0000 On 20 October 2012 14:45, Rick Macklem wrote: > Ivan Voras wrote: >> I don't know how to interpret the rise in context switches; as this is >> kernel code, I'd expect no context switches. I hope someone else can >> explain. >> > Don't the mtx_lock() calls spin for a little while and then context > switch if another thread still has it locked? Yes, but are in-kernel context switches also counted? I was assuming they are light-weight enough not to count. > Hmm, I didn't look, but were there any tests using UDP mounts? > (I would have thought that your patch would mainly affect UDP mounts, > since that is when my version still has the single LRU queue/mutex. Another assumption - I thought UDP was the default. > As I think you know, my concern with your patch would be correctness > for UDP, not performance.) Yes. From owner-freebsd-hackers@FreeBSD.ORG Sat Oct 20 19:45:06 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 7DFB6104; Sat, 20 Oct 2012 19:45:06 +0000 (UTC) (envelope-from outbackdingo@gmail.com) Received: from mail-ia0-f182.google.com (mail-ia0-f182.google.com [209.85.210.182]) by mx1.freebsd.org (Postfix) with ESMTP id 3203D8FC0A; Sat, 20 Oct 2012 19:45:05 +0000 (UTC) Received: by mail-ia0-f182.google.com with SMTP id k10so1515452iag.13 for ; Sat, 20 Oct 2012 12:45:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=8EPTniMXRD0pdaiNVYFGJU3u0zv3LD/ySgEjTeuarng=; b=O0PDbcluqJ1TkFSx3vixPNI+BIYBGc6/AX8KJ9e0pPp8K1mZx+QMgu1J+Yvk4d9wcL rPTRVKLVGLvzD+LmvS0k9WYpGrfMf5G1vINQPPaGODJobmdph450P6mWw1wHvLnHrBgv k6uQfagrvpFnkITXKuYF5l497raTryWdZTtNFXY0Db53zEVEWT1S9epU4FoWPHFhTw2c 
JycUhfFV10kd1TQWk0Av5QEeZ0k3JoYJn4+W/wvSQIPv9FTvg4QKR2VdHzxw9KbGWbYr 1HZVFzs1JI+JGDO6SEt5qzmgkzLESfhQf2Y4VBW+TN86cWlMMMWrdEtE1BsbKCzKUxtP /E7w== MIME-Version: 1.0 Received: by 10.50.40.225 with SMTP id a1mr5939442igl.7.1350762305625; Sat, 20 Oct 2012 12:45:05 -0700 (PDT) Received: by 10.64.72.135 with HTTP; Sat, 20 Oct 2012 12:45:05 -0700 (PDT) In-Reply-To: References: <191784842.2570110.1350737132305.JavaMail.root@erie.cs.uoguelph.ca> Date: Sat, 20 Oct 2012 15:45:05 -0400 Message-ID: Subject: Re: NFS server bottlenecks From: Outback Dingo To: Ivan Voras Content-Type: text/plain; charset=ISO-8859-1 Cc: "freebsd-hackers@freebsd.org Hackers" , Rick Macklem , Nikolay Denev X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Oct 2012 19:45:06 -0000

On Sat, Oct 20, 2012 at 3:28 PM, Ivan Voras wrote:
> On 20 October 2012 14:45, Rick Macklem wrote:
>> Ivan Voras wrote:
>
>>> I don't know how to interpret the rise in context switches; as this is
>>> kernel code, I'd expect no context switches. I hope someone else can
>>> explain.
>>>
>> Don't the mtx_lock() calls spin for a little while and then context
>> switch if another thread still has it locked?
>
> Yes, but are in-kernel context switches also counted? I was assuming
> they are light-weight enough not to count.
>
>> Hmm, I didn't look, but were there any tests using UDP mounts?
>> (I would have thought that your patch would mainly affect UDP mounts,
>> since that is when my version still has the single LRU queue/mutex.
>
> Another assumption - I thought UDP was the default.
>
>> As I think you know, my concern with your patch would be correctness
>> for UDP, not performance.)
>
> Yes.

I've got a similar box config here, with 2x 10GB intel nics, and 24 2TB drives on an LSI controller.
I'm watching the thread patiently, I'm kinda looking for results, and answers, though I'm also tempted to run benchmarks on my system to see if I get similar results. I also considered that netmap might be one, but I'm not quite sure if it would help NFS, since it's too hard to tell if it's a network bottleneck, though it appears to be network related.

> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"

From owner-freebsd-hackers@FreeBSD.ORG Sat Oct 20 19:53:36 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 94974295; Sat, 20 Oct 2012 19:53:36 +0000 (UTC) (envelope-from ndenev@gmail.com) Received: from mail-wg0-f50.google.com (mail-wg0-f50.google.com [74.125.82.50]) by mx1.freebsd.org (Postfix) with ESMTP id E1B638FC08; Sat, 20 Oct 2012 19:53:35 +0000 (UTC) Received: by mail-wg0-f50.google.com with SMTP id 16so1215971wgi.31 for ; Sat, 20 Oct 2012 12:53:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=subject:mime-version:content-type:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to:x-mailer; bh=8OJs8Z6barj7ClQvDt1qnWXKmAlqaQhZpsG86iyrlYY=; b=o0kEJq7puhWUgm0D7pR0Ip1ghaczPi22n4Z43E4NvvZBRBxDn8I1XUmh+iq3DrDhCt V/rl+4qicmG4rtUC+W4m31Lv7JDaDiJqZoddMD7oP0KDHfXBkjouHSKOHwuRHazWc9J7 xPDOn4rMBksRxQxeCkkwK1/HFUCGHx3E4AiDaARFInS2fPUf2KVPvnvJt9cidHYW54Wo UgNpBU9hMOdr/KXweVgyT2u92nh9aCxwr0yS5j8Z9CwlT986sqoEXuuHLAe4Hcv1loXT h/QBVut176+6+s59dkab6SEwB5CC5RRS3rPXQFymTmBHknrxgWt9qHD8Tf6k+7giZQSj JrZQ== Received: by 10.180.95.130 with SMTP id dk2mr10928240wib.18.1350762814815; Sat, 20 Oct 2012 12:53:34 -0700 (PDT) Received: from [10.0.0.86] ([93.152.184.10]) by mx.google.com with ESMTPS id eq2sm11472426wib.1.2012.10.20.12.53.33
(version=TLSv1/SSLv3 cipher=OTHER); Sat, 20 Oct 2012 12:53:34 -0700 (PDT) Subject: Re: NFS server bottlenecks Mime-Version: 1.0 (Mac OS X Mail 6.1 \(1498\)) Content-Type: text/plain; charset=iso-8859-1 From: Nikolay Denev In-Reply-To: Date: Sat, 20 Oct 2012 22:53:31 +0300 Content-Transfer-Encoding: quoted-printable Message-Id: References: <191784842.2570110.1350737132305.JavaMail.root@erie.cs.uoguelph.ca> To: Outback Dingo X-Mailer: Apple Mail (2.1498) Cc: "freebsd-hackers@freebsd.org Hackers" , Rick Macklem , Ivan Voras X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Oct 2012 19:53:36 -0000

On Oct 20, 2012, at 10:45 PM, Outback Dingo wrote:

> On Sat, Oct 20, 2012 at 3:28 PM, Ivan Voras wrote:
>> On 20 October 2012 14:45, Rick Macklem wrote:
>>> Ivan Voras wrote:
>>
>>>> I don't know how to interpret the rise in context switches; as this is
>>>> kernel code, I'd expect no context switches. I hope someone else can
>>>> explain.
>>>>
>>> Don't the mtx_lock() calls spin for a little while and then context
>>> switch if another thread still has it locked?
>>
>> Yes, but are in-kernel context switches also counted? I was assuming
>> they are light-weight enough not to count.
>>
>>> Hmm, I didn't look, but were there any tests using UDP mounts?
>>> (I would have thought that your patch would mainly affect UDP mounts,
>>> since that is when my version still has the single LRU queue/mutex.
>>
>> Another assumption - I thought UDP was the default.
>>
>>> As I think you know, my concern with your patch would be correctness
>>> for UDP, not performance.)
>>
>> Yes.
>
> I've got a similar box config here, with 2x 10GB intel nics, and 24 2TB
> drives on an LSI controller.
> I'm watching the thread patiently, I'm kinda looking for results, and
> answers, though I'm also tempted to
> run benchmarks on my system to see if I get similar results. I also
> considered that netmap might be one,
> but I'm not quite sure if it would help NFS, since it's too hard to tell if
> it's a network bottleneck, though it appears
> to be network related.

It doesn't look like a network issue to me. From my observations it's more like some overhead in NFS and ARC.
The boxes easily push 10G with a simple iperf test.
Running two iperf tests over each port of the dual-ported 10G NICs gives 960MB/sec regardless of which machine is the server.
Also, I've seen over 960MB/sec over NFS with this setup, but I can't understand what type of workload was able to do this.
At some point I was able to do this with a simple dd, then after a reboot I was no longer able to push this traffic.
I'm thinking something like ARC/kmem fragmentation might be the issue?

From owner-freebsd-hackers@FreeBSD.ORG Sat Oct 20 21:03:11 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id D8D84E2D; Sat, 20 Oct 2012 21:03:11 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 79C038FC0A; Sat, 20 Oct 2012 21:03:10 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ap8EAPUQg1CDaFvO/2dsb2JhbAA7CIYUvAKCIAEBAQMBAQEBICsgCxsYAgINGQIpAQkmBggHBAEcBIddBguoYJITgSCKPxYEhUOBEgOTRIItgRePIoMLgUc1 X-IronPort-AV: E=Sophos;i="4.80,622,1344225600"; d="scan'208";a="184554476" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 20 Oct 2012 17:03:09 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id BBB63B402D; Sat, 20 Oct 2012 17:03:09 -0400 (EDT) Date: Sat, 20 Oct 2012 17:03:09 -0400 (EDT) From: Rick Macklem To: Outback Dingo Message-ID: <1800695432.2577499.1350766989710.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: Subject: Re: NFS server bottlenecks MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.203] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692) Cc: "freebsd-hackers@freebsd.org Hackers" , Ivan Voras , Nikolay Denev X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Oct 2012 21:03:12 -0000

Outback Dingo wrote:
> On Sat, Oct 20, 2012 at 3:28 PM, Ivan Voras
> wrote:
> > On 20 October 2012 14:45, Rick Macklem wrote:
> >> Ivan Voras wrote:
> >
> >>> I don't know
how to interpret the rise in context switches; as
> >>> this is
> >>> kernel code, I'd expect no context switches. I hope someone else
> >>> can
> >>> explain.
> >>>
> >> Don't the mtx_lock() calls spin for a little while and then context
> >> switch if another thread still has it locked?
> >
> > Yes, but are in-kernel context switches also counted? I was assuming
> > they are light-weight enough not to count.
> >
> >> Hmm, I didn't look, but were there any tests using UDP mounts?
> >> (I would have thought that your patch would mainly affect UDP
> >> mounts,
> >> since that is when my version still has the single LRU
> >> queue/mutex.
> >
> > Another assumption - I thought UDP was the default.
> >
TCP has been the default for a FreeBSD client for a long time. It was
changed for the old NFS client before I became a committer.
(You can explicitly set one or the other as mount options, or check
via wireshark/tcpdump.)

> >> As I think you know, my concern with your patch would be
> >> correctness
> >> for UDP, not performance.)
> >
> > Yes.
>
> I've got a similar box config here, with 2x 10GB intel nics, and 24 2TB
> drives on an LSI controller.
> I'm watching the thread patiently, I'm kinda looking for results, and
> answers, though I'm also tempted to
> run benchmarks on my system to see if I get similar results. I also
> considered that netmap might be one,
> but I'm not quite sure if it would help NFS, since it's too hard to tell if
> it's a network bottleneck, though it appears
> to be network related.
>
NFS network traffic looks very different from a bulk TCP stream (a la
BitTorrent or ...). I've seen this cause issues before.
You can look at a packet trace in wireshark and see if TCP is
retransmitting segments.

rick