From owner-freebsd-current@FreeBSD.ORG Fri Aug 20 19:19:58 2010 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C0732106567A; Fri, 20 Aug 2010 19:19:58 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 11D488FC08; Fri, 20 Aug 2010 19:19:57 +0000 (UTC) Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id o7KJJrpr079223 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 20 Aug 2010 22:19:53 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4) with ESMTP id o7KJJrGc093395; Fri, 20 Aug 2010 22:19:53 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4/Submit) id o7KJJrMD093394; Fri, 20 Aug 2010 22:19:53 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Fri, 20 Aug 2010 22:19:53 +0300 From: Kostik Belousov To: pluknet Message-ID: <20100820191953.GN2396@deviant.kiev.zoral.com.ua> References: <201008181607.52798.jhb@freebsd.org> <201008190934.28365.jhb@freebsd.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="6kwAx1lhQCGpXeLP" Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-2.2 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_50, DNS_FROM_OPENWHOIS autolearn=no version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: FreeBSD Current Subject: Re: LOR on nfs: vfs_vnops.c:301 kern_descrip.c:1580 X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 20 Aug 2010 19:19:58 -0000 --6kwAx1lhQCGpXeLP Content-Type: text/plain; charset=koi8-r Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, Aug 20, 2010 at 08:55:08PM +0400, pluknet wrote: > On 19 August 2010 17:34, John Baldwin wrote: > > On Thursday, August 19, 2010 5:29:25 am pluknet wrote: > >> On 19 August 2010 00:07, John Baldwin wrote: > >> > On Wednesday, August 18, 2010 3:17:56 pm pluknet wrote: > >> >> On 18 August 2010 23:11, pluknet wrote: > >> >> > On 18 August 2010 17:46, Kostik Belousov wr= ote: > >> >> >> On Wed, Aug 18, 2010 at 02:43:19PM +0400, pluknet wrote: > >> >> >>> On 18 August 2010 12:07, pluknet wrote: > >> >> >>> > On 17 August 2010 20:04, Kostik Belousov wrote: > >> >> >>> > > >> >> >>> >> > >> >> >>> >> Also please take a note of the John' suggestion to use the t= askqueue. > >> >> >>> > > >> >> >>> > I decided to go this road. Thank you both. > >> >> >>> > Now I do nfs buildkernel survive and prepare some benchmark r= esults. > >> >> >>> > > >> >> >>> > >> >> >>> So, I modified the patch to defer proc_create() with taskqueue(= 9). > >> >> >>> Below is `time make -j5 buildkernel WITHOUT_MODULES=3Dyes` perf. > >> > evaluation. > >> >> >>> Done on 4-way CPU on clean /usr/obj with /usr/src & /usr/obj bo= th > >> >> >>> nfs-mounted over 1Gbit LAN. > >> >> >>> > >> >> >>> clean old > >> >> >>> 1137.985u 239.411s 7:42.15 298.0% =9A =9A =9A 6538+2133k 87+433= 88io 226pf+0w > >> >> >>> > >> >> >>> clean new > >> >> >>> 1134.755u 240.032s 7:41.25 298.0% =9A =9A =9A 6553+2133k 87+433= 67io 224pf+0w > >> >> >>> > >> >> >>> Patch needs polishing, though it generally works. > >> >> >>> Not sure if shep_chan (or whatever name it will get) needs lock= ing. > >> >> >> As I said yesterday, if several requests to create nfsiod coming= one > >> >> >> after another, you would loose all but the last. > >> >> >> > >> >> >> You should put the requests into the list, probably protected by > >> >> >> nfs_iod_mtx. > >> >> >> > >> >> > > >> >> > How about this patch? Still several things to ask. > >> >> > 1) I used malloc instance w/ M_NOWAIT, since it's called with nfs= _iod_mtx > >> > held. > >> >> > 2) Probably busy/done gymnastics is a wrong mess. Your help is > >> > appreciated. > >> >> > 3) if (1) is fine, is it right to use fail: logic (i.e. set > >> >> > NFSIOD_NOT_AVAILABLE) > >> >> > on memory shortage? Not tested. > >> >> > > >> >> > There are debug printf() left intentionally to see how 3 contexts= run > >> > under load > >> >> > to each other. I attached these messages as well if that makes se= nse. > >> >> > > >> >> > >> >> Ah, yes. Sorry, forgot about that. > >> > > >> > Your task handler needs to run in a loop until the list of requests = is empty. > >> > If multiple threads call taskqueue_enqueue() before your task is sch= eduled, > >> > they will be consolidated down to a single call of your task handler. > >> > >> Thanks for your suggestions. > >> > >> So I converted nfs_nfsiodnew_tq() into loop, and as such > >> I changed a cleanup SLIST_FOREACH() as well to free a bulk of > >> (potentially consolidated) completed requests in one pass. > >> kproc_create() is still out of the SLIST_FOREACH() to avoid calling it > >> with nfs_iod_mtx held. > >> > >> > > >> > - =9A =9A =9A int error, i; > >> > - =9A =9A =9A int newiod; > >> > + =9A =9A =9A int i, newiod, error; > >> > > >> > You should sort these alphabetically if you are going to change this= . =9AI would > >> > probably just leave it as-is. > >> > >> Err.. that's resulted after a set of changes. Thanks for noting that. > >> > >> > > >> > Also, I'm not sure how this works as you don't actually wait for the= task to > >> > complete. =9AIf the taskqueue_enqueue() doesn't preempt, then you ma= y read > >> > ni_error as zero before the kproc has actually been created and retu= rn success > >> > even though an nfsiod hasn't been created. > >> > > >> > >> I put taskqueue_drain() right after taskqueue_enqueue() to be in sync = with > >> task handler. That was part to think about, as I didn't find such a us= e pattern. > >> It seems though that tasks are launched now strictly one-by-one, witho= ut > >> visible parallelism (as far as debug printf()s don't overlap anymore). > > > > Ah, if it is safe to sleep then I have a couple of suggestions: > > >=20 > Cool, credits go to John :). > I just adopted all of your changes (attached). >=20 > > - Use M_WAITOK to malloc() so you don't have to worry about the failure= case > > =9A(I see Rick already suggested this). >=20 > After all that reduces to the following replacement in nfs_nfsiodnew(). > So, no regression should be there in theory. >=20 > mtx_unlock(&nfs_iod_mtx); > - kproc_create(...) > + malloc(...) > mtx_lock(&nfs_iod_mtx); >=20 > It survived after this simple test running for an hour, which > forces creation of many iods with r/w from 300 threads: >=20 > nfsserv:/home/svn/freebsd/obj_exp on /usr/obj (nfs) >=20 > ./randomio /usr/obj/file 100 0.5 0 4096 60 100 & > ./randomio /usr/obj/file 100 0.5 0 4096 60 100 & > ./randomio /usr/obj/file 100 0.5 0 4096 60 100 & > while [ true ]; do sysctl vfs.nfs.iodmax=3D2; sysctl vfs.nfs.iodmax=3D60; > sleep 20; done >=20 > randomio is from ports/149838. >=20 >=20 > P.S. > it's funny to see in top > 0 root 10 44 0 0K 144K deadlk 2 23:16 28.86% kernel > Someone might think the kernel is in deadlock. It seems nobody replied to the mdf@ objection against wait of the new proc startup being equivalent to the LOR. I think that the wait is safe, because the task is executed in the context of the different process then the enqueue request. This might be worth noting in the comment or commit message. You should drain the taskq and prevent creation of new nfsiods before waiting for existing nfsiods to exit on module unload. I find it somewhat weird that nip is removed from the list in the task handler, but freed in the enqueue thread. You could provide pointers to the local variables to exchange done/exit codes. I do not insist on this part. --6kwAx1lhQCGpXeLP Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (FreeBSD) iEYEARECAAYFAkxu1VgACgkQC3+MBN1Mb4iKEACg5/0bvwUd2PFkn5tSeFR6BaMw JPoAnR9kQdlIrE8uONgUC7uLOVt3CE0U =39jH -----END PGP SIGNATURE----- --6kwAx1lhQCGpXeLP--