From owner-freebsd-current@FreeBSD.ORG Thu Aug 19 13:35:29 2010 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E63571065697 for ; Thu, 19 Aug 2010 13:35:28 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id B27538FC18 for ; Thu, 19 Aug 2010 13:35:28 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id 3C94646B45; Thu, 19 Aug 2010 09:35:28 -0400 (EDT) Received: from jhbbsd.localnet (smtp.hudson-trading.com [209.249.190.9]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 58E9B8A050; Thu, 19 Aug 2010 09:35:27 -0400 (EDT) From: John Baldwin To: pluknet Date: Thu, 19 Aug 2010 09:34:27 -0400 User-Agent: KMail/1.13.5 (FreeBSD/7.3-CBSD-20100217; KDE/4.4.5; amd64; ; ) References: <201008181607.52798.jhb@freebsd.org> In-Reply-To: MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-15" Content-Transfer-Encoding: 7bit Message-Id: <201008190934.28365.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1 (bigwig.baldwin.cx); Thu, 19 Aug 2010 09:35:27 -0400 (EDT) X-Virus-Scanned: clamav-milter 0.95.1 at bigwig.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-2.6 required=4.2 tests=AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on bigwig.baldwin.cx Cc: Kostik Belousov , FreeBSD Current Subject: Re: LOR on nfs: vfs_vnops.c:301 kern_descrip.c:1580 X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 19 Aug 2010 13:35:29 -0000 On Thursday, August 19, 2010 5:29:25 am pluknet wrote: > On 19 August 2010 00:07, John Baldwin wrote: > > On Wednesday, August 18, 2010 3:17:56 pm pluknet wrote: > >> On 18 August 2010 23:11, pluknet wrote: > >> > On 18 August 2010 17:46, Kostik Belousov wrote: > >> >> On Wed, Aug 18, 2010 at 02:43:19PM +0400, pluknet wrote: > >> >>> On 18 August 2010 12:07, pluknet wrote: > >> >>> > On 17 August 2010 20:04, Kostik Belousov wrote: > >> >>> > > >> >>> >> > >> >>> >> Also please take a note of the John' suggestion to use the taskqueue. > >> >>> > > >> >>> > I decided to go this road. Thank you both. > >> >>> > Now I do nfs buildkernel survive and prepare some benchmark results. > >> >>> > > >> >>> > >> >>> So, I modified the patch to defer proc_create() with taskqueue(9). > >> >>> Below is `time make -j5 buildkernel WITHOUT_MODULES=yes` perf. > > evaluation. > >> >>> Done on 4-way CPU on clean /usr/obj with /usr/src & /usr/obj both > >> >>> nfs-mounted over 1Gbit LAN. > >> >>> > >> >>> clean old > >> >>> 1137.985u 239.411s 7:42.15 298.0% 6538+2133k 87+43388io 226pf+0w > >> >>> > >> >>> clean new > >> >>> 1134.755u 240.032s 7:41.25 298.0% 6553+2133k 87+43367io 224pf+0w > >> >>> > >> >>> Patch needs polishing, though it generally works. > >> >>> Not sure if shep_chan (or whatever name it will get) needs locking. > >> >> As I said yesterday, if several requests to create nfsiod coming one > >> >> after another, you would loose all but the last. > >> >> > >> >> You should put the requests into the list, probably protected by > >> >> nfs_iod_mtx. > >> >> > >> > > >> > How about this patch? Still several things to ask. > >> > 1) I used malloc instance w/ M_NOWAIT, since it's called with nfs_iod_mtx > > held. > >> > 2) Probably busy/done gymnastics is a wrong mess. Your help is > > appreciated. > >> > 3) if (1) is fine, is it right to use fail: logic (i.e. set > >> > NFSIOD_NOT_AVAILABLE) > >> > on memory shortage? Not tested. > >> > > >> > There are debug printf() left intentionally to see how 3 contexts run > > under load > >> > to each other. I attached these messages as well if that makes sense. > >> > > >> > >> Ah, yes. Sorry, forgot about that. > > > > Your task handler needs to run in a loop until the list of requests is empty. > > If multiple threads call taskqueue_enqueue() before your task is scheduled, > > they will be consolidated down to a single call of your task handler. > > Thanks for your suggestions. > > So I converted nfs_nfsiodnew_tq() into loop, and as such > I changed a cleanup SLIST_FOREACH() as well to free a bulk of > (potentially consolidated) completed requests in one pass. > kproc_create() is still out of the SLIST_FOREACH() to avoid calling it > with nfs_iod_mtx held. > > > > > - int error, i; > > - int newiod; > > + int i, newiod, error; > > > > You should sort these alphabetically if you are going to change this. I would > > probably just leave it as-is. > > Err.. that's resulted after a set of changes. Thanks for noting that. > > > > > Also, I'm not sure how this works as you don't actually wait for the task to > > complete. If the taskqueue_enqueue() doesn't preempt, then you may read > > ni_error as zero before the kproc has actually been created and return success > > even though an nfsiod hasn't been created. > > > > I put taskqueue_drain() right after taskqueue_enqueue() to be in sync with > task handler. That was part to think about, as I didn't find such a use pattern. > It seems though that tasks are launched now strictly one-by-one, without > visible parallelism (as far as debug printf()s don't overlap anymore). Ah, if it is safe to sleep then I have a couple of suggestions: - Use M_WAITOK to malloc() so you don't have to worry about the failure case (I see Rick already suggested this). - Use something like this in the code that schedules the task: mtx_unlock(&nfs_iod_mtx); nip = malloc(..., M_WAITOK); /* Initialize nip. */ mtx_lock(&nfs_iod_mtx); SLIST_INSERT_HEAD(...); /* Maybe use STAILQ_INSERT_TAIL() instead? */ taskqueue_enqueue(...); while (!nip->done) mtx_sleep(nip, &nfs_iod_mtx, ...); mtx_unlock(&nfs_iod_mtx); error = nip->ni_error; free(nip); /* Existing if (error), etc. code */ and something like this in the task handler: mtx_lock(&nfs_iod_mtx); while ((nip = STAILQ_FIRST(...)) != NULL) { STAILQ_REMOVE_HEAD(...); mtx_unlock(&nfs_iod_mtx); /* Create thread, setting ni_error, etc. */ mtx_lock(&nfsd_iod_mtx); wakeup(nip); } mtx_unlock(&nfs_iod_mtx); The sleep/wakeup pattern is far more common than using taskqueue_drain() for this sort of thing. Using STAILQ instead of SLIST would give you "fairer" FIFO processing btw. -- John Baldwin