Date: Thu, 19 Aug 2010 09:34:27 -0400
From: John Baldwin <jhb@freebsd.org>
To: pluknet <pluknet@gmail.com>
Cc: Kostik Belousov <kostikbel@gmail.com>,
    FreeBSD Current <freebsd-current@freebsd.org>
Subject: Re: LOR on nfs: vfs_vnops.c:301 kern_descrip.c:1580
Message-ID: <201008190934.28365.jhb@freebsd.org>
In-Reply-To: <AANLkTikpC9REEQKNUUHtLDxSLB1MmiqKdgbJ0Ee7O+GJ@mail.gmail.com>
References: <AANLkTimJ=d06D2z24QyRQ98zEa1Pemk4=vkNGLNiX90N@mail.gmail.com>
    <201008181607.52798.jhb@freebsd.org>
    <AANLkTikpC9REEQKNUUHtLDxSLB1MmiqKdgbJ0Ee7O+GJ@mail.gmail.com>
On Thursday, August 19, 2010 5:29:25 am pluknet wrote:
> On 19 August 2010 00:07, John Baldwin <jhb@freebsd.org> wrote:
> > On Wednesday, August 18, 2010 3:17:56 pm pluknet wrote:
> >> On 18 August 2010 23:11, pluknet <pluknet@gmail.com> wrote:
> >> > On 18 August 2010 17:46, Kostik Belousov <kostikbel@gmail.com> wrote:
> >> >> On Wed, Aug 18, 2010 at 02:43:19PM +0400, pluknet wrote:
> >> >>> On 18 August 2010 12:07, pluknet <pluknet@gmail.com> wrote:
> >> >>> > On 17 August 2010 20:04, Kostik Belousov <kostikbel@gmail.com> wrote:
> >> >>> >
> >> >>> >>
> >> >>> >> Also please take a note of the John' suggestion to use the taskqueue.
> >> >>> >
> >> >>> > I decided to go this road. Thank you both.
> >> >>> > Now I do nfs buildkernel survive and prepare some benchmark results.
> >> >>> >
> >> >>>
> >> >>> So, I modified the patch to defer proc_create() with taskqueue(9).
> >> >>> Below is `time make -j5 buildkernel WITHOUT_MODULES=yes` perf. evaluation.
> >> >>> Done on 4-way CPU on clean /usr/obj with /usr/src & /usr/obj both
> >> >>> nfs-mounted over 1Gbit LAN.
> >> >>>
> >> >>> clean old
> >> >>> 1137.985u 239.411s 7:42.15 298.0% 6538+2133k 87+43388io 226pf+0w
> >> >>>
> >> >>> clean new
> >> >>> 1134.755u 240.032s 7:41.25 298.0% 6553+2133k 87+43367io 224pf+0w
> >> >>>
> >> >>> Patch needs polishing, though it generally works.
> >> >>> Not sure if shep_chan (or whatever name it will get) needs locking.
> >> >> As I said yesterday, if several requests to create nfsiod coming one
> >> >> after another, you would loose all but the last.
> >> >>
> >> >> You should put the requests into the list, probably protected by
> >> >> nfs_iod_mtx.
> >> >>
> >> >
> >> > How about this patch? Still several things to ask.
> >> > 1) I used malloc instance w/ M_NOWAIT, since it's called with nfs_iod_mtx held.
> >> > 2) Probably busy/done gymnastics is a wrong mess. Your help is appreciated.
> >> > 3) if (1) is fine, is it right to use fail: logic (i.e. set NFSIOD_NOT_AVAILABLE)
> >> > on memory shortage? Not tested.
> >> >
> >> > There are debug printf() left intentionally to see how 3 contexts run under load
> >> > to each other. I attached these messages as well if that makes sense.
> >> >
> >>
> >> Ah, yes. Sorry, forgot about that.
> >
> > Your task handler needs to run in a loop until the list of requests is empty.
> > If multiple threads call taskqueue_enqueue() before your task is scheduled,
> > they will be consolidated down to a single call of your task handler.
>
> Thanks for your suggestions.
>
> So I converted nfs_nfsiodnew_tq() into loop, and as such
> I changed a cleanup SLIST_FOREACH() as well to free a bulk of
> (potentially consolidated) completed requests in one pass.
> kproc_create() is still out of the SLIST_FOREACH() to avoid calling it
> with nfs_iod_mtx held.
> >
> > -    int error, i;
> > -    int newiod;
> > +    int i, newiod, error;
> >
> > You should sort these alphabetically if you are going to change this.  I would
> > probably just leave it as-is.
>
> Err.. that's resulted after a set of changes. Thanks for noting that.
> >
> > Also, I'm not sure how this works as you don't actually wait for the task to
> > complete.  If the taskqueue_enqueue() doesn't preempt, then you may read
> > ni_error as zero before the kproc has actually been created and return success
> > even though an nfsiod hasn't been created.
> >
> I put taskqueue_drain() right after taskqueue_enqueue() to be in sync with
> task handler.
> That was part to think about, as I didn't find such a use pattern.
> It seems though that tasks are launched now strictly one-by-one, without
> visible parallelism (as far as debug printf()s don't overlap anymore).

Ah, if it is safe to sleep then I have a couple of suggestions:

- Use M_WAITOK to malloc() so you don't have to worry about the failure case
  (I see Rick already suggested this).

- Use something like this in the code that schedules the task:

        mtx_unlock(&nfs_iod_mtx);
        nip = malloc(..., M_WAITOK);
        /* Initialize nip. */
        mtx_lock(&nfs_iod_mtx);
        SLIST_INSERT_HEAD(...);    /* Maybe use STAILQ_INSERT_TAIL() instead? */
        taskqueue_enqueue(...);
        while (!nip->done)
                mtx_sleep(nip, &nfs_iod_mtx, ...);
        mtx_unlock(&nfs_iod_mtx);
        error = nip->ni_error;
        free(nip);

        /* Existing if (error), etc. code */

  and something like this in the task handler:

        mtx_lock(&nfs_iod_mtx);
        while ((nip = STAILQ_FIRST(...)) != NULL) {
                STAILQ_REMOVE_HEAD(...);
                mtx_unlock(&nfs_iod_mtx);
                /* Create thread, setting ni_error, etc. */
                mtx_lock(&nfs_iod_mtx);
                wakeup(nip);
        }
        mtx_unlock(&nfs_iod_mtx);

The sleep/wakeup pattern is far more common than using taskqueue_drain() for
this sort of thing.  Using STAILQ instead of SLIST would give you "fairer"
FIFO processing btw.

-- 
John Baldwin
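[For readers following the thread, below is one way the queue-plus-sleep/wakeup
scheme suggested above could be fleshed out.  It is only an illustrative
sketch: every name in it (nfsiod_req, nfsiod_reqs, nfsiod_task, nfsiod_thread,
nfsiod_request_new, M_NFSIODREQ) is invented for the example, nfsiod_thread()
merely stands in for the real nfssvc_iod() body, and only nfs_iod_mtx comes
from the code being discussed; the patch actually under review in this thread
may look quite different.]

/*
 * Sketch of the sleep/wakeup scheme described above.  All names are
 * hypothetical; only nfs_iod_mtx is taken from the existing nfsclient code.
 */
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kernel.h>
#include <sys/kthread.h>
#include <sys/lock.h>
#include <sys/malloc.h>
#include <sys/mutex.h>
#include <sys/proc.h>
#include <sys/queue.h>
#include <sys/taskqueue.h>

static MALLOC_DEFINE(M_NFSIODREQ, "nfsiodreq", "deferred nfsiod creation");

struct nfsiod_req {
        STAILQ_ENTRY(nfsiod_req) ni_link;
        int     ni_iod;         /* nfsiod slot to create */
        int     ni_error;       /* kproc_create() result */
        int     ni_done;        /* set by the task handler when finished */
};

static STAILQ_HEAD(, nfsiod_req) nfsiod_reqs =
    STAILQ_HEAD_INITIALIZER(nfsiod_reqs);
static struct task nfsiod_task;

extern struct mtx nfs_iod_mtx;          /* existing nfsiod lock */

/* Hypothetical stand-in for nfssvc_iod(), the real nfsiod main loop. */
static void
nfsiod_thread(void *arg)
{

        /* ... service async I/O requests until told to exit ... */
        kproc_exit(0);
}

/* Task handler: drain the whole list, creating one kproc per request. */
static void
nfsiod_task_fn(void *arg, int pending)
{
        struct nfsiod_req *nip;

        mtx_lock(&nfs_iod_mtx);
        while ((nip = STAILQ_FIRST(&nfsiod_reqs)) != NULL) {
                STAILQ_REMOVE_HEAD(&nfsiod_reqs, ni_link);
                mtx_unlock(&nfs_iod_mtx);
                /* kproc_create() may sleep, so call it unlocked. */
                nip->ni_error = kproc_create(nfsiod_thread, NULL, NULL, 0, 0,
                    "nfsiod %d", nip->ni_iod);
                mtx_lock(&nfs_iod_mtx);
                nip->ni_done = 1;
                wakeup(nip);    /* requester frees nip once it wakes up */
        }
        mtx_unlock(&nfs_iod_mtx);
}

/* One-time task setup; a real driver would fold this into nfsiod init. */
static void
nfsiod_req_init(void *arg __unused)
{

        TASK_INIT(&nfsiod_task, 0, nfsiod_task_fn, NULL);
}
SYSINIT(nfsiodreq, SI_SUB_KTHREAD_IDLE, SI_ORDER_ANY, nfsiod_req_init, NULL);

/*
 * Request side, called with nfs_iod_mtx held and returning with it held,
 * like the existing nfs_nfsiodnew().
 */
static int
nfsiod_request_new(int iod)
{
        struct nfsiod_req *nip;
        int error;

        mtx_unlock(&nfs_iod_mtx);
        nip = malloc(sizeof(*nip), M_NFSIODREQ, M_WAITOK | M_ZERO);
        nip->ni_iod = iod;
        mtx_lock(&nfs_iod_mtx);
        STAILQ_INSERT_TAIL(&nfsiod_reqs, nip, ni_link);
        taskqueue_enqueue(taskqueue_thread, &nfsiod_task);
        while (!nip->ni_done)
                mtx_sleep(nip, &nfs_iod_mtx, 0, "nfsiodnw", 0);
        error = nip->ni_error;
        mtx_unlock(&nfs_iod_mtx);
        free(nip, M_NFSIODREQ);
        mtx_lock(&nfs_iod_mtx);
        return (error);
}

[The sketch tries to capture the points raised above: the request is queued
and the task enqueued while nfs_iod_mtx is held, kproc_create() runs with the
mutex dropped, ni_done is set and wakeup() issued under the mutex so the
requester cannot miss the wakeup, and an STAILQ gives FIFO ordering across
consolidated taskqueue runs.]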