Date:      Fri, 20 Aug 2010 20:55:08 +0400
From:      pluknet <pluknet@gmail.com>
To:        John Baldwin <jhb@freebsd.org>
Cc:        Kostik Belousov <kostikbel@gmail.com>, FreeBSD Current <freebsd-current@freebsd.org>
Subject:   Re: LOR on nfs: vfs_vnops.c:301 kern_descrip.c:1580
Message-ID:  <AANLkTin43JMZppMGgFDVhDKxbeE89ydkxwnFANfkrLc4@mail.gmail.com>
In-Reply-To: <201008190934.28365.jhb@freebsd.org>
References:  <AANLkTimJ=d06D2z24QyRQ98zEa1Pemk4=vkNGLNiX90N@mail.gmail.com> <201008181607.52798.jhb@freebsd.org> <AANLkTikpC9REEQKNUUHtLDxSLB1MmiqKdgbJ0Ee7O%2BGJ@mail.gmail.com> <201008190934.28365.jhb@freebsd.org>


[-- Attachment #1 --]
On 19 August 2010 17:34, John Baldwin <jhb@freebsd.org> wrote:
> On Thursday, August 19, 2010 5:29:25 am pluknet wrote:
>> On 19 August 2010 00:07, John Baldwin <jhb@freebsd.org> wrote:
>> > On Wednesday, August 18, 2010 3:17:56 pm pluknet wrote:
>> >> On 18 August 2010 23:11, pluknet <pluknet@gmail.com> wrote:
>> >> > On 18 August 2010 17:46, Kostik Belousov <kostikbel@gmail.com> wrote:
>> >> >> On Wed, Aug 18, 2010 at 02:43:19PM +0400, pluknet wrote:
>> >> >>> On 18 August 2010 12:07, pluknet <pluknet@gmail.com> wrote:
>> >> >>> > On 17 August 2010 20:04, Kostik Belousov <kostikbel@gmail.com> wrote:
>> >> >>> >
>> >> >>> >>
>> >> >>> >> Also please take note of John's suggestion to use the taskqueue.
>> >> >>> >
>> >> >>> > I decided to go this road. Thank you both.
>> >> >>> > Now I'll make sure an nfs buildkernel survives, and prepare some benchmark results.
>> >> >>> >
>> >> >>>
>> >> >>> So, I modified the patch to defer proc_create() with taskqueue(9).
>> >> >>> So, I modified the patch to defer proc_create() with taskqueue(9).
>> >> >>> Below is a `time make -j5 buildkernel WITHOUT_MODULES=yes` perf evaluation.
>> >> >>> Done on 4-way CPU on clean /usr/obj with /usr/src & /usr/obj both
>> >> >>> nfs-mounted over 1Gbit LAN.
>> >> >>>
>> >> >>> clean old
>> >> >>> 1137.985u 239.411s 7:42.15 298.0%       6538+2133k 87+43388io 226pf+0w
>> >> >>>
>> >> >>> clean new
>> >> >>> 1134.755u 240.032s 7:41.25 298.0%       6553+2133k 87+43367io 224pf+0w
>> >> >>>
>> >> >>> Patch needs polishing, though it generally works.
>> >> >>> Not sure if shep_chan (or whatever name it will get) needs locking.
>> >> >> As I said yesterday, if several requests to create an nfsiod come one
>> >> >> after another, you would lose all but the last.
>> >> >>
>> >> >> You should put the requests into the list, probably protected by
>> >> >> nfs_iod_mtx.
>> >> >>
>> >> >
>> >> > How about this patch? Still several things to ask.
>> >> > 1) I used a malloc instance w/ M_NOWAIT, since it's called with nfs_iod_mtx held.
>> >> > 2) Probably the busy/done gymnastics is a mess. Your help is appreciated.
>> >> > 3) If (1) is fine, is it right to use the fail: logic (i.e. set
>> >> > NFSIOD_NOT_AVAILABLE) on memory shortage? Not tested.
>> >> >
>> >> > There are debug printf()s left in intentionally, to see how the 3 contexts
>> >> > run under load relative to each other. I attached those messages as well,
>> >> > if that makes sense.
>> >> >
>> >>
>> >> Ah, yes. Sorry, forgot about that.
>> >
>> > Your task handler needs to run in a loop until the list of requests is empty.
>> > If multiple threads call taskqueue_enqueue() before your task is scheduled,
>> > they will be consolidated down to a single call of your task handler.
>>
>> Thanks for your suggestions.
>>
>> So I converted nfs_nfsiodnew_tq() into a loop, and accordingly
>> changed the cleanup SLIST_FOREACH() to free the whole batch of
>> (potentially consolidated) completed requests in one pass.
>> kproc_create() is still outside the SLIST_FOREACH() to avoid calling it
>> with nfs_iod_mtx held.
>>
>> >
>> > -       int error, i;
>> > -       int newiod;
>> > +       int i, newiod, error;
>> >
>> > You should sort these alphabetically if you are going to change this.  I would
>> > probably just leave it as-is.
>>
>> Err.. that's what resulted after a set of changes. Thanks for noting it.
>>
>> >
>> > Also, I'm not sure how this works as you don't actually wait for the task to
>> > complete.  If the taskqueue_enqueue() doesn't preempt, then you may read
>> > ni_error as zero before the kproc has actually been created and return success
>> > even though an nfsiod hasn't been created.
>> >
>>
>> I put taskqueue_drain() right after taskqueue_enqueue() to stay in sync with
>> the task handler. That part took some thought, as I couldn't find an existing
>> use of that pattern. It seems, though, that tasks are now launched strictly
>> one-by-one, without visible parallelism (the debug printf()s no longer overlap).
>
> Ah, if it is safe to sleep then I have a couple of suggestions:
>

Cool, credit goes to John :).
I just adopted all of your changes (attached).

> - Use M_WAITOK to malloc() so you don't have to worry about the failure case
>  (I see Rick already suggested this).

In the end that reduces to the following replacement in nfs_nfsiodnew(),
so in theory there should be no regression:

 mtx_unlock(&nfs_iod_mtx);
- kproc_create(...)
+ malloc(...)
 mtx_lock(&nfs_iod_mtx);

It survived this simple test running for an hour, which forces the
creation of many iods via r/w from 300 threads:

nfsserv:/home/svn/freebsd/obj_exp on /usr/obj (nfs)

./randomio /usr/obj/file 100 0.5 0 4096 60 100 &
./randomio /usr/obj/file 100 0.5 0 4096 60 100 &
./randomio /usr/obj/file 100 0.5 0 4096 60 100 &
while [ true ]; do sysctl vfs.nfs.iodmax=2; sysctl vfs.nfs.iodmax=60;
sleep 20; done

randomio is from ports/149838.


P.S.
It's funny to see this in top:
    0 root       10  44    0     0K   144K deadlk  2  23:16 28.86% kernel
Someone might think the kernel is deadlocked.

-- 
wbr,
pluknet

[-- Attachment #2 --]
Index: sys/nfsclient/nfs_subs.c
===================================================================
--- sys/nfsclient/nfs_subs.c	(revision 211279)
+++ sys/nfsclient/nfs_subs.c	(working copy)
@@ -59,6 +59,7 @@
 #include <sys/sysent.h>
 #include <sys/syscall.h>
 #include <sys/sysproto.h>
+#include <sys/taskqueue.h>
 
 #include <vm/vm.h>
 #include <vm/vm_object.h>
@@ -117,6 +118,7 @@
 
 struct nfs_bufq	nfs_bufq;
 static struct mtx nfs_xid_mtx;
+struct task	nfs_nfsiodnew_task;
 
 /*
  * and the reverse mapping from generic to Version 2 procedure numbers
@@ -354,6 +356,7 @@
 	 */
 	mtx_init(&nfs_iod_mtx, "NFS iod lock", NULL, MTX_DEF);
 	mtx_init(&nfs_xid_mtx, "NFS xid lock", NULL, MTX_DEF);
+	TASK_INIT(&nfs_nfsiodnew_task, 0, nfs_nfsiodnew_tq, NULL);
 
 	nfs_pbuf_freecnt = nswbuf / 2 + 1;
 
Index: sys/nfsclient/nfs_nfsiod.c
===================================================================
--- sys/nfsclient/nfs_nfsiod.c	(revision 211279)
+++ sys/nfsclient/nfs_nfsiod.c	(working copy)
@@ -59,6 +59,7 @@
 #include <sys/fcntl.h>
 #include <sys/lockf.h>
 #include <sys/mutex.h>
+#include <sys/taskqueue.h>
 
 #include <netinet/in.h>
 #include <netinet/tcp.h>
@@ -75,6 +76,16 @@
 
 static void	nfssvc_iod(void *);
 
+struct nfsiod_str {
+	STAILQ_ENTRY(nfsiod_str) ni_links;
+	int *ni_inst;
+	int ni_iod;
+	int ni_error;
+	int ni_done;
+};
+static STAILQ_HEAD(, nfsiod_str) nfsiodhead =
+    STAILQ_HEAD_INITIALIZER(nfsiodhead);
+
 static int nfs_asyncdaemon[NFS_MAXASYNCDAEMON];
 
 SYSCTL_DECL(_vfs_nfs);
@@ -159,11 +170,30 @@
     sizeof (nfs_iodmax), sysctl_iodmax, "IU",
     "Max number of nfsiod kthreads");
 
+void
+nfs_nfsiodnew_tq(__unused void *arg, int pending)
+{
+	struct nfsiod_str *nip;
+
+	mtx_lock(&nfs_iod_mtx);
+	while ((nip = STAILQ_FIRST(&nfsiodhead)) != NULL) {
+		STAILQ_REMOVE_HEAD(&nfsiodhead, ni_links);
+		mtx_unlock(&nfs_iod_mtx);
+		nip->ni_error = kproc_create(nfssvc_iod, nip->ni_inst, NULL,
+		    RFHIGHPID, 0, "nfsiod %d", nip->ni_iod);
+		nip->ni_done = 1;
+		mtx_lock(&nfs_iod_mtx);
+		wakeup(nip);
+	}
+	mtx_unlock(&nfs_iod_mtx);
+}
+
 int
 nfs_nfsiodnew(int set_iodwant)
 {
 	int error, i;
 	int newiod;
+	struct nfsiod_str *nip;
 
 	if (nfs_numasync >= nfs_iodmax)
 		return (-1);
@@ -179,9 +209,16 @@
 	if (set_iodwant > 0)
 		nfs_iodwant[i] = NFSIOD_CREATED_FOR_NFS_ASYNCIO;
 	mtx_unlock(&nfs_iod_mtx);
-	error = kproc_create(nfssvc_iod, nfs_asyncdaemon + i, NULL, RFHIGHPID,
-	    0, "nfsiod %d", newiod);
+	nip = malloc(sizeof(*nip), M_TEMP, M_WAITOK | M_ZERO);
+	nip->ni_inst = nfs_asyncdaemon + i;
+	nip->ni_iod = newiod;
 	mtx_lock(&nfs_iod_mtx);
+	STAILQ_INSERT_TAIL(&nfsiodhead, nip, ni_links);
+	taskqueue_enqueue(taskqueue_thread, &nfs_nfsiodnew_task);
+	while (!nip->ni_done)
+		mtx_sleep(nip, &nfs_iod_mtx, 0, "niwt", 0);
+	error = nip->ni_error;
+	free(nip, M_TEMP);
 	if (error) {
 		if (set_iodwant > 0)
 			nfs_iodwant[i] = NFSIOD_NOT_AVAILABLE;
Index: sys/nfsclient/nfs.h
===================================================================
--- sys/nfsclient/nfs.h	(revision 211279)
+++ sys/nfsclient/nfs.h	(working copy)
@@ -125,6 +125,7 @@
 
 extern struct nfsstats nfsstats;
 extern struct mtx nfs_iod_mtx;
+extern struct task nfs_nfsiodnew_task;
 
 extern int nfs_numasync;
 extern unsigned int nfs_iodmax;
@@ -253,6 +254,7 @@
 	    struct ucred *cred, struct thread *td);
 int	nfs_readdirrpc(struct vnode *, struct uio *, struct ucred *);
 int	nfs_nfsiodnew(int);
+void	nfs_nfsiodnew_tq(__unused void *, int);
 int	nfs_asyncio(struct nfsmount *, struct buf *, struct ucred *, struct thread *);
 int	nfs_doio(struct vnode *, struct buf *, struct ucred *, struct thread *);
 void	nfs_doio_directwrite (struct buf *);
