From owner-freebsd-current@FreeBSD.ORG Fri Aug 20 16:55:09 2010 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9B73810656A5; Fri, 20 Aug 2010 16:55:09 +0000 (UTC) (envelope-from pluknet@gmail.com) Received: from mail-qy0-f175.google.com (mail-qy0-f175.google.com [209.85.216.175]) by mx1.freebsd.org (Postfix) with ESMTP id 2FB278FC1B; Fri, 20 Aug 2010 16:55:08 +0000 (UTC) Received: by qyk8 with SMTP id 8so698899qyk.13 for ; Fri, 20 Aug 2010 09:55:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:received:in-reply-to :references:date:message-id:subject:from:to:cc:content-type; bh=lRVzHtqevEcPUp+nkmk+nM8jfWj+LXRqLEsbICJYthk=; b=Y7TVd/4Zpf4nw2x5Q7xVwdy8n9DXgh0Ay1/8Xjhbk1ej5IcKo88QEkWKzQ6uQeCqgP JLb8E87PHBo2CeAYpT48gCpXaLuUS+lZY/WFcZpqjT8axbjir5T18OsbqIlgKTCiceQm kzWKd6ilwWOfZqvX+2Y39PoHTQkhFGhuajhCA= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; b=bxS6merDtfAsSxsbULSV+WgQjlAWK3sVL44UAp+TyfotwGzZyazRh7Ck+nDGkDF9Gb jVWKGQMP6geetBkWtN9CUnnD0xHzKYPpE4IeE32eo6hBosHoSUR3RdLYptW6bDfs74Vl aCnw5pEr5RIMtJz5Cgp5nUkWRPzwe21NgLwWw= MIME-Version: 1.0 Received: by 10.224.36.209 with SMTP id u17mr1108901qad.328.1282323308209; Fri, 20 Aug 2010 09:55:08 -0700 (PDT) Received: by 10.229.26.81 with HTTP; Fri, 20 Aug 2010 09:55:08 -0700 (PDT) In-Reply-To: <201008190934.28365.jhb@freebsd.org> References: <201008181607.52798.jhb@freebsd.org> <201008190934.28365.jhb@freebsd.org> Date: Fri, 20 Aug 2010 20:55:08 +0400 Message-ID: From: pluknet To: John Baldwin Content-Type: multipart/mixed; boundary=0015175caf7c45eade048e44290f Cc: Kostik Belousov , FreeBSD Current Subject: Re: LOR on nfs: vfs_vnops.c:301 kern_descrip.c:1580 X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 20 Aug 2010 16:55:09 -0000 --0015175caf7c45eade048e44290f Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable On 19 August 2010 17:34, John Baldwin wrote: > On Thursday, August 19, 2010 5:29:25 am pluknet wrote: >> On 19 August 2010 00:07, John Baldwin wrote: >> > On Wednesday, August 18, 2010 3:17:56 pm pluknet wrote: >> >> On 18 August 2010 23:11, pluknet wrote: >> >> > On 18 August 2010 17:46, Kostik Belousov wrot= e: >> >> >> On Wed, Aug 18, 2010 at 02:43:19PM +0400, pluknet wrote: >> >> >>> On 18 August 2010 12:07, pluknet wrote: >> >> >>> > On 17 August 2010 20:04, Kostik Belousov = wrote: >> >> >>> > >> >> >>> >> >> >> >>> >> Also please take a note of the John' suggestion to use the tas= kqueue. >> >> >>> > >> >> >>> > I decided to go this road. Thank you both. >> >> >>> > Now I do nfs buildkernel survive and prepare some benchmark res= ults. >> >> >>> > >> >> >>> >> >> >>> So, I modified the patch to defer proc_create() with taskqueue(9)= . >> >> >>> Below is `time make -j5 buildkernel WITHOUT_MODULES=3Dyes` perf. >> > evaluation. >> >> >>> Done on 4-way CPU on clean /usr/obj with /usr/src & /usr/obj both >> >> >>> nfs-mounted over 1Gbit LAN. >> >> >>> >> >> >>> clean old >> >> >>> 1137.985u 239.411s 7:42.15 298.0% =A0 =A0 =A0 6538+2133k 87+43388= io 226pf+0w >> >> >>> >> >> >>> clean new >> >> >>> 1134.755u 240.032s 7:41.25 298.0% =A0 =A0 =A0 6553+2133k 87+43367= io 224pf+0w >> >> >>> >> >> >>> Patch needs polishing, though it generally works. >> >> >>> Not sure if shep_chan (or whatever name it will get) needs lockin= g. >> >> >> As I said yesterday, if several requests to create nfsiod coming o= ne >> >> >> after another, you would loose all but the last. >> >> >> >> >> >> You should put the requests into the list, probably protected by >> >> >> nfs_iod_mtx. >> >> >> >> >> > >> >> > How about this patch? Still several things to ask. >> >> > 1) I used malloc instance w/ M_NOWAIT, since it's called with nfs_i= od_mtx >> > held. >> >> > 2) Probably busy/done gymnastics is a wrong mess. Your help is >> > appreciated. >> >> > 3) if (1) is fine, is it right to use fail: logic (i.e. set >> >> > NFSIOD_NOT_AVAILABLE) >> >> > on memory shortage? Not tested. >> >> > >> >> > There are debug printf() left intentionally to see how 3 contexts r= un >> > under load >> >> > to each other. I attached these messages as well if that makes sens= e. >> >> > >> >> >> >> Ah, yes. Sorry, forgot about that. >> > >> > Your task handler needs to run in a loop until the list of requests is= empty. >> > If multiple threads call taskqueue_enqueue() before your task is sched= uled, >> > they will be consolidated down to a single call of your task handler. >> >> Thanks for your suggestions. >> >> So I converted nfs_nfsiodnew_tq() into loop, and as such >> I changed a cleanup SLIST_FOREACH() as well to free a bulk of >> (potentially consolidated) completed requests in one pass. >> kproc_create() is still out of the SLIST_FOREACH() to avoid calling it >> with nfs_iod_mtx held. >> >> > >> > - =A0 =A0 =A0 int error, i; >> > - =A0 =A0 =A0 int newiod; >> > + =A0 =A0 =A0 int i, newiod, error; >> > >> > You should sort these alphabetically if you are going to change this. = =A0I would >> > probably just leave it as-is. >> >> Err.. that's resulted after a set of changes. Thanks for noting that. >> >> > >> > Also, I'm not sure how this works as you don't actually wait for the t= ask to >> > complete. =A0If the taskqueue_enqueue() doesn't preempt, then you may = read >> > ni_error as zero before the kproc has actually been created and return= success >> > even though an nfsiod hasn't been created. >> > >> >> I put taskqueue_drain() right after taskqueue_enqueue() to be in sync wi= th >> task handler. That was part to think about, as I didn't find such a use = pattern. >> It seems though that tasks are launched now strictly one-by-one, without >> visible parallelism (as far as debug printf()s don't overlap anymore). > > Ah, if it is safe to sleep then I have a couple of suggestions: > Cool, credits go to John :). I just adopted all of your changes (attached). > - Use M_WAITOK to malloc() so you don't have to worry about the failure c= ase > =A0(I see Rick already suggested this). After all that reduces to the following replacement in nfs_nfsiodnew(). So, no regression should be there in theory. mtx_unlock(&nfs_iod_mtx); - kproc_create(...) + malloc(...) mtx_lock(&nfs_iod_mtx); It survived after this simple test running for an hour, which forces creation of many iods with r/w from 300 threads: nfsserv:/home/svn/freebsd/obj_exp on /usr/obj (nfs) ./randomio /usr/obj/file 100 0.5 0 4096 60 100 & ./randomio /usr/obj/file 100 0.5 0 4096 60 100 & ./randomio /usr/obj/file 100 0.5 0 4096 60 100 & while [ true ]; do sysctl vfs.nfs.iodmax=3D2; sysctl vfs.nfs.iodmax=3D60; sleep 20; done randomio is from ports/149838. P.S. it's funny to see in top 0 root 10 44 0 0K 144K deadlk 2 23:16 28.86% kernel Someone might think the kernel is in deadlock. --=20 wbr, pluknet --0015175caf7c45eade048e44290f Content-Type: application/octet-stream; name="nfsiod_tq_lock.3.diff" Content-Disposition: attachment; filename="nfsiod_tq_lock.3.diff" Content-Transfer-Encoding: base64 X-Attachment-Id: f_gd3a4fy00 SW5kZXg6IHN5cy9uZnNjbGllbnQvbmZzX3N1YnMuYwo9PT09PT09PT09PT09PT09PT09PT09PT09 PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09Ci0tLSBzeXMvbmZzY2xp ZW50L25mc19zdWJzLmMJKHJldmlzaW9uIDIxMTI3OSkKKysrIHN5cy9uZnNjbGllbnQvbmZzX3N1 YnMuYwkod29ya2luZyBjb3B5KQpAQCAtNTksNiArNTksNyBAQAogI2luY2x1ZGUgPHN5cy9zeXNl bnQuaD4KICNpbmNsdWRlIDxzeXMvc3lzY2FsbC5oPgogI2luY2x1ZGUgPHN5cy9zeXNwcm90by5o PgorI2luY2x1ZGUgPHN5cy90YXNrcXVldWUuaD4KIAogI2luY2x1ZGUgPHZtL3ZtLmg+CiAjaW5j bHVkZSA8dm0vdm1fb2JqZWN0Lmg+CkBAIC0xMTcsNiArMTE4LDcgQEAKIAogc3RydWN0IG5mc19i dWZxCW5mc19idWZxOwogc3RhdGljIHN0cnVjdCBtdHggbmZzX3hpZF9tdHg7CitzdHJ1Y3QgdGFz awluZnNfbmZzaW9kbmV3X3Rhc2s7CiAKIC8qCiAgKiBhbmQgdGhlIHJldmVyc2UgbWFwcGluZyBm cm9tIGdlbmVyaWMgdG8gVmVyc2lvbiAyIHByb2NlZHVyZSBudW1iZXJzCkBAIC0zNTQsNiArMzU2 LDcgQEAKIAkgKi8KIAltdHhfaW5pdCgmbmZzX2lvZF9tdHgsICJORlMgaW9kIGxvY2siLCBOVUxM LCBNVFhfREVGKTsKIAltdHhfaW5pdCgmbmZzX3hpZF9tdHgsICJORlMgeGlkIGxvY2siLCBOVUxM LCBNVFhfREVGKTsKKwlUQVNLX0lOSVQoJm5mc19uZnNpb2RuZXdfdGFzaywgMCwgbmZzX25mc2lv ZG5ld190cSwgTlVMTCk7CiAKIAluZnNfcGJ1Zl9mcmVlY250ID0gbnN3YnVmIC8gMiArIDE7CiAK SW5kZXg6IHN5cy9uZnNjbGllbnQvbmZzX25mc2lvZC5jCj09PT09PT09PT09PT09PT09PT09PT09 PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT0KLS0tIHN5cy9uZnNj bGllbnQvbmZzX25mc2lvZC5jCShyZXZpc2lvbiAyMTEyNzkpCisrKyBzeXMvbmZzY2xpZW50L25m c19uZnNpb2QuYwkod29ya2luZyBjb3B5KQpAQCAtNTksNiArNTksNyBAQAogI2luY2x1ZGUgPHN5 cy9mY250bC5oPgogI2luY2x1ZGUgPHN5cy9sb2NrZi5oPgogI2luY2x1ZGUgPHN5cy9tdXRleC5o PgorI2luY2x1ZGUgPHN5cy90YXNrcXVldWUuaD4KIAogI2luY2x1ZGUgPG5ldGluZXQvaW4uaD4K ICNpbmNsdWRlIDxuZXRpbmV0L3RjcC5oPgpAQCAtNzUsNiArNzYsMTYgQEAKIAogc3RhdGljIHZv aWQJbmZzc3ZjX2lvZCh2b2lkICopOwogCitzdHJ1Y3QgbmZzaW9kX3N0ciB7CisJU1RBSUxRX0VO VFJZKG5mc2lvZF9zdHIpIG5pX2xpbmtzOworCWludCAqbmlfaW5zdDsKKwlpbnQgbmlfaW9kOwor CWludCBuaV9lcnJvcjsKKwlpbnQgbmlfZG9uZTsKK307CitzdGF0aWMgU1RBSUxRX0hFQUQoLCBu ZnNpb2Rfc3RyKSBuZnNpb2RoZWFkID0KKyAgICBTVEFJTFFfSEVBRF9JTklUSUFMSVpFUihuZnNp b2RoZWFkKTsKKwogc3RhdGljIGludCBuZnNfYXN5bmNkYWVtb25bTkZTX01BWEFTWU5DREFFTU9O XTsKIAogU1lTQ1RMX0RFQ0woX3Zmc19uZnMpOwpAQCAtMTU5LDExICsxNzAsMzAgQEAKICAgICBz aXplb2YgKG5mc19pb2RtYXgpLCBzeXNjdGxfaW9kbWF4LCAiSVUiLAogICAgICJNYXggbnVtYmVy IG9mIG5mc2lvZCBrdGhyZWFkcyIpOwogCit2b2lkCituZnNfbmZzaW9kbmV3X3RxKF9fdW51c2Vk IHZvaWQgKmFyZywgaW50IHBlbmRpbmcpCit7CisJc3RydWN0IG5mc2lvZF9zdHIgKm5pcDsKKwor CW10eF9sb2NrKCZuZnNfaW9kX210eCk7CisJd2hpbGUgKChuaXAgPSBTVEFJTFFfRklSU1QoJm5m c2lvZGhlYWQpKSAhPSBOVUxMKSB7CisJCVNUQUlMUV9SRU1PVkVfSEVBRCgmbmZzaW9kaGVhZCwg bmlfbGlua3MpOworCQltdHhfdW5sb2NrKCZuZnNfaW9kX210eCk7CisJCW5pcC0+bmlfZXJyb3Ig PSBrcHJvY19jcmVhdGUobmZzc3ZjX2lvZCwgbmlwLT5uaV9pbnN0LCBOVUxMLAorCQkgICAgUkZI SUdIUElELCAwLCAibmZzaW9kICVkIiwgbmlwLT5uaV9pb2QpOworCQluaXAtPm5pX2RvbmUgPSAx OworCQltdHhfbG9jaygmbmZzX2lvZF9tdHgpOworCQl3YWtldXAobmlwKTsKKwl9CisJbXR4X3Vu bG9jaygmbmZzX2lvZF9tdHgpOworfQorCiBpbnQKIG5mc19uZnNpb2RuZXcoaW50IHNldF9pb2R3 YW50KQogewogCWludCBlcnJvciwgaTsKIAlpbnQgbmV3aW9kOworCXN0cnVjdCBuZnNpb2Rfc3Ry ICpuaXA7CiAKIAlpZiAobmZzX251bWFzeW5jID49IG5mc19pb2RtYXgpCiAJCXJldHVybiAoLTEp OwpAQCAtMTc5LDkgKzIwOSwxNiBAQAogCWlmIChzZXRfaW9kd2FudCA+IDApCiAJCW5mc19pb2R3 YW50W2ldID0gTkZTSU9EX0NSRUFURURfRk9SX05GU19BU1lOQ0lPOwogCW10eF91bmxvY2soJm5m c19pb2RfbXR4KTsKLQllcnJvciA9IGtwcm9jX2NyZWF0ZShuZnNzdmNfaW9kLCBuZnNfYXN5bmNk YWVtb24gKyBpLCBOVUxMLCBSRkhJR0hQSUQsCi0JICAgIDAsICJuZnNpb2QgJWQiLCBuZXdpb2Qp OworCW5pcCA9IG1hbGxvYyhzaXplb2YoKm5pcCksIE1fVEVNUCwgTV9XQUlUT0sgfCBNX1pFUk8p OworCW5pcC0+bmlfaW5zdCA9IG5mc19hc3luY2RhZW1vbiArIGk7CisJbmlwLT5uaV9pb2QgPSBu ZXdpb2Q7CiAJbXR4X2xvY2soJm5mc19pb2RfbXR4KTsKKwlTVEFJTFFfSU5TRVJUX1RBSUwoJm5m c2lvZGhlYWQsIG5pcCwgbmlfbGlua3MpOworCXRhc2txdWV1ZV9lbnF1ZXVlKHRhc2txdWV1ZV90 aHJlYWQsICZuZnNfbmZzaW9kbmV3X3Rhc2spOworCXdoaWxlICghbmlwLT5uaV9kb25lKQorCQlt dHhfc2xlZXAobmlwLCAmbmZzX2lvZF9tdHgsIDAsICJuaXd0IiwgMCk7CisJZXJyb3IgPSBuaXAt Pm5pX2Vycm9yOworCWZyZWUobmlwLCBNX1RFTVApOwogCWlmIChlcnJvcikgewogCQlpZiAoc2V0 X2lvZHdhbnQgPiAwKQogCQkJbmZzX2lvZHdhbnRbaV0gPSBORlNJT0RfTk9UX0FWQUlMQUJMRTsK SW5kZXg6IHN5cy9uZnNjbGllbnQvbmZzLmgKPT09PT09PT09PT09PT09PT09PT09PT09PT09PT09 PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PQotLS0gc3lzL25mc2NsaWVudC9u ZnMuaAkocmV2aXNpb24gMjExMjc5KQorKysgc3lzL25mc2NsaWVudC9uZnMuaAkod29ya2luZyBj b3B5KQpAQCAtMTI1LDYgKzEyNSw3IEBACiAKIGV4dGVybiBzdHJ1Y3QgbmZzc3RhdHMgbmZzc3Rh dHM7CiBleHRlcm4gc3RydWN0IG10eCBuZnNfaW9kX210eDsKK2V4dGVybiBzdHJ1Y3QgdGFzayBu ZnNfbmZzaW9kbmV3X3Rhc2s7CiAKIGV4dGVybiBpbnQgbmZzX251bWFzeW5jOwogZXh0ZXJuIHVu c2lnbmVkIGludCBuZnNfaW9kbWF4OwpAQCAtMjUzLDYgKzI1NCw3IEBACiAJICAgIHN0cnVjdCB1 Y3JlZCAqY3JlZCwgc3RydWN0IHRocmVhZCAqdGQpOwogaW50CW5mc19yZWFkZGlycnBjKHN0cnVj dCB2bm9kZSAqLCBzdHJ1Y3QgdWlvICosIHN0cnVjdCB1Y3JlZCAqKTsKIGludAluZnNfbmZzaW9k bmV3KGludCk7Cit2b2lkCW5mc19uZnNpb2RuZXdfdHEoX191bnVzZWQgdm9pZCAqLCBpbnQpOwog aW50CW5mc19hc3luY2lvKHN0cnVjdCBuZnNtb3VudCAqLCBzdHJ1Y3QgYnVmICosIHN0cnVjdCB1 Y3JlZCAqLCBzdHJ1Y3QgdGhyZWFkICopOwogaW50CW5mc19kb2lvKHN0cnVjdCB2bm9kZSAqLCBz dHJ1Y3QgYnVmICosIHN0cnVjdCB1Y3JlZCAqLCBzdHJ1Y3QgdGhyZWFkICopOwogdm9pZAluZnNf ZG9pb19kaXJlY3R3cml0ZSAoc3RydWN0IGJ1ZiAqKTsK --0015175caf7c45eade048e44290f--