Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 20 Aug 2010 20:55:08 +0400
From:      pluknet <pluknet@gmail.com>
To:        John Baldwin <jhb@freebsd.org>
Cc:        Kostik Belousov <kostikbel@gmail.com>, FreeBSD Current <freebsd-current@freebsd.org>
Subject:   Re: LOR on nfs: vfs_vnops.c:301 kern_descrip.c:1580
Message-ID:  <AANLkTin43JMZppMGgFDVhDKxbeE89ydkxwnFANfkrLc4@mail.gmail.com>
In-Reply-To: <201008190934.28365.jhb@freebsd.org>
References:  <AANLkTimJ=d06D2z24QyRQ98zEa1Pemk4=vkNGLNiX90N@mail.gmail.com> <201008181607.52798.jhb@freebsd.org> <AANLkTikpC9REEQKNUUHtLDxSLB1MmiqKdgbJ0Ee7O%2BGJ@mail.gmail.com> <201008190934.28365.jhb@freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
--0015175caf7c45eade048e44290f
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

On 19 August 2010 17:34, John Baldwin <jhb@freebsd.org> wrote:
> On Thursday, August 19, 2010 5:29:25 am pluknet wrote:
>> On 19 August 2010 00:07, John Baldwin <jhb@freebsd.org> wrote:
>> > On Wednesday, August 18, 2010 3:17:56 pm pluknet wrote:
>> >> On 18 August 2010 23:11, pluknet <pluknet@gmail.com> wrote:
>> >> > On 18 August 2010 17:46, Kostik Belousov <kostikbel@gmail.com> wrot=
e:
>> >> >> On Wed, Aug 18, 2010 at 02:43:19PM +0400, pluknet wrote:
>> >> >>> On 18 August 2010 12:07, pluknet <pluknet@gmail.com> wrote:
>> >> >>> > On 17 August 2010 20:04, Kostik Belousov <kostikbel@gmail.com> =
wrote:
>> >> >>> >
>> >> >>> >>
>> >> >>> >> Also please take a note of the John' suggestion to use the tas=
kqueue.
>> >> >>> >
>> >> >>> > I decided to go this road. Thank you both.
>> >> >>> > Now I do nfs buildkernel survive and prepare some benchmark res=
ults.
>> >> >>> >
>> >> >>>
>> >> >>> So, I modified the patch to defer proc_create() with taskqueue(9)=
.
>> >> >>> Below is `time make -j5 buildkernel WITHOUT_MODULES=3Dyes` perf.
>> > evaluation.
>> >> >>> Done on 4-way CPU on clean /usr/obj with /usr/src & /usr/obj both
>> >> >>> nfs-mounted over 1Gbit LAN.
>> >> >>>
>> >> >>> clean old
>> >> >>> 1137.985u 239.411s 7:42.15 298.0% =A0 =A0 =A0 6538+2133k 87+43388=
io 226pf+0w
>> >> >>>
>> >> >>> clean new
>> >> >>> 1134.755u 240.032s 7:41.25 298.0% =A0 =A0 =A0 6553+2133k 87+43367=
io 224pf+0w
>> >> >>>
>> >> >>> Patch needs polishing, though it generally works.
>> >> >>> Not sure if shep_chan (or whatever name it will get) needs lockin=
g.
>> >> >> As I said yesterday, if several requests to create nfsiod coming o=
ne
>> >> >> after another, you would loose all but the last.
>> >> >>
>> >> >> You should put the requests into the list, probably protected by
>> >> >> nfs_iod_mtx.
>> >> >>
>> >> >
>> >> > How about this patch? Still several things to ask.
>> >> > 1) I used malloc instance w/ M_NOWAIT, since it's called with nfs_i=
od_mtx
>> > held.
>> >> > 2) Probably busy/done gymnastics is a wrong mess. Your help is
>> > appreciated.
>> >> > 3) if (1) is fine, is it right to use fail: logic (i.e. set
>> >> > NFSIOD_NOT_AVAILABLE)
>> >> > on memory shortage? Not tested.
>> >> >
>> >> > There are debug printf() left intentionally to see how 3 contexts r=
un
>> > under load
>> >> > to each other. I attached these messages as well if that makes sens=
e.
>> >> >
>> >>
>> >> Ah, yes. Sorry, forgot about that.
>> >
>> > Your task handler needs to run in a loop until the list of requests is=
 empty.
>> > If multiple threads call taskqueue_enqueue() before your task is sched=
uled,
>> > they will be consolidated down to a single call of your task handler.
>>
>> Thanks for your suggestions.
>>
>> So I converted nfs_nfsiodnew_tq() into loop, and as such
>> I changed a cleanup SLIST_FOREACH() as well to free a bulk of
>> (potentially consolidated) completed requests in one pass.
>> kproc_create() is still out of the SLIST_FOREACH() to avoid calling it
>> with nfs_iod_mtx held.
>>
>> >
>> > - =A0 =A0 =A0 int error, i;
>> > - =A0 =A0 =A0 int newiod;
>> > + =A0 =A0 =A0 int i, newiod, error;
>> >
>> > You should sort these alphabetically if you are going to change this. =
=A0I would
>> > probably just leave it as-is.
>>
>> Err.. that's resulted after a set of changes. Thanks for noting that.
>>
>> >
>> > Also, I'm not sure how this works as you don't actually wait for the t=
ask to
>> > complete. =A0If the taskqueue_enqueue() doesn't preempt, then you may =
read
>> > ni_error as zero before the kproc has actually been created and return=
 success
>> > even though an nfsiod hasn't been created.
>> >
>>
>> I put taskqueue_drain() right after taskqueue_enqueue() to be in sync wi=
th
>> task handler. That was part to think about, as I didn't find such a use =
pattern.
>> It seems though that tasks are launched now strictly one-by-one, without
>> visible parallelism (as far as debug printf()s don't overlap anymore).
>
> Ah, if it is safe to sleep then I have a couple of suggestions:
>

Cool, credits go to John :).
I just adopted all of your changes (attached).

> - Use M_WAITOK to malloc() so you don't have to worry about the failure c=
ase
> =A0(I see Rick already suggested this).

After all that reduces to the following replacement in nfs_nfsiodnew().
So, no regression should be there in theory.

 mtx_unlock(&nfs_iod_mtx);
- kproc_create(...)
+ malloc(...)
mtx_lock(&nfs_iod_mtx);

It survived after this simple test running for an hour, which
forces creation of many iods with r/w from 300 threads:

nfsserv:/home/svn/freebsd/obj_exp on /usr/obj (nfs)

./randomio /usr/obj/file 100 0.5 0 4096 60 100 &
./randomio /usr/obj/file 100 0.5 0 4096 60 100 &
./randomio /usr/obj/file 100 0.5 0 4096 60 100 &
while [ true ]; do sysctl vfs.nfs.iodmax=3D2; sysctl vfs.nfs.iodmax=3D60;
sleep 20; done

randomio is from ports/149838.


P.S.
it's funny to see in top
    0 root       10  44    0     0K   144K deadlk  2  23:16 28.86% kernel
Someone might think the kernel is in deadlock.

--=20
wbr,
pluknet

--0015175caf7c45eade048e44290f
Content-Type: application/octet-stream; name="nfsiod_tq_lock.3.diff"
Content-Disposition: attachment; filename="nfsiod_tq_lock.3.diff"
Content-Transfer-Encoding: base64
X-Attachment-Id: f_gd3a4fy00

SW5kZXg6IHN5cy9uZnNjbGllbnQvbmZzX3N1YnMuYwo9PT09PT09PT09PT09PT09PT09PT09PT09
PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09Ci0tLSBzeXMvbmZzY2xp
ZW50L25mc19zdWJzLmMJKHJldmlzaW9uIDIxMTI3OSkKKysrIHN5cy9uZnNjbGllbnQvbmZzX3N1
YnMuYwkod29ya2luZyBjb3B5KQpAQCAtNTksNiArNTksNyBAQAogI2luY2x1ZGUgPHN5cy9zeXNl
bnQuaD4KICNpbmNsdWRlIDxzeXMvc3lzY2FsbC5oPgogI2luY2x1ZGUgPHN5cy9zeXNwcm90by5o
PgorI2luY2x1ZGUgPHN5cy90YXNrcXVldWUuaD4KIAogI2luY2x1ZGUgPHZtL3ZtLmg+CiAjaW5j
bHVkZSA8dm0vdm1fb2JqZWN0Lmg+CkBAIC0xMTcsNiArMTE4LDcgQEAKIAogc3RydWN0IG5mc19i
dWZxCW5mc19idWZxOwogc3RhdGljIHN0cnVjdCBtdHggbmZzX3hpZF9tdHg7CitzdHJ1Y3QgdGFz
awluZnNfbmZzaW9kbmV3X3Rhc2s7CiAKIC8qCiAgKiBhbmQgdGhlIHJldmVyc2UgbWFwcGluZyBm
cm9tIGdlbmVyaWMgdG8gVmVyc2lvbiAyIHByb2NlZHVyZSBudW1iZXJzCkBAIC0zNTQsNiArMzU2
LDcgQEAKIAkgKi8KIAltdHhfaW5pdCgmbmZzX2lvZF9tdHgsICJORlMgaW9kIGxvY2siLCBOVUxM
LCBNVFhfREVGKTsKIAltdHhfaW5pdCgmbmZzX3hpZF9tdHgsICJORlMgeGlkIGxvY2siLCBOVUxM
LCBNVFhfREVGKTsKKwlUQVNLX0lOSVQoJm5mc19uZnNpb2RuZXdfdGFzaywgMCwgbmZzX25mc2lv
ZG5ld190cSwgTlVMTCk7CiAKIAluZnNfcGJ1Zl9mcmVlY250ID0gbnN3YnVmIC8gMiArIDE7CiAK
SW5kZXg6IHN5cy9uZnNjbGllbnQvbmZzX25mc2lvZC5jCj09PT09PT09PT09PT09PT09PT09PT09
PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT0KLS0tIHN5cy9uZnNj
bGllbnQvbmZzX25mc2lvZC5jCShyZXZpc2lvbiAyMTEyNzkpCisrKyBzeXMvbmZzY2xpZW50L25m
c19uZnNpb2QuYwkod29ya2luZyBjb3B5KQpAQCAtNTksNiArNTksNyBAQAogI2luY2x1ZGUgPHN5
cy9mY250bC5oPgogI2luY2x1ZGUgPHN5cy9sb2NrZi5oPgogI2luY2x1ZGUgPHN5cy9tdXRleC5o
PgorI2luY2x1ZGUgPHN5cy90YXNrcXVldWUuaD4KIAogI2luY2x1ZGUgPG5ldGluZXQvaW4uaD4K
ICNpbmNsdWRlIDxuZXRpbmV0L3RjcC5oPgpAQCAtNzUsNiArNzYsMTYgQEAKIAogc3RhdGljIHZv
aWQJbmZzc3ZjX2lvZCh2b2lkICopOwogCitzdHJ1Y3QgbmZzaW9kX3N0ciB7CisJU1RBSUxRX0VO
VFJZKG5mc2lvZF9zdHIpIG5pX2xpbmtzOworCWludCAqbmlfaW5zdDsKKwlpbnQgbmlfaW9kOwor
CWludCBuaV9lcnJvcjsKKwlpbnQgbmlfZG9uZTsKK307CitzdGF0aWMgU1RBSUxRX0hFQUQoLCBu
ZnNpb2Rfc3RyKSBuZnNpb2RoZWFkID0KKyAgICBTVEFJTFFfSEVBRF9JTklUSUFMSVpFUihuZnNp
b2RoZWFkKTsKKwogc3RhdGljIGludCBuZnNfYXN5bmNkYWVtb25bTkZTX01BWEFTWU5DREFFTU9O
XTsKIAogU1lTQ1RMX0RFQ0woX3Zmc19uZnMpOwpAQCAtMTU5LDExICsxNzAsMzAgQEAKICAgICBz
aXplb2YgKG5mc19pb2RtYXgpLCBzeXNjdGxfaW9kbWF4LCAiSVUiLAogICAgICJNYXggbnVtYmVy
IG9mIG5mc2lvZCBrdGhyZWFkcyIpOwogCit2b2lkCituZnNfbmZzaW9kbmV3X3RxKF9fdW51c2Vk
IHZvaWQgKmFyZywgaW50IHBlbmRpbmcpCit7CisJc3RydWN0IG5mc2lvZF9zdHIgKm5pcDsKKwor
CW10eF9sb2NrKCZuZnNfaW9kX210eCk7CisJd2hpbGUgKChuaXAgPSBTVEFJTFFfRklSU1QoJm5m
c2lvZGhlYWQpKSAhPSBOVUxMKSB7CisJCVNUQUlMUV9SRU1PVkVfSEVBRCgmbmZzaW9kaGVhZCwg
bmlfbGlua3MpOworCQltdHhfdW5sb2NrKCZuZnNfaW9kX210eCk7CisJCW5pcC0+bmlfZXJyb3Ig
PSBrcHJvY19jcmVhdGUobmZzc3ZjX2lvZCwgbmlwLT5uaV9pbnN0LCBOVUxMLAorCQkgICAgUkZI
SUdIUElELCAwLCAibmZzaW9kICVkIiwgbmlwLT5uaV9pb2QpOworCQluaXAtPm5pX2RvbmUgPSAx
OworCQltdHhfbG9jaygmbmZzX2lvZF9tdHgpOworCQl3YWtldXAobmlwKTsKKwl9CisJbXR4X3Vu
bG9jaygmbmZzX2lvZF9tdHgpOworfQorCiBpbnQKIG5mc19uZnNpb2RuZXcoaW50IHNldF9pb2R3
YW50KQogewogCWludCBlcnJvciwgaTsKIAlpbnQgbmV3aW9kOworCXN0cnVjdCBuZnNpb2Rfc3Ry
ICpuaXA7CiAKIAlpZiAobmZzX251bWFzeW5jID49IG5mc19pb2RtYXgpCiAJCXJldHVybiAoLTEp
OwpAQCAtMTc5LDkgKzIwOSwxNiBAQAogCWlmIChzZXRfaW9kd2FudCA+IDApCiAJCW5mc19pb2R3
YW50W2ldID0gTkZTSU9EX0NSRUFURURfRk9SX05GU19BU1lOQ0lPOwogCW10eF91bmxvY2soJm5m
c19pb2RfbXR4KTsKLQllcnJvciA9IGtwcm9jX2NyZWF0ZShuZnNzdmNfaW9kLCBuZnNfYXN5bmNk
YWVtb24gKyBpLCBOVUxMLCBSRkhJR0hQSUQsCi0JICAgIDAsICJuZnNpb2QgJWQiLCBuZXdpb2Qp
OworCW5pcCA9IG1hbGxvYyhzaXplb2YoKm5pcCksIE1fVEVNUCwgTV9XQUlUT0sgfCBNX1pFUk8p
OworCW5pcC0+bmlfaW5zdCA9IG5mc19hc3luY2RhZW1vbiArIGk7CisJbmlwLT5uaV9pb2QgPSBu
ZXdpb2Q7CiAJbXR4X2xvY2soJm5mc19pb2RfbXR4KTsKKwlTVEFJTFFfSU5TRVJUX1RBSUwoJm5m
c2lvZGhlYWQsIG5pcCwgbmlfbGlua3MpOworCXRhc2txdWV1ZV9lbnF1ZXVlKHRhc2txdWV1ZV90
aHJlYWQsICZuZnNfbmZzaW9kbmV3X3Rhc2spOworCXdoaWxlICghbmlwLT5uaV9kb25lKQorCQlt
dHhfc2xlZXAobmlwLCAmbmZzX2lvZF9tdHgsIDAsICJuaXd0IiwgMCk7CisJZXJyb3IgPSBuaXAt
Pm5pX2Vycm9yOworCWZyZWUobmlwLCBNX1RFTVApOwogCWlmIChlcnJvcikgewogCQlpZiAoc2V0
X2lvZHdhbnQgPiAwKQogCQkJbmZzX2lvZHdhbnRbaV0gPSBORlNJT0RfTk9UX0FWQUlMQUJMRTsK
SW5kZXg6IHN5cy9uZnNjbGllbnQvbmZzLmgKPT09PT09PT09PT09PT09PT09PT09PT09PT09PT09
PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PQotLS0gc3lzL25mc2NsaWVudC9u
ZnMuaAkocmV2aXNpb24gMjExMjc5KQorKysgc3lzL25mc2NsaWVudC9uZnMuaAkod29ya2luZyBj
b3B5KQpAQCAtMTI1LDYgKzEyNSw3IEBACiAKIGV4dGVybiBzdHJ1Y3QgbmZzc3RhdHMgbmZzc3Rh
dHM7CiBleHRlcm4gc3RydWN0IG10eCBuZnNfaW9kX210eDsKK2V4dGVybiBzdHJ1Y3QgdGFzayBu
ZnNfbmZzaW9kbmV3X3Rhc2s7CiAKIGV4dGVybiBpbnQgbmZzX251bWFzeW5jOwogZXh0ZXJuIHVu
c2lnbmVkIGludCBuZnNfaW9kbWF4OwpAQCAtMjUzLDYgKzI1NCw3IEBACiAJICAgIHN0cnVjdCB1
Y3JlZCAqY3JlZCwgc3RydWN0IHRocmVhZCAqdGQpOwogaW50CW5mc19yZWFkZGlycnBjKHN0cnVj
dCB2bm9kZSAqLCBzdHJ1Y3QgdWlvICosIHN0cnVjdCB1Y3JlZCAqKTsKIGludAluZnNfbmZzaW9k
bmV3KGludCk7Cit2b2lkCW5mc19uZnNpb2RuZXdfdHEoX191bnVzZWQgdm9pZCAqLCBpbnQpOwog
aW50CW5mc19hc3luY2lvKHN0cnVjdCBuZnNtb3VudCAqLCBzdHJ1Y3QgYnVmICosIHN0cnVjdCB1
Y3JlZCAqLCBzdHJ1Y3QgdGhyZWFkICopOwogaW50CW5mc19kb2lvKHN0cnVjdCB2bm9kZSAqLCBz
dHJ1Y3QgYnVmICosIHN0cnVjdCB1Y3JlZCAqLCBzdHJ1Y3QgdGhyZWFkICopOwogdm9pZAluZnNf
ZG9pb19kaXJlY3R3cml0ZSAoc3RydWN0IGJ1ZiAqKTsK
--0015175caf7c45eade048e44290f--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?AANLkTin43JMZppMGgFDVhDKxbeE89ydkxwnFANfkrLc4>