Date: Wed, 19 Oct 2011 12:04:15 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Mark Saad <nonesuch@longcount.org>
Cc: freebsd-fs@FreeBSD.org
Subject: Re: kern/156168: [nfs] [panic] Kernel panic under concurrent access over NFS
Message-ID: <436946680.103454.1319040255028.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <201109291600.p8TG0OI4040954@freefall.freebsd.org>

Mark Saad wrote:
> The following reply was made to PR kern/156168; it has been noted by
> GNATS.
>
> From: Mark Saad <nonesuch@longcount.org>
> To: bug-followup@FreeBSD.org, niakrisn@gmail.com
> Cc:
> Subject: Re: kern/156168: [nfs] [panic] Kernel panic under concurrent
> access
> over NFS
> Date: Thu, 29 Sep 2011 11:32:12 -0400
>
> All
> I am seeing a similar crash on 7.3-RELEASE-p2 amd64 when using
> apache-1.3.34 with accf_httpd and a nfs docroot
> The servers that have crashed are all FreeBSD 7.3-RELEASE amd64.
> Hardware is HP Dl145 g2
> They have 2G of ram and 2G swap with one single core opteron cpu.
>
>
> We are using the following sysctls .
>
> kern.ipc.maxsockbuf=2097152
> kern.ipc.nmbclusters=32768
> kern.ipc.somaxconn=1024
> kern.maxfiles=131072
> kern.maxfilesperproc=32768
> net.inet.tcp.inflight.enable=0
> net.inet.tcp.path_mtu_discovery=0
> net.inet.tcp.recvbuf_inc=524288
> net.inet.tcp.recvbuf_max=8388608
> net.inet.tcp.recvspace=32768
> net.inet.tcp.sendbuf_inc=16384
> net.inet.tcp.sendbuf_max=8388608
> net.inet.tcp.sendspace=32768
> net.inet.udp.recvspace=42080
> net.isr.direct=1
> vm.pmap.shpgperproc=600
>
>
> Up time prior to the crash was not the other system was up for 11 days
> this one was 6 days.
>
> Here is the contents of my crash
>
>
> [root@web29 /var/crash]# kgdb /boot/kernel/kernel /var/crash/vmcore.0
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and
> you are
> welcome to change it and/or distribute copies of it under certain
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for
> details.
> This GDB was configured as "amd64-marcel-freebsd"...
>
> Unread portion of the kernel message buffer:
>
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address = 0x258
> fault code = supervisor read data, page not present
> instruction pointer = 0x8:0xffffffff8051a66d
> stack pointer = 0x10:0xffffff803e69b1c0
> frame pointer = 0x10:0xffffff0001b50ae0
> code segment = base 0x0, limit 0xfffff, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 9336 (libhttpd.ep)
> trap number = 12
> panic: page fault
> cpuid = 0
> Uptime: 6d5h18m39s
> Physical memory: 2034 MB
> Dumping 1451 MB: 1436 1420 1404 1388 1372 1356 1340 1324 1308 1292
> 1276 1260 1244 1228 1212 1196 1180 1164 1148 1132 1116 1100 1084 1068
> 1052 1036 1020 1004 988 972 956 940 924 908 892 876 860 844 828 812
> 796 780 764 748 732 716 700 684 668 652 636 620 604 588 572 556 540
> 524 508 492 476 460 444 428 412 396 380 364 348 332 316 300 284 268
> 252 236 220 204 188 172 156 140 124 108 92 76 60 44 28 12
>
> Reading symbols from /boot/kernel/accf_http.ko...Reading symbols from
> /boot/kernel/accf_http.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/accf_http.ko
> #0 doadump () at pcpu.h:195
> 195 pcpu.h: No such file or directory.
> in pcpu.h
> (kgdb) bt
> #0 doadump () at pcpu.h:195
> #1 0x0000000000000004 in ?? ()
> #2 0xffffffff805285f9 in boot (howto=260) at
> /usr/src/sys/kern/kern_shutdown.c:418
> #3 0xffffffff80528a02 in panic (fmt=0x104 <Address 0x104 out of
> bounds>) at /usr/src/sys/kern/kern_shutdown.c:574
> #4 0xffffffff807ec813 in trap_fatal (frame=0xffffff0001b50ae0,
> eva=Variable "eva" is not available.
> ) at /usr/src/sys/amd64/amd64/trap.c:777
> #5 0xffffffff807ecbe5 in trap_pfault (frame=0xffffff803e69b110,
> usermode=0) at /usr/src/sys/amd64/amd64/trap.c:693
> #6 0xffffffff807ed50c in trap (frame=0xffffff803e69b110) at
> /usr/src/sys/amd64/amd64/trap.c:464
> #7 0xffffffff807d614e in calltrap () at
> /usr/src/sys/amd64/amd64/exception.S:218
> #8 0xffffffff8051a66d in _mtx_lock_sleep (m=0xffffff002f3d7a80,
> tid=18446742974226565856, opts=Variable "opts" is not available.
> )
> at /usr/src/sys/kern/kern_mutex.c:339
> #9 0xffffffff80701f60 in clnt_dg_create (so=0xffffff00017755a0,
> svcaddr=0xffffff803e69b310, program=100000, version=4, sendsz=Variable
> "sendsz" is not available.
> )
> at /usr/src/sys/rpc/clnt_dg.c:259
> #10 0xffffffff806e97c9 in nlm_get_rpc (sa=Variable "sa" is not
> available.
> ) at /usr/src/sys/nlm/nlm_prot_impl.c:327
> #11 0xffffffff806e9d39 in nlm_host_get_rpc (host=0xffffff0001705000)
> at /usr/src/sys/nlm/nlm_prot_impl.c:1199
> #12 0xffffffff806e680f in nlm_clearlock (host=0xffffff0001705000,
> ext=0xffffff803e69b9a0, vers=4, timo=0xffffff803e69b9d0,
> retries=2147483647, vp=0xffffff004881edc8, op=2,
> fl=0xffffff803e69bac0, flags=64, svid=9336, fhlen=32,
> fh=0xffffff803e69b750,
> size=689) at /usr/src/sys/nlm/nlm_advlock.c:943
> #13 0xffffffff806e7801 in nlm_advlock_internal (vp=0xffffff004881edc8,
> id=Variable "id" is not available.
> ) at /usr/src/sys/nlm/nlm_advlock.c:355
> #14 0xffffffff806e8166 in nlm_advlock (ap=Variable "ap" is not
> available.
> ) at /usr/src/sys/nlm/nlm_advlock.c:392
> #15 0xffffffff806ced28 in nfs_advlock (ap=0xffffff803e69ba90) at
> /usr/src/sys/nfsclient/nfs_vnops.c:3153
> #16 0xffffffff804f40e2 in closef (fp=0xffffff0073716d80,
> td=0xffffff0001b50ae0) at vnode_if.h:1036
> #17 0xffffffff804f462b in kern_close (td=0xffffff0001b50ae0,
> fd=Variable "fd" is not available.
> ) at /usr/src/sys/kern/kern_descrip.c:1125
> #18 0xffffffff807ece67 in syscall (frame=0xffffff803e69bc80) at
> /usr/src/sys/amd64/amd64/trap.c:920
> #19 0xffffffff807d635b in Xfast_syscall () at
> /usr/src/sys/amd64/amd64/exception.S:339
> #20 0x00000008009c5b1c in ?? ()
> Previous frame inner to this frame (corrupt stack?)
>

You could try the attached patch, which contains some of the changes
in the newer versions of clnt_dg.c. (There have been many changes, so
carrying them all across isn't practical, for me at least.)

I have no way of testing this patch at this time, so all I did was
compile it,

rick

> --
> mark saad | nonesuch@longcount.org
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

Attachment: nlmdg7.patch (text/x-patch)

--- rpc/clnt_dg.c.sav	2011-10-19 09:39:28.000000000 -0400
+++ rpc/clnt_dg.c	2011-10-19 09:39:39.000000000 -0400
@@ -120,9 +120,11 @@ struct cu_socket {
 	struct mtx		cs_lock;
 	int			cs_refs;	/* Count of clients */
 	struct cu_request_list	cs_pending;	/* Requests awaiting replies */
-	
+	int			cs_upcallrefs;	/* Refcnt of upcalls in prog.*/
 };
 
+static void clnt_dg_upcallsdone(struct socket *, struct cu_socket *);
+
 /*
  * Private data kept per client handle
  */
@@ -276,6 +278,7 @@ recheck_socket:
 		}
 		mtx_init(&cs->cs_lock, "cs->cs_lock", NULL, MTX_DEF);
 		cs->cs_refs = 1;
+		cs->cs_upcallrefs = 0;
 		TAILQ_INIT(&cs->cs_pending);
 		so->so_upcallarg = cs;
 		so->so_upcall = clnt_dg_soupcall;
@@ -811,18 +814,23 @@ clnt_dg_destroy(CLIENT *cl)
 	while (cu->cu_threads)
 		msleep(cu, &cs->cs_lock, 0, "rpcclose", 0);
 
+	mtx_unlock(&cs->cs_lock);		/* To avoid a LOR. */
+	SOCKBUF_LOCK(&cu->cu_socket->so_rcv);
+	mtx_lock(&cs->cs_lock);
 	cs->cs_refs--;
 	if (cs->cs_refs == 0) {
-		mtx_destroy(&cs->cs_lock);
-		SOCKBUF_LOCK(&cu->cu_socket->so_rcv);
+		mtx_unlock(&cs->cs_lock);
 		cu->cu_socket->so_upcallarg = NULL;
 		cu->cu_socket->so_upcall = NULL;
 		cu->cu_socket->so_rcv.sb_flags &= ~SB_UPCALL;
+		clnt_dg_upcallsdone(cu->cu_socket, cs);
 		SOCKBUF_UNLOCK(&cu->cu_socket->so_rcv);
+		mtx_destroy(&cs->cs_lock);
 		mem_free(cs, sizeof(*cs));
 		lastsocketref = TRUE;
 	} else {
 		mtx_unlock(&cs->cs_lock);
+		SOCKBUF_UNLOCK(&cu->cu_socket->so_rcv);
 		lastsocketref = FALSE;
 	}
 
@@ -863,6 +871,9 @@ clnt_dg_soupcall(struct socket *so, void
 	int error, rcvflag, foundreq;
 	uint32_t xid;
 
+	mtx_lock(&cs->cs_lock);
+	cs->cs_upcallrefs++;
+	mtx_unlock(&cs->cs_lock);
 	uio.uio_resid = 1000000000;
 	uio.uio_td = curthread;
 	do {
@@ -935,5 +946,22 @@ clnt_dg_soupcall(struct socket *so, void
 		if (!foundreq)
 			m_freem(m);
 	} while (m);
+	mtx_lock(&cs->cs_lock);
+	cs->cs_upcallrefs--;
+	mtx_unlock(&cs->cs_lock);
+	wakeup(&cs->cs_upcallrefs);
 }
 
+/*
+ * Wait for all upcalls in progress to complete.
+ */
+static void
+clnt_dg_upcallsdone(struct socket *so, struct cu_socket *cs)
+{
+
+	SOCKBUF_LOCK_ASSERT(&so->so_rcv);
+
+	while (cs->cs_upcallrefs > 0)
+		(void) msleep(&cs->cs_upcallrefs, SOCKBUF_MTX(&so->so_rcv), 0,
+		    "rpcdgup", 0);
+}
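
A rough userland illustration of what the attached nlmdg7.patch appears to do, for anyone reading along. Once decoded, the patch adds a cs_upcallrefs counter to struct cu_socket: clnt_dg_soupcall() increments it on entry and decrements it (followed by wakeup()) on exit, and clnt_dg_destroy() calls a new clnt_dg_upcallsdone() that sleeps until the count reaches zero before mtx_destroy() and mem_free() run. The C below imitates that pattern with pthreads. It is only a sketch, not kernel code, and every name in it (struct shared_sock, upcall_enter, upcall_exit, upcall_body, shared_sock_drain, shared_sock_free) is hypothetical. One simplification to keep in mind: the sketch uses a single mutex, whereas the patch updates the counter under cs_lock and sleeps on the socket buffer mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical userland stand-in for the patch's struct cu_socket. */
struct shared_sock {
	pthread_mutex_t	lock;		/* protects the fields below */
	pthread_cond_t	idle;		/* signalled when upcallrefs hits 0 */
	int		upcallrefs;	/* upcalls currently in progress */
	int		handled;	/* work done by upcalls (demo only) */
};

struct shared_sock *
shared_sock_create(void)
{
	struct shared_sock *ss = calloc(1, sizeof(*ss));

	if (ss == NULL)
		return (NULL);
	pthread_mutex_init(&ss->lock, NULL);
	pthread_cond_init(&ss->idle, NULL);
	return (ss);
}

/* On entry to an upcall: record that one more is running. */
void
upcall_enter(struct shared_sock *ss)
{
	pthread_mutex_lock(&ss->lock);
	ss->upcallrefs++;
	pthread_mutex_unlock(&ss->lock);
}

/* On exit from an upcall: drop the count and wake any waiter. */
void
upcall_exit(struct shared_sock *ss)
{
	pthread_mutex_lock(&ss->lock);
	assert(ss->upcallrefs > 0);
	ss->upcallrefs--;
	if (ss->upcallrefs == 0)
		pthread_cond_broadcast(&ss->idle);
	pthread_mutex_unlock(&ss->lock);
}

/* Thread body that finishes an upcall already counted by upcall_enter(). */
void *
upcall_body(void *arg)
{
	struct shared_sock *ss = arg;

	pthread_mutex_lock(&ss->lock);
	ss->handled++;		/* stands in for processing a reply */
	pthread_mutex_unlock(&ss->lock);
	upcall_exit(ss);
	return (NULL);
}

/* Block until no upcall is in progress. */
void
shared_sock_drain(struct shared_sock *ss)
{
	pthread_mutex_lock(&ss->lock);
	while (ss->upcallrefs > 0)
		pthread_cond_wait(&ss->idle, &ss->lock);
	pthread_mutex_unlock(&ss->lock);
}

/* Tear down; the lock is destroyed only after the drain completes. */
void
shared_sock_free(struct shared_sock *ss)
{
	shared_sock_drain(ss);
	pthread_cond_destroy(&ss->idle);
	pthread_mutex_destroy(&ss->lock);
	free(ss);
}
```

A caller would bracket each callback with upcall_enter() and upcall_exit(), and call shared_sock_free() only once new callbacks can no longer start, which is the role that clearing so_upcall and SB_UPCALL seems to play in the patch.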