Date: Fri, 16 May 2003 19:23:24 +0200 From: Bernd Walter <ticso@cicely12.cicely.de> To: current@FreeBSD.org Cc: ticso@cicely.de Subject: vlruwk deadlock (was: NFS troubles on recent -current.) Message-ID: <20030516172323.GA15462@cicely12.cicely.de> In-Reply-To: <20030512203308.GC7361@cicely9.cicely.de> References: <20030511133355.GA3789@cicely9.cicely.de> <200305112329.h4BNTYM7045966@gw.catspoiler.org> <20030511234520.GF3789@cicely9.cicely.de> <20030512203308.GC7361@cicely9.cicely.de>
next in thread | previous in thread | raw e-mail | index | archive | help
On Mon, May 12, 2003 at 10:33:08PM +0200, Bernd Walter wrote: > On Mon, May 12, 2003 at 01:45:21AM +0200, Bernd Walter wrote: > > On Sun, May 11, 2003 at 04:29:34PM -0700, Don Lewis wrote: > > > On 11 May, Bernd Walter wrote: > > > > I have a -current (10th may) NFS server which serves for a diskless > > > > NFS client (11th May). > > > > The client has / on a CF media and /usr as well as /var/spool on the > > > > server. > > > > Also swap is on the server (/usr/swapfile) via md. > > > > When I run installworld on the client the server hangs with > > > > "lockmgr draning against myself". > > > > I got no DDB prompt so no stacktrace or crashdump :( > > > > > > Try the attached patch. I didn't have any problems with > > > make -j10 buildworld > > > make DESTDIR=/mnt installworld > > > with /usr/obj and /mnt NFS mounted from the filesystems exported on the > > > same server (though I had to tweak some sysctl variables to avoid wdrain > > > deadlocks in the bio code in this configuration). > > > > > > Can you break into the debugger with CNTRL-Alt-Esc? I also recommend > > > > I havn't tried as it shouldn't be required after a panic. > > > > > trying the DEBUG_VFS_LOCKS kernel configuration option, which will drop > > > into ddb if any of the vnode locking assertions are violated. > > > > I will try your patch and suggestions tomorrow. > > I can't reproduce the panic with your patch anymore. > Great work! I had an io deadlook on the same server today while doing a make release on an alpha nfs client. I don't know if it is related to the given patch or not. It happened shortly after the client checked out the ports. Sorry - I forgot to take a dump. One of the last commands I was able to send: [244]cicely5# ps -axl UID PID PPID CPU PRI NI VSZ RSS MWCHAN STAT TT TIME COMMAND 0 0 0 0 -16 0 0 4 sched DLs ?? 0:02.92 (swapper) 0 1 0 0 8 0 716 260 wait ILs ?? 0:02.39 /sbin/init -- 0 2 0 0 -8 0 0 12 g_even DL ?? 0:54.29 (g_event) 0 3 0 1 -8 0 0 12 g_up DL ?? 23:28.80 (g_up) 0 4 0 0 -8 0 0 12 g_down DL ?? 10:24.32 (g_down) 0 5 0 0 -16 0 0 12 psleep DL ?? 1:36.59 (pagedaemon) 0 6 0 0 20 0 0 12 psleep DL ?? 0:00.00 (vmdaemon) 0 7 0 0 171 0 0 12 pgzero DL ?? 7:55.19 (pagezero) 0 8 0 0 -16 0 0 12 psleep DL ?? 0:28.41 (bufdaemon) 0 9 0 0 20 0 0 12 vlrup DL ?? 0:34.80 (vnlru) 0 10 0 0 -16 0 0 12 ktrace DL ?? 0:00.00 (ktrace) 0 11 0 17 -16 0 0 12 - RL ?? 5384:08.48 (idle) 0 12 0 0 -48 0 0 12 - WL ?? 18:42.52 (swi7: tty:sio clock) 0 14 0 0 -44 0 0 12 - WL ?? 41:44.12 (swi1: net) 0 15 0 0 76 0 0 12 sleep DL ?? 5:28.24 (random) 0 18 0 1 -36 0 0 12 - WL ?? 4:28.69 (swi3: cambio) 0 22 0 0 -64 0 0 12 - WL ?? 0:00.00 (irq15: ata1) 0 23 0 1 -64 0 0 12 - WL ?? 3:14.38 (irq5: ahc0 ahc1++) 0 24 0 0 -64 0 0 12 - WL ?? 0:32.48 (irq12: ahc3) 0 25 0 0 -68 0 0 12 - WL ?? 18:45.62 (irq10: de0) 0 26 0 0 -64 0 0 12 - WL ?? 0:26.23 (irq11: ahc5 ahc6+) 0 27 0 0 -60 0 0 12 - WL ?? 0:00.03 (irq1: atkbd0) 0 28 0 0 -64 0 0 12 - WL ?? 0:00.00 (irq6: fdc0) 0 29 0 0 -60 0 0 12 - WL ?? 0:00.00 (irq7: ppc0) 0 30 0 0 -48 0 0 12 - WL ?? 0:00.00 (swi0: tty:sio) 0 35 0 0 20 0 0 12 syncer DL ?? 19:05.67 (syncer) 0 36 0 0 8 0 0 12 nfsidl IL ?? 2:17.00 (nfsiod 0) 0 37 0 12 8 0 0 12 nfsidl IL ?? 0:22.74 (nfsiod 1) 0 38 0 0 8 0 0 12 nfsidl IL ?? 0:00.01 (nfsiod 2) 0 39 0 0 8 0 0 12 nfsidl IL ?? 0:00.00 (nfsiod 3) 0 73 1 6 -8 0 848 176 vinum DLs ?? 0:00.02 vinum: vinum daemon (vinum) 0 127 1 26 20 0 224 84 pause Is ?? 0:00.00 adjkerntz -i 0 278 1 0 96 0 1196 712 select Is ?? 1:17.76 /usr/sbin/syslogd -s 0 294 1 0 96 0 1516 960 select Ss ?? 0:02.01 /usr/sbin/rpcbind 0 310 1 0 96 0 1244 876 select Ss ?? 0:10.22 /usr/sbin/ypbind 0 379 1 0 96 0 1404 1076 select Is ?? 0:00.79 /usr/sbin/mountd -r 0 381 1 0 96 0 1212 832 select Is ?? 0:00.31 nfsd: master (nfsd) 0 384 381 0 -4 0 1148 692 vlruwk D ?? 90:01.02 nfsd: server (nfsd) 0 385 381 0 4 0 1148 692 nfsd S ?? 20:15.06 nfsd: server (nfsd) 0 386 381 0 4 0 1148 692 nfsd S ?? 8:51.20 nfsd: server (nfsd) 0 387 381 0 4 0 1148 692 nfsd I ?? 5:39.17 nfsd: server (nfsd) 0 389 1 0 96 0 263420 804 select Is ?? 0:01.52 /usr/sbin/rpc.statd 0 427 1 0 96 0 6072 4052 select Ss ?? 1:53.33 /usr/local/sbin/httpd -k start -DSSL 0 435 1 7 20 0 1832 696 lockf Is ?? 0:00.01 /usr/local/sbin/saslauthd -a pam 0 436 435 7 4 0 1832 696 accept I ?? 0:00.00 /usr/local/sbin/saslauthd -a pam 0 437 435 7 20 0 1832 696 lockf I ?? 0:00.00 /usr/local/sbin/saslauthd -a pam 0 438 435 7 20 0 1832 696 lockf I ?? 0:00.00 /usr/local/sbin/saslauthd -a pam 0 439 435 7 20 0 1832 696 lockf I ?? 0:00.00 /usr/local/sbin/saslauthd -a pam 0 447 1 1 96 0 1220 768 select Is ?? 0:00.07 /usr/sbin/lpd 0 462 1 0 96 0 1548 948 select Ss ?? 1:13.14 /usr/sbin/ntpd -p /var/run/ntpd.pid 1 474 1 0 -4 0 1176 748 vlruwk Ds ?? 0:12.10 /usr/sbin/rwhod 0 515 1 0 96 0 3320 1764 select Is ?? 0:05.30 /usr/sbin/sshd 0 523 1 0 96 0 5208 2616 select Ss ?? 1:38.83 sendmail: accepting connections (sendmail) 25 526 1 0 20 0 3452 2012 pause Is ?? 0:01.18 sendmail: Queue runner@00:30:00 for /var/spool/clientmqueue (sen 0 544 1 0 -4 0 1272 840 vlruwk Ds ?? 0:07.50 /usr/sbin/cron 0 581 1 0 96 0 1348 888 select Is ?? 0:00.06 /usr/sbin/inetd -wW 0 596 1 34 5 0 1212 732 siodcd I ?? 0:00.02 /usr/libexec/getty std.9600 ttyd0 0 601 515 0 4 0 6124 2116 sbwait Is ?? 0:00.28 sshd: ticso [priv] (sshd) 1007 603 601 0 96 0 6124 2196 select I ?? 1:15.78 sshd: ticso@ttyp0 (sshd) 0 3121 515 0 4 0 6124 2132 sbwait Is ?? 0:00.25 sshd: ticso [priv] (sshd) 1007 3123 3121 0 96 0 6124 2188 select I ?? 0:00.11 sshd: ticso@ttyp1 (sshd) 80 24548 427 5 20 0 6072 4080 lockf I ?? 0:00.05 /usr/local/sbin/httpd -k start -DSSL 80 24549 427 5 20 0 6072 4080 lockf I ?? 0:00.04 /usr/local/sbin/httpd -k start -DSSL 80 24550 427 5 20 0 6072 4080 lockf I ?? 0:00.04 /usr/local/sbin/httpd -k start -DSSL 80 24551 427 6 20 0 6072 4080 lockf I ?? 0:00.05 /usr/local/sbin/httpd -k start -DSSL 80 24552 427 6 96 0 6072 4080 select I ?? 0:00.05 /usr/local/sbin/httpd -k start -DSSL 60 28689 440 2 -4 0 24836 3240 vlruwk D ?? 2:27.68 imapd: imapd: cicely12.cicely.de[3ffe:400:8d0:301::12] ticso use 0 29641 523 0 4 0 5500 3404 sbwait S ?? 0:00.29 sendmail: h4GGWjhx029641 gif-cicely.cosmo-project.de [IPv6:3ffe: 0 29643 544 0 -8 0 1320 1004 piperd I ?? 0:00.00 cron: running job (cron) 2 29644 29643 0 8 0 868 348 wait Is ?? 0:00.04 /bin/sh -c /usr/libexec/save-entropy 2 29645 29644 1 8 0 908 372 wait I ?? 0:00.05 /bin/sh /usr/libexec/save-entropy 2 29656 29645 0 -4 0 240 140 vlruwk D ?? 0:00.00 dd if=/dev/random of=/var/db/entropy/saved-entropy.1 bs=2048 cou 0 29657 523 1 4 0 5268 2712 kqread S ?? 0:00.01 sendmail: startup with gif-cicely.cosmo-project.de (sendmail) 0 29658 523 0 4 0 5268 2712 kqread S ?? 0:00.01 sendmail: startup with gif-cicely.cosmo-project.de (sendmail) 1007 604 603 0 20 0 1596 1108 pause Is p0 0:00.19 -tcsh (tcsh) 0 10000 604 0 20 0 1720 1252 pause I p0 0:01.95 tcsh 0 29647 10000 0 96 0 700 324 - R+ p0 0:00.01 ps -axl 1007 3124 3123 0 5 0 1556 1052 ttyin Is+ p1 0:00.17 -tcsh (tcsh) 1007 3132 3124 0 8 0 896 368 - T p1 0:00.05 /bin/sh ./run-mozilla.sh ./mozilla-bin 1007 3143 3132 11 97 0 48896 39668 - T p1 43:46.76 ./mozilla-bin 0 592 1 0 8 0 1636 1180 wait Is v0 0:00.16 login [pam] (login) 1007 13632 592 0 20 0 1548 1060 pause I v0 0:00.13 -tcsh (tcsh) 1007 13636 13632 0 96 0 3260 2076 select I+ v0 0:23.72 ssh c6 0 593 1 99 5 0 1216 796 ttyin Is+ v1 0:00.02 /usr/libexec/getty Pc ttyv1 0 594 1 99 5 0 1216 796 ttyin Is+ v2 0:00.02 /usr/libexec/getty Pc ttyv2 0 595 1 99 5 0 1216 796 ttyin Is+ v3 0:00.02 /usr/libexec/getty Pc ttyv3 60 440 1 0 96 0 2780 1348 select I con- 0:15.58 /usr/local/cyrus/bin/master I saw many of those process already over minutes waiting with vlruwk in top output on another window. -- B.Walter BWCT http://www.bwct.de ticso@bwct.de info@bwct.de
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20030516172323.GA15462>