Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 18 Jun 2000 13:23:54 +0200
From:      Jeroen Ruigrok van der Werven <jruigrok@via-net-works.nl>
To:        Luoqi Chen <luoqi@watermarkgroup.com>
Cc:        stable@freebsd.org, dillon@freebsd.org
Subject:   Details [note: this is long]
Message-ID:  <20000618132353.B95177@lucifer.bart.nl>
In-Reply-To: <200006131458.e5DEwJb03585@lor.watermarkgroup.com>; from luoqi@watermarkgroup.com on Tue, Jun 13, 2000 at 10:58:19AM -0400
References:  <200006131458.e5DEwJb03585@lor.watermarkgroup.com>

next in thread | previous in thread | raw e-mail | index | archive | help
[still unsure where to sent 4.x related problems, since we only lag
behind 5 a few weeks/months.]

[Sorry for cc:'ing you directly Matthew, but I thought you might find
this interesting]

-On [20000613 17:00], Luoqi Chen (luoqi@watermarkgroup.com) wrote:
>The best way to diagnose the problem is to work on the live system when
>the same symptom occurs (unkillable process), find out which channels
>these processes are sleeping on and why they're not waken up (hardware
>failure might contribute to it).
>
>A `ps axl' report would be very helpful.

Ok, be afraid:

UID   PID  PPID CPU PRI NI   VSZ  RSS WCHAN  STAT  TT       TIME COMMAND
    0     0     0   0 -18  0     0    0 sched  DLs   ??    0:04.42  (swapper)
    0     1     0   0  10  0   524   72 wait   ILs   ??    0:00.70 /sbin/init --
    0     2     0   0 -18  0     0    0 psleep DL    ??    0:31.86  (pagedaemon)
    0     3     0   0  18  0     0    0 psleep DL    ??    0:00.00  (vmdaemon)
    0     4     0   0 -18  0     0    0 psleep DL    ??    0:02.14  (bufdaemon)
    0     5     0   0  18  0     0    0 syncer DL    ??    2:12.06  (syncer)
    0    30     1  29  18  0   208    0 pause  Is    ??    0:00.00 adjkerntz -i
    0    95     1   0   2  0   888  396 select Ss    ??    0:07.50 syslogd -s
    1   102     1   0   2  0   904  424 select Is    ??    0:13.25 /usr/sbin/portmap
    0   104     1   0   2  0   944  400 select Is    ??    0:09.97 ypbind -s -Sadmin,idun10,bragi10
    0   111     1   0  10  0   208    0 nfsidl I     ??    0:05.71 nfsiod -n 4
    0   112     1   0  10  0   208    0 nfsidl I     ??    0:00.24 nfsiod -n 4
    0   113     1   0  10  0   208    0 nfsidl I     ??    0:00.00 nfsiod -n 4
    0   114     1   0  10  0   208    0 nfsidl I     ??    0:00.00 nfsiod -n 4
    0   133     1   0   2  0  1024  412 select Is    ??    0:00.01 inetd -wW
    0   135     1   0  10  0   940  436 nanslp Is    ??    0:08.99 cron
    0   138     1   0   2  0  1828  752 select Is    ??    0:05.86 /usr/sbin/sshd
   80  2105     1   0  -4  0  1900  840 nfsvin D     ??    0:00.16 /usr/local/apache/bin/httpd
   80  2119     1   0  -4  0  1900  840 nfsvin D     ??    0:00.14 /usr/local/apache/bin/httpd
   80  2338     1   0  -4  0  1900  840 nfsvin D     ??    0:00.18 /usr/local/apache/bin/httpd
   80  2343     1   0 -18  0  1920  840 vmopar DL    ??    0:00.27 /usr/local/apache/bin/httpd
   80  2344     1   0  -4  0  1900  840 nfsvin D     ??    0:00.21 /usr/local/apache/bin/httpd
   80  2357     1   0  -4  0  1900  840 nfsvin D     ??    0:00.15 /usr/local/apache/bin/httpd
   80  2604     1   0  -4  0  1900  840 nfsvin D     ??    0:00.48 /usr/local/apache/bin/httpd
   80  2605     1   0 -18  0  1900  840 vmopar D     ??    0:00.40 /usr/local/apache/bin/httpd
   80  2608     1   0  -4  0  1900  840 nfsvin D     ??    0:00.38 /usr/local/apache/bin/httpd
   80  2610     1   0  -4  0  1900  840 nfsvin D     ??    0:00.43 /usr/local/apache/bin/httpd
   80  2659     1   0  -4  0  1900  840 nfsvin D     ??    0:00.39 /usr/local/apache/bin/httpd
   80  2684     1   0  -4  0  1900  840 nfsvin D     ??    0:00.68 /usr/local/apache/bin/httpd
   80  2697     1   0 -18  0  1900  840 vmopar D     ??    0:00.12 /usr/local/apache/bin/httpd
   80  2729     1   0  -4  0  1900  840 nfsvin D     ??    0:00.08 /usr/local/apache/bin/httpd
   80  2774     1   0 -18  0  1900  840 vmopar D     ??    0:00.16 /usr/local/apache/bin/httpd
   80  2798     1   0  -4  0  1900  840 nfsvin D     ??    0:00.26 /usr/local/apache/bin/httpd
   80  2803     1   0  -4  0  1900  840 nfsvin D     ??    0:00.30 /usr/local/apache/bin/httpd

[snip couple 'o hundred of the same types of httpd processes, ps -axl | wc -l gave 1027]

   80 99475     1   0  -4  0  1900  840 nfsvin D     ??    0:00.41 /usr/local/apache/bin/httpd
   80 99502     1   0  -4  0  1900  840 nfsvin D     ??    0:00.24 /usr/local/apache/bin/httpd
   80 99503     1   0  -4  0  1900  840 nfsvin D     ??    0:00.16 /usr/local/apache/bin/httpd
   80 99504     1   0  -4  0  1900  840 nfsvin D     ??    0:00.07 /usr/local/apache/bin/httpd
   80 99527     1   0  -4  0  1900  840 nfsvin D     ??    0:00.54 /usr/local/apache/bin/httpd
   80 99538     1   0  -4  0  1900  840 nfsvin D     ??    0:00.18 /usr/local/apache/bin/httpd
   80 99589     1   0  -4  0  1900  840 nfsvin D     ??    0:00.19 /usr/local/apache/bin/httpd
   80 99590     1   0  -4  0  1900  840 nfsvin D     ??    0:00.14 /usr/local/apache/bin/httpd
   80 99611     1   0  -4  0  1900  840 nfsvin D     ??    0:00.07 /usr/local/apache/bin/httpd
   80 99622     1   0  -4  0  1900  840 nfsvin D     ??    0:00.37 /usr/local/apache/bin/httpd
    0  7244  7243   0  18  0   476  348 pause  Is    p0    0:00.29 -csh (csh)
    0  7250  7244   0  10  0  1628 1276 wait   S     p0    0:00.03 bash
    0  7253  7250   1  28  0  1540 1404 -      R+    p0    0:00.01 ps -axl
    0  7254  7250   1  28  0   308  128 -      R+    p0    0:00.00 more
    0 29318     1   0   2  0  2280  512 select I     p0-   0:00.81 /usr/local/sbin/snmpd
    0 58721     1   0   3  0   908  276 ttyin  Is+   v0    0:00.01 /usr/libexec/getty Pc ttyv0
    0   183     1   0   3  0   904  272 ttyin  Is+   v1    0:00.01 /usr/libexec/getty Pc ttyv1
    0   184     1   0   3  0   904  272 ttyin  Is+   v2    0:00.01 /usr/libexec/getty Pc ttyv2
    0   185     1   0   3  0   904  272 ttyin  Is+   v3    0:00.01 /usr/libexec/getty Pc ttyv3
    0   186     1   0   3  0   904  272 ttyin  Is+   v4    0:00.01 /usr/libexec/getty Pc ttyv4
    0   187     1   0   3  0   904  272 ttyin  Is+   v5    0:00.01 /usr/libexec/getty Pc ttyv5
    0   188     1   0   3  0   904  272 ttyin  Is+   v6    0:00.01 /usr/libexec/getty Pc ttyv6
    0   189     1   0   3  0   904  272 ttyin  Is+   v7    0:00.01 /usr/libexec/getty Pc ttyv7


>For those unkillable processes,
>you might want to report the backtrace for each, here's how to get them,
>	# gdb -k /kernel /dev/mem
>	(kgdb) proc <pid>
>	(kgdb) bt

Basically each and every httpd process is unkillable.  We get our data
from a NetApp Filer F720.

root@fenrir: /root$ gdb -k /kernel.debug /dev/mem
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you
are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for
details.
This GDB was configured as "i386-unknown-freebsd"...
IdlePTD 2990080
initial pcb at 700f000
panic messages:
---
---
#0  mi_switch () at ../../kern/kern_synch.c:858
858             if (switchtime.tv_sec == 0)
(kgdb) proc 2105
(kgdb) bt
#0  mi_switch () at ../../kern/kern_synch.c:858
858             if (switchtime.tv_sec == 0)
#1  0xc0149539 in tsleep (ident=0xd441d20a, priority=18, wmesg=0xc02387fb "nfsvinval", timo=0) at ../../kern/kern_synch.c:467
467             mi_switch();
#2  0xc019a14b in nfs_vinvalbuf (vp=0xd4c0c200, flags=1, cred=0xc2e42180, p=0xd5ccc920, intrflg=1) at ../../nfs/nfs_bio.c:1170
1170                    error = tsleep((caddr_t)&np->n_flag, PRIBIO + 2, "nfsvinval",
#3  0xc01bcfd6 in nfs_open ()
#4  0xc017682b in vn_open (ndp=0xd5cddecc, fmode=1, cmode=420) at vnode_if.h:189
189             rc = VCALL(vp, VOFFSET(vop_open), &a);
#5  0xc017281d in open (p=0xd5ccc920, uap=0xd5cddf80) at ../../kern/vfs_syscalls.c:994
994             error = vn_open(&nd, flags, cmode);
#6  0xc0216459 in syscall2 (frame={tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = 4, tf_esi = 672500536, tf_ebp = -1077937632, 
      tf_isp = -707928108, tf_ebx = 672435076, tf_edx = 672500536, tf_ecx = 15, tf_eax = 5, tf_trapno = 22, tf_err = 2, 
      tf_eip = 672351492, tf_cs = 31, tf_eflags = 659, tf_esp = -1077937676, tf_ss = 47}) at ../../i386/i386/trap.c:1126
1126            error = (*callp->sy_call)(p, args);
#7  0xc020b0a6 in Xint0x80_syscall ()
#8  0x8050c14 in ?? ()

Another nfsvin:

(kgdb) proc 99538
(kgdb) bt
#0  mi_switch () at ../../kern/kern_synch.c:858
858             if (switchtime.tv_sec == 0)
#1  0xc0149539 in tsleep (ident=0xd441d20a, priority=18, wmesg=0xc02387fb "nfsvinval", timo=0) at ../../kern/kern_synch.c:467
467             mi_switch();
#2  0xc019a14b in nfs_vinvalbuf (vp=0xd4c0c200, flags=1, cred=0xc2747180, p=0xd5ccbc20, intrflg=1) at ../../nfs/nfs_bio.c:1170
1170                    error = tsleep((caddr_t)&np->n_flag, PRIBIO + 2, "nfsvinval",
#3  0xc01bcfd6 in nfs_open ()
#4  0xc017682b in vn_open (ndp=0xd5cfcecc, fmode=1, cmode=420) at vnode_if.h:189
189             rc = VCALL(vp, VOFFSET(vop_open), &a);
#5  0xc017281d in open (p=0xd5ccbc20, uap=0xd5cfcf80) at ../../kern/vfs_syscalls.c:994
994             error = vn_open(&nd, flags, cmode);
#6  0xc0216459 in syscall2 (frame={tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = 4, tf_esi = 672500536, tf_ebp = -1077937632, 
      tf_isp = -707801132, tf_ebx = 672435076, tf_edx = 672500536, tf_ecx = 15, tf_eax = 5, tf_trapno = 7, tf_err = 2, 
      tf_eip = 672351492, tf_cs = 31, tf_eflags = 659, tf_esp = -1077937676, tf_ss = 47}) at ../../i386/i386/trap.c:1126
1126            error = (*callp->sy_call)(p, args);
#7  0xc020b0a6 in Xint0x80_syscall ()
#8  0x8050c14 in ?? ()

vmopar stuck one:

(kgdb) proc 2343
(kgdb) bt
#0  mi_switch () at ../../kern/kern_synch.c:858
858             if (switchtime.tv_sec == 0)
#1  0xc0149539 in tsleep (ident=0xc05ceec0, priority=4, wmesg=0xc0240471 "vmopar", timo=0) at ../../kern/kern_synch.c:467
467             mi_switch();
#2  0xc01ef947 in vm_object_page_remove ()
#3  0xc01f39d1 in vnode_pager_setsize ()
#4  0xc01b7c93 in nfs_loadattrcache ()
#5  0xc01c0b83 in nfs_readrpc ()
#6  0xc01987f5 in nfs_getpages (ap=0xd5de7c8c) at ../../nfs/nfs_bio.c:153
#7  0xc01f3f7e in vnode_pager_getpages ()
#8  0xc01e8e4e in vm_fault ()
#9  0xc0215e1a in trap_pfault (frame=0xd5de7dd0, usermode=0, eva=672915456) at ../../i386/i386/trap.c:800
800                     rv = vm_fault(map, va, ftype,
#10 0xc0215a83 in trap (frame={tf_fs = -1072496624, tf_es = -1062797296, tf_ds = 16, tf_edi = -1062680520, tf_esi = 672915455, 
      tf_ebp = -706838980, tf_isp = -706839044, tf_ebx = 2048, tf_edx = 672917447, tf_ecx = 498, tf_eax = -706846720, 
      tf_trapno = 12, tf_err = 0, tf_eip = -1071558811, tf_cs = 8, tf_eflags = 66054, tf_esp = -706838772, tf_ss = -706838828})
    at ../../i386/i386/trap.c:426
426                             (void) trap_pfault(&frame, FALSE, eva);
#11 0xc0214f65 in generic_copyin ()
#12 0xc016193c in sosend (so=0xd1678480, addr=0x0, uio=0xd5de7f0c, top=0x0, control=0x0, flags=0, p=0xd5da25a0)
    at ../../kern/uipc_socket.c:567
567                             error = uiomove(mtod(m, caddr_t), (int)len, uio);
#13 0xc0156c28 in soo_write (fp=0xc27c3680, uio=0xd5de7f0c, cred=0xc2e48b80, flags=0, p=0xd5da25a0)
    at ../../kern/sys_socket.c:78
78              return so->so_proto->pr_usrreqs->pru_sosend(so, 0, uio, 0, 0, 0,
#14 0xc0153b03 in writev (p=0xd5da25a0, uap=0xd5de7f80) at ../../sys/file.h:159
159             error = (*fp->f_ops->fo_write)(fp, uio, cred, flags, p);
#15 0xc0216459 in syscall2 (frame={tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = -1077936692, tf_esi = -1077936700, 
      tf_ebp = -1077937776, tf_isp = -706838572, tf_ebx = 289, tf_edx = 3, tf_ecx = 18407, tf_eax = 121, tf_trapno = 7, 
      tf_err = 2, tf_eip = 672156464, tf_cs = 31, tf_eflags = 642, tf_esp = -1077937820, tf_ss = 47})
    at ../../i386/i386/trap.c:1126
1126            error = (*callp->sy_call)(p, args);
#16 0xc020b0a6 in Xint0x80_syscall ()
#17 0x80529a7 in ?? ()

And another one:

(kgdb) proc 2774
(kgdb) bt
#0  mi_switch () at ../../kern/kern_synch.c:858
858             if (switchtime.tv_sec == 0)
#1  0xc0149539 in tsleep (ident=0xc05ceec0, priority=4, wmesg=0xc0240471 "vmopar", timo=0) at ../../kern/kern_synch.c:467
467             mi_switch();
#2  0xc01ef947 in vm_object_page_remove ()
#3  0xc01f39d1 in vnode_pager_setsize ()
#4  0xc01b7c93 in nfs_loadattrcache ()
#5  0xc01bcb37 in nfs3_access_otw ()
#6  0xc01bd113 in nfs_getattr ()
#7  0xc01bf11c in nfs_lookup ()
#8  0xc016e518 in lookup ()
#9  0xc016e014 in namei ()
#10 0xc01738ed in stat (p=0xd5da0ba0, uap=0xd5e1ff80) at ../../kern/vfs_syscalls.c:1698
1698            if ((error = namei(&nd)) != 0)
#11 0xc0216459 in syscall2 (frame={tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = -1077936692, tf_esi = -1077936700, 
      tf_ebp = -1077937456, tf_isp = -706609196, tf_ebx = 1, tf_edx = 135129834, tf_ecx = 39, tf_eax = 188, tf_trapno = 7, 
      tf_err = 2, tf_eip = 672155472, tf_cs = 31, tf_eflags = 518, tf_esp = -1077937516, tf_ss = 47})
    at ../../i386/i386/trap.c:1126
1126            error = (*callp->sy_call)(p, args);
#12 0xc020b0a6 in Xint0x80_syscall ()
#13 0x806622a in ?? ()

Some last logging from /var/log/messages:

Jun 17 17:09:21 fenrir /kernel: nfs_getpages: error 70
Jun 17 17:09:21 fenrir /kernel: vm_fault: pager read error, pid 3896 (httpd)
Jun 17 17:11:25 fenrir /kernel: nfs_getpages: error 70
Jun 17 17:11:25 fenrir /kernel: vm_fault: pager read error, pid 3880 (httpd)
Jun 18 00:05:02 fenrir /kernel: nfs_getpages: error 70
Jun 18 00:05:02 fenrir /kernel: vm_fault: pager read error, pid 5178 (httpd)
Jun 18 00:14:40 fenrir /kernel: nfs_getpages: error 70
Jun 18 00:14:40 fenrir /kernel: vm_fault: pager read error, pid 5145 (httpd)
Jun 18 00:16:13 fenrir /kernel: nfs_getpages: error 70
Jun 18 00:16:13 fenrir /kernel: vm_fault: pager read error, pid 5302 (httpd)
Jun 18 00:21:51 fenrir /kernel: nfs_getpages: error 70
Jun 18 00:21:51 fenrir /kernel: vm_fault: pager read error, pid 5268 (httpd)

/etc/fstab:

netapp://basic/users   /home/users             nfs     ro,nodev,noexec,nosuid  0 0

Every NFS mount gets mounted like this on the webservers.

top reports:

last pid:  7353;  load averages:  0.14,  0.04,  0.01   up 15+23:59:31  13:16:15
1020 processes:1 running, 1019 sleeping
CPU states:  0.0% user,  0.0% nice,  1.2% system,  0.0% interrupt, 98.8% idle
Mem: 84M Active, 16M Inact, 88M Wired, 14M Cache, 35M Buf, 47M Free
Swap: 1024M Total, 174M Used, 850M Free, 16% Inuse

*sigh*

I just wanted to drop into DDB only to realise it wasn't configured on
this particular box.  [mental note: go past all boxes and fix this]

This is not my week. =(

-- 
Jeroen Ruigrok van der Werven          Network- and systemadministrator
<jruigrok@via-net-works.nl>            VIA Net.Works The Netherlands
BSD: Technical excellence at its best  http://www.via-net-works.nl
I succeed him; no one could replace him...


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20000618132353.B95177>