From: Palle Girgensohn
To: freebsd-net@freebsd.org
Subject: Process hung in STOPPED_SINGLE, wchan vodead, and cannot be killed or continued
Date: Wed, 7 Oct 2015 11:39:42 +0200
Message-Id: <60F10B6B-0B90-4728-B405-4B916CDF7FD6@FreeBSD.org>

Hi,

I see a process that is hung in a jail, and it cannot be killed or continued:

# ps HO wchan,nwchan,ppid -p 92266
  PID WCHAN  NWCHAN            PPID TT STAT    TIME COMMAND
92266 -      -                    1 -  TJ   0:00,73 /usr/local/bin/jsvc -home /usr/local/openjdk8 -server
92266 vodead fffff811a5e6b400     1 -  TJ   0:00,48 /usr/local/bin/jsvc -home /usr/local/openjdk8 -server

# top
...
  PID USERNAME THR PRI NICE  SIZE   RES STATE  C   TIME  WCPU COMMAND
92266 nobody     2  20    0 4470M  418M STOP   2   0:20 0.00% jsvc

# ps axu
USER     PID %CPU %MEM     VSZ    RSS TT STAT STARTED    TIME COMMAND
nobody 92266  0,0  0,4 4577204 427756 -  TJ   11:02pm 0:20,08 /usr/local/bin/jsvc -home /usr/local/openjdk8 ...

# sockstat
USER     COMMAND    PID   FD PROTO  LOCAL ADDRESS         FOREIGN ADDRESS
nobody   jsvc       92266 15 stream (not connected)
nobody   jsvc       92266 16 tcp4   127.0.0.1:8078        *:*
?        ?          ?     ?  tcp4   127.0.0.1:8078        127.0.0.1:22789
...

# sockstat | grep '^?' | wc -l
151

# netstat -an | less
netstat: kvm not available: /dev/mem: No such file or directory
Active Internet connections (including servers)
Proto Recv-Q Send-Q Local Address          Foreign Address        (state)
tcp4     374      0 127.0.0.1.8078         127.0.0.1.32866        CLOSED
...

# procstat -t 92266
  PID    TID COMM             TDNAME           CPU  PRI STATE   WCHAN
92266 105754 jsvc             -                 20  120 stop    -
92266 106982 jsvc             -                  2  120 stop    vodead

# procstat -k 92266
  PID    TID COMM             TDNAME           KSTACK
92266 105754 jsvc             -                mi_switch thread_suspend_switch thread_single exit1 sys_sys_exit amd64_syscall Xfast_syscall
92266 106982 jsvc             -                mi_switch sleepq_switch sleepq_wait _sleep vnode_create_vobject zfs_freebsd_open VOP_OPEN_APV vn_open_vnode vn_open_cred kern_openat amd64_syscall Xfast_syscall

8078 is the Java port that it used to listen on. All of the "?" sockets look like this:

?        ?          ?     ?  tcp4   127.0.0.1:8078        127.0.0.1:53583

# gdb -p 92266 /usr/local/bin/jsvc
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.
Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...(no debugging symbols found)...
Attaching to program: /usr/local/bin/jsvc, process 92266
[ just hangs ]...
^Z
[1]+  Stopped                 gdb -p 92266 /usr/local/bin/jsvc
[root@tranbar /]#
[root@tranbar /]#
[root@tranbar /]# kill %1
[root@tranbar /]#
[1]+  Terminated              gdb -p 92266 /usr/local/bin/jsvc
[root@tranbar /]#

The culprit to begin with could be this:

Oct  7 07:54:00 host kernel: sonewconn: pcb 0xfffff80b49171310: Listen queue overflow: 151 already in queue awaiting acceptance (6 occurrences)

This occurred all through the night and saturated a service, *very likely* the one now showing problems, but I was never there to check. The 151 lost network sockets (see sockstat above) connect the dots.

It seems the service entered STOP when we tried to stop it. jsvc is similar to daemontools, and I remember seeing a reference to a parent process 92265. I might be imagining it, though, since the ppid is 1.

When we tried to shut down the jail, the shutdown processes hung:

from host:/var/log/console.jailname:
...
Stopping tomcat.
Waiting for PIDS: 92266
90 second watchdog timeout expired. Shutdown terminated.
Ons  7 Okt 2015 08:27:19 CEST
...

# freebsd-version -ku
10.2-RELEASE-p3
10.2-RELEASE-p3

So basically, is there a way to get rid of this process without rebooting?

Palle