From: Grzegorz Junka
To: freebsd-jail@freebsd.org
Subject: Unresponsive jails issues
Date: Mon, 16 May 2016 12:55:02 +0000
Message-ID: <6beab349-73bb-7159-cd81-443e115b687a@gjunka.com>

I have a server running 13 jails for various system services. Recently I added two jails to run simple Go applications for testing. Each application opens a network socket, and nginx, which runs in another jail, balances requests between them round-robin. I mention this because it may be related, though not necessarily, since the problem was already happening before I added them.

The problem is that every 2-3 days the jails on my server stop responding. "jexec jailname tcsh" hangs forever, and "service jail stop jailname" hangs forever as well. "top" doesn't show anything suspicious. I can log in to the main server through SSH fine. I don't log in to the jails through SSH, so I can't check them directly, but they appear to stop responding because the services running in them stop too (e.g. web server, imap, ...). I tried to "kill -9" the hung "jexec" process, but that doesn't work.

My first question: what evidence should I gather when this happens, so that I can investigate the issue later, after the server has been restarted? My second question: any idea why this might be happening in the first place?

I am running FreeBSD 10.3 amd64, updated from 10.2 a couple of weeks ago.

Grzegorz