From: Marc Fournier
To: freebsd-stable@freebsd.org
Date: Thu, 3 Jul 2014 21:26:34 -0700
Subject: Re: FreeBSD 10.x + LiquidSoap + NFS == Server Hang

Oh, on the remote console, the last two lines I see are:

==
nfs_getpages: error 4
vm_fault: pager read error, pid 2957 (liquid soap)
==

if that helps any ...

On Jul 3, 2014, at 9:23 PM, Marc Fournier wrote:

> Hi all …
>
> I have a jail running on FreeBSD 10-STABLE (svn update as of July 2nd @ ~05:30 UTC):
>
> ==
> Working Copy Root Path: /usr/src
> URL: https://svn0.us-east.freebsd.org/base/stable/10
> Relative URL: ^/stable/10
> Repository Root: https://svn0.us-east.freebsd.org/base
> Repository UUID: ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
> Revision: 268135
> Node Kind: directory
> Schedule: normal
> Last Changed Author: pfg
> Last Changed Rev: 268132
> Last Changed Date: 2014-07-02 01:28:38 +0000 (Wed, 02 Jul 2014)
> ==
>
> Currently it has 3 jailed environments running off it, with the files for them NFS mounted from a NetApp filer … and right now, the NFS mount that these jails are running from is "locked" … a 'df' hangs … trying to do a 'jexec # /bin/tcsh' into one of the jails hangs … etc.
>
> The same NFS file system is mounted and running on a half dozen other servers, and they are all operating just fine, so the NetApp is operating properly.
>
> If I move the jail running liquidsoap to a different server, the hang follows it to the new server, and the old server becomes rock solid once more …
>
> I'm not 100% certain it is liquidsoap, but the hang always appears to coincide with reloading a new playlist … and although it happens frequently (more so with recent upgrades), it doesn't happen *every* night …
>
> This is on a remote server … so doing things at the console isn't possible, and although I've got a remote console on this, I've never figured out how to break to the debugger through it. I'm going to work on it to see if I can get it to work …
>
> Barring breaking to the debugger (is there a way, from the command line, to force it to break to the debugger?), is there anything else I can use to provide some sort of useful information?
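On breaking to the debugger from the command line: FreeBSD's kernel debugger framework exposes a `debug.kdb.enter` sysctl that drops the machine into the debugger when set. A minimal sketch, assuming root on the host and a kernel built with `options KDB` plus a backend such as `options DDB`; the helper names `kdb_run` and `break_to_debugger` and the `DRY_RUN` switch are hypothetical:

```shell
# Hypothetical helpers, not from the thread. Assumes a FreeBSD kernel built
# with "options KDB" and a backend such as "options DDB"; run as root.
# With DRY_RUN=1 the commands are only printed, never executed.
kdb_run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != 1 ]; then
        "$@"
    fi
}

break_to_debugger() {
    # List the debugger backends compiled into the running kernel
    # (empty output means there is nothing to break into).
    kdb_run sysctl -n debug.kdb.available
    # Setting this sysctl asks the kernel to enter the debugger immediately.
    kdb_run sysctl debug.kdb.enter=1
}
```

`DRY_RUN=1 break_to_debugger` just prints the two commands. For real use, the debugger prompt comes up on the system console, so the remote console needs to be attached to it. If no backend is listed, `sysctl debug.kdb.panic=1` forces a panic instead, which (with a dump device configured) leaves a crash dump to examine after the reboot.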
>
> ps aux for the process shows:
>
> # ps aux | grep liq
> 1002 2957 0.0 0.7 226888 112792 - TLJ 4:45AM 370:27.23 /usr/local/bin/liquidsoap -q -d /usr/local/etc/liquidsoap/liquidsoap.liq
>
> and:
>
> # ps auxxwl | grep 2957
> 1002 2957 0.0 0.7 226888 112792 - TLJ 4:45AM 370:27.23 /usr/local/bin/l 1002 1 0 20 0 -
> 1002 96280 0.0 0.0 12316 0 - IWJ - 0:00.00 pwait 2957 1002 96274 0 52 0 kqread
> root 96508 0.0 0.0 18788 1828 4 S+ 4:19AM 0:00.00 grep 2957 0 96505 0 20 0 piperd
>
> Are there other commands I can / should run next time it happens … ones that won't take long?
>
> Thanks …
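For gathering more than `ps aux` next time, a sketch of a few quick, read-only commands, assuming root on the FreeBSD host (outside the jail). Per ps(1), the `T` in `TLJ` marks a stopped process and `J` a jailed one; the `wchan`/`mwchan` keywords add what the process is blocked on, and `procstat -kk` prints kernel stack traces for its threads without needing a debugger. The helper names `snap_run` and `snapshot_hung_process` and the `DRY_RUN` switch are hypothetical:

```shell
# Hypothetical helpers, not from the thread: snapshot a hung process on
# FreeBSD before rebooting. Assumes root and FreeBSD's ps(1), procstat(1)
# and nfsstat(1). With DRY_RUN=1 the commands are only printed.
snap_run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != 1 ]; then
        "$@"
    fi
}

snapshot_hung_process() {
    pid=$1
    # State flags, wait channel (wchan) and the lock or wait channel
    # the process is currently blocked on (mwchan).
    snap_run ps -o pid,state,wchan,mwchan,command -p "$pid"
    # Kernel stack trace of every thread in the process, which should
    # show where in the kernel (e.g. NFS or VM code) it is sleeping.
    snap_run procstat -kk "$pid"
    # Kernel stacks for all processes, to catch the hung df/jexec too.
    snap_run procstat -kk -a
    # Client-side NFS RPC statistics.
    snap_run nfsstat -c
}
```

Running `snapshot_hung_process 2957 > /var/tmp/hang-2957.txt 2>&1` would keep the output on local disk rather than the stuck NFS mount; `DRY_RUN=1 snapshot_hung_process 2957` only lists the four commands.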