From: Oliver Fromme
To: freebsd-fs@FreeBSD.ORG
Date: Tue, 7 Dec 2010 16:59:19 +0100 (CET)
Subject: Re: NFS hangs (7.3)
Message-Id: <201012071559.oB7FxJqf048700@lurza.secnetix.de>
In-Reply-To: <201011171705.oAHH5age003849@lurza.secnetix.de>

Oliver Fromme wrote:
> I've got a problem on a server farm.  Every now and then,
> some NFS mounts hang.  This happens after a few days or
> after a few weeks.  All processes trying to access files
> from the hanging mount go to state "D" and freeze.  The
> only way to resolve the problem is to reboot the server.
> [...]
> The machine is quite busy.  The hangs seem to always occur
> in the night when lots of cron jobs are running.  The machine
> has 221 NFS mounts and 26 nullfs mounts, and it has 26 jails,
> if that matters.  All NFS shares are mounted from a virtual
> filer running on a NetApp filer.  The mounts use the default
> settings, so they should be v3 TCP (this is the default,
> right?).  The only extra option we use is -L in order to
> "fake" locking locally.

Shortly after I posted the above, I found out that stable/7
does *not* use TCP by default.  tcpdump showed that NFS was
using UDP.  So I changed the management scripts to force TCP
when mounting NFS shares.

So far, there have been no further hangs.  It's still possible
that a hang might occur in the future (sometimes it took a few
weeks to produce one), but it seems as if the problem is
really fixed now.

In case it happens again, I will follow Rick's and Kostik's
advice (procstat -ka, ps axHl etc.).  Thanks!

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Commercial register: Registergericht Muenchen, HRA 74606,
Management: secnetix Verwaltungsgesellsch. mbH,
Commercial register: Registergericht München, HRB 125758,
Managing directors: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD services, products and more:  http://www.secnetix.de/bsd

"... there are two ways of constructing a software design:  One way
is to make it so simple that there are _obviously_ no deficiencies and
the other way is to make it so complicated that there are no _obvious_
deficiencies."        -- C.A.R. Hoare, ACM Turing Award Lecture, 1980
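
For readers who want to make the same change, here is a minimal
sketch of what "force TCP when mounting" could look like in a
management script.  It is not the script used on the server farm
described above.  The function name nfs_mount_cmd, the host name
filer.example.com and the paths are hypothetical.  The flags are
taken from mount_nfs(8): -3 selects NFS version 3, -T selects TCP
transport, and -L is the local-locking option already mentioned in
the message.  The helper only prints the command (a dry run), so it
can be checked before anything is actually mounted.

```shell
#!/bin/sh
# Hypothetical helper: build a mount_nfs command line that forces
# NFSv3 over TCP instead of relying on the release default.
#   -3  NFS version 3
#   -T  TCP transport
#   -L  do not forward locks over the wire (local locking)
nfs_mount_cmd() {
    # $1 = remote export, $2 = local mount point
    printf 'mount_nfs -3 -T -L %s %s\n' "$1" "$2"
}

# Dry run: print the command rather than executing it.
# Host name and paths below are placeholders.
nfs_mount_cmd filer.example.com:/vol/data /mnt/data
```

Once a share is mounted this way, watching port 2049 with tcpdump
(as was done above to discover the UDP default) should show TCP
segments rather than UDP datagrams.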