From owner-freebsd-bugs@FreeBSD.ORG Fri Aug 26 06:10:09 2011
Date: Fri, 26 Aug 2011 06:05:02 GMT
From: Artem Belevich
To: FreeBSD-gnats-submit@FreeBSD.org
Subject: kern/160198: amd + NFS reconnect = ICMP storm + unkillable process + hung amd mount.

>Number:         160198
>Category:       kern
>Synopsis:       amd + NFS reconnect = ICMP storm + unkillable process + hung amd mount.
>Confidential:   no
>Severity:       serious
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Aug 26 06:10:08 UTC 2011
>Closed-Date:
>Last-Modified:
>Originator:     Artem Belevich
>Release:        FreeBSD 8.2-STABLE i386
>Organization:   FreeBSD
>Environment:    FreeBSD stable/8, head
>Description:
When a process is interrupted during an NFS reconnect over UDP, it gets stuck in an unkillable state. In my particular case the NFS connection is to the amd process on localhost. Continuous reconnects amount to a self-inflicted DoS attack on amd, rendering it unresponsive and hanging all other processes that access amd-mounted filesystems. As a side effect, the system also generates a rather high rate of ICMP port-unreachable replies. All in all, the system ends up virtually unavailable, and in many cases a reboot is required to get it out of this state.
The stuck process always has clnt_reconnect_call() in its backtrace:

18779 100511 collect2 - mi_switch+0x176 turnstile_wait+0x1cb _mtx_lock_sleep+0xe1
  sleepq_catch_signals+0x386 sleepq_timedwait_sig+0x19 _sleep+0x1b1 clnt_dg_call+0x7e6
  clnt_reconnect_call+0x12e nfs_request+0x212 nfs_getattr+0x2e4 VOP_GETATTR_APV+0x44
  nfs_bioread+0x42a VOP_READLINK_APV+0x4a namei+0x4f9 kern_statat_vnhook+0x92
  kern_statat+0x15 freebsd32_stat+0x2e syscallenter+0x23d

>How-To-Repeat:
In my case the problem most frequently occurs when a parallel build that touches an amd-mounted filesystem is interrupted.
>Fix:
clnt_dg_call() uses msleep(), which may return ERESTART when the current process is interrupted. If that happens, we return to clnt_reconnect_call() with RPC_CANTRECV. clnt_reconnect_call() handles RPC_CANTRECV by trying to reconnect again, and the story repeats. Because the current code never returns to userland, the process never quits and stays stuck, in most cases forever. The fix is to convert ERESTART to RPC_INTR, which is what is done in the other places where it is handled in the RPC code (an illustrative sketch follows below, after the trailing PR fields).
>Release-Note:
>Audit-Trail:
>Unformatted:
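
For reference, here is a minimal sketch of the kind of change described under >Fix: above. This is not the committed patch; the variable names (cr, cs, cu, errp, stat, tv) and the surrounding structure only approximate sys/rpc/clnt_dg.c. The relevant part is mapping ERESTART from msleep() to RPC_INTR instead of letting it fall through to RPC_CANTRECV:

    /*
     * Illustrative sketch, not the committed patch.  The context around
     * the msleep() call is approximated; only the error mapping matters.
     */
    error = msleep(cr, &cs->cs_lock, cu->cu_waitflag, cu->cu_waitchan, tv);

    if (error != 0) {
            /* The request is still pending; translate the sleep errno. */
            errp->re_errno = error;
            switch (error) {
            case EINTR:
            case ERESTART:  /* proposed: a signal, not a receive failure */
                    errp->re_status = stat = RPC_INTR;
                    break;
            case EWOULDBLOCK:
                    errp->re_status = stat = RPC_TIMEDOUT;
                    break;
            default:
                    errp->re_status = stat = RPC_CANTRECV;
                    break;
            }
            goto out;
    }

With ERESTART mapped to RPC_INTR, clnt_reconnect_call() sees an interrupt rather than a receive error, stops retrying, and the interrupted process can return to userland instead of looping forever against amd.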