From owner-freebsd-bugs@FreeBSD.ORG Thu Sep 5 09:50:00 2013
Return-Path:
Delivered-To: freebsd-bugs@smarthost.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1])
    (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
    (No client certificate requested)
    by hub.freebsd.org (Postfix) with ESMTP id CF8004EC
    for ; Thu, 5 Sep 2013 09:50:00 +0000 (UTC)
    (envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87])
    (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
    (No client certificate requested)
    by mx1.freebsd.org (Postfix) with ESMTPS id AE1F72FA0
    for ; Thu, 5 Sep 2013 09:50:00 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
    by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id r859o0qp036771
    for ; Thu, 5 Sep 2013 09:50:00 GMT
    (envelope-from gnats@freefall.freebsd.org)
Received: (from gnats@localhost)
    by freefall.freebsd.org (8.14.7/8.14.7/Submit) id r859o0lw036770;
    Thu, 5 Sep 2013 09:50:00 GMT
    (envelope-from gnats)
Resent-Date: Thu, 5 Sep 2013 09:50:00 GMT
Resent-Message-Id: <201309050950.r859o0lw036770@freefall.freebsd.org>
Resent-From: FreeBSD-gnats-submit@FreeBSD.org (GNATS Filer)
Resent-To: freebsd-bugs@FreeBSD.org
Resent-Reply-To: FreeBSD-gnats-submit@FreeBSD.org, Julien Charbon
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1])
    (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
    (No client certificate requested)
    by hub.freebsd.org (Postfix) with ESMTP id ED0E44E4
    for ; Thu, 5 Sep 2013 09:49:25 +0000 (UTC)
    (envelope-from nobody@FreeBSD.org)
Received: from oldred.freebsd.org (oldred.freebsd.org [8.8.178.121])
    (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
    (No client certificate requested)
    by mx1.freebsd.org (Postfix) with ESMTPS id D9F952F99
    for ; Thu, 5 Sep 2013 09:49:25 +0000 (UTC)
Received: from oldred.freebsd.org ([127.0.1.6])
    by oldred.freebsd.org
    (8.14.5/8.14.7) with ESMTP id r859nPVp026786
    for ; Thu, 5 Sep 2013 09:49:25 GMT
    (envelope-from nobody@oldred.freebsd.org)
Received: (from nobody@localhost)
    by oldred.freebsd.org (8.14.5/8.14.5/Submit) id r859nPSo026783;
    Thu, 5 Sep 2013 09:49:25 GMT
    (envelope-from nobody)
Message-Id: <201309050949.r859nPSo026783@oldred.freebsd.org>
Date: Thu, 5 Sep 2013 09:49:25 GMT
From: Julien Charbon
To: freebsd-gnats-submit@FreeBSD.org
X-Send-Pr-Version: www-3.1
Subject: misc/181834: amd mounting NFS directories can drive a dead-lock
X-BeenThere: freebsd-bugs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Bug reports
X-List-Received-Date: Thu, 05 Sep 2013 09:50:00 -0000

>Number:         181834
>Category:       misc
>Synopsis:       amd mounting NFS directories can drive a dead-lock
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Sep 05 09:50:00 UTC 2013
>Closed-Date:
>Last-Modified:
>Originator:     Julien Charbon
>Release:        FreeBSD 8.4 RELEASE
>Organization:
Verisign
>Environment:
FreeBSD atlas4 8.4-RELEASE-p3 FreeBSD 8.4-RELEASE-p3 #0 r+efa3d77-dirty: Wed Sep 4 20:41:53 UTC 2013 root@atlas4:/app/jcharbon/git/freebsd-vrsn/sys/GENERIC amd64
>Description:
Short Summary:

On FreeBSD 8.4, the amd auto-mounter daemon can drive the machine into a dead-lock when mounting NFS directories.

Long Summary:

Right after the amd daemon starts, the machine appears dead-locked:

- ssh connections are stalled
- virtual consoles are stalled
- the serial console is stalled

However, the machine still replies to ping, so the kernel is still alive.

Entering the DDB kernel debugger via the hardware NMI button during the dead-lock gave us this status:

syslogd and devd are waiting for the Giant kernel lock:

db> show allchains
chain 1:
 thread 100171 (pid 885, syslogd) blocked on lock 0xffffffff80e2e100 (sleep mutex) "Giant"
 thread 100148 (pid 1120, amd) running on CPU 9
chain 2:
 thread 100205 (pid 742, devd) blocked on lock 0xffffffff80e2e100 (sleep mutex) "Giant"
 thread 100148 (pid 1120, amd) running on CPU 9

which is owned by the amd daemon:

db> show lock Giant
 class: sleep mutex
 name: Giant
 flags: {DEF, RECURSE}
 state: {OWNED, CONTESTED, RECURSED}
 owner: 0xffffff003ef0f8e0 (tid 100148, pid 1120, "amd")
 recursed: 1

Another trace, taken with the witness kernel (kernel-witness), shows that this amd thread also holds other kernel locks:

db> show alllocks
Process 1120 (amd) thread 0xffffff003ef0f8e0 (100148)
exclusive rw udpinp (udpinp) r = 0 (0xffffff007b39fa60) locked @ /app/jcharbon/git/freebsd-vrsn/sys/netinet/in_pcb.c:237
exclusive rw udp (udp) r = 0 (0xffffffff80ff4d28) locked @ /app/jcharbon/git/freebsd-vrsn/sys/netinet/udp_usrreq.c:1464
exclusive lockmgr nfs (nfs) r = 0 (0xffffff007b1bd7e8) locked @ /app/jcharbon/git/freebsd-vrsn/sys/nfsclient/nfs_node.c:166
exclusive sleep mutex Giant (Giant) r = 1 (0xffffffff80e2e100) locked @ /app/jcharbon/git/freebsd-vrsn/sys/kern/vfs_mount.c:730

Next, we entered DDB directly from the kernel NFS code, which gave us this backtrace:

Tracing pid 1142 tid 100301 td 0xffffff00379098e0
kvprintf() at kvprintf+0x17a
nfs_msg() at nfs_msg+0x52
nfs_feedback() at nfs_feedback+0x105
clnt_reconnect_call() at clnt_reconnect_call+0x19b
nfs_request() at nfs_request+0x1e5
nfs_getattr() at nfs_getattr+0x2bc
mountnfs() at mountnfs+0x330
nfs_mount() at nfs_mount+0xe3f
vfs_donmount() at vfs_donmount+0xcde
kernel_mount() at kernel_mount+0xa1
nfs_cmount() at nfs_cmount+0x5a
mount() at mount+0x1ea
amd64_syscall() at amd64_syscall+0xf9
Xfast_syscall() at Xfast_syscall+0xfc
--- syscall (21, FreeBSD ELF64, mount), rip =
0x8007cea4c, rsp = 0x7fffffffdd88, rbp = 0x2 ---
db> next
db> next
db> next
..

Many 'next' debugger commands later, we are back at the same place in nfs_request(). At that point, the mountnfs() call just loops indefinitely in nfs_request() and never releases the Giant mutex or the (nfs) lock.
>How-To-Repeat:
Using these amd files:

/etc/amd.conf:

$ cat /etc/amd.conf
[global]
browsable_dirs = no
map_type = file
mount_type = nfs
search_path = /etc
auto_dir = /.amd
cache_duration = 30
log_file = syslog:daemon
log_options = fatal,error
print_pid = yes
pid_file = /var/run/amd.pid
restart_mounts = yes
selectors_in_defaults = no

[/nfs/home]
map_name = /etc/home.map
$

and /etc/home.map:

$ cat /etc/home.map
/defaults type:=nfs;opts:=tcp,intr,nosuid;rhost:=1.2.3.4
* rfs:=/dev/${key};fs:=${autodir}/nfs/home/${key}
$

Simply running:

# /etc/rc.d/amd onestart

triggers the dead-lock.
>Fix:

>Release-Note:
>Audit-Trail:
>Unformatted: