From owner-freebsd-hackers@FreeBSD.ORG Fri Aug 18 19:28:16 2006
From: "Alex Unleashed" <unledev@gmail.com>
To: freebsd-hackers@freebsd.org
Date: Fri, 18 Aug 2006 19:26:26 +0000
Subject: devfs deadlock

Hello,

Before anything else I'd like to say I'm working on a SoC project for
Gentoo, for which I have to port a sandbox-like application wrapper (in
userspace) to FreeBSD. It wraps the building and installing of software
so that nothing gets screwed up in the filesystem. The port is finished,
but unfortunately it freezes at random points, leaving the system unable
to launch new programs or write to disk. That suggested a kernel bug,
which was later confirmed by looking at the wait channels and debugging.
I know there is someone working out issues in the devfs code in 6.x, so
this might also be interesting to him.

I've been able to reproduce, on both 6.1-RELEASE-p3 and 6-STABLE
(snapshot from August 16th, ~01:00 GMT), a deadlock in the devfs code
which leaves the system unable to access the disk. I've come up with
some interesting debugging info, and it looks to me like there are vnode
problems while an sx lock is being held. My take on it is that the
deadlock occurs when one process holds the lock on a vnode (tagged
"devfs") while another process xlocks an sx lock ("devfsmount"). For
some reason the process holding the sx lock then wants the vnode lock
through devfs_allocv(), while the process holding the vnode lock wants
the sx lock through devfs_lookup(). From that point on, pretty much
anything that touches the filesystem waits forever in devfs_root() on
another vnode, flagged VV_ROOT and locked by the process holding the sx
lock. Patching the devfs code with fixes from -CURRENT didn't help.
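In case it helps to see the shape of the problem without a kernel,
here's a rough userland analogue of the lock-order reversal, with plain
pthread mutexes standing in for the vnode lock and the "devfsmount" sx
lock. This is just my sketch of the cycle; the function names and the
sleep() calls are made up to force the race, and none of it is actual
devfs code:

/*
 * ABBA lock-order reversal: each thread takes its first lock, then
 * blocks forever on the lock the other thread already holds.
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t vnode_lock = PTHREAD_MUTEX_INITIALIZER; /* the "devfs" vnode */
static pthread_mutex_t mount_sx   = PTHREAD_MUTEX_INITIALIZER; /* the "devfsmount" sx */

/* Like pid 17069: already holds the vnode, then wants the sx lock. */
static void *
vnode_then_sx(void *arg)
{
	pthread_mutex_lock(&vnode_lock);  /* vnode locked by an earlier lookup */
	sleep(1);                         /* let the other thread take the sx */
	printf("vnode holder: waiting for sx (devfs_lookup)...\n");
	pthread_mutex_lock(&mount_sx);    /* never returns */
	pthread_mutex_unlock(&mount_sx);
	pthread_mutex_unlock(&vnode_lock);
	return (NULL);
}

/* Like pid 17071: holds the sx lock, then wants the vnode. */
static void *
sx_then_vnode(void *arg)
{
	pthread_mutex_lock(&mount_sx);    /* xlocked in devfs_lookup() */
	sleep(1);
	printf("sx holder: waiting for vnode (devfs_allocv -> vget)...\n");
	pthread_mutex_lock(&vnode_lock);  /* never returns */
	pthread_mutex_unlock(&vnode_lock);
	pthread_mutex_unlock(&mount_sx);
	return (NULL);
}

int
main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, vnode_then_sx, NULL);
	pthread_create(&t2, NULL, sx_then_vnode, NULL);
	pthread_join(t1, NULL);           /* neither join ever completes */
	pthread_join(t2, NULL);
	return (0);
}

Compile it with "cc -o abba abba.c -lpthread" (the file name is
arbitrary) and it hangs the same way, each thread sleeping on the lock
the other one holds.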
Here's some info from a DDB session after the deadlock had just
occurred:

db> ps
  pid    proc     uid  ppid  pgrp  flag    stat  wmesg    wchan  cmd
17072 c4145624    0 17070  1071 0004002 [SLPQ piperd 0xc39904c8][SLP] gawk-3.1.5
17071 c4149c48    0 17070  1071 0000002 [SLPQ devfs 0xc41435c8][SLP] sh
17070 c4145830    0 17068  1071 0000002 [SLPQ wait 0xc4145830][SLP] sh
17069 c414920c    0 17067  1071 0004002 [SLPQ devfsmount 0xc396d540][SLP] xgcc
<>

db> trace 17069
Tracing pid 17069 tid 100071 td 0xc4146600
sched_switch(c4146600,0,1,c09545dc,c4146600) at sched_switch+0x2ea
mi_switch(1,0,c0955f54,1ba,c091b594) at mi_switch+0x425
sleepq_switch(c396d540,0,c0955f54,20c,dad529f0) at sleepq_switch+0x110
sleepq_wait(c396d540,c0a36da0,c0918064,1,c09c1b6c) at sleepq_wait+0x65
cv_wait(c396d540,c0a36da0,c094f9cd,23f,dad52ad4) at cv_wait+0x1e6
_sx_xlock(c396d510,c094f9c4,23f,c09b8d80,dad52ad4) at _sx_xlock+0x99
devfs_lookup(dad52ad4,c092ddcb,37,c096da3c,c4143570) at devfs_lookup+0x36
VOP_LOOKUP_APV(c09b8d80,dad52ad4,c4146600,c4146600,0) at VOP_LOOKUP_APV+0xa4
lookup(dad52b80,0,c09204ab,b6,c070487d) at lookup+0x4ed
namei(dad52b80,c091c646,275,c09550c4,c4146600) at namei+0x43c
kern_lstat(c4146600,807a800,0,dad52c54,c091b47f) at kern_lstat+0x4f
lstat(c4146600,dad52d04,3c4,c096ba20,c4146600) at lstat+0x2f
syscall(3b,3b,3b,28232cb2,bfbfa2c1) at syscall+0x31f
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (190, FreeBSD ELF32, lstat), eip = 0x282251bb, esp = 0xbfbf9e7c, ebp = 0xbfbfab38 ---

db> trace 17071
Tracing pid 17071 tid 100076 td 0xc394cd80
sched_switch(c394cd80,0,1,c09545dc,c394cd80) at sched_switch+0x2ea
mi_switch(1,0,c0955f54,1ba,c091b594) at mi_switch+0x425
sleepq_switch(c41435c8,0,c0955f54,20c,d9bb6768) at sleepq_switch+0x110
sleepq_wait(c41435c8,0,c091bebc,c8,c394cd80) at sleepq_wait+0x65
msleep(c41435c8,c0a3743c,50,c0917eb4,0) at msleep+0x342
acquire(d9bb6840,80,60000,b1,0) at acquire+0x11c
lockmgr(c41435c8,2002,c4143638,c394cd80,d9bb6888) at lockmgr+0x576
vop_stdlock(d9bb68b0,c08f40c3,42af,c091d728,1) at vop_stdlock+0x32
VOP_LOCK_APV(c09b8d80,d9bb68b0,c4143638,8,c0920936) at VOP_LOCK_APV+0xa4
vn_lock(c4143570,2002,c394cd80,7a7,2) at vn_lock+0xd2
vget(c4143570,2,c394cd80,c3972005,c396c460) at vget+0xf0
devfs_allocv(c396c400,c3889800,d9bb6b94,c394cd80,d9bb6988) at devfs_allocv+0x43
devfs_lookupx(d9bb6ad4,c094f9c4,23f,c09b8d80,d9bb6ad4) at devfs_lookupx+0x5f5
devfs_lookup(d9bb6ad4,c092ddcb,37,c096dac3,c3971000) at devfs_lookup+0x3e
VOP_LOOKUP_APV(c09b8d80,d9bb6ad4,c394cd80,c394cd80,0) at VOP_LOOKUP_APV+0xa4
lookup(d9bb6b80,0,c09204ab,b6,c0a82d70) at lookup+0x4ed
namei(d9bb6b80,c091c646,275,c09550c4,c394cd80) at namei+0x43c
kern_lstat(c394cd80,806d800,0,d9bb6c54,c091b47f) at kern_lstat+0x4f
lstat(c394cd80,d9bb6d04,3c4,c096ba20,c394cd80) at lstat+0x2f
syscall(3b,3b,3b,281b2cb2,bfbf9731) at syscall+0x31f
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (190, FreeBSD ELF32, lstat), eip = 0x281a51bb, esp = 0xbfbf92ec, ebp = 0xbfbf9fa8 ---

db> show alllocks
Process 17071 (sh) thread 0xc394cd80 (100076)
exclusive sx devfsmount r = 0 (0xc396d510) locked @ fs/devfs/devfs_vnops.c:575
Process 34 (irq1: atkbd0) thread 0xc3687a80 (100031)
exclusive sleep mutex Giant r = 0 (0xc0a37ac0) locked @ kern/kern_intr.c:683

db> show lock 0xc396d510
 class: sx
 name: devfsmount
 state: XLOCK: 0xc394cd80 (tid 100076, pid 17071, "sh")
 waiters: 0 shared, 1 exclusive

db> show lockedvnods
Locked vnodes

0xc3971000: tag devfs, type VDIR
    usecount 3, writecount 0, refcount 3 mountedhere 0
    flags (VV_ROOT)
    lock type devfs: EXCL (count 1) by thread 0xc394cd80 (pid 17071)
#0 0xc06903a1 at lockmgr+0x621
#1 0xc06f98db at vop_stdlock+0x32
#2 0xc08e18fb at VOP_LOCK_APV+0xa4
#3 0xc0712460 at vn_lock+0xd2
#4 0xc07040ff at vget+0xf0
#5 0xc0645702 at devfs_allocv+0x43
#6 0xc064550e at devfs_root+0x2c
#7 0xc06fc023 at lookup+0x829
#8 0xc06fb521 at namei+0x43c
#9 0xc070ba21 at kern_lstat+0x4f
#10 0xc070b9ad at lstat+0x2f
#11 0xc08cbb89 at syscall+0x31f
#12 0xc08b890f at Xint0x80_syscall+0x1f

0xc4143570: tag devfs, type VDIR
    usecount 1, writecount 0, refcount 2 mountedhere 0
    flags ()
    lock type devfs: EXCL (count 1) by thread 0xc4146600 (pid 17069) with 1 pending
#0 0xc06903a1 at lockmgr+0x621
#1 0xc06f98db at vop_stdlock+0x32
#2 0xc08e18fb at VOP_LOCK_APV+0xa4
#3 0xc0712460 at vn_lock+0xd2
#4 0xc07040ff at vget+0xf0
#5 0xc0645702 at devfs_allocv+0x43
#6 0xc064680e at devfs_lookupx+0x5f5
#7 0xc064685b at devfs_lookup+0x3e
#8 0xc08dea25 at VOP_LOOKUP_APV+0xa4
#9 0xc06fbce7 at lookup+0x4ed
#10 0xc06fb521 at namei+0x43c
#11 0xc070ba21 at kern_lstat+0x4f
#12 0xc070b9ad at lstat+0x2f
#13 0xc08cbb89 at syscall+0x31f
#14 0xc08b890f at Xint0x80_syscall+0x1f

After this the system still works, but it's unable to perform
operations on the filesystem, such as launching a new process. When
that happens, the new process eventually makes a syscall (open or
chflags, for example) which ends up in devfs_root(); there the vnode
flagged VV_ROOT gets its refcount and pending count bumped, and the
process waits on it forever.

And that's all there is to it. Unfortunately I'm not experienced enough
with the kernel to propose a patch, but I can reproduce this reliably
and will run whatever you'd want me to in order to get it fixed.

Thanks,
Alex
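P.S. In the userland sketch earlier in this mail, the usual way out of
this kind of cycle is for one side to give up its lock instead of
sleeping with it held, then retry. Just to illustrate that general
pattern (this is not a proposed devfs patch), here's a drop-in
replacement for sx_then_vnode() that avoids the deadlock; it needs
#include <sched.h> added for sched_yield():

/*
 * Avoid the ABBA cycle: never sleep on the inner lock while holding
 * the outer one.  If the inner lock is contended, drop the outer lock
 * and retry from scratch.
 */
static void *
sx_then_vnode_fixed(void *arg)
{
	for (;;) {
		pthread_mutex_lock(&mount_sx);
		if (pthread_mutex_trylock(&vnode_lock) == 0)
			break;                    /* got both locks */
		pthread_mutex_unlock(&mount_sx);  /* contended: back off */
		sched_yield();
	}
	/* ... work that needs both locks would go here ... */
	pthread_mutex_unlock(&vnode_lock);
	pthread_mutex_unlock(&mount_sx);
	return (NULL);
}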