From: Benjamin Kaduk
To: freebsd-fs@freebsd.org
Date: Sun, 19 Dec 2010 19:10:04 -0500 (EST)
Subject: debugging process in bovlbx state

Hi all,

I'm working on bringing the out-of-tree OpenAFS network filesystem up-to-date for FreeBSD 7.3-RELEASE, and
I think I need some help to fix this bug. I should preface this by noting that there is a whole slew of lock order reversals that I haven't even tried to track down, but I do not believe that this hang is a deadlock, since 'show alllocks' in DDB does not show anything that seems interesting. Any pointers for things to look at would be appreciated; more details of the failing case are below.

In order to get the afs kernel module to load, I needed to tweak a few lines of code in getpages(), as I had previously cribbed a bunch of changes/updates from the experimental NFS client while getting AFS to work on current FreeBSD. In particular, vm_page_set_valid is not present in 7.3, so I am currently running with:

--- a/src/afs/FBSD/osi_vnodeops.c
+++ b/src/afs/FBSD/osi_vnodeops.c
@@ -890,12 +890,8 @@ afs_vop_getpages(struct vop_getpages_args *ap)
 		 * Read operation filled a partial page.
 		 */
 		m->valid = 0;
-		vm_page_set_valid(m, 0, size - toff);
-#ifndef AFS_FBSD80_ENV
-		vm_page_undirty(m);
-#else
+		vm_page_set_validclean(m, 0, size - toff);
 		KASSERT(m->dirty == 0, ("afs_getpages: page %p is dirty", m));
-#endif
 	}

But my knowledge of vm_page_* is approximately nil, so there's no reason to think everything was correct even before that patch.

Anyway, my test case is running libarchive's configure script with source and destination directories in (different places in) AFS. It only gets twenty lines in, ending with:

checking for gcc option to accept ISO C89... none needed
checking for style of include used by make... GNU
checking dependency style of gcc... 
^Tload: 0.04 cmd: cp 1250 [bovlbx] 0.00u 0.00

procstat -kk reports:

mega-man# procstat -kk 1250
  PID    TID COMM             TDNAME           KSTACK
 1250 100060 cp               -                mi_switch+0x233 sleepq_switch+0xe9 sleepq_wait+0x44 _sleep+0x3a0 vm_object_pip_wait+0x4e bufobj_invalbuf+0x10e afs_GetVCache+0x2f7

The call to vinvalbuf in afs_GetVCache is here (lines 1646-1653 of afs_vcache.c):

	iheldthelock = VOP_ISLOCKED(vp, curthread);
	if (!iheldthelock)
	    vn_lock(vp, LK_EXCLUSIVE | LK_RETRY, curthread);
	AFS_GUNLOCK();
	vinvalbuf(vp, V_SAVE, curthread, PINOD, 0);
	AFS_GLOCK();
	if (!iheldthelock)
	    VOP_UNLOCK(vp, LK_EXCLUSIVE, curthread);

That is not very enlightening. I kind of suspect that some flags on the bufobj were erroneously set elsewhere and the problem is only now popping up.

afs_GetVCache is in this source file:
http://git.openafs.org/?p=openafs.git;a=blob;f=src/afs/afs_vcache.c;h=26ed2c2be271048509425583f0cc2de6c4166c4b;hb=HEAD
and {get,put}pages in this:
http://git.openafs.org/?p=openafs.git;a=blob;f=src/afs/FBSD/osi_vnodeops.c;h=7ae6571adb74d69cfe25e3190ade3b22dc8cdab8;hb=HEAD

Thanks,

Ben Kaduk
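
For reference, here is a small userspace sketch of the condition the backtrace above appears to be sleeping on. It assumes (as I understand the 7.x sources, not verified against this tree) that vm_object_pip_wait() sleeps for as long as the VM object's paging_in_progress count is nonzero, so a single reference that is taken and never dropped would leave vinvalbuf() waiting forever without any lock showing up in 'show alllocks'. Every name below (model_object, model_pip_add, model_pip_done, model_pip_would_sleep, model_getpages) is hypothetical and exists only for illustration; none of it is kernel or OpenAFS code, and the leaking error path is an invented scenario, not a diagnosis.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the one vm_object field of interest. */
struct model_object {
	int paging_in_progress;	/* outstanding paging I/O references */
};

/* Take references before starting paging I/O on the object. */
static void
model_pip_add(struct model_object *obj, int n)
{
	obj->paging_in_progress += n;
}

/* Drop one reference when a paging I/O completes. */
static void
model_pip_done(struct model_object *obj)
{
	assert(obj->paging_in_progress > 0);
	obj->paging_in_progress--;
}

/* True when a waiter that loops on the counter would still be asleep. */
static bool
model_pip_would_sleep(const struct model_object *obj)
{
	return (obj->paging_in_progress != 0);
}

/*
 * Invented getpages-like path.  With leak_on_error set, the error
 * return skips the matching model_pip_done() and the counter stays
 * elevated for the lifetime of the object.
 */
static int
model_getpages(struct model_object *obj, bool io_fails, bool leak_on_error)
{
	model_pip_add(obj, 1);
	if (io_fails) {
		if (!leak_on_error)
			model_pip_done(obj);
		return (-1);
	}
	model_pip_done(obj);
	return (0);
}
```

In this model, a successful model_getpages() call leaves the counter at zero and a later waiter proceeds at once, while one failing call with leak_on_error set leaves model_pip_would_sleep() true indefinitely. If the real hang is of this kind, checking the object's paging_in_progress value from DDB might be more telling than the lock lists.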