From owner-freebsd-stable@FreeBSD.ORG Tue Oct 3 08:43:25 2006 Return-Path: X-Original-To: stable@freebsd.org Delivered-To: freebsd-stable@FreeBSD.ORG Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 34ECD16A403 for ; Tue, 3 Oct 2006 08:43:25 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from fw.zoral.com.ua (fw.zoral.com.ua [213.186.206.134]) by mx1.FreeBSD.org (Postfix) with ESMTP id 3E35B43D46 for ; Tue, 3 Oct 2006 08:43:23 +0000 (GMT) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by fw.zoral.com.ua (8.13.4/8.13.4) with ESMTP id k938ef0M036905 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 3 Oct 2006 11:40:41 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.13.8/8.13.8) with ESMTP id k938hGbU096179; Tue, 3 Oct 2006 11:43:16 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.13.8/8.13.8/Submit) id k938hFV5096091; Tue, 3 Oct 2006 11:43:15 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Tue, 3 Oct 2006 11:43:15 +0300 From: Kostik Belousov To: Vivek Khera Message-ID: <20061003084315.GA89654@deviant.kiev.zoral.com.ua> References: <917B087C-5E13-4D7F-94FA-95CB0E5C1884@khera.org> <20060922190328.GA64849@xor.obsecurity.org> <555B84D2-520F-44D6-84D6-CF9CE7EE47C7@khera.org> <20060922203654.GA65693@xor.obsecurity.org> <847DD3A5-D5DD-4D3E-B755-64B13D1DA506@khera.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="3MwIy2ne0vdjdPXF" Content-Disposition: inline In-Reply-To: <847DD3A5-D5DD-4D3E-B755-64B13D1DA506@khera.org> User-Agent: Mutt/1.4.2.2i X-Virus-Scanned: ClamAV version 0.88.4, clamav-milter version 0.88.4 on fw.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=1.4 required=5.0 tests=SPF_NEUTRAL, UNPARSEABLE_RELAY autolearn=no version=3.1.4 X-Spam-Level: * X-Spam-Checker-Version: SpamAssassin 3.1.4 (2006-07-25) on fw.zoral.com.ua Cc: stable@freebsd.org Subject: Re: ffs snapshot lockup X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 03 Oct 2006 08:43:25 -0000 --3MwIy2ne0vdjdPXF Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Mon, Oct 02, 2006 at 03:23:49PM -0400, Vivek Khera wrote: >=20 > On Sep 22, 2006, at 4:36 PM, Kris Kennaway wrote: >=20 > >Start by enabling INVARIANTS, INVARIANT_SUPPORT, DEBUG_LOCKS and > >DEBUG_VFS_LOCKS, then run 'show lockedvnods' and 'alltrace' in DDB > >(spammy, need that serial console), or at least trace the running > >processes (show allpcpu) and those listed in lockedvnods. Then call > >doadump and save the core+kernel.debug when you reboot. >=20 > Well, what an exciting weekend we had! Three lockups/panics: two =20 > during running a dump (mksnap_ffs seems to be the culprit) and one =20 > running background fsck after rebooting from the prior crash. >=20 > Details are posted at http://vivek.khera.org/scratch/crashlogs/ >=20 > I have the crashdumps available to a kernel hacker upon request (i'd =20 > rather not make them generally available to the public...) >=20 It seems that you have snapshotted fs exported by nfsd ? At least, 18a is definitely the case. I have the patch (for current) that shall fix the issu= e. In fact, you need two patches: 1. buffer ownership patch by Tor Egge (already in current, I think it shall be MFCed before 6.2) tegge 2006-10-02 02:06:27 UTC FreeBSD src repository Modified files: sys/kern kern_lock.c vfs_bio.c=20 sys/sys buf.h lockmgr.h=20 Log: If the buffer lock has waiters after the buffer has changed identity then getnewbuf() needs to drop the buffer in order to wake waiters that might sleep on the buffer in the context of the old identity. =20 Revision Changes Path 1.100 +15 -0 src/sys/kern/kern_lock.c 1.510 +11 -0 src/sys/kern/vfs_bio.c 1.194 +11 -0 src/sys/sys/buf.h 1.51 +1 -0 src/sys/sys/lockmgr.h _______________________________________________ cvs-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/cvs-all To unsubscribe, send any mail to "cvs-all-unsubscribe@freebsd.org" Index: src/sys/kern/kern_lock.c diff -u src/sys/kern/kern_lock.c:1.99 src/sys/kern/kern_lock.c:1.100 --- src/sys/kern/kern_lock.c:1.99 Tue Aug 15 18:29:01 2006 +++ src/sys/kern/kern_lock.c Mon Oct 2 02:06:26 2006 @@ -566,6 +566,21 @@ } =20 /* + * Determine the number of waiters on a lock. + */ +int +lockwaiters(lkp) + struct lock *lkp; +{ + int count; + + mtx_lock(lkp->lk_interlock); + count =3D lkp->lk_waitcount; + mtx_unlock(lkp->lk_interlock); + return (count); +} + +/* * Print out information about state of a lock. Used by VOP_PRINT * routines to display status about contained locks. */ Index: src/sys/kern/vfs_bio.c diff -u src/sys/kern/vfs_bio.c:1.509 src/sys/kern/vfs_bio.c:1.510 --- src/sys/kern/vfs_bio.c:1.509 Wed Aug 9 17:43:26 2006 +++ src/sys/kern/vfs_bio.c Mon Oct 2 02:06:26 2006 @@ -1890,6 +1890,17 @@ } =20 /* + * Notify any waiters for the buffer lock about + * identity change by freeing the buffer. + */ + if (qindex =3D=3D QUEUE_CLEAN && BUF_LOCKWAITERS(bp) > 0) { + bp->b_flags |=3D B_INVAL; + bfreekva(bp); + brelse(bp); + goto restart; + } + + /* * If we are overcomitted then recover the buffer and its * KVM space. This occurs in rare situations when multiple * processes are blocked in getnewbuf() or allocbuf(). Index: src/sys/sys/buf.h diff -u src/sys/sys/buf.h:1.193 src/sys/sys/buf.h:1.194 --- src/sys/sys/buf.h:1.193 Fri Mar 31 02:56:30 2006 +++ src/sys/sys/buf.h Mon Oct 2 02:06:27 2006 @@ -371,6 +371,17 @@ return ret; } =20 + +/* + * Find out the number of waiters on a lock. + */ +static __inline int BUF_LOCKWAITERS(struct buf *); +static __inline int +BUF_LOCKWAITERS(struct buf *bp) +{ + return (lockwaiters(&bp->b_lock)); +} + #endif /* _KERNEL */ =20 struct buf_queue_head { Index: src/sys/sys/lockmgr.h diff -u src/sys/sys/lockmgr.h:1.50 src/sys/sys/lockmgr.h:1.51 --- src/sys/sys/lockmgr.h:1.50 Tue Aug 15 18:29:01 2006 +++ src/sys/sys/lockmgr.h Mon Oct 2 02:06:27 2006 @@ -203,6 +203,7 @@ void lockmgr_printinfo(struct lock *); int lockstatus(struct lock *, struct thread *); int lockcount(struct lock *); +int lockwaiters(struct lock *); #ifdef DDB int lockmgr_chain(struct thread *td, struct thread **ownerp); #endif 2. this one (it shall be applied in the sys/ufs; please, test and report results) Index: ffs/ffs_inode.c =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D RCS file: /usr/local/arch/ncvs/src/sys/ufs/ffs/ffs_inode.c,v retrieving revision 1.106 diff -u -r1.106 ffs_inode.c --- ffs/ffs_inode.c 5 Apr 2005 08:49:41 -0000 1.106 +++ ffs/ffs_inode.c 27 Sep 2006 08:29:18 -0000 @@ -66,9 +66,11 @@ * IN_ACCESS, IN_UPDATE, and IN_CHANGE flags respectively. Write the inode * to disk if the IN_MODIFIED flag is set (it may be set initially, or by * the timestamp update). The IN_LAZYMOD flag is set to force a write - * later if not now. If we write now, then clear both IN_MODIFIED and - * IN_LAZYMOD to reflect the presumably successful write, and if waitfor is - * set, then wait for the write to complete. + * later if not now. The IN_LAZYACCESS is set instead of IN_MODIFIED if fs + * is currently suspending/suspended and vnode has been accessed. If we + * write now, then clear IN_MODIFIED, IN_LAZYACCESS and IN_LAZYMOD to refl= ect + * the presumably successful write, and if waitfor is set, then wait for t= he + * write to complete. */ int ffs_update(vp, waitfor) @@ -80,12 +82,12 @@ struct inode *ip; int error; =20 - ASSERT_VOP_LOCKED(vp, "ffs_update"); + ASSERT_VOP_ELOCKED(vp, "ffs_update"); ufs_itimes(vp); ip =3D VTOI(vp); if ((ip->i_flag & IN_MODIFIED) =3D=3D 0 && waitfor =3D=3D 0) return (0); - ip->i_flag &=3D ~(IN_LAZYMOD | IN_MODIFIED); + ip->i_flag &=3D ~(IN_LAZYACCESS | IN_LAZYMOD | IN_MODIFIED); fs =3D ip->i_fs; if (fs->fs_ronly) return (0); Index: ffs/ffs_snapshot.c =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D RCS file: /usr/local/arch/ncvs/src/sys/ufs/ffs/ffs_snapshot.c,v retrieving revision 1.130 diff -u -r1.130 ffs_snapshot.c --- ffs/ffs_snapshot.c 26 Sep 2006 04:19:11 -0000 1.130 +++ ffs/ffs_snapshot.c 27 Sep 2006 08:29:18 -0000 @@ -2309,15 +2309,16 @@ return (bp->b_error); } =20 - /* * Process file deletes that were deferred by ufs_inactive() due to - * the file system being suspended. + * the file system being suspended. Transfer IN_LAZYACCESS into + * IN_MODIFIED for vnodes that were accessed during suspension. */ static void process_deferred_inactive(struct mount *mp) { struct vnode *vp, *mvp; + struct inode *ip; struct thread *td; int error; =20 @@ -2327,9 +2328,14 @@ loop: MNT_VNODE_FOREACH(vp, mp, mvp) { VI_LOCK(vp); - if ((vp->v_iflag & (VI_DOOMED | VI_OWEINACT)) !=3D VI_OWEINACT || - vp->v_usecount > 0 || - vp->v_type =3D=3D VNON) { + /* + * IN_LAZYACCESS is checked here without holding any + * vnode lock, but this flag is set only while holding + * vnode interlock. + */ + if (vp->v_type =3D=3D VNON || (vp->v_iflag & VI_DOOMED) !=3D 0 || + ((VTOI(vp)->i_flag & IN_LAZYACCESS) =3D=3D 0 && + ((vp->v_iflag & VI_OWEINACT) =3D=3D 0 || vp->v_usecount > 0))) { VI_UNLOCK(vp); continue; } @@ -2344,8 +2350,13 @@ MNT_VNODE_FOREACH_ABORT_ILOCKED(mp, mvp); goto loop; } + ip =3D VTOI(vp); + if ((ip->i_flag & IN_LAZYACCESS) !=3D 0) { + ip->i_flag &=3D ~IN_LAZYACCESS; + ip->i_flag |=3D IN_MODIFIED; + } VI_LOCK(vp); - if ((vp->v_iflag & VI_OWEINACT) =3D=3D 0) { + if ((vp->v_iflag & VI_OWEINACT) =3D=3D 0 || vp->v_usecount > 0) { VI_UNLOCK(vp); VOP_UNLOCK(vp, 0, td); vdrop(vp); Index: ffs/ffs_vnops.c =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D RCS file: /usr/local/arch/ncvs/src/sys/ufs/ffs/ffs_vnops.c,v retrieving revision 1.160 diff -u -r1.160 ffs_vnops.c --- ffs/ffs_vnops.c 5 May 2006 19:58:36 -0000 1.160 +++ ffs/ffs_vnops.c 27 Sep 2006 08:29:18 -0000 @@ -593,8 +593,11 @@ } =20 if ((error =3D=3D 0 || uio->uio_resid !=3D orig_resid) && - (vp->v_mount->mnt_flag & MNT_NOATIME) =3D=3D 0) + (vp->v_mount->mnt_flag & MNT_NOATIME) =3D=3D 0) { + VI_LOCK(vp); ip->i_flag |=3D IN_ACCESS; + VI_UNLOCK(vp); + } return (error); } =20 @@ -989,8 +992,11 @@ } =20 if ((error =3D=3D 0 || uio->uio_resid !=3D orig_resid) && - (vp->v_mount->mnt_flag & MNT_NOATIME) =3D=3D 0) + (vp->v_mount->mnt_flag & MNT_NOATIME) =3D=3D 0) { + VI_LOCK(vp); ip->i_flag |=3D IN_ACCESS; + VI_UNLOCK(vp); + } return (error); } =20 Index: ufs/inode.h =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D RCS file: /usr/local/arch/ncvs/src/sys/ufs/ufs/inode.h,v retrieving revision 1.50 diff -u -r1.50 inode.h --- ufs/inode.h 26 Sep 2006 04:15:59 -0000 1.50 +++ ufs/inode.h 27 Sep 2006 08:29:18 -0000 @@ -55,6 +55,13 @@ * is the permanent meta-data associated with the file which is read in * from the permanent dinode from long term storage when the file becomes * active, and is put back when the file is no longer being used. + * + * An inode may only be changed while holding either the exclusive + * vnode lock or the shared vnode lock and the vnode interlock. We use + * the latter only for "read" and "get" operations that require + * changing i_flag, or a timestamp. This locking protocol allows executing + * those operations without having to upgrade the vnode lock from shared to + * exclusive. */ struct inode { TAILQ_ENTRY(inode) i_nextsnap; /* snapshot file list. */ @@ -119,6 +126,8 @@ #define IN_RENAME 0x0010 /* Inode is being renamed. */ #define IN_LAZYMOD 0x0040 /* Modified, but don't write yet. */ #define IN_SPACECOUNTED 0x0080 /* Blocks to be freed in free count. */ +#define IN_LAZYACCESS 0x0100 /* Process IN_ACCESS after the + suspension finished */ =20 #define i_devvp i_ump->um_devvp #define i_umbufobj i_ump->um_bo Index: ufs/ufs_vnops.c =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D RCS file: /usr/local/arch/ncvs/src/sys/ufs/ufs/ufs_vnops.c,v retrieving revision 1.278 diff -u -r1.278 ufs_vnops.c --- ufs/ufs_vnops.c 20 Aug 2006 10:52:44 -0000 1.278 +++ ufs/ufs_vnops.c 27 Sep 2006 08:29:18 -0000 @@ -128,31 +128,51 @@ { struct inode *ip; struct timespec ts; + int mnt_locked; =20 ip =3D VTOI(vp); + mnt_locked =3D 0; + if ((vp->v_mount->mnt_flag & MNT_RDONLY) !=3D 0) { + VI_LOCK(vp); + goto out; + } + + MNT_ILOCK(vp->v_mount); /* For reading of mnt_kern_flags */ + mnt_locked =3D 1; + VI_LOCK(vp); if ((ip->i_flag & (IN_ACCESS | IN_CHANGE | IN_UPDATE)) =3D=3D 0) - return; + goto out_unl; + if ((vp->v_type =3D=3D VBLK || vp->v_type =3D=3D VCHR) && !DOINGSOFTDEP(v= p)) ip->i_flag |=3D IN_LAZYMOD; - else + else if (((vp->v_mount->mnt_kern_flag & + (MNTK_SUSPENDED | MNTK_SUSPEND)) =3D=3D 0) || + (ip->i_flag & (IN_CHANGE | IN_UPDATE))) ip->i_flag |=3D IN_MODIFIED; - if ((vp->v_mount->mnt_flag & MNT_RDONLY) =3D=3D 0) { - vfs_timestamp(&ts); - if (ip->i_flag & IN_ACCESS) { - DIP_SET(ip, i_atime, ts.tv_sec); - DIP_SET(ip, i_atimensec, ts.tv_nsec); - } - if (ip->i_flag & IN_UPDATE) { - DIP_SET(ip, i_mtime, ts.tv_sec); - DIP_SET(ip, i_mtimensec, ts.tv_nsec); - ip->i_modrev++; - } - if (ip->i_flag & IN_CHANGE) { - DIP_SET(ip, i_ctime, ts.tv_sec); - DIP_SET(ip, i_ctimensec, ts.tv_nsec); - } + else if (ip->i_flag & IN_ACCESS) + ip->i_flag |=3D IN_LAZYACCESS; +=09 + vfs_timestamp(&ts); + if (ip->i_flag & IN_ACCESS) { + DIP_SET(ip, i_atime, ts.tv_sec); + DIP_SET(ip, i_atimensec, ts.tv_nsec); + } + if (ip->i_flag & IN_UPDATE) { + DIP_SET(ip, i_mtime, ts.tv_sec); + DIP_SET(ip, i_mtimensec, ts.tv_nsec); + ip->i_modrev++; + } + if (ip->i_flag & IN_CHANGE) { + DIP_SET(ip, i_ctime, ts.tv_sec); + DIP_SET(ip, i_ctimensec, ts.tv_nsec); } + + out: ip->i_flag &=3D ~(IN_ACCESS | IN_CHANGE | IN_UPDATE); + out_unl: + VI_UNLOCK(vp); + if (mnt_locked) + MNT_IUNLOCK(vp->v_mount); } =20 /* @@ -266,11 +286,15 @@ } */ *ap; { struct vnode *vp =3D ap->a_vp; + int usecount; =20 VI_LOCK(vp); - if (vp->v_usecount > 1) - ufs_itimes(vp); + usecount =3D vp->v_usecount; VI_UNLOCK(vp); + + if (usecount > 1) + ufs_itimes(vp); + return (0); } =20 @@ -378,8 +402,10 @@ if (ip->i_ump->um_fstype =3D=3D UFS1) { vap->va_rdev =3D ip->i_din1->di_rdev; vap->va_size =3D ip->i_din1->di_size; + VI_LOCK(vp); vap->va_atime.tv_sec =3D ip->i_din1->di_atime; vap->va_atime.tv_nsec =3D ip->i_din1->di_atimensec; + VI_UNLOCK(vp); vap->va_mtime.tv_sec =3D ip->i_din1->di_mtime; vap->va_mtime.tv_nsec =3D ip->i_din1->di_mtimensec; vap->va_ctime.tv_sec =3D ip->i_din1->di_ctime; @@ -390,8 +416,10 @@ } else { vap->va_rdev =3D ip->i_din2->di_rdev; vap->va_size =3D ip->i_din2->di_size; + VI_LOCK(vp); vap->va_atime.tv_sec =3D ip->i_din2->di_atime; vap->va_atime.tv_nsec =3D ip->i_din2->di_atimensec; + VI_UNLOCK(vp); vap->va_mtime.tv_sec =3D ip->i_din2->di_mtime; vap->va_mtime.tv_nsec =3D ip->i_din2->di_mtimensec; vap->va_ctime.tv_sec =3D ip->i_din2->di_ctime; @@ -1992,11 +2020,15 @@ } */ *ap; { struct vnode *vp =3D ap->a_vp; + int usecount; =20 VI_LOCK(vp); - if (vp->v_usecount > 1) - ufs_itimes(vp); + usecount =3D vp->v_usecount; VI_UNLOCK(vp); + + if (usecount > 1) + ufs_itimes(vp); + return (fifo_specops.vop_close(ap)); } =20 --3MwIy2ne0vdjdPXF Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (FreeBSD) iD8DBQFFIiKjC3+MBN1Mb4gRAqhLAJwIyOajhhXN7cPXxN5zgXYu67HXcQCfWhz9 63Fk2NcqucovjkHVzNhLsF8= =kqAI -----END PGP SIGNATURE----- --3MwIy2ne0vdjdPXF--