From owner-freebsd-hackers Tue Nov 5 17:23:45 1996
Return-Path: owner-hackers
Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id RAA07291 for hackers-outgoing; Tue, 5 Nov 1996 17:23:45 -0800 (PST)
Received: from alpo.whistle.com (alpo.whistle.com [207.76.204.38]) by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id RAA07283 for ; Tue, 5 Nov 1996 17:23:40 -0800 (PST)
Received: from current1.whistle.com (current1.whistle.com [207.76.205.22]) by alpo.whistle.com (8.8.2/8.8.2) with SMTP id RAA01157; Tue, 5 Nov 1996 17:22:12 -0800 (PST)
Message-ID: <327FE834.167EB0E7@whistle.com>
Date: Tue, 05 Nov 1996 17:21:56 -0800
From: Julian Elischer
Organization: Whistle Communications
X-Mailer: Mozilla 3.0Gold (X11; I; FreeBSD 2.2-CURRENT i386)
MIME-Version: 1.0
To: Archie Cobbs
CC: freebsd-hackers@freebsd.org
Subject: Davidg bug (was: mount panics & hangs)
References: <199611052100.NAA05158@bubba.whistle.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-hackers@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

Archie Cobbs wrote:
>
> Here is a way to crash a fairly -current system... I'd be interested
> to hear if this works for anyone else, and what the cause may be. The
> system eventually either goes into some sort of deadlock, or else panics:
>

An analysis of this problem yields the following.

A filesystem is being unmounted. The unmount sleeps somewhere in the
function below (the reason is not known). This leaves the mount point
with the MNT_MPBUSY flag set.

int
dounmount(mp, flags, p)
	register struct mount *mp;
	int flags;
	struct proc *p;
{
	struct vnode *coveredvp;
	int error;

	coveredvp = mp->mnt_vnodecovered;
	if (vfs_busy(mp))
		return (EBUSY);
	mp->mnt_flag |= MNT_UNMOUNT;
	error = vfs_lock(mp);
	if (error)
		return (error);			/* <------- line "C" */

	mp->mnt_flag &= ~MNT_ASYNC;
	vfs_msync(mp, MNT_NOWAIT);
	vnode_pager_umount(mp);	/* release cached vnodes */
	cache_purgevfs(mp);	/* remove cache entries for this file sys */
	if ((error = VFS_SYNC(mp, MNT_WAIT, p->p_ucred, p)) == 0 ||
	    (flags & MNT_FORCE))
		error = VFS_UNMOUNT(mp, flags, p);
	mp->mnt_flag &= ~MNT_UNMOUNT;
	vfs_unbusy(mp);				/* <----- line "D" */
	if (error) {
		vfs_unlock(mp);
	} else {
		vrele(coveredvp);
		CIRCLEQ_REMOVE(&mountlist, mp, mnt_list);
		mp->mnt_vnodecovered->v_mountedhere = (struct mount *)0;
		vfs_unlock(mp);
		mp->mnt_vfc->vfc_refcount--;
		if (mp->mnt_vnodelist.lh_first != NULL)
			panic("unmount: dangling vnode");
		free((caddr_t)mp, M_MOUNT);
	}
	return (error);
}

Someone else does a 'mount' to see what is mounted. This calls
getfsstat(). In getfsstat(), it runs the loop:

	for (mp = mountlist.cqh_first; mp != (void *)&mountlist; mp = nmp) {
		if (vfs_busy(mp)) {		/* <---------- line "B" */
			nmp = mp->mnt_list.cqe_next;
			continue;
		}
		if (sfsp && count < maxcount &&
		    ((mp->mnt_flag & MNT_MLOCK) == 0)) {
			sp = &mp->mnt_stat;
			/*
			 * If MNT_NOWAIT is specified, do not refresh the
			 * fsstat cache. MNT_WAIT overrides MNT_NOWAIT.
			 */
			if (((uap->flags & MNT_NOWAIT) == 0 ||
			    (uap->flags & MNT_WAIT)) &&
			    (error = VFS_STATFS(mp, sp, p))) {
				nmp = mp->mnt_list.cqe_next;
				vfs_unbusy(mp);
				continue;
			}
			sp->f_flags = mp->mnt_flag & MNT_VISFLAGMASK;
			error = copyout((caddr_t)sp, sfsp, sizeof(*sp));
			if (error) {
				vfs_unbusy(mp);
				return (error);
			}
			sfsp += sizeof(*sp);
		}
		count++;
		nmp = mp->mnt_list.cqe_next;	/* <------ line "A" */
		vfs_unbusy(mp);
	}
	....

In vfs_busy() we see:

int
vfs_busy(mp)
	register struct mount *mp;
{

	while (mp->mnt_flag & MNT_MPBUSY) {
		mp->mnt_flag |= MNT_MPWANT;
		(void) tsleep((caddr_t) &mp->mnt_flag, PVFS, "vfsbsy", 0);
	}
	if (mp->mnt_flag & MNT_UNMOUNT)
		return (1);
	mp->mnt_flag |= MNT_MPBUSY;
	return (0);
}

Unfortunately, as the filesystem is busy, the 'mount' sleeps in
vfs_busy() (at line "B") and is only woken when dounmount() does a
wakeup (via vfs_unbusy(), at line "D"). The woken process does not run
immediately, though, but only some time AFTER dounmount() has done a
'free' on the mp (and AFTER the MNT_UNMOUNT flag has been cleared).

So in the 'mount' process, vfs_busy() returns 0 at some later time, and
processing continues, now on a mount point that is no longer valid or
even linked into the system.

This seems to have been introduced by David in an attempt to fix a
panic he was seeing on Freefall. Before that, a straight "mount" would
probably not have been able to sleep, and would possibly have been able
to get past the mp being freed.

At the next iteration of the loop, mp is undefined (often 0 in the case
we see) and a page fault occurs. (The next mp is derived on line "A".)

The only answer I can see is either to make the awakened process start
again from scratch, as the mp it has may no longer be valid, or to put
some sort of lock on the whole mount list.

BTW there is another small bug: the return at line "C" should also do
a vfs_unbusy().

Suggestions?

julian
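
A minimal user-space sketch of the first option above (start the walk
again after a sleep). It is not kernel code and not a proposed patch.
It assumes a hypothetical generation counter, mountlist_gen, that every
removal bumps. struct mnt, mnt_add(), mnt_remove() and walk_mounts()
are illustrative names that do not exist in the kernel, and the sleep
in vfs_busy() is simulated by running the unmount inline.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-space model of the mount list; not kernel code. */
struct mnt {
	struct mnt *next;
	int id;
};

static struct mnt *mountlist;		/* head of the list */
static unsigned long mountlist_gen;	/* assumed counter, bumped on removal */

/* Insert a new entry at the head of the list. */
static void
mnt_add(int id)
{
	struct mnt *m = malloc(sizeof(*m));

	assert(m != NULL);
	m->id = id;
	m->next = mountlist;
	mountlist = m;
}

/* Unlink and free the entry with the given id, much as dounmount() does. */
static void
mnt_remove(int id)
{
	for (struct mnt **pp = &mountlist; *pp != NULL; pp = &(*pp)->next) {
		if ((*pp)->id == id) {
			struct mnt *dead = *pp;

			*pp = dead->next;
			free(dead);
			mountlist_gen++;
			return;
		}
	}
}

/*
 * Walk the list, recording ids in seen[]. When the walk reaches the
 * entry sleep_at, it "sleeps" and the entry victim is unmounted in the
 * meantime. Returns the number of ids recorded in the final pass.
 */
static int
walk_mounts(int sleep_at, int victim, int *seen, int max, int *restarts)
{
	int n;

	*restarts = 0;
restart:
	n = 0;
	for (struct mnt *mp = mountlist; mp != NULL; mp = mp->next) {
		unsigned long gen = mountlist_gen;

		/* Stand-in for sleeping in vfs_busy(): an unmount runs now. */
		if (victim >= 0 && mp->id == sleep_at) {
			mnt_remove(victim);
			victim = -1;
		}
		if (gen != mountlist_gen) {
			/*
			 * The list changed while we "slept", so mp may have
			 * been freed. Do not touch it; start from the head.
			 */
			(*restarts)++;
			goto restart;
		}
		if (n < max)
			seen[n++] = mp->id;
	}
	return (n);
}
```

With entries 1, 2 and 3 on the list, walk_mounts(2, 2, seen, 8,
&restarts) frees entry 2 while the walker is "asleep" on it, restarts
once, and records only entries 1 and 3. The cost of this approach is
that a walk can repeat work whenever the list changes. A lock on the
whole list avoids the restart but has to be held across the sleep.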