From owner-freebsd-hackers Mon Mar 3 21:32:21 1997
Return-Path:
Received: (from root@localhost) by freefall.freebsd.org (8.8.5/8.8.5) id VAA24891 for hackers-outgoing; Mon, 3 Mar 1997 21:32:21 -0800 (PST)
Received: from genesis.atrad.adelaide.edu.au (genesis.atrad.adelaide.edu.au [129.127.96.120]) by freefall.freebsd.org (8.8.5/8.8.5) with ESMTP id VAA24885 for ; Mon, 3 Mar 1997 21:32:15 -0800 (PST)
Received: (from msmith@localhost) by genesis.atrad.adelaide.edu.au (8.8.5/8.7.3) id QAA10831; Tue, 4 Mar 1997 16:02:00 +1030 (CST)
From: Michael Smith
Message-Id: <199703040532.QAA10831@genesis.atrad.adelaide.edu.au>
Subject: Re: xemacs crashes kernel
In-Reply-To: <19970303230157.25741@right.PCS> from Jonathan Lemon at "Mar 3, 97 11:01:57 pm"
To: jlemon@americantv.com (Jonathan Lemon)
Date: Tue, 4 Mar 1997 16:02:00 +1030 (CST)
Cc: msmith@atrad.adelaide.edu.au, proff@iq.org, hackers@FreeBSD.ORG
X-Mailer: ELM [version 2.4ME+ PL28 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-hackers@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

Jonathan Lemon stands accused of saying:
> On Mar 03, 1997 at 03:11:23PM +1030, Michael Smith wrote:
> > Jonathan Lemon stands accused of saying:
> > > On Mar 03, 1997 at 01:03:08PM +1100, Julian Assange wrote:
> > > >
> > > > (1) telnet into machine
> > > > (2) start up xemacs in text mode
> > > > (3) suspend xemacs
> > > > (4) remote-disconnect telnet
> > >
> > > Bleah.  Confirmed here, on a 2.2-GAMMA machine.  Doing this causes
> > > a "Trap 12, code 0 - page fault in kernel mode".
> >
> > Can you give us the trap message and do the nm /kernel | less thing?
>
> Panic dump (typed by hand):
>
> Fatal trap 12: page fault while in kernel mode
> fault virtual address   = 0x18
> fault code              = supervisor read, page not present

Looks like a read dereference of a null structure pointer.
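
To spell out that reasoning: a fault address of 0x18 is what you would expect when the base pointer is zero and the member being read sits 0x18 bytes into the structure. The following is only a userland sketch of the arithmetic. struct demo_mount and fault_address() are invented for illustration, and the layout is an assumption chosen to put a member at offset 0x18; it is not the real struct mount.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout, NOT the real struct mount: six 32-bit words of
 * padding place the flag member at byte offset 0x18.
 */
struct demo_mount {
	uint32_t pad[6];
	uint32_t flag;
};

/* Compile-time check that the assumed layout really gives offset 0x18. */
_Static_assert(offsetof(struct demo_mount, flag) == 0x18,
    "flag must sit at offset 0x18 in the demo layout");

/*
 * Address that a load of mp->flag would touch for a given base address,
 * computed arithmetically so that nothing is actually dereferenced.
 */
static uintptr_t
fault_address(uintptr_t base)
{
	return (base + offsetof(struct demo_mount, flag));
}
```

With a base of zero the computed address is just the member offset, which is why a small nonzero fault address usually points at a null structure pointer rather than a wild one.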
> instruction pointer     = 0x8:0xf013753b
> stack pointer           = 0x10:0x3fbfff18
> frame pointer           = 0x10:0x3fbfff44
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, def32 1, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> interrupt mask          =
> kernel: type 12 trap, code 0
>
> stopped at _fsync+0x73, testb $0x40, 0x18(%eax)
>
> nm /kernel | grep f0137 | sort
>
> f01374c8 T _fsync

Ok.  Here it is :

int
fsync(p, uap, retval)
	struct proc *p;
	struct fsync_args *uap;
	int *retval;
{
	register struct vnode *vp;
	struct file *fp;
	int error;

	error = getvnode(p->p_fd, uap->fd, &fp);
	if (error)
		return (error);
	vp = (struct vnode *)fp->f_data;
	VOP_LOCK(vp);
	if (vp->v_object) {
		vm_object_page_clean(vp->v_object, 0, 0, 0, FALSE);
	}
	error = VOP_FSYNC(vp, fp->f_cred,
		(vp->v_mount->mnt_flag & MNT_ASYNC) ? MNT_NOWAIT : MNT_WAIT, p);

MNT_ASYNC is 0x40, and mnt_flag looks to be about 0x18 offset in the
mount structure.

Looks like maybe someone trying to fsync something that's not a file,
although a quick test here doesn't indicate that.  Are non-file items
supposed to have valid v_mount pointers?

Other places in the kernel that look at vp->v_mount often check it
against zero first; should that be done here, eg.

	(vp->v_mount && (vp->v_mount->mnt_flag & MNT_ASYNC)) ? MNT_NOWAIT...

as well?

This looks like it might have been overlooked when the async filesystem
stuff came in, as old versions of this code read :

	error = VOP_FSYNC(vp, fp->f_cred, MNT_WAIT, p);

Suggestions?  Jonathan, can you try the above and see if it cures your
problem?

-- 
]] Mike Smith, Software Engineer        msmith@gsoft.com.au             [[
]] Genesis Software                     genesis@gsoft.com.au            [[
]] High-speed data acquisition and      (GSM mobile)     0411-222-496   [[
]] realtime instrument control.         (ph)          +61-8-8267-3493   [[
]] Unix hardware collector.             "Where are your PEZ?" The Tick  [[
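
For anyone wanting to poke at the suggested guard outside the kernel, here is a rough userland sketch of it. Everything named sketch_* or SKETCH_* is a stand-in invented for this example: the structures are not the kernel's, SKETCH_MNT_WAIT and SKETCH_MNT_NOWAIT are assumed placeholder values, and only the 0x40 for the async flag comes from the message above. It shows the shape of the check and is not a tested kernel patch.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MNT_ASYNC	0x40	/* value quoted in the message */
#define SKETCH_MNT_WAIT		1	/* placeholder value (assumption) */
#define SKETCH_MNT_NOWAIT	2	/* placeholder value (assumption) */

/* Hypothetical stand-ins for the kernel's mount and vnode structures. */
struct sketch_mount {
	uint32_t pad[6];
	uint32_t mnt_flag;
};

struct sketch_vnode {
	struct sketch_mount *v_mount;
};

/*
 * Pick the wait mode for the sync.  The v_mount test comes first, so a
 * vnode with no mount falls back to the old always-wait behaviour
 * instead of reading mnt_flag through a null pointer.
 */
static int
sketch_fsync_waitfor(const struct sketch_vnode *vp)
{
	if (vp->v_mount != NULL &&
	    (vp->v_mount->mnt_flag & SKETCH_MNT_ASYNC) != 0)
		return (SKETCH_MNT_NOWAIT);
	return (SKETCH_MNT_WAIT);
}
```

Falling back to the wait mode when v_mount is null matches what the older code did unconditionally, which seems the conservative choice.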