From owner-freebsd-current@FreeBSD.ORG Mon Jan 5 02:37:08 2004 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1F64516A4CE; Mon, 5 Jan 2004 02:37:08 -0800 (PST) Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16]) by mx1.FreeBSD.org (Postfix) with ESMTP id 6257E43D1F; Mon, 5 Jan 2004 02:37:03 -0800 (PST) (envelope-from bde@zeta.org.au) Received: from gamplex.bde.org (katana.zip.com.au [61.8.7.246]) by mailman.zeta.org.au (8.9.3p2/8.8.7) with ESMTP id VAA12935; Mon, 5 Jan 2004 21:36:55 +1100 Date: Mon, 5 Jan 2004 21:36:54 +1100 (EST) From: Bruce Evans X-X-Sender: bde@gamplex.bde.org To: Stefan Ehmann In-Reply-To: <1073256379.801.24.camel@shoeserv.freebsd> Message-ID: <20040105202928.Q4703@gamplex.bde.org> References: <200401042224.i04MOP7E009799@gw.catspoiler.org> <1073256379.801.24.camel@shoeserv.freebsd> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: Don Lewis cc: current@freebsd.org Subject: Re: page fault panic tracked down (selwakeuppri()) X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 05 Jan 2004 10:37:08 -0000 On Sun, 4 Jan 2004, Stefan Ehmann wrote: > For the first time I got a backtrace that ended in the soundcard module > - So maybe this is the right direction (on the other hand this might be > some newly introduced error) > > panic: bad bufsize > #0 doadump () at /usr/src/sys/kern/kern_shutdown.c:240 > #1 0xc04e5198 in boot (howto=256) at > /usr/src/sys/kern/kern_shutdown.c:372 > #2 0xc04e5527 in panic () at /usr/src/sys/kern/kern_shutdown.c:550 > #3 0xc07ec648 in feed_vchan_s16 () from /boot/kernel/snd_pcm.ko > #4 0xc07e2c6d in sndbuf_feed () from /boot/kernel/snd_pcm.ko > #5 0xc07e3225 in chn_wrfeed () from /boot/kernel/snd_pcm.ko > #6 0xc07e327c in chn_wrintr () from /boot/kernel/snd_pcm.ko > #7 0xc07e3990 in chn_intr () from /boot/kernel/snd_pcm.ko > #8 0xc07fca2f in csa_intr () from /boot/kernel/snd_csa.ko > #9 0xc07fb724 in csa_intr () from /boot/kernel/snd_csa.ko > #10 0xc04d1692 in ithread_loop (arg=0xc1737b00) > at /usr/src/sys/kern/kern_intr.c:544 > #11 0xc04d0684 in fork_exit (callout=0xc04d1500 , arg=0x0, > frame=0x0) at /usr/src/sys/kern/kern_fork.c:796 The sound driver is in modules. Have you ruled out having an inconsistent module. > Nearly all other panics (I put a collection of ~20 backtraces on > http://stud4.tuwien.ac.at/~e0125637/fbsd/ if it's of any interest) ended > up in file system code. The args in many of these don't make sense. E.g., in bt.13: % ... % #15 0xc06743f8 in calltrap () at {standard input}:94 % #16 0xc0502a34 in lockmgr (lkp=0xc3a91028, flags=0, interlkp=0xe4, % td=0xc06bfc68) at /usr/src/sys/kern/kern_lock.c:228 % #17 0xc0566c67 in vfs_busy (mp=0x0, flags=0, interlkp=0x0, td=0x0) % at /usr/src/sys/kern/vfs_subr.c:527 % #18 0xc056362c in lookup (ndp=0xd8cd5bdc) at /usr/src/sys/kern/vfs_lookup.c:559 % ... mp=0x0 in #17 can't happen, and is inconsistent with lkp!=0 in #16 (since lkp = &mp->mnt_lock). The panic in line 228 in #16 is from lkp being garbage, so apparently it is not &mp->mnt_lock for a correct mp either. flags=0 and interlkp=0x0 in #17 are correct. td=0x0 is inconsistent with td!=0 in #16 (I think the latter is correct). flags=0 in #16 is correct. interlkp=0xe4 in #16 is inconsistent with interlkp=0 in #17. Compiling without optimization might simplify debugging, but I thought that gcc doesn't clobber stack args even with optimization. bt.34 has an example of a garbage arg the is unlikely to be from mis-debugging it since it is in 2 stack frames: % ... % #3 0xc0560029 in getblk (vp=0xc42ec6d8, blkno=16, size=7537385, slpflag=0, ^^^^^^^^^^^^ % slptimeo=0, flags=0) at /usr/src/sys/kern/vfs_bio.c:2415 % #4 0xc05650bd in cluster_read (vp=0xc42ec6d8, filesize=3675636, lblkno=16, % size=7537385, cred=0x0, totread=32768, seqcount=0, bpp=0x0) ^^^^^^^^^^^^ % at /usr/src/sys/kern/vfs_cluster.c:127 % #5 0xc419efa7 in ext2_read (ap=0xd8d17bcc) % at @/gnu/ext2fs/ext2_readwrite.c:115 % ... `size' is the ext2fs block size. It should be either 1K or 4K for ext2fs. So the size, or more likely the pointer to it, is corrupt in ext2_read() here: % size = BLKSIZE(fs, ip, lbn); ^^^^^^^^^^^^^^^^^^^^^^^^^^^ % blkoffset = blkoff(fs, uio->uio_offset); % % xfersize = fs->s_frag_size - blkoffset; % if (uio->uio_resid < xfersize) % xfersize = uio->uio_resid; % if (bytesinfile < xfersize) % xfersize = bytesinfile; % % if (lblktosize(fs, nextlbn) >= ip->i_size) % error = bread(vp, lbn, size, NOCRED, &bp); % else if ((vp->v_mount->mnt_flag & MNT_NOCLUSTERR) == 0) % error = cluster_read(vp, % ip->i_size, lbn, size, NOCRED, ^^^^ % uio->uio_resid, (ap->a_ioflag >> 16), &bp); BLKSIZE(fs, ip, lbn) depends mainly on ip (fs is ip->FS), so it seems that we have a corrupt ip here. This is consistent with some of the panics being nearer to inode allocation. *_vget() has delicate locking which has caused problems before. I just remembered than ext2_vget() and other *_vget() (nfs...) are missing the last round of locking changes in ffs_vget() (ffs_vfsops.c 1.181). But these changes are only claimed to be simplifications and optimizationns. I can't see any bugs in the locking in ext2_vget() except that stuff like the following makes too many assumptions: % /* % * Lock out the creation of new entries in the FFS hash table in % * case getnewvnode() or MALLOC() blocks, otherwise a duplicate % * may occur! % */ % if (ext2fs_inode_hash_lock) { % while (ext2fs_inode_hash_lock) { % ext2fs_inode_hash_lock = -1; % tsleep(&ext2fs_inode_hash_lock, PVM, "e2vget", 0); % } % goto restart; % } % ext2fs_inode_hash_lock = 1; This assumes that the kernel is non-preemptible or that something else prevents concurrent non-atomic test-and-set of ex2fs_inode_hash_lock. I think Giant locking now prevents concurrent access, and anyway, concurrent access is too unlikely to give so many panics if the bug is here. Bruce