From owner-freebsd-current  Thu Oct 22 09:53:53 1998
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id JAA20541
          for freebsd-current-outgoing; Thu, 22 Oct 1998 09:53:53 -0700 (PDT)
          (envelope-from owner-freebsd-current@FreeBSD.ORG)
Received: from panzer.plutotech.com (panzer.plutotech.com [206.168.67.125])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id JAA20528
          for <FreeBSD-current@FreeBSD.ORG>; Thu, 22 Oct 1998 09:53:50 -0700 (PDT)
          (envelope-from ken@panzer.plutotech.com)
Received: (from ken@localhost)
          by panzer.plutotech.com (8.9.1/8.8.5) id KAA16575;
          Thu, 22 Oct 1998 10:53:15 -0600 (MDT)
From: "Kenneth D. Merry" <ken@plutotech.com>
Message-Id: <199810221653.KAA16575@panzer.plutotech.com>
Subject: Re: cdda2wav == panic (/sys/vm/vm_page.c:516)
In-Reply-To: <19981021224539.A10190@znh.org> from Zach Heilig at "Oct 21, 98 10:45:39 pm"
To: zach@gaffaneys.com (Zach Heilig)
Date: Thu, 22 Oct 1998 10:53:15 -0600 (MDT)
Cc: FreeBSD-current@FreeBSD.ORG, dg@root.com
X-Mailer: ELM [version 2.4ME+ PL28s (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Zach Heilig wrote...
> This is an ELF kernel compiled from sources cvsup-ed ~01:50 GMT (Oct 22)
> 
> Relevant hardware:
> ncr0: <ncr 53c875j fast20 wide scsi> rev 0x04 int a irq 11 on pci0.13.0
> cd0 at ncr0 bus 0 target 5 lun 0
> cd0: <MATSHITA CD-R   CW-7502 4.10> Removable CD-ROM SCSI2 device
> cd0: 10.0MB/s transfers (10.0MHz, offset 8)
> cd0: cd present [176612 x 2048 byte records]
> 
> The ncr0 is a diamond fireport 40, and the cdr is the only device on that bus
> (it is in an external case, with a terminator plugged into the passthrough
> connector).  It works very well burning audio/data tracks and reading data
> tracks.
> 
> I noticed this earlier today with a kernel from Oct 10.  The panic with an up
> to date kernel is different from the Oct 10 kernel.  That kernel would usually
> wait until cdda2wav exited before panic'ing (complaining about dirty pages --
> the last 5-10 megs or so of the track would be zero's after reboot), today's
> kernel panics 8-10 Mbytes into the track (at least the 3 times I tried to
> read an audio track).
> 
> stack trace:

[ ... ]

> #0  boot (howto=256) at ../../kern/kern_shutdown.c:268
> 268			dumppcb.pcb_cr3 = rcr3();
> (kgdb) where
> #0  boot (howto=256) at ../../kern/kern_shutdown.c:268
> #1  0xf01478bc in at_shutdown (function=0xf022901e <db_panic_cmd+22>, 
>     arg=0xf48daba8, queue=-267228095) at ../../kern/kern_shutdown.c:430
> #2  0xf0126ca1 in db_panic (addr=-266451744, have_addr=0, count=-1, 
>     modif=0xf48dab30 "") at ../../ddb/db_command.c:432
> #3  0xf0126c41 in db_command (last_cmdp=0xf0246804, cmd_table=0xf0246664, 
>     aux_cmd_tablep=0xf025c4e4) at ../../ddb/db_command.c:332
> #4  0xf0126d06 in db_command_loop () at ../../ddb/db_command.c:454
> #5  0xf0129067 in db_trap (type=12, code=0) at ../../ddb/db_trap.c:71
> #6  0xf01ec719 in kdb_trap (type=12, code=0, regs=0xf48dac70)
>     at ../../i386/i386/db_interface.c:157
> #7  0xf01f6d93 in trap_fatal (frame=0xf48dac70) at ../../i386/i386/trap.c:874
> #8  0xf01f6a84 in trap_pfault (frame=0xf48dac70, usermode=0)
>     at ../../i386/i386/trap.c:772
> #9  0xf01f66d7 in trap (frame={tf_es = 16, tf_ds = 16, tf_edi = -236365800, 
>       tf_esi = 8550, tf_ebp = -192041804, tf_isp = -192041832, tf_ebx = 51390, 
>       tf_edx = 65470, tf_ecx = -192026224, tf_eax = -264085512, 
>       tf_trapno = 12, tf_err = 0, tf_eip = -266451744, tf_cs = 8, 
>       tf_eflags = 66054, tf_esp = 0, tf_ss = 0}) at ../../i386/i386/trap.c:396
> #10 0xf01e44e0 in vm_page_lookup (object=0xf48de990, pindex=8550)
>     at ../../vm/vm_page.c:516
> #11 0xf01630c7 in allocbuf (bp=0xf1e95818, size=8192)
>     at ../../kern/vfs_bio.c:1782
> #12 0xf0162cb2 in getblk (vp=0xf48a82c0, blkno=4275, size=8192, slpflag=0, 
>     slptimeo=0) at ../../kern/vfs_bio.c:1557
> #13 0xf01cb09f in ffs_balloc (ap=0xf48dae98) at ../../ufs/ffs/ffs_balloc.c:297
> #14 0xf01d38a4 in ffs_write (ap=0xf48daeec) at vnode_if.h:1015
> #15 0xf016dc17 in vn_write (fp=0xf0b28640, uio=0xf48daf30, cred=0xf0a59b00)
>     at vnode_if.h:331
> #16 0xf014f9a2 in write (p=0xf4834e00, uap=0xf48daf84)
>     at ../../kern/sys_generic.c:270
> #17 0xf01f7017 in syscall (frame={tf_es = 39, tf_ds = 39, tf_edi = 805601292, 
>       tf_esi = 805601292, tf_ebp = 805730652, tf_isp = -192041004, 
>       tf_ebx = 129360, tf_edx = 805601292, tf_ecx = 0, tf_eax = 4, 
>       tf_trapno = 7, tf_err = 2, tf_eip = 671874136, tf_cs = 31, 
>       tf_eflags = 582, tf_esp = -272640664, tf_ss = 39})
>     at ../../i386/i386/trap.c:1031
> #18 0xf01ed06c in Xint0x80_syscall ()
> (kgdb) 


This is a known problem.  Daniel O'Connor first reported it with 2.2.7 and
CAM; see PR kern/8112.  I was also able to reproduce the problem under
-current/CAM last month, but I haven't looked at it since.

Here's the stack trace from my panic last month (Sept. 8th):

==================================================================
login: vm_page_free: pindex(63), busy(0), PG_BUSY(1), hold(9)
panic: vm_page_free: freeing busy page
mp_lock = 01000001; cpuid = 1; lapic.id = 00000000
Debugger("panic")
Stopped at      _Debugger+0x35: movb    $0,_in_Debugger.98
db> trace
_Debugger(f0134343) at _Debugger+0x35
_panic(f01f17ff,f054241c,80000000,f83f3ea8,f01f19b0) at _panic+0x8d
_vm_page_freechk_and_unqueue(f054241c) at _vm_page_freechk_and_unqueue+0x6e
_vm_page_free(f054241c,f8976220,0,f83f3ed4,f01eeffd) at _vm_page_free+0x1c
_vm_object_terminate(f8976220,f4dd85f0,f091e220,f189e440,f83f3ee8) at _vm_object_terminate+0xb7
_vm_object_deallocate(f8976220,f4dd85f0,0,f83f3f00,f01423fe) at _vm_object_deallocate+0x1c9
_shm_deallocate_segment(f4dd85f0,f189e440,0,f8344cc0,f83f3f1c) at _shm_deallocate_segment+0x12
_shm_delete_mapping(f8344cc0,f189e440) at _shm_delete_mapping+0x6e
_shmexit(f8344cc0) at _shmexit+0x29
_exit1(f8344cc0,0,f83f3fb4,f020dbdf,f8344cc0) at _exit1+0x1bc
_exit(f8344cc0,f83f3f94,200c2060,ffffffff,0) at _exit+0x14
_syscall(27,27,0,ffffffff,efbfd2f4) at _syscall+0x187
_Xsyscall() at _Xsyscall+0x55
--- syscall 0x1, eip = 0x200b1c4d, esp = 0xefbfd2e0, ebp = 0xefbfd2f4 ---
db> panic
panic: from debugger
mp_lock = 01000002; cpuid = 1; lapic.id = 00000000
boot() called on cpu#1
==================================================================

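Your panic looks somewhat different from the one I saw.  Yours is a page
fault in vm_page_lookup() on the write path, while mine was a "freeing busy
page" panic in vm_page_free() during shared memory teardown at exit.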

Daniel O'Connor was able to work around this by hacking cdda2wav so that it
didn't remove its shared memory segments.  However, he got the same panic
later when he tried to remove the shared memory segments by hand.  From your
later mail, it looks like you've found other ways to work around it.

If I knew what the problem was, I probably would have fixed it by now. :)
I think it will take someone knowledgeable about the VM system to fix this,
so I'm CCing David.  :)

The CAM passthrough driver uses vmapbuf() and vunmapbuf() (via
cam_periph_mapmem()) to map user data buffers into kernel virtual memory
and to unmap them again afterwards.  My guess is that this, in combination
with cdda2wav's shared memory usage, exposes some VM bug.
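
To make the shared memory half of that guess concrete, here is a minimal
userland sketch of the lifecycle of a SysV shared memory segment.  It is an
illustration only, not code from cdda2wav: the function name
shm_lifecycle() is hypothetical, and the sketch never touches the
passthrough driver, so it is not expected to reproduce the panic by itself.

```c
/*
 * Hypothetical sketch: shm_lifecycle() is an illustrative name and does
 * not come from cdda2wav or the FreeBSD sources.
 */
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/*
 * Create a private SysV shared memory segment, attach it, fill it,
 * then detach and remove it.  Returns 0 on success, -1 on failure.
 */
int
shm_lifecycle(size_t len)
{
	int id;
	void *buf;

	/* Allocate a new segment visible only through the returned id. */
	id = shmget(IPC_PRIVATE, len, IPC_CREAT | 0600);
	if (id == -1)
		return (-1);

	/* Map the segment into this process's address space. */
	buf = shmat(id, NULL, 0);
	if (buf == (void *)-1) {
		(void)shmctl(id, IPC_RMID, NULL);
		return (-1);
	}

	/*
	 * A real program would pass this buffer to the device as an I/O
	 * target.  Here we only touch the pages (assumption: any write
	 * will do for illustration).
	 */
	memset(buf, 0xA5, len);

	/* Unmap the segment from this process. */
	if (shmdt(buf) == -1) {
		(void)shmctl(id, IPC_RMID, NULL);
		return (-1);
	}

	/* Remove the segment so the kernel can free its pages. */
	if (shmctl(id, IPC_RMID, NULL) == -1)
		return (-1);

	return (0);
}
```

Calling shm_lifecycle(4096) should return 0 on a system with SysV shared
memory enabled.  The teardown at the end is roughly the operation the
shm_* frames in my September trace were performing, although there it was
triggered by process exit rather than by an explicit detach.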

Anyway, hopefully someone can shed some light on this.

Ken
-- 
Kenneth Merry
ken@plutotech.com
