From owner-freebsd-sparc64@FreeBSD.ORG Mon Mar 30 19:12:55 2009
Date: Mon, 30 Mar 2009 21:12:39 +0200
From: Marius Strobl
To: Andreas Tobler
Cc: freebsd-sparc64@freebsd.org
Subject: Re: kernel panic with firewire PCI card
Message-ID: <20090330191239.GA74661@alchemy.franken.de>
References: <49CD39B7.3050500@nexus-ag.com> <20090328214138.GA93149@alchemy.franken.de> <49CFBAC6.5030809@fgznet.ch>
In-Reply-To: <49CFBAC6.5030809@fgznet.ch>
List-Id: Porting FreeBSD to the Sparc

On Sun, Mar 29, 2009 at 08:15:34PM +0200, Andreas Tobler wrote:
> Marius Strobl wrote:
> >
> >PCI AFSR 0x4000000000000000 indicates that the primary error
> >was a target abort. Given that no DMA is involved at this stage
> >this means it actually was the OHCI chip which complained
> >about the PIO access.
> >If this is the first access after the
> >reset (check with "l *(0xc0659be4)", "l *(fwphy_rddata+0xe8)"
> >and "l *(fwohci_reset+0x298)" in gdb on the corresponding
> >kernel.debug what code is actually involved) I'd suspect
> >the problem to be a combination of a sloppy driver with a
> >chip that takes some more time than the other contenders
> >to get ready again after a reset, i.e. fwohci_reset() only
> >tries 100 times with waiting one millisecond between tries
> >for OHCI_HCC_RESET to clear after the reset (the latter part
> >is in line with the OHCI specification). Increasing to f.e.
> >1000 tries should solve the panic then, if this is actually
> >the cause. Generally fwohci(4) should be changed to fail if
> >the chip doesn't become ready again after a reset instead
> >of just ignoring that problem though. At least fwohci_reset()
> >(there are probably more such functions in fwohci(4)) also
> >seems to miss some bus space barriers, which also could be
> >the cause of this panic.
>
> I increased the for counter to 1000 in fwohci_reset().
>
> while(OREAD(sc, OHCI_HCCCTL) & OHCI_HCC_RESET) {
> - if (i++ > 100) break;
> + if (i++ > 1000) break;
> DELAY(1000);
> }
>
> This did not help so far.

Okay, this was my best guess based on the information available,
sorry.

>
> I also tried to check the addresses you mentioned with l *(0xXXXX)
> But here I miss some things. I guess I need to invoke gdb somehow?

Yes, simply `gdb /path/to/kernel.debug`

You might also want to bug firewire@ and simokawa@ regarding this.

>
>
> Thanks so far, I continue playing :)
>
> Andreas
>
>
> Having the firewire debug on (hardcoded) Gives the below.
>
> You see that the rest loop terminates:
> fwohci0: resetting OHCI...done (loop=0)
>
> u60# kldload firewire
> fwohci0: mem
> 0x4008000-0x40087ff,0x400c000-0x400ff
> ff at device 4.0 on pci0
> fwohci0: latency timer 24 -> 32.
> fwohci0: cache size 16 -> 16.
> fwohci0: [ITHREAD]
> fwohci0: OHCI version 1.0 (ROM=1)
> fwohci0: No.
> of Isochronous channels is 4.
> fwohci0: EUI64 00:10:74:60:00:00:ee:a9
> fwohci0: resetting OHCI...done (loop=0)
> panic: pcib: PCI bus B error AFAR 0x1ff840080ec AFSR 0x4000f00000000000
> cpuid = 0
> KDB: stack backtrace:
> panic() at 0xc03361a8 = panic+0x1c8
> psycho_pci_bus() at 0xc0628848 = psycho_pci_bus+0x88
> intr_event_handle() at 0xc030be7c = intr_event_handle+0x5c
> intr_execute_handlers() at 0xc0638fb4 = intr_execute_handlers+0x14
> intr_fast() at 0xc0081328 = intr_fast+0x68
> -- interrupt level=0xd pil=0 %o7=0xc0372920 --
> -- data access error %o7=0xc0895000 --
> fwphy_rddata() at 0xc0ef8b60 = fwphy_rddata+0x120
> fwohci_reset() at 0xc0efb9d8 = fwohci_reset+0x1d8
> fwohci_init() at 0xc0efcb54 = fwohci_init+0x9d4
> fwohci_pci_attach() at 0xc0eff0f8 = fwohci_pci_attach+0x278
> device_attach() at 0xc0366544 = device_attach+0x4a4
> device_probe_and_attach() at 0xc03674a4 = device_probe_and_attach+0x64
> pci_driver_added() at 0xc021c054 = pci_driver_added+0x154
> devclass_driver_added() at 0xc0363bb4 = devclass_driver_added+0x74
> devclass_add_driver() at 0xc03648a8 = devclass_add_driver+0xa8
> driver_module_handler() at 0xc0365fd8 = driver_module_handler+0x58
> module_register_init() at 0xc032223c = module_register_init+0xdc
> linker_load_module() at 0xc0317994 = linker_load_module+0xbd4
> kern_kldload() at 0xc0317f58 = kern_kldload+0xb8
> kldload() at 0xc03180e0 = kldload+0x60
> syscall() at 0xc064a2f0 = syscall+0x2f0
> -- syscall (304, FreeBSD ELF64, kldload) %o7=0x1008e0 --
> userland() at 0x4045b708
> user trace: trap %o7=0x1008e0
> pc 0x4045b708, sp 0x7fdffffe1e1
> pc 0x1006f0, sp 0x7fdffffe2a1
> pc 0x40206934, sp 0x7fdffffe361
> done
> KDB: enter: panic
> [thread pid 1305 tid 100066 ]
> Stopped at 0xc036dd20 = kdb_enter+0x80: ta %xcc, 1

Marius
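
P.S. To illustrate what "fail if the chip doesn't become ready again
after a reset" could look like, here is a rough userland sketch. It is
not the fwohci(4) code: fake_chip, fake_read_hcc(), fake_delay_us(),
wait_reset_clear() and HCC_RESET_BIT are all made-up stand-ins for the
OREAD(sc, OHCI_HCCCTL) / OHCI_HCC_RESET / DELAY() pieces of the loop
quoted above, and the bit value is only assumed for illustration. The
point is merely that the poll loop reports a timeout to its caller
instead of silently breaking out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit value, for illustration only; the driver uses OHCI_HCC_RESET. */
#define HCC_RESET_BIT 0x00010000u

/* Hypothetical simulated chip: keeps the reset bit set for N more reads. */
struct fake_chip {
	int polls_until_ready;
};

/* Stand-in for OREAD(sc, OHCI_HCCCTL). */
static uint32_t
fake_read_hcc(struct fake_chip *c)
{
	if (c->polls_until_ready > 0) {
		c->polls_until_ready--;
		return (HCC_RESET_BIT);
	}
	return (0);
}

/* Stand-in for DELAY(); does nothing in this simulation. */
static void
fake_delay_us(unsigned int us)
{
	(void)us;
}

/*
 * Poll until the reset bit clears. Returns true if it cleared within
 * max_tries polls and false on timeout, so that the caller can fail
 * the attach instead of touching a chip that is not ready yet.
 * *loops receives the number of polls that still saw the bit set.
 */
static bool
wait_reset_clear(struct fake_chip *c, int max_tries, int *loops)
{
	int i = 0;

	while (fake_read_hcc(c) & HCC_RESET_BIT) {
		if (++i > max_tries) {
			*loops = i;
			return (false);
		}
		fake_delay_us(1000);
	}
	*loops = i;
	return (true);
}
```

A caller would check the return value and bail out with an error when
it is false. Note that with "loop=0" as in your log the bit is already
clear on the first read, so a timeout check alone would not have
caught this particular panic.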