Date: Wed, 25 Mar 2009 12:44:27 +0100
From: Marius Strobl <marius@alchemy.franken.de>
To: zenxyzzy <zenxyzzy@gmail.com>
Cc: freebsd-sparc64@freebsd.org
Subject: Re: US-III crashes on current
Message-ID: <20090325114426.GA74306@alchemy.franken.de>
In-Reply-To: <bc4edd860903221730p584dc13s5aff941ae3515b60@mail.gmail.com>
References: <bc4edd860903221730p584dc13s5aff941ae3515b60@mail.gmail.com>
On Sun, Mar 22, 2009 at 07:30:28PM -0500, zenxyzzy wrote:
> I've been tinkering with my sunblade 1000 for some time, and have run
> pretty much all the os's available on it
> it's really cool to have those easily swappable fiber channel root disks...
>
> configuration is pretty phat, with 2x900, 4G, 2x73G FC, 500G sata on a
> shoehorned in internal bay, and 1 scsi dvd-rom
> and 1 ide dvd burner, 2x creator 3d UPA, 1 belkin usb2 card, a promise
> 4 drive ide card, and a cheap sil3512 sata card.
>
> anyhow, I was tickled pink when 8.0-20090111-SNAP showed up a while
> back, and it runs well, with a zfs root, even.
> some caveats:
>
> 1) the fans run all the time.

As long as there's no driver to control the fans based on the
temperature, this is what's expected. If you'd like to give writing a
driver a try, OpenSolaris contains the source for such a daemon. Both
Linux and OpenBSD also have a driver for this. The latter might or
might not be a viable start for a FreeBSD one, depending on whether it
can be untangled from their sensors framework and other stuff which
does not and should not exist in FreeBSD. In any case I'd highly
suggest verifying that it does the same as OpenSolaris does in order
not to risk overheating.

> 2) halt consistently panic's the machine.
> quite benign, if you think about it:
>
> panic: trap: fast data access mmu miss
> cpuid = 0
> KDB: enter: panic
> [thread pid 1402 tid 100148 ]
> Stopped at kdb_enter+0x80: ta %xcc, 1
> db> where
> Tracing pid 1402 tid 100148 td 0xfffff8000448a700
> panic() at panic+0x20c
> trap() at trap+0x4d0
> -- fast data access mmu miss tar=0x14543da000 %o7=0xc034c96c --
> callout_lock() at callout_lock+0x40
> untimeout() at untimeout+0xc
> isp_done() at isp_done+0x140
> isp_intr() at isp_intr+0x3eb8
> isp_poll() at isp_poll+0x38
> xpt_polled_action() at xpt_polled_action+0xc8
> dashutdown() at dashutdown+0x16c
> boot() at boot+0x858
> reboot() at reboot+0x64
> syscall() at syscall+0x2e8
> -- syscall (55, FreeBSD ELF64, reboot) %o7=0x1013e4 --
> userland() at 0x4056af08
> user trace: trap %o7=0x1013e4
> pc 0x4056af08, sp 0x7fdffffe261
> pc 0x100df0, sp 0x7fdffffe321
> pc 0x402066f4, sp 0x7fdffffe3e1

IIRC, this was recently already (correctly) reported to scsi@. I for
one haven't had time to investigate it so far, though.

>
> 3) no X

X generally works fine with Creator3D cards on pre-USIII machines, so
it shouldn't be that hard to get it to work with B{1,2}000 as well.
Due to 1), using these as workstations currently isn't realistic, so I
haven't looked into this so far. Currently the bigger problem here
probably is that, like every X.Org update so far, 7.4 has caused
severe breakage for sparc64 which has yet to be fixed.

> 4) no sound

The sound chip integrated in B{1,2}000 should work fine with
snd_audiocs(4).

> 5) annoying lock order reversals.

I haven't seen any sparc64-specific LOR with 8.0-CURRENT so far, not
even one that doesn't also happen on amd64 and i386 (there hardly will
be), i.e. they're a general FreeBSD problem.

> 6) under extreme loads (load av == 10) possibly a hang or two.
>

I've pretty much stressed FreeBSD on USIII, USIII+ and USIIIi machines
without seeing such hangs, at least not with the in-tree source. I'm
also not using things like SIL controllers or ZFS, though. Prior to
r190374, opensolaris.ko, which zfs.ko in turn depends on, was
incorrectly built to use emulated atomic operations. As zfs.ko already
used real ones, this means that things weren't necessarily atomic
across opensolaris.ko and zfs.ko, which could lead to all kinds of
funny things. Without detailed information these hangs could be caused
by anything, including hardware bugs, which USIII+ are really good at.

> so, since I want to contribute some data, I build a kernel from the
> SNAP's source, and it works just as well, even with
> the lock instrumentation removed.
>
> so, I pull a current source tree and build it. oops. no go. 1100+
> files changed in those 2 months; how to find the culprit?
>
> it panics long before probing devices, using the generic config file:
>
> BOOM:
>
> Hit [Enter] to boot immediately, or any other key for command prompt.
> Booting [/boot/kernel/kernel]...
> jumping to kernel entry at 0xc0080000.
> GDB: no debug ports present
> KDB: debugger backends: ddb
> KDB: current backend: ddb
> Copyright (c) 1992-2009 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
> The Regents of the University of California. All rights reserved.
> FreeBSD is a registered trademark of The FreeBSD Foundation.
> FreeBSD 8.0-CURRENT #0: Sun Mar 22 09:47:54 CDT 2009
> root@ra.zen-room.org:/usr/src/sys/sparc64/compile/SAFE
> WARNING: WITNESS option enabled, expect reduced performance.
> real memory = 4294967296 (4096 MB)
> panic: vm_phys_paddr_to_vm_page: paddr 0xfd81a000 is not in any segment
> cpuid = 0
> KDB: enter: panic
> [thread pid 0 tid 0 ]
> Stopped at kdb_enter+0x80: ta %xcc, 1
> db> where
> Tracing pid 0 tid 0 td 0xc08ad670
> panic() at panic+0x20c
> vm_phys_paddr_to_vm_page() at vm_phys_paddr_to_vm_page+0x84
> pmap_remove_tte() at pmap_remove_tte+0x80
> pmap_enter_locked() at pmap_enter_locked+0x204
> pmap_enter() at pmap_enter+0x64
> vm_fault() at vm_fault+0x17ac
> vm_fault_wire() at vm_fault_wire+0x3c
> vm_map_wire() at vm_map_wire+0x26c
> kmem_alloc() at kmem_alloc+0x1b4
> vm_ksubmap_init() at vm_ksubmap_init+0x74
> cpu_startup() at cpu_startup+0xc4
> mi_startup() at mi_startup+0x18c
> btext() at btext+0x30
>
> anybody got any better source than 8.0-20090111-SNAP? Those 1100 file
> changes look pretty daunting.

The brute-force way would be to do a binary search. This somehow
doesn't smell like a new problem, though, but rather like something
you just happen to trigger now; e.g. by initially loading a larger
kernel, then unloading it and booting one that takes up fewer TLB
slots, one can provoke a similar panic. Unfortunately the information
you provided is rather limited and I can't reproduce this problem with
current sources. Did you (un)load any kernels or modules prior to this
snippet, what is the size of the kernel and pre-loaded modules (if
any), and do you use any special kernel or loader options (for ZFS
maybe)? Please also provide the output when booting this kernel and
modules with a loader built with LOADER_DEBUG defined.

Marius
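
For reference, the binary search over revisions mentioned above can be
sketched roughly as follows. This is only an illustration in Python,
not a tool from the thread: the function name first_bad_revision, the
is_bad callback and the revision numbers in the demo are all
hypothetical, and it assumes that the breakage, once introduced, stays
in every later revision. In practice is_bad would mean checking out
that revision, building the kernel and seeing whether it panics at
boot.

```python
def first_bad_revision(good, bad, is_bad):
    """Return the lowest revision in (good, bad] for which is_bad() is true.

    Assumptions (hypothetical helper, not part of any FreeBSD tooling):
    - 'good' is a revision known to boot, 'bad' one known to panic;
    - every revision after the culprit is also bad.
    """
    if good >= bad:
        raise ValueError("good revision must be older than bad revision")
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(mid):
            bad = mid    # culprit is mid or something older
        else:
            good = mid   # culprit is newer than mid
    return bad


if __name__ == "__main__":
    # Made-up numbers: pretend revision 1733 introduced the panic.
    culprit = 1733
    print(first_bad_revision(1000, 2100, lambda rev: rev >= culprit))  # prints 1733
```

Each round halves the remaining range, so the number of kernels to
build and boot grows only with the logarithm of the number of
candidate revisions.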