Date: Wed, 20 Mar 2002 02:20:39 -0500
From: Mike Nowlin <mike@argos.org>
To: freebsd-stable@freebsd.org
Subject: 4.5-S crashing like clockwork
Message-ID: <20020320022039.A2315@argos.org>
I have a pair of 1GHz Duron systems, identical except for hard drive size - they were installed from the same 4.3-R CD around Feb 20, and both upgraded to 4.5-S on Feb 23. All was good until around March 7, when I cvs'd one of them to 4.5-S as of that date, rebuilt everything, and rebooted.

The one with the 40G drive in it has been running just fine - no crashes. The one that I updated on March 7 (60G drive in it) decided to adopt a time bomb mentality - it crashes like clockwork, literally, every 24 hours. (Actually, it's usually 23:58:00 or somewhere around there - floats around a minute or so - I'm guessing that might have something to do with the amount of time the fscks take.)

My first guess was the BIOS power management. Took a drive over to the colo and compared the two machines - identical settings on each. Just to be safe, I turned off as much of the power management as I could on the crashing machine (these are the new "don't let the guy shut it off completely" breed), and made sure that none of the BIOS timers were set anywhere around 24 hours - set them as low as possible (5 min here, 1 min there) in hopes that it might help track the problem down... Unfortunately, it still blows up every 24h. (BTW: this is 24 hours after a cold reboot, "shutdown -r now", or a crash - doesn't matter what time of day it starts.)

Since Mar 7, I've cvs'd and rebuilt every few days, hoping that it will go away.... No luck yet... :(

The reason I mention the drive sizes is the process that is causing the crash - it's always the syncer:

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0xd80fad5c
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc0180258
stack pointer           = 0x10:0xccbe4f34
frame pointer           = 0x10:0xccbe4f58
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 6 (syncer)
interrupt mask          = none
trap number             = 12
panic: page fault

syncing disks... panic: lockmgr: pid 0, not exclusive lock holder 663289855 unlocking
Uptime: 23h57m10s

...that "panic: lockmgr: pid 0,..." message was new tonight - never showed up before. I'm ignoring it for now, unless there's some reason that I shouldn't.

kgdb backtrace looks like:

(kgdb) bt
#0  0xc0151dde in dumpsys ()
#1  0xc0151baf in boot ()
#2  0xc0151fd4 in poweroff_wait ()
#3  0xc014c5e8 in lockmgr ()
#4  0xc017d776 in vfs_unbusy ()
#5  0xc0180eea in sync ()
#6  0xc015194a in boot ()
#7  0xc0151fd4 in poweroff_wait ()
#8  0xc026085e in trap_fatal ()
#9  0xc0260531 in trap_pfault ()
#10 0xc026011b in trap ()
#11 0xc0180258 in sync_fsync ()
#12 0xc017e8e7 in sched_sync ()
(kgdb)

I just compiled the kernel with -g a little while ago, and am waiting for 00:50 EST tomorrow (when it hurls once more) to get a little more info.

Any ideas?

Thanks - Mike

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message