Date: Mon, 12 Dec 2005 18:57:24 -0800
From: Atanas <atanas@asd.aplus.net>
To: Peter Jeremy <PeterJeremy@optushome.com.au>
Cc: freebsd-stable@freebsd.org
Subject: Re: 6.0 random freezes
Message-ID: <439E3894.6060901@asd.aplus.net>
In-Reply-To: <20051212214003.GA77268@cirb503493.alcatel.com.au>
References: <439DE88B.1090407@asd.aplus.net> <20051212214003.GA77268@cirb503493.alcatel.com.au>
Peter Jeremy said the following on 12/12/05 13:40:
>
> Define "freezing": Does it respond to pings? Can you switch VTYs?
> Do the num-lock/caps-lock LEDs respond? Do some processes seem to
> freeze before others?
>
I used the word "freeze" instead of "crash" because the latter is often associated with errors reported by the kernel in the system logs or on the console. In this case there are absolutely no error messages. I also have remote logging enabled (to another machine over the network), but there's nothing there either.

When it happens, the server appears to respond to pings for the first few minutes, but then everything stays down until I go to the data center. When I plug in a keyboard, there's no response at all: no LEDs, no VTYs, no Ctrl-Alt-Esc, etc. You might think of "hint.atkbd.0.flags" not being set properly, but it's right (i.e. unchanged, it appears to default to that on i386 5.x+) and other machines with an identical configuration do accept a keyboard.

I have no information about processes. The only thing I have is a real-time CPU load graph. I have a script tailing the output of a "vmstat cpu 15" and drawing a graph with user/system/idle times, and according to that graph there are no load spikes or unusual variations before the crashes. The usual user/system/idle percentages look like 10/7/83.

> I suggest you add the following to your kernel config:
> options KDB # Enable kernel debugger support.
> options DDB # Support DDB.
>
I just set these along with the DEBUG option below, and got the new kernel (from 6.0-RELEASE sources dated Dec 9) running on both machines, so we'll see.

> When it hangs, break into DDB (Ctrl-Alt-Esc on the console or BREAK on
> a serial console).
>
> As a start, run 'show lockedvnods' and 'ps'. My guess is that you'll
> see a lock that has a number of waiters - which is probably the
> culprit. Use 'panic' or 'call doadump' to get a crashdump and then
> you can use kgdb to rummage around once you reboot - see
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebg-gdb.html
>
I don't have any experience in chasing kernel bugs, so I'm not sure whether I would be able to get anything useful, but I'll try that on the next crash. But if I have no keyboard response I won't be able to save it, right?

I do not know what a serial console is and would need some time to get familiar with it. Would I get something in addition to what I can get from the standard console?

>> < makeoptions DEBUG=-g # Build kernel with gdb(1) debug symbols
>
> I suggest you add this back in. Without it, you can't debug any crash
> dumps that you manage to get (and add "dumpdev" to your rc.conf).
>
My bad. I realized that it's kind of harmless, but only weeks after I had put the box into production. It's back in now.

The "dumpdev" variable seems to default to AUTO, i.e. it tries to use the first swap device if that is bigger than the RAM (in my case it is), so I guess I don't need to touch it.

> Whilst I realise that you can't have production machines freezing on
> schedule, your assistance in providing more information about your
> problem will help make 6.x more stable.
>
Yes, I know, and I will try. Today I already had a couple of crashes (got lucky, no nasty data corruption this time), and I cannot afford for this to continue. I'm already working on the downgrade, but most likely I will have at least one of these 2 machines still running 6.x during the next day or two.

After the downgrade we could eventually set up a test bed and start hammering it with requests. The problem would be how to trigger the crash and whether we would be able to reproduce it at all.

Thanks for the prompt reply!

Regards,
Atanas
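
The load-graph script mentioned above is not included in this message. As a rough illustration only, the parsing half of such a script could look like the sketch below. It assumes FreeBSD's vmstat output, where the last three columns of each data line are the user, system and idle CPU percentages, and it assumes the sampler is started as "vmstat -w 15". The function names are hypothetical and the graphing part is left out.

```python
import subprocess


def parse_cpu_columns(line):
    """Return (user, system, idle) from one vmstat data line.

    Assumption: the last three whitespace-separated fields are the
    us/sy/id percentages. Header lines and blank lines yield None.
    """
    fields = line.split()
    if len(fields) < 3:
        return None
    try:
        user, system, idle = (int(f) for f in fields[-3:])
    except ValueError:
        return None  # header line such as "... us sy id"
    return user, system, idle


def follow_vmstat(interval=15):
    """Yield (user, system, idle) tuples from a running vmstat.

    Assumption: "vmstat -w <seconds>" prints one sample per interval.
    """
    proc = subprocess.Popen(
        ["vmstat", "-w", str(interval)],
        stdout=subprocess.PIPE,
        text=True,
    )
    for line in proc.stdout:
        cpu = parse_cpu_columns(line)
        if cpu is not None:
            yield cpu


# Example use (runs until interrupted):
#   for user, system, idle in follow_vmstat():
#       print(user, system, idle, flush=True)
```

Writing each sample to a file or a remote host as soon as it arrives means the last values before a freeze survive the reboot, which is the point of keeping such a graph.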