Date: Thu, 17 Jul 2008 00:14:16 +0200
From: Roland Smith <rsmith@xs4all.nl>
To: Jo Rhett <hostmaster@netconsonance.com>
Cc: FreeBSD Stable <freebsd-stable@freebsd.org>
Subject: Re: how to get more logging from GEOM?
Message-ID: <20080716221416.GA39265@slackbox.xs4all.nl>
In-Reply-To: <6AA8BC91-AF84-4CC7-B6BE-4CA84D82EC1E@netconsonance.com>
References: <C278655C-4FFB-4A8E-9501-2B84283E324D@netconsonance.com> <20080711155831.GA72963@slackbox.xs4all.nl> <6AA8BC91-AF84-4CC7-B6BE-4CA84D82EC1E@netconsonance.com>

On Wed, Jul 16, 2008 at 02:41:28PM -0700, Jo Rhett wrote:
> On Jul 11, 2008, at 8:58 AM, Roland Smith wrote:
> >> After about 2 weeks of watching it carefully I've learned almost
> >> nothing. It's not a disk failure (AFAIK) it's not cpu overheat (now
> >> running healthd without complaints) it's not based on any given
> >> network traffic... however it does appear to accompany heavy cpu/disk
> >> activity. It usually dies when indexing my websites at night (but not
> >> always) and it sometimes dies when compiling programs. Just heavy
> >> disk isn't enough to do the job, as backups proceed without
> >> problems. Heavy cpu by itself isn't enough to do it either. But if
> >> I start compiling things and keep going a while, it will eventually
> >> hang.
> >
> >> Is there anything else I should be looking at?
> >
> > Power supply or motherboard would be my first guess.
>
> If the system went offline, I agree. But it's clearly a kernel
> deadlock, since the system remains pingable, answers TCP connections,
> etc. etc.... but doesn't respond.

Ah. Well, you did say the system 'dies', not 'becomes unresponsive'.

> No TCP negotiation, no response on
> the console, etc. It's higher level activity which isn't working...

Try compiling a kernel with debugging options, e.g. WITNESS(4),
MUTEX_DEBUG, LOCK_PROFILING, DIAGNOSTIC and INVARIANTS. See
/usr/src/sys/conf/NOTES. This will create a lot of messages in the dmesg
output.

If you can hook the system up to another machine via serial console, you
might be able to debug the kernel. Read the kernel debugging chapter in
the Developers' Handbook.

Another tip is to create a cron job that makes log entries every couple
of minutes with logger. This might help you pinpoint the exact time of
the mishap, so you can correlate it with other system activity.
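
To make the cron tip concrete, here is a rough sketch of such a heartbeat. The script name, its install path, the "heartbeat" tag and the two-minute interval are my own assumptions, not anything from your setup; adjust to taste. The script only prints one line with the output of uptime(1), and cron pipes that line into logger(1), so the last entry in the syslog before the gap tells you roughly when the hang started and what the load was.

```shell
#!/bin/sh
# heartbeat.sh (hypothetical name): print one status line meant for syslog.
# The script itself only writes to stdout; cron pipes it into logger(1).

heartbeat_message() {
    # uptime(1) includes the load averages; use a marker if it is unavailable.
    load=$(uptime 2>/dev/null) || load="uptime unavailable"
    printf 'heartbeat: %s\n' "$load"
}

heartbeat_message

# Example crontab entry, every 2 minutes (the path is an assumption):
# */2 * * * * /usr/local/sbin/heartbeat.sh | logger -t heartbeat -p user.notice
```

With no message argument, logger reads its message from standard input, which is why the pipe works. After a hang, grep the syslog for "heartbeat" and look at the timestamp of the last line before the reboot.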

Be _really_ sure that it isn't hardware though. Otherwise you'll be led
on a merry goose chase looking for software errors that aren't there. If
you can restore a backup of this machine's software to a similar one, do
so and see if the hangs persist. If they don't, it's hardware.

Roland
-- 
R.F.Smith                                   http://www.xs4all.nl/~rsmith/
[plain text _non-HTML_ PGP/GnuPG encrypted/signed email much appreciated]
pgp: 1A2B 477F 9970 BA3C 2914  B7CE 1277 EFB0 C321 A725 (KeyID: C321A725)