From: Roland Smith
To: Jo Rhett
Cc: FreeBSD Stable
Date: Thu, 17 Jul 2008 00:14:16 +0200
Message-ID: <20080716221416.GA39265@slackbox.xs4all.nl>
In-Reply-To: <6AA8BC91-AF84-4CC7-B6BE-4CA84D82EC1E@netconsonance.com>
Subject: Re: how to get more logging from GEOM?
List-Id: Production branch of FreeBSD source code

On Wed, Jul 16, 2008 at 02:41:28PM -0700, Jo Rhett wrote:
> On Jul 11, 2008, at 8:58 AM, Roland Smith wrote:
> >> After about 2 weeks of watching it carefully I've learned almost
> >> nothing. It's not a disk failure (AFAIK), it's not CPU overheating
> >> (now running healthd without complaints), and it's not tied to any
> >> given network traffic... however, it does appear to accompany heavy
> >> cpu/disk activity. It usually dies when indexing my websites at
> >> night (but not always) and it sometimes dies when compiling
> >> programs. Just heavy disk isn't enough to do the job, as backups
> >> proceed without problems. Heavy cpu by itself isn't enough to do it
> >> either. But if I start compiling things and keep going a while, it
> >> will eventually hang.
> >
> >> Is there anything else I should be looking at?
> >
> > Power supply or motherboard would be my first guess.
>
> If the system went offline, I agree. But it's clearly a kernel
> deadlock, since the system remains pingable, answers TCP connections,
> etc. etc... but doesn't respond.

Ah. Well, you did say the system 'dies', not 'becomes unresponsive'.

> No TCP negotiation, no response on the console, etc. It's higher
> level activity which isn't working...

Try compiling a kernel with debugging options, e.g. WITNESS(4),
MUTEX_DEBUG, LOCK_PROFILING, DIAGNOSTIC and INVARIANTS. See
/usr/src/sys/conf/NOTES. This will create a lot of messages in the dmesg
output.

If you can hook the system up to another machine via serial console, you
might be able to debug the kernel.
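As a rough illustration of the debugging options mentioned above, a custom kernel configuration might look something like the sketch below. The file name DEBUG and the ident are assumptions made for illustration, and KDB/DDB are additions not named in the message (they are what makes debugging over a serial console possible). MUTEX_DEBUG is left out because it may not exist in every release; /usr/src/sys/conf/NOTES for your version is the authority on which options are available.

```
# Hypothetical file: /usr/src/sys/<arch>/conf/DEBUG  (name is an assumption)
include         GENERIC             # start from the stock kernel configuration
ident           DEBUG

options         KDB                 # kernel debugger framework (addition, not in the message)
options         DDB                 # interactive debugger, usable on a serial console
options         INVARIANTS          # extra run-time consistency checks
options         INVARIANT_SUPPORT   # required by INVARIANTS
options         WITNESS             # lock order verification, see witness(4)
options         WITNESS_SKIPSPIN    # skip spin mutexes to reduce WITNESS overhead
options         DIAGNOSTIC          # additional diagnostic messages
options         LOCK_PROFILING      # lock contention statistics
```

Such a configuration would typically be built and installed from /usr/src with `make buildkernel KERNCONF=DEBUG` followed by `make installkernel KERNCONF=DEBUG`. Expect the resulting kernel to be noticeably slower than GENERIC.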
Read the kernel debugging chapter in the Developers' Handbook.

Another tip is to create a cron job that makes log entries every couple
of minutes with logger. This might help you pinpoint the exact time of
the mishap, to correlate it to other system activity.

Be _really_ sure that it isn't hardware though. Otherwise you'll be led
on a merry goose chase looking for software errors that aren't there. If
you can restore a backup of this machine's software to a similar one, do
so and see if the hangs persist. If they don't, it's hardware.

Roland
-- 
R.F.Smith                                   http://www.xs4all.nl/~rsmith/
[plain text _non-HTML_ PGP/GnuPG encrypted/signed email much appreciated]
pgp: 1A2B 477F 9970 BA3C 2914  B7CE 1277 EFB0 C321 A725 (KeyID: C321A725)
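The cron-plus-logger tip from the message could be sketched roughly as follows. The script name, its install path, the "heartbeat" tag and the two-minute interval are all assumptions chosen for illustration; only logger(1) and cron themselves come from the message.

```shell
#!/bin/sh
# heartbeat.sh: hypothetical helper script, not part of the original message.
# Writes one tagged syslog entry per run. Installed from root's crontab with
# an entry such as (path and interval are assumptions):
#
#   */2 * * * * /usr/local/sbin/heartbeat.sh

heartbeat_message() {
    # uptime(1) reports the time since boot and the load averages, which
    # helps to correlate the last entry before a hang with system load.
    printf 'alive: %s' "$(uptime)"
}

# logger(1) passes the text to syslogd: -t sets the tag, -p the
# facility.level. If logging fails, complain on stderr so cron mails it.
logger -t heartbeat -p user.notice "$(heartbeat_message)" ||
    echo "heartbeat: logger failed" >&2
```

After a hang, the timestamp of the last "heartbeat" entry in /var/log/messages gives an upper bound of one interval on when userland stopped being scheduled.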