From owner-freebsd-current@FreeBSD.ORG Fri Aug 6 20:38:11 2004
Return-Path:
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id C76E816A4CF;
	Fri, 6 Aug 2004 20:38:11 +0000 (GMT)
Received: from smtp-gw-cl-c.dmv.com (smtp-gw-cl-c.dmv.com [216.240.97.41])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 4840943D5D;
	Fri, 6 Aug 2004 20:38:09 +0000 (GMT) (envelope-from sven@dmv.com)
Received: from lanshark.dmv.com (lanshark.dmv.com [216.240.97.46]) i76Kc6MH024926;
	Fri, 6 Aug 2004 16:38:06 -0400 (EDT) (envelope-from sven@dmv.com)
From: Sven Willenberger
To: Scott Long
In-Reply-To: <41117305.2010301@freebsd.org>
References: <20040804204915.8337A5D08@ptavv.es.net> <411154D8.1050001@freebsd.org><41117305.2010301@freebsd.org>
Content-Type: text/plain
Date: Fri, 06 Aug 2004 16:36:19 -0400
Message-Id: <1091824579.32749.15.camel@lanshark.dmv.com>
Mime-Version: 1.0
X-Mailer: Evolution 1.5.9
Content-Transfer-Encoding: 7bit
X-Scanned-By: MIMEDefang 2.39
cc: freebsd-current@freebsd.org
Subject: Re: Postgresql locks up server - no response at all
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 06 Aug 2004 20:38:12 -0000

On Wed, 2004-08-04 at 17:36 -0600, Scott Long wrote:
> Sven Willenberger wrote:
>
> >>>
> >>>Based on this and Jeremy C.'s response it would appear that I should
> >>>either try to upgrade my 5.2.1-P8 system to -CURRENT (which is scary
> >>>because of the vinum array - root is not mounted on a vinum device, but
> >>>the data directory is - will gvinum simply read this correctly? it is a
> >>>stripe+mirror array of 4 drives) or start from scratch and go back to
> >>>4.10 (STABLE) for a while.
> >>>I am assuming that the lockups I am seeing
> >>>were exacerbated by the PREEMPTION episodes of the past couple weeks? If
> >>>I choose the upgrade to -CURRENT, are there any caveats or
> >>>recommendations? (besides reading "/usr/src/UPDATING" which I do
> >>>religiously anyway)
> >>>
> >>>_______________________________________________
> >>>freebsd-current@freebsd.org mailing list
> >>>http://lists.freebsd.org/mailman/listinfo/freebsd-current
> >>>To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
> >>
> >>I'm a bit nervous with asking you to upgrade to -current. PREEMPTION is
> >>practically disabled in 5.2.1 so upgrading has a low chance of fixing
> >>the problem except maybe by sheer luck. The best action would be to
> >>get a crashdump. If your system has an NMI button, then there are some
> >>trivial patches that will assist with this. If not, then you might want
> >>to look at backporting the ichwd watchdog driver and letting that do a
> >>chip-assisted NMI.
> >>
> >>In any case, finding out exactly what each CPU is doing at the time of
> >>the lockup is going to be vital. The lockups that I've been able to
> >>reproduce happen when a TAILQ in the scheduler gets corrupted and
> >>resulting in one CPU spinning on the list forever with the scheduler
> >>lock held. All other cpus then quickly grind to a halt while they wait
> >>for the sched lock to become free, which it never does.
> >>
> >
> >
> > The case unfortunately does not have a button (although the mobo does
> > have an NMI header/jumper). Backporting the watchdog driver sounds
> > doable; other than downloading the sys/dev/ichwd directory from a
> > repository and adding "options ichwd" to my kernel config file, what
> > else would be needed? I am willing to try to get at least one crashdump
> > before I have to go back to a -STABLE setup or try something so I can
> > get some uptime on this box.
> >
>
> I believe that the ichwd driver depends on the watchdog infrastructure
> driver that was added back in the early spring. I'm not 100% sure,
> though.
>
> Scott

The watchdog routines were incorporated into 5.2.1, as evidenced by the
NOTES for i386; however, apparently a lot changed in those files by the
time the ichwd driver was added. In essence, my kernel config additions
were:

    options HW_WDOG
    options WATCHDOG

(note that in -CURRENT the software watchdog option is different). I
also added ichwd to the files.i386 file and to the /usr/src/sys/modules/
Makefile. buildkernel fails while building the ichwd module with:

    syntax error in included file ichwd.h at
    eventhandler_tag ev_tag;

which in turn leaves an undefined reference to ev_tag and makes the
build fail.

For now I have set sysctl machdep.hlt_logical_cpus=1 to see if this
helps any.
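
The lockup Scott describes in the quoted text (a corrupted scheduler
TAILQ that one CPU walks forever while holding the sched lock) can be
shown in miniature in userland. The following is only a sketch under
stated assumptions, not the kernel's actual run-queue code: struct task,
runq_fill, runq_corrupt, and runq_walk are hypothetical names, the
corruption is simulated by hand, and the step limit exists only so the
demonstration terminates where a real traversal would spin.

```c
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for a run-queue entry; not the real kernel struct. */
struct task {
    int id;
    TAILQ_ENTRY(task) link;
};

TAILQ_HEAD(runq, task);

/* Build a healthy queue of n tasks from caller-provided storage. */
void runq_fill(struct runq *q, struct task *storage, int n)
{
    TAILQ_INIT(q);
    for (int i = 0; i < n; i++) {
        storage[i].id = i;
        TAILQ_INSERT_TAIL(q, &storage[i], link);
    }
}

/*
 * Simulated corruption (an assumption for illustration): the last
 * element's next pointer is redirected to the first element, so a
 * forward traversal never reaches the NULL terminator.
 */
void runq_corrupt(struct task *last, struct task *first)
{
    last->link.tqe_next = first;
}

/*
 * Walk the queue front to back. Returns the number of elements visited,
 * or -1 once more than max_steps elements have been seen. A loop
 * without such a limit would spin here indefinitely, which is the
 * userland analogue of a CPU stuck with the scheduler lock held.
 */
int runq_walk(struct runq *q, int max_steps)
{
    int steps = 0;
    struct task *t;

    TAILQ_FOREACH(t, q, link) {
        if (++steps > max_steps)
            return -1;
    }
    return steps;
}
```

On a freshly filled queue, runq_walk returns the element count; after
runq_corrupt(&storage[n - 1], TAILQ_FIRST(&q)) it hits the step limit
and returns -1. This is why a crashdump showing what each CPU is doing
matters: the spinning CPU looks busy, while every other CPU is simply
waiting on the lock it holds.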