Date: Tue, 24 Feb 2009 13:01:13 -0800 (PST)
From: Peter Steele
To: Mel
Cc: freebsd-questions@freebsd.org
Subject: Re: What is correct way to enable watchdog?
Message-ID: <7836999.881235509269433.JavaMail.HALO$@halo>
In-Reply-To: <32561700.861235508797247.JavaMail.HALO$@halo>

> If -e cmd is not specified, the daemon will
> perform a trivial file system check instead.

So -e has to be provided for the system to reboot? That doesn't seem to jibe with our experience.
When we first enabled the watchdog, we just went with the defaults, with no -e command. The default for the timeout is 16 seconds. We started getting reboots regularly until we increased this value. We decided we didn't need anything as aggressive as 16 seconds and went with 300 seconds instead. We still see the reboots, but nowhere near as frequently.

> This smells more like a bug in watchdog. If that's the case, the crash dumps
> should point right at it, at which point I'd take it to freebsd-stable
> or -current, whichever applies to the OS version.

Okay, we'll enable dumpdev/dumpdir and see what we get. With 300 seconds, though, a system would have to be truly dead before a reboot should occur. But our own application logs show that only four minutes elapsed between the last entry recorded before the reboot and the first entry recorded after it. Considering it takes 2-3 minutes for a system to boot and for our application to start running, I would expect a span of at least 7 minutes in our logs where nothing is recorded (the 5-minute timeout plus at least 2 minutes of boot time). However, the span is only about 4 minutes, which is more or less what we'd get if someone walked by the box and hit the reset button. So it doesn't look like the watchdog is behaving properly.
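
To spell out the arithmetic behind that conclusion, here is a rough sketch in Python. The constant and function names are made up for illustration, and the figures are just the approximate ones quoted above (300-second timeout, 2-3 minutes to boot, roughly 4 minutes of missing logs), so treat it as a sanity check rather than a measurement.

```python
# Rough sanity check of the log gap described above.
# All names are illustrative; the numbers are the approximate figures from the message.

WATCHDOG_TIMEOUT_S = 300   # configured watchdog timeout
BOOT_MIN_S = 2 * 60        # boot + application start, lower estimate
BOOT_MAX_S = 3 * 60        # boot + application start, upper estimate
OBSERVED_GAP_S = 4 * 60    # approximate gap seen in the application logs


def expected_gap_range(timeout_s, boot_min_s, boot_max_s):
    """Log gap we would expect if the watchdog only fired after its full timeout."""
    return timeout_s + boot_min_s, timeout_s + boot_max_s


def implied_hang_range(observed_gap_s, boot_min_s, boot_max_s):
    """Time between the last log entry and the reset implied by the observed gap."""
    return observed_gap_s - boot_max_s, observed_gap_s - boot_min_s


expected_low, expected_high = expected_gap_range(WATCHDOG_TIMEOUT_S, BOOT_MIN_S, BOOT_MAX_S)
hang_low, hang_high = implied_hang_range(OBSERVED_GAP_S, BOOT_MIN_S, BOOT_MAX_S)

print(f"expected gap: {expected_low}-{expected_high} s, observed: {OBSERVED_GAP_S} s")
print(f"implied time from last log entry to reset: {hang_low}-{hang_high} s")
print(f"shorter than the configured timeout: {hang_high < WATCHDOG_TIMEOUT_S}")
```

Under these assumptions the machine was reset somewhere between one and two minutes after its last log entry, well short of the 300 seconds the watchdog was configured to wait.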