From: Daniel Braniss <danny@cs.huji.ac.il>
To: Ian Lepore
Cc: freebsd-hackers <freebsd-hackers@freebsd.org>
Date: Thu, 14 Nov 2019 20:10:20 +0200
Subject: Re: can the hardware watchdog reboot a hung kernel?

> On 14 Nov 2019, at 18:02, Ian Lepore wrote:
>
> On Thu, 2019-11-14 at 17:35 +0200, Daniel Braniss wrote:
>>> On 14 Nov 2019, at 17:28, Eugene Grosbein wrote:
>>>
>>> 14.11.2019 21:52, Daniel Braniss wrote:
>>>
>>>> hi,
>>>> I have several hundred Nano-pi NEOs running, and sometimes they
>>>> hang. Since there is no console available, the only solution is a
>>>> power cycle, which is not so easy since they are distributed across
>>>> three buildings :-)
>>>>
>>>> I am looking at the watchdog stuff, but it seems that what I want
>>>> is not supported, i.e. rebooting the kernel when it hangs.
>>>>
>>>> Wishful thinking?
>>>
>>> It's possible if the hardware has such a watchdog and the kernel
>>> watchdog(4) subsystem supports it. The rc.conf(5) manual page
>>> describes the watchdogd_enable option.
>>>
>>
>> Yes, but it relies on userland. What if the kernel is hung?
>>
>
> It relies on the userland daemon to issue the ioctl() calls to pet the
> dog.
> If the kernel is hung, then userland code isn't going to run
> either, the watchdog petting won't happen, and eventually the
> hardware reboots.
>
> We use this at $work specifically to reboot if the kernel hangs, using
> this config:
>
> watchdogd_enable=YES
> watchdogd_flags="-s 16 -t 64 -x 64"
>
> That says the daemon should pet the dog every 16 seconds, and the
> hardware is programmed to reboot if 64 seconds elapse without petting.
> In addition, when watchdogd is shut down normally (like during a normal
> system reboot), it doesn't disable the watchdog hardware; it sets the
> timeout to 64s to protect against any kind of hang during the reboot.
> The -t and -x times can be different; 64s just happens to work well for
> us in both cases.
>
> -- Ian

OK, that is very encouraging. Now a last question: how can I hang the
kernel to test that the watchdog kicks in, apart from writing a kernel
module? :-)