Message-ID: <8814791e9634980810a41b9cc229612e225a40ee.camel@freebsd.org>
Subject: Re: can the hardware watchdog reboot a hung kernel?
From: Ian Lepore
To: Daniel Braniss
Cc: freebsd-hackers
Date: Thu, 14 Nov 2019 11:13:41 -0700

On Thu, 2019-11-14 at 20:10 +0200, Daniel Braniss wrote:
> > On 14 Nov 2019, at 18:02, Ian Lepore wrote:
> >
> > On Thu, 2019-11-14 at 17:35 +0200, Daniel Braniss wrote:
> > > > On 14 Nov 2019, at 17:28, Eugene Grosbein wrote:
> > > >
> > > > 14.11.2019 21:52, Daniel Braniss wrote:
> > > >
> > > > > hi,
> > > > > I have several hundred Nano-pi NEO running, and sometimes they
> > > > > hang. Since there is no console available, the only solution is
> > > > > to do a power cycle - not so easy since they are distributed in
> > > > > three buildings :-)
> > > > >
> > > > > I am looking at the watchdog stuff, but it seems that what I want
> > > > > is not supported, i.e. reboot the kernel when hung
> > > > >
> > > > > wishful thinking?
> > > >
> > > > It's possible if the hardware has such a watchdog and the kernel
> > > > subsystem watchdog(4) supports it.
> > > > The rc.conf(5) manual page describes the watchdogd_enable option.
> > >
> > > yes, but it relies on userland; what if the kernel is hung?
> > >
> >
> > It relies on the userland daemon to issue the ioctl() calls to pet the
> > dog. If the kernel is hung, then userland code isn't going to run
> > either, and the watchdog petting won't happen, and eventually the
> > hardware reboots.
> >
> > We use this at $work specifically to reboot if the kernel hangs, using
> > this config:
> >
> >    watchdogd_enable=YES
> >    watchdogd_flags="-s 16 -t 64 -x 64"
> >
> > That says the daemon should pet the dog every 16 seconds, and the
> > hardware is programmed to reboot if 64 seconds elapse without petting.
> > In addition, when watchdogd is shut down normally (like during a normal
> > system reboot) it doesn't disable the watchdog hardware, it sets the
> > timeout to 64s to protect against any kind of hang during the reboot.
> > The -t and -x times can be different, 64s just happens to work well for
> > us in both cases.
> >
> > -- Ian
> >
>
> ok, that is very encouraging, now a last question:
> how can I hang the kernel to test that the watchdog kicks in? apart
> from writing a kernel module :-)
>

Drop into the kernel debugger and just let it sit there until it
reboots (or fails to, I guess).  Do "sysctl debug.kdb.enter=1".

-- Ian
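
For anyone timing the debugger test, here is a rough sketch of how long
the wait should be with the "-s 16 -t 64" settings quoted above.  It is
only an idealized model: it assumes the daemon pets exactly every 16
seconds, that the hardware fires exactly 64 seconds after the last pet
(real hardware may round the timeout), and that a pet due at the instant
of the hang still lands.  The function names are made up for
illustration and are not part of watchdogd.

```python
def reboot_window(pet_interval_s, timeout_s):
    """Return (earliest, latest) seconds between a hang and the reset.

    Idealized model, not watchdogd's actual code: pets arrive exactly on
    schedule and the hardware fires exactly timeout_s after the last pet.
    """
    if pet_interval_s <= 0 or timeout_s <= pet_interval_s:
        raise ValueError("timeout must exceed the pet interval")
    # Hang just before the next pet was due: a full interval has
    # already elapsed since the last pet.
    earliest = timeout_s - pet_interval_s
    # Hang immediately after a pet: the whole timeout still remains.
    latest = timeout_s
    return earliest, latest


def reset_time(hang_at_s, pet_interval_s, timeout_s):
    """Time of the reset, assuming pets at t = 0, s, 2s, ... until the hang."""
    last_pet = (hang_at_s // pet_interval_s) * pet_interval_s
    return last_pet + timeout_s


if __name__ == "__main__":
    print(reboot_window(16, 64))   # window for the settings quoted above
    print(reset_time(100, 16, 64)) # hang 100 s after the first pet
```

Under those assumptions the box should reset somewhere between 48 and
64 seconds after "sysctl debug.kdb.enter=1"; if it is still sitting in
the debugger well past that, the watchdog is probably not armed.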