From nobody Mon Sep 30 07:18:42 2024 X-Original-To: freebsd-hardware@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4XHCD40SmFz5Xtwf for ; Mon, 30 Sep 2024 07:21:08 +0000 (UTC) (envelope-from Stephane.ROCHOY@stormshield.eu) Received: from mail.stormshield.eu (mail.stormshield.eu [91.212.116.25]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "mail.stormshield.eu", Issuer "Sectigo RSA Organization Validation Secure Server CA" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4XHCD25nt3z4xCK for ; Mon, 30 Sep 2024 07:21:06 +0000 (UTC) (envelope-from Stephane.ROCHOY@stormshield.eu) Authentication-Results: mx1.freebsd.org; dkim=pass header.d=stormshield.eu header.s=signer2 header.b=uLASj5PR; spf=pass (mx1.freebsd.org: domain of Stephane.ROCHOY@stormshield.eu designates 91.212.116.25 as permitted sender) smtp.mailfrom=Stephane.ROCHOY@stormshield.eu; dmarc=pass (policy=quarantine) header.from=stormshield.eu DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=stormshield.eu; s=signer2; t=1727680859; h=From:Subject:Date:Message-ID:To:Cc :MIME-Version:Content-Type:Content-Transfer-Encoding:In-Reply-To :References; bh=SMvDe0xDvcA3MgA7FPu5qF1FrRbALs34AkjChu922k8=; b=uLASj5PR/ wP124F+SBIOG1v3cFHUt9KkkavsAEJXb54bmOYrhXsGHvA3mD0Mr5cer/rjuyXO/JnA0XY+xc FDUfECXPyLZGWglQC+yXwA/k9/3eRFEGmAbLSzGuRnc98vIPqWdU6Dt0sgbMvX4v3+haMyBYP NNognTdiYZByBfHSTdC3DR/6V7np4mJfmr0JOeW7EemRqg67/p9SNOTapNYs7/2X04JnAjjeK ulCqBhBUkCoP8ZVg9lzK0XSptX7XyMS8eT87oyMffw6g/rbW1HoUeWd+0f8pRDAA5CZsqhk+4 YPiL5LoyBdGtWZQJyV2ROw9juGf4Xr5Z1cfWyYB9g==; References: <3065debc-8d4f-4487-abbb-c9408810cea6@sentex.net> <86plotbk5b.fsf@cthulhu.stephaner.labo.int> <9008b389-ab06-401d-9a95-84f849ca602a@sentex.net> <86plosdv48.fsf@cthulhu.stephaner.labo.int> <78e9461c-b93d-403f-b3a1-3568548b9283@sentex.net> <86h6a1egcs.fsf@cthulhu.stephaner.labo.int> 
<868qvddwph.fsf@cthulhu.stephaner.labo.int> <2d850ccc-2e90-4a1a-927c-045d4750d570@sentex.net> User-agent: mu4e 1.10.7; emacs 29.4 From: Stephane Rochoy To: mike tancsa CC: Chris6 via freebsd-hardware Subject: Re: watchdog timer programming Date: Mon, 30 Sep 2024 09:18:42 +0200 In-Reply-To: <2d850ccc-2e90-4a1a-927c-045d4750d570@sentex.net> Message-ID: <864j5xehes.fsf@cthulhu.stephaner.labo.int> List-Id: General discussion of FreeBSD hardware List-Archive: https://lists.freebsd.org/archives/freebsd-hardware List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-hardware@FreeBSD.org MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8"; format=flowed Content-Transfer-Encoding: quoted-printable X-ClientProxiedBy: ICTDCCEXCH003.one.local (10.180.4.3) To ICTDCCEXCH002.one.local (10.180.4.2) X-DKIM-Signer: DkimX (v3.60.360) X-Spamd-Result: default: False [-4.00 / 15.00]; NEURAL_HAM_LONG(-1.00)[-1.000]; NEURAL_HAM_MEDIUM(-1.00)[-1.000]; NEURAL_HAM_SHORT(-1.00)[-1.000]; DMARC_POLICY_ALLOW(-0.50)[stormshield.eu,quarantine]; R_DKIM_ALLOW(-0.20)[stormshield.eu:s=signer2]; R_SPF_ALLOW(-0.20)[+a:mail.stormshield.eu]; MIME_GOOD(-0.10)[text/plain]; ARC_NA(0.00)[]; ASN(0.00)[asn:49068, ipnet:91.212.116.0/24, country:FR]; RCVD_COUNT_ZERO(0.00)[0]; MLMMJ_DEST(0.00)[freebsd-hardware@freebsd.org]; MIME_TRACE(0.00)[0:+]; FROM_HAS_DN(0.00)[]; RCPT_COUNT_TWO(0.00)[2]; FROM_EQ_ENVFROM(0.00)[]; TO_DN_ALL(0.00)[]; TO_MATCH_ENVRCPT_SOME(0.00)[]; DKIM_TRACE(0.00)[stormshield.eu:+] X-Rspamd-Queue-Id: 4XHCD25nt3z4xCK X-Spamd-Bar: --- mike tancsa writes: > Do you know off hand how to set the system to just reboot ? The=20 > ddb man > page seems to imply I need options DDB as well, which is not in=20 > GENERIC > in order to set script actions. 
I would try the following:

  ddb script kdb.enter.default=reset

Regards,
-- 
Stéphane Rochoy
O: Stormshield

From nobody Mon Sep 30 17:05:37 2024 X-Original-To: freebsd-hardware@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4XHSBf0SKmz5XMkF for ; Mon, 30 Sep 2024 17:05:46 +0000 (UTC) (envelope-from mike@sentex.net) Received: from smarthost1.sentex.ca (smarthost1.sentex.ca [IPv6:2607:f3e0:0:1::12]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smarthost1.sentex.ca", Issuer "R10" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4XHSBd2KF5z4kPq for ; Mon, 30 Sep 2024 17:05:45 +0000 (UTC) (envelope-from mike@sentex.net) Authentication-Results: mx1.freebsd.org; none Received: from pyroxene2a.sentex.ca (pyroxene19.sentex.ca [199.212.134.19]) by smarthost1.sentex.ca (8.18.1/8.18.1) with ESMTPS id 48UH5b9w028078 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=FAIL); Mon, 30 Sep 2024 13:05:37 -0400 (EDT) (envelope-from mike@sentex.net) Received: from [IPV6:2607:f3e0:0:4:55dd:35b0:debc:6f67] ([IPv6:2607:f3e0:0:4:55dd:35b0:debc:6f67]) by pyroxene2a.sentex.ca (8.18.1/8.15.2) with ESMTPS id 48UH5aJm089002 (version=TLSv1.3 cipher=TLS_AES_128_GCM_SHA256 bits=128 verify=NO); Mon, 30 Sep 2024 13:05:36 -0400 (EDT) (envelope-from mike@sentex.net) Message-ID: Date: Mon, 30 Sep 2024 13:05:37 -0400 List-Id: General discussion of FreeBSD hardware List-Archive: https://lists.freebsd.org/archives/freebsd-hardware List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-hardware@FreeBSD.org MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: watchdog timer programming To: Stephane Rochoy Cc: Chris6 via freebsd-hardware References: 
<3065debc-8d4f-4487-abbb-c9408810cea6@sentex.net> <86plotbk5b.fsf@cthulhu.stephaner.labo.int> <9008b389-ab06-401d-9a95-84f849ca602a@sentex.net> <86plosdv48.fsf@cthulhu.stephaner.labo.int> <78e9461c-b93d-403f-b3a1-3568548b9283@sentex.net> <86h6a1egcs.fsf@cthulhu.stephaner.labo.int> <868qvddwph.fsf@cthulhu.stephaner.labo.int> <2d850ccc-2e90-4a1a-927c-045d4750d570@sentex.net> <864j5xehes.fsf@cthulhu.stephaner.labo.int> Content-Language: en-US From: mike tancsa Autocrypt: addr=mike@sentex.net; keydata= xsBNBFywzOMBCACoNFpwi5MeyEREiCeHtbm6pZJI/HnO+wXdCAWtZkS49weOoVyUj5BEXRZP xflV2ib2hflX4nXqhenaNiia4iaZ9ft3I1ebd7GEbGnsWCvAnob5MvDZyStDAuRxPJK1ya/s +6rOvr+eQiXYNVvfBhrCfrtR/esSkitBGxhUkBjOti8QwzD71JVF5YaOjBAs7jZUKyLGj0kW yDg4jUndudWU7G2yc9GwpHJ9aRSUN8e/mWdIogK0v+QBHfv/dsI6zVB7YuxCC9Fx8WPwfhDH VZC4kdYCQWKXrm7yb4TiVdBh5kgvlO9q3js1yYdfR1x8mjK2bH2RSv4bV3zkNmsDCIxjABEB AAHNHW1pa2UgdGFuY3NhIDxtaWtlQHNlbnRleC5uZXQ+wsCOBBMBCAA4FiEEmuvCXT0aY6hs 4SbWeVOEFl5WrMgFAl+pQfkCGwMFCwkIBwIGFQoJCAsCBBYCAwECHgECF4AACgkQeVOEFl5W rMiN6ggAk3H5vk8QnbvGbb4sinxZt/wDetgk0AOR9NRmtTnPaW+sIJEfGBOz47Xih+f7uWJS j+uvc9Ewn2Z7n8z3ZHJlLAByLVLtcNXGoRIGJ27tevfOaNqgJHBPbFOcXCBBFTx4MYMM4iAZ cDT5vsBTSaM36JZFtHZBKkuFEItbA/N8ZQSHKdTYMIA7A3OCLGbJBqloQ8SlW4MkTzKX4u7R yefAYQ0h20x9IqC5Ju8IsYRFacVZconT16KS81IBceO42vXTN0VexbVF2rZIx3v/NT75r6Vw 0FlXVB1lXOHKydRA2NeleS4NEG2vWqy/9Boj0itMfNDlOhkrA/0DcCurMpnpbM7ATQRcsMzk AQgA1Dpo/xWS66MaOJLwA28sKNMwkEk1Yjs+okOXDOu1F+0qvgE8sVmrOOPvvWr4axtKRSG1 t2QUiZ/ZkW/x/+t0nrM39EANV1VncuQZ1ceIiwTJFqGZQ8kb0+BNkwuNVFHRgXm1qzAJweEt RdsCMohB+H7BL5LGCVG5JaU0lqFU9pFP40HxEbyzxjsZgSE8LwkI6wcu0BLv6K6cLm0EiHPO l5G8kgRi38PS7/6s3R8QDsEtbGsYy6O82k3zSLIjuDBwA9GRaeigGppTxzAHVjf5o9KKu4O7 gC2KKVHPegbXS+GK7DU0fjzX57H5bZ6komE5eY4p3oWT/CwVPSGfPs8jOwARAQABwsB2BBgB CAAgFiEEmuvCXT0aY6hs4SbWeVOEFl5WrMgFAl+pQfkCGwwACgkQeVOEFl5WrMiVqwf9GwU8 c6cylknZX8QwlsVudTC8xr/L17JA84wf03k3d4wxP7bqy5AYy7jboZMbgWXngAE/HPQU95NM aukysSnknzoIpC96XZJ0okLBXVS6Y0ylZQ+HrbIhMpuQPoDweoF5F9wKrsHRoDaUK1VR706X 
rwm4HUzh7Jk+auuMYfuCh0FVlFBEuiJWMLhg/5WCmcRfiuB6F59ZcUQrwLEZeNhF2XJV4KwB Tlg7HCWO/sy1foE5noaMyACjAtAQE9p5kGYaj+DuRhPdWUTsHNuqrhikzIZd2rrcMid+ktb0 NvtvswzMO059z1YGMtGSqQ4srCArju+XHIdTFdiIYbd7+jeehg== In-Reply-To: <864j5xehes.fsf@cthulhu.stephaner.labo.int> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Scanned-By: MIMEDefang 2.86 X-Rspamd-Pre-Result: action=no action; module=replies; Message is reply to one we originated X-Spamd-Result: default: False [-4.00 / 15.00]; REPLY(-4.00)[]; ASN(0.00)[asn:11647, ipnet:2607:f3e0::/32, country:CA] X-Rspamd-Queue-Id: 4XHSBd2KF5z4kPq X-Spamd-Bar: ----

On 9/30/2024 3:18 AM, Stephane Rochoy wrote:
>
> mike tancsa writes:
>
>> Do you know off hand how to set the system to just reboot ? The ddb man
>> page seems to imply I need options DDB as well, which is not in GENERIC
>> in order to set script actions.
>
> I would try the following:
>
>  ddb script kdb.enter.default=reset
>
If I build a custom kernel then that will work. But with GENERIC (I am
tracking the project via freebsd-update), it fails:

# ddb script kdb.enter.default=reset
ddb: sysctl: debug.ddb.scripting.scripts: No such file or directory

With a custom kernel, adding

options DDB

it works perfectly.

Is there any way to get this to work without having ddb custom
compiled in?

---Mike > Regards, From nobody Tue Oct 1 06:07:28 2024 X-Original-To: freebsd-hardware@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4XHnxV45lDz5XL5J for ; Tue, 01 Oct 2024 06:25:34 +0000 (UTC) (envelope-from Stephane.ROCHOY@stormshield.eu) Received: from mail.stormshield.eu (mail.stormshield.eu [91.212.116.25]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "mail.stormshield.eu", Issuer "Sectigo RSA Organization Validation Secure Server CA" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4XHnxV1m3Hz4Y62 for ; Tue, 1 Oct 2024 06:25:34 +0000 (UTC) (envelope-from Stephane.ROCHOY@stormshield.eu) Authentication-Results: mx1.freebsd.org; none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=stormshield.eu; s=signer2; t=1727763931; h=From:Subject:Date:Message-ID:To:Cc :MIME-Version:Content-Type:Content-Transfer-Encoding:In-Reply-To :References; bh=UI1fenzYampuYiX9gIjmxvrvAb0DjqpRL83JyoZyc2s=; b=R1Jkv86tj 5V9KgC0ZsWor3N08Cmkn7lK+340RgyfStOsPy7pY4tO7y9KFl7AKBaNhw9vzmdESoaUDFzbAu H4EDDnX8MDLT7U4U1sWfuAzgWYZCFjY1EV8ET49Xfu9GR0robgg/fk+2I1ge3z+H8VttbSfpV ayJ2zAiEMoCEqNwXuSNRs/q1OeiNNUijSnjhrMvm5m451I3T+gzcsS+ahzIqW4ePZKmKN1BRG MZQyrNX9bFGl385fvNsqV1Ud6osRooHbJMiA8k6R0R+gjwWShaKTYLteu+tazOdPmXY/wVede FkjwoE38Nd3GTA2hlBFcA1ANiLb6XvCGCoXvzSfCw==; References: <3065debc-8d4f-4487-abbb-c9408810cea6@sentex.net> <86plotbk5b.fsf@cthulhu.stephaner.labo.int> <9008b389-ab06-401d-9a95-84f849ca602a@sentex.net> <86plosdv48.fsf@cthulhu.stephaner.labo.int> <78e9461c-b93d-403f-b3a1-3568548b9283@sentex.net> <86h6a1egcs.fsf@cthulhu.stephaner.labo.int> <868qvddwph.fsf@cthulhu.stephaner.labo.int> <2d850ccc-2e90-4a1a-927c-045d4750d570@sentex.net> <864j5xehes.fsf@cthulhu.stephaner.labo.int> User-agent: mu4e 1.10.7; emacs 29.4 From: Stephane Rochoy To: mike tancsa CC: Chris6 via freebsd-hardware Subject: Re: watchdog timer programming Date: Tue, 1 
Oct 2024 08:07:28 +0200 In-Reply-To: Message-ID: <86zfnocpb8.fsf@cthulhu.stephaner.labo.int> List-Id: General discussion of FreeBSD hardware List-Archive: https://lists.freebsd.org/archives/freebsd-hardware List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-hardware@FreeBSD.org MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8"; format=flowed Content-Transfer-Encoding: quoted-printable X-ClientProxiedBy: ICTDCCEXCH003.one.local (10.180.4.3) To ICTDCCEXCH002.one.local (10.180.4.2) X-DKIM-Signer: DkimX (v3.60.360) X-Rspamd-Pre-Result: action=no action; module=replies; Message is reply to one we originated X-Spamd-Result: default: False [-4.00 / 15.00]; REPLY(-4.00)[]; ASN(0.00)[asn:49068, ipnet:91.212.116.0/24, country:FR] X-Rspamd-Queue-Id: 4XHnxV1m3Hz4Y62 X-Spamd-Bar: ----

mike tancsa writes:

> WARNING: This e-mail comes from someone outside your organisation. Do not click
> on links or open attachments if you do not know the sender and are not sure that
> the content is safe.
>
> On 9/30/2024 3:18 AM, Stephane Rochoy wrote:
>>
>> mike tancsa writes:
>>
>>> Do you know off hand how to set the system to just reboot ? The ddb man
>>> page seems to imply I need options DDB as well, which is not in GENERIC
>>> in order to set script actions.
>>
>> I would try the following:
>>
>> ddb script kdb.enter.default=reset
>>
> If I build a custom kernel then that will work. But with GENERIC (I am
> tracking project via freebsd-update), it fails
>
> # ddb script kdb.enter.default=reset
> ddb: sysctl: debug.ddb.scripting.scripts: No such file or directory
>
> With a custom kernel, adding
>
> options DDB
>
> it works perfectly.
>
> Is there any way to get this to work without having ddb custom
> compiled in ?

I don't understand what's happening here.
AFAIK, the code corresponding to the soft watchdog being triggered is the
following:

  static void
  wd_timeout_cb(void *arg)
  {
    const char *type = arg;

  #ifdef DDB
    if ((wd_pretimeout_act & WD_SOFT_DDB)) {
      char kdb_why[80];
      snprintf(kdb_why, sizeof(kdb_why), "watchdog %s-timeout", type);
      kdb_backtrace();
      kdb_enter(KDB_WHY_WATCHDOG, kdb_why);
    }
  #endif
    if ((wd_pretimeout_act & WD_SOFT_LOG))
      log(LOG_EMERG, "watchdog %s-timeout, WD_SOFT_LOG\n", type);
    if ((wd_pretimeout_act & WD_SOFT_PRINTF))
      printf("watchdog %s-timeout, WD_SOFT_PRINTF\n", type);
    if ((wd_pretimeout_act & WD_SOFT_PANIC))
      panic("watchdog %s-timeout, WD_SOFT_PANIC set", type);
  }

So without DDB, it should call panic. But in your case,
kdb_backtrace was called, so my initial hypothesis was wrong. What I
missed is that panic itself is able to call kdb_backtrace if asked
to do so:

  #ifdef KDB
    if ((newpanic || trace_all_panics) && trace_on_panic)
      kdb_backtrace();
    if (debugger_on_panic)
      kdb_enter(KDB_WHY_PANIC, "panic");
    else if (!newpanic && debugger_on_recursive_panic)
      kdb_enter(KDB_WHY_PANIC, "re-panic");
  #endif
    /*thread_lock(td); */
    td->td_flags |= TDF_INPANIC;
    /* thread_unlock(td); */
    if (!sync_on_panic)
      bootopt |= RB_NOSYNC;
    if (poweroff_on_panic)
      bootopt |= RB_POWEROFF;
    if (powercycle_on_panic)
      bootopt |= RB_POWERCYCLE;
    kern_reboot(bootopt);

So it definitely should reboot, but as it doesn't, maybe playing with
kern.powercycle_on_panic would help?

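To make the reasoning in the two kernel excerpts easier to check, here is a minimal user-space sketch of the same decision logic. It is not kernel code: model_wd_timeout(), model_panic_bootopt(), struct wd_outcome and struct panic_knobs are hypothetical names made up for this illustration, the kernel_has_ddb parameter stands in for the #ifdef DDB guard, and the flag values are assumptions that should be checked against sys/watchdog.h and sys/reboot.h on the release in use.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Assumed flag values, for illustration only; verify them against
 * sys/watchdog.h and sys/reboot.h before relying on them.
 */
#define WD_SOFT_PANIC   0x01
#define WD_SOFT_DDB     0x02
#define WD_SOFT_LOG     0x04
#define WD_SOFT_PRINTF  0x08
#define WD_SOFT_MASK    0x0f

#define RB_NOSYNC       0x000004
#define RB_POWEROFF     0x004000
#define RB_POWERCYCLE   0x400000

/* Hypothetical summary of what one soft timeout would trigger. */
struct wd_outcome {
	bool enters_debugger;
	bool logs;
	bool prints;
	bool panics;
};

/* Hypothetical stand-ins for the *_on_panic variables in the excerpt. */
struct panic_knobs {
	bool sync_on_panic;
	bool poweroff_on_panic;
	bool powercycle_on_panic;
};

/* Mirrors the branch structure of the wd_timeout_cb() excerpt. */
struct wd_outcome
model_wd_timeout(unsigned int act, bool kernel_has_ddb)
{
	struct wd_outcome o = { false, false, false, false };

	assert((act & ~(unsigned int)WD_SOFT_MASK) == 0);

	/* In the excerpt this branch only exists inside #ifdef DDB. */
	if (kernel_has_ddb && (act & WD_SOFT_DDB) != 0)
		o.enters_debugger = true;
	o.logs = (act & WD_SOFT_LOG) != 0;
	o.prints = (act & WD_SOFT_PRINTF) != 0;
	o.panics = (act & WD_SOFT_PANIC) != 0;
	return (o);
}

/* Mirrors how the panic excerpt derives the flags for kern_reboot(). */
int
model_panic_bootopt(struct panic_knobs k)
{
	/* Assumption: start from 0; the excerpt does not show the initial value. */
	int bootopt = 0;

	if (!k.sync_on_panic)
		bootopt |= RB_NOSYNC;
	if (k.poweroff_on_panic)
		bootopt |= RB_POWEROFF;
	if (k.powercycle_on_panic)
		bootopt |= RB_POWERCYCLE;
	return (bootopt);
}
```

In this model, a kernel without DDB ignores WD_SOFT_DDB entirely and only WD_SOFT_PANIC leads to panic, while kern.powercycle_on_panic merely adds RB_POWERCYCLE to the flags passed to kern_reboot(). It does not by itself explain why a panic would fail to happen.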
Regards,
-- 
Stéphane Rochoy
O: Stormshield

From nobody Tue Oct 1 20:03:23 2024 X-Original-To: freebsd-hardware@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4XJ85H5n0Fz5YJ86 for ; Tue, 01 Oct 2024 20:03:31 +0000 (UTC) (envelope-from mike@sentex.net) Received: from smarthost1.sentex.ca (smarthost1.sentex.ca [IPv6:2607:f3e0:0:1::12]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smarthost1.sentex.ca", Issuer "R10" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4XJ85G6P5Vz4hq0 for ; Tue, 1 Oct 2024 20:03:30 +0000 (UTC) (envelope-from mike@sentex.net) Authentication-Results: mx1.freebsd.org; dkim=none; spf=pass (mx1.freebsd.org: domain of mike@sentex.net designates 2607:f3e0:0:1::12 as permitted sender) smtp.mailfrom=mike@sentex.net; dmarc=none Received: from pyroxene2a.sentex.ca (pyroxene19.sentex.ca [199.212.134.19]) by smarthost1.sentex.ca (8.18.1/8.18.1) with ESMTPS id 491K3OOp043971 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=FAIL); Tue, 1 Oct 2024 16:03:24 -0400 (EDT) (envelope-from mike@sentex.net) Received: from [IPV6:2607:f3e0:0:4:eca3:ea83:d867:1a0] ([IPv6:2607:f3e0:0:4:eca3:ea83:d867:1a0]) by pyroxene2a.sentex.ca (8.18.1/8.15.2) with ESMTPS id 491K3NGW049914 (version=TLSv1.3 cipher=TLS_AES_128_GCM_SHA256 bits=128 verify=NO); Tue, 1 Oct 2024 16:03:23 -0400 (EDT) (envelope-from mike@sentex.net) Message-ID: <8b730043-a759-4bb4-b7ee-323a317ce6d2@sentex.net> Date: Tue, 1 Oct 2024 16:03:23 -0400 List-Id: General discussion of FreeBSD hardware List-Archive: https://lists.freebsd.org/archives/freebsd-hardware List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-hardware@FreeBSD.org MIME-Version: 1.0 User-Agent: Mozilla Thunderbird 
Subject: Re: watchdog timer programming To: Chris6 via freebsd-hardware References: <3065debc-8d4f-4487-abbb-c9408810cea6@sentex.net> <86plotbk5b.fsf@cthulhu.stephaner.labo.int> <9008b389-ab06-401d-9a95-84f849ca602a@sentex.net> <86plosdv48.fsf@cthulhu.stephaner.labo.int> <78e9461c-b93d-403f-b3a1-3568548b9283@sentex.net> <86h6a1egcs.fsf@cthulhu.stephaner.labo.int> <868qvddwph.fsf@cthulhu.stephaner.labo.int> <2d850ccc-2e90-4a1a-927c-045d4750d570@sentex.net> <864j5xehes.fsf@cthulhu.stephaner.labo.int> <86zfnocpb8.fsf@cthulhu.stephaner.labo.int> Content-Language: en-US From: mike tancsa Autocrypt: addr=mike@sentex.net; keydata= xsBNBFywzOMBCACoNFpwi5MeyEREiCeHtbm6pZJI/HnO+wXdCAWtZkS49weOoVyUj5BEXRZP xflV2ib2hflX4nXqhenaNiia4iaZ9ft3I1ebd7GEbGnsWCvAnob5MvDZyStDAuRxPJK1ya/s +6rOvr+eQiXYNVvfBhrCfrtR/esSkitBGxhUkBjOti8QwzD71JVF5YaOjBAs7jZUKyLGj0kW yDg4jUndudWU7G2yc9GwpHJ9aRSUN8e/mWdIogK0v+QBHfv/dsI6zVB7YuxCC9Fx8WPwfhDH VZC4kdYCQWKXrm7yb4TiVdBh5kgvlO9q3js1yYdfR1x8mjK2bH2RSv4bV3zkNmsDCIxjABEB AAHNHW1pa2UgdGFuY3NhIDxtaWtlQHNlbnRleC5uZXQ+wsCOBBMBCAA4FiEEmuvCXT0aY6hs 4SbWeVOEFl5WrMgFAl+pQfkCGwMFCwkIBwIGFQoJCAsCBBYCAwECHgECF4AACgkQeVOEFl5W rMiN6ggAk3H5vk8QnbvGbb4sinxZt/wDetgk0AOR9NRmtTnPaW+sIJEfGBOz47Xih+f7uWJS j+uvc9Ewn2Z7n8z3ZHJlLAByLVLtcNXGoRIGJ27tevfOaNqgJHBPbFOcXCBBFTx4MYMM4iAZ cDT5vsBTSaM36JZFtHZBKkuFEItbA/N8ZQSHKdTYMIA7A3OCLGbJBqloQ8SlW4MkTzKX4u7R yefAYQ0h20x9IqC5Ju8IsYRFacVZconT16KS81IBceO42vXTN0VexbVF2rZIx3v/NT75r6Vw 0FlXVB1lXOHKydRA2NeleS4NEG2vWqy/9Boj0itMfNDlOhkrA/0DcCurMpnpbM7ATQRcsMzk AQgA1Dpo/xWS66MaOJLwA28sKNMwkEk1Yjs+okOXDOu1F+0qvgE8sVmrOOPvvWr4axtKRSG1 t2QUiZ/ZkW/x/+t0nrM39EANV1VncuQZ1ceIiwTJFqGZQ8kb0+BNkwuNVFHRgXm1qzAJweEt RdsCMohB+H7BL5LGCVG5JaU0lqFU9pFP40HxEbyzxjsZgSE8LwkI6wcu0BLv6K6cLm0EiHPO l5G8kgRi38PS7/6s3R8QDsEtbGsYy6O82k3zSLIjuDBwA9GRaeigGppTxzAHVjf5o9KKu4O7 gC2KKVHPegbXS+GK7DU0fjzX57H5bZ6komE5eY4p3oWT/CwVPSGfPs8jOwARAQABwsB2BBgB CAAgFiEEmuvCXT0aY6hs4SbWeVOEFl5WrMgFAl+pQfkCGwwACgkQeVOEFl5WrMiVqwf9GwU8 
c6cylknZX8QwlsVudTC8xr/L17JA84wf03k3d4wxP7bqy5AYy7jboZMbgWXngAE/HPQU95NM aukysSnknzoIpC96XZJ0okLBXVS6Y0ylZQ+HrbIhMpuQPoDweoF5F9wKrsHRoDaUK1VR706X rwm4HUzh7Jk+auuMYfuCh0FVlFBEuiJWMLhg/5WCmcRfiuB6F59ZcUQrwLEZeNhF2XJV4KwB Tlg7HCWO/sy1foE5noaMyACjAtAQE9p5kGYaj+DuRhPdWUTsHNuqrhikzIZd2rrcMid+ktb0 NvtvswzMO059z1YGMtGSqQ4srCArju+XHIdTFdiIYbd7+jeehg== In-Reply-To: <86zfnocpb8.fsf@cthulhu.stephaner.labo.int> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Scanned-By: MIMEDefang 2.86 X-Spamd-Result: default: False [-3.39 / 15.00]; NEURAL_HAM_LONG(-1.00)[-1.000]; NEURAL_HAM_MEDIUM(-1.00)[-1.000]; NEURAL_HAM_SHORT(-1.00)[-0.999]; R_SPF_ALLOW(-0.20)[+ip6:2607:f3e0::/32]; MIME_GOOD(-0.10)[text/plain]; RCVD_IN_DNSWL_LOW(-0.10)[199.212.134.19:received]; XM_UA_NO_VERSION(0.01)[]; ASN(0.00)[asn:11647, ipnet:2607:f3e0::/32, country:CA]; FREEFALL_USER(0.00)[mike]; RCPT_COUNT_ONE(0.00)[1]; MIME_TRACE(0.00)[0:+]; MID_RHS_MATCH_FROM(0.00)[]; MLMMJ_DEST(0.00)[freebsd-hardware@freebsd.org]; ARC_NA(0.00)[]; FROM_EQ_ENVFROM(0.00)[]; FROM_HAS_DN(0.00)[]; R_DKIM_NA(0.00)[]; RCVD_COUNT_TWO(0.00)[2]; TO_DN_ALL(0.00)[]; TO_MATCH_ENVRCPT_ALL(0.00)[]; DMARC_NA(0.00)[sentex.net]; RCVD_TLS_ALL(0.00)[] X-Rspamd-Queue-Id: 4XJ85G6P5Vz4hq0 X-Spamd-Bar: --- On 10/1/2024 2:07 AM, Stephane Rochoy wrote: > > mike tancsa writes: > >> WARNING: This e-mail comes from someone outside your organisation. Do >> not click >> on links or open attachments if you do not know the sender and are >> not sure that >> the content is safe. >> >> On 9/30/2024 3:18 AM, Stephane Rochoy wrote: >>> >>> mike tancsa writes: >>> >>>> Do you know off hand how to set the system to just reboot ? The ddb >>>> man >>>> page seems to imply I need options DDB as well, which is not in >>>> GENERIC >>>> in order to set script actions. >>> >>> I would try the following: >>> >>>  ddb script kdb.enter.default=reset >>> >> If I build a custom kernel then that will work. 
But with GENERIC (I am >> tracking project via freebsd-update), it fails >> >> # ddb script kdb.enter.default=reset >> ddb: sysctl: debug.ddb.scripting.scripts: No such file or directory >> >> With a customer kernel, adding >> >> options DDB >> >> it works perfectly. >> >> Is there any way to get this to work without having ddb custom >> compiled in ? > > I don't understand what's happening here. AFAIK, the code > corresponding to the soft watchdog being triggered is the > following: > >  static void >  wd_timeout_cb(void *arg) >  { >    const char *type = arg; > >  #ifdef DDB >    if ((wd_pretimeout_act & WD_SOFT_DDB)) { >      char kdb_why[80]; >      snprintf(kdb_why, sizeof(kdb_why), "watchdog %s-timeout",      > type); >      kdb_backtrace(); >      kdb_enter(KDB_WHY_WATCHDOG, kdb_why); >    } >  #endif >    if ((wd_pretimeout_act & WD_SOFT_LOG)) >      log(LOG_EMERG, "watchdog %s-timeout, WD_SOFT_LOG\n", type); >    if ((wd_pretimeout_act & WD_SOFT_PRINTF)) >      printf("watchdog %s-timeout, WD_SOFT_PRINTF\n", type); >    if ((wd_pretimeout_act & WD_SOFT_PANIC)) >      panic("watchdog %s-timeout, WD_SOFT_PANIC set", type); >  } > > So without DDB, it should call panic. But in your case, it > called kdb_backtrace. So initial hypothesis was wrong. 
What I > missed is that panic was natively able to kdb_backtrace if gently > asked to do so: > >  #ifdef KDB >    if ((newpanic || trace_all_panics) && trace_on_panic) >      kdb_backtrace(); >    if (debugger_on_panic) >      kdb_enter(KDB_WHY_PANIC, "panic"); >    else if (!newpanic && debugger_on_recursive_panic) >      kdb_enter(KDB_WHY_PANIC, "re-panic"); >  #endif >    /*thread_lock(td); */ >    td->td_flags |= TDF_INPANIC; >    /* thread_unlock(td); */ >    if (!sync_on_panic) >      bootopt |= RB_NOSYNC; >    if (poweroff_on_panic) >      bootopt |= RB_POWEROFF; >    if (powercycle_on_panic) >      bootopt |= RB_POWERCYCLE; >    kern_reboot(bootopt); > > So it definitely should reboot but as it don't, maybe playing with > kern.powercycle_on_panic would help? > > Thank you for your continued help on this. Still no luck with the GENERIC kernel 0{p9999}# sysctl -w kern.powercycle_on_panic=1 kern.powercycle_on_panic: 0 -> 1 0{p9999}# ps -auxwww | grep dog root     4752   0.0  0.2   12820  12916  -  S; Tue, 01 Oct 2024 21:02:09 +0000 (UTC) (envelope-from mike@sentex.net) Received: from smarthost1.sentex.ca (smarthost1.sentex.ca [IPv6:2607:f3e0:0:1::12]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smarthost1.sentex.ca", Issuer "R10" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4XJ9Nx0WGLz4qHv for ; Tue, 1 Oct 2024 21:02:09 +0000 (UTC) (envelope-from mike@sentex.net) Authentication-Results: mx1.freebsd.org; dkim=none; spf=pass (mx1.freebsd.org: domain of mike@sentex.net designates 2607:f3e0:0:1::12 as permitted sender) smtp.mailfrom=mike@sentex.net; dmarc=none Received: from pyroxene2a.sentex.ca (pyroxene19.sentex.ca [199.212.134.19]) by smarthost1.sentex.ca (8.18.1/8.18.1) with ESMTPS id 491L28Xp065694 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=FAIL); Tue, 1 Oct 
2024 17:02:08 -0400 (EDT) (envelope-from mike@sentex.net) Received: from [IPV6:2607:f3e0:0:4:eca3:ea83:d867:1a0] ([IPv6:2607:f3e0:0:4:eca3:ea83:d867:1a0]) by pyroxene2a.sentex.ca (8.18.1/8.15.2) with ESMTPS id 491L26ka065737 (version=TLSv1.3 cipher=TLS_AES_128_GCM_SHA256 bits=128 verify=NO); Tue, 1 Oct 2024 17:02:07 -0400 (EDT) (envelope-from mike@sentex.net) Message-ID: <1b346afb-d6ed-4f00-8dcf-5cdd389d210b@sentex.net> Date: Tue, 1 Oct 2024 17:02:07 -0400 List-Id: General discussion of FreeBSD hardware List-Archive: https://lists.freebsd.org/archives/freebsd-hardware List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-hardware@FreeBSD.org MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: watchdog timer programming From: mike tancsa To: Chris6 via freebsd-hardware References: <3065debc-8d4f-4487-abbb-c9408810cea6@sentex.net> <86plotbk5b.fsf@cthulhu.stephaner.labo.int> <9008b389-ab06-401d-9a95-84f849ca602a@sentex.net> <86plosdv48.fsf@cthulhu.stephaner.labo.int> <78e9461c-b93d-403f-b3a1-3568548b9283@sentex.net> <86h6a1egcs.fsf@cthulhu.stephaner.labo.int> <868qvddwph.fsf@cthulhu.stephaner.labo.int> <2d850ccc-2e90-4a1a-927c-045d4750d570@sentex.net> <864j5xehes.fsf@cthulhu.stephaner.labo.int> <86zfnocpb8.fsf@cthulhu.stephaner.labo.int> <8b730043-a759-4bb4-b7ee-323a317ce6d2@sentex.net> Content-Language: en-US Autocrypt: addr=mike@sentex.net; keydata= xsBNBFywzOMBCACoNFpwi5MeyEREiCeHtbm6pZJI/HnO+wXdCAWtZkS49weOoVyUj5BEXRZP xflV2ib2hflX4nXqhenaNiia4iaZ9ft3I1ebd7GEbGnsWCvAnob5MvDZyStDAuRxPJK1ya/s +6rOvr+eQiXYNVvfBhrCfrtR/esSkitBGxhUkBjOti8QwzD71JVF5YaOjBAs7jZUKyLGj0kW yDg4jUndudWU7G2yc9GwpHJ9aRSUN8e/mWdIogK0v+QBHfv/dsI6zVB7YuxCC9Fx8WPwfhDH VZC4kdYCQWKXrm7yb4TiVdBh5kgvlO9q3js1yYdfR1x8mjK2bH2RSv4bV3zkNmsDCIxjABEB AAHNHW1pa2UgdGFuY3NhIDxtaWtlQHNlbnRleC5uZXQ+wsCOBBMBCAA4FiEEmuvCXT0aY6hs 4SbWeVOEFl5WrMgFAl+pQfkCGwMFCwkIBwIGFQoJCAsCBBYCAwECHgECF4AACgkQeVOEFl5W rMiN6ggAk3H5vk8QnbvGbb4sinxZt/wDetgk0AOR9NRmtTnPaW+sIJEfGBOz47Xih+f7uWJS 
j+uvc9Ewn2Z7n8z3ZHJlLAByLVLtcNXGoRIGJ27tevfOaNqgJHBPbFOcXCBBFTx4MYMM4iAZ cDT5vsBTSaM36JZFtHZBKkuFEItbA/N8ZQSHKdTYMIA7A3OCLGbJBqloQ8SlW4MkTzKX4u7R yefAYQ0h20x9IqC5Ju8IsYRFacVZconT16KS81IBceO42vXTN0VexbVF2rZIx3v/NT75r6Vw 0FlXVB1lXOHKydRA2NeleS4NEG2vWqy/9Boj0itMfNDlOhkrA/0DcCurMpnpbM7ATQRcsMzk AQgA1Dpo/xWS66MaOJLwA28sKNMwkEk1Yjs+okOXDOu1F+0qvgE8sVmrOOPvvWr4axtKRSG1 t2QUiZ/ZkW/x/+t0nrM39EANV1VncuQZ1ceIiwTJFqGZQ8kb0+BNkwuNVFHRgXm1qzAJweEt RdsCMohB+H7BL5LGCVG5JaU0lqFU9pFP40HxEbyzxjsZgSE8LwkI6wcu0BLv6K6cLm0EiHPO l5G8kgRi38PS7/6s3R8QDsEtbGsYy6O82k3zSLIjuDBwA9GRaeigGppTxzAHVjf5o9KKu4O7 gC2KKVHPegbXS+GK7DU0fjzX57H5bZ6komE5eY4p3oWT/CwVPSGfPs8jOwARAQABwsB2BBgB CAAgFiEEmuvCXT0aY6hs4SbWeVOEFl5WrMgFAl+pQfkCGwwACgkQeVOEFl5WrMiVqwf9GwU8 c6cylknZX8QwlsVudTC8xr/L17JA84wf03k3d4wxP7bqy5AYy7jboZMbgWXngAE/HPQU95NM aukysSnknzoIpC96XZJ0okLBXVS6Y0ylZQ+HrbIhMpuQPoDweoF5F9wKrsHRoDaUK1VR706X rwm4HUzh7Jk+auuMYfuCh0FVlFBEuiJWMLhg/5WCmcRfiuB6F59ZcUQrwLEZeNhF2XJV4KwB Tlg7HCWO/sy1foE5noaMyACjAtAQE9p5kGYaj+DuRhPdWUTsHNuqrhikzIZd2rrcMid+ktb0 NvtvswzMO059z1YGMtGSqQ4srCArju+XHIdTFdiIYbd7+jeehg== In-Reply-To: <8b730043-a759-4bb4-b7ee-323a317ce6d2@sentex.net> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Scanned-By: MIMEDefang 2.86 X-Spamd-Result: default: False [-3.39 / 15.00]; NEURAL_HAM_LONG(-1.00)[-1.000]; NEURAL_HAM_MEDIUM(-1.00)[-1.000]; NEURAL_HAM_SHORT(-1.00)[-0.995]; R_SPF_ALLOW(-0.20)[+ip6:2607:f3e0::/32]; MIME_GOOD(-0.10)[text/plain]; RCVD_IN_DNSWL_LOW(-0.10)[199.212.134.19:received]; XM_UA_NO_VERSION(0.01)[]; ASN(0.00)[asn:11647, ipnet:2607:f3e0::/32, country:CA]; FREEFALL_USER(0.00)[mike]; RCPT_COUNT_ONE(0.00)[1]; MIME_TRACE(0.00)[0:+]; MID_RHS_MATCH_FROM(0.00)[]; MLMMJ_DEST(0.00)[freebsd-hardware@freebsd.org]; ARC_NA(0.00)[]; FROM_EQ_ENVFROM(0.00)[]; FROM_HAS_DN(0.00)[]; R_DKIM_NA(0.00)[]; RCVD_COUNT_TWO(0.00)[2]; TO_DN_ALL(0.00)[]; TO_MATCH_ENVRCPT_ALL(0.00)[]; DMARC_NA(0.00)[sentex.net]; RCVD_TLS_ALL(0.00)[] X-Rspamd-Queue-Id: 4XJ9Nx0WGLz4qHv 
X-Spamd-Bar: --- On 10/1/2024 4:03 PM, mike tancsa wrote: > On 10/1/2024 2:07 AM, Stephane Rochoy wrote: >> >> mike tancsa writes: >> >>> WARNING: This e-mail comes from someone outside your organisation. >>> Do not click >>> on links or open attachments if you do not know the sender and are >>> not sure that >>> the content is safe. >>> >>> On 9/30/2024 3:18 AM, Stephane Rochoy wrote: >>>> >>>> mike tancsa writes: >>>> >>>>> Do you know off hand how to set the system to just reboot ? The >>>>> ddb man >>>>> page seems to imply I need options DDB as well, which is not in >>>>> GENERIC >>>>> in order to set script actions. >>>> >>>> I would try the following: >>>> >>>>  ddb script kdb.enter.default=reset >>>> >>> If I build a custom kernel then that will work. But with GENERIC (I am >>> tracking project via freebsd-update), it fails >>> >>> # ddb script kdb.enter.default=reset >>> ddb: sysctl: debug.ddb.scripting.scripts: No such file or directory >>> >>> With a customer kernel, adding >>> >>> options DDB >>> >>> it works perfectly. >>> >>> Is there any way to get this to work without having ddb custom >>> compiled in ? >> >> I don't understand what's happening here. 
AFAIK, the code >> corresponding to the soft watchdog being triggered is the >> following: >> >>  static void >>  wd_timeout_cb(void *arg) >>  { >>    const char *type = arg; >> >>  #ifdef DDB >>    if ((wd_pretimeout_act & WD_SOFT_DDB)) { >>      char kdb_why[80]; >>      snprintf(kdb_why, sizeof(kdb_why), "watchdog %s-timeout",      >> type); >>      kdb_backtrace(); >>      kdb_enter(KDB_WHY_WATCHDOG, kdb_why); >>    } >>  #endif >>    if ((wd_pretimeout_act & WD_SOFT_LOG)) >>      log(LOG_EMERG, "watchdog %s-timeout, WD_SOFT_LOG\n", type); >>    if ((wd_pretimeout_act & WD_SOFT_PRINTF)) >>      printf("watchdog %s-timeout, WD_SOFT_PRINTF\n", type); >>    if ((wd_pretimeout_act & WD_SOFT_PANIC)) >>      panic("watchdog %s-timeout, WD_SOFT_PANIC set", type); >>  } >> >> So without DDB, it should call panic. But in your case, it >> called kdb_backtrace. So initial hypothesis was wrong. What I >> missed is that panic was natively able to kdb_backtrace if gently >> asked to do so: >> >>  #ifdef KDB >>    if ((newpanic || trace_all_panics) && trace_on_panic) >>      kdb_backtrace(); >>    if (debugger_on_panic) >>      kdb_enter(KDB_WHY_PANIC, "panic"); >>    else if (!newpanic && debugger_on_recursive_panic) >>      kdb_enter(KDB_WHY_PANIC, "re-panic"); >>  #endif >>    /*thread_lock(td); */ >>    td->td_flags |= TDF_INPANIC; >>    /* thread_unlock(td); */ >>    if (!sync_on_panic) >>      bootopt |= RB_NOSYNC; >>    if (poweroff_on_panic) >>      bootopt |= RB_POWEROFF; >>    if (powercycle_on_panic) >>      bootopt |= RB_POWERCYCLE; >>    kern_reboot(bootopt); >> >> So it definitely should reboot but as it don't, maybe playing with >> kern.powercycle_on_panic would help? >> >> > > Thank you for your continued help on this. 
Still no luck with the
> GENERIC kernel
>
> 0{p9999}# sysctl -w kern.powercycle_on_panic=1
> kern.powercycle_on_panic: 0 -> 1
> 0{p9999}# ps -auxwww | grep dog
> root     4752   0.0  0.2   12820  12916  -  S watchdogd --softtimeout-action panic -t 10
> root     4792   0.0  0.0   12808   2644 u0  S+   15:39     0:00.00
> grep dog
> 0{p9999}# kill -9 4752
> 0{p9999}# KDB: stack backtrace:
> #0 0xffffffff80b7fefd at kdb_backtrace+0x5d
> #1 0xffffffff80abec93 at hardclock+0x103
> #2 0xffffffff80abfe8b at handleevents+0xab
> #3 0xffffffff80ac0b7c at timercb+0x24c
> #4 0xffffffff810d0ebb at lapic_handle_timer+0xab
> #5 0xffffffff80fd8a71 at Xtimerint+0xb1
> #6 0xffffffff804b3685 at acpi_cpu_idle+0x2c5
> #7 0xffffffff80fc48f6 at cpu_idle_acpi+0x46
> #8 0xffffffff80fc49ad at cpu_idle+0x9d
> #9 0xffffffff80b67bb6 at sched_idletd+0x576
> #10 0xffffffff80aecf7f at fork_exit+0x7f
> #11 0xffffffff80fd7dae at fork_trampoline+0xe
>
> 0{p9999}#
>
> Where would be the best place to hack in something like this in the
> driver ?
>  sysctl -w debug.kdb.panic_str="Watchdog Panic"
>
> which actually does panic the box
>
>
One other datapoint. It seems starting

watchdogd --softtimeout-action panic --softtimeout -t 10

After kill -9 it eventually prints out

watchdog soft-timeout, WD_SOFT_LOG

to dmesg.
But after that, I cannot start a new watchdogd with just

watchdogd --softtimeout-action panic -t 10

I get

watchdogd: setting WDIOC_SETSOFT 1: Invalid argument
watchdogd: patting the dog: Invalid argument

From nobody Wed Oct 2 00:40:17 2024 X-Original-To: freebsd-hardware@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4XJGDh6RK7z5XdqV for ; Wed, 02 Oct 2024 00:40:20 +0000 (UTC) (envelope-from mike@sentex.net) Received: from smarthost1.sentex.ca (smarthost1.sentex.ca [IPv6:2607:f3e0:0:1::12]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smarthost1.sentex.ca", Issuer "R10" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4XJGDf5BJgz4CZn for ; Wed, 2 Oct 2024 00:40:18 +0000 (UTC) (envelope-from mike@sentex.net) Authentication-Results: mx1.freebsd.org; dkim=none; spf=pass (mx1.freebsd.org: domain of mike@sentex.net designates 2607:f3e0:0:1::12 as permitted sender) smtp.mailfrom=mike@sentex.net; dmarc=none Received: from pyroxene2a.sentex.ca (pyroxene19.sentex.ca [199.212.134.19]) by smarthost1.sentex.ca (8.18.1/8.18.1) with ESMTPS id 4920eHYC038524 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=FAIL); Tue, 1 Oct 2024 20:40:17 -0400 (EDT) (envelope-from mike@sentex.net) Received: from [IPV6:2607:f3e0:0:4:eca3:ea83:d867:1a0] ([IPv6:2607:f3e0:0:4:eca3:ea83:d867:1a0]) by pyroxene2a.sentex.ca (8.18.1/8.15.2) with ESMTPS id 4920eGr3020097 (version=TLSv1.3 cipher=TLS_AES_128_GCM_SHA256 bits=128 verify=NO); Tue, 1 Oct 2024 20:40:16 -0400 (EDT) (envelope-from mike@sentex.net) Message-ID: <82dc6dbf-8aa7-45ef-8fe9-08dc54973c2c@sentex.net> Date: Tue, 1 Oct 2024 20:40:17 -0400 List-Id: General discussion of FreeBSD hardware List-Archive: https://lists.freebsd.org/archives/freebsd-hardware 
List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-hardware@FreeBSD.org MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: watchdog timer programming (progress) From: mike tancsa To: Chris6 via freebsd-hardware References: <3065debc-8d4f-4487-abbb-c9408810cea6@sentex.net> <86plotbk5b.fsf@cthulhu.stephaner.labo.int> <9008b389-ab06-401d-9a95-84f849ca602a@sentex.net> <86plosdv48.fsf@cthulhu.stephaner.labo.int> <78e9461c-b93d-403f-b3a1-3568548b9283@sentex.net> <86h6a1egcs.fsf@cthulhu.stephaner.labo.int> <868qvddwph.fsf@cthulhu.stephaner.labo.int> <2d850ccc-2e90-4a1a-927c-045d4750d570@sentex.net> <864j5xehes.fsf@cthulhu.stephaner.labo.int> <86zfnocpb8.fsf@cthulhu.stephaner.labo.int> <8b730043-a759-4bb4-b7ee-323a317ce6d2@sentex.net> <1b346afb-d6ed-4f00-8dcf-5cdd389d210b@sentex.net> Content-Language: en-US Autocrypt: addr=mike@sentex.net; keydata= xsBNBFywzOMBCACoNFpwi5MeyEREiCeHtbm6pZJI/HnO+wXdCAWtZkS49weOoVyUj5BEXRZP xflV2ib2hflX4nXqhenaNiia4iaZ9ft3I1ebd7GEbGnsWCvAnob5MvDZyStDAuRxPJK1ya/s +6rOvr+eQiXYNVvfBhrCfrtR/esSkitBGxhUkBjOti8QwzD71JVF5YaOjBAs7jZUKyLGj0kW yDg4jUndudWU7G2yc9GwpHJ9aRSUN8e/mWdIogK0v+QBHfv/dsI6zVB7YuxCC9Fx8WPwfhDH VZC4kdYCQWKXrm7yb4TiVdBh5kgvlO9q3js1yYdfR1x8mjK2bH2RSv4bV3zkNmsDCIxjABEB AAHNHW1pa2UgdGFuY3NhIDxtaWtlQHNlbnRleC5uZXQ+wsCOBBMBCAA4FiEEmuvCXT0aY6hs 4SbWeVOEFl5WrMgFAl+pQfkCGwMFCwkIBwIGFQoJCAsCBBYCAwECHgECF4AACgkQeVOEFl5W rMiN6ggAk3H5vk8QnbvGbb4sinxZt/wDetgk0AOR9NRmtTnPaW+sIJEfGBOz47Xih+f7uWJS j+uvc9Ewn2Z7n8z3ZHJlLAByLVLtcNXGoRIGJ27tevfOaNqgJHBPbFOcXCBBFTx4MYMM4iAZ cDT5vsBTSaM36JZFtHZBKkuFEItbA/N8ZQSHKdTYMIA7A3OCLGbJBqloQ8SlW4MkTzKX4u7R yefAYQ0h20x9IqC5Ju8IsYRFacVZconT16KS81IBceO42vXTN0VexbVF2rZIx3v/NT75r6Vw 0FlXVB1lXOHKydRA2NeleS4NEG2vWqy/9Boj0itMfNDlOhkrA/0DcCurMpnpbM7ATQRcsMzk AQgA1Dpo/xWS66MaOJLwA28sKNMwkEk1Yjs+okOXDOu1F+0qvgE8sVmrOOPvvWr4axtKRSG1 t2QUiZ/ZkW/x/+t0nrM39EANV1VncuQZ1ceIiwTJFqGZQ8kb0+BNkwuNVFHRgXm1qzAJweEt RdsCMohB+H7BL5LGCVG5JaU0lqFU9pFP40HxEbyzxjsZgSE8LwkI6wcu0BLv6K6cLm0EiHPO 
l5G8kgRi38PS7/6s3R8QDsEtbGsYy6O82k3zSLIjuDBwA9GRaeigGppTxzAHVjf5o9KKu4O7 gC2KKVHPegbXS+GK7DU0fjzX57H5bZ6komE5eY4p3oWT/CwVPSGfPs8jOwARAQABwsB2BBgB CAAgFiEEmuvCXT0aY6hs4SbWeVOEFl5WrMgFAl+pQfkCGwwACgkQeVOEFl5WrMiVqwf9GwU8 c6cylknZX8QwlsVudTC8xr/L17JA84wf03k3d4wxP7bqy5AYy7jboZMbgWXngAE/HPQU95NM aukysSnknzoIpC96XZJ0okLBXVS6Y0ylZQ+HrbIhMpuQPoDweoF5F9wKrsHRoDaUK1VR706X rwm4HUzh7Jk+auuMYfuCh0FVlFBEuiJWMLhg/5WCmcRfiuB6F59ZcUQrwLEZeNhF2XJV4KwB Tlg7HCWO/sy1foE5noaMyACjAtAQE9p5kGYaj+DuRhPdWUTsHNuqrhikzIZd2rrcMid+ktb0 NvtvswzMO059z1YGMtGSqQ4srCArju+XHIdTFdiIYbd7+jeehg== In-Reply-To: <1b346afb-d6ed-4f00-8dcf-5cdd389d210b@sentex.net> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Scanned-By: MIMEDefang 2.86 X-Spamd-Result: default: False [-3.39 / 15.00]; NEURAL_HAM_LONG(-1.00)[-1.000]; NEURAL_HAM_MEDIUM(-1.00)[-1.000]; NEURAL_HAM_SHORT(-1.00)[-0.996]; R_SPF_ALLOW(-0.20)[+ip6:2607:f3e0::/32]; MIME_GOOD(-0.10)[text/plain]; RCVD_IN_DNSWL_LOW(-0.10)[199.212.134.19:received]; XM_UA_NO_VERSION(0.01)[]; ASN(0.00)[asn:11647, ipnet:2607:f3e0::/32, country:CA]; FREEFALL_USER(0.00)[mike]; RCPT_COUNT_ONE(0.00)[1]; MIME_TRACE(0.00)[0:+]; MID_RHS_MATCH_FROM(0.00)[]; MLMMJ_DEST(0.00)[freebsd-hardware@freebsd.org]; ARC_NA(0.00)[]; FROM_EQ_ENVFROM(0.00)[]; FROM_HAS_DN(0.00)[]; R_DKIM_NA(0.00)[]; RCVD_COUNT_TWO(0.00)[2]; TO_DN_ALL(0.00)[]; TO_MATCH_ENVRCPT_ALL(0.00)[]; DMARC_NA(0.00)[sentex.net]; RCVD_TLS_ALL(0.00)[] X-Rspamd-Queue-Id: 4XJGDf5BJgz4CZn X-Spamd-Bar: --- On 10/1/2024 5:02 PM, mike tancsa wrote: > On 10/1/2024 4:03 PM, mike tancsa wrote: >> On 10/1/2024 2:07 AM, Stephane Rochoy wrote: >>> >>> mike tancsa writes: >>> >>>> WARNING: This e-mail comes from someone outside your organisation. >>>> Do not click >>>> on links or open attachments if you do not know the sender and are >>>> not sure that >>>> the content is safe. 
>>>> >>>> On 9/30/2024 3:18 AM, Stephane Rochoy wrote: >>>>> >>>>> mike tancsa writes: >>>>> >>>>>> Do you know off hand how to set the system to just reboot ? The >>>>>> ddb man >>>>>> page seems to imply I need options DDB as well, which is not in >>>>>> GENERIC >>>>>> in order to set script actions. >>>>> >>>>> I would try the following: >>>>> >>>>>  ddb script kdb.enter.default=reset >>>>> >>>> If I build a custom kernel then that will work. But with GENERIC (I am >>>> tracking project via freebsd-update), it fails >>>> >>>> # ddb script kdb.enter.default=reset >>>> ddb: sysctl: debug.ddb.scripting.scripts: No such file or directory >>>> >>>> With a customer kernel, adding >>>> >>>> options DDB >>>> >>>> it works perfectly. >>>> >>>> Is there any way to get this to work without having ddb custom >>>> compiled in ? >>> >>> I don't understand what's happening here. AFAIK, the code >>> corresponding to the soft watchdog being triggered is the >>> following: >>> >>>  static void >>>  wd_timeout_cb(void *arg) >>>  { >>>    const char *type = arg; >>> >>>  #ifdef DDB >>>    if ((wd_pretimeout_act & WD_SOFT_DDB)) { >>>      char kdb_why[80]; >>>      snprintf(kdb_why, sizeof(kdb_why), "watchdog %s-timeout",      >>> type); >>>      kdb_backtrace(); >>>      kdb_enter(KDB_WHY_WATCHDOG, kdb_why); >>>    } >>>  #endif >>>    if ((wd_pretimeout_act & WD_SOFT_LOG)) >>>      log(LOG_EMERG, "watchdog %s-timeout, WD_SOFT_LOG\n", type); >>>    if ((wd_pretimeout_act & WD_SOFT_PRINTF)) >>>      printf("watchdog %s-timeout, WD_SOFT_PRINTF\n", type); >>>    if ((wd_pretimeout_act & WD_SOFT_PANIC)) >>>      panic("watchdog %s-timeout, WD_SOFT_PANIC set", type); >>>  } >>> >>> So without DDB, it should call panic. But in your case, it >>> called kdb_backtrace. So initial hypothesis was wrong. 
What I >>> missed is that panic was natively able to kdb_backtrace if gently >>> asked to do so: >>> >>>  #ifdef KDB >>>    if ((newpanic || trace_all_panics) && trace_on_panic) >>>      kdb_backtrace(); >>>    if (debugger_on_panic) >>>      kdb_enter(KDB_WHY_PANIC, "panic"); >>>    else if (!newpanic && debugger_on_recursive_panic) >>>      kdb_enter(KDB_WHY_PANIC, "re-panic"); >>>  #endif >>>    /*thread_lock(td); */ >>>    td->td_flags |= TDF_INPANIC; >>>    /* thread_unlock(td); */ >>>    if (!sync_on_panic) >>>      bootopt |= RB_NOSYNC; >>>    if (poweroff_on_panic) >>>      bootopt |= RB_POWEROFF; >>>    if (powercycle_on_panic) >>>      bootopt |= RB_POWERCYCLE; >>>    kern_reboot(bootopt); >>> >>> So it definitely should reboot but as it don't, maybe playing with >>> kern.powercycle_on_panic would help? >>> >>> >> >> Thank you for your continued help on this. Still no luck with the >> GENERIC kernel >> >> 0{p9999}# sysctl -w kern.powercycle_on_panic=1 >> kern.powercycle_on_panic: 0 -> 1 >> 0{p9999}# ps -auxwww | grep dog >> root     4752   0.0  0.2   12820  12916  -  S> watchdogd --softtimeout-action panic -t 10 >> root     4792   0.0  0.0   12808   2644 u0  S+   15:39 0:00.00 grep dog >> 0{p9999}# kill -9 4752 >> 0{p9999}# KDB: stack backtrace: >> #0 0xffffffff80b7fefd at kdb_backtrace+0x5d >> #1 0xffffffff80abec93 at hardclock+0x103 >> #2 0xffffffff80abfe8b at handleevents+0xab >> #3 0xffffffff80ac0b7c at timercb+0x24c >> #4 0xffffffff810d0ebb at lapic_handle_timer+0xab >> #5 0xffffffff80fd8a71 at Xtimerint+0xb1 >> #6 0xffffffff804b3685 at acpi_cpu_idle+0x2c5 >> #7 0xffffffff80fc48f6 at cpu_idle_acpi+0x46 >> #8 0xffffffff80fc49ad at cpu_idle+0x9d >> #9 0xffffffff80b67bb6 at sched_idletd+0x576 >> #10 0xffffffff80aecf7f at fork_exit+0x7f >> #11 0xffffffff80fd7dae at fork_trampoline+0xe >> >> 0{p9999}# >> >> Where would be the best place to hack in something like this in the >> driver ? 
>>  sysctl -w debug.kdb.panic_str="Watchdog Panic"
>>
>> which actually does panic the box
>>
>>
>
> One other datapoint. It seems starting
>
> watchdogd --softtimeout-action panic --softtimeout -t 10
>
> After kill -9
> it eventually prints out
>
> watchdog soft-timeout, WD_SOFT_LOG
>
> to dmesg.  But after that, I cannot start a new watchdogd with just
>
> watchdogd --softtimeout-action panic -t 10
>
> I get
>
> watchdogd: setting WDIOC_SETSOFT 1: Invalid argument
> watchdogd: patting the dog: Invalid argument

I made these 2 changes to the driver

--- watchdog.c  2024-10-01 20:37:28.667869000 -0400
+++ /tmp/watchdog.c     2024-10-01 20:36:59.764330000 -0400
@@ -61,7 +61,8 @@
 static struct callout wd_softtimeo_handle;
 static int wd_softtimer;       /* true = use softtimer instead of hardware
                                   watchdog */
-static int wd_softtimeout_act = WD_SOFT_LOG;   /* action for the software timeout */
+// static int wd_softtimeout_act = WD_SOFT_LOG;        /* action for the software timeout */
+static int wd_softtimeout_act = WD_SOFT_PANIC; /* action for the software timeout */
 
 static struct cdev *wd_dev;
 static volatile u_int wd_last_u;    /* last timeout value set by kern_do_pat */
@@ -241,6 +242,7 @@
 wd_timeout_cb(void *arg)
 {
        const char *type = arg;
+       panic("mdt watchdog %s-timeout, WD_SOFT_PANIC set", type);
 
 #ifdef DDB
        if ((wd_pretimeout_act & WD_SOFT_DDB)) {

and it works now

KDB: stack backtrace:
#0 0xffffffff80b8943d at kdb_backtrace+0x5d
#1 0xffffffff80b3bfd1 at vpanic+0x131
#2 0xffffffff80b3be93 at panic+0x43
#3 0xffffffff8098b585 at wd_timeout_cb+0x15
#4 0xffffffff80b59fcc at softclock_call_cc+0x12c
#5 0xffffffff80b5b815 at softclock_thread+0xe5
#6 0xffffffff80af61df at fork_exit+0x7f
#7 0xffffffff80ff76ce at fork_trampoline+0xe
Uptime: 1m13s

it seems the soft timeout value action is never overridden for some reason.

This kinda feels like a bug / pr ?

---Mike From nobody Wed Oct 2 09:13:34 2024 X-Original-To: freebsd-hardware@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4XJTyG4p3Vz5YFt7 for ; Wed, 02 Oct 2024 09:28:38 +0000 (UTC) (envelope-from Stephane.ROCHOY@stormshield.eu) Received: from mail.stormshield.eu (mail.stormshield.eu [91.212.116.25]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "mail.stormshield.eu", Issuer "Sectigo RSA Organization Validation Secure Server CA" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4XJTyG26wRz4v6D for ; Wed, 2 Oct 2024 09:28:38 +0000 (UTC) (envelope-from Stephane.ROCHOY@stormshield.eu) Authentication-Results: mx1.freebsd.org; none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=stormshield.eu; s=signer2; t=1727861311; h=From:Subject:Date:Message-ID:To:Cc :MIME-Version:Content-Type:Content-Transfer-Encoding:In-Reply-To :References; bh=kGv9jRhCU6UG6AX/W4LNfrQ3vu1ayIgM38LxjK/5eEg=; b=zKL8mh2Qi uHVxNA0r2tyNHT1AXMO6aJgBOU1Xl/ag/FASBXm2vh3iFj7iZChtM7PWYMjWkYPOtKIw+DVP7 6puDeA7z33L+6Bpxs4YAu6UWs3jwQwXQZH8woo2yKGp8yE9GR9na3CDqOkl2f04zn/WghxtLc XJJvqBG4ZyCuVViE/bL1siKZ4TxZly3JhuqJSD8uwDuSHUMNrltVGit9MyB1vVwEvcOu3nEvn B/OD8VS5HTi71sAlx9XYSkj+KT6vdEWyRgN4sVMa7TnPPd7JeqvrLuC847dbYWzfRcn8PiCND TDdr0TaBCSxoVSp0ZtzIYbcK6ByDshFOuEKPXrlNg==; References: <3065debc-8d4f-4487-abbb-c9408810cea6@sentex.net> <86plotbk5b.fsf@cthulhu.stephaner.labo.int> <9008b389-ab06-401d-9a95-84f849ca602a@sentex.net> <86plosdv48.fsf@cthulhu.stephaner.labo.int> <78e9461c-b93d-403f-b3a1-3568548b9283@sentex.net> <86h6a1egcs.fsf@cthulhu.stephaner.labo.int> <868qvddwph.fsf@cthulhu.stephaner.labo.int> <2d850ccc-2e90-4a1a-927c-045d4750d570@sentex.net> <864j5xehes.fsf@cthulhu.stephaner.labo.int> <86zfnocpb8.fsf@cthulhu.stephaner.labo.int> <8b730043-a759-4bb4-b7ee-323a317ce6d2@sentex.net> <1b346afb-d6ed-4f00-8dcf-5cdd389d210b@sentex.net> 
<82dc6dbf-8aa7-45ef-8fe9-08dc54973c2c@sentex.net> User-agent: mu4e 1.10.7; emacs 29.4 From: Stephane Rochoy To: mike tancsa CC: Chris6 via freebsd-hardware Subject: Re: watchdog timer programming (progress) Date: Wed, 2 Oct 2024 11:13:34 +0200 In-Reply-To: <82dc6dbf-8aa7-45ef-8fe9-08dc54973c2c@sentex.net> Message-ID: <86r08ydfb5.fsf@cthulhu.stephaner.labo.int> List-Id: General discussion of FreeBSD hardware List-Archive: https://lists.freebsd.org/archives/freebsd-hardware List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-hardware@FreeBSD.org MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8"; format=flowed Content-Transfer-Encoding: quoted-printable mike tancsa writes: > it seems the soft timeout value action is never overridden for some reason.
>
> This kinda feels like a bug / pr ?

Well, honestly I'm puzzled:
- on one hand, watchdog.c doesn't seem to use wd_softtimeout_act
- and on the other hand, hardclock seems to call watchdog_fire
  directly, which just calls kdb_enter or panic.

Note that wd_timeout_cb seems to handle both the pretimeout and
the timeout.

Regards,
-- 
Stéphane Rochoy
O: Stormshield

From nobody Wed Oct 2 12:59:09 2024 X-Original-To: freebsd-hardware@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4XJZdG6PnSz5YVGd for ; Wed, 02 Oct 2024 12:59:14 +0000 (UTC) (envelope-from mike@sentex.net) Received: from smarthost1.sentex.ca (smarthost1.sentex.ca [IPv6:2607:f3e0:0:1::12]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smarthost1.sentex.ca", Issuer "R10" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4XJZdF6h6sz4crj for ; Wed, 2 Oct 2024 12:59:13 +0000 (UTC) (envelope-from mike@sentex.net) Authentication-Results: mx1.freebsd.org; dkim=none; spf=pass (mx1.freebsd.org: domain of mike@sentex.net designates 2607:f3e0:0:1::12 as permitted sender) smtp.mailfrom=mike@sentex.net; dmarc=none Received: from pyroxene2a.sentex.ca (pyroxene19.sentex.ca [199.212.134.19]) by smarthost1.sentex.ca (8.18.1/8.18.1) with ESMTPS id 492CxATl019451 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=FAIL); Wed, 2 Oct 2024 08:59:10 -0400 (EDT) (envelope-from mike@sentex.net) Received: from [IPV6:2607:f3e0:0:4:eca3:ea83:d867:1a0] ([IPv6:2607:f3e0:0:4:eca3:ea83:d867:1a0]) by pyroxene2a.sentex.ca (8.18.1/8.15.2) with ESMTPS id 492Cx8NT030922 (version=TLSv1.3 cipher=TLS_AES_128_GCM_SHA256 bits=128 verify=NO); Wed, 2 Oct 2024 08:59:09 -0400 (EDT) (envelope-from mike@sentex.net) Message-ID: 
<24fba042-3c77-4dc5-9660-c73a102e33ca@sentex.net> Date: Wed, 2 Oct 2024 08:59:09 -0400 List-Id: General discussion of FreeBSD hardware List-Archive: https://lists.freebsd.org/archives/freebsd-hardware List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-hardware@FreeBSD.org MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: watchdog timer programming (progress) To: Chris6 via freebsd-hardware References: <3065debc-8d4f-4487-abbb-c9408810cea6@sentex.net> <86plotbk5b.fsf@cthulhu.stephaner.labo.int> <9008b389-ab06-401d-9a95-84f849ca602a@sentex.net> <86plosdv48.fsf@cthulhu.stephaner.labo.int> <78e9461c-b93d-403f-b3a1-3568548b9283@sentex.net> <86h6a1egcs.fsf@cthulhu.stephaner.labo.int> <868qvddwph.fsf@cthulhu.stephaner.labo.int> <2d850ccc-2e90-4a1a-927c-045d4750d570@sentex.net> <864j5xehes.fsf@cthulhu.stephaner.labo.int> <86zfnocpb8.fsf@cthulhu.stephaner.labo.int> <8b730043-a759-4bb4-b7ee-323a317ce6d2@sentex.net> <1b346afb-d6ed-4f00-8dcf-5cdd389d210b@sentex.net> <82dc6dbf-8aa7-45ef-8fe9-08dc54973c2c@sentex.net> <86r08ydfb5.fsf@cthulhu.stephaner.labo.int> Content-Language: en-US From: mike tancsa In-Reply-To: <86r08ydfb5.fsf@cthulhu.stephaner.labo.int> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit On 10/2/2024 5:13 AM, Stephane Rochoy wrote: >> >> it seems the soft timeout value action is never overridden for some >> reason. >> >> This kinda feels like a bug / pr ? 
>
> Well, honestly I'm puzzled:
> - on one hand, watchdog.c doesn't seem to use wd_softtimeout_act
> - and on the other hand, hardclock seems to call watchdog_fire
>   directly, which just calls kdb_enter or panic.
>
> Note that wd_timeout_cb seems to handle both the pretimeout and
> the timeout.
>

I was able to get things to work the way I want by setting the
pretimeout action. But I need to set things 'right' the first time I
start watchdogd. If I do

0{p9999}# watchdogd --pretimeout-action panic --softtimeout-action panic -t 10
0{p9999}# killall -9 watchdogd
0{p9999}# KDB: stack backtrace:
#0 0xffffffff80b7fefd at kdb_backtrace+0x5d
#1 0xffffffff80abec93 at hardclock+0x103
#2 0xffffffff80abfe8b at handleevents+0xab
#3 0xffffffff80ac0b7c at timercb+0x24c
#4 0xffffffff810d0ebb at lapic_handle_timer+0xab
#5 0xffffffff80fd8a71 at Xtimerint+0xb1
#6 0xffffffff804b3685 at acpi_cpu_idle+0x2c5
#7 0xffffffff80fc48f6 at cpu_idle_acpi+0x46
#8 0xffffffff80fc49ad at cpu_idle+0x9d
#9 0xffffffff80b67bb6 at sched_idletd+0x576
#10 0xffffffff80aecf7f at fork_exit+0x7f
#11 0xffffffff80fd7dae at fork_trampoline+0xe

0{p9999}#
0{p9999}# watchdogd --pretimeout-action panic --softtimeout --softtimeout-action panic -t 10
watchdogd: setting WDIOC_SETSOFT 1: Invalid argument
watchdogd: patting the dog: Invalid argument
71{p9999}#

the box only prints a backtrace and does not panic, and a second
watchdogd with --softtimeout then refuses to start. But if I reboot the
box, make sure nothing is set, and start the daemon with

watchdogd --pretimeout-action panic --softtimeout --softtimeout-action panic -t 10

it works

0{p9999}# watchdogd --pretimeout-action panic --softtimeout --softtimeout-action panic -t 10
0{p9999}# killall -9 watchdogd
0{p9999}# panic: watchdog soft-timeout, WD_SOFT_PANIC set
cpuid = 0
time = 1727873819
KDB: stack backtrace:
#0 0xffffffff80b7fefd at kdb_backtrace+0x5d
#1 0xffffffff80b32bd1 at vpanic+0x131
#2 0xffffffff80b32a93 at panic+0x43
#3 0xffffffff809827bb at wd_timeout_cb+0x6b
#4 0xffffffff80b50b0c at softclock_call_cc+0x12c
#5 0xffffffff80b52355 at softclock_thread+0xe5
#6 0xffffffff80aecf7f at fork_exit+0x7f
#7 0xffffffff80fd7dae at fork_trampoline+0xe
Timeout initializing vt_vga
Uptime: 50s
Automatic reboot in 15 seconds - press a key on the console to abort

I think some of the dead ends I ran into were due to this. My stock
image has watchdogd_enable="YES", and having that start up would set
something that would then lead to dead ends. But if I JUST start up

watchdogd --pretimeout-action panic --softtimeout --softtimeout-action panic -t 10

it works.  I wonder if it warrants a PR for the docs at least.

Anyways, thanks again for helping me work through all this!

    ---Mike
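
For anyone reading this thread later, here is a minimal userland sketch
of why --pretimeout-action panic may have made the difference. It
assumes, based only on the wd_timeout_cb snippet quoted earlier in the
thread, that the callback tests wd_pretimeout_act for every action and
never reads wd_softtimeout_act. The flag values, the default actions,
and the model_* helper names are illustrative assumptions, not the
kernel's definitions (those live in sys/watchdog.h and
sys/dev/watchdog/watchdog.c), and the DDB branch is left out.

```c
#include <stdio.h>

/* Illustrative flag values (assumption); see <sys/watchdog.h> for the real ones. */
#define WD_SOFT_PANIC   0x01
#define WD_SOFT_LOG     0x04
#define WD_SOFT_PRINTF  0x08

/* Userland stand-ins for the two kernel variables discussed in the thread. */
static int wd_pretimeout_act = WD_SOFT_LOG;   /* assumed default */
static int wd_softtimeout_act = WD_SOFT_LOG;  /* default shown in the diff */

/* Hypothetical helper: mimics what the two *-action options would set. */
static void
model_set_actions(int pre_act, int soft_act)
{
    wd_pretimeout_act = pre_act;
    wd_softtimeout_act = soft_act;  /* stored, but never read below */
}

/*
 * Model of wd_timeout_cb() as quoted in the thread: every test reads
 * wd_pretimeout_act. Returns 1 where the kernel would panic, else 0.
 */
static int
model_wd_timeout_cb(const char *type)
{
    if (wd_pretimeout_act & WD_SOFT_LOG)
        printf("watchdog %s-timeout, WD_SOFT_LOG\n", type);
    if (wd_pretimeout_act & WD_SOFT_PRINTF)
        printf("watchdog %s-timeout, WD_SOFT_PRINTF\n", type);
    if (wd_pretimeout_act & WD_SOFT_PANIC) {
        printf("panic: watchdog %s-timeout, WD_SOFT_PANIC set\n", type);
        return 1;
    }
    return 0;
}
```

In this model, model_set_actions(WD_SOFT_LOG, WD_SOFT_PANIC) followed by
model_wd_timeout_cb("soft") only logs and returns 0, which would be
consistent with the WD_SOFT_LOG line seen in dmesg when only
--softtimeout-action panic was given. Passing WD_SOFT_PANIC as the first
argument makes it return 1, which would be consistent with the panic
obtained once --pretimeout-action panic was added.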