Date: Tue, 1 Oct 2024 17:02:07 -0400
From: mike tancsa <mike@sentex.net>
To: Chris6 via freebsd-hardware <freebsd-hardware@freebsd.org>
Subject: Re: watchdog timer programming
Message-ID: <1b346afb-d6ed-4f00-8dcf-5cdd389d210b@sentex.net>
In-Reply-To: <8b730043-a759-4bb4-b7ee-323a317ce6d2@sentex.net>
References: <3065debc-8d4f-4487-abbb-c9408810cea6@sentex.net>
 <86plotbk5b.fsf@cthulhu.stephaner.labo.int>
 <9008b389-ab06-401d-9a95-84f849ca602a@sentex.net>
 <86plosdv48.fsf@cthulhu.stephaner.labo.int>
 <78e9461c-b93d-403f-b3a1-3568548b9283@sentex.net>
 <86h6a1egcs.fsf@cthulhu.stephaner.labo.int>
 <c4343727-606c-409a-8618-9da732fc3059@sentex.net>
 <868qvddwph.fsf@cthulhu.stephaner.labo.int>
 <2d850ccc-2e90-4a1a-927c-045d4750d570@sentex.net>
 <864j5xehes.fsf@cthulhu.stephaner.labo.int>
 <be51c81f-11c7-4f90-9df7-9635ec1df2c4@sentex.net>
 <86zfnocpb8.fsf@cthulhu.stephaner.labo.int>
 <8b730043-a759-4bb4-b7ee-323a317ce6d2@sentex.net>

On 10/1/2024 4:03 PM, mike tancsa wrote:
> On 10/1/2024 2:07 AM, Stephane Rochoy wrote:
>>
>> mike tancsa <mike@sentex.net> writes:
>>
>>>
>>> On 9/30/2024 3:18 AM, Stephane Rochoy wrote:
>>>>
>>>> mike tancsa <mike@sentex.net> writes:
>>>>
>>>>> Do you know off hand how to set the system to just reboot? The ddb
>>>>> man page seems to imply I need options DDB as well, which is not in
>>>>> GENERIC, in order to set script actions.
>>>>
>>>> I would try the following:
>>>>
>>>> ddb script kdb.enter.default=reset
>>>>
>>> If I build a custom kernel then that will work. But with GENERIC (I am
>>> tracking project via freebsd-update), it fails
>>>
>>> # ddb script kdb.enter.default=reset
>>> ddb: sysctl: debug.ddb.scripting.scripts: No such file or directory
>>>
>>> With a custom kernel, adding
>>>
>>> options DDB
>>>
>>> it works perfectly.
>>>
>>> Is there any way to get this to work without having ddb custom
>>> compiled in ?
>>
>> I don't understand what's happening here. AFAIK, the code
>> corresponding to the soft watchdog being triggered is the
>> following:
>>
>> static void
>> wd_timeout_cb(void *arg)
>> {
>>     const char *type = arg;
>>
>> #ifdef DDB
>>     if ((wd_pretimeout_act & WD_SOFT_DDB)) {
>>         char kdb_why[80];
>>         snprintf(kdb_why, sizeof(kdb_why), "watchdog %s-timeout", type);
>>         kdb_backtrace();
>>         kdb_enter(KDB_WHY_WATCHDOG, kdb_why);
>>     }
>> #endif
>>     if ((wd_pretimeout_act & WD_SOFT_LOG))
>>         log(LOG_EMERG, "watchdog %s-timeout, WD_SOFT_LOG\n", type);
>>     if ((wd_pretimeout_act & WD_SOFT_PRINTF))
>>         printf("watchdog %s-timeout, WD_SOFT_PRINTF\n", type);
>>     if ((wd_pretimeout_act & WD_SOFT_PANIC))
>>         panic("watchdog %s-timeout, WD_SOFT_PANIC set", type);
>> }
>>
>> So without DDB, it should call panic(). But in your case, it
>> called kdb_backtrace(), so my initial hypothesis was wrong. What I
>> missed is that panic() can itself call kdb_backtrace() when
>> configured to do so:
>>
>> #ifdef KDB
>>     if ((newpanic || trace_all_panics) && trace_on_panic)
>>         kdb_backtrace();
>>     if (debugger_on_panic)
>>         kdb_enter(KDB_WHY_PANIC, "panic");
>>     else if (!newpanic && debugger_on_recursive_panic)
>>         kdb_enter(KDB_WHY_PANIC, "re-panic");
>> #endif
>>     /*thread_lock(td); */
>>     td->td_flags |= TDF_INPANIC;
>>     /* thread_unlock(td); */
>>     if (!sync_on_panic)
>>         bootopt |= RB_NOSYNC;
>>     if (poweroff_on_panic)
>>         bootopt |= RB_POWEROFF;
>>     if (powercycle_on_panic)
>>         bootopt |= RB_POWERCYCLE;
>>     kern_reboot(bootopt);
>>
>> So it definitely should reboot, but since it doesn't, maybe playing
>> with kern.powercycle_on_panic would help?
>>
>>
>
> Thank you for your continued help on this. Still no luck with the
> GENERIC kernel
>
> 0{p9999}# sysctl -w kern.powercycle_on_panic=1
> kern.powercycle_on_panic: 0 -> 1
> 0{p9999}# ps -auxwww | grep dog
> root 4752 0.0 0.2 12820 12916 - S<s 15:38 0:00.01 watchdogd --softtimeout-action panic -t 10
> root 4792 0.0 0.0 12808 2644 u0 S+ 15:39 0:00.00 grep dog
> 0{p9999}# kill -9 4752
> 0{p9999}# KDB: stack backtrace:
> #0 0xffffffff80b7fefd at kdb_backtrace+0x5d
> #1 0xffffffff80abec93 at hardclock+0x103
> #2 0xffffffff80abfe8b at handleevents+0xab
> #3 0xffffffff80ac0b7c at timercb+0x24c
> #4 0xffffffff810d0ebb at lapic_handle_timer+0xab
> #5 0xffffffff80fd8a71 at Xtimerint+0xb1
> #6 0xffffffff804b3685 at acpi_cpu_idle+0x2c5
> #7 0xffffffff80fc48f6 at cpu_idle_acpi+0x46
> #8 0xffffffff80fc49ad at cpu_idle+0x9d
> #9 0xffffffff80b67bb6 at sched_idletd+0x576
> #10 0xffffffff80aecf7f at fork_exit+0x7f
> #11 0xffffffff80fd7dae at fork_trampoline+0xe
>
> 0{p9999}#
>
> Where would be the best place to hack in something like this in the
> driver ?
> sysctl -w debug.kdb.panic_str="Watchdog Panic"
>
> which actually does panic the box
>
>
One other data point. If I start watchdogd with

  watchdogd --softtimeout-action panic --softtimeout -t 10

then after kill -9 it eventually prints

  watchdog soft-timeout, WD_SOFT_LOG

to dmesg. But after that, I cannot start a new watchdogd with just

  watchdogd --softtimeout-action panic -t 10

I get

  watchdogd: setting WDIOC_SETSOFT 1: Invalid argument
  watchdogd: patting the dog: Invalid argument
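
To reason about which action should fire, the flag checks in the
wd_timeout_cb() quoted above can be modeled in userland. This is only a
minimal sketch, not the kernel code: the WD_SOFT_* values are assumed to
match sys/watchdog.h and should be checked against the source tree, and
soft_actions_fired() and its have_ddb parameter are hypothetical names
made up for illustration.

```c
/* Assumed values; verify against sys/watchdog.h in your source tree. */
#define WD_SOFT_PANIC   0x01
#define WD_SOFT_DDB     0x02
#define WD_SOFT_LOG     0x04
#define WD_SOFT_PRINTF  0x08

/*
 * Userland model of the dispatch in the quoted wd_timeout_cb().
 * Returns the subset of `act` whose branch would be taken.
 * `have_ddb` stands in for a kernel built with "options DDB"
 * (hypothetical parameter; in the kernel this is an #ifdef).
 */
int
soft_actions_fired(int act, int have_ddb)
{
    int fired = 0;

    /* The DDB branch only exists when DDB is compiled in. */
    if (have_ddb && (act & WD_SOFT_DDB))
        fired |= WD_SOFT_DDB;
    if (act & WD_SOFT_LOG)
        fired |= WD_SOFT_LOG;
    if (act & WD_SOFT_PRINTF)
        fired |= WD_SOFT_PRINTF;
    if (act & WD_SOFT_PANIC)
        fired |= WD_SOFT_PANIC;
    return (fired);
}
```

In this model a mask containing only WD_SOFT_LOG fires just the log
branch and never reaches panic(), which would be consistent with the
single WD_SOFT_LOG line seen in dmesg. Whether that is the mask the
kernel actually has set here is a guess, not something confirmed.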
