Date:      Sun, 6 Dec 2020 12:51:32 +0100
From:      Michal Meloun <meloun.michal@gmail.com>
To:        Mark Millard <marklmi@yahoo.com>
Cc:        Marcel Flores <marcel@brickporch.com>, freebsd-arm@freebsd.org
Subject:   Re: ThunderX Panic after r368370
Message-ID:  <91654fc4-8734-d8a7-5309-0400f418438a@freebsd.org>
In-Reply-To: <56F0E9EB-0B78-4B0B-830A-48F8AFC5ABE1@yahoo.com>
References:  <1C3442ED-278E-45B8-9206-0DD24FCBC237@brickporch.com> <4331eee0-74a6-565c-3bec-0051415b2bc1@freebsd.org> <56F0E9EB-0B78-4B0B-830A-48F8AFC5ABE1@yahoo.com>



On 06.12.2020 10:47, Mark Millard wrote:
> 
> 
> On 2020-Dec-6, at 00:17, Michal Meloun <meloun.michal at gmail.com> wrote:
> 
>> On 06.12.2020 3:21, Marcel Flores wrote:
>>> Hi All,
>>> Looks like the ThunderX started panicking at boot after r368370:
>>> https://reviews.freebsd.org/rS368370
>>>  From a verbose boot, it looks like it bails in gic0 redistributor setup(?):
>>> gic0: CPU29 Re-Distributor woke up
>>> gic0: CPU24 enabled CPU interface via system registers
>>> gic0: CPU17 enabled CPU interface via system registers
>>> gic0: CPU29 enabled CPU interface via system registers
>>> done
>>> Full Verbose boot:
>>> https://gist.github.com/mesflores/f026122495c8494d041bce04d30b15bb
>>> I'm not really familiar with the details of the commit, but happy to test
>>> anything if anyone has any ideas.
>>
>>
>> Hi Marcel
>> are you able to get crashdump and do backtrace?
>> https://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug.html#kerneldebug-obtain
>> and
>> https://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-gdb.html
>> If not, I'll make some debug patch.
>>
>> It's weird, even though GIC is potentially affected by my patch, in this case the cpuid numbering was not changed.
> 
> (I've no access to a ThunderX. I just looked for my own curiosity.
> Sorry if this is obvious and so is noise.)
> 
> When I looked at the code it appeared to be the last "->" in
> the following that was dereferencing the nullptr value (via [x8]
> in assembler notation):
> 
> static uint64_t
> its_cmd_prepare(struct its_cmd *cmd, struct its_cmd_desc *desc)
> {
>          uint64_t target;
>          uint8_t cmd_type;
>          u_int size;
> 
>          cmd_type = desc->cmd_type;
>          target = ITS_TARGET_NONE;
> 
>          switch (cmd_type) {
>          case ITS_CMD_MOVI:      /* Move interrupt ID to another collection */
>                  target = desc->cmd_desc_movi.col->col_target;
> . . .
> 
> In other words: it appeared to me that the above desc->cmd_desc_movi.col
> evaluated as 0 when used in what was reported.
> 
This is very probably the right analysis. The problem is that
cmd_desc_movi.col should never be NULL: it is initialized in its_cmd_movi()
from sc->sc_its_cols, which should be allocated in gicv3_its_attach().
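
To make the suspected failure concrete, here is a minimal user-space sketch of the MOVI case with a NULL check in front of the dereference. It is not the kernel code and not the debug patch: the struct layouts are simplified stand-ins, the value of ITS_TARGET_NONE and ITS_CMD_MOVI are placeholders, and its_cmd_target_checked() is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Placeholder values, assumed for illustration only. */
#define ITS_TARGET_NONE	0xFBADBEEFu
#define ITS_CMD_MOVI	0x01

/* Simplified stand-ins for the structures in gicv3_its.c. */
struct its_col {
	uint64_t col_target;
	uint64_t col_id;
};

struct its_cmd_desc {
	uint8_t cmd_type;
	struct {
		struct its_col *col;
		uint32_t id;
	} cmd_desc_movi;
};

/*
 * Hypothetical helper: computes the command target like the MOVI case
 * quoted above, but reports a NULL collection pointer instead of
 * dereferencing it. Returns 0 on success, -1 if col is NULL.
 */
static int
its_cmd_target_checked(const struct its_cmd_desc *desc, uint64_t *target)
{
	*target = ITS_TARGET_NONE;

	switch (desc->cmd_type) {
	case ITS_CMD_MOVI:
		if (desc->cmd_desc_movi.col == NULL) {
			fprintf(stderr, "MOVI: col is NULL for id %u\n",
			    (unsigned)desc->cmd_desc_movi.id);
			return (-1);
		}
		*target = desc->cmd_desc_movi.col->col_target;
		break;
	default:
		break;
	}
	return (0);
}
```

A check like this only shows where the NULL pointer is consumed; it does not explain why the per-CPU entry in sc->sc_its_cols would be unset at that point.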


Marcel, can you please also try this debug patch?
https://github.com/strejda/freebsd/commit/a25ed736644b05672e3e813891af213c280daac3
Unfortunately, I have only a single-socket board with GICv3 (Honeycomb),
and it still boots fine.

Thanks, Michal



