Date:      Sun, 6 Dec 2020 13:30:52 -0800
From:      Mark Millard <marklmi@yahoo.com>
To:        mmel@freebsd.org
Cc:        freebsd-arm <freebsd-arm@freebsd.org>
Subject:   Re: ThunderX Panic after r368370
Message-ID:  <BB5C4C3E-EDF6-4C3D-BEE1-F8B2989216E0@yahoo.com>
In-Reply-To: <91654fc4-8734-d8a7-5309-0400f418438a@freebsd.org>
References:  <1C3442ED-278E-45B8-9206-0DD24FCBC237@brickporch.com> <4331eee0-74a6-565c-3bec-0051415b2bc1@freebsd.org> <56F0E9EB-0B78-4B0B-830A-48F8AFC5ABE1@yahoo.com> <91654fc4-8734-d8a7-5309-0400f418438a@freebsd.org>


On 2020-Dec-6, at 03:51, Michal Meloun <meloun.michal at gmail.com> wrote:


> On 06.12.2020 10:47, Mark Millard wrote:
>> On 2020-Dec-6, at 00:17, Michal Meloun <meloun.michal at gmail.com> wrote:
>>> On 06.12.2020 3:21, Marcel Flores wrote:
>>>> Hi All,
>>>> Looks like the ThunderX started panicking at boot after r368370:
>>>> https://reviews.freebsd.org/rS368370
>>>> From a verbose boot, it looks like it bails in gic0 redistributor setup(?):
>>>> gic0: CPU29 Re-Distributor woke up
>>>> gic0: CPU24 enabled CPU interface via system registers
>>>> gic0: CPU17 enabled CPU interface via system registers
>>>> gic0: CPU29 enabled CPU interface via system registers
>>>> done
>>>> Full Verbose boot:
>>>> https://gist.github.com/mesflores/f026122495c8494d041bce04d30b15bb
>>>> I'm not really familiar with the details of the commit, but happy to test
>>>> anything if anyone has any ideas.
>>>
>>>
>>> Hi Marcel
>>> are you able to get crashdump and do backtrace?
>>> https://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug.html#kerneldebug-obtain
>>> and
>>> https://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-gdb.html
>>> If not, I'll make some debug patch.
>>>
>>> It's weird, even though GIC is potentially affected by my patch, in this case the cpuid numbering was not changed.
>> (I've no access to a ThunderX. I just looked for my own curiosity.
>> Sorry if this is obvious and so is noise.)
>> When I looked at the code it appeared to be the last "->" in
>> the following that was dereferencing the nullptr value (via [x8]
>> in assembler notation):
>> static uint64_t
>> its_cmd_prepare(struct its_cmd *cmd, struct its_cmd_desc *desc)
>> {
>>         uint64_t target;
>>         uint8_t cmd_type;
>>         u_int size;
>>         cmd_type = desc->cmd_type;
>>         target = ITS_TARGET_NONE;
>>         switch (cmd_type) {
>>         case ITS_CMD_MOVI:      /* Move interrupt ID to another collection */
>>                 target = desc->cmd_desc_movi.col->col_target;
>> . . .
>> In other words: it appeared to me that the above desc->cmd_desc_movi.col
>> evaluated as 0 when used in what was reported.
> This is very probably the right analysis. But the problem is that cmd_desc_movi.col should not be NULL: it is initialized in its_cmd_movi from sc->sc_its_cols, which should be allocated in gicv3_its_attach().
>

The following is unlikely to contribute directly to solving the
specific problem, but it documents an oddity that took some of my
time while I was looking around code related to the problem.

One (comment?) oddity I ran into looking around:

/usr/src/sys/sys/cpuset.h:#define       CPU_FFS(p)                      BIT_FFS(CPU_SETSIZE, p)

but in /usr/src/sys/sys/bitset.h :

#define BIT_FFS(_s, p) BIT_FFS_AT((_s), (p), 0)

and (comment wrong about start?):

/*
 * Note that `start` and the returned value from BIT_FFS_AT are
 * 1-based bit indices.
 */
#define BIT_FFS_AT(_s, p, start) __extension__ ({                       \
. . .

In other words, BIT_FFS (and CPU_FFS) provide BIT_FFS_AT with start==0
but start is documented to be a 1-based bit index.

So, looking into what happens with start==0, showing BIT_FFS_AT:

#define BIT_FFS_AT(_s, p, start) __extension__ ({                       \
        __size_t __i;                                                   \
        long __mask;                                                    \
        int __bit;                                                      \
                                                                        \
        __mask = ~0UL << ((start) % _BITSET_BITS);                      \
        __bit = 0;                                                      \
        for (__i = __bitset_word((_s), (start));                        \
            __i < __bitset_words((_s));                                 \
            __i++) {                                                    \
                if (((p)->__bits[__i] & __mask) != 0) {                 \
                        __bit = ffsl((p)->__bits[__i] & __mask);        \
                        __bit += __i * _BITSET_BITS;                    \
                        break;                                          \
                }                                                       \
                __mask = ~0UL;                                          \
        }                                                               \
        __bit;                                                          \
})


It looks like this traces to use of:

        __mask = ~0UL << ((start) % _BITSET_BITS);                      \

and to use of:

#define __bitset_word(_s, n)                                            \
        (__constexpr_cond(__bitset_words((_s)) == 1) ?                  \
         0 : ((n) / _BITSET_BITS))

So __mask==~0UL and __bitset_word((_s), (start))==0 . Then for
__i==0:

((p)->__bits[0] & __mask) != 0 evaluates like
((p)->__bits[0] & ~0UL) != 0 which in turn evaluates like
(p)->__bits[0] != 0.

From there __bit = ffsl((p)->__bits[0] & __mask) would involve
(p)->__bits[0] & __mask evaluating like (p)->__bits[0] & ~0UL and
that in turn evaluating like just (p)->__bits[0] . Presuming that
word is non-zero, the body is effectively:

__bit = ffsl((p)->__bits[0]);
__bit += 0;

which would seem to set __bit correctly.

It looks to me like start is 0-based in BIT_FFS_AT, not 1-based. So
I expect that the comment is wrong about start.
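
For anyone wanting to check the walk-through without a kernel tree,
here is a minimal user-space sketch of the same loop as a plain C
function. It is an illustration only, not the bitset.h code: the names
sketch_ffsl, sketch_ffs_at and SKETCH_WORD_BITS are hypothetical, the
bit array is a bare unsigned long array rather than a struct bitset,
the starting word is computed by plain division (no __constexpr_cond
shortcut for single-word sets), and a small loop stands in for ffsl(3).

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for _BITSET_BITS: bits per unsigned long word. */
#define SKETCH_WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical stand-in for ffsl(3): 1-based index of the lowest set
 * bit in v, or 0 when v has no bits set.
 */
static int
sketch_ffsl(unsigned long v)
{
	int bit;

	if (v == 0)
		return (0);
	for (bit = 1; (v & 1UL) == 0; bit++)
		v >>= 1;
	return (bit);
}

/*
 * Function form of the BIT_FFS_AT loop quoted above (an assumption:
 * same arithmetic, simplified types). bits[] holds nwords words.
 * Returns a 1-based bit index, or 0 if no bit at or after the
 * position selected by start is set.
 */
static int
sketch_ffs_at(const unsigned long *bits, size_t nwords, size_t start)
{
	unsigned long mask;
	size_t i;
	int bit;

	/* Keep start inside the array so the first word index is valid. */
	assert(start < nwords * SKETCH_WORD_BITS);

	/* Clear the low (start % word size) bits of the first word examined. */
	mask = ~0UL << (start % SKETCH_WORD_BITS);
	bit = 0;
	for (i = start / SKETCH_WORD_BITS; i < nwords; i++) {
		if ((bits[i] & mask) != 0) {
			bit = sketch_ffsl(bits[i] & mask);
			bit += (int)(i * SKETCH_WORD_BITS);
			break;
		}
		/* Later words are examined in full. */
		mask = ~0UL;
	}
	return (bit);
}
```

With word 0 set to 0x5 (bits 0 and 2, counting from 0), a call with
start==0 returns 1 and a call with start==1 returns 3. So in this
sketch start==0 already includes the lowest bit, while the return
value is 1-based, which matches the reading above.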


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)
