Date:      Fri, 14 Sep 2018 16:53:15 +0200
From:      Mateusz Guzik <mjguzik@gmail.com>
To:        Mike Tancsa <mike@sentex.net>
Cc:        Glen Barber <gjb@freebsd.org>, George Neville-Neil <gnn@neville-neil.com>, Paul Holes <pholes@sentex.ca>,  netperf-users@freebsd.org, netperf-admin@freebsd.org
Subject:   Re: update of zoo to r338656 12.0 (was Re: zoo vs 12.0 (was: zoo vs 11.2-rc2)
Message-ID:  <CAGudoHFv1AyWZiL1KsQwP1grkrz6s=eKmtSvFudzr%2BN9f7B4oQ@mail.gmail.com>
In-Reply-To: <2dcd8d1b-12f3-1da7-673c-8d24bc0eb948@sentex.net>
References:  <CAGudoHGb3FtoWAroBzVdDks6S2td-nnqJcdrkAsoiT_Q1PCYJQ@mail.gmail.com> <e3a7c7b6-e564-b5fe-ddad-6332bf6c96a0@sentex.net> <A344C6D5-BF69-48E8-8C0A-3610FE5BA15F@neville-neil.com> <b3f74263-127f-0b33-8d35-8e5c245cf826@sentex.net> <8ca07d41-b753-9741-49be-150d42197edc@sentex.net> <7dc50e6a-191b-002d-9adf-df16e591c9da@sentex.net> <ea0b113f-7b45-1ec3-f752-fcfa0ef7b2c0@sentex.net> <20180914140300.GB52847@FreeBSD.org> <2dcd8d1b-12f3-1da7-673c-8d24bc0eb948@sentex.net>

On 9/14/18, Mike Tancsa <mike@sentex.net> wrote:
> On 9/14/2018 10:03 AM, Glen Barber wrote:
>> Mike,
>>
>> In the interest of morbid curiosity, could you rebuild the 12.0 kernel
>> without the 'options NUMA' line?  This was turned on very late, and too
>> close to the stable/12 branch, and I'd like to at least confirm this is
>> not in any way at fault.
> A couple of people are already working on the box.
>
> If it's an MFI driver issue, I could put a spare card in one of the zoo
> members that makes use of NUMA and has more than one domain?  I just
> tried an mfi card in an EPYC-based machine with the same rev, and it
> boots up OK. But this only has one NUMA domain.
>
> I think pig would have multiple NUMA domains, as does flix1a, which no
> one seems to be on right now.
>

lynx1-4, pig1 and flix* all have multiple NUMA nodes. lynx* has the
fastest boot cycle, if you can plop a controller in there.

Rebooting without NUMA as a sanity check is definitely a good idea,
but I doubt that's it.

I like the idea of putting an mfi controller in one of the above boxes
in the hope of reproducing the issue.

Looking at the differences between the mfi driver in head and
stable/11, I see 2 changes, one of which looks extremely interesting:

commit a1d4bb9b4447414168dc2ffc8d5c74a1ef8bb152
Author: scottl <scottl@FreeBSD.org>
Date:   Fri Sep 8 17:51:19 2017 +0000

    Fix intrhook release in MFI as well

diff --git a/sys/dev/mfi/mfi.c b/sys/dev/mfi/mfi.c
index 28054d9bf7d..91ec872558a 100644
--- a/sys/dev/mfi/mfi.c
+++ b/sys/dev/mfi/mfi.c
@@ -1263,8 +1263,6 @@ mfi_startup(void *arg)

        sc = (struct mfi_softc *)arg;

-       config_intrhook_disestablish(&sc->mfi_ich);
-
        sc->mfi_enable_intr(sc);
        sx_xlock(&sc->mfi_config_lock);
        mtx_lock(&sc->mfi_io_lock);
@@ -1273,6 +1271,8 @@ mfi_startup(void *arg)
            mfi_syspdprobe(sc);
        mtx_unlock(&sc->mfi_io_lock);
        sx_xunlock(&sc->mfi_config_lock);
+
+       config_intrhook_disestablish(&sc->mfi_ich);
 }

 static void

Note that this may have no relation to the problem whatsoever, but
booting a kernel with this change reverted would definitely help.
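
To make the ordering change in the diff above easier to reason about,
here is a small userland model of it. This is only a sketch: every name
in it (model_softc, model_release_hook, model_probe, model_startup_old,
model_startup_new) is made up for illustration and does not exist in
the kernel. It assumes, as I understand config_intrhook(9), that the
boot process does not move on until every established hook has been
disestablished, so releasing the hook is the point where boot is
allowed to continue.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the relevant driver state; not kernel code. */
struct model_softc {
	bool hook_pending;   /* stands in for the established sc->mfi_ich */
	bool probe_done;     /* set once the modelled probe has finished */
	bool boot_saw_probe; /* was the probe done when boot was released? */
};

/*
 * Models config_intrhook_disestablish(): from here on boot may proceed.
 * Records whether the probe had already completed at that moment.
 */
void
model_release_hook(struct model_softc *sc)
{
	assert(sc->hook_pending);
	sc->hook_pending = false;
	sc->boot_saw_probe = sc->probe_done;
}

/* Models the interrupt enable plus device probing done in mfi_startup(). */
void
model_probe(struct model_softc *sc)
{
	sc->probe_done = true;
}

/* Ordering before the commit: release the hook first, then probe. */
void
model_startup_old(struct model_softc *sc)
{
	model_release_hook(sc);
	model_probe(sc);
}

/* Ordering after the commit: probe first, release the hook last. */
void
model_startup_new(struct model_softc *sc)
{
	model_probe(sc);
	model_release_hook(sc);
}
```

In this model the old ordering lets boot continue before the probe has
run, while the new ordering holds boot until the probe is finished. If
the assumption above is right, a probe that never completes would keep
the hook established with the new ordering, which is one way a boot
could stall. That is a guess about the mechanism, not a diagnosis.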

If a zoo-testable box is confirmed to hang, I can take it from there
myself. The least I can do is bisect and chase the guilty. :)

-- 
Mateusz Guzik <mjguzik gmail.com>


