Date:      Mon, 1 Feb 2021 15:42:25 -0800
From:      John-Mark Gurney <jmg@funkthat.com>
To:        Mark Millard <marklmi@yahoo.com>
Cc:        freebsd-ppc <freebsd-ppc@freebsd.org>
Subject:   Re: Expected issue? Old PowerMac G5 [...] vs. USB [...] [RealTek EtherNet] devices (...)
Message-ID:  <20210201234225.GX31099@funkthat.com>
In-Reply-To: <0382C932-2BFB-4A5A-9ABD-B4AC2C0AC2BF@yahoo.com>
References:  <E79AA0EA-FAAE-412E-BB26-A66D9AB00AB8@yahoo.com> <EF3494BA-2B9C-43A5-931F-45313B3BDA7D@yahoo.com> <20210201194702.GU31099@funkthat.com> <1C53A656-75ED-4E7C-9FB0-6C605BCDEC14@yahoo.com> <0382C932-2BFB-4A5A-9ABD-B4AC2C0AC2BF@yahoo.com>

Mark Millard wrote this message on Mon, Feb 01, 2021 at 14:57 -0800:
> On 2021-Feb-1, at 13:34, Mark Millard <marklmi at yahoo.com> wrote:
> 
> > On 2021-Feb-1, at 11:47, John-Mark Gurney <jmg at funkthat.com> wrote:
> > 
> >> Mark Millard wrote this message on Sun, Jan 31, 2021 at 13:45 -0800:
> >>> . . .
> > 
> > I'm working on seeing if I can get Firewire/dcons based
> > access going in hopes of getting more evidence that way.
> > 
> > I hope that such can be done via a 32-bit PowerMac G4
> > against the 64-bit PowerMac G5: it looks like the only
> > other G5 no longer can reliably boot (overheating that
> > fast now).
> 
> I see that I had forgotten to say that the initial
> report was based on a non-debug kernel build's
> behavior. Sorry.
> 
> The debug build with dcons and such built in does give
> more context, allows me to copy/paste the output,
> and allows me to type at the console.
> 
> Booting, logging in, and then plugging in the ure
> results in:
> 
> ugen4.2: <Realtek USB 10/100/1000 LAN> at usbus4
> ure0 numa-domain 0 on uhub4
> ure0: <Realtek USB 10/100/1000 LAN, class 0/0, rev 2.10/30.00, addr 2> on usbus4
> panic: lock (sleep mutex) ure0 not locked @ /usr/fbsd/mm-src/sys/dev/usb/usb_request.c:459
> cpuid = 1
> time = 1612216507
> KDB: stack backtrace:
> 0xc0080000c200df70: at kdb_backtrace+0x60
> 0xc0080000c200e080: at vpanic+0x1e0
> 0xc0080000c200e130: at panic+0x40
> 0xc0080000c200e160: at witness_unlock+0x18c
> 0xc0080000c200e1f0: at __mtx_unlock_flags+0x70
> 0xc0080000c200e280: at usbd_do_request_flags+0x170

This attempted to unlock the mutex, but somehow it had already been released...

> 0xc0080000c200e3c0: at usbd_do_request_proc+0x98
> 0xc0080000c200e430: at ure_miibus_readreg+0x22c
> 0xc0080000c200e4b0: at mii_attach+0x490
> 0xc0080000c200e5a0: at ure_attach_post_sub+0x21c
> 0xc0080000c200e660: at ue_attach_post_task+0x188

lock is dropped here to call _post_sub

> 0xc0080000c200e760: at usb_process+0x170

lock is acquired and held in this function.

> 0xc0080000c200e820: at fork_exit+0xc4
> 0xc0080000c200e8c0: at fork_trampoline+0x18
> 0xc0080000c200e8f0: at -0x4
> KDB: enter: panic
> [ thread pid 15 tid 100146 ]
> Stopped at      kdb_enter+0x78: ori     r0, r0, 0x0
> db:0:kdb.enter.default> 
> 
> It looks to be that the complaint is from the code:
> 
> usb_error_t
> usbd_do_request_flags(struct usb_device *udev, struct mtx *mtx,
>     struct usb_device_request *req, void *data, uint16_t flags,
>     uint16_t *actlen, usb_timeout_t timeout)
> . . .
> #if (USB_HAVE_USER_IO == 0)
>         if (flags & USB_USER_DATA_PTR)
>                 return (USB_ERR_INVAL);
> #endif
>         if ((mtx != NULL) && (mtx != &Giant)) {
>                 USB_MTX_UNLOCK(mtx);
>                 USB_MTX_ASSERT(mtx, MA_NOTOWNED);
>         }
> 
> Doing boot, login, plugin with the axge need
> not fail (and typically has not so far):
> 
> axge0 numa-domain 0 on uhub4
> axge0: <NetworkInterface> on usbus4
> miibus1: <MII bus> numa-domain 0 on axge0
> rgephy0: <RTL8169S/8110S/8211 1000BASE-T media interface> PHY 3 on miibus1
> rgephy0:  none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX, 100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT-FDX, 1000baseT-FDX-master, 1000baseT-FDX-flow, 1000baseT-FDX-flow-master, auto, auto-flow
> if_delmulti_locked: detaching ifnet instance 0xc000000013450000
> if_delmulti_locked: detaching ifnet instance 0xc000000013450000
> 
> With the axge already plugged in at power up it generally
> works as well, at least for the debug kernel:
> 
> axge0 numa-domain 0 on uhub4
> axge0: <NetworkInterface> on usbus4
> miibus1: <MII bus> numa-domain 0 on axge0
> rgephy0: <RTL8169S/8110S/8211 1000BASE-T media interface> PHY 3 on miibus1
> rgephy0:  none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX, 100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT-FDX, 1000baseT-FDX-master, 1000baseT-FDX-flow, 1000baseT-FDX-flow-master, auto, auto-flow
> ue0: <USB Ethernet> on axge0
> ue0: Ethernet address: ###
> ue0: link state changed to DOWN
> ue0: link state changed to UP
> if_delmulti_locked: detaching ifnet instance 0xc000000013030000
> 
> Similarly, the ure already plugged in can work:
> 
> ure0 numa-domain 0 on uhub4
> ure0: <Realtek USB 10/100/1000 LAN, class 0/0, rev 2.10/30.00, addr 2> on usbus4
> miibus1: <MII bus> numa-domain 0 on ure0
> rgephy0: <RTL8251/8153 1000BASE-T media interface> PHY 0 on miibus1
> rgephy0:  none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT-FDX, 1000baseT-FDX-master, auto
> ue0: <USB Ethernet> on ure0
> ue0: Ethernet address: ###
> ue0: link state changed to DOWN
> ue0: link state changed to UP
> if_delmulti_locked: detaching ifnet instance 0xc0000000130d6800
> 
> An interesting point about the "it happens to
> work this time" example is that ure0 is working
> without use of hw.usb.xhci.use_polling=1 ,
> unlike back during the summer's experiments.
> 
> However, I did get one failure with the axge(!) (for
> an already plugged in at power up context) that I've
> only got a digital camera picture of but it looks
> just like the earlier ure backtrace in structure,
> but saying axge instead of ure and having different
> offsets in those routines:
> 
> axge_miibus_readreg+0x124
> . . .
> axge_attach_post_sub+0x15c
> 
> The detailed stack addresses are distinct, of course.
> Same pid but different tid, as well.
> 
> So it does possibly suggest that ure and axge share
> a problem, such as lack of initialization but mixes
> of lucky and unlucky values showing up. (But I've
> no more specific evidence that such is a reasonable
> summary of what is happening.)

I really don't have time to look into this right now. It does look
like the locking around mii and USB Ethernet isn't correct, per the above,
where a lock is dropped but never reacquired...
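
For illustration only, here is a minimal userland sketch of the contract
the panic appears to be enforcing: a request function that drops the
caller's mutex must be entered with that mutex held. Everything named
toy_* is hypothetical and is not FreeBSD code. The sketch also assumes
the real function takes the mutex again before returning, which the
excerpt quoted above does not show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel sleep mutex; it only tracks ownership. */
struct toy_mtx {
	const char *name;
	bool held;
};

enum toy_err { TOY_OK = 0, TOY_ERR_NOT_LOCKED };

static void
toy_lock(struct toy_mtx *m)
{
	assert(!m->held);	/* non-recursive in this model */
	m->held = true;
}

/* Reports an error where a WITNESS kernel would panic. */
static enum toy_err
toy_unlock(struct toy_mtx *m)
{
	if (!m->held)
		return (TOY_ERR_NOT_LOCKED);
	m->held = false;
	return (TOY_OK);
}

/*
 * Same shape as the quoted usbd_do_request_flags() excerpt: drop the
 * caller's mutex around the transfer, then (assumed) take it again.
 */
static enum toy_err
toy_do_request(struct toy_mtx *m)
{
	if (m != NULL) {
		enum toy_err e = toy_unlock(m);
		if (e != TOY_OK)
			return (e);	/* "lock ... not locked" */
	}
	/* ... the transfer would run here without the mutex ... */
	if (m != NULL)
		toy_lock(m);
	return (TOY_OK);
}

/* A caller that honours the contract: lock, request, unlock. */
static enum toy_err
toy_caller_locked(struct toy_mtx *m)
{
	toy_lock(m);
	enum toy_err e = toy_do_request(m);
	if (e == TOY_OK)
		(void)toy_unlock(m);
	return (e);
}

/* A caller that passes the mutex without holding it. */
static enum toy_err
toy_caller_unlocked(struct toy_mtx *m)
{
	return (toy_do_request(m));
}
```

With a fresh struct toy_mtx, toy_caller_locked() returns TOY_OK and
toy_caller_unlocked() returns TOY_ERR_NOT_LOCKED. The second case is the
situation the backtrace suggests, though I haven't confirmed which code
path actually leaves the mutex unheld.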

The problem is that these functions, and how they handle locking, are
undocumented (or I can't find the docs in my brief search): functions
like usbd_do_request_flags have a man page but aren't actually
described by it, whereas usbd_do_request_proc has no documentation
at all...

Someone else who works on the USB Ethernet framework should take a
look at this issue, but I'm not sure who is currently maintaining
it...  I'll work with them on testing any ure changes needed to
match the locking process...

I must be missing something as to why it's working on non-ppc systems,
but failing on ppc systems...

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."


