Date: Sun, 21 Aug 2011 16:48:56 -0700
From: YongHyeon PYUN <pyunyh@gmail.com>
To: Garrett Cooper <yanegomi@gmail.com>
Cc: mdf@freebsd.org, FreeBSD Current <freebsd-current@freebsd.org>, Pyun YongHyeon <yongari@freebsd.org>
Subject: Re: Deterministic panic due to non-sleepable lock with if_alc when reconfiguring interfaces
Message-ID: <20110821234856.GB1755@michelle.cdnetworks.com>
In-Reply-To: <CAGH67wRPNygNw0h5L73U21jQnAvkr6NM7ASJM=bvXocxZgPo6Q@mail.gmail.com>
References: <CAGH67wRWVu0qtae7fZjAi9r1H=Tt2QYpgJgF=1stUuWe1dg%2BSw@mail.gmail.com> <CAMBSHm-R0QBCy_FshgXq=neeAaHFTYStWkE=AcJ7NngNchvwxQ@mail.gmail.com> <CAGH67wRPNygNw0h5L73U21jQnAvkr6NM7ASJM=bvXocxZgPo6Q@mail.gmail.com>

On Fri, Aug 19, 2011 at 12:17:12AM -0700, Garrett Cooper wrote:
> On Thu, Aug 18, 2011 at 9:31 PM, <mdf@freebsd.org> wrote:
> > On Thu, Aug 18, 2011 at 5:50 PM, Garrett Cooper <yanegomi@gmail.com> wrote:
> >>    When loading if_alc as a module on my netbook and running
> >> /etc/rc.d/netif restart, I can deterministically panic my netbook with
> >> the following message:
>
> These repro steps were overly simplified. The complete steps are:
>
> 1. Attach ethernet cable to alc(4) enabled NIC.
> 2. Boot up machine.
> 3. Login.
> 4. Physically remove ethernet cable from alc(4) enabled NIC.
> 5. Run `/etc/rc.d/netif restart' as root.
>

I can't reproduce this with an AR8151 sample board. Could you send me
the dmesg output so I can identify the exact controller revision?
One issue I'm aware of is that the link is not re-established when the
controller firmware puts its PHY into deep sleep mode. The firmware
seems to activate deep sleep mode automatically when it detects no
energy signal (i.e. cable unplugged), so I had to bring the interface
down and up again to take the PHY out of sleep mode.

> >> ) at _bus_dmamap_sync+0x51
> >> alc_stop(c3dbb000,0,c0c51844,93a,80206910,...) at alc_stop+0x24e
> >> alc_ioctl(c3d07400,80206910,c40423c0,c06a7935,c0914e3c,...) at alc_ioctl+0x22e
> >> ifioctl(c45029c0,80206910,c40423c0,c40505c0,c4528c00,...) at ifioctl+0xc98
> >> soo_ioctl(c4574e00,80206910,c40423c0,c413e680,c40505c0,...) at soo_ioctl+0x401
> >> kern_ioctl(c40505c0,3,80206910,c40423c0,c40423c0,...) at kern_ioctl+0x1d7
> >> ioctl(c40505c0,e6ca3cec,e6ca3d28,c08e929d,0,...) at ioctl+0x118
> >> syscallenter(c40505c0,e6ca3ce4,e6ca3ce4,0,0,...) at syscallenter+0x23f
> >> syscall(e6ca3d28) at syscall+0x2e
> >> Xint0x80_syscall() at Xint0x80_syscall+0x21
> >> --- syscall (54kernel trap 12 with interrupts disabled
> >> Kernel page fault with the following non-sleepable locks held:
> >> exclusive sleep mutex alc0 (network driver) r = 0 (0xc3dbc608) locked
> >> @ /usr/src/sys/modules/alc/../../dev/alc/if_alc.c:2362
> >> KDB: stack backtrace:
> >> db_trace_self_wrapper(c08e727a,80,6e726500,74206c65,20706172,...) at
> >> db_trace_self_wrapper+0x26
> >> kdb_backtrace(93a,0,ffffffff,c0ad6114,e6ca323c,...) at kdb_backtrace+0x2a
> >> _witness_debugger(c08e9f67,e6ca3250,4,1,0,...) at _witness_debugger+0x1e
> >> witness_warn(5,0,c0924fe1,c097df50,c3e42b00,...) at witness_warn+0x1f1
> >> trap(e6ca32dc) at trap+0x15a
> >> calltrap() at calltrap+0x6
> >>
> >>    I tried to track down what the exact issue was, but I got lost
> >> (the locking sort of looks ok to me, but I'm still not an expert with
> >> mutex(9)).
> >>    I still have the vmcore and can provide more helpful details when requested.
> >
> > The locking itself is almost certainly fine.  The error message is not
> > very helpful, but what went wrong was the page fault.  You just happen
> > to panic on a witness warning before vm_fault can panic due to a bad
> > address.
> >
> > The alc(4) maintainer would probably like info on the trap (line of
> > code and where the bad pointer came from).
>
> I talked to Xin a bit and as he noted the panic was just a symptom
> of the actual issue at hand. I think the problem is that the rx ring's
> rx_m value isn't set to NULL when an error occurred, but getting to
> the exact problem at hand, the following call is failing:
>
>         if (bus_dmamap_load_mbuf_sg(sc->alc_cdata.alc_rx_tag, // <-- HERE
>             sc->alc_cdata.alc_rx_sparemap, m, segs, &nsegs, 0) != 0) {
>                 m_freem(m);
>                 return (ENOBUFS);
>         }
>
> It's failing with ENOMEM. Still trying to determine what the exact

Even if bus_dmamap_load_mbuf_sg(9) fails, the driver should not panic.
Could you show me the full back-trace?

> reason for ENOMEM is from the x86 busdma code though..
> Thanks,
> -Garrett
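
The point that a failed DMA map load should not lead to a panic can be
illustrated with a minimal user-space sketch. This is not the alc(4)
code and not a proposed fix: struct mbuf, struct rx_desc,
fake_dmamap_load(), rx_newbuf() and rx_stop() are hypothetical
stand-ins invented for illustration, with malloc()/free() in place of
mbuf allocation and a stub in place of bus_dmamap_load_mbuf_sg(9). The
only idea it models is the invariant discussed above: a ring slot's
rx_m is always either a valid buffer or NULL, so the stop path can walk
the ring safely after a load failure.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for kernel types; NOT the real alc(4) structures. */
struct mbuf { int unused; };
struct rx_desc { struct mbuf *rx_m; };	/* NULL means "no buffer attached" */

/* Stub standing in for bus_dmamap_load_mbuf_sg(9); 'fail' forces ENOMEM. */
static int
fake_dmamap_load(struct mbuf *m, int fail)
{
	(void)m;
	return (fail ? ENOMEM : 0);
}

/*
 * Attach a fresh buffer to a ring slot.  On failure the slot is left
 * exactly as it was, so rx_m is either a valid pointer or NULL.
 */
static int
rx_newbuf(struct rx_desc *rxd, int fail)
{
	struct mbuf *m;

	m = malloc(sizeof(*m));
	if (m == NULL)
		return (ENOBUFS);
	if (fake_dmamap_load(m, fail) != 0) {
		free(m);		/* drop only the new buffer */
		return (ENOBUFS);
	}
	/* Simplification: a real driver hands the old mbuf up the stack. */
	free(rxd->rx_m);
	rxd->rx_m = m;
	return (0);
}

/* Stop path: release whatever is attached and clear each pointer. */
static void
rx_stop(struct rx_desc *ring, size_t n)
{
	size_t i;

	assert(ring != NULL);
	for (i = 0; i < n; i++) {
		free(ring[i].rx_m);	/* free(NULL) is a no-op */
		ring[i].rx_m = NULL;
	}
}
```

In this model, calling rx_newbuf() with fail set and then rx_stop() on
the same ring is safe, because no slot ever holds a pointer to a buffer
that has already been freed.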