From owner-freebsd-current@FreeBSD.ORG  Mon Aug 22 01:55:10 2011
From: YongHyeon PYUN <pyunyh@gmail.com>
Date: Sun, 21 Aug 2011 18:55:02 -0700
To: Garrett Cooper <yanegomi@gmail.com>
Message-ID: <20110822015502.GE1755@michelle.cdnetworks.com>
References: <CAGH67wRWVu0qtae7fZjAi9r1H=Tt2QYpgJgF=1stUuWe1dg+Sw@mail.gmail.com>
	<CAMBSHm-R0QBCy_FshgXq=neeAaHFTYStWkE=AcJ7NngNchvwxQ@mail.gmail.com>
	<CAGH67wRPNygNw0h5L73U21jQnAvkr6NM7ASJM=bvXocxZgPo6Q@mail.gmail.com>
	<20110821234856.GB1755@michelle.cdnetworks.com>
	<CAGH67wTsSViuSsTgxcUT2gY2Jy=D3HNN2iPdhba9v=e8_4buuA@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CAGH67wTsSViuSsTgxcUT2gY2Jy=D3HNN2iPdhba9v=e8_4buuA@mail.gmail.com>
User-Agent: Mutt/1.4.2.3i
Cc: mdf@freebsd.org, FreeBSD Current <freebsd-current@freebsd.org>,
	Pyun YongHyeon <yongari@freebsd.org>
Subject: Re: Deterministic panic due to non-sleepable lock with if_alc when
	reconfiguring interfaces
Reply-To: pyunyh@gmail.com

On Sun, Aug 21, 2011 at 06:26:45PM -0700, Garrett Cooper wrote:
> On Sun, Aug 21, 2011 at 4:48 PM, YongHyeon PYUN <pyunyh@gmail.com> wrote:
> > On Fri, Aug 19, 2011 at 12:17:12AM -0700, Garrett Cooper wrote:
> >> On Thu, Aug 18, 2011 at 9:31 PM, <mdf@freebsd.org> wrote:
> >> > On Thu, Aug 18, 2011 at 5:50 PM, Garrett Cooper <yanegomi@gmail.com> wrote:
> >> >>    When loading if_alc as a module on my netbook and running
> >> >> /etc/rc.d/netif restart, I can deterministically panic my netbook with
> >> >> the following message:
> >>
> >>     These repro steps were overly simplified. The complete steps are:
> >>
> >> 1. Attach ethernet cable to alc(4) enabled NIC.
> >> 2. Boot up machine.
> >> 3. Login.
> >> 4. Physically remove ethernet cable from alc(4) enabled NIC.
> >> 5. Run `/etc/rc.d/netif restart' as root.
> >>
> >
> > I can't reproduce this with AR8151 sample board. Could you give me
> > dmesg output to know exact controller revision?
> > One issue I'm aware of is lack of re-establishing link when
> > controller firmware put its PHY to deep sleep mode.  The deep sleep
> > mode seems to be automatically activated by firmware when it
> > detects no energy signal(i.e. cable unplugged) so I had to down and
> > up the interface again to take the PHY out of the sleep mode.
> >
> >> >> ) at _bus_dmamap_sync+0x51
> >> >> alc_stop(c3dbb000,0,c0c51844,93a,80206910,...) at alc_stop+0x24e
> >> >> alc_ioctl(c3d07400,80206910,c40423c0,c06a7935,c0914e3c,...) at alc_ioctl+0x22e
> >> >> ifioctl(c45029c0,80206910,c40423c0,c40505c0,c4528c00,...) at ifioctl+0xc98
> >> >> soo_ioctl(c4574e00,80206910,c40423c0,c413e680,c40505c0,...) at soo_ioctl+0x401
> >> >> kern_ioctl(c40505c0,3,80206910,c40423c0,c40423c0,...) at kern_ioctl+0x1d7
> >> >> ioctl(c40505c0,e6ca3cec,e6ca3d28,c08e929d,0,...) at ioctl+0x118
> >> >> syscallenter(c40505c0,e6ca3ce4,e6ca3ce4,0,0,...) at syscallenter+0x23f
> >> >> syscall(e6ca3d28) at syscall+0x2e
> >> >> Xint0x80_syscall() at Xint0x80_syscall+0x21
> >> >> --- syscall (54kernel trap 12 with interrupts disabled
> >> >> Kernel page fault with the following non-sleepable locks held:
> >> >> exclusive sleep mutex alc0 (network driver) r = 0 (0xc3dbc608) locked
> >> >> @ /usr/src/sys/modules/alc/../../dev/alc/if_alc.c:2362
> >> >> KDB: stack backtrace:
> >> >> db_trace_self_wrapper(c08e727a,80,6e726500,74206c65,20706172,...) at
> >> >> db_trace_self_wrapper+0x26
> >> >> kdb_backtrace(93a,0,ffffffff,c0ad6114,e6ca323c,...) at kdb_backtrace+0x2a
> >> >> _witness_debugger(c08e9f67,e6ca3250,4,1,0,...) at _witness_debugger+0x1e
> >> >> witness_warn(5,0,c0924fe1,c097df50,c3e42b00,...) at witness_warn+0x1f1
> >> >> trap(e6ca32dc) at trap+0x15a
> >> >> calltrap() at calltrap+0x6
> >> >>
> >> >>    I tried to track down what the exact issue was, but I got lost
> >> >> (the locking sort of looks ok to me, but I'm still not an expert with
> >> >> mutex(9)).
> >> >>    I still have the vmcore and can provide more helpful details when requested.
> >> >
> >> > The locking itself is almost certainly fine.  The error message is not
> >> > very helpful, but what went wrong was the page fault.  You just happen
> >> > to panic on a witness warning before vm_fault can panic due to a bad
> >> > address.
> >> >
> >> > The alc(4) maintainer would probably like info on the trap (line of
> >> > code and where the bad pointer came from).
> >>
> >>     I talked to Xin a bit and as he noted the panic was just a symptom
> >> of the actual issue at hand. I think the problem is that the rx ring's
> >> rx_m value isn't set to NULL when an error occurred, but getting to
> >> the exact problem at hand, the following call is failing:
> >>
> >>         if (bus_dmamap_load_mbuf_sg(sc->alc_cdata.alc_rx_tag, // <-- HERE
> >>             sc->alc_cdata.alc_rx_sparemap, m, segs, &nsegs, 0) != 0) {
> >>                 m_freem(m);
> >>                 return (ENOBUFS);
> >>         }
> >>
> >>     It's failing with ENOMEM. Still trying to determine what the exact
> >
> > Even if bus_dmamap_load_mbuf_sg(9) fails driver should not panic.
> > Could you show me full back-trace?
> 
>     I tried to hack the kernel to get it to dump properly, but that
> inevitably failed (one of the buffers or the stack data associated
> probably got stomped on when the system panicked).
>     Here are some pics.

Thanks a lot. I see that alc(4) failed to allocate RX buffers, and
the panic seems to have happened in alc_stop().  But I can't see
how that could be triggered.  When RX buffer allocation fails, the
mbuf pointer should be NULL, so alc_stop() would not invoke
bus_dmamap_sync(9) for that buffer.
I also see from the backtrace that you have a wireless network set
up. Could you also reproduce the alc(4) panic without the wireless
network configuration?
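
To make the NULL check argument concrete, here is a minimal user-space
sketch of the pattern. It is not the code from if_alc.c: struct
rx_desc, rx_ring_init(), rx_newbuf(), rx_stop(), the sync_calls counter
and RX_RING_CNT are all hypothetical stand-ins, and malloc()/free()
take the place of mbuf allocation and DMA map handling. It only shows
that a slot whose allocation failed keeps a NULL pointer, so the stop
path never syncs it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define RX_RING_CNT 4	/* hypothetical ring size, not the driver's */

/* Stand-in for one RX descriptor slot; not the real alc(4) structure. */
struct rx_desc {
	void *rx_m;		/* plays the role of the mbuf pointer */
	int sync_calls;		/* counts simulated bus_dmamap_sync() calls */
};

/* Start with every slot empty, as a freshly set up ring would be. */
static void
rx_ring_init(struct rx_desc *ring, size_t cnt)
{
	size_t i;

	assert(ring != NULL);
	for (i = 0; i < cnt; i++) {
		ring[i].rx_m = NULL;
		ring[i].sync_calls = 0;
	}
}

/*
 * Simulated buffer allocation.  When force_fail is non-zero it models a
 * DMA load failure: the new buffer is freed and the slot is not touched.
 */
static int
rx_newbuf(struct rx_desc *rxd, int force_fail)
{
	void *m;

	m = malloc(64);
	if (m == NULL)
		return (-1);
	if (force_fail) {
		free(m);
		return (-1);
	}
	rxd->rx_m = m;
	return (0);
}

/* Simulated stop path: sync and free only the slots that hold a buffer. */
static int
rx_stop(struct rx_desc *ring, size_t cnt)
{
	size_t i;
	int released = 0;

	for (i = 0; i < cnt; i++) {
		if (ring[i].rx_m == NULL)
			continue;	/* nothing loaded, so nothing to sync */
		ring[i].sync_calls++;
		free(ring[i].rx_m);
		ring[i].rx_m = NULL;
		released++;
	}
	return (released);
}
```

If rx_newbuf() is called with force_fail set for one slot and rx_stop()
is then run over the ring, that slot's sync_calls stays at 0. A fault
in the sync step would therefore need a stale non-NULL pointer, which
is the case I cannot explain yet.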

> Thanks,
> -Garrett