From: YongHyeon PYUN
Date: Mon, 22 Aug 2011 13:40:54 -0700
To: Garrett Cooper
Cc: mdf@freebsd.org, FreeBSD Current, Pyun YongHyeon
Subject: Re: Deterministic panic due to non-sleepable lock with if_alc when reconfiguring interfaces
Message-ID: <20110822204054.GB4452@michelle.cdnetworks.com>
In-Reply-To: <20110821234856.GB1755@michelle.cdnetworks.com>
References: <20110821234856.GB1755@michelle.cdnetworks.com>
Reply-To: pyunyh@gmail.com
List-Id: Discussions about the use of FreeBSD-current

On Sun, Aug 21, 2011 at 04:48:56PM -0700, YongHyeon PYUN wrote:
> On Fri, Aug 19, 2011 at 12:17:12AM -0700, Garrett Cooper wrote:
> > On Thu, Aug 18, 2011 at 9:31 PM, wrote:
> > > On Thu, Aug 18, 2011 at 5:50 PM, Garrett Cooper wrote:
> > >>    When loading if_alc as a module on my netbook and running
> > >> /etc/rc.d/netif restart, I can deterministically panic my netbook with
> > >> the following message:
> >
> > These repro steps were overly simplified. The complete steps are:
> >
> > 1. Attach ethernet cable to alc(4) enabled NIC.
> > 2. Boot up machine.
> > 3. Login.
> > 4. Physically remove ethernet cable from alc(4) enabled NIC.
> > 5. Run `/etc/rc.d/netif restart' as root.
> >
>
> I can't reproduce this with AR8151 sample board. Could you give me
> dmesg output to know exact controller revision?
> One issue I'm aware of is lack of re-establishing link when
> controller firmware put its PHY to deep sleep mode. The deep sleep
> mode seems to be automatically activated by firmware when it
> detects no energy signal(i.e. cable unplugged) so I had to down and
> up the interface again to take the PHY out of the sleep mode.
>

The link re-establishment issue was fixed in r225088. I'm not sure
whether this also fixes kern/148772, though. Since you seem to be
hitting the same issue as the PR, it would be good to know whether it
makes any difference.

> > >> ) at _bus_dmamap_sync+0x51
> > >> alc_stop(c3dbb000,0,c0c51844,93a,80206910,...) at alc_stop+0x24e
> > >> alc_ioctl(c3d07400,80206910,c40423c0,c06a7935,c0914e3c,...) at alc_ioctl+0x22e
> > >> ifioctl(c45029c0,80206910,c40423c0,c40505c0,c4528c00,...) at ifioctl+0xc98
> > >> soo_ioctl(c4574e00,80206910,c40423c0,c413e680,c40505c0,...) at soo_ioctl+0x401
> > >> kern_ioctl(c40505c0,3,80206910,c40423c0,c40423c0,...) at kern_ioctl+0x1d7
> > >> ioctl(c40505c0,e6ca3cec,e6ca3d28,c08e929d,0,...) at ioctl+0x118
> > >> syscallenter(c40505c0,e6ca3ce4,e6ca3ce4,0,0,...) at syscallenter+0x23f
> > >> syscall(e6ca3d28) at syscall+0x2e
> > >> Xint0x80_syscall() at Xint0x80_syscall+0x21
> > >> --- syscall (54kernel trap 12 with interrupts disabled
> > >> Kernel page fault with the following non-sleepable locks held:
> > >> exclusive sleep mutex alc0 (network driver) r = 0 (0xc3dbc608) locked
> > >> @ /usr/src/sys/modules/alc/../../dev/alc/if_alc.c:2362
> > >> KDB: stack backtrace:
> > >> db_trace_self_wrapper(c08e727a,80,6e726500,74206c65,20706172,...) at
> > >> db_trace_self_wrapper+0x26
> > >> kdb_backtrace(93a,0,ffffffff,c0ad6114,e6ca323c,...) at kdb_backtrace+0x2a
> > >> _witness_debugger(c08e9f67,e6ca3250,4,1,0,...) at _witness_debugger+0x1e
> > >> witness_warn(5,0,c0924fe1,c097df50,c3e42b00,...) at witness_warn+0x1f1
> > >> trap(e6ca32dc) at trap+0x15a
> > >> calltrap() at calltrap+0x6
> > >>
> > >>    I tried to track down what the exact issue was, but I got lost
> > >> (the locking sort of looks ok to me, but I'm still not an expert with
> > >> mutex(9)).
> > >>    I still have the vmcore and can provide more helpful details when requested.
> > >
> > > The locking itself is almost certainly fine.  The error message is not
> > > very helpful, but what went wrong was the page fault.  You just happen
> > > to panic on a witness warning before vm_fault can panic due to a bad
> > > address.
> > >
> > > The alc(4) maintainer would probably like info on the trap (line of
> > > code and where the bad pointer came from).
> >
> > I talked to Xin a bit and as he noted the panic was just a symptom
> > of the actual issue at hand. I think the problem is that the rx ring's
> > rx_m value isn't set to NULL when an error occurred, but getting to
> > the exact problem at hand, the following call is failing:
> >
> >         if (bus_dmamap_load_mbuf_sg(sc->alc_cdata.alc_rx_tag, // <-- HERE
> >             sc->alc_cdata.alc_rx_sparemap, m, segs, &nsegs, 0) != 0) {
> >                 m_freem(m);
> >                 return (ENOBUFS);
> >         }
> >
> > It's failing with ENOMEM. Still trying to determine what the exact
>
> Even if bus_dmamap_load_mbuf_sg(9) fails driver should not panic.
> Could you show me full back-trace?
>
> > reason for ENOMEM is from the x86 busdma code though..
> > Thanks,
> > -Garrett
> >
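
For readers following the thread, the invariant being discussed (a failed DMA load must not leave an rx ring slot pointing at a bad buffer, and the stop path should clear rx_m after freeing it) can be illustrated with a small userland sketch. Everything below is hypothetical: the fake_* types and functions are stand-ins invented for illustration, not the real if_alc.c code or the bus_dma(9) API, and the thread does not confirm that this is the actual root cause.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; NOT the real mbuf or alc(4) structures. */
struct fake_mbuf { char data[64]; };

struct fake_rxdesc {
	struct fake_mbuf *rx_m;	/* buffer owned by this ring slot, or NULL */
};

/* Set to true to simulate the DMA load failing (ENOMEM in the report). */
static bool fake_dma_load_fails = false;

/* Mock DMA map load: 0 on success, ENOMEM on simulated failure. */
static int
fake_dmamap_load(struct fake_mbuf *m)
{
	(void)m;
	return (fake_dma_load_fails ? ENOMEM : 0);
}

/*
 * Attach a fresh buffer to a ring slot. On failure only the new buffer
 * is freed; the slot keeps whatever it held before (possibly NULL).
 * Simplification: on success the old buffer is freed here, whereas a
 * real driver would already have handed it up the stack.
 */
static int
fake_newbuf(struct fake_rxdesc *rxd)
{
	struct fake_mbuf *m;

	assert(rxd != NULL);
	m = calloc(1, sizeof(*m));
	if (m == NULL)
		return (ENOBUFS);
	if (fake_dmamap_load(m) != 0) {
		free(m);
		return (ENOBUFS);	/* rxd->rx_m is untouched */
	}
	free(rxd->rx_m);
	rxd->rx_m = m;
	return (0);
}

/* Release every slot and clear the pointer so a second stop is harmless. */
static void
fake_stop(struct fake_rxdesc *ring, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (ring[i].rx_m != NULL) {
			free(ring[i].rx_m);
			ring[i].rx_m = NULL;
		}
	}
}
```

With this shape, a slot whose load failed is either still valid or NULL, so a later stop (for example from an ioctl that brings the interface down) skips it instead of touching a stale pointer.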