From owner-freebsd-current@FreeBSD.ORG  Fri Aug 19 17:15:11 2011
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 90168106564A;
	Fri, 19 Aug 2011 17:15:11 +0000 (UTC)
	(envelope-from pyunyh@gmail.com)
Received: from mail-iy0-f172.google.com (mail-iy0-f172.google.com
	[209.85.210.172])
	by mx1.freebsd.org (Postfix) with ESMTP id 0DB578FC19;
	Fri, 19 Aug 2011 17:15:04 +0000 (UTC)
Received: by iye7 with SMTP id 7so11019939iye.17
	for <multiple recipients>; Fri, 19 Aug 2011 10:15:04 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=from:date:to:cc:subject:message-id:reply-to:references:mime-version
	:content-type:content-disposition:in-reply-to:user-agent;
	bh=qtPP5z7UFw0GN+cqjPYL4Ben/iyT4F8PqUzgCCpAclA=;
	b=DRcnMbYfkCHw+bd51hS3Js2AxkHBUIcCipAtibwMRNrDwaFkXu4+iOzNb0AvmxIH24
	aDZnLYtesYCSpulid9EGr2hPtSNBpl73hHuJEQD/7Kbl0SZ/WW6CVMbl37RcuN577E+Y
	3O/kmovvuj9/MXV4Uh4QJnfqJG7tOWtlYnfO4=
Received: by 10.231.60.69 with SMTP id o5mr3657281ibh.65.1313774103647;
	Fri, 19 Aug 2011 10:15:03 -0700 (PDT)
Received: from pyunyh@gmail.com ([174.35.1.224])
	by mx.google.com with ESMTPS id m21sm1741120ibf.59.2011.08.19.10.15.01
	(version=TLSv1/SSLv3 cipher=OTHER);
	Fri, 19 Aug 2011 10:15:03 -0700 (PDT)
Received: by pyunyh@gmail.com (sSMTP sendmail emulation);
	Fri, 19 Aug 2011 10:14:59 -0700
From: YongHyeon PYUN <pyunyh@gmail.com>
Date: Fri, 19 Aug 2011 10:14:59 -0700
To: John Baldwin <jhb@freebsd.org>
Message-ID: <20110819171459.GB17324@michelle.cdnetworks.com>
References: <CAGH67wRWVu0qtae7fZjAi9r1H=Tt2QYpgJgF=1stUuWe1dg+Sw@mail.gmail.com>
	<CAMBSHm-R0QBCy_FshgXq=neeAaHFTYStWkE=AcJ7NngNchvwxQ@mail.gmail.com>
	<CAGH67wRPNygNw0h5L73U21jQnAvkr6NM7ASJM=bvXocxZgPo6Q@mail.gmail.com>
	<201108190810.31886.jhb@freebsd.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <201108190810.31886.jhb@freebsd.org>
User-Agent: Mutt/1.4.2.3i
Cc: Garrett Cooper <yanegomi@gmail.com>, mdf@freebsd.org,
	freebsd-current@freebsd.org, Pyun YongHyeon <yongari@freebsd.org>
Subject: Re: Deterministic panic due to non-sleepable lock with if_alc when
	reconfiguring interfaces
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: pyunyh@gmail.com
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>, 
	<mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 19 Aug 2011 17:15:11 -0000

On Fri, Aug 19, 2011 at 08:10:31AM -0400, John Baldwin wrote:
> On Friday, August 19, 2011 3:17:12 am Garrett Cooper wrote:
> > On Thu, Aug 18, 2011 at 9:31 PM,  <mdf@freebsd.org> wrote:
> > > On Thu, Aug 18, 2011 at 5:50 PM, Garrett Cooper <yanegomi@gmail.com> wrote:
> > >>    When loading if_alc as a module on my netbook and running
> > >> /etc/rc.d/netif restart, I can deterministically panic my netbook with
> > >> the following message:
> > 
> >     These repro steps were overly simplified. The complete steps are:
> > 
> > 1. Attach ethernet cable to alc(4) enabled NIC.
> > 2. Boot up machine.
> > 3. Login.
> > 4. Physically remove ethernet cable from alc(4) enabled NIC.
> > 5. Run `/etc/rc.d/netif restart' as root.
> > 
> > >> ) at _bus_dmamap_sync+0x51
> > >> alc_stop(c3dbb000,0,c0c51844,93a,80206910,...) at alc_stop+0x24e
> > >> alc_ioctl(c3d07400,80206910,c40423c0,c06a7935,c0914e3c,...) at alc_ioctl+0x22e
> > >> ifioctl(c45029c0,80206910,c40423c0,c40505c0,c4528c00,...) at ifioctl+0xc98
> > >> soo_ioctl(c4574e00,80206910,c40423c0,c413e680,c40505c0,...) at soo_ioctl+0x401
> > >> kern_ioctl(c40505c0,3,80206910,c40423c0,c40423c0,...) at kern_ioctl+0x1d7
> > >> ioctl(c40505c0,e6ca3cec,e6ca3d28,c08e929d,0,...) at ioctl+0x118
> > >> syscallenter(c40505c0,e6ca3ce4,e6ca3ce4,0,0,...) at syscallenter+0x23f
> > >> syscall(e6ca3d28) at syscall+0x2e
> > >> Xint0x80_syscall() at Xint0x80_syscall+0x21
> > >> --- syscall (54
> > >> kernel trap 12 with interrupts disabled
> > >> Kernel page fault with the following non-sleepable locks held:
> > >> exclusive sleep mutex alc0 (network driver) r = 0 (0xc3dbc608) locked
> > >> @ /usr/src/sys/modules/alc/../../dev/alc/if_alc.c:2362
> > >> KDB: stack backtrace:
> > >> db_trace_self_wrapper(c08e727a,80,6e726500,74206c65,20706172,...) at
> > >> db_trace_self_wrapper+0x26
> > >> kdb_backtrace(93a,0,ffffffff,c0ad6114,e6ca323c,...) at kdb_backtrace+0x2a
> > >> _witness_debugger(c08e9f67,e6ca3250,4,1,0,...) at _witness_debugger+0x1e
> > >> witness_warn(5,0,c0924fe1,c097df50,c3e42b00,...) at witness_warn+0x1f1
> > >> trap(e6ca32dc) at trap+0x15a
> > >> calltrap() at calltrap+0x6
> > >>
> > >>    I tried to track down what the exact issue was, but I got lost
> > >> (the locking sort of looks ok to me, but I'm still not an expert with
> > >> mutex(9)).
> > >>    I still have the vmcore and can provide more helpful details when requested.
> > >
> > > The locking itself is almost certainly fine.  The error message is not
> > > very helpful, but what went wrong was the page fault.  You just happen
> > > to panic on a witness warning before vm_fault can panic due to a bad
> > > address.
> > >
> > > The alc(4) maintainer would probably like info on the trap (line of
> > > code and where the bad pointer came from).
> > 
> >     I talked to Xin a bit, and as he noted, the panic was just a symptom
> > of the actual issue. I think the problem is that the rx ring's rx_m
> > value isn't set to NULL when an error occurs, but more directly, the
> > following call is failing:
> > 
> >         if (bus_dmamap_load_mbuf_sg(sc->alc_cdata.alc_rx_tag, // <-- HERE
> >             sc->alc_cdata.alc_rx_sparemap, m, segs, &nsegs, 0) != 0) {
> >                 m_freem(m);
> >                 return (ENOBUFS);
> >         }
> > 
> >     It's failing with ENOMEM. I'm still trying to determine the exact
> > reason for the ENOMEM from the x86 busdma code, though.
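The failing call loads the new mbuf into alc_rx_sparemap rather than into the ring slot's own map. Below is a minimal userland sketch of that spare-map refill pattern, assuming the driver swaps the slot's map with the spare only after a successful load. Every type and function in it (struct rxslot, struct dmamap, load(), newbuf()) is a hypothetical stand-in, not the alc(4) or busdma API; it only shows that an ENOMEM from the load leaves the slot's rx_m and map as they were.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for an mbuf, a busdma map and an RX ring slot. */
struct buf {
	int id;
};

struct dmamap {
	struct buf *loaded;		/* buffer currently loaded, or NULL */
};

struct rxslot {
	struct buf *rx_m;		/* buffer owned by this slot, or NULL */
	struct dmamap *rx_dmamap;	/* map describing rx_m */
};

/*
 * Hypothetical loader; 'fail' simulates bus_dmamap_load_mbuf_sg()
 * returning ENOMEM because no bounce pages could be reserved.
 */
static int
load(struct dmamap *map, struct buf *b, int fail)
{
	if (fail)
		return (ENOMEM);
	map->loaded = b;
	return (0);
}

/*
 * Refill 'slot' with 'b'.  The buffer is loaded into the spare map first;
 * the maps are swapped and rx_m is updated only after the load succeeds,
 * so a failed load leaves the slot exactly as it was.
 */
static int
newbuf(struct rxslot *slot, struct dmamap **sparemap, struct buf *b, int fail)
{
	struct dmamap *map;

	assert(*sparemap != slot->rx_dmamap);
	if (load(*sparemap, b, fail) != 0)
		return (ENOBUFS);	/* the driver also m_freem()s the mbuf */
	if (slot->rx_m != NULL)
		slot->rx_dmamap->loaded = NULL;	/* "unload" the old buffer */
	map = slot->rx_dmamap;
	slot->rx_dmamap = *sparemap;
	*sparemap = map;
	slot->rx_m = b;
	return (0);
}
```

Loading into the spare map is what makes the failure path cheap: nothing in the slot has to be rolled back.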
> 
>              ENOMEM       The load request has failed due to insufficient
>                           resources, and the caller specifically used the
>                           BUS_DMA_NOWAIT flag.
> 
> (bus_dmamap_load_mbuf*() imply BUS_DMA_NOWAIT.)
> 
> You couldn't allocate enough bounce pages:
> 
>         /* Reserve Necessary Bounce Pages */
>         if (map->pagesneeded != 0) {
>                 mtx_lock(&bounce_lock);
>                 if (flags & BUS_DMA_NOWAIT) {
>                         if (reserve_bounce_pages(dmat, map, 0) != 0) {
>                                 mtx_unlock(&bounce_lock);
>                                 return (ENOMEM);
>                         }
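The reservation step above can be modelled in userland as follows. This is a simplified sketch, assuming the BUS_DMA_NOWAIT path is all-or-nothing (nothing is taken from the pool unless the whole request fits). struct bounce_pool, struct map_model, reserve_pages() and load_nowait() are hypothetical names, not the busdma_machdep.c code itself.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, simplified model of the global bounce-page pool. */
struct bounce_pool {
	int free_bpages;
	int reserved_bpages;
};

/* Hypothetical per-map bookkeeping. */
struct map_model {
	int pagesneeded;	/* bounce pages this load requires */
	int pagesreserved;	/* bounce pages already reserved */
};

/*
 * Reserve bounce pages for 'map' and return how many are still missing.
 * With commit == 0 the request is all-or-nothing: if the pool cannot
 * cover it, nothing is taken from the pool.
 */
static int
reserve_pages(struct bounce_pool *pool, struct map_model *map, int commit)
{
	int pages;

	assert(map->pagesreserved <= map->pagesneeded);
	pages = map->pagesneeded - map->pagesreserved;
	if (pages > pool->free_bpages)
		pages = pool->free_bpages;
	if (commit == 0 && map->pagesneeded > map->pagesreserved + pages)
		return (map->pagesneeded - (map->pagesreserved + pages));
	pool->free_bpages -= pages;
	pool->reserved_bpages += pages;
	map->pagesreserved += pages;
	return (map->pagesneeded - map->pagesreserved);
}

/* The NOWAIT path: fail with ENOMEM instead of queueing the load. */
static int
load_nowait(struct bounce_pool *pool, struct map_model *map)
{
	if (map->pagesneeded == 0)
		return (0);	/* no bounce pages needed, nothing to reserve */
	if (reserve_pages(pool, map, 0) != 0)
		return (ENOMEM);
	return (0);
}
```

In this model a load that needs no bounce pages can never hit the ENOMEM branch, which is why the next question is why the RX tag needs them at all.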
> 
> Of course, now the question is why you even need bounce pages for alc(4):
> 
> 
>         /* Create DMA tag for Rx buffers. */
>         error = bus_dma_tag_create(
>             sc->alc_cdata.alc_buffer_tag, /* parent */
>             ALC_RX_BUF_ALIGN, 0,        /* alignment, boundary */
>             BUS_SPACE_MAXADDR,          /* lowaddr */
>             BUS_SPACE_MAXADDR,          /* highaddr */
>             NULL, NULL,                 /* filter, filterarg */
>             MCLBYTES,                   /* maxsize */
>             1,                          /* nsegments */
>             MCLBYTES,                   /* maxsegsize */
>             0,                          /* flags */
>             NULL, NULL,                 /* lockfunc, lockarg */
>             &sc->alc_cdata.alc_rx_tag);
> 
> It can handle 64-bit DMA just fine, and mbuf clusters used for RX should 
> always be aligned and never need bounce pages.
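Whether a page needs bouncing at all comes down to a per-address check against the tag. Below is a simplified model, assuming the x86 rule of this era (an address bounces if it lies above lowaddr and at or below highaddr, or if it violates the tag's alignment) and ignoring filter functions and parent tags. struct tag_model, needs_bounce() and the MODEL_* constants are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for BUS_SPACE_MAXADDR (64-bit bus) and
 * BUS_SPACE_MAXADDR_32BIT. */
#define MODEL_MAXADDR		UINT64_MAX
#define MODEL_MAXADDR_32BIT	UINT64_C(0xFFFFFFFF)

/* Hypothetical subset of a busdma tag. */
struct tag_model {
	uint64_t lowaddr;	/* addresses above this may need bouncing... */
	uint64_t highaddr;	/* ...up to and including this one */
	uint64_t alignment;	/* required alignment, a power of two */
};

/* Returns 1 if a page at 'paddr' would have to be bounced. */
static int
needs_bounce(const struct tag_model *t, uint64_t paddr)
{
	assert(t->alignment != 0 && (t->alignment & (t->alignment - 1)) == 0);
	return ((paddr > t->lowaddr && paddr <= t->highaddr) ||
	    (paddr & (t->alignment - 1)) != 0);
}
```

With lowaddr at the maximum address, an aligned cluster never bounces wherever it sits in memory; only a tag restricted to 32-bit addresses would bounce a cluster above 4GB.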

Right. The alc(4) hardware has no DMA address limit for TX/RX buffers,
but its descriptors and status block must reside within a single 4GB
region. alc(4) explicitly checks whether the allocated descriptor/status
blocks cross a 4GB boundary. If alc(4) detects that condition, it limits
the DMA address space of the descriptor/status blocks to the low 4GB,
which can involve bounce pages, but that still does not explain why
bounce buffers are used for RX buffer allocation.
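The 4GB check described above can be sketched as follows, assuming "within a single 4GB region" means every descriptor/status block shares the same upper 32 address bits for its first and last byte. ADDR_HI(), struct dma_block, same_4gb_region() and choose_lowaddr() are hypothetical names, not the driver's own code. When the check fails, the sketch picks a 32-bit lowaddr for the next allocation attempt, the restriction under which bounce pages become possible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: upper 32 bits of a bus address. */
#define ADDR_HI(a)	((uint32_t)((uint64_t)(a) >> 32))

/* Hypothetical description of one descriptor ring or status block. */
struct dma_block {
	uint64_t paddr;		/* bus address of the first byte */
	uint64_t size;		/* length in bytes, must be non-zero */
};

/*
 * Returns 1 if the first and last byte of every block share one set of
 * upper 32 address bits, i.e. everything sits inside one 4GB region.
 */
static int
same_4gb_region(const struct dma_block *b, size_t n)
{
	uint32_t hi;
	size_t i;

	if (n == 0)
		return (1);
	hi = ADDR_HI(b[0].paddr);
	for (i = 0; i < n; i++) {
		assert(b[i].size != 0);
		if (ADDR_HI(b[i].paddr) != hi ||
		    ADDR_HI(b[i].paddr + b[i].size - 1) != hi)
			return (0);
	}
	return (1);
}

/*
 * lowaddr for the next allocation attempt: unrestricted if the layout is
 * acceptable, otherwise limited to 32-bit addresses.
 */
static uint64_t
choose_lowaddr(const struct dma_block *b, size_t n)
{
	return (same_4gb_region(b, n) ? UINT64_MAX : UINT64_C(0xFFFFFFFF));
}
```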

> 
> -- 
> John Baldwin