From owner-freebsd-current@FreeBSD.ORG  Fri Aug 19 07:17:13 2011
In-Reply-To: <CAMBSHm-R0QBCy_FshgXq=neeAaHFTYStWkE=AcJ7NngNchvwxQ@mail.gmail.com>
References: <CAGH67wRWVu0qtae7fZjAi9r1H=Tt2QYpgJgF=1stUuWe1dg+Sw@mail.gmail.com>
	<CAMBSHm-R0QBCy_FshgXq=neeAaHFTYStWkE=AcJ7NngNchvwxQ@mail.gmail.com>
Date: Fri, 19 Aug 2011 00:17:12 -0700
Message-ID: <CAGH67wRPNygNw0h5L73U21jQnAvkr6NM7ASJM=bvXocxZgPo6Q@mail.gmail.com>
From: Garrett Cooper <yanegomi@gmail.com>
To: mdf@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Cc: FreeBSD Current <freebsd-current@freebsd.org>,
	Pyun YongHyeon <yongari@freebsd.org>
Subject: Re: Deterministic panic due to non-sleepable lock with if_alc when
 reconfiguring interfaces

On Thu, Aug 18, 2011 at 9:31 PM,  <mdf@freebsd.org> wrote:
> On Thu, Aug 18, 2011 at 5:50 PM, Garrett Cooper <yanegomi@gmail.com> wrote:
>>    When loading if_alc as a module on my netbook and running
>> /etc/rc.d/netif restart, I can deterministically panic my netbook with
>> the following message:

    These repro steps were overly simplified. The complete steps are:

1. Attach ethernet cable to alc(4) enabled NIC.
2. Boot up machine.
3. Log in.
4. Physically remove ethernet cable from alc(4) enabled NIC.
5. Run `/etc/rc.d/netif restart' as root.

>> ) at _bus_dmamap_sync+0x51
>> alc_stop(c3dbb000,0,c0c51844,93a,80206910,...) at alc_stop+0x24e
>> alc_ioctl(c3d07400,80206910,c40423c0,c06a7935,c0914e3c,...) at alc_ioctl+0x22e
>> ifioctl(c45029c0,80206910,c40423c0,c40505c0,c4528c00,...) at ifioctl+0xc98
>> soo_ioctl(c4574e00,80206910,c40423c0,c413e680,c40505c0,...) at soo_ioctl+0x401
>> kern_ioctl(c40505c0,3,80206910,c40423c0,c40423c0,...) at kern_ioctl+0x1d7
>> ioctl(c40505c0,e6ca3cec,e6ca3d28,c08e929d,0,...) at ioctl+0x118
>> syscallenter(c40505c0,e6ca3ce4,e6ca3ce4,0,0,...) at syscallenter+0x23f
>> syscall(e6ca3d28) at syscall+0x2e
>> Xint0x80_syscall() at Xint0x80_syscall+0x21
>> --- syscall (54kernel trap 12 with interrupts disabled
>> Kernel page fault with the following non-sleepable locks held:
>> exclusive sleep mutex alc0 (network driver) r = 0 (0xc3dbc608) locked
>> @ /usr/src/sys/modules/alc/../../dev/alc/if_alc.c:2362
>> KDB: stack backtrace:
>> db_trace_self_wrapper(c08e727a,80,6e726500,74206c65,20706172,...) at
>> db_trace_self_wrapper+0x26
>> kdb_backtrace(93a,0,ffffffff,c0ad6114,e6ca323c,...) at kdb_backtrace+0x2a
>> _witness_debugger(c08e9f67,e6ca3250,4,1,0,...) at _witness_debugger+0x1e
>> witness_warn(5,0,c0924fe1,c097df50,c3e42b00,...) at witness_warn+0x1f1
>> trap(e6ca32dc) at trap+0x15a
>> calltrap() at calltrap+0x6
>>
>>    I tried to track down what the exact issue was, but I got lost
>> (the locking sort of looks ok to me, but I'm still not an expert with
>> mutex(9)).
>>    I still have the vmcore and can provide more helpful details when requested.
>
> The locking itself is almost certainly fine.  The error message is not
> very helpful, but what went wrong was the page fault.  You just happen
> to panic on a witness warning before vm_fault can panic due to a bad
> address.
>
> The alc(4) maintainer would probably like info on the trap (line of
> code and where the bad pointer came from).

    I talked to Xin a bit and, as he noted, the panic was just a symptom
of the actual issue at hand. I think the problem is that the rx ring's
rx_m value isn't set to NULL when an error occurs. As for the immediate
failure, the following call is failing:

        if (bus_dmamap_load_mbuf_sg(sc->alc_cdata.alc_rx_tag, // <-- HERE
            sc->alc_cdata.alc_rx_sparemap, m, segs, &nsegs, 0) != 0) {
                m_freem(m);
                return (ENOBUFS);
        }

    It's failing with ENOMEM. I'm still trying to determine the exact
reason for the ENOMEM from the x86 busdma code, though.
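
    To illustrate the stale rx_m theory, here is a minimal user-space
sketch in C. It is not the if_alc code and has not been checked against
the driver: every name in it (fake_mbuf, fake_rxdesc, fake_ring,
fake_dma_load, fake_newbuf, fake_init_rx_ring, fake_stop) is
hypothetical, and the DMA load is simulated by a flag. It only models
the pattern under suspicion: if ring setup fails partway through, a
stop routine that walks the ring must not see leftover non-NULL rx_m
pointers.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define RING_SIZE 4

/* Hypothetical stand-ins for an mbuf and an RX descriptor slot. */
struct fake_mbuf { int payload; };
struct fake_rxdesc { struct fake_mbuf *rx_m; };
struct fake_ring { struct fake_rxdesc desc[RING_SIZE]; };

/* Simulated DMA load: returns ENOMEM when asked to fail, 0 otherwise. */
static int
fake_dma_load(struct fake_mbuf *m, int fail)
{
    (void)m;
    return (fail ? ENOMEM : 0);
}

/*
 * Attach a fresh buffer to slot idx. If the (simulated) load fails,
 * free the new buffer and leave the slot untouched.
 */
static int
fake_newbuf(struct fake_ring *r, size_t idx, int fail)
{
    struct fake_mbuf *m;

    assert(idx < RING_SIZE);
    m = calloc(1, sizeof(*m));
    if (m == NULL)
        return (ENOBUFS);
    if (fake_dma_load(m, fail) != 0) {
        free(m);
        return (ENOBUFS);
    }
    r->desc[idx].rx_m = m;
    return (0);
}

/*
 * Populate the ring. fail_at is the slot whose load is forced to fail
 * (pass RING_SIZE for "never"). Every slot is cleared first, so an
 * early return leaves no stale pointers behind.
 */
static int
fake_init_rx_ring(struct fake_ring *r, size_t fail_at)
{
    size_t i;
    int error;

    for (i = 0; i < RING_SIZE; i++)
        r->desc[i].rx_m = NULL;
    for (i = 0; i < RING_SIZE; i++) {
        error = fake_newbuf(r, i, i == fail_at);
        if (error != 0)
            return (error);
    }
    return (0);
}

/* Release every attached buffer; returns how many slots were live. */
static size_t
fake_stop(struct fake_ring *r)
{
    size_t i, released = 0;

    for (i = 0; i < RING_SIZE; i++) {
        if (r->desc[i].rx_m != NULL) {
            free(r->desc[i].rx_m);
            r->desc[i].rx_m = NULL;
            released++;
        }
    }
    return (released);
}
```

    In this model, a ring whose setup fails at slot 2 is left with
exactly two live buffers, and fake_stop() only touches those two.
Without the clearing loop in fake_init_rx_ring(), fake_stop() would
act on whatever the slots held before, which is the kind of bad
pointer I suspect alc_stop() is tripping over.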
Thanks,
-Garrett