From owner-freebsd-current@FreeBSD.ORG  Fri Apr  4 04:28:42 2003
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id D4E8337B401
	for <current@freebsd.org>; Fri,  4 Apr 2003 04:28:42 -0800 (PST)
Received: from elvis.mu.org (elvis.mu.org [192.203.228.196])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 81A3443F3F
	for <current@freebsd.org>; Fri,  4 Apr 2003 04:28:42 -0800 (PST)
	(envelope-from mux@freebsd.org)
Received: by elvis.mu.org (Postfix, from userid 1920)
	id 73B4A2ED413; Fri,  4 Apr 2003 04:28:42 -0800 (PST)
Date: Fri, 4 Apr 2003 14:28:42 +0200
From: Maxime Henrion <mux@freebsd.org>
To: Nate Lawson <nate@root.org>
Message-ID: <20030404122842.GP1750@elvis.mu.org>
References: <Pine.BSF.4.21.0304032356590.15815-100000@root.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <Pine.BSF.4.21.0304032356590.15815-100000@root.org>
User-Agent: Mutt/1.4.1i
cc: current@freebsd.org
Subject: Re: MPSAFE fxp m_pkthdr not valid
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
X-List-Received-Date: Fri, 04 Apr 2003 12:28:43 -0000

Nate Lawson wrote:
> I have gotten fxp running with MPSAFE and did a large scp transfer.  It
> ran for a few minutes and then panicked.  It was trap 12 (page fault) at
> address 0x24.  Here is where it crashed:
> 
> fxp_start+0xcc
> 0xc0194a4c is in fxp_start (../../../dev/fxp/if_fxp.c:1263).
> 1258                     * been computed and stored in the checksum field
> 1259                     * in the TCP header. The stack should have
> 1260                     * already done this for us.
> 1261                     */
> 1262    
> 1263                    if (mb_head->m_pkthdr.csum_flags) {
> 1264                            if (mb_head->m_pkthdr.csum_flags & CSUM_DELAY_DATA) {
> 1265                                    txp->tx_cb->ipcb_ip_activation_high =
> 1266                                       FXP_IPCB_HARDWAREPARSING_ENABLE;
> 1267                                    txp->tx_cb->ipcb_ip_schedule =
> 
> The deref of mb_head->m_pkthdr is invalid.  Note that my fxp_intr function
> acquires the fxp lock right away so this shouldn't be a race in fxp.

Since fxp_start() is usually called from ether_output(), not from
fxp_intr(), I don't see how acquiring the lock in fxp_intr() can protect
you from such a race.  You need to acquire the lock in fxp_start() as
well, before it touches the interface queue.  Otherwise fxp_start() may
be preempted by an interrupt while it is in the middle of a dequeue, and
if fxp_intr() then ends up calling fxp_start() itself, the two
invocations race on the queue.  It really looks like that's what
happened.
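
To make the locking pattern concrete, here is a minimal userland sketch,
not the actual if_fxp.c code.  It assumes a pthread mutex standing in
for the driver lock and a hand-rolled linked list standing in for the
interface send queue; every name in it (struct softc fields, model_start,
model_intr, model_run, ...) is hypothetical.  The point is only that both
the output path and the interrupt path take the same lock before they
dequeue anything:

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for an mbuf and a driver softc. */
struct pkt { struct pkt *next; int csum_flags; };

struct softc {
	pthread_mutex_t lock;      /* models the per-device lock */
	struct pkt *head, *tail;   /* models the interface send queue */
	int sent;                  /* packets handed to the "hardware" */
};

static void model_enqueue(struct softc *sc, struct pkt *p)
{
	pthread_mutex_lock(&sc->lock);
	p->next = NULL;
	if (sc->tail != NULL)
		sc->tail->next = p;
	else
		sc->head = p;
	sc->tail = p;
	pthread_mutex_unlock(&sc->lock);
}

/* Drains the queue.  The caller must already hold sc->lock. */
static void model_start_locked(struct softc *sc)
{
	struct pkt *p;

	while ((p = sc->head) != NULL) {
		sc->head = p->next;
		if (sc->head == NULL)
			sc->tail = NULL;
		sc->sent++;        /* p is safe to inspect: we own it now */
	}
}

/* Output path: takes the lock itself before touching the queue. */
static void model_start(struct softc *sc)
{
	pthread_mutex_lock(&sc->lock);
	model_start_locked(sc);
	pthread_mutex_unlock(&sc->lock);
}

/* Interrupt path: already needs the lock, so it calls the locked variant. */
static void model_intr(struct softc *sc)
{
	pthread_mutex_lock(&sc->lock);
	model_start_locked(sc);
	pthread_mutex_unlock(&sc->lock);
}

struct worker { struct softc *sc; struct pkt *pkts; int n; int from_intr; };

static void *model_worker(void *arg)
{
	struct worker *w = arg;

	for (int i = 0; i < w->n; i++) {
		model_enqueue(w->sc, &w->pkts[i]);
		if (w->from_intr)
			model_intr(w->sc);
		else
			model_start(w->sc);
	}
	return NULL;
}

/* Runs both paths concurrently; returns packets sent, or -1 on error. */
int model_run(int npkts)
{
	struct softc sc = { .head = NULL, .tail = NULL, .sent = 0 };
	struct pkt *pkts = calloc((size_t)npkts, sizeof(*pkts));
	pthread_t t1, t2;

	if (pkts == NULL)
		return -1;
	pthread_mutex_init(&sc.lock, NULL);
	struct worker out = { &sc, pkts, npkts / 2, 0 };
	struct worker intr = { &sc, pkts + npkts / 2, npkts - npkts / 2, 1 };
	if (pthread_create(&t1, NULL, model_worker, &out) != 0) {
		free(pkts);
		return -1;
	}
	if (pthread_create(&t2, NULL, model_worker, &intr) != 0) {
		pthread_join(t1, NULL);
		free(pkts);
		return -1;
	}
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	pthread_mutex_destroy(&sc.lock);
	free(pkts);
	return sc.sent;
}
```

Splitting the drain loop into a "_locked" helper is one way to let the
interrupt path reuse it without taking the lock twice; in a real driver
the equivalent split would depend on how the lock is defined.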

Cheers,
Maxime