From: Pyun YongHyeon
Date: Thu, 3 Jun 2010 17:35:02 -0700
To: Nikolay Denev
Message-ID: <20100604003502.GF13502@michelle.cdnetworks.com>
References:
 <77DFF2E5-7A1E-4063-A852-2C7AD9BC3DD4@gmail.com>
 <201005240948.33555.jhb@freebsd.org>
 <20100524171210.GA1418@michelle.cdnetworks.com>
 <87BA8EDC-BE95-4C84-94CD-5CA12961708A@gmail.com>
In-Reply-To: <87BA8EDC-BE95-4C84-94CD-5CA12961708A@gmail.com>
Cc: freebsd-stable@freebsd.org, John Baldwin
Subject: Re: if_sge related panics
Reply-To: pyunyh@gmail.com
List-Id: Production branch of FreeBSD source code

On Thu, Jun 03, 2010 at 09:29:20AM +0300, Nikolay Denev wrote:
> On May 24, 2010, at 8:12 PM, Pyun YongHyeon wrote:
>
> > On Mon, May 24, 2010 at 09:48:33AM -0400, John Baldwin wrote:
> >> On Monday 24 May 2010 6:35:01 am Nikolay Denev wrote:
> >>> On May 24, 2010, at 8:57 AM, Nikolay Denev wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> Recently I started to experience a if_sge(4) related panic.
> >>>> It happens almost every time I try to download a torrent file for example.
> >>>> Copying of large files over NFS seem not to trigger it, but I haven't tested extensively.
> >>>>
> >>>> Here is the panic message :
> >>>>
> >>>> Fatal trap 12: page fault while in kernel mode
> >>>> cpuid = 0; apic id = 00
> >>>> fault virtual address   = 0x8
> >>>> fault code              = supervisor write data, page not present
> >>>> instruction pointer     = 0x20:0xffffffff80230413
> >>>> stack pointer           = 0x28:0xffffff80001e9280
> >>>> frame pointer           = 0x28:0xffffff80001e9510
> >>>> code segment            = base 0x0, limit 0xfffff, type 0x1b
> >>>>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> >>>> processor eflags        = interrupt enabled, resume, IOPL = 0
> >>>> current process         = 12 (irq19: sge0)
> >>>> trap number             = 12
> >>>> panic: page fault
> >>>> cpuid = 0
> >>>> Uptime: 1d20h56m20s
> >>>> Cannot dump. Device not defined or unavailable
> >>>> Automatic reboot in 15 seconds - press a key on the console to abort
> >>>> Sleeping thread (tid 100039, pid 12) owns a non-sleepable lock
> >>>>
> >>>> My swap is on a zvol, so I don't have dump. I'll try to attach a disk on the eSATA port and dump there if needed.
> >>>
> >>> Here is some info from the crashdump :
> >>>
> >>> (kgdb) #0  doadump () at pcpu.h:223
> >>> #1  0xffffffff802fb149 in boot (howto=260)
> >>>     at /usr/src/sys/kern/kern_shutdown.c:416
> >>> #2  0xffffffff802fb57c in panic (fmt=0xffffffff8055d564 "%s")
> >>>     at /usr/src/sys/kern/kern_shutdown.c:590
> >>> #3  0xffffffff805055b8 in trap_fatal (frame=0xffffff000288a3e0, eva=Variable "eva" is not available.
> >>> )
> >>>     at /usr/src/sys/amd64/amd64/trap.c:777
> >>> #4  0xffffffff805059dc in trap_pfault (frame=0xffffff80001e91d0, usermode=0)
> >>>     at /usr/src/sys/amd64/amd64/trap.c:693
> >>> #5  0xffffffff805061c5 in trap (frame=0xffffff80001e91d0)
> >>>     at /usr/src/sys/amd64/amd64/trap.c:451
> >>> #6  0xffffffff804eb977 in calltrap ()
> >>>     at /usr/src/sys/amd64/amd64/exception.S:223
> >>> #7  0xffffffff80230413 in sge_start_locked (ifp=0xffffff000270d800)
> >>>     at /usr/src/sys/dev/sge/if_sge.c:1591
> >>
> >> Try this.  sge_encap() can sometimes return an error with m_head set to NULL:
> >>
> >
> > Thanks John. Committed in r208512.
> >
> >> Index: if_sge.c
> >> ===================================================================
> >> --- if_sge.c	(revision 208375)
> >> +++ if_sge.c	(working copy)
> >> @@ -1588,7 +1588,8 @@
> >>  		if (m_head == NULL)
> >>  			break;
> >>  		if (sge_encap(sc, &m_head)) {
> >> -			IFQ_DRV_PREPEND(&ifp->if_snd, m_head);
> >> +			if (m_head != NULL)
> >> +				IFQ_DRV_PREPEND(&ifp->if_snd, m_head);
> >>  			ifp->if_drv_flags |= IFF_DRV_OACTIVE;
> >>  			break;
> >>  		}
> >>
> >> --
> >> John Baldwin
>
> After the patch I experienced several network outages (ping reporting "no buffer space available")
> that were resolved by ifconfig down/up of the sge(4) interface.
>

Because I don't have access to sge(4) controllers, I have never had a
chance to run the driver myself. Does ping(8) generate "no buffer space
available" when the system is idle? Could you show me more information
on how you checked for the network outages?

> I can see that most of the other drivers that handle XXX_encap() returning m_head pointing NULL, break when this condition

Yes, most drivers written or touched by me behave like that.

> is hit: i.e. :
>
> Index: if_sge.c
> ===================================================================
> --- if_sge.c	(revision 208375)
> +++ if_sge.c	(working copy)
> @@ -1588,7 +1588,8 @@
>  		if (m_head == NULL)
>  			break;
>  		if (sge_encap(sc, &m_head)) {
> -			IFQ_DRV_PREPEND(&ifp->if_snd, m_head);
> +			if (m_head == NULL)
> +				break;
>  			IFQ_DRV_PREPEND(&ifp->if_snd, m_head);
>  			ifp->if_drv_flags |= IFF_DRV_OACTIVE;
>  			break;
>  		}
>
> But here in sge(4) we always set IFF_DRV_OACTIVE.
> Do you think this can be the source of the problem ?
>

A more correct way to set IFF_DRV_OACTIVE would be to check the number
of queued frames first, or to just exit the transmit loop. If there are
no queued frames, IFF_DRV_OACTIVE would never be cleared, which in turn
causes ENOBUFS in ping(8). Your change looks more reasonable to me. Do
you still see the same issue with the change you suggested?
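
To make the stall described above concrete, here is a minimal user-space
sketch in C. It is not code from if_sge.c: every toy_* name and the
TOY_OACTIVE flag are hypothetical stand-ins, and the model assumes that
the flag is only cleared by a TX completion, which only happens when at
least one frame was actually handed to the hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_OACTIVE 0x1          /* hypothetical stand-in for IFF_DRV_OACTIVE */

struct toy_frame { int id; };    /* hypothetical stand-in for struct mbuf */

struct toy_if {
	int flags;               /* stand-in for if_drv_flags */
	int sendq;               /* frames waiting in the software queue */
	int hw_queued;           /* frames currently owned by the TX ring */
	bool encap_drops_next;   /* next encap fails and frees the frame */
};

static struct toy_frame toy_pool;

/* Models the failure from the thread: error returned with *m set to NULL. */
static int
toy_encap(struct toy_if *ifp, struct toy_frame **m)
{
	if (ifp->encap_drops_next) {
		ifp->encap_drops_next = false;
		*m = NULL;       /* frame was freed inside encap */
		return (1);
	}
	ifp->hw_queued++;
	return (0);
}

/* Transmit loop; break_on_null selects the variant suggested by Nikolay. */
static void
toy_start(struct toy_if *ifp, bool break_on_null)
{
	struct toy_frame *m;

	if (ifp->flags & TOY_OACTIVE)
		return;          /* transmit path is parked */
	while (ifp->sendq > 0) {
		ifp->sendq--;    /* dequeue one frame */
		m = &toy_pool;
		if (toy_encap(ifp, &m) != 0) {
			if (m == NULL && break_on_null)
				break;           /* nothing to requeue, flags untouched */
			if (m != NULL)
				ifp->sendq++;    /* put the frame back */
			ifp->flags |= TOY_OACTIVE;
			break;
		}
	}
}

/* TX completion: assumed to clear the flag only if frames were queued. */
static void
toy_txeof(struct toy_if *ifp)
{
	if (ifp->hw_queued == 0)
		return;
	ifp->hw_queued = 0;
	ifp->flags &= ~TOY_OACTIVE;
}

/* Returns true if one dropped frame leaves the interface stalled. */
bool
toy_stalls(bool break_on_null)
{
	struct toy_if ifp = { 0, 1, 0, true };

	toy_start(&ifp, break_on_null);
	toy_txeof(&ifp);
	assert(ifp.hw_queued == 0);
	return ((ifp.flags & TOY_OACTIVE) != 0);
}
```

In this model, toy_stalls(false) corresponds to always setting the flag
on an encap error, and toy_stalls(true) to breaking out of the loop when
the frame pointer comes back NULL. Whether the real driver stalls in
exactly this way would still need to be confirmed on sge(4) hardware.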