From owner-freebsd-stable@FreeBSD.ORG Fri Aug 19 03:04:08 2011
Return-Path:
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
        by hub.freebsd.org (Postfix) with ESMTP id 4241C106564A
        for ; Fri, 19 Aug 2011 03:04:08 +0000 (UTC)
        (envelope-from jdc@koitsu.dyndns.org)
Received: from qmta13.emeryville.ca.mail.comcast.net (qmta13.emeryville.ca.mail.comcast.net [76.96.27.243])
        by mx1.freebsd.org (Postfix) with ESMTP id 289338FC0C
        for ; Fri, 19 Aug 2011 03:04:07 +0000 (UTC)
Received: from omta18.emeryville.ca.mail.comcast.net ([76.96.30.74])
        by qmta13.emeryville.ca.mail.comcast.net with comcast
        id N2zm1h0021bwxycAD343qu; Fri, 19 Aug 2011 03:04:03 +0000
Received: from koitsu.dyndns.org ([67.180.84.87])
        by omta18.emeryville.ca.mail.comcast.net with comcast
        id N33d1h00m1t3BNj8e33d5S; Fri, 19 Aug 2011 03:03:38 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
        id 2E624102C1A; Thu, 18 Aug 2011 20:04:05 -0700 (PDT)
Date: Thu, 18 Aug 2011 20:04:05 -0700
From: Jeremy Chadwick
To: Doug Barton
Message-ID: <20110819030405.GA83032@icarus.home.lan>
References:
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-stable@FreeBSD.org, "Vogel, Jack"
Subject: Re: crash on 8.2-RELEASE amd64, high-traffic squid server
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 19 Aug 2011 03:04:08 -0000

On Thu, Aug 18, 2011 at 07:36:50PM -0700, Doug Barton wrote:
> Howdy,
>
> I have some high-traffic squid servers, most of which are running a
> flavor of RELENG_7 very successfully, but one that I've been
> evaluating 8.x on has had a lot of problems. Most recently we had
> the crash below twice in the last 2 weeks. Same exact backtrace.
> Any suggestions on where to look would be appreciated.
>
>
> Thanks,
>
> Doug
>
> #0  doadump () at pcpu.h:224
> 224     pcpu.h: No such file or directory.
>         in pcpu.h
> (kgdb) #0  doadump () at pcpu.h:224
> #1  0xffffffff803ec4be in boot (howto=260)
>     at /usr/src/sys/kern/kern_shutdown.c:419
> #2  0xffffffff803ec8f1 in panic (fmt=Variable "fmt" is not available.
> )
>     at /usr/src/sys/kern/kern_shutdown.c:592
> #3  0xffffffff8069a4d0 in trap_fatal (frame=0x1c, eva=Variable "eva" is not available.
> )
>     at /usr/src/sys/amd64/amd64/trap.c:783
> #4  0xffffffff8069aab9 in trap (frame=0xffffff800012f650)
>     at /usr/src/sys/amd64/amd64/trap.c:592
> #5  0xffffffff80682e84 in calltrap ()
>     at /usr/src/sys/amd64/amd64/exception.S:224
> #6  0xffffffff80698896 in bcopy ()
>     at /usr/src/sys/amd64/amd64/support.S:124
> #7  0xffffffff8044df61 in sbcompress (sb=0xffffff01d98945e0,
>     m=0xffffff010b815300, n=0xffffff006baa3700)
>     at /usr/src/sys/kern/uipc_sockbuf.c:779
> #8  0xffffffff8044e1e6 in sbappendstream_locked (sb=0xffffff01d98945e0,
>     m=0xffffff010b815300) at /usr/src/sys/kern/uipc_sockbuf.c:534
> #9  0xffffffff80527530 in tcp_do_segment (m=0xffffff010b815300, th=Variable "th" is not available.
> )
>     at /usr/src/sys/netinet/tcp_input.c:2588
> #10 0xffffffff80528b4b in tcp_input (m=0xffffff010b815300, off0=Variable "off0" is not available.
> )
>     at /usr/src/sys/netinet/tcp_input.c:1029
> #11 0xffffffff804c3b2c in ip_input (m=0xffffff010b815300)
>     at /usr/src/sys/netinet/ip_input.c:787
> #12 0xffffffff804a631e in netisr_dispatch_src (proto=1, source=Variable "source" is not available.
> )
>     at /usr/src/sys/net/netisr.c:917
> #13 0xffffffff8049d73d in ether_demux (ifp=0xffffff0002d30000,
>     m=0xffffff010b815300) at /usr/src/sys/net/if_ethersubr.c:894
> #14 0xffffffff8049db2d in ether_input (ifp=0xffffff0002d30000,
>     m=0xffffff010b815300) at /usr/src/sys/net/if_ethersubr.c:753
> #15 0xffffffff8027c18a in em_rxeof (rxr=0xffffff0002d7c600, count=98,
>     done=0x0) at /usr/src/sys/dev/e1000/if_em.c:4293
> #16 0xffffffff8027c5a8 in em_handle_que (context=Variable "context" is not available.
> )
>     at /usr/src/sys/dev/e1000/if_em.c:1482
> #17 0xffffffff80429ab5 in taskqueue_run_locked (queue=0xffffff0002d8d800)
>     at /usr/src/sys/kern/subr_taskqueue.c:250
> #18 0xffffffff80429c4e in taskqueue_thread_loop (arg=Variable "arg" is not available.
> )
>     at /usr/src/sys/kern/subr_taskqueue.c:387
> #19 0xffffffff803c30f8 in fork_exit (
>     callout=0xffffffff80429c00 ,
>     arg=0xffffff80005a8748, frame=0xffffff800012fc40)
>     at /usr/src/sys/kern/kern_fork.c:845
> #20 0xffffffff8068334e in fork_trampoline ()
>     at /usr/src/sys/amd64/amd64/exception.S:565
> #21 0x0000000000000000 in ?? ()
> #22 0x0000000000000000 in ?? ()
> #23 0x0000000000000000 in ?? ()
> #24 0x0000000000000000 in ?? ()
> #25 0x0000000000000000 in ?? ()
> #26 0x0000000000000000 in ?? ()
> #27 0x0000000000000000 in ?? ()
> #28 0x0000000000000000 in ?? ()
> #29 0x0000000000000000 in ?? ()
> #30 0x0000000000000000 in ?? ()
> #31 0x0000000000000000 in ?? ()
> #32 0x0000000000000000 in ?? ()
> #33 0x0000000000000000 in ?? ()
> #34 0x0000000000000000 in ?? ()
> #35 0x0000000000000000 in ?? ()
> #36 0x0000000000000000 in ?? ()
> #37 0x0000000000000000 in ?? ()
> #38 0x0000000000000000 in ?? ()
> #39 0x0000000000000000 in ?? ()
> #40 0x0000000000000000 in ?? ()
> #41 0x0000000000000000 in ?? ()
> #42 0x0000000000000000 in ?? ()
> #43 0x0000000000000000 in ?? ()
> #44 0x0000000000000000 in ?? ()
> #45 0xffffffff8095ac00 in affinity ()
> #46 0x0000000000000000 in ?? ()
> #47 0x0000000000000000 in ?? ()
> #48 0xffffff0002d2d8c0 in ?? ()
> #49 0xffffff800012f320 in ?? ()
> #50 0xffffff800012f2c8 in ?? ()
> #51 0xffffff0002c59000 in ?? ()
> #52 0xffffffff80411db9 in sched_switch (td=0xffffffff80429c00,
>     newtd=0xffffff80005a8748, flags=Variable "flags" is not available.
> )
>     at /usr/src/sys/kern/sched_ule.c:1852
> Previous frame inner to this frame (corrupt stack?)
> (kgdb)

CC'ing Jack Vogel here, since I see em(4) is involved.  Jack will
probably want this data from the system:

# uname -a       (hostname can be XXX'd out)
# dmesg          (particularly the emX entries and driver version)
# pciconf -lvbc  (specifically the emX entries and related data)
# ifconfig -a    (IPs and MACs can be X'd out; mainly interested in
                  options and other pieces)
# netstat -m     (if possible from a system which has been up a while
                  and is a likely crash candidate)
# vmstat -i      (same condition as netstat -m)

There isn't enough data above for me to determine what's going on, but
from the stack trace it looks like sbcompress() may be given some data
which is null or inaccessible.  The source for that hasn't been touched
directly in a while.  The TCP stack/code, however, has been (since
8.2-RELEASE for sure).  I think em(4) has as well.  This may end up
being a case where running RELENG_8 is the fix, but I'd love to be able
to say that for certain.

"bt full" would be helpful but the above indicates the kernel might not
have debugging symbols included in it?  I've seen this kind of output
even on a system with "makeoptions DEBUG=-g" in its kernel config before
though.  Never was sure how to deal with that problem.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |
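
For convenience, the data requested above could be gathered in one pass
with something like the sketch below.  It assumes a POSIX sh; the
function name collect_diag, the default output directory and the file
names are hypothetical and only for illustration.  The six commands are
the ones listed in this message, nothing more.

```shell
#!/bin/sh
# Hypothetical helper: run each requested command and save its output
# to one file per command under a single directory.
collect_diag() {
    outdir=${1:-./em-crash-diag}
    mkdir -p "$outdir" || return 1

    # Each line below is "file name|command", split on '|'.
    while IFS='|' read -r name cmd; do
        [ -n "$name" ] || continue
        tool=${cmd%% *}    # first word of the command line
        if command -v "$tool" >/dev/null 2>&1; then
            # Capture stderr too, and note a non-zero exit in the file.
            sh -c "$cmd" >"$outdir/$name.txt" 2>&1 </dev/null ||
                echo "exit status: $?" >>"$outdir/$name.txt"
        else
            echo "skipped: $tool not found" >"$outdir/$name.txt"
        fi
    done <<'EOF'
uname|uname -a
dmesg|dmesg
pciconf|pciconf -lvbc
ifconfig|ifconfig -a
netstat-m|netstat -m
vmstat-i|vmstat -i
EOF
    return 0
}
```

Usage would be along the lines of "collect_diag /var/tmp/diag" as root
on the affected box.  Note this does not X out hostnames, IPs or MACs,
so the files need a manual pass before being posted to the list.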