Date: Fri, 3 May 2013 11:16:32 -0700
Subject: Re: panic: in_pcblookup_local (?)
From: Peter Wemm <peter@wemm.org>
To: John Baldwin
Cc: Glen Barber, Ian FREISLICH, "Robert N. M. Watson", freebsd-current@freebsd.org

On Thu, May 2, 2013 at 11:32 AM, John Baldwin wrote:
> On Thursday, May 02, 2013 1:53:47 pm Ian FREISLICH wrote:
>> John Baldwin wrote:
>> > On Thursday, May 02, 2013 7:25:08 am Robert N. M. Watson wrote:
>> > >
>> > > On 2 May 2013, at 11:42, Glen Barber wrote:
>> > >
>> > > > Hmm. Perhaps it would be worthwhile for me to rebuild the current
>> > > > kernel with DDB support. It looks like the machine has panicked a few
>> > > > times over the last two weeks or so, but based on the timestamps of the
>> > > > crash dumps and nagios complaints, happened during the middle of the
>> > > > night when I would not have really noticed, or otherwise would have just
>> > > > blamed my ISP.
>> > > >
>> > > > Two of the panics are ath(4) related. One looks similar to the one
>> > > > referenced in this thread, similarly triggered by a CFEngine process.
>> > > >
>> > > > In that case, the backtrace looks like:
>> > > >
>> > > > #4 0xffffffff808cdbb3 at calltrap+0x8
>> > > > #5 0xffffffff807371d8 at in_pcb_lport+0x128
>> > > > #6 0xffffffff8073745a at in_pcbbind_setup+0x16a
>> > > > #7 0xffffffff80737d8e at in_pcbconnect_setup+0x71e
>> > > > #8 0xffffffff80737df9 at in_pcbconnect_mbuf+0x59
>> > > > #9 0xffffffff807bf29f at udp_connect+0x11f
>> > > > #10 0xffffffff80680615 at kern_connectat+0x275
>> > > >
>> > > > Regarding DDB though, it would be rather difficult to access the machine
>> > > > if it drops to a DDB debugger session, since the machine acts as my
>> > > > firewall.
>> > >
>> > > Thanks -- will take a look at the attached.
>> > >
>> > > FWIW, though, I'm worried by the number of panics you are seeing, especially
>> > > given that they involve multiple subsystems, and in particular, John's
>> > > observation about a potentially corrupted pointer. This makes me wonder
>> > > whether (a) you are experiencing hardware faults -- it would be worth running
>> > > some memory/cpu/etc tests and (b) if we might be seeing a software memory
>> > > corruption bug of some sort.
>> >
>> > Other users have reported this (Ian Lepore), and Peter Wemm can now reproduce
>> > these at will as well, so I think this is a software bug. What might be
>> > easiest if we can't figure this out from the crashdump is just to bisect the
>> > offending revision.
>>
>> I've started a binary search. I'll let you know what that turns up.
>
> Thanks, and sorry for getting my Ian's mixed up.
> :-/
>
> --
> John Baldwin

I forgot to roll back one of the routers at nyi.freebsd.org, and it panicked
again, the same way as before:

Fatal trap 9: general protection fault while in kernel mode
cpuid = 3; apic id = 03
instruction pointer     = 0x20:0xffffffff8067284c
stack pointer           = 0x28:0xffffff8098688760
frame pointer           = 0x28:0xffffff80986887a0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 15041 (svn)
[ thread pid 15041 tid 100208 ]
Stopped at      in_pcblookup_local+0x5c:        cmpw    %r12w,0x18(%rax)

#8  0xffffffff80829dff in calltrap () at ../../../amd64/amd64/exception.S:228
#9  0xffffffff8067284c in in_pcblookup_local (pcbinfo=0xffffffff80c9e180,
    laddr={s_addr = 708980576}, lport=607, lookupflags=1,
    cred=0xfffffe006956d700) at ../../../netinet/in_pcb.c:1438
#10 0xffffffff80672d38 in in_pcb_lport (inp=0xfffffe00098aa620,
    laddrp=0xffffff809845d860, lportp=0xffffff809845d86e,
    cred=0xfffffe006956d700, lookupflags=1) at ../../../netinet/in_pcb.c:457
#11 0xffffffff80672fba in in_pcbbind_setup (inp=0xfffffe00098aa620, nam=0x0,
    laddrp=0xffffff809845d900, lportp=0xffffff809845d90e,
    cred=0xfffffe006956d700) at ../../../netinet/in_pcb.c:615
#12 0xffffffff806738ee in in_pcbconnect_setup (inp=0xfffffe00098aa620, nam=,
    laddrp=0xffffff809845d9b8, lportp=0xffffff809845d9be,
    faddrp=0xffffff809845d9b4, fportp=0xffffff809845d9bc, oinpp=0x0,
    cred=0xfffffe006956d700) at ../../../netinet/in_pcb.c:1019
#13 0xffffffff80673959 in in_pcbconnect_mbuf (inp=0xfffffe00098aa620, nam=,
    cred=, m=0x0) at ../../../netinet/in_pcb.c:645
#14 0xffffffff806fafcf in udp_connect (so=0xfffffe002e150d48,
    nam=0xfffffe00264df3b0, td=0xfffffe00091df490)
    at ../../../netinet/udp_usrreq.c:1530
#15 0xffffffff805faea5 in kern_connectat (td=0xfffffe00091df490, dirfd=-100,
    fd=, sa=0xfffffe00264df3b0) at ../../../kern/uipc_syscalls.c:593
#16 0xffffffff805fafc1 in sys_connect (td=0xfffffe00091df490,
    uap=0xffffff809845db70) at ../../../kern/uipc_syscalls.c:559
#17 0xffffffff8083f571 in amd64_syscall (td=0xfffffe00091df490, traced=0)
    at subr_syscall.c:134

This exact panic/trace has now hit two separate machines, at least twice on
each, always while doing an 'svn update'. Rolling back to r249172 (April 5th)
solves it. (There's nothing special about that revision, except that it was
top-of-tree when the last update was done.) I see a number of locking changes
in the area.

Note that this is UDP, most likely a DNS lookup.

-- 
Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com; KI6FJV
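The bisection suggested in this thread amounts to a binary search over Subversion revisions between a known-good revision (such as r249172) and a known-bad one. Below is a minimal sketch in C, assuming the regression is monotone (every revision after the first bad one also panics). The predicate and the revision numbers r249500 and r250100 are hypothetical stand-ins for a real build, boot, and 'svn update' test run:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical predicate: returns true if a kernel built at revision
 * `rev` reproduces the in_pcblookup_local panic. In practice this is a
 * manual build/boot/'svn update' cycle, not a function call. */
typedef bool (*rev_is_bad_fn)(long rev);

/* Binary search for the first bad revision in (good, bad].
 * Invariant: is_bad(good) is false and is_bad(bad) is true. */
long
bisect_first_bad(long good, long bad, rev_is_bad_fn is_bad)
{
	assert(good < bad);
	while (bad - good > 1) {
		long mid = good + (bad - good) / 2;
		if (is_bad(mid))
			bad = mid;	/* regression is at or before mid */
		else
			good = mid;	/* regression is after mid */
	}
	return bad;
}

/* Hypothetical simulation used only to exercise the search:
 * pretend the regression landed in r249500. */
static long simulated_first_bad = 249500;
static int probes;

static bool
simulated_is_bad(long rev)
{
	probes++;
	return rev >= simulated_first_bad;
}
```

For example, bisect_first_bad(249172, 250100, simulated_is_bad) returns 249500. Each probe halves the remaining range, so about a thousand revisions need roughly ten test builds.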