From: John Baldwin
To: Glen Barber
Cc: Ian FREISLICH, freebsd-current@freebsd.org, Robert Watson
Subject: Re: panic: in_pcblookup_local (?)
Date: Wed, 1 May 2013 11:56:03 -0400

On Tuesday, April 30, 2013 5:19:08 pm Glen Barber wrote:
> On Tue, Apr 30, 2013 at 04:53:13PM -0400, John Baldwin wrote:
> > Try 'p phd' to start.  INP_PCBPORTHASH is a macro, so you will
> > have to do it by hand:
> >
> > 'p pcbinfo->ipi_porthashbase[lport & pcbinfo->ipi_porthashmask]'
> >
> > (That should be what 'porthash' is.)
> >
>
> Thanks for the pointers.  (Hah!)
>
> Hopefully this is the info you are looking for:
>
> Script started on Tue Apr 30 17:16:07 2013
> root@orion:/usr/obj/usr/src/sys/ORION # kgdb ./kernel.debug /var/crash/vmcore.4
> [...]
> #0  doadump (textdump=) at pcpu.h:231
> 231             __asm("movq %%gs:%1,%0" : "=r" (td)
> (kgdb) frame 6
> #6  0xffffffff80736cec in in_pcblookup_local (pcbinfo=0xffffffff80dc9180, laddr=
>       {s_addr = 50374848}, lport=339, lookupflags=1, cred=0xfffffe016cdad100)
>     at /usr/src/sys/netinet/in_pcb.c:1438
> 1438            LIST_FOREACH(phd, porthash, phd_hash) {
> (kgdb) p phd
> $1 = (struct inpcbport *) 0x9e17b100fffffe00

That is odd, that looks word-swapped, as if it should be 0xfffffe009e17b100
(which would be a more normal pointer in the kernel on amd64).

> (kgdb) p pcbinfo->ipi_porthashbase[lport & pcbinfo->ipi_porthashmask]
> $2 = {lh_first = 0x0}

So the list is now empty. :(  This feels like the list was updated out
from under the pcbinfo.  Looking at your earlier e-mail:

(kgdb) p *pcbinfo
$1 = {ipi_lock = {lock_object = {lo_name = 0xffffffff809d4d82 "udp",
      lo_flags = 69926912, lo_data = 0, lo_witness = 0x0}, rw_lock = 1},
  ipi_listhead = 0xffffffff80dc9108, ipi_count = 28, ipi_gencnt = 535501,
  ipi_lastport = 21249, ipi_lastlow = 0, ipi_lasthi = 0,
  ipi_zone = 0xfffffe0017b60380, ipi_pcbgroups = 0x0, ipi_npcbgroups = 0,
  ipi_hashfields = 0, ipi_hash_lock = {lock_object = {
      lo_name = 0xffffffff80a03d80 "pcbinfohash", lo_flags = 69402624,
      lo_data = 0, lo_witness = 0x0}, rw_lock = 18446741877615517696},
  ipi_hashbase = 0xfffffe00120f6000, ipi_hashmask = 127,
  ipi_porthashbase = 0xfffffe00120f5c04, ipi_porthashmask = 127,
  ipi_wildbase = 0x0, ipi_wildmask = 0, ipi_vnet = 0x0, ipi_pspare = {0x0,
    0x0}}

It looks like the ipi_hash_lock is locked (and udp_connect() locks it), so I
think the offending code is somewhere else.  Also, I can't find anything that
removes an inp without holding the correct pcbinfo lock.
The only thing I can think of is if the pcbinfo pointer for an inp could
change, so we could maybe lock the wrong one while removing it?

Hmmmmmm, you know.  In in_pcbremlists() and in_pcbdrop(), we read inp_phd
without holding the hash lock.  I think that probably doesn't actually break
anything, but this feels like a locking issue of some sort.

-- 
John Baldwin