Subject: Re: 8.3: kernel panic in bpf.c catchpacket()
From: Guy Helmer
Date: Wed, 17 Oct 2012 16:55:40 -0500
To: "Alexander V. Chernikov"
Cc: freebsd-net@freebsd.org, FreeBSD Stable

On Oct 17, 2012, at 8:58 AM, Guy Helmer wrote:

> On Oct 12, 2012, at 8:54 AM, Guy Helmer wrote:
>
>> On Oct 10, 2012, at 1:37 PM, Alexander V. Chernikov wrote:
>>
>>> On 10.10.2012 00:36, Guy Helmer wrote:
>>>>
>>>> On Oct 8, 2012, at 8:09 AM, Guy Helmer wrote:
>>>>
>>>>> I'm seeing a consistent new kernel panic in FreeBSD 8.3:
>>>>> I'm not seeing how bd_sbuf would be NULL here. Any ideas?
>>>>
>>>> Since I've not had any replies, I hope nobody minds if I reply with more information.
>>>>
>>>> This panic seems to be occasionally triggered now that my userland code is changing the packet filter a while after the bpf device has been opened and an initial packet filter was set (previously, my code did not change the filter after it was initially set).
>>>>
>>>> I'm focusing on bpf_setf() since that seems to be the place that could be tickling a problem, and I see that bpf_setf() calls reset_d(d) to clear the hold buffer. I have manually verified that the BPFD lock is held during the call to reset_d(), and the lock is held every other place that the buffers are manipulated, so I haven't been able to find any place that seems vulnerable to losing one of the bpf buffers. Still searching, but any help would be appreciated.
>>>
>>> Can you please check this code on -current?
>>> Locking has changed quite significantly some time ago, so there is a good chance that you can get rid of this panic (or discover a different one which is really "new") :).
>>
>> I'm not ready to run this app on -current, so I have merged revs 229898, 233937, 233938, 233946, 235744, 235745, 235746, 235747, 236231, 236251, 236261, 236262, 236559, and 236806 to my 8.3 checkout to get code that should be virtually identical to -current without the timestamp changes.
>>
>> Unfortunately, I have only been able to trigger the panic in my test lab once, so I'm not sure whether a lack of problems with the updated code will be indicative of likely success in the field, where this has been triggered regularly at some sites...
>>
>> Thanks,
>> Guy
>>
>
> FWIW, I was able to trigger the panic with the original 8.3 code again in my test lab. With the changes from merging the revs mentioned above, I have not seen any panics in my test lab setup in two days of load testing, and AFAIK, packet capturing seems to be working fine.

Of course, the test system panicked with the same problem in catchpacket() an hour after I wrote this:

(kgdb) where
#0  doadump () at pcpu.h:224
#1  0xffffffff804c8280 in boot (howto=260) at ../../../kern/kern_shutdown.c:441
#2  0xffffffff804c8703 in panic (fmt=0x0) at ../../../kern/kern_shutdown.c:614
#3  0xffffffff8069ffad in trap_fatal (frame=0xffffffff809edbc0, eva=Variable "eva" is not available.
) at ../../../amd64/amd64/trap.c:825
#4  0xffffffff806a02e1 in trap_pfault (frame=0xffffff800014a8a0, usermode=0) at ../../../amd64/amd64/trap.c:741
#5  0xffffffff806a06bf in trap (frame=0xffffff800014a8a0) at ../../../amd64/amd64/trap.c:478
#6  0xffffffff80687cd4 in calltrap () at ../../../amd64/amd64/exception.S:228
#7  0xffffffff8069dc06 in bcopy () at ../../../amd64/amd64/support.S:124
#8  0xffffffff8056f69e in catchpacket (d=0xffffff005aaaf000,
    pkt=0xffffff0001f46200 "", pktlen=522, snaplen=Variable "snaplen" is not available.
) at ../../../net/bpf.c:2240
#9  0xffffffff8056fc66 in bpf_mtap (bp=0xffffff0001be8c80,
    m=0xffffff0001f46200) at ../../../net/bpf.c:2064
#10 0xffffffff80579c15 in ether_input (ifp=0xffffff0001b73800,
    m=0xffffff0001f46200) at ../../../net/if_ethersubr.c:635
#11 0xffffffff802b694a in em_rxeof (rxr=0xffffff0001bca200, count=99, done=0x0) at ../../../dev/e1000/if_em.c:4404
#12 0xffffffff802b6db8 in em_handle_que (context=Variable "context" is not available.
) at ../../../dev/e1000/if_em.c:1494
#13 0xffffffff80506d85 in taskqueue_run_locked (queue=0xffffff0001be1580) at ../../../kern/subr_taskqueue.c:250
---Type <return> to continue, or q <return> to quit---q
Quit
(kgdb) frame 8
#8  0xffffffff8056f69e in catchpacket (d=0xffffff005aaaf000,
    pkt=0xffffff0001f46200 "", pktlen=522, snaplen=Variable "snaplen" is not available.
) at ../../../net/bpf.c:2240
warning: Source file is more recent than executable.
2240                    bpf_append_bytes(d, d->bd_sbuf, curlen, &hdr, sizeof(hdr));
(kgdb) print *d
$1 = {bd_next = {le_next = 0xffffff0023fff400, le_prev = 0xffffff0001be8c90},
  bd_sbuf = 0x0, bd_hbuf = 0xffffff8000ffa000 "??~P", bd_fbuf = 0x0,
  bd_slen = 0, bd_hlen = 2068, bd_bufsize = 8388608,
  bd_bif = 0xffffff0001be8c80, bd_rtout = 1, bd_rfilter = 0xffffff0001e6f580,
  bd_wfilter = 0x0, bd_bfilter = 0x0, bd_rcount = 7, bd_dcount = 0,
  bd_promisc = 1 '\001', bd_state = 0 '\0', bd_immediate = 1 '\001',
  bd_writer = 0 '\0', bd_hdrcmplt = 1, bd_direction = 1, bd_feedback = 0,
  bd_async = 0, bd_sig = 23, bd_sigio = 0x0, bd_sel = {si_tdlist = {
      tqh_first = 0x0, tqh_last = 0x0}, si_note = {kl_list = {
        slh_first = 0x0}, kl_lock = 0xffffffff80497920,
      kl_unlock = 0xffffffff804978f0,
      kl_assert_locked = 0xffffffff804945d0,
      kl_assert_unlocked = 0xffffffff804945e0,
      kl_lockarg = 0xffffff005aaaf0d8}, si_mtx = 0x0}, bd_lock = {
    lock_object = {lo_name = 0xffffff0001a5fce0 "bpf", lo_flags = 16973824,
      lo_data = 0, lo_witness = 0x0}, mtx_lock = 18446742974226712768},
  bd_callout = {c_links = {sle = {sle_next = 0x0}, tqe = {tqe_next = 0x0,
        tqe_prev = 0x0}}, c_time = 0, c_arg = 0x0, c_func = 0,
    c_lock = 0xffffff005aaaf0d8, c_flags = 0, c_cpu = 0}, bd_label = 0x0,
  bd_fcount = 7, bd_pid = 89517, bd_locked = 0, bd_bufmode = 1, bd_wcount = 0,
  bd_wfcount = 0, bd_wdcount = 0, bd_zcopy = 0, bd_compat32 = 0 '\0'}

Now I'm wondering whether the malloc() of the sbuf is failing, but I'm not sure how or why; I thought malloc(size, M_BPF, M_WAITOK) should not fail?

Guy