Date: Wed, 17 Oct 2012 16:55:40 -0500
From: Guy Helmer <guy.helmer@gmail.com>
To: "Alexander V. Chernikov" <melifaro@freebsd.org>
Cc: freebsd-net@freebsd.org, FreeBSD Stable <freebsd-stable@freebsd.org>
Subject: Re: 8.3: kernel panic in bpf.c catchpacket()
Message-ID: <FA1F07D4-C6F3-4F55-B084-749366C0DAE6@gmail.com>
In-Reply-To: <381E3EEC-7EDB-428B-A724-434443E51A53@gmail.com>
References: <4B5399BF-4EE0-4182-8297-3BB97C4AA884@gmail.com>
 <59F9A36E-3DB2-4F6F-BB2A-A4C9DA76A70C@gmail.com>
 <5075C05E.9070800@FreeBSD.org>
 <1EDA1615-2CDE-405A-A725-AF7CC7D3E273@gmail.com>
 <381E3EEC-7EDB-428B-A724-434443E51A53@gmail.com>

On Oct 17, 2012, at 8:58 AM, Guy Helmer <guy.helmer@gmail.com> wrote:

> On Oct 12, 2012, at 8:54 AM, Guy Helmer <guy.helmer@gmail.com> wrote:
>
>>
>> On Oct 10, 2012, at 1:37 PM, Alexander V. Chernikov <melifaro@freebsd.org> wrote:
>>
>>> On 10.10.2012 00:36, Guy Helmer wrote:
>>>>
>>>> On Oct 8, 2012, at 8:09 AM, Guy Helmer <guy.helmer@gmail.com> wrote:
>>>>
>>>>> I'm seeing a consistent new kernel panic in FreeBSD 8.3:
>>>>> I'm not seeing how bd_sbuf would be NULL here. Any ideas?
>>>>
>>>> Since I've not had any replies, I hope nobody minds if I reply with more information.
>>>>
>>>> This panic seems to be occasionally triggered now that my userland code is changing the packet filter a while after the bpf device has been opened and an initial packet filter was set (previously, my code did not change the filter after it was initially set).
>>>>
>>>> I'm focusing on bpf_setf() since that seems to be the place that could be tickling a problem, and I see that bpf_setf() calls reset_d(d) to clear the hold buffer. I have manually verified that the BPFD lock is held during the call to reset_d(), and the lock is held every other place that the buffers are manipulated, so I haven't been able to find any place that seems vulnerable to losing one of the bpf buffers. Still searching, but any help would be appreciated.
>>>
>>> Can you please check this code on -current?
>>> Locking has changed quite significantly some time ago, so there is good chance that you can get rid of this panic (or discover different one which is really "new") :).
>>
>> I'm not ready to run this app on current, so I have merged revs 229898, 233937, 233938, 233946, 235744, 235745, 235746, 235747, 236231, 236251, 236261, 236262, 236559, and 236806 to my 8.3 checkout to get code that should be virtually identical to current without the timestamp changes.
>>
>> Unfortunately, I have only been able to trigger the panic in my test lab once -- so I'm not sure whether a lack of problems with the updated code will be indicative of likely success in the field where this has been triggered regularly at some sites...
>>
>> Thanks,
>> Guy
>>
>
>
> FWIW, I was able to trigger the panic with the original 8.3 code again in my test lab. With these changes resulting from merging the revs mentioned above, I have not seen any panics in my test lab setup in two days of load testing, and AFAIK, packet capturing seems to be working fine.

Of course, the test system panic'ed with the same problem in catchpacket() an hour after I wrote this.

(kgdb) where
#0  doadump () at pcpu.h:224
#1  0xffffffff804c8280 in boot (howto=260) at ../../../kern/kern_shutdown.c:441
#2  0xffffffff804c8703 in panic (fmt=0x0) at ../../../kern/kern_shutdown.c:614
#3  0xffffffff8069ffad in trap_fatal (frame=0xffffffff809edbc0, eva=Variable "eva" is not available.
) at ../../../amd64/amd64/trap.c:825
#4  0xffffffff806a02e1 in trap_pfault (frame=0xffffff800014a8a0, usermode=0) at ../../../amd64/amd64/trap.c:741
#5  0xffffffff806a06bf in trap (frame=0xffffff800014a8a0) at ../../../amd64/amd64/trap.c:478
#6  0xffffffff80687cd4 in calltrap () at ../../../amd64/amd64/exception.S:228
#7  0xffffffff8069dc06 in bcopy () at ../../../amd64/amd64/support.S:124
#8  0xffffffff8056f69e in catchpacket (d=0xffffff005aaaf000,
    pkt=0xffffff0001f46200 "", pktlen=522, snaplen=Variable "snaplen" is not available.
) at ../../../net/bpf.c:2240
#9  0xffffffff8056fc66 in bpf_mtap (bp=0xffffff0001be8c80,
    m=0xffffff0001f46200) at ../../../net/bpf.c:2064
#10 0xffffffff80579c15 in ether_input (ifp=0xffffff0001b73800,
    m=0xffffff0001f46200) at ../../../net/if_ethersubr.c:635
#11 0xffffffff802b694a in em_rxeof (rxr=0xffffff0001bca200, count=99, done=0x0) at ../../../dev/e1000/if_em.c:4404
#12 0xffffffff802b6db8 in em_handle_que (context=Variable "context" is not available.
) at ../../../dev/e1000/if_em.c:1494
#13 0xffffffff80506d85 in taskqueue_run_locked (queue=0xffffff0001be1580) at ../../../kern/subr_taskqueue.c:250
---Type <return> to continue, or q <return> to quit---q
Quit
(kgdb) frame 8
#8  0xffffffff8056f69e in catchpacket (d=0xffffff005aaaf000,
    pkt=0xffffff0001f46200 "", pktlen=522, snaplen=Variable "snaplen" is not available.
) at ../../../net/bpf.c:2240
warning: Source file is more recent than executable.
2240            bpf_append_bytes(d, d->bd_sbuf, curlen, &hdr, sizeof(hdr));
(kgdb) print *d
$1 = {bd_next = {le_next = 0xffffff0023fff400, le_prev = 0xffffff0001be8c90},
  bd_sbuf = 0x0, bd_hbuf = 0xffffff8000ffa000 "??~P", bd_fbuf = 0x0,
  bd_slen = 0, bd_hlen = 2068, bd_bufsize = 8388608,
  bd_bif = 0xffffff0001be8c80, bd_rtout = 1, bd_rfilter = 0xffffff0001e6f580,
  bd_wfilter = 0x0, bd_bfilter = 0x0, bd_rcount = 7, bd_dcount = 0,
  bd_promisc = 1 '\001', bd_state = 0 '\0', bd_immediate = 1 '\001',
  bd_writer = 0 '\0', bd_hdrcmplt = 1, bd_direction = 1, bd_feedback = 0,
  bd_async = 0, bd_sig = 23, bd_sigio = 0x0, bd_sel = {si_tdlist = {
      tqh_first = 0x0, tqh_last = 0x0}, si_note = {kl_list = {
        slh_first = 0x0}, kl_lock = 0xffffffff80497920 <knlist_mtx_lock>,
      kl_unlock = 0xffffffff804978f0 <knlist_mtx_unlock>,
      kl_assert_locked = 0xffffffff804945d0 <knlist_mtx_assert_locked>,
      kl_assert_unlocked = 0xffffffff804945e0 <knlist_mtx_assert_unlocked>,
      kl_lockarg = 0xffffff005aaaf0d8}, si_mtx = 0x0}, bd_lock = {
    lock_object = {lo_name = 0xffffff0001a5fce0 "bpf", lo_flags = 16973824,
      lo_data = 0, lo_witness = 0x0}, mtx_lock = 18446742974226712768},
  bd_callout = {c_links = {sle = {sle_next = 0x0}, tqe = {tqe_next = 0x0,
        tqe_prev = 0x0}}, c_time = 0, c_arg = 0x0, c_func = 0,
    c_lock = 0xffffff005aaaf0d8, c_flags = 0, c_cpu = 0}, bd_label = 0x0,
  bd_fcount = 7, bd_pid = 89517, bd_locked = 0, bd_bufmode = 1, bd_wcount = 0,
  bd_wfcount = 0, bd_wdcount = 0, bd_zcopy = 0, bd_compat32 = 0 '\0'}

Now, I am thinking the malloc() of the sbuf is failing but not sure how/why -- I thought malloc(size, M_BPF, M_WAITOK) should not fail?

Guy
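
A note on the dump, offered only as a sketch: the descriptor shows bd_sbuf and bd_fbuf both NULL, bd_hbuf set, bd_slen = 0 and bd_hlen = 2068. The userland toy model below shows one way that exact shape can arise without any allocation failing, namely a store/hold/free rotation performed while the free buffer is already NULL. Everything named toy_* is hypothetical, and the rotation order is an assumption about how the three buffers are cycled, not code copied from bpf.c. It does not identify the actual cause of the panic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bpf_d: only the buffer fields
 * visible in the kgdb dump are modeled. */
struct toy_bpf_d {
    char *bd_sbuf;  /* store buffer: new packets are appended here */
    char *bd_hbuf;  /* hold buffer: full, waiting for the reader */
    char *bd_fbuf;  /* free buffer: next store buffer */
    int   bd_slen;  /* bytes used in the store buffer */
    int   bd_hlen;  /* bytes used in the hold buffer */
};

/* Assumed rotation step: store -> hold, free -> store, free = NULL.
 * Deliberately has no guard against bd_fbuf being NULL. */
static void toy_rotate(struct toy_bpf_d *d)
{
    d->bd_hbuf = d->bd_sbuf;
    d->bd_hlen = d->bd_slen;
    d->bd_sbuf = d->bd_fbuf;
    d->bd_slen = 0;
    d->bd_fbuf = NULL;
}

/* Returns 1 when the toy descriptor has the same shape as the dump. */
static int toy_matches_dump(const struct toy_bpf_d *d)
{
    return d->bd_sbuf == NULL && d->bd_fbuf == NULL &&
           d->bd_hbuf != NULL && d->bd_slen == 0 && d->bd_hlen == 2068;
}

/* Two rotations in a row, with the hold buffer never handed back. */
static struct toy_bpf_d toy_scenario(void)
{
    static char buf_a[1], buf_b[1];
    struct toy_bpf_d d = { buf_a, NULL, buf_b, 0, 0 };

    d.bd_slen = 100;            /* some packets were stored (arbitrary size) */
    toy_rotate(&d);             /* normal rotation: hold = a, store = b */
    assert(d.bd_fbuf == NULL);  /* reader has not returned the hold buffer */

    d.bd_slen = 2068;           /* value taken from bd_hlen in the dump */
    toy_rotate(&d);             /* unguarded rotation: buf_a is lost */
    return d;
}
```

After toy_scenario() the descriptor holds only one of its two buffers, and the next append through bd_sbuf would dereference NULL, which is the shape of the fault at bpf.c:2240.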