Date: Sun, 13 Jan 2013 16:07:44 -0600 From: Guy Helmer <guy.helmer@gmail.com> To: Christian Peron <csjp@sqrt.ca> Cc: "freebsd-net@freebsd.org" <freebsd-net@freebsd.org> Subject: Re: bpf hold buffer in-use flag Message-ID: <0D770FD0-6FD1-4D01-84A3-69B610894B9D@gmail.com> In-Reply-To: <0AAF6A81-2594-449E-A38D-28DE5B055AFF@sqrt.ca> References: <9C928117-2230-4F01-9B95-B6D945AF4416@gmail.com> <0AAF6A81-2594-449E-A38D-28DE5B055AFF@sqrt.ca>
next in thread | previous in thread | raw e-mail | index | archive | help
On Jan 11, 2013, at 5:48 PM, Christian Peron <csjp@sqrt.ca> wrote: > Guy, >=20 > Can you please describe the race and how to reproduce it? When we = introduced zerocopy-bpf, we also introduced bpf_bufheld() and = bpf_bufreclaimed() which are effectively nops for the regular buffer = mode. Maybe we could maintain state there as well? In any case, I would = like to understand the race/and reproduce it >=20 > Thanks! Related to rwatson's comment from r175903, dropping the bpf descriptor = lock while performing the uiomove() in bpfread() enables a couple of = races: 1) simultaneous reads on a bpf device (not great), and 2) races = between bpfread() and catchpacket() (panics due to NULL pointer = dereference). The symptom I encountered was a panic in catchpacket() where the save = buffer was NULL. I added diagnostic code and found that one of the two = buffers was being lost. As I reviewed the code, it seemed that the only = way this could happen was if the hold buffer was being invalidated = (rotated) out from under these lines in bpfread(): BPFD_UNLOCK(d);=20 =20 /*=20 * Move data from hold buffer into user space.=20 * We know the entire buffer is transferred since=20 * we checked above that the read buffer is bpf_bufsize bytes.=20= *=20 * XXXRW: More synchronization needed here: what if a second = thread=20 * issues a read on the same fd at the same time? Don't want = this=20 * getting invalidated.=20 */=20 error =3D bpf_uiomove(d, d->bd_hbuf, d->bd_hlen, uio);=20 =20 BPFD_LOCK(d);=20 d->bd_fbuf =3D d->bd_hbuf; /* if other code in the driver = simultaneously sets hbuf to NULL, this loses a buffer! */ d->bd_hbuf =3D NULL;=20 d->bd_hlen =3D 0;=20 bpf_buf_reclaimed(d);=20 BPFD_UNLOCK(d);=20 My instrumented code validated this hypothesis, and I added the = bd_hbuf_in_use flag to allow dropping the descriptor lock over the = bpf_uiomove() call while protecting the hold buffer from rotation = anywhere else in the driver. In my own tree, I have left some additional = diagnostic code to make sure hbuf is still valid after uiomove = completes, and that code has not been triggered since I added the = bd_hbuf_in_use flag and associated code. The way I was able to trigger the panic in catchpacket() was to call = pcap_setfilter() more than one time on a live, promiscuous pcap = descriptor while reading from a very busy network. I was able to = temporarily work around this issue by using the BIOCSETFNR ioctl = directly on the bpf descriptor rather than using pcap_setfilter(), which = avoided calling reset_d() in the ioctl handler and thus avoided clearing = bd_hbuf in a race with bpfread(). Just as I'm writing this, I realized the code that triggered the problem = may have called pcap_setfilter() in a separate thread from the packet = reading thread. I wrote a unit test for this issue, but as it was = single-threaded, I was never able to cause the crash with it. Regarding the zero-copy code, I'm not sure how bpf_bufheld() could be = used as a general approach to solving this race. It appears to me that a = condition variable is necessary to work around the necessity of = releasing the bpf descriptor lock over the uiomove() call. Hope this helps! Guy >=20 > On 2012-11-13, at 3:40 PM, Guy Helmer wrote: >=20 >> To try to completely resolve the race in bpfread(), I have put = together these changes to add a flag to indicate when the hold buffer = cannot be modified because it is in use. Since it's my first time using = mtx_sleep() and wakeup(), I wanted to run these past the list to see if = I can get any feedback on the approach. >>=20 >>=20 >> Index: bpf.c >> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >> --- bpf.c (revision 242997) >> +++ bpf.c (working copy) >> @@ -819,6 +819,7 @@ bpfopen(struct cdev *dev, int flags, int fmt, = stru >> * particular buffer method. >> */ >> bpf_buffer_init(d); >> + d->bd_hbuf_in_use =3D 0; >> d->bd_bufmode =3D BPF_BUFMODE_BUFFER; >> d->bd_sig =3D SIGIO; >> d->bd_direction =3D BPF_D_INOUT; >> @@ -872,6 +873,9 @@ bpfread(struct cdev *dev, struct uio *uio, int = iof >> callout_stop(&d->bd_callout); >> timed_out =3D (d->bd_state =3D=3D BPF_TIMED_OUT); >> d->bd_state =3D BPF_IDLE; >> + while (d->bd_hbuf_in_use) >> + mtx_sleep(&d->bd_hbuf_in_use, &d->bd_lock, >> + PRINET|PCATCH, "bd_hbuf", 0); >> /* >> * If the hold buffer is empty, then do a timed sleep, which >> * ends when the timeout expires or when enough packets >> @@ -940,27 +944,27 @@ bpfread(struct cdev *dev, struct uio *uio, int = iof >> /* >> * At this point, we know we have something in the hold slot. >> */ >> + d->bd_hbuf_in_use =3D 1; >> BPFD_UNLOCK(d); >>=20 >> /* >> * Move data from hold buffer into user space. >> * We know the entire buffer is transferred since >> * we checked above that the read buffer is bpf_bufsize bytes. >> - * >> - * XXXRW: More synchronization needed here: what if a second = thread >> - * issues a read on the same fd at the same time? Don't want = this >> - * getting invalidated. >> + * >> + * We do not have to worry about simultaneous reads because >> + * we waited for sole access to the hold buffer above. >> */ >> error =3D bpf_uiomove(d, d->bd_hbuf, d->bd_hlen, uio); >>=20 >> BPFD_LOCK(d); >> - if (d->bd_hbuf !=3D NULL) { >> - /* Free the hold buffer only if it is still valid. */ >> - d->bd_fbuf =3D d->bd_hbuf; >> - d->bd_hbuf =3D NULL; >> - d->bd_hlen =3D 0; >> - bpf_buf_reclaimed(d); >> - } >> + KASSERT(d->bd_hbuf !=3D NULL, ("bpfread: lost bd_hbuf")); >> + d->bd_fbuf =3D d->bd_hbuf; >> + d->bd_hbuf =3D NULL; >> + d->bd_hlen =3D 0; >> + bpf_buf_reclaimed(d); >> + d->bd_hbuf_in_use =3D 0; >> + wakeup(&d->bd_hbuf_in_use); >> BPFD_UNLOCK(d); >>=20 >> return (error); >> @@ -1114,6 +1118,9 @@ reset_d(struct bpf_d *d) >>=20 >> BPFD_LOCK_ASSERT(d); >>=20 >> + while (d->bd_hbuf_in_use) >> + mtx_sleep(&d->bd_hbuf_in_use, &d->bd_lock, PRINET, >> + "bd_hbuf", 0); >> if ((d->bd_hbuf !=3D NULL) && >> (d->bd_bufmode !=3D BPF_BUFMODE_ZBUF || bpf_canfreebuf(d))) = { >> /* Free the hold buffer. */ >> @@ -1254,6 +1261,9 @@ bpfioctl(struct cdev *dev, u_long cmd, caddr_t = add >>=20 >> BPFD_LOCK(d); >> n =3D d->bd_slen; >> + while (d->bd_hbuf_in_use) >> + mtx_sleep(&d->bd_hbuf_in_use, = &d->bd_lock, >> + PRINET, "bd_hbuf", 0); >> if (d->bd_hbuf) >> n +=3D d->bd_hlen; >> BPFD_UNLOCK(d); >> @@ -1967,6 +1977,9 @@ filt_bpfread(struct knote *kn, long hint) >> ready =3D bpf_ready(d); >> if (ready) { >> kn->kn_data =3D d->bd_slen; >> + while (d->bd_hbuf_in_use) >> + mtx_sleep(&d->bd_hbuf_in_use, &d->bd_lock, >> + PRINET, "bd_hbuf", 0); >> if (d->bd_hbuf) >> kn->kn_data +=3D d->bd_hlen; >> } else if (d->bd_rtout > 0 && d->bd_state =3D=3D BPF_IDLE) { >> @@ -2299,6 +2312,9 @@ catchpacket(struct bpf_d *d, u_char *pkt, u_int = pk >> * spot to do it. >> */ >> if (d->bd_fbuf =3D=3D NULL && bpf_canfreebuf(d)) { >> + while (d->bd_hbuf_in_use) >> + mtx_sleep(&d->bd_hbuf_in_use, &d->bd_lock, >> + PRINET, "bd_hbuf", 0); >> d->bd_fbuf =3D d->bd_hbuf; >> d->bd_hbuf =3D NULL; >> d->bd_hlen =3D 0; >> @@ -2341,6 +2357,9 @@ catchpacket(struct bpf_d *d, u_char *pkt, u_int = pk >> ++d->bd_dcount; >> return; >> } >> + while (d->bd_hbuf_in_use) >> + mtx_sleep(&d->bd_hbuf_in_use, &d->bd_lock, >> + PRINET, "bd_hbuf", 0); >> ROTATE_BUFFERS(d); >> do_wakeup =3D 1; >> curlen =3D 0; >> Index: bpf.h >> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >> --- bpf.h (revision 242997) >> +++ bpf.h (working copy) >> @@ -1235,7 +1235,8 @@ SYSCTL_DECL(_net_bpf); >> /* >> * Rotate the packet buffers in descriptor d. Move the store buffer = into the >> * hold slot, and the free buffer ino the store slot. Zero the length = of the >> - * new store buffer. Descriptor lock should be held. >> + * new store buffer. Descriptor lock should be held. Hold buffer = must >> + * not be marked "in use". >> */ >> #define ROTATE_BUFFERS(d) do { = \ >> (d)->bd_hbuf =3D (d)->bd_sbuf; = \ >> Index: bpf_buffer.c >> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >> --- bpf_buffer.c (revision 242997) >> +++ bpf_buffer.c (working copy) >> @@ -189,6 +189,9 @@ bpf_buffer_ioctl_sblen(struct bpf_d *d, u_int *i) >> return (EINVAL); >> } >>=20 >> + while (d->bd_hbuf_in_use) >> + mtx_sleep(&d->bd_hbuf_in_use, &d->bd_lock, >> + PRINET, "bd_hbuf", 0); >> /* Free old buffers if set */ >> if (d->bd_fbuf !=3D NULL) >> free(d->bd_fbuf, M_BPF); >> Index: bpfdesc.h >> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >> --- bpfdesc.h (revision 242997) >> +++ bpfdesc.h (working copy) >> @@ -63,6 +63,7 @@ struct bpf_d { >> caddr_t bd_sbuf; /* store slot */ >> caddr_t bd_hbuf; /* hold slot */ >> caddr_t bd_fbuf; /* free slot */ >> + int bd_hbuf_in_use; /* don't rotate buffers */ >> int bd_slen; /* current length of store = buffer */ >> int bd_hlen; /* current length of hold buffer = */ >>=20 >> _______________________________________________ >> freebsd-net@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-net >> To unsubscribe, send any mail to = "freebsd-net-unsubscribe@freebsd.org" >=20
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?0D770FD0-6FD1-4D01-84A3-69B610894B9D>