Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 13 Jan 2013 16:07:44 -0600
From:      Guy Helmer <guy.helmer@gmail.com>
To:        Christian Peron <csjp@sqrt.ca>
Cc:        "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>
Subject:   Re: bpf hold buffer in-use flag
Message-ID:  <0D770FD0-6FD1-4D01-84A3-69B610894B9D@gmail.com>
In-Reply-To: <0AAF6A81-2594-449E-A38D-28DE5B055AFF@sqrt.ca>
References:  <9C928117-2230-4F01-9B95-B6D945AF4416@gmail.com> <0AAF6A81-2594-449E-A38D-28DE5B055AFF@sqrt.ca>

next in thread | previous in thread | raw e-mail | index | archive | help

On Jan 11, 2013, at 5:48 PM, Christian Peron <csjp@sqrt.ca> wrote:

> Guy,
>=20
> Can you please describe the race and how to reproduce it? When we =
introduced zerocopy-bpf, we also introduced bpf_bufheld() and =
bpf_bufreclaimed() which are effectively nops for the regular buffer =
mode. Maybe we could maintain state there as well? In any case, I would =
like to understand the race/and reproduce it
>=20
> Thanks!

Related to rwatson's comment from r175903, dropping the bpf descriptor =
lock while performing the uiomove() in bpfread() enables a couple of =
races: 1) simultaneous reads on a bpf device (not great), and 2) races =
between bpfread() and catchpacket() (panics due to NULL pointer =
dereference).

The symptom I encountered was a panic in catchpacket() where the save =
buffer was NULL. I added diagnostic code and found that one of the two =
buffers was being lost. As I reviewed the code, it seemed that the only =
way this could happen was if the hold buffer was being invalidated =
(rotated) out from under these lines in bpfread():

        BPFD_UNLOCK(d);=20
=20
        /*=20
         * Move data from hold buffer into user space.=20
         * We know the entire buffer is transferred since=20
         * we checked above that the read buffer is bpf_bufsize bytes.=20=

         *=20
         * XXXRW: More synchronization needed here: what if a second =
thread=20
         * issues a read on the same fd at the same time?  Don't want =
this=20
         * getting invalidated.=20
         */=20
        error =3D bpf_uiomove(d, d->bd_hbuf, d->bd_hlen, uio);=20
=20
        BPFD_LOCK(d);=20
        d->bd_fbuf =3D d->bd_hbuf; /* if other code in the driver =
simultaneously sets hbuf to NULL, this loses a buffer! */
        d->bd_hbuf =3D NULL;=20
        d->bd_hlen =3D 0;=20
        bpf_buf_reclaimed(d);=20
        BPFD_UNLOCK(d);=20

My instrumented code validated this hypothesis, and I added the =
bd_hbuf_in_use flag to allow dropping the descriptor lock over the =
bpf_uiomove() call while protecting the hold buffer from rotation =
anywhere else in the driver. In my own tree, I have left some additional =
diagnostic code to make sure hbuf is still valid after uiomove =
completes, and that code has not been triggered since I added the =
bd_hbuf_in_use flag and associated code.

The way I was able to trigger the panic in catchpacket() was to call =
pcap_setfilter() more than one time on a live, promiscuous pcap =
descriptor while reading from a very busy network. I was able to =
temporarily work around this issue by using the BIOCSETFNR ioctl =
directly on the bpf descriptor rather than using pcap_setfilter(), which =
avoided calling reset_d() in the ioctl handler and thus avoided clearing =
bd_hbuf in a race with bpfread().

Just as I'm writing this, I realized the code that triggered the problem =
may have called pcap_setfilter() in a separate thread from the packet =
reading thread. I wrote a unit test for this issue, but as it was =
single-threaded, I was never able to cause the crash with it.



Regarding the zero-copy code, I'm not sure how bpf_bufheld() could be =
used as a general approach to solving this race. It appears to me that a =
condition variable is necessary to work around the necessity of =
releasing the bpf descriptor lock over the uiomove() call.

Hope this helps!

Guy

>=20
> On 2012-11-13, at 3:40 PM, Guy Helmer wrote:
>=20
>> To try to completely resolve the race in bpfread(), I have put =
together these changes to add a flag to indicate when the hold buffer =
cannot be modified because it is in use. Since it's my first time using =
mtx_sleep() and wakeup(), I wanted to run these past the list to see if =
I can get any feedback on the approach.
>>=20
>>=20
>> Index: bpf.c
>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
>> --- bpf.c	(revision 242997)
>> +++ bpf.c	(working copy)
>> @@ -819,6 +819,7 @@ bpfopen(struct cdev *dev, int flags, int fmt, =
stru
>> 	 * particular buffer method.
>> 	 */
>> 	bpf_buffer_init(d);
>> +	d->bd_hbuf_in_use =3D 0;
>> 	d->bd_bufmode =3D BPF_BUFMODE_BUFFER;
>> 	d->bd_sig =3D SIGIO;
>> 	d->bd_direction =3D BPF_D_INOUT;
>> @@ -872,6 +873,9 @@ bpfread(struct cdev *dev, struct uio *uio, int =
iof
>> 		callout_stop(&d->bd_callout);
>> 	timed_out =3D (d->bd_state =3D=3D BPF_TIMED_OUT);
>> 	d->bd_state =3D BPF_IDLE;
>> +	while (d->bd_hbuf_in_use)
>> +		mtx_sleep(&d->bd_hbuf_in_use, &d->bd_lock,
>> +		    PRINET|PCATCH, "bd_hbuf", 0);
>> 	/*
>> 	 * If the hold buffer is empty, then do a timed sleep, which
>> 	 * ends when the timeout expires or when enough packets
>> @@ -940,27 +944,27 @@ bpfread(struct cdev *dev, struct uio *uio, int =
iof
>> 	/*
>> 	 * At this point, we know we have something in the hold slot.
>> 	 */
>> +	d->bd_hbuf_in_use =3D 1;
>> 	BPFD_UNLOCK(d);
>>=20
>> 	/*
>> 	 * Move data from hold buffer into user space.
>> 	 * We know the entire buffer is transferred since
>> 	 * we checked above that the read buffer is bpf_bufsize bytes.
>> -	 *
>> -	 * XXXRW: More synchronization needed here: what if a second =
thread
>> -	 * issues a read on the same fd at the same time?  Don't want =
this
>> -	 * getting invalidated.
>> +  	 *
>> +	 * We do not have to worry about simultaneous reads because
>> +	 * we waited for sole access to the hold buffer above.
>> 	 */
>> 	error =3D bpf_uiomove(d, d->bd_hbuf, d->bd_hlen, uio);
>>=20
>> 	BPFD_LOCK(d);
>> -	if (d->bd_hbuf !=3D NULL) {
>> -		/* Free the hold buffer only if it is still valid. */
>> -		d->bd_fbuf =3D d->bd_hbuf;
>> -		d->bd_hbuf =3D NULL;
>> -		d->bd_hlen =3D 0;
>> -		bpf_buf_reclaimed(d);
>> -	}
>> +	KASSERT(d->bd_hbuf !=3D NULL, ("bpfread: lost bd_hbuf"));
>> +	d->bd_fbuf =3D d->bd_hbuf;
>> +	d->bd_hbuf =3D NULL;
>> +	d->bd_hlen =3D 0;
>> +	bpf_buf_reclaimed(d);
>> +	d->bd_hbuf_in_use =3D 0;
>> +	wakeup(&d->bd_hbuf_in_use);
>> 	BPFD_UNLOCK(d);
>>=20
>> 	return (error);
>> @@ -1114,6 +1118,9 @@ reset_d(struct bpf_d *d)
>>=20
>> 	BPFD_LOCK_ASSERT(d);
>>=20
>> +	while (d->bd_hbuf_in_use)
>> +		mtx_sleep(&d->bd_hbuf_in_use, &d->bd_lock, PRINET,
>> +		    "bd_hbuf", 0);
>> 	if ((d->bd_hbuf !=3D NULL) &&
>> 	    (d->bd_bufmode !=3D BPF_BUFMODE_ZBUF || bpf_canfreebuf(d))) =
{
>> 		/* Free the hold buffer. */
>> @@ -1254,6 +1261,9 @@ bpfioctl(struct cdev *dev, u_long cmd, caddr_t =
add
>>=20
>> 			BPFD_LOCK(d);
>> 			n =3D d->bd_slen;
>> +			while (d->bd_hbuf_in_use)
>> +				mtx_sleep(&d->bd_hbuf_in_use, =
&d->bd_lock,
>> +				    PRINET, "bd_hbuf", 0);
>> 			if (d->bd_hbuf)
>> 				n +=3D d->bd_hlen;
>> 			BPFD_UNLOCK(d);
>> @@ -1967,6 +1977,9 @@ filt_bpfread(struct knote *kn, long hint)
>> 	ready =3D bpf_ready(d);
>> 	if (ready) {
>> 		kn->kn_data =3D d->bd_slen;
>> +		while (d->bd_hbuf_in_use)
>> +			mtx_sleep(&d->bd_hbuf_in_use, &d->bd_lock,
>> +			    PRINET, "bd_hbuf", 0);
>> 		if (d->bd_hbuf)
>> 			kn->kn_data +=3D d->bd_hlen;
>> 	} else if (d->bd_rtout > 0 && d->bd_state =3D=3D BPF_IDLE) {
>> @@ -2299,6 +2312,9 @@ catchpacket(struct bpf_d *d, u_char *pkt, u_int =
pk
>> 	 * spot to do it.
>> 	 */
>> 	if (d->bd_fbuf =3D=3D NULL && bpf_canfreebuf(d)) {
>> +		while (d->bd_hbuf_in_use)
>> +			mtx_sleep(&d->bd_hbuf_in_use, &d->bd_lock,
>> +			    PRINET, "bd_hbuf", 0);
>> 		d->bd_fbuf =3D d->bd_hbuf;
>> 		d->bd_hbuf =3D NULL;
>> 		d->bd_hlen =3D 0;
>> @@ -2341,6 +2357,9 @@ catchpacket(struct bpf_d *d, u_char *pkt, u_int =
pk
>> 			++d->bd_dcount;
>> 			return;
>> 		}
>> +		while (d->bd_hbuf_in_use)
>> +			mtx_sleep(&d->bd_hbuf_in_use, &d->bd_lock,
>> +			    PRINET, "bd_hbuf", 0);
>> 		ROTATE_BUFFERS(d);
>> 		do_wakeup =3D 1;
>> 		curlen =3D 0;
>> Index: bpf.h
>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
>> --- bpf.h	(revision 242997)
>> +++ bpf.h	(working copy)
>> @@ -1235,7 +1235,8 @@ SYSCTL_DECL(_net_bpf);
>> /*
>> * Rotate the packet buffers in descriptor d.  Move the store buffer =
into the
>> * hold slot, and the free buffer ino the store slot.  Zero the length =
of the
>> - * new store buffer.  Descriptor lock should be held.
>> + * new store buffer.  Descriptor lock should be held. Hold buffer =
must
>> + * not be marked "in use".
>> */
>> #define	ROTATE_BUFFERS(d)	do {					=
\
>> 	(d)->bd_hbuf =3D (d)->bd_sbuf;					=
\
>> Index: bpf_buffer.c
>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
>> --- bpf_buffer.c	(revision 242997)
>> +++ bpf_buffer.c	(working copy)
>> @@ -189,6 +189,9 @@ bpf_buffer_ioctl_sblen(struct bpf_d *d, u_int *i)
>> 		return (EINVAL);
>> 	}
>>=20
>> +	while (d->bd_hbuf_in_use)
>> +		mtx_sleep(&d->bd_hbuf_in_use, &d->bd_lock,
>> +		    PRINET, "bd_hbuf", 0);
>> 	/* Free old buffers if set */
>> 	if (d->bd_fbuf !=3D NULL)
>> 		free(d->bd_fbuf, M_BPF);
>> Index: bpfdesc.h
>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
>> --- bpfdesc.h	(revision 242997)
>> +++ bpfdesc.h	(working copy)
>> @@ -63,6 +63,7 @@ struct bpf_d {
>> 	caddr_t		bd_sbuf;	/* store slot */
>> 	caddr_t		bd_hbuf;	/* hold slot */
>> 	caddr_t		bd_fbuf;	/* free slot */
>> +	int		bd_hbuf_in_use;	/* don't rotate buffers */
>> 	int 		bd_slen;	/* current length of store =
buffer */
>> 	int 		bd_hlen;	/* current length of hold buffer =
*/
>>=20
>> _______________________________________________
>> freebsd-net@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-net
>> To unsubscribe, send any mail to =
"freebsd-net-unsubscribe@freebsd.org"
>=20




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?0D770FD0-6FD1-4D01-84A3-69B610894B9D>