Date: Tue, 14 Jun 2011 12:23:03 +0300
From: Kostik Belousov <kostikbel@gmail.com>
To: Mikolaj Golub <trociny@freebsd.org>
Cc: freebsd-net@freebsd.org, Pawel Jakub Dawidek <pjd@freebsd.org>
Subject: Re: Scenario to make recv(MSG_WAITALL) stuck
Message-ID: <20110614092303.GG48734@deviant.kiev.zoral.com.ua>
In-Reply-To: <86pqmhn1pf.fsf@kopusha.home.net>
References: <86pqmhn1pf.fsf@kopusha.home.net>

On Mon, Jun 13, 2011 at 07:19:40PM +0300, Mikolaj Golub wrote:
> Hi,
>
> Below is a scenario that makes recv(2) with the MSG_WAITALL flag get stuck.
>
> (See http://people.freebsd.org/~trociny/test_MSG_WAITALL.4.c for the test code).
>
> Let the size of the receive buffer be SOBUF_SIZE (e.g. 10000 bytes).
>
> On the sender side do 2 send() requests:
>
> 1) data of size much smaller than SOBUF_SIZE (e.g. SOBUF_SIZE / 10);
>
> 2) data of size equal to SOBUF_SIZE.
>
> After this, on the receiver side do 2 recv() requests with the MSG_WAITALL flag set:
>
> 1) recv() data of SOBUF_SIZE / 10 size;
>
> 2) recv() data of SOBUF_SIZE size;
>
> The second recv() will last for a very long time. In tcpdump one can observe
> that the window is permanently stuck at 0 and pending data is only sent via
> TCP window probes (so one byte every few seconds).
>
> 18:09:14.784698 IP 127.0.0.1.53378 > 127.0.0.1.23481: Flags [S], seq 1907676797, win 65535, options [mss 16344,nop,wscale 3,sackOK,TS val 22207 ecr 0], length 0
> 18:09:14.784729 IP 127.0.0.1.23481 > 127.0.0.1.53378: Flags [S.], seq 2298857585, ack 1907676798, win 10000, options [mss 16344,nop,wscale 3,sackOK,TS val 2718467987 ecr 22207], length 0
> 18:09:14.784749 IP 127.0.0.1.53378 > 127.0.0.1.23481: Flags [.], ack 1, win 8960, options [nop,nop,TS val 22207 ecr 2718467987], length 0
> 18:09:14.785168 IP 127.0.0.1.53378 > 127.0.0.1.23481: Flags [P.], seq 1:1001, ack 1, win 8960, options [nop,nop,TS val 22207 ecr 2718467987], length 1000
> 18:09:14.785264 IP 127.0.0.1.53378 > 127.0.0.1.23481: Flags [.], seq 1001:10001, ack 1, win 8960, options [nop,nop,TS val 22207 ecr 2718467987], length 9000
> 18:09:14.785280 IP 127.0.0.1.23481 > 127.0.0.1.53378: Flags [.], ack 10001, win 0, options [nop,nop,TS val 2718467987 ecr 22207], length 0
> 18:09:19.784440 IP 127.0.0.1.53378 > 127.0.0.1.23481: Flags [.], seq 10001:10002, ack 1, win 8960, options [nop,nop,TS val 22707 ecr 2718467987], length 1
> 18:09:19.784480 IP 127.0.0.1.23481 > 127.0.0.1.53378: Flags [.], ack 10001, win 0, options [nop,nop,TS val 2718468487 ecr 22707], length 0
> 18:09:24.784439 IP 127.0.0.1.53378 > 127.0.0.1.23481: Flags [.], seq 10001:10002, ack 1, win 8960, options [nop,nop,TS val 23207 ecr 2718468487], length 1
> 18:09:24.784472 IP 127.0.0.1.23481 > 127.0.0.1.53378: Flags [.], ack 10002, win 0, options [nop,nop,TS val 2718468987 ecr 23207], length 0
> 18:09:29.784437 IP 127.0.0.1.53378 > 127.0.0.1.23481: Flags [.], seq 10002:10003, ack 1, win 8960, options [nop,nop,TS val 23707 ecr 2718468987], length 1
> 18:09:29.784478 IP 127.0.0.1.23481 > 127.0.0.1.53378: Flags [.], ack 10003, win 0, options [nop,nop,TS val 2718469487 ecr 23707], length 0
> 18:09:34.784444 IP 127.0.0.1.53378 > 127.0.0.1.23481: Flags [.], seq 10003:10004, ack 1, win 8960, options [nop,nop,TS val 24207 ecr 2718469487], length 1
> 18:09:34.784486 IP 127.0.0.1.23481 > 127.0.0.1.53378: Flags [.], ack 10004, win 0, options [nop,nop,TS val 2718469987 ecr 24207], length 0
> 18:09:39.784443 IP 127.0.0.1.53378 > 127.0.0.1.23481: Flags [.], seq 10004:10005, ack 1, win 8960, options [nop,nop,TS val 24707 ecr 2718469987], length 1
> 18:09:39.784478 IP 127.0.0.1.23481 > 127.0.0.1.53378: Flags [.], ack 10005, win 0, options [nop,nop,TS val 2718470487 ecr 24707], length 0
> 18:09:44.784442 IP 127.0.0.1.53378 > 127.0.0.1.23481: Flags [.], seq 10005:10006, ack 1, win 8960, options [nop,nop,TS val 25207 ecr 2718470487], length 1
> 18:09:44.784477 IP 127.0.0.1.23481 > 127.0.0.1.53378: Flags [.], ack 10006, win 0, options [nop,nop,TS val 2718470987 ecr 25207], length 0
> ...
>
> I first noticed this issue with HAST and suspect other people observed it with
> HAST too.
>
> Below is an explanation of what is going on.
>
> We totally filled the receive buffer with one SOBUF_SIZE/10 size request and
> a partial SOBUF_SIZE request. When the first request was processed we got
> SOBUF_SIZE/10 free space. It was just enough to receive the rest of the bytes for
> the second request, and the receiving thread went into soreceive_generic->sbwait
> here:
>
>         /*
>          * If we have less data than requested, block awaiting more (subject
>          * to any timeout) if:
>          *   1. the current count is less than the low water mark, or
>          *   2. MSG_WAITALL is set, and it is possible to do the entire
>          *      receive operation at once if we block (resid <= hiwat).
>          *   3. MSG_DONTWAIT is not set
>          * If MSG_WAITALL is set but resid is larger than the receive buffer,
>          * we have to do the receive in sections, and thus risk returning a
>          * short count if a timeout or signal occurs after we start.
>          */
>         if (m == NULL || (((flags & MSG_DONTWAIT) == 0 &&
>             so->so_rcv.sb_cc < uio->uio_resid) &&
>             (so->so_rcv.sb_cc < so->so_rcv.sb_lowat ||
>             ((flags & MSG_WAITALL) && uio->uio_resid <= so->so_rcv.sb_hiwat)) &&
>             m->m_nextpkt == NULL && (pr->pr_flags & PR_ATOMIC) == 0)) {
> ...
>                 error = sbwait(&so->so_rcv);
>
> recvbuf is almost full but has enough space to satisfy the MSG_WAITALL request
> without draining data to the user buffer, and soreceive waits for data. But the
> window was closed when the buffer was filled, and to avoid silly window
> syndrome it opens only when the available space is larger than sb_hiwat/4 or
> maxseg:
>
> tcp_output():
>
>         /*
>          * Calculate receive window.  Don't shrink window,
>          * but avoid silly window syndrome.
>          */
>         if (recwin < (long)(so->so_rcv.sb_hiwat / 4) &&
>             recwin < (long)tp->t_maxseg)
>                 recwin = 0;
>
> so it is stuck and pending data is only sent via TCP window probes.
>
> It looks like the fix could be to remove this condition to block if
> MSG_WAITALL is set and it is possible to do the entire receive operation at
> once, like in the patch:
>
> http://people.freebsd.org/~trociny/uipc_socket.c.soreceive_generic.MSG_DONTWAIT.patch
>
> This works for me but I am not sure this is a correct solution.
>
> Note, the issue is not reproduced with soreceive_stream.
>
I do not understand what then happens for the recvfrom(2) call?
Would it get some error, or 0 as return and no data, or something else?

Also, what is the MT_CONTROL chunk about?