Date: Thu, 09 Jun 2011 11:31:31 -0700
From: Maxim Sobolev <sobomax@FreeBSD.org>
To: Mikolaj Golub <trociny@freebsd.org>
Cc: vadim_nuclight@mail.ru, Kostik Belousov <kib@FreeBSD.org>, svn-src-all@FreeBSD.org, Pawel Jakub Dawidek <pjd@FreeBSD.org>
Subject: Re: svn commit: r222688 - head/sbin/hastd
Message-ID: <4DF11183.3060806@FreeBSD.org>
In-Reply-To: <86wrgvkv67.fsf@kopusha.home.net>
References: <201106041601.p54G1Ut7016697@svn.freebsd.org>
 <BA66495E-AED3-459F-A5CD-69B91DB359BC@lists.zabbadoz.net>
 <4DEA653F.7070503@FreeBSD.org>
 <201106061057.p56Av3u7037614@kernblitz.nuclight.avtf.net>
 <4DED1CC5.1070001@FreeBSD.org>
 <86wrgvkv67.fsf@kopusha.home.net>
On 6/9/2011 6:10 AM, Mikolaj Golub wrote:
> >>> Hmm, not sure what exactly is wrong? Sender does 3 writes to the TCP
> >>> socket - 32k, 32k and 1071 bytes, while receiver does one
> >>> recv(MSG_WAITALL) with the size of 66607. So I suspect sender's kernel
> >>> does deliver two 32k packets and fills up receiver's buffer or
> >>> something. And the remaining 1071 bytes stay somewhere in sender's
> >>> kernel indefinitely, while recv() cannot complete in receiver's. Using
> >>> the same size when doing recv() solves the issue for me.
>
> With MSG_WAITALL, if data to receive are larger than receive buffer, after
> receiving some part of data it is drained to user buffer and the protocol is
> notified (sending window update) that there is some space in the receive
> buffer. So, normally, there should not be an issue with the scenario described
> above. But there was a race in soreceive_generic(), I believe I have fixed in
> r222454, when the connection could stall in sbwait. Do you still observe the
> issue with only r222454 applied?

The patch makes things slightly better, but it appears that there are
still some "magic" buffer sizes that get stuck somewhere, particularly
66607 bytes in my case. You can probably reproduce the issue easily by
creating a large disk with data of various kinds (e.g. a FreeBSD UFS
file system with source and object code), enabling compression and
setting the block size to 128kb. Then, at least if you run this scenario
over a WAN, it should get stuck from time to time when hitting that
"magic" size.

One can probably easily write a simple test case in C, with the server
part sending 32k, 32k and 1071 bytes and the receiver reading the whole
message with MSG_WAITALL. Unfortunately I am overloaded right now, so
it's unlikely that I will do it.

> MS> MSG_WAITALL might be an issue here. I suspect receiver's kernel can't
> MS> dequeue two 32k packets until the last chunk arrives. I don't have a
> MS> time to look into it in detail unfortunately.
>
> Sorry, but I think your patch is wrong. If even it fixes the issue for you,
> actually I think it does not fix but hides a real problem we have to address.
>
> Receiving the whole chunk at once should be more effectively because we do one
> syscall instead of several. Also, if you receive in smaller chunks no need to
> set MSG_WAITALL at all.
>
> Besides, with your patch I am observing hangs on primary startup in
>
> init_remote->primary_connect->proto_connection_recv->proto_common_recv
>
> The primary worker process asks the parent to connect to the secondary. After
> establishing the connection the parent sends connection protocol name and
> descriptor to the worker (proto_connection_send/proto_connection_recv). The
> issue here is that in proto_connection_recv() the size of protoname is
> unknown, so it calls proto_common_recv() with size = 127, larger than
> protoname ("tcp").
>
> It worked previously because after sending protoname proto_connection_send()
> sends the descriptor calling sendmsg(). This is data of different type and it
> makes recv() return although only 4 bytes of 127 requested were received.
>
> With your patch, after receiving these 4 bytes it returns back to recv()
> waiting for rest 123 bytes and gets stuck forever. Don't you observe this? It
> is strange, because for me it hangs on every start up. I am seeing this on
> yesterday current.

Yes, you are right. It appears that I did not test the new code on the
primary, only on the secondary, which explains why I did not see that
issue. Can you please try the following patch and let me know if it
solves the issue for you?

http://sobomax.sippysoft.com/hastd.diff

-Maxim
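
The simple test case described earlier in this message (sender doing
three writes of 32k, 32k and 1071 bytes, receiver doing a single
recv(MSG_WAITALL) of 66607 bytes) might look roughly like the sketch
below. It is only an illustration and is not taken from hastd: the names
send_all(), sender() and waitall_roundtrip() are made up for this
example, and it runs over the loopback interface with default socket
buffers, which is an assumption and may well not reproduce a stall that
was seen over a WAN.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#define CHUNK 32768                 /* "32k" write size from the report */
#define TAIL  1071                  /* final short write */
#define TOTAL (2 * CHUNK + TAIL)    /* 66607, the "magic" size */

/* Hypothetical helper: send exactly len bytes, retrying on short writes. */
static int
send_all(int fd, const char *buf, size_t len)
{
	while (len > 0) {
		ssize_t n = send(fd, buf, len, 0);
		if (n <= 0)
			return (-1);
		buf += n;
		len -= (size_t)n;
	}
	return (0);
}

/* Child process: connect and do the three writes, then exit. */
static void
sender(const struct sockaddr_in *sin)
{
	static char out[TOTAL];
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	memset(out, 'x', sizeof(out));
	if (fd < 0 ||
	    connect(fd, (const struct sockaddr *)sin, sizeof(*sin)) < 0)
		_exit(1);
	if (send_all(fd, out, CHUNK) < 0 ||
	    send_all(fd, out + CHUNK, CHUNK) < 0 ||
	    send_all(fd, out + 2 * CHUNK, TAIL) < 0)
		_exit(1);
	close(fd);
	_exit(0);
}

/*
 * Parent process: accept one connection and read the whole message with
 * a single recv(MSG_WAITALL). Returns the byte count reported by recv(),
 * or -1 on a setup error. Blocks for as long as recv() blocks.
 */
ssize_t
waitall_roundtrip(void)
{
	static char in[TOTAL];
	struct sockaddr_in sin;
	socklen_t slen = sizeof(sin);
	ssize_t n = -1;
	pid_t pid;
	int lfd, fd;

	assert(TOTAL == 66607);
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	/* sin_port is left at 0 so the kernel picks a free port. */

	lfd = socket(AF_INET, SOCK_STREAM, 0);
	if (lfd < 0)
		return (-1);
	if (bind(lfd, (struct sockaddr *)&sin, sizeof(sin)) < 0 ||
	    listen(lfd, 1) < 0 ||
	    getsockname(lfd, (struct sockaddr *)&sin, &slen) < 0 ||
	    (pid = fork()) < 0) {
		close(lfd);
		return (-1);
	}
	if (pid == 0) {
		close(lfd);
		sender(&sin);		/* does not return */
	}
	fd = accept(lfd, NULL, NULL);
	if (fd >= 0) {
		n = recv(fd, in, sizeof(in), MSG_WAITALL);
		close(fd);
	}
	close(lfd);
	waitpid(pid, NULL, 0);
	return (n);
}
```

Calling waitall_roundtrip() from a small main() and comparing the result
with 66607 shows whether the single recv() completed; a hang inside the
call would correspond to the stall discussed in this thread.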