Date: Tue, 10 Aug 2010 11:11:49 +0200
From: Andre Oppermann <andre@freebsd.org>
To: Seth Jeacopello <sethj@greatbaysoftware.com>
Cc: freebsd-net@freebsd.org
Subject: Re: Server sporadically sending unexpected RST to Client
Message-ID: <4C6117D5.2070207@freebsd.org>
In-Reply-To: <D12EF2FC899A47289CE235D18F219F9E@sjeacopello>
References: <B5F0E510AC5548FD83D57BE077F66866@sjeacopello> <4C5DE0D5.7090802@freebsd.org> <D12EF2FC899A47289CE235D18F219F9E@sjeacopello>

On 09.08.2010 15:03, Seth Jeacopello wrote:
> Thanks for the quick reply Andre; we have some new information.
>
> First I took some time to review some of the tcpdumps per your
> recommendation and have not found /any/ reuse (with most dumps spanning
> approx. a one hour time frame and the problem occurring toward the end of
> the time frame).

OK.  I thought this to be the most likely source of the problem.

> The client system is another FreeBSD system (we are unsure of the version at
> this time).

If there is no port reuse then the client OS shouldn't matter.

> You may be correct about the syncache simply showing the symptoms; as we dug
> deeper we began looking at changes in netisr, in particular the direct
> dispatch policy modifications. We've run some tests over the weekend and
> found something that seems to work for us.
>
> We've found that moving from 'Always Direct' to 'Hybrid' mode seems to
> resolve the issue for us without any noticed consequences (setting
> net.isr.direct_force=0). Can anyone comment on this setting and let us know
> of any downsides or problems that may occur running in this mode?

I haven't worked on the netisr code but a quick glance suggests that
running in hybrid mode should be fine and not cause any further problems.

> We believe that this problem is also only isolated to one of our Server
> platforms (testing on our other platform is still on-going, though initial
> results look good).

OK.

> Both platforms are Intel based (current generation vs. last generation) with
> various differences, though the one that may be most relative is the change
> of the on-board NIC from being 'em' based to 'igb' based (that is the
> systems with the issue all have 'em' based NICs vs. 'igb' of the newer
> systems). This could be red-herring as well, though I feel it's probably a
> good idea to include as much information as possible when troubleshooting.

It is unlikely that the network card or the driver has anything to do
with it.

> Thank you for all of your help and I look forward to hearing any further
> thoughts on this issue.

Please try the attached patch so I get better information from
syncache_socket on the particular error that comes up.  Socket
creation and PCB setup are very complicated areas.

-- 
Andre

Index: tcp_syncache.c
===================================================================
--- tcp_syncache.c	(revision 211131)
+++ tcp_syncache.c	(working copy)
@@ -627,6 +627,7 @@
 	struct inpcb *inp = NULL;
 	struct socket *so;
 	struct tcpcb *tp;
+	int error = 0;
 	char *s;
 
 	INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
@@ -675,7 +676,7 @@
 	}
 #endif
 	inp->inp_lport = sc->sc_inc.inc_lport;
-	if (in_pcbinshash(inp) != 0) {
+	if ((error = in_pcbinshash(inp)) != 0) {
 		/*
 		 * Undo the assignments above if we failed to
 		 * put the PCB on the hash lists.
@@ -687,6 +688,12 @@
 #endif
 			inp->inp_laddr.s_addr = INADDR_ANY;
 		inp->inp_lport = 0;
+		if ((s = tcp_log_addrs(&sc->sc_inc, NULL, NULL, NULL))) {
+			log(LOG_DEBUG, "%s; %s: in_pcbinshash failed "
+			    "with error %i\n",
+			    s, __func__, error);
+			free(s, M_TCPLOG);
+		}
 		goto abort;
 	}
 #ifdef IPSEC
@@ -721,9 +728,15 @@
 		laddr6 = inp->in6p_laddr;
 		if (IN6_IS_ADDR_UNSPECIFIED(&inp->in6p_laddr))
 			inp->in6p_laddr = sc->sc_inc.inc6_laddr;
-		if (in6_pcbconnect(inp, (struct sockaddr *)&sin6,
-		    thread0.td_ucred)) {
+		if ((error = in6_pcbconnect(inp, (struct sockaddr *)&sin6,
+		    thread0.td_ucred)) != 0) {
 			inp->in6p_laddr = laddr6;
+			if ((s = tcp_log_addrs(&sc->sc_inc, NULL, NULL, NULL))) {
+				log(LOG_DEBUG, "%s; %s: in6_pcbconnect failed "
+				    "with error %i\n",
+				    s, __func__, error);
+				free(s, M_TCPLOG);
+			}
 			goto abort;
 		}
 		/* Override flowlabel from in6_pcbconnect. */
@@ -750,9 +763,15 @@
 		laddr = inp->inp_laddr;
 		if (inp->inp_laddr.s_addr == INADDR_ANY)
 			inp->inp_laddr = sc->sc_inc.inc_laddr;
-		if (in_pcbconnect(inp, (struct sockaddr *)&sin,
-		    thread0.td_ucred)) {
+		if ((error = in_pcbconnect(inp, (struct sockaddr *)&sin,
+		    thread0.td_ucred)) != 0) {
 			inp->inp_laddr = laddr;
+			if ((s = tcp_log_addrs(&sc->sc_inc, NULL, NULL, NULL))) {
+				log(LOG_DEBUG, "%s; %s: in_pcbconnect failed "
+				    "with error %i\n",
+				    s, __func__, error);
+				free(s, M_TCPLOG);
+			}
 			goto abort;
 		}
 	}
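
For readers who want to see the idea outside the kernel: the patch above
changes each call site from "test the return value" to "capture the return
value in the condition, then log it before bailing out".  A minimal userland
sketch of that pattern follows.  It is only an illustration under stated
assumptions: fake_insert() and setup_with_logging() are hypothetical names
that do not exist in FreeBSD, ENOBUFS is an arbitrary example error, and the
message goes into a caller-supplied buffer rather than through log(9).

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for a kernel routine such as in_pcbinshash():
 * returns 0 on success or an errno value on failure.
 */
static int
fake_insert(int should_fail)
{
	return (should_fail ? ENOBUFS : 0);
}

/*
 * Capture the return value inside the condition so that the failure
 * path can report which error occurred, then return it to the caller.
 * On success buf is left as an empty string.
 */
static int
setup_with_logging(int should_fail, char *buf, size_t len)
{
	int error = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';

	if ((error = fake_insert(should_fail)) != 0) {
		/* Record both the numeric code and its description. */
		snprintf(buf, len, "%s: fake_insert failed with error %i (%s)",
		    __func__, error, strerror(error));
		return (error);
	}
	return (0);
}
```

Calling setup_with_logging(1, buf, sizeof(buf)) returns the error code and
leaves a message in buf naming the function and the error; with 0 as the
first argument it returns 0 and buf stays empty.  The point of keeping the
code in a variable is the same as in the patch: the failure path can say
why it failed instead of only that it failed.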