Date: Fri, 29 Jun 2007 15:47:25 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: David Malone <dwmalone@maths.tcd.ie>
Cc: freebsd-current@freebsd.org
Subject: Re: SYNCOOKIE authentication problems
Message-ID: <20070629224725.GA72396@troutmask.apl.washington.edu>
In-Reply-To: <200706292227.aa62881@salmon.maths.tcd.ie>
References: <20070629163247.GA6373@troutmask.apl.washington.edu> <200706292227.aa62881@salmon.maths.tcd.ie>

On Fri, Jun 29, 2007 at 10:27:06PM +0100, David Malone wrote:
> > Jun 29 09:21:58 node11 kernel: TCP: [192.168.0.12]:54528 to [192.168.0.11]:526
>
> OK - I can see the packets corresponding to this error by doing something
> like:
>
> % tcpdump -S -r synfinrstdata -n port 62391 and port 60621
>
> (output elided).
>
> The start of this looks like a perfectly normal TCP connection -
> it opens normally, transfers about 12 bytes in one direction and
> then closes. Strangely, 192.168.0.11 then sends two FIN packets,
> followed by a reset. The error message produced by the kernel should
> have produced a reset in response, but I'm not sure I can see quite
> enough to see what happened.
>
> We could try to get all of the packets in the connection by doing:
>
> tcpdump -i whatever_interface -w /tmp/fulldump -s 80

I'm doing this now. It seems that putting bge0 in promiscuous mode
has provided some stability. fulldump is currently at 2.4 GB.

> > poll({4/POLLIN 5/POLLIN 6/POLLIN 7/POLLIN 9/POLLIN 10/POLLIN 11/POLLIN 13/POLL
>
> It looks like MPI is looking only for file descriptors to become
> ready for reading. I'd guess one of the file descriptors is in an
> error state, but MPI isn't checking for that, so it is spinning.
>

I have both the OpenMPI and MPICH2 implementations. Neither handles a
disappearing process in an elegant manner. They simply assume
that the network is robust and 100% reliable.

--
Steve
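
The spinning that David guesses at above can be illustrated with a small sketch. poll() reports POLLERR, POLLHUP and POLLNVAL in the returned events even when only POLLIN was requested, so a loop that looks only at the POLLIN bit would keep being woken by a descriptor in an error state without ever handling it. The Python below is only an illustration of checking those bits; the names classify and poll_once are hypothetical, and it makes no claim about how OpenMPI or MPICH2 actually structure their event loops.

```python
import select
import socket

# Conditions that poll() reports whether or not they were requested.
ERROR_MASK = select.POLLERR | select.POLLHUP | select.POLLNVAL


def classify(revents):
    """Map a poll() revents bitmask to 'error', 'readable' or 'idle'."""
    if revents & ERROR_MASK:
        # A real program might first drain any data still pending on a
        # hung-up descriptor before giving up on it.
        return "error"
    if revents & select.POLLIN:
        return "readable"
    return "idle"


def poll_once(poller, timeout_ms=100):
    """Poll once; stop watching descriptors that report an error state."""
    results = {}
    for fd, revents in poller.poll(timeout_ms):
        state = classify(revents)
        if state == "error":
            # Unregistering keeps the next poll() from returning
            # immediately for the same dead descriptor.
            poller.unregister(fd)
        results[fd] = state
    return results


if __name__ == "__main__":
    a, b = socket.socketpair()
    poller = select.poll()
    poller.register(a.fileno(), select.POLLIN)
    b.close()  # simulate the peer process disappearing
    print(poll_once(poller))
    a.close()
```

Without the unregister step (or an equivalent close), each later poll() call on the same set would return at once for the failed descriptor, which would look like a busy loop in a trace.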