From: David Malone
To: Steve Kargl
Cc: freebsd-current@freebsd.org
Date: Fri, 29 Jun 2007 22:27:06 +0100
Subject: Re: SYNCOOKIE authentication problems
Message-ID: <200706292227.aa62881@salmon.maths.tcd.ie>
In-reply-to: Your message of "Fri, 29 Jun 2007 09:32:47 PDT." <20070629163247.GA6373@troutmask.apl.washington.edu>

> Jun 29 09:21:58 node11 kernel: TCP: [192.168.0.12]:54528 to [192.168.0.11]:526

OK - I can see the packets corresponding to this error by doing
something like:

% tcpdump -S -r synfinrstdata -n port 62391 and port 60621
17:22:01.607876 192.168.0.15.62391 > 192.168.0.11.60621: S 106955928:106955928(0) win 65535
17:22:01.607967 192.168.0.11.60621 > 192.168.0.15.62391: S 2273558377:2273558377(0) ack 106955929 win 65535
17:22:01.608514 192.168.0.15.62391 > 192.168.0.11.60621: F 106955941:106955941(0) ack 2273558378 win 260
17:22:01.609638 192.168.0.11.60621 > 192.168.0.15.62391: F 2273558378:2273558378(0) ack 106955942 win 260
17:22:01.609697 192.168.0.11.60621 > 192.168.0.15.62391: F 2273558378:2273558378(0) ack 106955942 win 260
17:22:01.610103 192.168.0.11.60621 > 192.168.0.15.62391: R 2273558379:2273558379(0) win 0

The start of this looks like a perfectly normal TCP connection - it
opens normally, transfers about 12 bytes in one direction (the FIN
from 192.168.0.15 has sequence number 106955941, 12 past the
106955929 acknowledged in the SYN-ACK) and then closes. Strangely,
192.168.0.11 then sends two FIN packets, followed by a reset.

The error message produced by the kernel should have been accompanied
by a reset in response, but I'm not sure there is quite enough here
to see what happened. We could try to get all of the packets in the
connection by doing:

        tcpdump -i whatever_interface -w /tmp/fulldump -s 80

then wait for an error (one that is not local to this machine - I
think those are going over lo0). Then note the port numbers and do:

        tcpdump -r /tmp/fulldump port _port1_ and port _port2_

With regard to the truss output

> poll({4/POLLIN 5/POLLIN 6/POLLIN 7/POLLIN 9/POLLIN 10/POLLIN 11/POLLIN 13/POLL

It looks like MPI is only waiting for file descriptors to become
ready for reading. I'd guess one of the file descriptors is in an
error state, but MPI isn't checking for that, so it is spinning.

        David.
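
To illustrate the poll() guess above, here is a minimal sketch in Python. It is not taken from MPI or from the truss output. The names classify, demo and ERROR_MASK are made up for this example. It only shows that poll() can report POLLERR, POLLHUP or POLLNVAL in the returned events even when the caller asked for POLLIN alone, so the returned bits need to be checked.

```python
import select
import socket

# Bits that poll() may report in revents even when only POLLIN was requested.
ERROR_MASK = select.POLLERR | select.POLLHUP | select.POLLNVAL


def classify(events):
    """Split (fd, revents) pairs from poll() into readable and failed fds.

    An fd can land in both lists: a hangup with unread data pending
    reports POLLIN together with POLLHUP.
    """
    readable = [fd for fd, revents in events if revents & select.POLLIN]
    failed = [fd for fd, revents in events if revents & ERROR_MASK]
    return readable, failed


def demo():
    """Poll one end of a socket pair after the other end has been closed."""
    a, b = socket.socketpair()
    poller = select.poll()
    poller.register(a.fileno(), select.POLLIN)  # ask for POLLIN only
    b.close()                                   # peer goes away
    events = poller.poll(1000)                  # timeout in milliseconds
    readable, failed = classify(events)
    a.close()
    return readable, failed


if __name__ == "__main__":
    print(demo())
```

A loop that acts only on the readable list, and never closes or unregisters the descriptors in the failed list, would have poll() return immediately on every pass for as long as the error condition persists. That is the kind of spinning guessed at above.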