From: Steve Kargl
To: David Malone
Cc: freebsd-current@freebsd.org
Date: Fri, 29 Jun 2007 10:48:24 -0700
Subject: Re: SYNCOOKIE authentication problems

On Fri, Jun 29, 2007 at 11:51:40AM +0100, David Malone wrote:
> On Wed, Jun 27, 2007 at 06:43:11PM -0700, Steve Kargl wrote:
> > Any advice on how to isolate or avoid?
> >
> > Jun 27 18:31:19 node11 kernel: TCP: [192.168.0.11]:59661 to
> > [192.168.0.11]:63266 tcpflags 0x10; syncache_expand: Segment failed
> > SYNCOOKIE authentication, segment rejected (probably spoofed)
>
> It looks like you tried to open a TCP connection to yourself, but
> the connection failed. You could try leaving a tcpdump running:
>
> tcpdump -i whatever_interface -w /tmp/synfinrstdata -s 1500 'tcp[tcpflags] & (tcp-syn|tcp-fin|tcp-rst) != 0'
>
> while your MPI app runs and then we can have a look at the packets
> that caused the problem. The above should collect all TCP SYN, FIN
> and RST packets, which would probably be enough to diagnose the
> problem.
>

Another tidbit: once the MPI app started to thrash, I ran truss on the
rank=0 process. I have a very long file containing

sigprocmask(SIG_UNBLOCK,SIGCHLD,0x0) = 0 (0x0)
poll({4/POLLIN 5/POLLIN 6/POLLIN 7/POLLIN 9/POLLIN 10/POLLIN 11/POLLIN 13/POLLIN 8/POLLIN 12/POLLIN 14/POLLIN 15/POLLIN 16/POLLIN 17/POLLIN 18/POLLIN 19/POLLIN 20/POLLIN 21/POLLIN 22/POLLIN 23/POLLIN 24/POLLIN 25/POLLIN 26/POLLIN 27/POLLIN 28/POLLIN 29/POLLIN 30/POLLIN},27,0) = 0 (0x0)
sigprocmask(SIG_BLOCK,SIGCHLD,0x0) = 0 (0x0)
sigaction(SIGCHLD,{ 0x3c0d2c850 SA_RESTART ss_t },0x0) = 0 (0x0)
gettimeofday({1183138884.532826},0x0) = 0 (0x0)
sched_yield(0x3c1d44180,0x3c0b39ec0,0x0,0x3c0b39ec0,0x3c1d44280) = 0 (0x0)
sigprocmask(SIG_BLOCK,SIGCHLD,0x0) = 0 (0x0)
sigaction(SIGCHLD,{ 0x3c0d2c850 SA_RESTART ss_t },0x0) = 0 (0x0)
gettimeofday({1183138884.535137},0x0) = 0 (0x0)
sigprocmask(SIG_UNBLOCK,SIGCHLD,0x0) = 0 (0x0)
poll({4/POLLIN 5/POLLIN 6/POLLIN 7/POLLIN 9/POLLIN 10/POLLIN 11/POLLIN 13/POLLIN 8/POLLIN 12/POLLIN 14/POLLIN 15/POLLIN 16/POLLIN 17/POLLIN 18/POLLIN 19/POLLIN 20/POLLIN 21/POLLIN 22/POLLIN 23/POLLIN 24/POLLIN 25/POLLIN 26/POLLIN 27/POLLIN 28/POLLIN 29/POLLIN 30/POLLIN},27,0) = 0 (0x0)
sigprocmask(SIG_BLOCK,SIGCHLD,0x0) = 0 (0x0)
sigaction(SIGCHLD,{ 0x3c0d2c850 SA_RESTART ss_t },0x0) = 0 (0x0)
gettimeofday({1183138884.538484},0x0) = 0 (0x0)

ad nauseam.

I'm using the 4BSD scheduler.

-- 
Steve