From: Martin Minkus
Date: Thu, 24 Jun 2010 09:43:42 +1200
To: freebsd-questions
Subject: RE: sshd / tcp packet corruption ?
In-reply-to: <44hbkt4ecf.fsf@be-well.ilk.org>

Thanks for the reply.

I actually posted a response to the original message with more details, showing that raw TCP data sent from one box to another is getting corrupted.

The culprit is definitely kinetic. Furthermore, I've determined both NICs are doing it.

kinetic:~# netstat -i
Name    Mtu Network       Address              Ipkts Ierrs Idrop    Opkts Oerrs  Coll
em0    1500               00:0e:0c:6b:d6:d3   222249     0     0   190062     0     0
em0    1500 10.64.10.0    kinetic             198516     -     -   189315     -     -
nfe0   1500               00:24:1d:15:11:48    17932     0     0      219     0     0
nfe0   1500 10.64.11.0    10.64.11.253         12675     -     -      217     -     -
plip0  1500                                        0     0     0        0     0     0
lo0   16384                                      592     0     0      592     0     0
lo0   16384 fe80:4::1     fe80:4::1                0     -     -        0     -     -
lo0   16384 localhost     ::1                      0     -     -        0     -     -
lo0   16384 your-net      localhost              552     -     -      592     -     -
kinetic:~#

Perhaps it is RAM, though... good point. I'll do a memtest.

Martin.

-----Original Message-----
From: Lowell Gilbert [mailto:freebsd-questions-local@be-well.ilk.org]
Sent: Thursday, 24 June 2010 09:41
To: Martin Minkus
Cc: freebsd-questions
Subject: Re: sshd / tcp packet corruption ?

Martin Minkus writes:

> It seems this issue I reported below may actually be related to some
> kind of TCP packet corruption ?

Possible. Or memory errors. Hard to say much at this point, when you don't even know which side is actually causing the errors.

> Still same box. I've noticed my SSH connections into the box will die
> randomly, with errors.
>
> Sshd logs the following on the box itself:
>
> Jun 18 11:15:32 kinetic sshd[1406]: Received disconnect from
> 10.64.10.251: 2: Invalid packet header. This probably indicates a
> problem with key exchange or encryption.

You might find more useful information by getting verbose messages from the other end.

I don't have time to check this in detail, but if I recall correctly, that message means that the other side closed the connection based on an apparent invalid header type in a packet that 'kinetic' received. Random corruption isn't likely in that case, because the error is always in the same place in the packet. Check the 'netstat -i' numbers to see if the drivers are picking up any packet errors.
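
A minimal sketch of that check, assuming the 'netstat -i' column layout shown earlier in this thread (Name, Mtu, Network, Address, Ipkts, Ierrs, Idrop, Opkts, Oerrs, Coll). The function name interfaces_with_errors is hypothetical and used only for illustration. It reads the six counters from the right-hand end of each row, so an empty Network or Address column does not shift them.

```python
def interfaces_with_errors(netstat_output):
    """Return {interface: (ierrs, oerrs)} for rows reporting nonzero errors.

    Assumes the last six whitespace-separated fields of each row are
    Ipkts, Ierrs, Idrop, Opkts, Oerrs, Coll (an assumption based on the
    header printed in this thread).
    """
    flagged = {}
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip the header, shell prompts, and anything too short to be a row.
        if len(fields) < 8 or fields[0] == "Name":
            continue
        ipkts, ierrs, idrop, opkts, oerrs, coll = fields[-6:]
        # Per-address rows print "-" instead of error counters; ignore them.
        if not (ierrs.isdigit() and oerrs.isdigit()):
            continue
        if int(ierrs) > 0 or int(oerrs) > 0:
            flagged[fields[0]] = (int(ierrs), int(oerrs))
    return flagged
```

An empty result only means the drivers counted no errors; it would not rule out corruption that happens without the driver noticing.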

It's hard to debug network problems in ssh, though, because (obviously) you can't tell in general whether packet data is corrupt. If you can set up a test case with, say, UDP echo, it would be easier to see the damage to the packets if they are, in fact, being corrupted.

Unfortunately, I'm so used to having sophisticated test equipment in the lab to look at these kinds of problems that I'm probably missing what would be obvious to someone who deals with problems "in the field."

Hope I've been somewhat helpful anyway.
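
The UDP echo test suggested above could look roughly like the following. This is only a sketch under stated assumptions: the names run_echo_server, probe and first_difference are made up for illustration, the payload size and count are arbitrary, and the demo at the bottom runs both ends on the loopback address. For a real test the echo loop would run on one box and probe on the other.

```python
import os
import socket
import threading


def run_echo_server(sock):
    """Echo every datagram back to its sender until the socket fails."""
    while True:
        try:
            data, addr = sock.recvfrom(65535)
            sock.sendto(data, addr)
        except OSError:
            return


def first_difference(a, b):
    """Offset of the first differing byte, or the shorter length if one is a prefix."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return min(len(a), len(b))


def probe(host, port, count=100, size=1024, timeout=2.0):
    """Send random payloads and compare each echoed reply byte for byte.

    Returns (sent, lost, corrupted, first_diffs). A fresh socket is used
    for every probe so a late reply cannot be mistaken for the next one.
    """
    lost = 0
    corrupted = 0
    first_diffs = []
    for _ in range(count):
        payload = os.urandom(size)
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.settimeout(timeout)
            s.sendto(payload, (host, port))
            try:
                reply, _addr = s.recvfrom(65535)
            except socket.timeout:
                lost += 1
                continue
        if reply != payload:
            corrupted += 1
            first_diffs.append(first_difference(payload, reply))
    return count, lost, corrupted, first_diffs


if __name__ == "__main__":
    # Loopback demo only; the address and port here are placeholders.
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))
    threading.Thread(target=run_echo_server, args=(server,), daemon=True).start()
    print(probe("127.0.0.1", server.getsockname()[1], count=20))
```

Keep in mind that UDP's own checksum normally causes datagrams damaged in transit to be dropped, so corruption on the wire would tend to show up here as loss. A byte-for-byte mismatch would suggest the damage happened where the checksum does not protect the data, for example in RAM.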