From: Martin Minkus
To: freebsd-questions
Date: Wed, 23 Jun 2010 16:00:52 +1200
Subject: sshd / tcp packet corruption ?

It seems this issue I reported below may actually be related to some kind of TCP packet corruption?

Still same box. I’ve noticed my SSH connections into the box will die randomly, with errors.

sshd logs the following on the box itself:

Jun 18 11:15:32 kinetic sshd[1406]: Received disconnect from 10.64.10.251: 2: Invalid packet header. This probably indicates a problem with key exchange or encryption.
Jun 18 11:15:41 kinetic sshd[15746]: Accepted publickey for martinm from 10.64.10.251 port 56469 ssh2
Jun 18 11:15:58 kinetic su: nss_ldap: could not get LDAP result - Can't contact LDAP server
Jun 18 11:15:58 kinetic su: martinm to root on /dev/pts/0
Jun 18 11:16:06 kinetic su: martinm to root on /dev/pts/1
Jun 18 11:16:29 kinetic sshd[15748]: Received disconnect from 10.64.10.251: 2: Invalid packet header. This probably indicates a problem with key exchange or encryption.
Jun 18 11:16:30 kinetic sshd[15746]: syslogin_perform_logout: logout() returned an error
Jun 18 11:16:34 kinetic sshd[16511]: Accepted publickey for martinm from 10.64.10.251 port 56470 ssh2
Jun 18 11:16:41 kinetic sshd[16513]: Received disconnect from 10.64.10.251: 2: Invalid packet header. This probably indicates a problem with key exchange or encryption.
Jun 18 11:16:41 kinetic sshd[16511]: syslogin_perform_logout: logout() returned an error

Jun 23 15:52:59 kinetic sshd[56974]: Received disconnect from 10.64.10.209: 5: Message Authentication Code did not verify (packet #75658). Data integrity has been compromised.
Jun 23 15:53:12 kinetic sshd[57109]: Accepted publickey for martinm from 10.64.10.209 port 9494 ssh2
Jun 23 15:53:38 kinetic su: martinm to root on /dev/pts/3
Jun 23 15:56:36 kinetic sshd[57111]: Received disconnect from 10.64.10.209: 2: Invalid packet header. This probably indicates a problem with key exchange or encryption.
Jun 23 15:56:44 kinetic sshd[57151]: Accepted publickey for martinm from 10.64.10.209 port 9534 ssh2

My googlefu has failed me on this.

Any ideas what on earth this could be?

Ethernet card?

em0: port 0xcc00-0xcc3f mem 0xfdfe0000-0xfdffffff,0xfdfc0000-0xfdfdffff irq 17 at device 7.0 on pci1
em0: [FILTER]
em0: Ethernet address: 00:0e:0c:6b:d6:d3

em0: flags=8843 metric 0 mtu 1500
        options=209b
        ether 00:0e:0c:6b:d6:d3
        inet 10.64.10.10 netmask 0xffffff00 broadcast 10.64.10.255
        media: Ethernet autoselect (1000baseT )
        status: active

Thanks,
Martin.

From: Martin Minkus
Sent: Monday, 14 June 2010 11:21
To: freebsd-questions@freebsd.org
Subject: FreeBSD+ZFS+Samba: open_socket_in: Protocol not supported - after a few days?

Samba 3.4 on the FreeBSD 8-STABLE branch.

After a few days I start getting weird errors: Windows PCs can't access the Samba share or have trouble accessing files, and Samba becomes totally unusable.

Restarting Samba doesn't fix it; only a reboot does.

Accessing files on the ZFS pool locally is fine. Other services on the box (like dhcpd and the OpenLDAP server) continue to work fine.

Only Samba dies, and by "dies" I mean it can no longer service clients and Windows brings up bizarre errors. Windows can access our other Samba servers (on Linux, etc.) just fine.

Kernel:

FreeBSD kinetic.pulse.local 8.1-PRERELEASE FreeBSD 8.1-PRERELEASE #4: Wed May 26 18:09:14 NZST 2010 martinm@kinetic.pulse.local:/usr/obj/usr/src/sys/PULSE amd64

Zpool status:

kinetic:~$ zpool status
  pool: pulse
 state: ONLINE
 scrub: none requested
config:

        NAME                                          STATE     READ WRITE CKSUM
        pulse                                         ONLINE       0     0     0
          raidz1                                      ONLINE       0     0     0
            gptid/3baa4ef3-3ef8-0ac0-f110-f61ea23352  ONLINE       0     0     0
            gptid/0eaa8131-828e-6449-b9ba-89ac63729d  ONLINE       0     0     0
            gptid/77a8da7c-8e3c-184c-9893-e0b12b2c60  ONLINE       0     0     0
            gptid/dddb2b48-a498-c1cd-82f2-a2d2feea01  ONLINE       0     0     0

errors: No known data errors
kinetic:~$

log.smb:

[2010/06/10 17:22:39, 0] lib/util_sock.c:902(open_socket_in)
  open_socket_in(): socket() call failed: Protocol not supported
[2010/06/10 17:22:39, 0] smbd/server.c:457(smbd_open_one_socket)
  smbd_open_once_socket: open_socket_in: Protocol not supported
[2010/06/10 17:22:39, 2] smbd/server.c:676(smbd_parent_loop)
  waiting for connections

log.ANYPC:

[2010/06/08 19:55:55, 0] lib/util_sock.c:1491(get_peer_addr_internal)
  getpeername failed. Error was Socket is not connected
  read_fd_with_timeout: client 0.0.0.0 read error = Socket is not connected.

The code in lib/util_sock.c, around line 902:

/****************************************************************************
 Open a socket of the specified type, port, and address for incoming data.
****************************************************************************/

int open_socket_in(int type,
                uint16_t port,
                int dlevel,
                const struct sockaddr_storage *psock,
                bool rebind)
{
        struct sockaddr_storage sock;
        int res;
        socklen_t slen = sizeof(struct sockaddr_in);

        sock = *psock;

#if defined(HAVE_IPV6)
        if (sock.ss_family == AF_INET6) {
                ((struct sockaddr_in6 *)&sock)->sin6_port = htons(port);
                slen = sizeof(struct sockaddr_in6);
        }
#endif
        if (sock.ss_family == AF_INET) {
                ((struct sockaddr_in *)&sock)->sin_port = htons(port);
        }

        res = socket(sock.ss_family, type, 0 );
        if( res == -1 ) {
                if( DEBUGLVL(0) ) {
                        dbgtext( "open_socket_in(): socket() call failed: " );
                        dbgtext( "%s\n", strerror( errno ) );
                }

In other words, the socket() call itself is failing, so it looks like something in the kernel is exhausted (but what?). I don’t know whether tuning is required or whether this is some kind of bug.

/boot/loader.conf:

mvs_load="YES"
zfs_load="YES"
vm.kmem_size="20G"

#vfs.zfs.arc_min="512M"
#vfs.zfs.arc_max="1536M"

vfs.zfs.arc_min="512M"
vfs.zfs.arc_max="3072M"

I’ve played with a few sysctl settings (recommendations I found online), but they make no difference.

/etc/sysctl.conf:

kern.ipc.maxsockbuf=2097152
net.inet.tcp.sendspace=262144
net.inet.tcp.recvspace=262144
net.inet.tcp.mssdflt=1452
net.inet.udp.recvspace=65535
net.inet.udp.maxdgram=65535
net.local.stream.recvspace=65535
net.local.stream.sendspace=65535

Any ideas on what could possibly be going wrong?

Any help would be greatly appreciated!

Thanks,
Martin
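
One way to narrow down the socket() failure quoted above might be to repeat the same call outside smbd while the box is in the broken state. The following is only a minimal sketch under that assumption: probe_socket() and report_probe() are hypothetical helper names, not part of Samba or FreeBSD, and the sketch does nothing more than call socket() the way open_socket_in() does and report the errno it gets back.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create one socket with protocol 0, as
 * open_socket_in() does. Returns 0 on success, or the errno value
 * set by socket() on failure. */
int probe_socket(int family, int type)
{
    int fd = socket(family, type, 0);
    if (fd == -1) {
        return errno;
    }
    close(fd);  /* the descriptor is only needed for the probe itself */
    return 0;
}

/* Hypothetical helper: run one probe, print the outcome, and return
 * the same value as probe_socket(). */
int report_probe(const char *label, int family, int type)
{
    assert(label != NULL);

    int err = probe_socket(family, type);
    if (err == 0) {
        printf("%s: ok\n", label);
    } else {
        printf("%s: socket() failed: %s (errno %d)\n",
               label, strerror(err), err);
    }
    return err;
}
```

Calling report_probe("AF_INET stream", AF_INET, SOCK_STREAM) and the AF_INET6 equivalent from a small main() would show whether "Protocol not supported" (EPROTONOSUPPORT) is returned to every process or only to smbd. It would not identify the cause by itself.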