Date: Wed, 6 Jun 2007 14:45:22 +0900
From: Pyun YongHyeon
To: Paul Bielecki
Cc: freebsd-net@freebsd.org
Subject: Re: lge fiber-optic loose connection for 1-6s
Message-ID: <20070606054522.GA18286@cdnetworks.co.kr>
In-Reply-To: <2e420cc20706051003k64f829bbhd7fa38c7fc2ee29f@mail.gmail.com>
Reply-To: pyunyh@gmail.com

On Tue, Jun 05, 2007 at 06:03:20PM +0100, Paul Bielecki wrote:
> Hello All
>
> I have network connection problems with my small database/samba server.
> Machine is on small shuttle box with lge fiber-optic 1000baseSX on LAN
> and rl0 to VPN connection.
> Server been set up by somebody else, about 4 years ago and have not
> been update since.
> I have 6x FreeBSD +2x linux + 4x M$ servers, but it is only one server
> I have connection problems with.
>
> It is FreeBSD 4.8 stable, Mysql 4.0.12, Samba 2.2.8
>
> Network: 330 machines + network printers; 60 machines including this
> server on 10.0.0.0/24, printers are on 10.0.0.0/22 and the rest lan is
> 10.0.1.0/22, 10.0.2.0/22, 10.0.3.0/22.
> Default gateway is set to host in 10.0.0.0/24.
> rl link is connected to a second FreeBSD box which act only as a VPN,
> network 172.16.12.0/24.
> There is one main switch which connects servers and uplinks from all
> rooms and buildings.
> Almost all windows machines in network are up-to date and all have
> anti virus software installed.
>
> What happen is that occasionally, from 6 to 20 times a day, all
> machines seems to lose connection with this server for 1-6 seconds.
>
> If it happens
> -I can ping google.com or other host in the same network from server
> itself and I have reply (?)
> -I lose my ssh connection to this server
> -there is no errors or warnings in messages apart smbd errors
> -samba gives me lots of "smbd read_data: read failure for 4. Error =
> Operation time out" or smbd_oplock/oplock break.
> -tcpdump shows lots of ACK packtes from to server on 139
>
> I think that having 10.0.0.0/24 and 10.0.0.0/22 as a one big thing
> doesn't help, believe that it should be set up with VLANs but I can't
> change it just like that.
> The second thing is that M$ network is not configured properly, there
> should be one wins server or PDC, no bcasts.
>
> I use to just blindly watch tcpdump -v -s 255 -i lge0 port not 22 and
> port not 139 and not icmp
> but I dont know what should I look for.
>
> Let me know your thoughts and please give me some "tips" how can I
> diagnose what can cause my problems.
>
> some help with tcpdump would be much appreciated too,
> for instance:
> 17:05:49.644256 0.00:01:e6:9d:07:16.452 >
> 0.ff:ff:ff:ff:ff:ff.452:ipx-sap-resp 30c '0001E69D071680DDNPI9D0716'
> addr 0.00:01:e6:9d:07:16
> 17:33:04.521449 802.1d config 8000.00:05:5d:1f:00:80.8002 root
> 8000.00:05:5d:1f:00:80 pathcost 0 age 0 max 20 hello 2 fdelay 15
>
> # printers
> 17:33:07.370377 10.0.0.225.svrloc > HP-DEVICE-DISC.MCAST.NET.svrloc:
> [udp sum ok] udp 151 (ttl 4, id 51568, len 179)
> 17:05:18.409507 10.0.0.237.netbios-dgm > 255.255.255.255.netbios-dgm:
> [udp sum ok] NBT UDP PACKET(138) (ttl 60, id 14452, len 229)
> 17:05:18.757053 10.0.0.218.netbios-dgm > 255.255.255.255.netbios-dgm:
> [udp sum ok] NBT UDP PACKET(138) (ttl 60, id 20727, len 229)
>
> # another samba server to bcast
> 17:05:29.708120 10.0.0.127.33191 > 10.0.3.255.netbios-ns: [udp sum ok]
> NBT UDP PACKET(137): QUERY; REQUEST; BROADCAST (DF) (ttl 64, id 0, len
> 78)
>
>

I'm not sure what caused this issue, but it seems that lge(4) lacks
protection against overly fragmented packets. Did you see "watchdog
timeout" messages on the console?
I don't have lge(4) hardware, so it's hard for me to fix.
It seems that lge(4) needs the following work:
 - endian cleanup
 - bus_dma(9) conversion
 - fragment handling, as the hardware can't handle more than 10
   fragments.

-- 
Regards,
Pyun YongHyeon
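
A minimal userspace sketch of the "fragment handling" item above, assuming a simplified buffer chain in place of real mbufs. Every name here (struct frag, LGE_MAX_FRAGS, prepare_tx, chain_coalesce, make_chain) is hypothetical and not taken from the lge(4) source; only the limit of 10 fragments comes from the mail. In an actual driver the same step would more likely be done with the kernel's own mbuf routines (for example m_defrag() from mbuf(9)) before handing the packet to the hardware.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define LGE_MAX_FRAGS 10   /* limit quoted in the mail; the name is made up */

/* Simplified stand-in for an mbuf: one fragment of a packet. */
struct frag {
    struct frag *next;
    size_t len;
    unsigned char *data;
};

/* Number of fragments in the chain. */
static size_t chain_count(const struct frag *f)
{
    size_t n = 0;
    for (; f != NULL; f = f->next)
        n++;
    return n;
}

/* Total payload bytes in the chain. */
static size_t chain_length(const struct frag *f)
{
    size_t total = 0;
    for (; f != NULL; f = f->next)
        total += f->len;
    return total;
}

static void chain_free(struct frag *f)
{
    while (f != NULL) {
        struct frag *next = f->next;
        free(f->data);
        free(f);
        f = next;
    }
}

/* Allocate a single fragment with len bytes of storage. */
static struct frag *frag_alloc(size_t len)
{
    struct frag *f = malloc(sizeof(*f));
    if (f == NULL)
        return NULL;
    f->data = malloc(len ? len : 1);
    if (f->data == NULL) {
        free(f);
        return NULL;
    }
    f->next = NULL;
    f->len = len;
    return f;
}

/* Copy the whole chain into one contiguous fragment; NULL on failure. */
static struct frag *chain_coalesce(const struct frag *head)
{
    struct frag *flat = frag_alloc(chain_length(head));
    size_t off = 0;

    if (flat == NULL)
        return NULL;
    for (; head != NULL; head = head->next) {
        memcpy(flat->data + off, head->data, head->len);
        off += head->len;
    }
    assert(off == flat->len);
    return flat;
}

/*
 * Transmit-path check: a chain within the limit is passed through
 * unchanged; a longer one is replaced by a single coalesced fragment.
 * Returns NULL (packet dropped, original freed) if coalescing fails.
 */
static struct frag *prepare_tx(struct frag *head)
{
    struct frag *flat;

    if (chain_count(head) <= LGE_MAX_FRAGS)
        return head;
    flat = chain_coalesce(head);
    chain_free(head);
    return flat;
}

/* Test helper: build nfrags fragments, fragment i filled with byte i. */
static struct frag *make_chain(size_t nfrags, size_t fraglen)
{
    struct frag *head = NULL, **tail = &head;

    for (size_t i = 0; i < nfrags; i++) {
        struct frag *f = frag_alloc(fraglen);
        if (f == NULL) {
            chain_free(head);
            return NULL;
        }
        memset(f->data, (int)(i & 0xff), fraglen);
        *tail = f;
        tail = &f->next;
    }
    return head;
}
```

Passing a 14-fragment chain built with make_chain(14, 100) through prepare_tx() yields one 1400-byte fragment, while a chain of 10 or fewer fragments is returned unchanged. Copying costs CPU time, which is why the check is applied only to chains that exceed the limit.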