From: Jerome Herman
Date: Sat, 11 Jun 2011 01:34:17 +0200
To: freebsd-questions@freebsd.org
Subject: Re: Long Day's Journey into

On 09/06/2011 02:56, Gary Kline wrote:
> Well, people,
>
> It's been a long, long century.  I've been down for 5 days.
> Couldn't understand _why_ I couldn't ping anywhere [expect the
> Server itself].  Finally, tho, it became more and more likely that
> my FreeBSD was fine ... even tho I kept stripping the most likely
> problem points.
> My large 16-port LinkSys router was either *it* or
> it was some kind of bug unknown to geekdom.  After a friend bought
> me a new (and tiny) 8-port switch, yes!  I could ping everywhere.
>
> I'm still bringing back the dozens of things I removed from ethic.
> And testing new ideas.  But I have a general question: have any of
> you wizards who run your own domains or otherwise use a switch [or
> hub] *ever* had it just-quit?!  It is solid-state.  Yes, the box is
> within my feet/foot reach.  I have accidently kicked it i suppose,
> but still.
>
> After wandering in the wilderness for 5 days,<>, dunno.
>
> gary
>
> PS: yes, this is a serious question.  1) I like things-Cisco, and
> LinkSys.  I just bought this switch about 2.5 years ago, so I really
> am looking for feedback.
>
> PPS: Another question to ask about upgrading is next.
>
>

I have had a lot of faulty switches, either dying outright by
themselves or doing stranger things.

The most common fault is of course the defective port: one port starts
spewing errors and eventually dies, with little to no impact on the
rest of the ports. (Easy to detect: ping through one port vs. ping
through another port.)

Another common fault is the "I want full duplex" error. The switch
announces itself as full duplex and then immediately falls back to half
duplex. Most of the time the port will act fine, but under heavy load
you get a nice assortment of network errors happening one after the
other. (Also easy to detect: force the connected devices to half duplex
as a test; if everything starts working again, you have found your
problem.)

Of course there is also the problem of "not so anti-loopback" switches,
which let packets go round and round and round. (Ping will be very
inconsistent in its timing, going from a few ms to entire seconds.)

On pure layer 2 switches I have had few other problems, though two took
me days to figure out:

1 - Faulty power source: The switch could simply not bear full load anymore.
Various errors, packet corruption, DHCP errors, misrouting and so on.
When tested port by port and function by function, the switch worked
perfectly. I spent an entire week testing every box for viruses,
trojans, rootkits and rogue DHCP servers. The problem was only solved
after I replaced every element of the network one by one. The final
diagnosis was made by Netgear.

2 - Memory corruption (suspected, not validated): Everything would work
fine from 9 A.M. until 3 or 4 P.M. for an entire branch, then the
network would slow to a crawl. Rebooting the switches would solve the
problem for a while, and then it would be a nightmare again in less
than an hour. Some boxes would complain about duplicate IP addresses.
We managed to find that most of the affected IP addresses converged on
just one switch. From there we theorized that there was a problem with
the ARP cache of the switch that made it fail after a sufficient number
of updates (since a lot of VPN connections were made after 3 P.M., we
imagined that this was the triggering factor). We took the switch out
and replaced it, but the manufacturer never shed any light on the
matter, either to confirm or to refute our theory.

Jerome
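
The ping-timing symptom described above for looping switches (replies
going from a few ms to entire seconds) can be checked roughly with a
script. What follows is only a minimal sketch in Python, not a tool
from the original message: the function names parse_rtts and
looks_erratic and the threshold of 100 are hypothetical choices made
for illustration, and it assumes ping output that reports each reply
as "time=<value> ms", as ping(8) does on FreeBSD.

```python
import re
import statistics

# Matches the "time=0.412 ms" field that ping prints for each reply.
RTT_RE = re.compile(r"time[=<]([0-9.]+)\s*ms")


def parse_rtts(ping_output):
    """Return the round-trip times (in ms) found in captured ping output."""
    return [float(m.group(1)) for m in RTT_RE.finditer(ping_output)]


def looks_erratic(rtts, ratio_threshold=100.0):
    """Heuristic: flag a run whose slowest reply dwarfs the median reply.

    ratio_threshold is an arbitrary illustrative value, not a standard.
    """
    if len(rtts) < 2:
        return False
    median = statistics.median(rtts)
    return median > 0 and max(rtts) / median >= ratio_threshold
```

One possible use is to save the output of a ping run through each
suspect port to a file and pass its contents to parse_rtts. A run
flagged by looks_erratic is only a hint that something is wrong on
that path; it does not tell a loop apart from plain congestion.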