From owner-freebsd-sparc64@FreeBSD.ORG Thu Oct 14 09:47:11 2004
Date: Thu, 14 Oct 2004 11:46:47 +0200
From: Herve Boulouis
To: freebsd-sparc64@freebsd.org
Message-ID: <20041014114647.A69222@ra.aabs>
Subject: Strange timing problems with BETA7

Hi,

I'm having very strange stability problems with BETA7 which seem to be
related to timing/clock.

Hardware is a Netra t 1125 with 2 CPUs.

Symptoms:

After a fresh reboot, when I do a standard ping on any IP address, the
interval between the pings is not constant and is generally shorter than
the 1 second it should be by default.
I sometimes also get negative latencies with ping or traceroute:

# ping 62.4.16.70
PING 62.4.16.70 (62.4.16.70): 56 data bytes
64 bytes from 62.4.16.70: icmp_seq=0 ttl=60 time=-432.827 ms
64 bytes from 62.4.16.70: icmp_seq=1 ttl=60 time=1.955 ms

# traceroute 62.4.16.70
traceroute to 62.4.16.70 (62.4.16.70), 64 hops max, 52 byte packets
 1  gi0-12-swr102-mix-courbevoie (213.215.63.1)  436.046 ms  0.733 ms  0.611 ms
 2  gi0-2-3-edou.nerim.net (194.79.130.114)  0.619 ms  -434.763 ms  435.882 ms
 3  gi0-3-32-svenny.nerim.net (194.79.130.1)  1.737 ms  1.435 ms  1.715 ms

After a few hours of activity (this box is an ftp server), the kernel
gives this kind of message:

calcru: negative runtime of -893918 usec for pid 1344 (pure-ftpd)
calcru: negative runtime of -761379 usec for pid 1339 (pure-ftpd)
calcru: negative runtime of -1687109 usec for pid 1337 (pure-ftpd)
calcru: negative runtime of -295856 usec for pid 7 (pagedaemon)
calcru: runtime went backwards from 162673274 usec to 159978646 usec for pid 29 (intr2017: hme0)
calcru: runtime went backwards from 33673531 usec to 30674086 usec for pid 4 (g_down)
calcru: runtime went backwards from 102734677682 usec to 102731983847 usec for pid 12 (idle: cpu0)
calcru: runtime went backwards from 102678868452 usec to 102678764016 usec for pid 11 (idle: cpu1)

At this point, doing a netstat -Iw 1 gives nothing but the field headers.
In the same fashion, pinging any IP address gives a single reply and the
ping command then gets stuck. (Both processes are in select() state when
they are stuck and can be interrupted with ^C.)

When I reboot after a few hours of uptime, the reboot process seems to
get stuck after killing all the running processes. I never see the kernel
shutdown messages and have to power cycle the box.

Some apps seem to have problems with timing too. wget randomly fails with:

Assertion failed: (msecs >= 0), function calc_rate, file retr.c, line 262.
Abort trap (core dumped)

This started when I upgraded from 5.2.1 to BETA3 and the problem is still
present in BETA7 (last cvsup from Oct 5). I reset the date according to
the heads-up about the mk48txx commit. I tried mpsafenet=0 with the same
result.

My kernel config is pretty much like GENERIC except that I'm using
SCHED_4BSD, maxusers 512 and ZERO_COPY_SOCKETS (no WITNESS, no
INVARIANTS).

Any ideas on this? Could this be a hardware problem?

-- 
Herve Boulouis
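
The negative ping times and the calcru messages would both be consistent
with the system clock occasionally stepping backwards, although the report
above does not establish that as the cause. A minimal sketch for checking
this directly, independent of ping: the script below reads the wall clock
in a tight loop and reports every sample that is earlier than the one
before it. The helper names backward_steps and sample_wall_clock are
hypothetical, chosen for illustration only.

```python
import time


def backward_steps(samples):
    """Return (index, delta) for each sample earlier than its predecessor."""
    steps = []
    for i in range(1, len(samples)):
        delta = samples[i] - samples[i - 1]
        if delta < 0:
            steps.append((i, delta))
    return steps


def sample_wall_clock(count=100000):
    """Read the wall clock `count` times in a tight loop."""
    return [time.time() for _ in range(count)]


if __name__ == "__main__":
    steps = backward_steps(sample_wall_clock())
    for index, delta in steps:
        print("sample %d: clock went backwards by %.6f s" % (index, -delta))
    print("%d backward step(s) detected" % len(steps))
```

On a healthy system this should normally report zero backward steps, unless
something (a manual date change or a time daemon, for example) steps the
clock while it runs. Backward steps of a few hundred milliseconds would
match the -432.827 ms ping time shown above.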