From owner-freebsd-current@FreeBSD.ORG Sun Oct 12 14:31:01 2003
Message-Id: <200310122131.h9CLV3s8001240@cwsys.cwsent.com>
From: Cy Schubert
Sender: Cy.Schubert@komquats.com
To: Garance A Drosihn
In-Reply-To: Your message of "Fri, 10 Oct 2003 14:05:40 EDT."
Date: Sun, 12 Oct 2003 09:32:02 -0700
Resent-To: current@freebsd.org
Resent-Date: Sun, 12 Oct 2003 14:31:03 -0700
Subject: Re: Seeing system-lockups on recent current
List-Id: Discussions about the use of FreeBSD-current

I'm seeing similar lockups; however, they started shortly after the new
ATA code was committed. The lockups usually occur when there's a lot of
ATA activity, e.g. heavy filesystem use or an fsck.
At the moment I can only guess as to what the problem might be (a missing
interrupt is my most educated guess), but keeping the amount of ATA I/O
to a minimum does help the situation.

Both machines which have suffered the problem have Intel chipsets. One is
a 12-year-old P120 (I cannot recall the exact chipset) and the other is a
PIII with an 815E chipset.

On a couple of occasions I had systat running and noticed that buffers in
use climbed until the system just froze, responding only to pings.

In all cases the filesystems were generally "clean", with just the dirty
bit set, except for the filesystem on an ATA drive (/var or /export),
which required considerable cleanup. Filesystems that reside on SCSI
devices have yet to exhibit any symptoms; they have needed nothing more
than the dirty bit reset.

Because of this problem I have yet to complete a portupgrade run, which
I've been attempting for the last four weeks; it usually hangs the system
within 12 hours.

Cheers,
--
Cy Schubert                     http://www.komquats.com/
BC Government                 .       FreeBSD UNIX
Cy.Schubert@osg.gov.bc.ca     .       cy@FreeBSD.org
http://www.gov.bc.ca/         .       http://www.FreeBSD.org/

In message , Garance A Drosihn writes:
> For the past week or so, I have been having a frustrating time
> with my freebsd-current/i386 system. It is a dual Athlon
> system. It has been running -current just fine since December,
> with me updating the OS every week or two. I did not update it
> for most of September, and then went to update it to pick up
> the recent round of security-related fixes.
>
> My first update run picked up a change which caused system
> panics. Other people were also seeing that panic, and it
> wasn't long before updates were committed to current to fix
> that problem. However, ever since then my -current system
> has very frequently locked up. Totally locked. The only way
> to get it back is a hardware reset.
>
> I have rebuilt the system at least a dozen times since then.
> I have built it with snapshots of /usr/src from Sept 12th
> to Oct 8th (which is what it's running at the moment). I
> have dropped back to a single-CPU kernel. I turned off X
> (in /etc/ttys) so that doesn't start up at all. All those
> attempts to get a reliable 5.x-system have not worked.
> Sometimes the system will crash in the middle of a buildworld,
> other times it will crash while it's basically idle and the
> monitor is turned off. One time it crashed in the middle of
> an installworld -- right when it was replacing /lib files.
> Boy was that a headache to recover from!
>
> On the same PC, in a different DOS partition, is a 4.x-stable
> system. If I boot into 4.x, I have no problems. I fire up
> all the servers that I run, start buildworlds, run cvsup's,
> and even had all the 5.x partitions mounted and was running
> a infinite-loop that MD5'd every file in the 5.x system. I
> had all of that going on at the same time, and the system is
> fine. While in the 4.x system, I've removed /usr/src on the
> 5.x system and recreated it, just in case there were some
> files corrupted in there. And once the problems started, I
> made a point of always removing all of /usr/obj/usr/src
> before starting the buildworld, in case there were corrupted
> files in there.
>
> I still have a few things I want to try. And I know it could
> still be a hardware problem (although it bugs me that it fails
> so consistently on 5.x and never fails on 4.x). Perhaps it
> is just some disk-corruption problem that occurred during the
> first few panics. But I thought I'd at least mention it, and
> see if anyone else has been having similar problems.
>
> --
> Garance Alistair Drosehn            =   gad@gilead.netel.rpi.edu
> Senior Systems Programmer           or  gad@freebsd.org
> Rensselaer Polytechnic Institute    or  drosih@rpi.edu
> _______________________________________________
> freebsd-current@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
>