From: Christopher McGee
Date: Mon, 25 Apr 2005 10:10:25 -0400
To: freebsd-questions@freebsd.org
Subject: Re: Stability problems with 5.3-Release
Message-ID: <426CFA51.800@xecu.net>
In-Reply-To: <426AFB37.5060205@xecu.net>
References: <8390c4b6d29d135ecb6e84e5a9270ec7@xecu.net>
 <20050422024159.GA9555@SDF.LONESTAR.ORG> <426AFB37.5060205@xecu.net>

Christopher McGee wrote:
> Justin R. Pessa wrote:
>
>> On Apr 21 05 06:22PM, Chris McGee wrote:
>>
>>> I've got 2 identical boxes (Supermicro sys-6023P-8R) running with
>>> ZCR adaptec cards with 6 73Gig seagate scsi drives, 4 Gigs of ram,
>>> and dual 2.4 Ghz Xeons. Both of these machines are running
>>> 5.3-Release-p8. They usually run for a day, give or take, and then
>>> they crash. They just deadlock, no console response, no nothing.
>>> They get power cycled and they are fine for a little while again.
>>> These are configured to be mysql database servers. I can provide
>>> any information necessary, but i'm stumped and it's causing me a lot
>>> of heartache now.
>>>
>>
>> I've had similar problems as well. One thing I noticed is that a process
>> may get hung in the D state and never returns. From there it seems the
>> system enters a downward spiral and everything locks up. I've had this
>> problem with p6 and p7. Attached is my dmesg output. Not sure if it's
>> helpful...
>> I can't offer anything in the form of a solution but figured I'd chime
>> in so that Chris doesn't think he's the only (crazy) one with this
>> problem! ;)
>>
>>> Chris
>>>
>>> _______________________________________________
>>> freebsd-questions@freebsd.org mailing list
>>> http://lists.freebsd.org/mailman/listinfo/freebsd-questions
>>> To unsubscribe, send any mail to
>>> "freebsd-questions-unsubscribe@freebsd.org"
>>>
>>
>> - j
>>
>> .__________________________________.
>> | Justin R. Pessa - BOFH
>> | www: http://jstn.sdf1.org
>> | pgp: http://jstn.sdf1.org/pgp.html
>> | irc: asdf @ irc.freenode.net
>> '
>>
>> ------------------------------------------------------------------------
>>
>> Copyright (c) 1992-2004 The FreeBSD Project.
>> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
>> The Regents of the University of California. All rights reserved.
>> FreeBSD 5.3-RELEASE-p7 #3: Thu Apr 21 13:24:51 EDT 2005
>> jstn@twinturbo:/usr/obj/usr/src/sys/TWINTURBO
>> Timecounter "i8254" frequency 1193182 Hz quality 0
>> CPU: Intel(R) Pentium(R) 4 CPU 1700MHz (1707.56-MHz 686-class CPU)
>> Origin = "GenuineIntel" Id = 0xf0a Stepping = 10
>> Features=0x3febf9ff
>>
>> real memory = 268349440 (255 MB)
>> avail memory = 257130496 (245 MB)
>> npx0: [FAST]
>> npx0: on motherboard
>> npx0: INT 16 interface
>> acpi0: on motherboard
>> acpi0: Power Button (fixed)
>> Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
>> acpi_timer0: <24-bit timer at 3.579545MHz> port 0xe408-0xe40b on acpi0
>> cpu0: on acpi0
>> acpi_button0: on acpi0
>> pcib0: port 0xcf8-0xcff on acpi0
>> pci0: on pcib0
>> agp0: mem 0xf8000000-0xfbffffff at device 0.0 on pci0
>> pcib1: at device 1.0 on pci0
>> pci1: on pcib1
>> pci1: at device 0.0 (no driver attached)
>> pcib2: at device 30.0 on pci0
>> pci2: on pcib2
>> pcm0: port 0xd800-0xd81f irq 9 at device 6.0 on pci2
>> pcm0:
>> rl0: port 0xd000-0xd0ff mem 0xf1800000-0xf18000ff irq 9 at device 11.0 on pci2
>> miibus0: on rl0
>> rlphy0: on miibus0
>> rlphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
>> rl0: Ethernet address: 00:02:2a:b3:24:a0
>> isab0: at device 31.0 on pci0
>> isa0: on isab0
>> atapci0: port 0xb800-0xb80f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 31.1 on pci0
>> ata0: channel #0 on atapci0
>> ata1: channel #1 on atapci0
>> uhci0: port 0xb400-0xb41f irq 9 at device 31.2 on pci0
>> uhci0: [GIANT-LOCKED]
>> usb0: on uhci0
>> usb0: USB revision 1.0
>> uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
>> uhub0: 2 ports with 2 removable, self powered
>> ums0: KYE Genius USB Wheel Mouse, rev 1.00/0.00, addr 2, iclass 3/1
>> ums0: 3 buttons and Z dir.
>> pci0: at device 31.3 (no driver attached)
>> uhci1: port 0xb000-0xb01f irq 9 at device 31.4 on pci0
>> uhci1: [GIANT-LOCKED]
>> usb1: on uhci1
>> usb1: USB revision 1.0
>> uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
>> uhub1: 2 ports with 2 removable, self powered
>> fdc0: port 0x3f7,0x3f2-0x3f5 irq 6 drq 2 on acpi0
>> fdc0: [FAST]
>> fd0: <1440-KB 3.5" drive> on fdc0 drive 0
>> ppc0: port 0x778-0x77a,0x378-0x37f irq 7 drq 3 on acpi0
>> ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
>> ppc0: FIFO with 16/16/8 bytes threshold
>> ppbus0: on ppc0
>> sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
>> sio0: type 16550A
>> sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
>> sio1: type 16550A
>> atkbdc0: port 0x64,0x60 irq 1 on acpi0
>> atkbd0: irq 1 on atkbdc0
>> kbd0 at atkbd0
>> atkbd0: [GIANT-LOCKED]
>> orm0: at iomem 0xcc000-0xcffff,0xc0000-0xcb7ff on isa0
>> pmtimer0 on isa0
>> sc0: at flags 0x100 on isa0
>> sc0: VGA <16 virtual consoles, flags=0x300>
>> vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
>> Timecounter "TSC" frequency 1707557872 Hz quality 800
>> Timecounters tick every 10.000 msec
>> acpi_cpu: throttling enabled, 8 steps (100% to 12.5%), currently 100.0%
>> ad0: 19092MB [38792/16/63] at ata0-master UDMA100
>> ad1: 117246MB [238216/16/63] at ata0-slave UDMA100
>> acd0: CDRW at ata1-master PIO4
>> cd0 at ata1 bus 0 target 0 lun 0
>> cd0: Removable CD-ROM SCSI-0 device
>> cd0: 16.000MB/s transfers
>> cd0: cd present [1 x 2048 byte records]
>> Mounting root from ufs:/dev/ad0s1a
>> link_elf: symbol in6_cksum undefined
>> link_elf: symbol in6_cksum undefined
>> ums0: at uhub0 port 2 (addr 2) disconnected
>> ums0: detached
>> ums0: KYE Genius USB Wheel Mouse, rev 1.00/0.00, addr 2, iclass 3/1
>> ums0: 3 buttons and Z dir.
>> ums0: at uhub0 port 2 (addr 2) disconnected
>> ums0: detached
>> ums0: KYE Genius USB Wheel Mouse, rev 1.00/0.00, addr 2, iclass 3/1
>> ums0: 3 buttons and Z dir.
>>
>
> I have a little more information about this problem. We use dump for
> backups, and when the machines perform a dump with the -L flag, they
> always crash. If you dump without the -L it usually works, but they
> will still crash at random intervals. I have upgraded one of them to
> 5.4-RC3 and it's been about 20 hours without a crash, but the real
> test is when the dump runs. I'll update the list if I get any more
> information about this.

Okay, here is the current status of the 2 machines:

server1 - still on the latest patch of 5.3. It was up for a couple of
days and this morning started acting funny. It didn't actually lock up,
but when you issued commands, it would just go back to a prompt,
regardless of the command. No errors in any of the log files, and mysql
was still responding to remote queries. A reboot brought it back to
normal for now.

server2 - running 5.4-RC3. It locked up in the middle of doing a dump
early this morning, after being up for almost 48 hours. Once again it
was a hard lock.

This unreliability is driving me crazy. Unless anyone has any
suggestions, I'm going to start the process of reverting to 4.x.

Thanks,
Chris