Date: Tue, 5 Jun 2007 15:38:51 -0700 (PDT)
From: youshi10@u.washington.edu
To: "N. Harrington" <drumslayer2@yahoo.com>
Cc: questions@freebsd.org
Subject: Re: How to solve mysterious system lockups?
Message-ID: <Pine.LNX.4.43.0706051538510.27212@hymn09.u.washington.edu>
In-Reply-To: <362995.35822.qm@web34505.mail.mud.yahoo.com>

On Tue, 5 Jun 2007, N. Harrington wrote:
>
> --- Garrett Cooper <youshi10@u.washington.edu> wrote:
>
>> N. Harrington wrote:
>>> --- Garrett Cooper <youshi10@u.washington.edu> wrote:
>>>
>>>> N. Harrington wrote:
>>>>
>>>>> Hello
>>>>> I have several systems that are used as squid
>>>>> caching servers. I have some systems that use SCSI
>>>>> disks and some that use SATA disks. They are
>>>>> identical in every way except for the SATA vs SCSI
>>>>> drives.
>>>>>
>>>>> At random times, the SATA-based systems seem to be
>>>>> freezing. You can ping them and they respond, but you
>>>>> cannot log in. Nor are any logs processed during that
>>>>> time.
>>>>>
>>>>> I figure it must be something to do with the disks,
>>>>> but I am not sure how to solve it. There seems to be
>>>>> little rhyme or reason. It does not happen necessarily
>>>>> during busy times. It can happen in the middle of the
>>>>> night.
>>>>>
>>>>> Any pointers in how to track down the cause would be
>>>>> much appreciated.
>>>>>
>>>>> Tyan S2881 Motherboard - 4gigs mem
>>>>> Using 4 SATA (or scsi) drives
>>>>> FreeBSD amd64 6.2-STABLE.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Nicole
>>>>>
>>>> Nicole,
>>>> What's the driver in use for the SATA and the
>>>> SCSI drives?
>>>> -Garrett
>>>>
>>>
>>> Hi Garrett
>>> Here is the driver info.
>>>
>>> -- SATA
>>>
>>> atapci0: <SiI 3114 SATA150 controller> port 0xbc00-0xbc07,0xb400-0xb403,0xb000-0xb007,0xac00-0xac03,0xa800-0xa80f mem 0xfeafec00-0xfeafefff irq 17 at device 5.0 on pci3
>>> ata2: <ATA channel 0> on atapci0
>>> ata3: <ATA channel 1> on atapci0
>>> ata4: <ATA channel 2> on atapci0
>>> ata5: <ATA channel 3> on atapci0
>>> pci3: <display, VGA> at device 6.0 (no driver attached)
>>> isab0: <PCI-ISA bridge> at device 7.0 on pci0
>>> isa0: <ISA bus> on isab0
>>> atapci1: <AMD 8111 UDMA133 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 7.1 on pci0
>>> ata0: <ATA channel 0> on atapci1
>>> ata1: <ATA channel 1> on atapci1
>>> pci0: <serial bus, SMBus> at device 7.2 (no driver attached)
>>> pci0: <bridge> at device 7.3 (no driver attached)
>>> pcib2: <ACPI PCI-PCI bridge> at device 10.0 on pci0
>>> pci2: <ACPI PCI bus> on pcib2
>>>
>>> -- SCSI
>>>
>>> ahd0: <Adaptec AIC7902 Ultra320 SCSI adapter> port 0x8000-0x80ff,0x7800-0x78ff mem 0xfc89c000-0xfc89dfff irq 24 at device 10.0 on pci2
>>> ahd0: [GIANT-LOCKED]
>>> aic7902: Ultra320 Wide Channel A, SCSI Id=7, PCI-X 67-100Mhz, 512 SCBs
>>> ahd1: <Adaptec AIC7902 Ultra320 SCSI adapter> port 0x8800-0x88ff,0x8400-0x84ff mem 0xfc89e000-0xfc89ffff irq 25 at device 10.1 on pci2
>>> ahd1: [GIANT-LOCKED]
>>> aic7902: Ultra320 Wide Channel B, SCSI Id=7, PCI-X 67-100Mhz, 512 SCBs
>>> pci0: <base peripheral, interrupt controller> at device 10.1 (no driver attached)
>>> pcib3: <ACPI PCI-PCI bridge> at device 11.0 on pci0
>>> pci1: <ACPI PCI bus> on pcib3
>>> pci0: <base peripheral, interrupt controller> at device 11.1 (no driver attached)
>>>
>>> Thanks!
>>>
>>> Nicole
>> Ok, so it's an AMD 8111 northbridge versus an Adaptec onboard SCSI
>> controller.
>>
>> 1. What release / version of FreeBSD are you using?
>> You should upgrade to 6.2 STABLE because there have been a variety of
>> issues worked out in previous releases.
>
> I have a range of versions from 6.1-Pre to 6.2-STABLE
> as of a few months ago.
>
>> 2. Do you have any logs for activity during the hours when it locks up
>> (in particular anything interesting / fishy popping up)?
>
> Nope. That would make it too easy :)
> They commit suicide without a note.
>
>> 3. What scheduler are you using? 4BSD, ULE?
>
> 4BSD
>
>> 4. Does your machine (using the SATA controllers) lock up under heavy
>> load? If so, you may have a northbridge cooling issue that you need to
>> put a fan on. For instance, the motherboard that I was using for a while
>> (ASUS P5N-E SLI) was really close to my CPU heatsink, and there was a
>> lot of heat transfer between my northbridge and CPU heatsink, which was
>> raising the onboard temperatures 5~10 degrees C. The new motherboard
>> (ASUS P5B DLX) doesn't do that though.
>
> The lockups seem rather random. I have healthd
> running and they never seem to show very warm. The
> room is cold and the servers have great fans. Altho
> healthd can seem wonky, as the CPU temp has actually
> gone below the minimum. Also the -2Volt line seems
> very low. But some servers run forever that way.
>
> At least with SCSI, since it seems to manage itself
> as another layer away from the system, you get some
> error messages. Sort of like Windows 3.1 dropping to
> DOS. Versus SATA issues, where it's just a blue screen of
> death but without even some debugging code.
>
> I am going to try the patch Chuck Swiger sent me and
> see how that affects things. Also try a few
> replacement SATA cards. Altho that is always fun,
> especially in 1U servers. As well as seeing if using
> SAS drives may help, if I can find some cheap enough.
> Do you think that using the ULE scheduler could
> really help?

Don't try it in 6.x. It's not stable by any means. 7-CURRENT's getting a
lot closer though, especially as of late (past week).

-Garrett

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?Pine.LNX.4.43.0706051538510.27212>