Date:      Sat, 03 Apr 2010 15:48:50 -0500
From:      Nathan Whitehorn <nwhitehorn@freebsd.org>
To:        Kevin Day <toasty@dragondata.com>
Cc:        freebsd-ppc@freebsd.org
Subject:   Re: Xserve G4 stability (random processes crashing)
Message-ID:  <4BB7A9B2.3080901@freebsd.org>
In-Reply-To: <7F22E2B9-34FB-4E3B-981E-8D2EF73A4F64@dragondata.com>
References:  <40B1BEB2-6620-4188-BB71-F8B5ED4AA234@dragondata.com> <4BB5EE68.2040504@freebsd.org> <7F22E2B9-34FB-4E3B-981E-8D2EF73A4F64@dragondata.com>

Kevin Day wrote:
> On Apr 2, 2010, at 8:17 AM, Nathan Whitehorn wrote:
>
>   
>> Kevin Day wrote:
>>     
>>> Thanks to some help, we've got 8.0-STABLE running on several Xserve G4 boxes now, in both UP and SMP configurations.
>>>
>>> However, all of them are showing weird stability problems. Running OS X Server, they were completely stable for years doing pretty hard work (video encoding) with no errors. They all pass Apple's hardware burn-in, too. But, doing a "buildworld" or "buildkernel" will result in random segfaults, invalid .o files being created, or ICEs that go away after immediately retrying. (i.e. it doesn't appear to be data from the disks being cached incorrectly, I don't have to force a re-read to fix) Pure CPU tasks (like memtester from ports) work fine for days. 
>>> Are there any known issues with 8.0 on an Xserve G4?
>>>
>>> -- Kevin
>>>  
>>>       
>> Could you try rolling back from 8.0-STABLE to 8.0-RELEASE on one? I think Marcel was seeing similar G4-specific problems, and it is likely to have been something introduced recently.
>> -Nathan
>>     
>
> If anything, it seems worse on -RELEASE than -STABLE.  In -STABLE I was at least able to get through a buildworld with only restarting it once, and now in -RELEASE I've restarted about 10 times and still haven't made it all the way through.
>
> Same symptoms as before, gcc giving internal compiler errors, segfaults, or corrupt .o files being produced.  Memtester (even running in parallel with buildworld) never reports any errors. I'll keep fiddling with this, but if anyone has any suggestions on where to look for some clues, it'd be appreciated.
>   
Since you say UP kernels have the same problems, other G4 machines seem 
not to have issues, and SMP G5 Xserves are completely stable, that 
points at some G4 Xserve-specific piece of hardware. I'd guess the ATA 
controller. Could you try chrooting into an NFS volume mounted from a 
known-stable machine, or onto a USB or FireWire disk, and running the 
same builds there?
-Nathan
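
The NFS variant of the test suggested above could be sketched roughly as
follows. This is only an illustration under stated assumptions: the server
name, export path, and mount point are placeholders that do not come from
this thread, and the export is assumed to already hold a complete
FreeBSD/powerpc userland plus /usr/src. By default the script only prints
the commands; set DRY_RUN=0 (as root) to actually run them.

```sh
#!/bin/sh
# Sketch only: the names below are hypothetical placeholders.
NFS_SRC="${NFS_SRC:-stablehost:/export/ppcroot}"  # assumed export with userland and /usr/src
MNT="${MNT:-/mnt/nfsroot}"                        # assumed local mount point
DRY_RUN="${DRY_RUN:-1}"                           # 1 = print commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Mount the remote root, give the chroot a /dev, and build inside it so
# that no build I/O goes through the local ATA controller.
nfs_chroot_build() {
    run mkdir -p "$MNT" &&
    run mount -t nfs "$NFS_SRC" "$MNT" &&
    run mount -t devfs devfs "$MNT/dev" &&
    run chroot "$MNT" /bin/sh -c 'cd /usr/src && make buildworld'
}

nfs_chroot_build
```

If the build completes cleanly in the chroot but still fails on the local
disk, that would be consistent with the ATA controller guess; if it fails
in both places, the disk path is probably not the cause.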


