Date: Tue, 08 Jan 2008 01:58:22 +0100
From: Miroslav Lachman <000.fbsd@quip.cz>
To: freebsd-stable@freebsd.org
Cc: andrej@antiszoc.hu
Subject: Re: Sun Fire X2100 SATA problem [was - sun x2100 gmirror problem]
Message-ID: <4782CAAE.80109@quip.cz>
In-Reply-To: <4613A66A.50204@quip.cz>
References: <58209.195.70.43.76.1175680466.squirrel@duloc.webmedia.hu> <4613A66A.50204@quip.cz>

Miroslav Lachman wrote:
> andrej@antiszoc.hu wrote:
>
>> Hi,
>>
>> We're using gmirror on our Sun Fire X2100 with FreeBSD 6.1-p10. A few days
>> ago I found this in the logs:
>>
>> Apr 1 02:12:05 x2100 kernel: ad6: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=612960533
>> Apr 1 02:12:05 x2100 kernel: ad6: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=612960533
>> Apr 1 02:12:05 x2100 kernel: GEOM_MIRROR: Request failed (error=5). ad6[WRITE(offset=313835792896, length=4096)]
>> Apr 1 02:12:05 x2100 kernel: GEOM_MIRROR: Device gm0: provider ad6 disconnected.
>>
>> Normally this would look like a disk error, but I think our half-year-old
>> disks (WD RE2) shouldn't fail after such a short time. Of course they have
>> moving parts, so they MAY fail. :( Yesterday I tried to reinit the SATA
>> channel and insert the disk back into the mirror. I got this:
>>
>> Apr 3 23:00:32 x2100 kernel: GEOM_MIRROR: Device gm0: provider ad6 detected.
>> Apr 3 23:00:32 x2100 kernel: GEOM_MIRROR: Device gm0: rebuilding provider ad6.
>> Apr 3 23:00:36 x2100 kernel: ad6: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=245760
>> Apr 3 23:00:38 x2100 kernel: ad6: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=392576
>> Apr 3 23:00:38 x2100 kernel: ad6: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=392960
>> Apr 3 23:00:53 x2100 kernel: ad6: FAILURE - device detached
>>
>> After this, the disk disappeared from the SATA channel completely.
>>
>> The weird thing is that we used the onboard nvidia-raid and the very same
>> error occurred, but there was no report in the kernel, the machine just
>> don't asked for operating system. Later I found out that the disk was
>> forgotten ~2 weeks before that reboot (the data on it was ~2 weeks old).
>> Otherwise that "forgotten/failed" disk was also half a year old and was
>> fine without a problem.
>>
>> Has anybody experienced something similar with the Sun X2100 or any
>> other servers running FreeBSD 6 and SATA?
>>
>> Regards,
>> Andras
>
> Hi,
>
> I can confirm your problem. I have the same problem on one X2100 but not
> on the others. Currently I have 4 X2100 machines, but only one with this
> strange problem. The problem is not caused by the HDD itself: I replaced
> it with a brand new one and the same error appeared after a few days.
> Maybe there is some problem with the cables / connectors or something on
> the mainboard.
> I am well known on this list and the local (Czech) mailing list for
> problems with SATA(n) disk drives disappearing. I had similar problems
> on ASUS boards with Intel chipsets... so from my point of view there is
> something bad with SATA in general. I never had a problem like this with
> good old ATA drives.
>
> I have no solution for this problem. The disk is OK after a reboot for a
> few days or weeks... if there is somebody who can help with investigating
> this kind of problem, I'll be happy to cooperate.
>
> output of dmesg, smartctl, gmirror etc.:
> http://www.quip.cz/1/freebsd/sata-hdd-problems/2007-03-07_errors_ad6.txt
>
> Miroslav Lachman

Just for the record - my problem was fixed by replacing the SATA cable.
The machine now has an uptime of 227 days with no more disk errors.

Miroslav Lachman
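
For anyone triaging similar logs: "ICRC" warnings are interface CRC errors on the link between controller and drive, which is consistent with the cable turning out to be the cause here. The following is only a rough sketch, assuming the message format quoted in this thread and 512-byte logical sectors; the helper names `count_icrc_errors` and `lba_to_offset` are made up for illustration and are not part of any FreeBSD tool. It tallies ICRC warnings per device and checks that the failing LBA corresponds to the byte offset GEOM_MIRROR reported.

```python
import re
from collections import Counter

# Pattern based only on the warning lines quoted in this thread, e.g.
# "ad6: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=245760".
# Other FreeBSD releases or drivers may format these messages differently.
ICRC_RE = re.compile(r"\b(ad\d+): WARNING - \w+ UDMA ICRC error .*?LBA=(\d+)")

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def count_icrc_errors(lines):
    """Count ICRC warnings per device (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = ICRC_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def lba_to_offset(lba, sector_size=SECTOR_SIZE):
    """Convert a sector number (LBA) to a byte offset."""
    return lba * sector_size


# Log lines from the original report, with mail line wrapping undone.
SAMPLE = [
    "Apr 3 23:00:36 x2100 kernel: ad6: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=245760",
    "Apr 3 23:00:38 x2100 kernel: ad6: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=392576",
    "Apr 3 23:00:38 x2100 kernel: ad6: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=392960",
    "Apr 3 23:00:53 x2100 kernel: ad6: FAILURE - device detached",
]

if __name__ == "__main__":
    print(dict(count_icrc_errors(SAMPLE)))  # {'ad6': 3}
    print(lba_to_offset(612960533))         # 313835792896
```

The second value equals the offset=313835792896 that GEOM_MIRROR logged for the failed write at LBA=612960533, so both messages describe the same request. A count that keeps growing on one device after the drive itself has been replaced would point towards the cable, connector or controller port rather than the disk.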