Date:      Fri, 02 Nov 2007 23:03:13 +0100
From:      Søren Schmidt <sos@deepcore.dk>
To:        "Arno J. Klaassen" <arno@heho.snv.jussieu.fr>
Cc:        Thierry Herbelot <thierry@herbelot.com>, freebsd-hackers@FreeBSD.ORG, Alexander Sabourenkov <screwdriver@lxnt.info>, "Matthew D. Fuller" <fullermd@over-yonder.net>, sos@FreeBSD.ORG
Subject:   Re: Patch RFC:  Promise SATA300 TX4 hardware bug workaround.
Message-ID:  <472B9EA1.6060205@deepcore.dk>
In-Reply-To: <wpsl3oe7h9.fsf@heho.snv.jussieu.fr>
References:  <472A548B.50406@lxnt.info> <wpsl3oe7h9.fsf@heho.snv.jussieu.fr>

Arno J. Klaassen wrote:
> definitely an improvement, but not sufficient (for my setup ) :
>
> amd64-releng_6 on an ASUS A8V UP (box ran rock-stable
> for years i386-releng_5 with same hardware apart TX4 and
> drives)
>
> from dmesg :
>
> atapci0: <Promise PDC40718 SATA300 controller> port 0xe000-0xe07f,0xd800-0xd8ff mem 0xfbb00000-0xfbb00fff,0xfba00000-0xfba1ffff irq 18 at device 13.0 on pci0
> ata2: <ATA channel 0> on atapci0
> ata3: <ATA channel 1> on atapci0
> ata4: <ATA channel 2> on atapci0
> ata5: <ATA channel 3> on atapci0
> atapci1: <VIA 6420 SATA150 controller> port 0xd400-0xd407,0xd000-0xd003,0xc800-0xc807,0xc400-0xc403,0xc000-0xc00f,0xb800-0xb8ff irq 20 at device 15.0 on pci0
> ata6: <ATA channel 0> on atapci1
> ata7: <ATA channel 1> on atapci1
> atapci2: <VIA 8237 UDMA133 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xfc00-0xfc0f at device 15.1 on pci0
> ata0: <ATA channel 0> on atapci2
> ata1: <ATA channel 1> on atapci2
>
> [ ... ]
>
> ad0: 38166MB <Seagate ST3402111A 3.AAJ> at ata0-master UDMA100
> ad6: 476940MB <WDC WD5000AAKS-00TMA0 12.01C01> at ata3-master SATA300
> ad12: 305245MB <WDC WD3200JD-22KLB0 08.05J08> at ata6-master SATA150
>
> booting from ad0 and simple gconcat over ad6 and ad12.
>
> Improvement : I now can fsck /dev/concat/data without
> ad6 being detached
>
> Persistent problem : when I rsync an nfs-mounted disk to /dev/concat/data,
> I get, after some gigabytes of data have been transferred :
>
> Nov  2 16:39:55 charlotte kernel: ad6: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=268435392
> Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
> Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
> Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing request directly
> Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
> Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SET_MULTI taskqueue timeout - completing request directly
> Nov  2 16:40:50 charlotte kernel: ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=268435392
> Nov  2 16:40:50 charlotte kernel: ad6: FAILURE - WRITE_DMA status=ff<BUSY,READY,DMA_READY,DSC,DRQ,CORRECTABLE,INDEX,ERROR> error=ff<ICRC,UNCORRECTABLE,MEDIA_CHANGED,NID_NOT_FOUND,MEDIA_CHANGE_REQEST,ABORTED,NO_MEDIA,ILLEGAL_LENGTH> LBA=268435392
> Nov  2 16:40:50 charlotte kernel: g_vfs_done():concat/data[WRITE(offset=137438920704, length=131072)]error = 5
> Nov  2 16:40:50 charlotte kernel: ad6: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=268435648
> Nov  2 16:40:50 charlotte kernel: ad6: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=268435648
> Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
> Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
> Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing request directly
> Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
> Nov  2 16:40:50 charlotte kernel: ad6: WARNING - SET_MULTI taskqueue timeout - completing request directly
> Nov  2 16:40:50 charlotte kernel: ad6: FAILURE - WRITE_DMA48 timed out LBA=268435648
> Nov  2 16:40:50 charlotte kernel: g_vfs_done():concat/data[WRITE(offset=137439051776, length=131072)]error = 5
>
> ...
>
> I will test again with "#define PDC_MAXLASTSGSIZE 32*4" (just to see
> if that makes a difference)
>
One thing to try is to lose any geom raid; if RAID is needed, use ataraid
instead.

I'm shuffling boards and controllers here to try to reproduce; so far
no luck, it "just works(tm)". It seems to depend quite heavily on the
"right" combination of possibly marginal HW....

-Søren




