From owner-freebsd-hackers@FreeBSD.ORG Sat Nov 3 09:23:36 2007
Message-ID: <472BA1AD.6050408@enderzone.com>
Date: Fri, 02 Nov 2007 18:16:13 -0400
From: Ender
To: "Arno J. Klaassen"
Cc: freebsd-hackers@freebsd.org, Alexander Sabourenkov, sos@freebsd.org, "Matthew D. Fuller", Thierry Herbelot
References: <472A548B.50406@lxnt.info>
Subject: Re: Patch RFC: Promise SATA300 TX4 hardware bug workaround.
List-Id: Technical Discussions relating to FreeBSD

Arno J. Klaassen wrote:
> Hello,
>
> Alexander Sabourenkov writes:
>
>
>> Hello.
>>
>> I have ported the workaround for the hardware bug that causes data
>> corruption on Promise SATA300 TX4 cards to RELENG_7.
>>
>> Bug description:
>> SATA300 TX4 hardware chokes if last PRD entry (in a dma transfer) is
>> larger than 164 bytes. This was found while analysing vendor-supplied
>> linux driver.
>>
>> Workaround:
>> Split trailing PRD entry if it's larger than 164 bytes.
>>
>> Two supplied patches do fix problem on my machine.
>>
>
>
> definitely an improvement, but not sufficient (for my setup ) :
>
> amd64-releng_6 on an ASUS A8V UP (box ran rock-stable
> for years i386-releng_5 with same hardware apart TX4 and
> drives)
>
> from dmesg :
>
> atapci0: port 0xe000-0xe07f,0xd800-0xd8ff mem 0xfbb00000-0xfbb00fff,0xfba00000-0xfba1ffff irq 18 at device 13.0 on pci0
> ata2: on atapci0
> ata3: on atapci0
> ata4: on atapci0
> ata5: on atapci0
> atapci1: port 0xd400-0xd407,0xd000-0xd003,0xc800-0xc807,0xc400-0xc403,0xc000-0xc00f,0xb800-0xb8ff irq 20 at device 15.0 on pci0
> ata6: on atapci1
> ata7: on atapci1
> atapci2: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xfc00-0xfc0f at device 15.1 on pci0
> ata0: on atapci2
> ata1: on atapci2
>
> [ ... ]
>
> ad0: 38166MB at ata0-master UDMA100
> ad6: 476940MB at ata3-master SATA300
> ad12: 305245MB at ata6-master SATA150
>
> booting from ad0 and simple gconcat over ad6 and ad12.
>
> Improvement : I now can fsck /dev/concat/data without
> ad6 being detached
>
> Persistent problem : when I rsync an nfs-mounted disk to /dev/concat/data,
> I get after about some Gigs of data have been transferred :
>
> Nov 2 16:39:55 charlotte kernel: ad6: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=268435392
> Nov 2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
> Nov 2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
> Nov 2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing request directly
> Nov 2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
> Nov 2 16:40:50 charlotte kernel: ad6: WARNING - SET_MULTI taskqueue timeout - completing request directly
> Nov 2 16:40:50 charlotte kernel: ad6: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=268435392
> Nov 2 16:40:50 charlotte kernel: ad6: FAILURE - WRITE_DMA status=ff error=ff LBA=268435392
> Nov 2 16:40:50 charlotte kernel: g_vfs_done():concat/data[WRITE(offset=137438920704, length=131072)]error = 5
> Nov 2 16:40:50 charlotte kernel: ad6: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=268435648
> Nov 2 16:40:50 charlotte kernel: ad6: WARNING - WRITE_DMA48 UDMA ICRC error (retrying request) LBA=268435648
> Nov 2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
> Nov 2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
> Nov 2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing request directly
> Nov 2 16:40:50 charlotte kernel: ad6: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
> Nov 2 16:40:50 charlotte kernel: ad6: WARNING - SET_MULTI taskqueue timeout - completing request directly
> Nov 2 16:40:50 charlotte kernel: ad6: FAILURE - WRITE_DMA48 timed out LBA=268435648
> Nov 2 16:40:50 charlotte kernel: g_vfs_done():concat/data[WRITE(offset=137439051776, length=131072)]error = 5
>
> ...
>
> I will test again with "#define PDC_MAXLASTSGSIZE 32*4" (just to see
> if that makes a difference)
>
> Regards, Arno
>
>

Just a guess here: I bet that patch helped, but there are compound
problems regarding SATA on amd64 in 7.x. Do a quick search for [sata]
(especially g_vfs_done) in the PR database. Hopefully this removed a
layer of bugs, so the other ones are easier to fix.
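
For anyone following the thread without the patches at hand, here is a rough sketch of what "split trailing PRD entry" amounts to. It is not the posted patch: the struct layout, the function name split_last_prd, and the constant MAX_LAST_SG_SIZE are made up for illustration, and a real PRD table also carries an end-of-table flag that is left out here. The only thing taken from the thread is the 164-byte limit on the last entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name; 164 bytes is the limit quoted in the bug description. */
#define MAX_LAST_SG_SIZE 164u

/* Simplified PRD entry: bus address plus byte count (no end-of-table flag). */
struct prd_entry {
    uint32_t addr;
    uint32_t count;
};

/*
 * If the last of the n entries is larger than MAX_LAST_SG_SIZE, shrink it
 * and append a new entry covering its final MAX_LAST_SG_SIZE bytes.
 * Returns the new entry count, or 0 if the table has no room for the split.
 */
size_t split_last_prd(struct prd_entry *prd, size_t n, size_t capacity)
{
    assert(prd != NULL);

    if (n == 0)
        return 0;

    struct prd_entry *last = &prd[n - 1];
    if (last->count <= MAX_LAST_SG_SIZE)
        return n;               /* already small enough, nothing to do */

    if (n >= capacity)
        return 0;               /* caller must reserve one spare slot */

    uint32_t head = last->count - MAX_LAST_SG_SIZE;

    /* New trailing entry: the final 164 bytes of the old one. */
    prd[n].addr = last->addr + head;
    prd[n].count = MAX_LAST_SG_SIZE;

    /* Old entry keeps everything before that. */
    last->count = head;

    return n + 1;
}
```

The total transfer length is unchanged; only the way the tail is described to the controller differs. The sketch assumes the caller has reserved one spare table slot for the extra entry. If the limit is written as a macro expression such as 32*4, wrapping it in parentheses, (32*4), keeps it safe when it is used inside a larger expression.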