From: Chris Gilbert
To: freebsd-current@freebsd.org
Date: Sun, 14 Aug 2005 20:16:17 +0200
Subject: Re: Panic during install on Sparc64 - Only with large HDD
In-Reply-To: <200508132321.37654.Chris@LainOS.org>
Also, it seems that setting hw.ata.ata_dma=0 (forcing the drive into PIO mode) works around the issue. The write that previously timed out now succeeds:

# sysctl -a hw.ata.ata_dma
hw.ata.ata_dma: 0
# dd count=1 obs=1024 seek=93321656 if=/dev/urandom of=/dev/ad0g
1+0 records in
0+1 records out
512 bytes transferred in 0.001390 secs (368351 bytes/sec)

It also seems a bug has already been submitted on this, and there is a posting about it on the freebsd-sparc64 mailing list:

http://lists.freebsd.org/pipermail/freebsd-sparc64/2005-June/003212.html

I will continue looking into the chipset docs and the FreeBSD driver, but thought I should point this out.

--
Thanks,
Chris (Lance) Gilbert
Ph: +45 33 73 29 31 (UTC +0100)

On Saturday 13 August 2005 23:21, Chris Gilbert wrote:
> Well, I've continued looking into this problem, as I really _really_ want
> to see it fixed for 6.0-RELEASE.
>
> I did some general device stress-testing to make sure that it was directly
> triggerable and reproducible, and not just an intermittent failure.
>
> I have successfully created the following partitions, and installed
> FreeBSD on them, without any errors:
>
> /dev/ad0a
> /dev/ad0b
> /dev/ad0c
> /dev/ad0d
> /dev/ad0e
> /dev/ad0f
>
> Even though newfs failed on it, creating the large partition itself
> (/dev/ad0g) worked.
>
> So I can dd data to it, but I can't create a UFS filesystem on it in
> order to mount it.
>
> I then went about writing data to this partition for long periods of
> time to try and hit the problem:
>
> # time dd if=/dev/urandom of=/dev/ad0g
> 143337401+0 records in
> 143337401+0 records out
> 73388749312 bytes transferred in 89392.318911 secs (820974 bytes/sec)
> 614.444u 41826.640s 24:49:52.35 47.4% 244+1708k 0+0io 0pf+0w
>
> After this ran without a single error for nearly 25 hours, I stopped it
> and started trying to hit the block that triggered the issue manually.
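Hunting for the first failing block by hand is essentially a bisection over dd's seek value: keep doubling the offset until a write fails, then halve the gap between the last good and first bad offset. Below is a minimal sketch in C. It assumes that writes succeed below some threshold and fail at and above it, which the quoted logs suggest but do not prove. `write_probe_fn`, `first_failing_seek` and `below_threshold_probe` are hypothetical names. On real hardware the probe would wrap a single-record dd-style write to /dev/ad0g instead of the simulated threshold used here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical probe: returns true if a one-record write at `seek`
 * (counted in obs-sized blocks, like dd's seek=) succeeds. On real
 * hardware this would wrap a dd-style write to the raw partition.
 */
typedef bool (*write_probe_fn)(uint64_t seek, void *ctx);

/*
 * Return the smallest seek value whose write fails, assuming every seek
 * below some threshold succeeds and every seek at or above it fails.
 * Returns UINT64_MAX if no failure is found up to and including `limit`.
 * `limit` must be at least 1.
 */
uint64_t first_failing_seek(write_probe_fn probe, void *ctx, uint64_t limit)
{
    uint64_t good = 0; /* largest seek known to succeed */
    uint64_t bad = 1;  /* candidate seek that may fail */

    assert(limit >= 1);
    if (!probe(0, ctx))
        return 0; /* even the very first block fails */

    /* "Doubling" phase: grow the candidate until a write fails. */
    while (probe(bad, ctx)) {
        good = bad;
        if (bad >= limit)
            return UINT64_MAX;
        bad = (bad > limit / 2) ? limit : bad * 2;
    }

    /* "Halving" phase: invariant is probe(good) succeeds, probe(bad) fails. */
    while (bad - good > 1) {
        uint64_t mid = good + (bad - good) / 2;
        assert(good < mid && mid < bad);
        if (probe(mid, ctx))
            good = mid;
        else
            bad = mid;
    }
    return bad;
}

/* Simulated probe for testing: writes fail at and above *(uint64_t *)ctx. */
bool below_threshold_probe(uint64_t seek, void *ctx)
{
    return seek < *(const uint64_t *)ctx;
}
```

With the simulated threshold set to 93321656, the search returns exactly that seek value after 55 probes. In the quoted logs, each failing write takes about 16 seconds to time out. Under the threshold assumption, a scripted search should therefore take minutes rather than hours.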
>
> After a few hours of "doubling and halving", I finally managed to find
> the block:
>
> # dd count=1 obs=1024 seek=93321655 if=/dev/urandom of=/dev/ad0g
> 1+0 records in
> 0+1 records out
> 512 bytes transferred in 0.001470 secs (348278 bytes/sec)
>
> This one was successful... but the very next one fails:
>
> # dd count=1 obs=1024 seek=93321656 if=/dev/urandom of=/dev/ad0g
> ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435456
> ad0: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=268435456
> ad0: FAILURE - WRITE_DMA timed out LBA=268435456
> dd: /dev/ad0g: Input/output error
> 1+0 records in
> 0+0 records out
> 0 bytes transferred in 16.453833 secs (0 bytes/sec)
>
> And incrementing the seek by one shows:
>
> # dd count=1 obs=1024 seek=93321657 if=/dev/urandom of=/dev/ad0g
> ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435458
> ad0: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=268435458
> ad0: FAILURE - WRITE_DMA timed out LBA=268435458
> dd: /dev/ad0g: Input/output error
> 1+0 records in
> 0+0 records out
> 0 bytes transferred in 16.452722 secs (0 bytes/sec)
>
> This makes perfect sense. The output block size is set to 1024 bytes
> (obs=1024), and dd counts seek in obs-sized blocks, while the LBA counts
> 512-byte sectors. Incrementing seek by one therefore moves the write two
> sectors further along the LBA:
>
> ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435456
> (then...)
> ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435458
>
> Bingo! We've finally found the wall!
>
> I'm going to look further into the IDE chipset (atapci0: UDMA66
> controller>) tonight. I'll check both its whitepapers (to see if it has
> some sort of quirk or limitation around this area) and its FreeBSD driver,
> to see if something funky is going on.
>
> As I said before, if anyone is interested in helping me resolve this, I
> would appreciate it greatly. This is a bug which has haunted me and
> several others since FreeBSD 5.2-RC2, and it needs to be fixed.
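The failing LBA in these logs, 268435456, is exactly 2^28. That is the first sector that cannot be addressed with 28-bit LBA, so any request at or past it has to use the 48-bit LBA command set. Together with the PIO workaround, this is consistent with, though it does not prove, a problem with 48-bit DMA transfers on this controller.

The C sketch below just redoes the arithmetic. It converts dd's seek to a sector offset and infers where ad0g starts on the disk. It then checks each one-sector request against the 28-bit limit. The inference assumes the LBA printed by ad0 is absolute (whole-disk), not partition-relative. The helper names `seek_to_sector`, `infer_partition_start` and `fits_lba28` are hypothetical and not taken from the FreeBSD ata driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SIZE 512u
/* First LBA that cannot be expressed in a 28-bit LBA field: 2^28. */
#define LBA28_END (UINT64_C(1) << 28)
/* A 28-bit ATA command transfers at most 256 sectors. */
#define LBA28_MAX_COUNT 256u

/* Convert a dd seek= value (counted in obs-byte blocks) to a sector offset. */
uint64_t seek_to_sector(uint64_t seek, uint64_t obs)
{
    assert(obs % SECTOR_SIZE == 0);
    return seek * (obs / SECTOR_SIZE);
}

/*
 * Infer the partition's starting LBA from one logged failure.
 * Assumption: the LBA printed by the driver is absolute (whole disk).
 */
uint64_t infer_partition_start(uint64_t logged_lba, uint64_t seek, uint64_t obs)
{
    uint64_t rel = seek_to_sector(seek, obs);
    assert(logged_lba >= rel);
    return logged_lba - rel;
}

/* True if `count` sectors starting at `lba` are addressable with 28-bit LBA. */
bool fits_lba28(uint64_t lba, uint64_t count)
{
    return count >= 1 && count <= LBA28_MAX_COUNT && lba + count <= LBA28_END;
}
```

Under that assumption, ad0g would start at sector 81792144. The last successful write (seek=93321655) then lands at LBA 268435454, and the first failure sits exactly on 2^28. It would also explain why the long stress test never tripped. It was stopped after 143337401 sectors, around absolute LBA 225129545, which is still below the boundary.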